Can Sling load PostgreSQL data into DuckLake incrementally?

Yes. Use mode: incremental with an update_key pointing at any timestamp or auto-increment column. Sling tracks the high-water mark automatically — each subsequent run fetches only rows newer than the last batch, merges on the primary_key, and writes the delta as Parquet. No manual state file needed.

Does DuckLake store data in S3 or locally?

Both. The data_path in your DuckLake connection can point at a local directory (data_path: ./ducklake_data), an S3 bucket (data_path: s3://my-bucket/prefix/), a GCS bucket, or Azure Blob. The catalog database (DuckDB, Postgres, or SQLite) is separate and just stores metadata.

Do I need DuckDB installed to use Sling with DuckLake?

No. You do not need to install DuckDB yourself for ingestion: Sling writes the Parquet files and updates the DuckLake catalog as part of the run, with no JDBC driver or JVM involved. A DuckDB client is only required when you want to query the DuckLake afterward.

What PostgreSQL schemas and tables can Sling replicate to DuckLake?

Any table accessible by the Postgres user you configure. You can replicate a single table, all tables in a schema with a wildcard (public.*), or a hand-picked list — all in one replication YAML file.

Is Sling free for PostgreSQL to DuckLake pipelines?

The Sling CLI is open source (AGPLv3) and free. You can run as many Postgres → DuckLake replications as you need at no cost. CLI Pro ($79/mo) adds scheduling, parallelism controls, and priority support if you need them for production workloads.

PostgreSQL to DuckLake: Load Postgres Data with Sling

Introduction

DuckLake gives you a transactional data lake — ACID semantics, time-travel queries, Parquet-on-S3 — without a JVM or a Spark cluster. The one thing it does not handle is getting your production data into it. That is where Sling comes in.

Sling is a single Go binary that connects to PostgreSQL (and 40+ other sources) and writes directly into DuckLake. One command to do a full load. One YAML file to run nightly incremental syncs. No additional infrastructure.

Prerequisites

Sling CLI installed: curl -fsSL https://slingdata.io/install.sh | bash
A running PostgreSQL database
A DuckLake catalog database (local DuckDB file works; Postgres and SQLite also supported)
Object storage for data files (local directory, S3, GCS, or Azure Blob)

Configure the connections

Sling reads connections from environment variables or from ~/.sling/env.yaml. For a quick start, environment variables are simplest.

PostgreSQL source:

export MY_POSTGRES="postgres://user:pass@localhost:5432/mydb"

Or as a named connection with more options:

sling conns set MY_POSTGRES type=postgres host=localhost port=5432 \
  user=myuser password=mypass database=mydb

DuckLake target:

sling conns set MY_DUCKLAKE type=ducklake \
  catalog_type=duckdb \
  catalog_conn_string=./catalog.db \
  data_path=./ducklake_data

For an S3-backed DuckLake, point data_path at your bucket. This example also uses a Postgres catalog instead of a local DuckDB file:

sling conns set MY_DUCKLAKE type=ducklake \
  catalog_type=postgres \
  catalog_conn_string="postgres://user:pass@catalog-host:5432/ducklake_meta" \
  data_path=s3://my-bucket/ducklake/

Verify both connections:

sling conns list
sling conns test MY_POSTGRES
sling conns test MY_DUCKLAKE
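
In a script or CI job it can help to stop early when either connection is broken. The sketch below wraps the same sling conns test command in a loop. The function name check_conns is hypothetical, and the sketch assumes that sling conns test exits non-zero on failure, which is worth confirming for your Sling version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: test each named Sling connection in turn and
# stop at the first one that fails.
# Assumes `sling conns test` exits non-zero when a connection is unusable.
check_conns() {
  local name
  for name in "$@"; do
    if ! sling conns test "$name"; then
      echo "connection check failed: $name" >&2
      return 1
    fi
  done
  echo "all connections OK: $*"
}
```

A pipeline script could then start with check_conns MY_POSTGRES MY_DUCKLAKE || exit 1 before running any load.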

Full load — one table

Extract the public.orders table from Postgres and load it into DuckLake:

sling run \
  --src-conn MY_POSTGRES \
  --src-stream public.orders \
  --tgt-conn MY_DUCKLAKE \
  --tgt-object main.orders

Sling infers the schema, creates the DuckLake table, writes the Parquet files, and updates the catalog. The whole operation is a single process — no staging bucket, no intermediate files to clean up.

Incremental loads — keeping DuckLake in sync

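For ongoing pipelines, incremental mode fetches only rows that changed since the last run. You need a monotonically increasing column: a timestamp (updated_at, created_at) or an auto-increment ID. Note that an insert-only key such as created_at or an auto-increment ID picks up new rows but not later updates to existing rows, so prefer updated_at where the table has one.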

Create a replication file:

# postgres_to_ducklake.yaml
source: MY_POSTGRES
target: MY_DUCKLAKE

defaults:
  mode: incremental
  primary_key: [id]
  update_key: updated_at
  object: main.{stream_table}

streams:
  public.customers:
  public.orders:
  public.line_items:
    update_key: created_at   # override for tables without updated_at

Run it:

sling run -r postgres_to_ducklake.yaml

Sling tracks the high-water mark between runs automatically. Run the same command again and it fetches only rows whose update_key is newer than the previous batch, then merges them on primary_key. There is no state file to manage: schedule sling run -r postgres_to_ducklake.yaml with whatever scheduler you already use (cron, Airflow, Dagster, systemd).
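
When the command is scheduled, one practical concern is a slow run overlapping with the next one. The following is a minimal sketch of a wrapper that a scheduler could call. The names run_sync and LOCK_DIR are illustrative and not part of Sling, and the lock is a plain directory rather than anything Sling provides.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for a scheduler entry. It prevents two runs of the
# same replication from overlapping and passes Sling's exit code through.
# LOCK_DIR and run_sync are illustrative names, not Sling features.
LOCK_DIR="${LOCK_DIR:-/tmp/postgres_to_ducklake.lock}"

run_sync() {
  local replication="${1:-postgres_to_ducklake.yaml}"
  local status=0

  # mkdir is atomic, so only one process can create the lock directory.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "previous run still in progress, skipping" >&2
    return 0
  fi

  # Run the replication; record the exit code instead of aborting so the
  # lock is always released afterward.
  sling run -r "$replication" || status=$?

  rmdir "$LOCK_DIR"
  return "$status"
}
```

A script that sources this and ends with run_sync postgres_to_ducklake.yaml can be called from cron or a systemd timer. If the process is killed mid-run, the lock directory is left behind and has to be removed by hand, so a production version would likely want a staleness check.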

Replicate an entire schema

To mirror every table in the public schema:

source: MY_POSTGRES
target: MY_DUCKLAKE

defaults:
  mode: full-refresh
  object: main.{stream_table}

streams:
  public.*:

This is useful for bootstrapping a DuckLake from an existing Postgres database. Once the initial load is complete, switch mode to incremental for subsequent runs and add the primary_key and update_key settings that incremental mode relies on.
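
The bootstrap-then-switch pattern can be scripted so that both variants of the file come from one template. This is a sketch only: write_replication is a hypothetical helper, and the id and updated_at column names are assumptions that must hold for every table matched by the wildcard (or be overridden per stream).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a schema-wide replication file for a given mode.
# In incremental mode it adds primary_key and update_key defaults; the column
# names id and updated_at are assumptions about the source tables.
write_replication() {
  local mode="$1" out="$2"
  {
    echo "source: MY_POSTGRES"
    echo "target: MY_DUCKLAKE"
    echo
    echo "defaults:"
    echo "  mode: $mode"
    if [ "$mode" = "incremental" ]; then
      echo "  primary_key: [id]"
      echo "  update_key: updated_at"
    fi
    echo "  object: main.{stream_table}"
    echo
    echo "streams:"
    echo "  public.*:"
  } > "$out"
}
```

For example, write_replication full-refresh bootstrap.yaml followed by sling run -r bootstrap.yaml performs the initial load, and write_replication incremental sync.yaml produces the file to schedule afterward. The file names here are placeholders.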

Querying the result in DuckDB

Once data is in DuckLake, open a DuckDB session to query it:

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:./catalog.db' AS lake (DATA_PATH './ducklake_data/');

-- Time-travel: compare snapshot version 1 with the current state
SELECT * FROM lake.main.orders AT (VERSION => 1);
SELECT * FROM lake.main.orders;  -- current

Sling handles the write side. DuckDB handles the query side. Neither needs to know about the other.
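
For quick checks after a load, the same ATTACH statement can be run non-interactively. The sketch below assumes the duckdb command-line client is on PATH and uses its -c option to execute SQL and exit. The lake_query function and the CATALOG and DATA_PATH variables are hypothetical names; their defaults mirror the local connection used earlier in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one SQL statement against the DuckLake from the
# shell. Assumes the duckdb CLI is on PATH and that the catalog file and
# data path match the local DuckLake connection defined earlier.
CATALOG="${CATALOG:-./catalog.db}"
DATA_PATH="${DATA_PATH:-./ducklake_data/}"

lake_query() {
  local sql="$1"
  # -c executes the given SQL and exits. With no database file argument,
  # DuckDB starts in memory and the lake is attached on top of it.
  duckdb -c "INSTALL ducklake; LOAD ducklake;
ATTACH 'ducklake:${CATALOG}' AS lake (DATA_PATH '${DATA_PATH}');
${sql}"
}
```

A call such as lake_query "SELECT count(*) FROM lake.main.orders;" gives a simple row-count check that can run right after the replication in the same scheduled job.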

Why DuckLake + Sling

Both tools share the same design philosophy: do one thing well, require no cluster, and stay out of your way.

DuckLake’s catalog is a relational database you already know how to operate. Its data format is Parquet you already know how to store. Sling’s configuration is a YAML file you can read in ten seconds. The two tools compose cleanly because neither imposes a runtime dependency on the other.

For teams who want a lakehouse without adopting a new infrastructure stack, this combination gets you there with two binaries and a YAML file.

Next steps

Full DuckLake connection reference
Sling replication options
Extract data from any database into DuckLake — the broader multi-source guide
Sling CLI installation