Introduction
DuckLake gives you a transactional data lake — ACID semantics, time-travel queries, Parquet-on-S3 — without a JVM or a Spark cluster. The one thing it does not handle is getting your production data into it. That is where Sling comes in.
Sling is a single Go binary that connects to PostgreSQL (and 40+ other sources) and writes directly into DuckLake. One command to do a full load. One YAML file to run nightly incremental syncs. No additional infrastructure.
Prerequisites
- Sling CLI installed:
curl -fsSL https://slingdata.io/install.sh | bash - A running PostgreSQL database
- A DuckLake catalog database (local DuckDB file works; Postgres and SQLite also supported)
- Object storage for data files (local directory, S3, GCS, or Azure Blob)
Configure the connections
Sling reads connections from environment variables or from ~/.sling/env.yaml. For a quick start, environment variables are simplest.
PostgreSQL source:
export POSTGRES_CONN="postgres://user:pass@localhost:5432/mydb"
Or as a named connection with more options:
sling conns set MY_POSTGRES type=postgres host=localhost port=5432 \
user=myuser password=mypass database=mydb
DuckLake target:
sling conns set MY_DUCKLAKE type=ducklake \
catalog_type=duckdb \
catalog_conn_string=./catalog.db \
data_path=./ducklake_data
For an S3-backed DuckLake replace data_path with your bucket path:
sling conns set MY_DUCKLAKE type=ducklake \
catalog_type=postgres \
catalog_conn_string="postgres://user:pass@catalog-host:5432/ducklake_meta" \
data_path=s3://my-bucket/ducklake/
Verify both connections:
sling conns list
sling conns test MY_POSTGRES
sling conns test MY_DUCKLAKE
Full load — one table
Extract the public.orders table from Postgres and load it into DuckLake:
sling run \
--src-conn MY_POSTGRES \
--src-stream public.orders \
--tgt-conn MY_DUCKLAKE \
--tgt-object main.orders
Sling infers the schema, creates the DuckLake table, writes the Parquet files, and updates the catalog. The whole operation is a single process — no staging bucket, no intermediate files to clean up.
Incremental loads — keeping DuckLake in sync
For ongoing pipelines, incremental mode fetches only rows that changed since the last run. You need a monotonically increasing column — a timestamp (updated_at, created_at) or an auto-increment ID.
Create a replication file:
# postgres_to_ducklake.yaml
source: MY_POSTGRES
target: MY_DUCKLAKE
defaults:
mode: incremental
primary_key: [id]
update_key: updated_at
object: main.{stream_table}
streams:
public.customers:
public.orders:
public.line_items:
update_key: created_at # override for tables without updated_at
Run it:
sling run -r postgres_to_ducklake.yaml
Sling persists the high-water mark between runs automatically. Run the same command again and it will fetch only rows newer than the previous batch, then merge them on primary_key. No cron wrapper needed — just schedule sling run -r postgres_to_ducklake.yaml with whatever scheduler you already use (cron, Airflow, Dagster, systemd).
Replicate an entire schema
To mirror every table in the public schema:
source: MY_POSTGRES
target: MY_DUCKLAKE
defaults:
mode: full-refresh
object: main.{stream_table}
streams:
public.*:
This is useful for bootstrapping a DuckLake from an existing Postgres database. Switch mode to incremental for subsequent runs once the initial load is complete.
Querying the result in DuckDB
Once data is in DuckLake, open a DuckDB session to query it:
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:./catalog.db' AS lake (DATA_PATH './ducklake_data/');
-- Time-travel: compare current vs two snapshots ago
SELECT * FROM lake.main.orders AT (VERSION => 1);
SELECT * FROM lake.main.orders; -- current
Sling handles the write side. DuckDB handles the query side. Neither needs to know about the other.
Why DuckLake + Sling
Both tools share the same design philosophy: do one thing well, require no cluster, and stay out of your way.
DuckLake’s catalog is a relational database you already know how to operate. Its data format is Parquet you already know how to store. Sling’s configuration is a YAML file you can read in ten seconds. The two tools compose cleanly because neither imposes a runtime dependency on the other.
For teams who want a lakehouse without adopting a new infrastructure stack, this combination gets you there with two binaries and a YAML file.
Next steps
- Full DuckLake connection reference
- Sling replication options
- Extract data from any database into DuckLake — the broader multi-source guide
- Sling CLI installation


