Introduction
Apache Iceberg is the table format that turns a pile of Parquet files in object storage into something that behaves like a warehouse table. You get schema evolution, hidden partitioning, time travel, and consistent reads from whichever engine you point at the table. PostgreSQL is where most operational data starts. Moving it into Iceberg gives you an analytics copy that DuckDB, Spark, Trino, Snowflake, and Athena can all read without anyone needing to agree on a single warehouse vendor first.
Sling speaks the Iceberg REST catalog directly. From the configuration side an Iceberg target is just another database connection: point Sling at the catalog URL and the underlying object store, then declare your streams. No JVM, no Spark, no manual manifest writing.
This guide replicates a Postgres schema into Iceberg using Sling. The catalog is Cloudflare R2’s managed Iceberg REST catalog and the storage layer underneath is R2. Every CLI line, row count, and timing below comes from an actual run against those endpoints.
Installing Sling
Sling is a single binary. Pick whichever install fits:
# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash
# Windows
irm https://slingdata.io/install.ps1 | iex
# Python
pip install sling
Confirm:
sling --version
Full install notes are in the Sling CLI Getting Started Guide.
Configuring the Postgres Source
Sling reads connection details from ~/.sling/env.yaml, environment variables, or sling conns set. A read-only user is enough:
CREATE USER sling WITH PASSWORD '<password>';
GRANT CONNECT ON DATABASE mydb TO sling;
GRANT USAGE ON SCHEMA public TO sling;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO sling;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO sling;
Then register the connection:
sling conns set POSTGRES type=postgres host=host.ip user=sling \
database=mydb password=mypass port=5432
Or in ~/.sling/env.yaml:
connections:
  POSTGRES:
    type: postgres
    host: host.ip
    user: sling
    password: mypass
    port: 5432
    database: mydb
If your Postgres requires SSL, append sslmode: require. Test it:
sling conns test POSTGRES
The Postgres connection docs cover SSL, IAM, and the rest.
Configuring the Iceberg Target
Sling treats Iceberg as a database-class target. The connection captures two things: the catalog, which stores table metadata, and the warehouse, which stores the actual Parquet data files. Sling supports REST, AWS Glue, and SQL catalogs. This guide uses REST.
For Cloudflare R2’s Iceberg catalog you need the catalog URL, an API token, the warehouse identifier (account-id + bucket name), and S3-compatible credentials for the R2 bucket underneath. All four come from the R2 dashboard.
connections:
  ICEBERG:
    type: iceberg
    catalog_type: rest
    rest_uri: https://catalog.cloudflarestorage.com/<accountid>/<bucket>
    rest_token: <r2_catalog_api_token>
    rest_warehouse: <accountid>_<bucket>
    s3_access_key_id: <r2_access_key_id>
    s3_secret_access_key: <r2_secret_access_key>
For a self-hosted Lakekeeper or Nessie catalog, the shape is the same; only the rest_uri and rest_warehouse change. For AWS Glue, set catalog_type: glue and glue_warehouse: s3://my-bucket/warehouse. The Iceberg connection docs walk through each catalog type.
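Before handing a connection to Sling, it can help to sanity-check that the definition carries the keys its catalog type needs. The key names below mirror the examples in this guide; the validator itself is an illustrative sketch, not part of Sling.

```python
# Minimal sketch: check that an Iceberg connection dict carries the
# keys its catalog type needs. Key names mirror the examples in this
# guide; the validator is illustrative, not part of Sling.
REQUIRED_KEYS = {
    "rest": ["rest_uri", "rest_token", "rest_warehouse",
             "s3_access_key_id", "s3_secret_access_key"],
    "glue": ["glue_warehouse"],
}

def missing_keys(conn: dict) -> list:
    required = REQUIRED_KEYS.get(conn.get("catalog_type", ""), [])
    return [k for k in required if k not in conn]

conn = {
    "type": "iceberg",
    "catalog_type": "rest",
    "rest_uri": "https://catalog.cloudflarestorage.com/acct/bucket",
    "rest_token": "token",
    "rest_warehouse": "acct_bucket",
}
print(missing_keys(conn))  # the two S3 credential keys are absent
```
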
Test it:
sling conns test ICEBERG
A Full-Refresh Replication
For this run the Postgres source has three tables in a demo_postgres_iceberg schema:
- users — 8,000 rows
- orders — 35,000 rows
- events — 60,000 rows, with an occurred_at timestamp
The replication file:
# replication.yaml
source: POSTGRES
target: ICEBERG

defaults:
  mode: full-refresh
  object: demo_postgres_iceberg.{stream_table}

streams:
  demo_postgres_iceberg.users:
  demo_postgres_iceberg.orders:
  demo_postgres_iceberg.events:
    mode: incremental
    primary_key: [event_id]
    update_key: occurred_at
A few notes:
- object: follows the usual <namespace>.<table> shape. Sling creates the Iceberg namespace if it doesn’t already exist in the catalog.
- {stream_table} is a runtime variable. Sling substitutes the source table name so you don’t repeat yourself.
- The third stream switches to mode: incremental with an update_key. That’s the only diff between a one-shot bulk load and an ongoing append flow.
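The {stream_table} substitution can be pictured as a simple template expansion: for each stream, the table part of the source identifier is dropped into the object pattern. A rough Python sketch of that behavior (not Sling’s actual implementation):

```python
# Rough sketch of {stream_table} expansion: take the table part of
# "<schema>.<table>" and substitute it into the object template.
# This mimics the behavior; it is not Sling's code.
def expand_object(template: str, stream: str) -> str:
    table = stream.split(".")[-1]
    return template.replace("{stream_table}", table)

template = "demo_postgres_iceberg.{stream_table}"
for stream in ["demo_postgres_iceberg.users",
               "demo_postgres_iceberg.orders",
               "demo_postgres_iceberg.events"]:
    print(expand_object(template, stream))
# demo_postgres_iceberg.users
# demo_postgres_iceberg.orders
# demo_postgres_iceberg.events
```
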
Run it:
sling run -r replication.yaml
Real output, trimmed:
INF Sling CLI | https://slingdata.io
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
incremental merge is not yet supported (only appends)
INF Sling Replication [3 streams] | POSTGRES -> ICEBERG
INF [1 / 3] running stream demo_postgres_iceberg.users
INF created table "demo_postgres_iceberg"."users"
INF streaming data (direct insert)
INF inserted 8000 rows into "demo_postgres_iceberg"."users" in 11 secs [713 r/s] [519 kB]
INF [2 / 3] running stream demo_postgres_iceberg.orders
INF created table "demo_postgres_iceberg"."orders"
INF inserted 35000 rows into "demo_postgres_iceberg"."orders" in 9 secs [3,721 r/s] [2.1 MB]
INF [3 / 3] running stream demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF created table "demo_postgres_iceberg"."events"
INF inserted 60000 rows into "demo_postgres_iceberg"."events" in 7 secs [8,190 r/s] [4.5 MB]
INF Sling Replication Completed in 29s | POSTGRES -> ICEBERG | 3 Successes | 0 Failures
103,000 rows across three tables, 29 seconds end-to-end. The warning at the top deserves a real answer; see the section on incremental modes further down.
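The totals are easy to sanity-check from the log lines above: three per-table insert counts plus the reported wall time give the aggregate rate.

```python
# Sanity-check the totals from the run log: per-table row counts
# and the reported 29-second end-to-end wall time.
rows = {"users": 8_000, "orders": 35_000, "events": 60_000}
total = sum(rows.values())
wall_seconds = 29
print(total)                        # 103000
print(round(total / wall_seconds))  # 3552 rows/sec end-to-end
```
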
Verification
Sling can query Iceberg tables directly through its DuckDB-backed reader. Tables are addressed as iceberg_catalog.<namespace>.<table>:
sling conns exec ICEBERG \
"select 'users' as t, count(*) as c
from iceberg_catalog.demo_postgres_iceberg.users
union all
select 'orders', count(*) from iceberg_catalog.demo_postgres_iceberg.orders
union all
select 'events', count(*) from iceberg_catalog.demo_postgres_iceberg.events"
+--------+-------+
| T | C |
+--------+-------+
| users | 8000 |
| orders | 35000 |
| events | 60000 |
+--------+-------+
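This comparison can also be scripted: collect the counts Postgres reports and the counts the Iceberg readback returns, then diff them. A minimal sketch using the numbers from this run, with the two dicts standing in for the results of the count queries:

```python
# Compare source row counts against the Iceberg readback. The two
# dicts stand in for the results of the count queries above.
source = {"users": 8_000, "orders": 35_000, "events": 60_000}
target = {"users": 8_000, "orders": 35_000, "events": 60_000}

mismatches = {t: (source[t], target.get(t)) for t in source
              if target.get(t) != source[t]}
print("OK" if not mismatches else mismatches)  # OK
```
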
Row counts match the source. A sample of users confirms columns and types survived the trip:
sling conns exec ICEBERG \
"select user_id, email, country, signup_at
from iceberg_catalog.demo_postgres_iceberg.users
order by user_id limit 5"
+---------+-------------------+---------+-------------------------------+
| USER_ID | EMAIL | COUNTRY | SIGNUP_AT |
+---------+-------------------+---------+-------------------------------+
| 1 | [email protected] | BR | 2025-01-01 00:14:00 -0300 -03 |
| 2 | [email protected] | DE | 2025-01-01 00:28:00 -0300 -03 |
| 3 | [email protected] | FR | 2025-01-01 00:42:00 -0300 -03 |
| 4 | [email protected] | JP | 2025-01-01 00:56:00 -0300 -03 |
| 5 | [email protected] | UK | 2025-01-01 01:10:00 -0300 -03 |
+---------+-------------------+---------+-------------------------------+
Postgres jsonb lands as a structured column too. Sampling events:
+----------+---------+------------+----------------------+----------------------+
| EVENT_ID | USER_ID | EVENT_TYPE | PAYLOAD | OCCURRED_AT |
+----------+---------+------------+----------------------+----------------------+
| 60001 | 2 | click | {"v": 1, "utm": "x"} | 2026-05-11 ... |
| 60002 | 3 | signup | {"v": 2, "utm": "x"} | 2026-05-11 ... |
| 60003 | 4 | purchase | {"v": 3, "utm": "x"} | 2026-05-11 ... |
+----------+---------+------------+----------------------+----------------------+
Any other Iceberg reader sees the same data: DuckDB with the iceberg extension, Spark, Trino, Athena, Snowflake’s catalog-linked databases. That portability is the reason for the catalog in the first place.
Running an Incremental Append
After the bulk load, the day-to-day shape is: every few minutes (or hours, or once a day), pick up the new rows since the last run and append them to the Iceberg table. Sling’s incremental mode does this. The state (the last seen value of the update_key) is tracked by Sling itself, so you don’t need to manage a state file the way you would for a file-based target.
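The checkpoint logic amounts to a high-water mark on the update_key: remember the largest occurred_at seen, and on the next run pull only rows above it. Sling tracks this state internally; the toy sketch below only illustrates the idea.

```python
from datetime import datetime

# Toy high-water-mark sketch: pull only rows whose occurred_at is
# above the saved checkpoint, then advance the checkpoint. Sling
# tracks this state internally; this only illustrates the idea.
def incremental_pull(rows, checkpoint):
    new_rows = [r for r in rows
                if checkpoint is None or r["occurred_at"] > checkpoint]
    new_checkpoint = max((r["occurred_at"] for r in new_rows),
                         default=checkpoint)
    return new_rows, new_checkpoint

rows = [
    {"event_id": 1, "occurred_at": datetime(2026, 5, 10, 8, 0)},
    {"event_id": 2, "occurred_at": datetime(2026, 5, 11, 8, 42)},
]
first, ckpt = incremental_pull(rows, None)   # first run: all rows
second, ckpt = incremental_pull(rows, ckpt)  # next run: nothing new
print(len(first), len(second))  # 2 0
```
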
Insert 2,500 new events on the source (a stand-in for fresh activity):
insert into demo_postgres_iceberg.events (event_id, user_id, event_type, payload, occurred_at)
select 60000 + n, 1 + (n % 8000), 'click',
jsonb_build_object('utm','x','v', n % 100),
now() - (n * interval '1 second')
from generate_series(1, 2500) g(n);
Run a single-stream replication that touches only events:
# replication-incremental.yaml
source: POSTGRES
target: ICEBERG

defaults:
  object: demo_postgres_iceberg.{stream_table}

streams:
  demo_postgres_iceberg.events:
    mode: incremental
    update_key: occurred_at
sling run -r replication-incremental.yaml
INF Sling Replication | POSTGRES -> ICEBERG | demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
INF streaming data (direct insert)
INF inserted 2500 rows into "demo_postgres_iceberg"."events" in 8 secs [294 r/s] [178 kB]
INF execution succeeded
Sling read the saved checkpoint, pulled only rows newer than the last occurred_at it saw, and appended exactly the 2,500 new rows. A readback confirms the new total:
sling conns exec ICEBERG \
"select min(occurred_at), max(occurred_at), count(*)
from iceberg_catalog.demo_postgres_iceberg.events"
+-------------------------------+--------------------------------------+--------+
| MIN_OCCURRED_AT | MAX_OCCURRED_AT | COUNT |
+-------------------------------+--------------------------------------+--------+
| 2025-03-01 00:00:40 -0300 -03 | 2026-05-11 08:42:59.533692 -0300 -03 | 62500 |
+-------------------------------+--------------------------------------+--------+
60,000 + 2,500 = 62,500. The new high-water mark on occurred_at is the timestamp of the freshest insert. The next scheduled run will start from there.
Append-incremental vs merge-incremental
That warning Sling printed on the first run matters:
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
incremental merge is not yet supported (only appends)
For database targets like Postgres or Snowflake, Sling’s incremental mode is a merge: a row whose primary_key already exists in the target gets updated in place. For an Iceberg target today, incremental means append only. New rows go in, existing rows stay as-is, and a primary_key declared on the stream is parsed but not enforced.
That is fine when your source is append-only: events, immutable transactions, log data. It is the wrong default if your source has mutable rows you need reflected on the lake side. Until merge lands, two patterns work:
- Snapshot replays. Run mode: full-refresh on a cadence that matches your freshness budget. Iceberg’s snapshot model means readers always see a consistent table; the old snapshot is replaced atomically. For tables in the low millions this is faster than it sounds.
- CDC-style append plus downstream resolution. Append every Postgres change to Iceberg as-is (using a logical-replication tool or trigger-based capture) and resolve the latest-state view at read time with something like qualify row_number() over (partition by pk order by event_ts desc) = 1. A bit more work at query time, very cheap at write time.
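The read-time resolution in the second pattern is just “latest row per key”. In SQL that is the qualify/row_number expression above; the same logic in plain Python, as a sketch over a small stand-in change log:

```python
# Latest-state resolution over an append-only change log: for each
# primary key, keep only the row with the newest event_ts. This is
# the plain-Python equivalent of the qualify/row_number pattern.
def latest_state(changes, pk="id", ts="event_ts"):
    latest = {}
    for row in changes:
        key = row[pk]
        if key not in latest or row[ts] > latest[key][ts]:
            latest[key] = row
    return list(latest.values())

changes = [
    {"id": 1, "status": "new",     "event_ts": 1},
    {"id": 1, "status": "shipped", "event_ts": 2},
    {"id": 2, "status": "new",     "event_ts": 1},
]
print(latest_state(changes))
# [{'id': 1, 'status': 'shipped', 'event_ts': 2},
#  {'id': 2, 'status': 'new', 'event_ts': 1}]
```
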
Track the Iceberg connector docs for when full merge mode ships.
Common tweaks
- Choose the right catalog. REST is the most portable: the same connection shape works for Cloudflare R2, Lakekeeper, Nessie, Polaris, and any other REST-compatible catalog. Glue is the simplest in AWS-native shops. SQL catalog is fine for local dev. Avoid wiring a different catalog per environment if you can help it; the table layout doesn’t care, but the metadata location does.
- Namespace organization. Treat namespaces (the demo_postgres_iceberg in demo_postgres_iceberg.users) the way you treat warehouse schemas: one per source system, or one per data domain. Don’t dump everything into default.
- Filter at the source. Use a sql: block per stream to project columns or filter rows before they leave Postgres. Smaller Parquet files, smaller manifests, cheaper queries downstream.
- Time travel for free. Every replication produces a new Iceberg snapshot. Readers can time-travel to a previous snapshot, which is useful for “what did this table look like before yesterday’s run?” without storing your own backups.
- Maintain the table. Like any Iceberg table, periodic compaction and snapshot expiration keep the file count and metadata size from growing without bound. Set this up on a separate schedule from the replication itself.
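A source-side filter of the kind mentioned in the list above could look like this in a stream definition; the column list and predicate here are placeholder choices, so adapt them to your schema.

```yaml
streams:
  demo_postgres_iceberg.events:
    mode: incremental
    update_key: occurred_at
    # hypothetical projection and filter; adjust columns and predicate
    sql: |
      select event_id, user_id, event_type, occurred_at
      from demo_postgres_iceberg.events
      where event_type in ('click', 'purchase')
```
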
Where to go next
The same pattern works for any of Sling’s 30+ database sources into Iceberg: MySQL, SQL Server, Snowflake, BigQuery, MongoDB, and the rest. Swap the source and leave the target alone.
If the underlying R2 storage is what brought you here, the Postgres → R2 as Parquet walkthrough shows the same source landing as raw Parquet files instead of an Iceberg table, which is useful when downstream readers don’t need a catalog. For a deeper comparison of file-format targets, see Postgres → S3 as Parquet and Postgres → DuckDB.
For team workflows with scheduling, alerting, and audit trails on top of the same CLI, look at the Sling Platform.
Questions go to Discord or GitHub Issues.


