Introduction
Apache Iceberg is the table format that turns a pile of Parquet files in object storage into something that behaves like a warehouse table. You get schema evolution, hidden partitioning, time travel, and consistent reads from whichever engine you point at the table. PostgreSQL is where most operational data starts. Moving it into Iceberg gives you an analytics copy that DuckDB, Spark, Trino, Snowflake, and Athena can all read without anyone needing to agree on a single warehouse vendor first.
Sling speaks the Iceberg REST catalog directly. From the configuration side an Iceberg target is just another database connection: point Sling at the catalog URL and the underlying object store, then declare your streams. No JVM, no Spark, no manual manifest writing.
This guide replicates a Postgres schema into Iceberg using Sling. The catalog is Cloudflare R2’s managed Iceberg REST catalog and the storage layer underneath is R2. Every CLI line, row count, and timing below comes from an actual run against those endpoints.
Installing Sling
Sling is a single binary. Pick whichever install fits:
# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash
# Windows
irm https://slingdata.io/install.ps1 | iex
# Python
pip install sling
Confirm:
sling --version
Full install notes are in the Sling CLI Getting Started Guide.
Configuring the Postgres Source
Sling reads connection details from ~/.sling/env.yaml, environment variables, or sling conns set. A read-only user is enough:
CREATE USER sling WITH PASSWORD '<password>';
GRANT CONNECT ON DATABASE mydb TO sling;
GRANT USAGE ON SCHEMA public TO sling;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO sling;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO sling;
Then register the connection:
sling conns set POSTGRES type=postgres host=host.ip user=sling \
database=mydb password=mypass port=5432
Or in ~/.sling/env.yaml:
connections:
  POSTGRES:
    type: postgres
    host: host.ip
    user: sling
    password: mypass
    port: 5432
    database: mydb
If your Postgres requires SSL, append sslmode: require. Test it:
sling conns test POSTGRES
The Postgres connection docs cover SSL, IAM, and the rest.
Configuring the Iceberg Target
Sling treats Iceberg as a database-class target. The connection captures two things: the catalog, which stores table metadata, and the warehouse, which stores the actual Parquet data files. Sling supports REST, AWS Glue, and SQL catalogs. This guide uses REST.
For Cloudflare R2’s Iceberg catalog you need the catalog URL, an API token, the warehouse identifier (account-id + bucket name), and S3-compatible credentials for the R2 bucket underneath. All four come from the R2 dashboard.
connections:
  ICEBERG:
    type: iceberg
    catalog_type: rest
    rest_uri: https://catalog.cloudflarestorage.com/<accountid>/<bucket>
    rest_token: <r2_catalog_api_token>
    rest_warehouse: <accountid>_<bucket>
    s3_access_key_id: <r2_access_key_id>
    s3_secret_access_key: <r2_secret_access_key>
For a self-hosted Lakekeeper or Nessie catalog, the shape is the same; only the rest_uri and rest_warehouse change. For AWS Glue, set catalog_type: glue and glue_warehouse: s3://my-bucket/warehouse. The Iceberg connection docs walk through each catalog type.
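Before handing a connection to Sling, it can help to sanity-check that the definition carries the keys its catalog type needs. The key names below mirror the examples in this guide; the validator itself is an illustrative sketch, not part of Sling.

```python
# Minimal sketch: check that an Iceberg connection dict carries the
# keys its catalog type needs. Key names mirror the examples in this
# guide; the validator is illustrative, not part of Sling.
REQUIRED_KEYS = {
    "rest": ["rest_uri", "rest_token", "rest_warehouse",
             "s3_access_key_id", "s3_secret_access_key"],
    "glue": ["glue_warehouse"],
}

def missing_keys(conn: dict) -> list:
    required = REQUIRED_KEYS.get(conn.get("catalog_type", ""), [])
    return [k for k in required if k not in conn]

conn = {
    "type": "iceberg",
    "catalog_type": "rest",
    "rest_uri": "https://catalog.cloudflarestorage.com/acct/bucket",
    "rest_token": "token",
    "rest_warehouse": "acct_bucket",
}
print(missing_keys(conn))  # the two S3 credential keys are absent
```
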
Test it:
sling conns test ICEBERG
A Full-Refresh Replication
For this run the Postgres source has three tables in a demo_postgres_iceberg schema:
- users — 8,000 rows
- orders — 35,000 rows
- events — 60,000 rows, with an occurred_at timestamp
The replication file:
# replication.yaml
source: POSTGRES
target: ICEBERG

defaults:
  mode: full-refresh
  object: demo_postgres_iceberg.{stream_table}

streams:
  demo_postgres_iceberg.users:
  demo_postgres_iceberg.orders:
  demo_postgres_iceberg.events:
    mode: incremental
    primary_key: [event_id]
    update_key: occurred_at
A few notes:
- object: follows the usual <namespace>.<table> shape. Sling creates the Iceberg namespace if it doesn’t already exist in the catalog.
- {stream_table} is a runtime variable. Sling substitutes the source table name so you don’t repeat yourself.
- The third stream switches to mode: incremental with an update_key. That’s the only diff between a one-shot bulk load and an ongoing append flow.
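The {stream_table} substitution can be pictured as a simple template expansion: for each stream, the table part of the source identifier is dropped into the object pattern. A rough Python sketch of that behavior (not Sling’s actual implementation):

```python
# Rough sketch of {stream_table} expansion: take the table part of
# "<schema>.<table>" and substitute it into the object template.
# This mimics the behavior; it is not Sling's code.
def expand_object(template: str, stream: str) -> str:
    table = stream.split(".")[-1]
    return template.replace("{stream_table}", table)

template = "demo_postgres_iceberg.{stream_table}"
for stream in ["demo_postgres_iceberg.users",
               "demo_postgres_iceberg.orders",
               "demo_postgres_iceberg.events"]:
    print(expand_object(template, stream))
# demo_postgres_iceberg.users
# demo_postgres_iceberg.orders
# demo_postgres_iceberg.events
```
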
Run it:
sling run -r replication.yaml
Real output, trimmed:
INF Sling CLI | https://slingdata.io
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
incremental merge is not yet supported (only appends)
INF Sling Replication [3 streams] | POSTGRES -> ICEBERG
INF [1 / 3] running stream demo_postgres_iceberg.users
INF created table "demo_postgres_iceberg"."users"
INF streaming data (direct insert)
INF inserted 8000 rows into "demo_postgres_iceberg"."users" in 11 secs [713 r/s] [519 kB]
INF [2 / 3] running stream demo_postgres_iceberg.orders
INF created table "demo_postgres_iceberg"."orders"
INF inserted 35000 rows into "demo_postgres_iceberg"."orders" in 9 secs [3,721 r/s] [2.1 MB]
INF [3 / 3] running stream demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF created table "demo_postgres_iceberg"."events"
INF inserted 60000 rows into "demo_postgres_iceberg"."events" in 7 secs [8,190 r/s] [4.5 MB]
INF Sling Replication Completed in 29s | POSTGRES -> ICEBERG | 3 Successes | 0 Failures
103,000 rows across three tables, 29 seconds end-to-end. The warning at the top deserves a real answer; see the section on incremental modes further down.
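The totals are easy to sanity-check from the log lines above: three per-table insert counts plus the reported wall time give the aggregate rate.

```python
# Sanity-check the totals from the run log: per-table row counts
# and the reported 29-second end-to-end wall time.
rows = {"users": 8_000, "orders": 35_000, "events": 60_000}
total = sum(rows.values())
wall_seconds = 29
print(total)                        # 103000
print(round(total / wall_seconds))  # 3552 rows/sec end-to-end
```
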
Verification
Sling can query Iceberg tables directly through its DuckDB-backed reader. Tables are addressed as iceberg_catalog.<namespace>.<table>:
sling conns exec ICEBERG \
"select 'users' as t, count(*) as c
from iceberg_catalog.demo_postgres_iceberg.users
union all
select 'orders', count(*) from iceberg_catalog.demo_postgres_iceberg.orders
union all
select 'events', count(*) from iceberg_catalog.demo_postgres_iceberg.events"
+--------+-------+
| T | C |
+--------+-------+
| users | 8000 |
| orders | 35000 |
| events | 60000 |
+--------+-------+
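This comparison can also be scripted: collect the counts Postgres reports and the counts the Iceberg readback returns, then diff them. A minimal sketch using the numbers from this run, with the two dicts standing in for the results of the count queries:

```python
# Compare source row counts against the Iceberg readback. The two
# dicts stand in for the results of the count queries above.
source = {"users": 8_000, "orders": 35_000, "events": 60_000}
target = {"users": 8_000, "orders": 35_000, "events": 60_000}

mismatches = {t: (source[t], target.get(t)) for t in source
              if target.get(t) != source[t]}
print("OK" if not mismatches else mismatches)  # OK
```
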
Row counts match the source. A sample of users confirms columns and types survived the trip:
sling conns exec ICEBERG \
"select user_id, email, country, signup_at
from iceberg_catalog.demo_postgres_iceberg.users
order by user_id limit 5"
+---------+-------------------+---------+-------------------------------+
| USER_ID | EMAIL | COUNTRY | SIGNUP_AT |
+---------+-------------------+---------+-------------------------------+
| 1 | [email protected] | BR | 2025-01-01 00:14:00 -0300 -03 |
| 2 | [email protected] | DE | 2025-01-01 00:28:00 -0300 -03 |
| 3 | [email protected] | FR | 2025-01-01 00:42:00 -0300 -03 |
| 4 | [email protected] | JP | 2025-01-01 00:56:00 -0300 -03 |
| 5 | [email protected] | UK | 2025-01-01 01:10:00 -0300 -03 |
+---------+-------------------+---------+-------------------------------+
Postgres jsonb lands as a structured column too. Sampling events:
+----------+---------+------------+----------------------+----------------------+
| EVENT_ID | USER_ID | EVENT_TYPE | PAYLOAD | OCCURRED_AT |
+----------+---------+------------+----------------------+----------------------+
| 60001 | 2 | click | {"v": 1, "utm": "x"} | 2026-05-11 ... |
| 60002 | 3 | signup | {"v": 2, "utm": "x"} | 2026-05-11 ... |
| 60003 | 4 | purchase | {"v": 3, "utm": "x"} | 2026-05-11 ... |
+----------+---------+------------+----------------------+----------------------+
Any other Iceberg reader sees the same data: DuckDB with the iceberg extension, Spark, Trino, Athena, Snowflake’s catalog-linked databases. That portability is the reason for the catalog in the first place.
Running an Incremental Append
After the bulk load, the day-to-day shape is: every few minutes (or hours, or once a day), pick up the new rows since the last run and append them to the Iceberg table. Sling’s incremental mode does this. The state (the last seen value of the update_key) is tracked by Sling itself, so you don’t need to manage a state file the way you would for a file-based target.
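The checkpoint logic amounts to a high-water mark on the update_key: remember the largest occurred_at seen, and on the next run pull only rows above it. Sling tracks this state internally; the toy sketch below only illustrates the idea.

```python
from datetime import datetime

# Toy high-water-mark sketch: pull only rows whose occurred_at is
# above the saved checkpoint, then advance the checkpoint. Sling
# tracks this state internally; this only illustrates the idea.
def incremental_pull(rows, checkpoint):
    new_rows = [r for r in rows
                if checkpoint is None or r["occurred_at"] > checkpoint]
    new_checkpoint = max((r["occurred_at"] for r in new_rows),
                         default=checkpoint)
    return new_rows, new_checkpoint

rows = [
    {"event_id": 1, "occurred_at": datetime(2026, 5, 10, 8, 0)},
    {"event_id": 2, "occurred_at": datetime(2026, 5, 11, 8, 42)},
]
first, ckpt = incremental_pull(rows, None)   # first run: all rows
second, ckpt = incremental_pull(rows, ckpt)  # next run: nothing new
print(len(first), len(second))  # 2 0
```
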
Insert 2,500 new events on the source (a stand-in for fresh activity):
insert into demo_postgres_iceberg.events (event_id, user_id, event_type, payload, occurred_at)
select 60000 + n, 1 + (n % 8000), 'click',
jsonb_build_object('utm','x','v', n % 100),
now() - (n * interval '1 second')
from generate_series(1, 2500) g(n);
Run a single-stream replication that touches only events:
# replication-incremental.yaml
source: POSTGRES
target: ICEBERG

defaults:
  object: demo_postgres_iceberg.{stream_table}

streams:
  demo_postgres_iceberg.events:
    mode: incremental
    update_key: occurred_at
sling run -r replication-incremental.yaml
INF Sling Replication | POSTGRES -> ICEBERG | demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
INF streaming data (direct insert)
INF inserted 2500 rows into "demo_postgres_iceberg"."events" in 8 secs [294 r/s] [178 kB]
INF execution succeeded
Sling read the saved checkpoint, pulled only rows newer than the last occurred_at it saw, and appended exactly the 2,500 new rows. A readback confirms the new total:
sling conns exec ICEBERG \
"select min(occurred_at), max(occurred_at), count(*)
from iceberg_catalog.demo_postgres_iceberg.events"
+-------------------------------+--------------------------------------+--------+
| MIN_OCCURRED_AT | MAX_OCCURRED_AT | COUNT |
+-------------------------------+--------------------------------------+--------+
| 2025-03-01 00:00:40 -0300 -03 | 2026-05-11 08:42:59.533692 -0300 -03 | 62500 |
+-------------------------------+--------------------------------------+--------+
60,000 + 2,500 = 62,500. The new high-water mark on occurred_at is the timestamp of the freshest insert. The next scheduled run will start from there.
Append-incremental vs merge-incremental
That warning Sling printed on the first run matters:
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
incremental merge is not yet supported (only appends)
For database targets like Postgres or Snowflake, Sling’s incremental mode is a merge: a row whose primary_key already exists in the target gets updated in place. For an Iceberg target today, incremental means append only. New rows go in, existing rows stay as-is, and a primary_key declared on the stream is parsed but not enforced.
That is fine when your source is append-only: events, immutable transactions, log data. It is the wrong default if your source has mutable rows you need reflected on the lake side. Until merge lands, two patterns work:
- Snapshot replays. Run mode: full-refresh on a cadence that matches your freshness budget. Iceberg’s snapshot model means readers always see a consistent table; the old snapshot is replaced atomically. For tables in the low millions this is faster than it sounds.
- CDC-style append plus downstream resolution. Append every Postgres change to Iceberg as-is (using a logical-replication tool or trigger-based capture) and resolve the latest-state view at read time with something like qualify row_number() over (partition by pk order by event_ts desc) = 1. A bit more work at query time, very cheap at write time.
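The read-time resolution in the second pattern is just “latest row per key”. In SQL that is the qualify/row_number expression above; the same logic in plain Python, as a sketch over a small stand-in change log:

```python
# Latest-state resolution over an append-only change log: for each
# primary key, keep only the row with the newest event_ts. This is
# the plain-Python equivalent of the qualify/row_number pattern.
def latest_state(changes, pk="id", ts="event_ts"):
    latest = {}
    for row in changes:
        key = row[pk]
        if key not in latest or row[ts] > latest[key][ts]:
            latest[key] = row
    return list(latest.values())

changes = [
    {"id": 1, "status": "new",     "event_ts": 1},
    {"id": 1, "status": "shipped", "event_ts": 2},
    {"id": 2, "status": "new",     "event_ts": 1},
]
print(latest_state(changes))
# [{'id': 1, 'status': 'shipped', 'event_ts': 2},
#  {'id': 2, 'status': 'new', 'event_ts': 1}]
```
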
Track the Iceberg connector docs for when full merge mode ships.
Common tweaks
- Choose the right catalog. REST is the most portable: the same connection shape works for Cloudflare R2, Lakekeeper, Nessie, Polaris, and any other REST-compatible catalog. Glue is the simplest in AWS-native shops. SQL catalog is fine for local dev. Avoid wiring a different catalog per environment if you can help it; the table layout doesn’t care, but the metadata location does.
- Namespace organization. Treat namespaces (the demo_postgres_iceberg in demo_postgres_iceberg.users) the way you treat warehouse schemas: one per source system, or one per data domain. Don’t dump everything into default.
- Filter at the source. Use a sql: block per stream to project columns or filter rows before they leave Postgres. Smaller Parquet files, smaller manifests, cheaper queries downstream.
- Time travel for free. Every replication produces a new Iceberg snapshot. Readers can time-travel to a previous snapshot, which is useful for “what did this table look like before yesterday’s run?” without storing your own backups.
- Maintain the table. Like any Iceberg table, periodic compaction and snapshot expiration keep the file count and metadata size from growing without bound. Set this up on a separate schedule from the replication itself.
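A source-side filter of the kind mentioned in the list above could look like this in a stream definition; the column list and predicate here are placeholder choices, so adapt them to your schema.

```yaml
streams:
  demo_postgres_iceberg.events:
    mode: incremental
    update_key: occurred_at
    # hypothetical projection and filter; adjust columns and predicate
    sql: |
      select event_id, user_id, event_type, occurred_at
      from demo_postgres_iceberg.events
      where event_type in ('click', 'purchase')
```
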
Where to go next
The same pattern works for any of Sling’s 30+ database sources into Iceberg: MySQL, SQL Server, Snowflake, BigQuery, MongoDB, and the rest. Swap the source and leave the target alone.
If the underlying R2 storage is what brought you here, the Postgres → R2 as Parquet walkthrough shows the same source landing as raw Parquet files instead of an Iceberg table, which is useful when downstream readers don’t need a catalog. For a deeper comparison of file-format targets, see Postgres → S3 as Parquet and Postgres → DuckDB.
For team workflows with scheduling, alerting, and audit trails on top of the same CLI, look at the Sling Platform.
Questions go to Discord or GitHub Issues.


