Introduction
StarRocks is an MPP analytical database, forked from Apache Doris back in 2020. It has vectorized execution, a cost-based optimizer, and external catalogs for Iceberg, Hive, Hudi, and Delta. The storage engine is its own columnar format, but the query engine talks MySQL wire protocol, so any tool that speaks MySQL can read from it.
PostgreSQL is where most operational data starts. Pushing it into StarRocks gives you an analytics copy you can run aggregates and joins against, without taxing the transactional database.
The piece in the middle, the thing that copies tables and keeps them current, is what Sling does.
This guide replicates a Postgres schema into StarRocks with Sling, in both full-refresh and incremental modes. The CLI output, row counts, and timings below all come from an actual run against StarRocks 3.5.7 and a remote Postgres source. The same configuration works against managed StarRocks like CelerData Cloud; only the connection URL changes.
Installing Sling
Sling is a single binary. Pick whichever install method fits your environment:
# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash
# Windows
irm https://slingdata.io/install.ps1 | iex
# Python
pip install sling
Confirm the install:
sling --version
Installation notes for every platform are in the Sling CLI Getting Started Guide.
Configuring the Postgres Source
Sling reads connection details from ~/.sling/env.yaml, environment variables, or sling conns set. A read-only user is the right shape for replication:
CREATE USER sling WITH PASSWORD '<password>';
GRANT CONNECT ON DATABASE mydb TO sling;
GRANT USAGE ON SCHEMA public TO sling;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO sling;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO sling;
Then register the connection:
sling conns set POSTGRES type=postgres host=host.ip user=sling \
database=mydb password=mypass port=5432
Or in ~/.sling/env.yaml:
connections:
  POSTGRES:
    type: postgres
    host: host.ip
    user: sling
    password: mypass
    port: 5432
    database: mydb
If your Postgres requires SSL, add sslmode: require to the connection. Then test it:
sling conns test POSTGRES
The Postgres connection docs cover SSL, IAM, and the rest.
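Since connection details can also come from environment variables, one option is to build a single connection URL and export it. The helper below is a minimal sketch, assuming Sling accepts a standard postgresql:// URL in a variable named after the connection. The function name build_postgres_url is hypothetical. Its main job is percent-encoding credentials so that characters like @ or / in a password don’t break the URL.

```python
from urllib.parse import quote, urlencode


def build_postgres_url(user, password, host, port, database, sslmode=None):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    # safe="" makes quote() escape "/", "@" and ":" inside user and password too.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"postgresql://{auth}@{host}:{port}/{quote(database, safe='')}"
    if sslmode:
        url += "?" + urlencode({"sslmode": sslmode})
    return url


if __name__ == "__main__":
    # Placeholder values from this guide, with a password that needs escaping.
    print(build_postgres_url("sling", "p@ss/word", "host.ip", 5432, "mydb", "require"))
```

The printed value could then be exported as POSTGRES in the shell that runs Sling, which keeps the password out of ~/.sling/env.yaml.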
Configuring the StarRocks Target
StarRocks needs two endpoints from Sling: a MySQL-protocol URL for queries and DDL (default port 9030), and an HTTP URL for stream load (default port 8030 on the FE or 8040 on a BE). Sling uses stream load for bulk ingest. It is the fastest path StarRocks offers: batches go through HTTP, parallelized across BEs, with no row-by-row INSERTs.
In ~/.sling/env.yaml:
connections:
  STARROCKS:
    type: starrocks
    url: starrocks://root:@starrocks-host:9030/sys
    fe_url: http://root:@starrocks-host:8040
url is for DDL, metadata, and read-back. fe_url is for stream load. The database in the URL (sys above) is just where Sling lands when discovering; the actual target database is whatever you reference in the replication. For CelerData Cloud or any managed StarRocks, swap the host and keep the same ports.
Test the connection:
sling conns test STARROCKS
The StarRocks connection docs cover all the available options.
Before the first replication, create the target database:
sling conns exec STARROCKS \
"create database if not exists demo_postgres_starrocks"
StarRocks doesn’t auto-create databases the way it auto-creates tables; this is a one-time bootstrap.
A Full-Refresh Replication
For this run the Postgres source has three tables in a demo_postgres_starrocks schema:
users: 10,000 rows
orders: 60,000 rows
events: 30,000 rows, with an occurred_at timestamp
The replication file:
# replication.yaml
source: POSTGRES
target: STARROCKS
defaults:
  object: demo_postgres_starrocks.{stream_table}
  mode: full-refresh
streams:
  demo_postgres_starrocks.users:
  demo_postgres_starrocks.orders:
  demo_postgres_starrocks.events:
A few notes on what’s happening:
object: follows the <database>.<table> shape StarRocks expects.
{stream_table} is a runtime variable. Sling substitutes the source table name so the same line works for every stream.
mode: full-refresh recreates the target tables from scratch on every run. Sling stages each batch to a <table>_tmp table first and swaps it in at the end, so reads against the live table see a consistent snapshot instead of a half-loaded one.
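The two ideas in these notes, placeholder substitution and stage-then-swap, can be sketched in a few lines. This is an illustration of the general pattern using SQLite from the Python standard library, not Sling’s implementation: the function names, the two-column table, and the use of a rename for the swap are all assumptions made for the example.

```python
import sqlite3


def resolve_object(template, stream_table):
    """Fill the {stream_table} placeholder, e.g. 'db.{stream_table}' -> 'db.users'."""
    return template.replace("{stream_table}", stream_table)


def full_refresh(conn, table, rows):
    """Stage rows in <table>_tmp, then swap it in for <table>."""
    tmp = f"{table}_tmp"
    conn.execute(f"DROP TABLE IF EXISTS {tmp}")
    conn.execute(f"CREATE TABLE {tmp} (id INTEGER, email TEXT)")
    # The INSERT opens a transaction; the drop and rename join it, and the
    # "with" block commits all three steps together on exit.
    with conn:
        conn.executemany(f"INSERT INTO {tmp} VALUES (?, ?)", rows)
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")


if __name__ == "__main__":
    print(resolve_object("demo_postgres_starrocks.{stream_table}", "users"))
    conn = sqlite3.connect(":memory:")
    full_refresh(conn, "users", [(1, "a@example.com"), (2, "b@example.com")])
    full_refresh(conn, "users", [(3, "c@example.com")])
    print(conn.execute("SELECT count(*) FROM users").fetchone()[0])
```

The point of the staging table is that a failed or half-finished load never touches the live table: until the final swap, readers keep seeing the previous run’s data.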
Run it:
sling run -r replication.yaml
Real output, trimmed:
INF Sling CLI | https://slingdata.io
INF Sling Replication [3 streams] | POSTGRES -> STARROCKS
INF [1 / 3] running stream demo_postgres_starrocks.users
INF created table `demo_postgres_starrocks`.`users_tmp`
INF streaming data
INF importing into StarRocks via stream load
INF created table `demo_postgres_starrocks`.`users`
INF inserted 10000 rows into `demo_postgres_starrocks`.`users` in 1 secs [9,382 r/s] [1.1 MB]
INF [2 / 3] running stream demo_postgres_starrocks.orders
INF created table `demo_postgres_starrocks`.`orders_tmp`
INF importing into StarRocks via stream load
INF created table `demo_postgres_starrocks`.`orders`
INF inserted 60000 rows into `demo_postgres_starrocks`.`orders` in 2 secs [22,830 r/s] [5.3 MB]
INF [3 / 3] running stream demo_postgres_starrocks.events
INF created table `demo_postgres_starrocks`.`events_tmp`
INF importing into StarRocks via stream load
INF created table `demo_postgres_starrocks`.`events`
INF inserted 30000 rows into `demo_postgres_starrocks`.`events` in 1 secs [16,019 r/s] [3.3 MB]
INF Sling Replication Completed in 7s | POSTGRES -> STARROCKS | 3 Successes | 0 Failures
100,000 rows across three tables, 7 seconds end-to-end. The importing into StarRocks via stream load line is the path that matters: Sling buffers each batch as a file, posts it to the FE’s stream-load endpoint, and lets StarRocks’s BEs ingest in parallel.
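For a sense of what a stream load looks like on the wire, the sketch below builds (but does not send) the kind of HTTP PUT that StarRocks documents for its _stream_load endpoint. The exact headers and payload format Sling uses are not shown in this guide, so the CSV payload, the label value, and the function name build_stream_load_request are assumptions for illustration only.

```python
import base64
from urllib.parse import urlsplit
from urllib.request import Request


def build_stream_load_request(fe_url, database, table, payload, label):
    """Build an HTTP PUT for StarRocks stream load without sending it."""
    parts = urlsplit(fe_url)
    endpoint = (
        f"{parts.scheme}://{parts.hostname}:{parts.port}"
        f"/api/{database}/{table}/_stream_load"
    )
    # Credentials come from the URL's user-info part; root has an empty password here.
    credentials = f"{parts.username or ''}:{parts.password or ''}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode(),
        "Expect": "100-continue",  # requested by the StarRocks stream load docs
        "label": label,            # unique per load; a reused label is rejected
        "column_separator": ",",   # assumption: the payload is CSV
    }
    return Request(endpoint, data=payload, headers=headers, method="PUT")


if __name__ == "__main__":
    req = build_stream_load_request(
        "http://root:@starrocks-host:8040",
        "demo_postgres_starrocks",
        "users",
        b"1,a@example.com\n",
        "users_batch_1",
    )
    print(req.get_method(), req.full_url)
```

When the request targets an FE, StarRocks answers with a redirect to a BE, so a real client has to re-send the body and credentials to the redirected address (the curl equivalent is --location-trusted).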
Verification
Sling speaks the same MySQL wire protocol StarRocks does, so verification is a direct query:
sling conns exec STARROCKS \
"select 'users' as t, count(*) as n from demo_postgres_starrocks.users
union all
select 'orders', count(*) from demo_postgres_starrocks.orders
union all
select 'events', count(*) from demo_postgres_starrocks.events"
+--------+-------+
| T | N |
+--------+-------+
| users | 10000 |
| orders | 60000 |
| events | 30000 |
+--------+-------+
Row counts match the source. A sample of orders confirms columns and types survived the trip:
sling conns exec STARROCKS \
"select id, user_id, amount, currency, status, placed_at
from demo_postgres_starrocks.orders order by id limit 3"
+-------+---------+------------+----------+-----------+-------------------------+
| ID | USER_ID | AMOUNT | CURRENCY | STATUS | PLACED_AT |
+-------+---------+------------+----------+-----------+-------------------------+
| 15361 | 5362 | 351.330000 | EUR | paid | 2026-05-22 19:45:24 ... |
| 8019 | 8020 | 287.380000 | BRL | cancelled | 2026-05-21 05:45:24 ... |
| 1868 | 1869 | 105.890000 | USD | refunded | 2026-05-22 12:45:24 ... |
+-------+---------+------------+----------+-----------+-------------------------+
The StarRocks table also carries an extra _sling_row_id column, which the query above does not select. Sling adds it as the table key by default so that stream-load deduplication and primary-key matching have something stable to work with. You can disable it, or override the key with the primary_key stream option if you have a natural one.
Postgres jsonb columns land as StarRocks JSON columns and stay queryable as structured data:
sling conns exec STARROCKS \
"select event_name, properties from demo_postgres_starrocks.events limit 3"
Running an Incremental Append
After the bulk load, the day-to-day pattern is to pick up the rows added since the last run and append them. Sling’s incremental mode does this. For database targets like StarRocks, Sling queries MAX(update_key) directly on the target table, so there’s no separate state file to manage. The function that does this lives in core/sling/task_func.go (getIncrementalValueViaDB).
Insert 2,500 new events on the source (a stand-in for fresh activity):
insert into demo_postgres_starrocks.events (id, user_id, event_name, properties, occurred_at)
select 30000 + n, 1 + (n % 10000), 'click',
jsonb_build_object('source', 'mobile', 'value', n),
now() + (n * interval '1 second')
from generate_series(1, 2500) g(n);
Switch to incremental mode in a single-stream replication that touches only events:
# replication-incremental.yaml
source: POSTGRES
target: STARROCKS
defaults:
  object: demo_postgres_starrocks.{stream_table}
streams:
  demo_postgres_starrocks.events:
    mode: incremental
    primary_key: [id]
    update_key: occurred_at
sling run -r replication-incremental.yaml
INF Sling Replication | POSTGRES -> STARROCKS | demo_postgres_starrocks.events
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
INF created table `demo_postgres_starrocks`.`events_tmp`
INF streaming data
INF importing into StarRocks via stream load
INF inserted 2500 rows into `demo_postgres_starrocks`.`events` in 0 secs [3,642 r/s] [216 kB]
INF execution succeeded
Sling read the checkpoint (the highest occurred_at already in the target table), pulled only rows newer than it, and merged exactly the 2,500 new rows in. A readback confirms the total:
sling conns exec STARROCKS \
"select count(*) as total, max(occurred_at) as latest
from demo_postgres_starrocks.events"
+-------+-------------------------+
| TOTAL | LATEST |
+-------+-------------------------+
| 32500 | 2026-05-25 09:28:11 ... |
+-------+-------------------------+
30,000 + 2,500 = 32,500. The next scheduled run will start from the new high-water mark on occurred_at.
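The high-water-mark logic behind this run is small enough to sketch end to end. The example below uses two in-memory SQLite databases as stand-ins for Postgres and StarRocks. It is not Sling’s code: the function name, the two-column table, and the strict > comparison are assumptions, and it only appends, ignoring the primary_key merge that Sling performs.

```python
import sqlite3


def incremental_append(source, target, table, update_key):
    """Append source rows whose update_key is past the target's current maximum."""
    # 1. Read the checkpoint from the target table itself.
    (watermark,) = target.execute(
        f"SELECT MAX({update_key}) FROM {table}"
    ).fetchone()
    # 2. Pull only newer rows; an empty target (NULL max) means "take everything".
    if watermark is None:
        rows = source.execute(f"SELECT id, occurred_at FROM {table}").fetchall()
    else:
        rows = source.execute(
            f"SELECT id, occurred_at FROM {table} WHERE {update_key} > ?",
            (watermark,),
        ).fetchall()
    # 3. Append them to the target and commit.
    with target:
        target.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for db in (source, target):
        db.execute("CREATE TABLE events (id INTEGER, occurred_at TEXT)")
    source.execute("INSERT INTO events VALUES (1, '2026-05-22 10:00:00')")
    print(incremental_append(source, target, "events", "occurred_at"))
    print(incremental_append(source, target, "events", "occurred_at"))
```

One consequence of deriving the checkpoint from the target is that a source row arriving late with an occurred_at at or below the current maximum would be skipped by a comparison like this, so the update key should be a column that only moves forward.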
The empty-load gotcha
There is one thing StarRocks does differently from most targets that’s worth knowing about up front. By default, StarRocks treats a stream load with zero rows as a hard failure:
fail to execute commit task: No partitions have data available for loading.
If you are sure there may be no data to be loaded, you can use
ADMIN SET FRONTEND CONFIG ('empty_load_as_error' = 'false')
to ensure such load jobs can succeed
This bites the moment you schedule an incremental replication on a table that doesn’t always have new rows between runs. Sling reads zero rows from the source, posts an empty batch to stream load, and StarRocks rejects it.
The fix is one line on the FE:
sling conns exec STARROCKS \
"ADMIN SET FRONTEND CONFIG ('empty_load_as_error' = 'false')"
After that, a no-op run logs cleanly instead of failing:
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
WRN no data or records found in stream. Nothing to do.
To allow Sling to create empty tables, set SLING_ALLOW_EMPTY=TRUE
INF inserted 0 rows into `demo_postgres_starrocks`.`events` in 0 secs
INF execution succeeded
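This is a cluster-level change that takes effect immediately. ADMIN SET FRONTEND CONFIG only changes the value in the running FE, though, so also add empty_load_as_error = false to fe.conf if the setting needs to survive an FE restart.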
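On clusters where the FE setting can’t be changed, a scheduler wrapper can at least tell this benign failure apart from a real one. The sketch below is a hypothetical helper, assuming the Sling process exits non-zero on failure and that the StarRocks message appears in its captured output. The marker string is copied from the error text above.

```python
EMPTY_LOAD_MARKER = "No partitions have data available for loading"


def classify_run(exit_code, output):
    """Label a finished run as 'ok', 'empty-load', or 'failed'."""
    if exit_code == 0:
        return "ok"
    # A non-zero exit with the marker means there was simply nothing to load.
    if EMPTY_LOAD_MARKER in output:
        return "empty-load"
    return "failed"


if __name__ == "__main__":
    message = "fail to execute commit task: No partitions have data available for loading."
    print(classify_run(1, message))
```

A wrapper could alert only on "failed" and treat "empty-load" as a quiet no-op.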
Common tweaks
StarRocks has four table types: duplicate, aggregate, unique, and primary key. Each has different read and write tradeoffs. Sling defaults to a duplicate-key table built on _sling_row_id, which is the safest choice for replicated data. If your downstream consumers need point lookups or row-level updates, set primary_key: on the stream and Sling will create a primary-key table instead.
StarRocks doesn’t auto-partition the way some columnar engines do. You set the partition and bucket strategy at table creation. For replicated time-series tables like events, logs, or transactions, it’s worth adding partition by date_trunc('month', occurred_at) via a pre_sql: hook so that partition pruning works on read.
Use a sql: block per stream to project columns or filter rows before they leave Postgres. Smaller stream-load batches mean smaller compactions on the BE side.
One last thing: a single-BE dev cluster needs 'replication_num' = '1' set as a property; production clusters with three or more BEs default to 3. If Sling’s auto-created tables land on the wrong default for your cluster, override it with a table_keys: block in the replication.
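To see why the monthly partitioning suggested above helps reads, it can be useful to model what date_trunc('month', occurred_at) does to a timestamp and how many partitions a time-bounded query has to touch. This is a plain-Python model of the idea, with hypothetical function names. It does not call StarRocks.

```python
from datetime import date, datetime


def month_partition(ts):
    """Mimic date_trunc('month', ts): the first day of ts's month."""
    return date(ts.year, ts.month, 1)


def partitions_for_range(start, end):
    """List the monthly partitions a query over [start, end] has to read."""
    current, last = month_partition(start), month_partition(end)
    months = []
    while current <= last:
        months.append(current)
        # Step to the first day of the following month (December rolls the year).
        current = date(current.year + current.month // 12, current.month % 12 + 1, 1)
    return months


if __name__ == "__main__":
    print(month_partition(datetime(2026, 5, 22, 19, 45, 24)))
    print(partitions_for_range(datetime(2026, 5, 20), datetime(2026, 5, 25)))
```

A query filtered to a few days in one month maps to a single partition, so every other month’s data can be skipped without being scanned.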
Where to go next
The same pattern works for any of Sling’s 30+ database sources into StarRocks: MySQL, SQL Server, Snowflake, BigQuery, MongoDB, and the rest. Swap the source and leave the target alone.
If you came here because you’re picking a columnar target, the Postgres → ClickHouse walkthrough covers a very similar pattern with a different engine, and Postgres → DuckDB is the local-first option for prototyping. For a file-based landing zone instead of a database, Postgres → R2 as Parquet and Postgres → Apache Iceberg show the lakehouse path.
For team workflows with scheduling, alerting, and audit trails on top of the same CLI, look at the Sling Platform.
Questions go to Discord or GitHub Issues.


