Sync MySQL to MotherDuck with Sling

Slinger

Introduction

MotherDuck is a serverless analytics service built on DuckDB. It hosts DuckDB databases in the cloud and keeps the same SQL surface you’d use locally. MySQL, meanwhile, is the transactional database most apps already run on.

So you usually want both: MySQL for the app, MotherDuck for analytics. The part in the middle that copies tables across is what Sling does.

This guide replicates a MySQL schema into MotherDuck with Sling, in both full-refresh and incremental modes. The CLI output and row counts below come from an actual run.

Installing Sling

Sling is a single binary. Pick whichever install method fits your environment:

# Homebrew (macOS)
brew install slingdata-io/sling/sling

# curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Python (pip)
pip install sling

Confirm the install:

sling --version

Full installation notes are in the Sling CLI Getting Started Guide.

Configuring the MySQL Source

Sling reads connection details from ~/.sling/env.yaml, environment variables, or sling conns set. For MySQL you’ll need host, port, database, user, and password.

A dedicated read-only user is the right shape for replication; Sling only needs SELECT on the source schema:

CREATE USER 'sling'@'%' IDENTIFIED BY '<password>';
GRANT SELECT ON <source_schema>.* TO 'sling'@'%';

Using sling conns set:

sling conns set MYSQL_SOURCE type=mysql host=host.ip user=sling \
  database=mydb password=mypass port=3306

Or in ~/.sling/env.yaml:

connections:
  MYSQL_SOURCE:
    type: mysql
    host: host.ip
    user: sling
    password: mypass
    port: 3306
    database: mydb

If your MySQL requires TLS, append ?tls=skip-verify to a URL form, or set tls: skip-verify in the YAML. Test it:

sling conns test MYSQL_SOURCE

The MySQL connection docs cover SSL, IAM auth, and other options.
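
For reference, the URL form of the same connection with the TLS parameter appended might look like this (host and credentials are the placeholder values from above):

```yaml
connections:
  MYSQL_SOURCE:
    url: "mysql://sling:mypass@host.ip:3306/mydb?tls=skip-verify"
```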

Configuring the MotherDuck Target

A MotherDuck connection needs the database name and a service token. You can generate a token from the MotherDuck UI.

sling conns set MOTHERDUCK type=motherduck \
  database=my_db motherduck_token=eyJhbGciOi...

Or the URL form:

sling conns set MOTHERDUCK url="motherduck://my_db?motherduck_token=eyJhbGciOi..."

Or in ~/.sling/env.yaml:

connections:
  MOTHERDUCK:
    type: motherduck
    database: my_db
    motherduck_token: eyJhbGciOi...

Test it:

sling conns test MOTHERDUCK

Full options (attach modes, copy method, DuckDB version pinning) are in the MotherDuck connection docs.

A Full-Refresh Replication

For this run the MySQL source has three tables in a demo_mysql_motherduck schema:

  • customers — 5,000 rows
  • orders — 30,000 rows, with an updated_at timestamp
  • events — 60,000 rows, with an occurred_at timestamp

Save the replication file wherever you plan to run Sling from:

# replication.yaml
source: MYSQL_SOURCE
target: MOTHERDUCK

defaults:
  mode: full-refresh
  object: demo_mysql_motherduck.{stream_table}

streams:
  demo_mysql_motherduck.customers:
    primary_key: [customer_id]

  demo_mysql_motherduck.orders:
    primary_key: [order_id]
    update_key: updated_at

  demo_mysql_motherduck.events:
    primary_key: [event_id]
    update_key: occurred_at

A few things to point out:

  • object: demo_mysql_motherduck.{stream_table} is a runtime variable. Sling substitutes the source table name into the target object, so you don’t repeat yourself per stream.
  • primary_key and update_key are set even though the mode here is full-refresh. The next section flips to incremental without touching those declarations; only the mode changes.
  • The target schema gets created automatically by Sling on the first run. No manual CREATE SCHEMA needed on the MotherDuck side.
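
The {stream_table} expansion is plain string templating. A minimal Python sketch of the idea (render_object is an illustrative helper, not Sling code):

```python
# Illustrative only: mimics how a runtime variable like {stream_table}
# could expand a per-stream target object from one shared template.
def render_object(template: str, stream: str) -> str:
    schema, table = stream.split(".", 1)  # e.g. "demo_mysql_motherduck.orders"
    return (template
            .replace("{stream_schema}", schema)
            .replace("{stream_table}", table))

template = "demo_mysql_motherduck.{stream_table}"
for stream in ["demo_mysql_motherduck.customers",
               "demo_mysql_motherduck.orders",
               "demo_mysql_motherduck.events"]:
    print(render_object(template, stream))
```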

Run it:

sling run -r replication.yaml

Real output, trimmed for readability:

INF Sling Replication [3 streams] | MYSQL_SOURCE -> MOTHERDUCK

INF [1 / 3] running stream demo_mysql_motherduck.customers
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table "demo_mysql_motherduck"."customers"
INF inserted 5000 rows into "demo_mysql_motherduck"."customers" in 6 secs [722 r/s] [386 kB]
INF execution succeeded

INF [2 / 3] running stream demo_mysql_motherduck.orders
INF created table "demo_mysql_motherduck"."orders"
INF inserted 30000 rows into "demo_mysql_motherduck"."orders" in 6 secs [4,433 r/s] [3.0 MB]
INF execution succeeded

INF [3 / 3] running stream demo_mysql_motherduck.events
INF created table "demo_mysql_motherduck"."events"
INF inserted 60000 rows into "demo_mysql_motherduck"."events" in 7 secs [8,204 r/s] [3.9 MB]
INF execution succeeded

INF Sling Replication Completed in 24s | MYSQL_SOURCE -> MOTHERDUCK | 3 Successes | 0 Failures

95,000 rows across three tables, end to end, in 24 seconds. The _tmp tables that show up in the full log are Sling’s staging step before it swaps the data into the final target. They get cleaned up automatically.
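
The swap step itself is a generic SQL pattern. A sketch of the idea with sqlite3 standing in for the target (table contents invented; this is not Sling's actual implementation):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id INTEGER, event_type TEXT)")
con.execute("INSERT INTO events VALUES (1, 'stale')")

# 1. Load the fresh copy into a temporary staging table.
con.execute("CREATE TABLE events_tmp (event_id INTEGER, event_type TEXT)")
con.executemany("INSERT INTO events_tmp VALUES (?, ?)",
                [(1, "click"), (2, "signup")])

# 2. Swap: drop the old table, rename staging into place.
#    Readers never see a half-loaded table.
con.execute("DROP TABLE events")
con.execute("ALTER TABLE events_tmp RENAME TO events")

print(con.execute("SELECT count(*) FROM events").fetchone()[0])
```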

Verification

A count(*) from MotherDuck right after the run:

select 'customers' as t, count(*) c from demo_mysql_motherduck.customers
union all select 'orders',    count(*)   from demo_mysql_motherduck.orders
union all select 'events',    count(*)   from demo_mysql_motherduck.events;
+-----------+-------+
| T         |     C |
+-----------+-------+
| customers |  5000 |
| orders    | 30000 |
| events    | 60000 |
+-----------+-------+

A small sample to confirm the data made the trip with types intact:

select event_id, customer_id, event_type, region, occurred_at
from demo_mysql_motherduck.events
order by event_id limit 5;
+----------+-------------+------------+--------+-------------------------------+
| EVENT_ID | CUSTOMER_ID | EVENT_TYPE | REGION | OCCURRED_AT                   |
+----------+-------------+------------+--------+-------------------------------+
|        1 |           2 | click      | us-2   | 2025-01-01 00:00:01 +0000 UTC |
|        2 |           3 | signup     | us-3   | 2025-01-01 00:00:02 +0000 UTC |
|        3 |           4 | purchase   | us-4   | 2025-01-01 00:00:03 +0000 UTC |
|        4 |           5 | logout     | us-1   | 2025-01-01 00:00:04 +0000 UTC |
|        5 |           6 | page_view  | us-2   | 2025-01-01 00:00:05 +0000 UTC |
+----------+-------------+------------+--------+-------------------------------+

Numeric, varchar, and timestamp columns round-tripped cleanly. Nullable columns (region is null on every seventh row in the source) are preserved as nulls, not as the string "NULL".

Switching to Incremental

Full-refreshing a 60,000-row table every day is fine. Full-refreshing a 600-million-row event table every day is not. Sling’s incremental mode reads only the rows newer than the highest update_key already in the target.

Drop customers from the streams (it changes slowly enough to keep on full-refresh in a separate run, or rebuild weekly) and switch the mode:

# replication-incremental.yaml
source: MYSQL_SOURCE
target: MOTHERDUCK

defaults:
  mode: incremental
  object: demo_mysql_motherduck.{stream_table}

streams:
  demo_mysql_motherduck.orders:
    primary_key: [order_id]
    update_key: updated_at

  demo_mysql_motherduck.events:
    primary_key: [event_id]
    update_key: occurred_at

Insert 1,000 new orders and 2,500 new events on the source (this simulates a day’s worth of data flowing in), then run again:

sling run -r replication-incremental.yaml
INF Sling Replication [2 streams] | MYSQL_SOURCE -> MOTHERDUCK

INF [1 / 2] running stream demo_mysql_motherduck.orders
INF getting checkpoint value (updated_at)
INF writing to target database [mode: incremental]
INF inserted 1000 rows into "demo_mysql_motherduck"."orders" in 6 secs [166 r/s] [102 kB]
INF execution succeeded

INF [2 / 2] running stream demo_mysql_motherduck.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF inserted 2500 rows into "demo_mysql_motherduck"."events" in 6 secs [397 r/s] [166 kB]
INF execution succeeded

INF Sling Replication Completed in 14s | MYSQL_SOURCE -> MOTHERDUCK | 2 Successes | 0 Failures

The getting checkpoint value line is where Sling looks at the target, finds the largest updated_at already present, and uses that as the lower bound on the source query. Only the new rows come across:

select 'orders' as t, count(*) c from demo_mysql_motherduck.orders
union all select 'events',  count(*)   from demo_mysql_motherduck.events;
+--------+-------+
| T      |     C |
+--------+-------+
| orders | 31000 |
| events | 62500 |
+--------+-------+

Orders went from 30,000 to 31,000 and events from 60,000 to 62,500, matching exactly what was inserted on the source.

If you need updates as well as inserts (a row’s updated_at changes and the existing row should be replaced rather than duplicated), keep mode: incremental and make sure primary_key is set. Sling will upsert against the primary key instead of appending. The replication modes docs cover the trade-offs.

Common Tweaks

A few options you’ll reach for once the basics are in place:

  • Schema and column casing. DuckDB preserves identifier casing, and Sling defaults to keeping the source casing. MySQL identifiers are typically lowercase already, but if your schema uses backticked mixed-case names, add target_options: { column_casing: snake } under defaults to land everything snake_case in MotherDuck.
  • Add new columns automatically. When the source schema changes, set target_options: { add_new_columns: true } so Sling alters the MotherDuck table on the next run. Without it, new source columns get dropped at the boundary.
  • Pick a copy method. The default for MotherDuck is csv_http. For very wide rows or large text values, switch to arrow_http via copy_method: arrow_http in the connection config. It’s usually faster and avoids CSV escaping edge cases.
  • Filter at the source. Use a custom sql: block in a stream to project columns or filter rows before they leave MySQL. Cheaper than dragging unused columns to MotherDuck, and it keeps row payloads small for the network hop.
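
A filtered stream might look like the following, where the column list and date filter are invented for illustration:

```yaml
streams:
  demo_mysql_motherduck.events:
    sql: |
      select event_id, customer_id, event_type, occurred_at
      from demo_mysql_motherduck.events
      where occurred_at >= '2025-01-01'
    object: demo_mysql_motherduck.events
    primary_key: [event_id]
    update_key: occurred_at
```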

Where to Go Next

The same replication pattern works for any of Sling’s 30+ database sources into MotherDuck: PostgreSQL, SQL Server, Snowflake, BigQuery, and the rest. Swap the source connection and leave the target alone.

If you’d rather store flat files than warehouse tables, see Exporting from MySQL and Loading into S3 with Sling, which uses the same replication file shape with a file-system target. For the equivalent flow from a Postgres source, see PostgreSQL to MotherDuck. For team workflows with scheduling and alerting on top of the same CLI, look at the Sling Platform.

Questions go to Discord or GitHub Issues.