How to Replicate MySQL to BigQuery with Sling

Slinger avatar
Slinger
Cover for How to Replicate MySQL to BigQuery with Sling

How to Replicate MySQL to BigQuery with Sling

Last updated: June 2026

Getting MySQL data into BigQuery usually means picking a tradeoff. Hand-rolled scripts are cheap to start and expensive to keep alive once schemas drift. Managed connectors are quick to set up but bill per row and put your pipeline behind someone else’s control plane. Sling sits in between: a single binary, a few lines of YAML, and a load path that uses BigQuery’s own bulk ingest underneath.

This guide walks through a real replication, end to end. Everything below — the row counts, the timings, the type mapping — comes from an actual run against a MySQL 8.4 source and a live BigQuery dataset. You can reproduce it.

Installation

Sling is a single binary with no runtime dependencies. Install it however suits your setup:

# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash

# Windows
irm https://slingdata.io/install.ps1 | iex

# Python
pip install sling

Confirm it’s on your path:

sling --version

Connection setup

Sling needs two connections: the MySQL source and the BigQuery target. Both can be set with sling conns set, which writes them to ~/.sling/env.yaml.

MySQL source

sling conns set mysql_source type=mysql host=127.0.0.1 port=3306 \
  user=root password=mypass database=demo

Or with a connection string:

sling conns set mysql_source url="mysql://root:[email protected]:3306/demo"

BigQuery target

BigQuery authenticates with a service-account key. The account needs BigQuery Data Editor and BigQuery Job User on the target project.

sling conns set bigquery_target type=bigquery \
  project=my-project dataset=demo \
  key_file=/path/to/service-account.json

If you have a Google Cloud Storage bucket handy, add gc_bucket=my-bucket. Sling will stage batches there and trigger a BigQuery load job from GCS, which is the fastest bulk path. Without a bucket, Sling stages locally and still loads in bulk — that’s the setup used for every number in this guide.

Test both connections

sling conns test mysql_source
sling conns test bigquery_target
8:42AM INF success!

If the BigQuery test fails, it’s almost always the service account missing a role or the project not matching the key file. Fix that before going further — a half-working credential will fail mid-run, not at test time.

The source tables

The lab source is a small e-commerce schema: 20,000 customers, 2,000 products, and 80,000 orders. The orders table carries an ordered_at timestamp, which matters later for incremental loads, plus a nullable note column so the type-handling examples have something to point at.

-- on the MySQL source
SELECT 'customers' AS t, COUNT(*) AS c FROM customers
UNION ALL SELECT 'products', COUNT(*) FROM products
UNION ALL SELECT 'orders',   COUNT(*) FROM orders;
t          c
customers  20000
products   2000
orders     80000

Full refresh: the first load

A replication is one YAML file. The defaults block sets the mode and the target naming; streams lists what to move. The {stream_table} token maps each source table to a BigQuery table of the same name.

# replication.yaml
source: mysql_source
target: bigquery_target

defaults:
  mode: full-refresh
  object: demo.{stream_table}

streams:
  demo.customers:
  demo.products:
  demo.orders:

Run it:

sling run -r replication.yaml
INF [1 / 3] running stream demo.customers
INF writing to target database [mode: full-refresh]
INF created table `demo`.`customers_tmp`
INF streaming data
INF importing into bigquery via local storage
INF inserted 20000 rows into `demo`.`customers` in 24 secs [831 r/s] [1.2 MB]
...
INF [3 / 3] running stream demo.orders
INF created table `demo`.`orders_tmp`
INF importing into bigquery via local storage
INF inserted 80000 rows into `demo`.`orders` in 23 secs [3,393 r/s] [5.7 MB]
INF execution succeeded

INF Sling Replication Completed in 1m 12s | mysql_source -> bigquery_target | 3 Successes | 0 Failures

Three tables, 102,000 rows, 1 minute 12 seconds. Two things in that log are worth noticing. First, Sling writes to a _tmp table and swaps it in once the load is clean — a failed run leaves the existing table untouched rather than a half-loaded mess. Second, importing into bigquery via local storage is the bulk load path: Sling batches rows into files and hands them to a BigQuery load job instead of inserting one row at a time. That’s why 80,000 rows land in 23 seconds even without a GCS bucket configured.

You don’t have to pre-create the tables. Sling creates them on first run and adjusts them on later runs when the schema changes. You do have to create the dataset once — BigQuery won’t let a load job create a dataset implicitly.

Verification

Trust the load, but check it. Run a count and a min/max against the target:

sling conns exec bigquery_target \
  "select count(*) as orders, min(ordered_at) as first_order, max(ordered_at) as last_order from demo.orders"
ORDERS  FIRST_ORDER                    LAST_ORDER
80000   2025-01-01 00:01:00 +0000 UTC  2025-02-25 13:20:00 +0000 UTC

And a sample:

sling conns exec bigquery_target \
  "select order_id, customer_id, quantity, amount, ordered_at from demo.orders order by order_id limit 5"
ORDER_ID  CUSTOMER_ID  QUANTITY  AMOUNT        ORDERED_AT
1         2            2         12.020000000  2025-01-01 00:01:00 +0000 UTC
2         3            3         21.060000000  2025-01-01 00:02:00 +0000 UTC
3         4            4         32.120000000  2025-01-01 00:03:00 +0000 UTC
4         5            5         45.200000000  2025-01-01 00:04:00 +0000 UTC
5         6            6         60.300000000  2025-01-01 00:05:00 +0000 UTC

Row count matches the source, timestamps survived the trip, and the decimal amounts kept their precision.

Type mapping

MySQL and BigQuery don’t share a type system, so Sling maps between them. You can see the result by reading the target schema:

sling conns exec bigquery_target \
  "select column_name, data_type from demo.INFORMATION_SCHEMA.COLUMNS where table_name='orders' order by ordinal_position"
COLUMN_NAME  DATA_TYPE
order_id     INT64
customer_id  INT64
product_id   INT64
quantity     INT64
amount       NUMERIC
note         STRING
ordered_at   TIMESTAMP

MySQL INT becomes BigQuery INT64, DECIMAL(12,2) becomes NUMERIC (so the amounts keep full precision instead of being flattened to a float), VARCHAR becomes STRING, and TIMESTAMP stays TIMESTAMP. The nullable note column carries its nullability across. When a clean mapping isn’t possible, Sling renders the value as a string rather than dropping precision silently — and you can override any column with a columns: block in the stream if you need a specific BigQuery type.

Incremental loads: only what changed

Full-refresh is fine for a first load or a small dimension table. For anything that grows, you want to move only the new rows. Switch the mode to incremental and tell Sling which column tracks change and which column identifies a row:

# replication-incremental.yaml
source: mysql_source
target: bigquery_target

defaults:
  mode: incremental
  object: demo.{stream_table}
  primary_key: [order_id]
  update_key: ordered_at

streams:
  demo.orders:

Say 2,500 new orders land on the source. Re-run with the incremental file:

sling run -r replication-incremental.yaml
INF Sling Replication | mysql_source -> bigquery_target | demo.orders
INF getting checkpoint value (ordered_at)
INF created table `demo`.`orders_tmp`
INF streaming data
INF importing into bigquery via local storage
INF inserted 2500 rows into `demo`.`orders` in 44 secs [56 r/s] [181 kB]
INF execution succeeded

The line that does the work is getting checkpoint value (ordered_at). Sling reads the maximum ordered_at already in the BigQuery target, then pulls only source rows newer than that. There’s no separate state file to manage — the target table is the checkpoint. The primary_key lets Sling upsert, so a row that was updated rather than inserted is merged instead of duplicated.

Run it again with nothing new on the source and you get a clean no-op:

INF getting checkpoint value (ordered_at)
INF inserted 0 rows into `demo`.`orders` in 15 secs [0 r/s]
INF execution succeeded

Zero rows, no error. That’s what you want from a scheduled job that fires on a cron whether or not there’s anything to move.

A final count confirms the math:

sling conns exec bigquery_target "select count(*) as orders, max(ordered_at) as last_order from demo.orders"
ORDERS  LAST_ORDER
82500   2026-06-02 17:40:00 +0000 UTC

80,000 from the first load plus 2,500 from the incremental run, with the high-water mark advanced to the newest order.

Replicating many tables at once

Listing every table by hand doesn’t scale past a handful. Use a wildcard in streams and the {stream_table} token in the target object to move a whole schema:

source: mysql_source
target: bigquery_target

defaults:
  mode: incremental
  object: demo.{stream_table}
  primary_key: [id]
  update_key: updated_at

streams:
  demo.*:

Every table in demo gets replicated, each into a BigQuery table of the same name. You can still override a single table by listing it explicitly below the wildcard — the explicit entry wins. This is the pattern to reach for when you’re mirroring an entire application database into the warehouse.

Scheduling

Once the YAML works, scheduling is just a cron entry:

0 * * * * cd /path/to/configs && sling run -r replication-incremental.yaml >> /var/log/sling.log 2>&1

Because incremental mode tracks its own checkpoint against the target, each run is idempotent — if one fails or is skipped, the next picks up exactly where the last left off. If you’d rather not babysit cron, the Sling Platform handles scheduling, alerting, and run history for you, while keeping the actual data movement on your own infrastructure.

Conclusion

Moving MySQL to BigQuery with Sling is two YAML files and one command. Full-refresh for the first load, incremental for everything after, a wildcard when you’re mirroring a whole schema. The load path uses BigQuery’s bulk ingest. The type mapping is reasonable out of the box and you can override it per column. And the checkpoint lives in the target table itself, so there’s no separate state store to back up or lose.

The numbers here are from a real run: 102,000 rows on the first load in 72 seconds, 2,500 rows on the incremental in 44, a clean no-op when there’s nothing to move. Point it at your own MySQL instance and you’ll get the same shape.

If you work with MySQL as a source, these companion walkthroughs cover other common targets:

And if BigQuery is your destination from other sources:

For the reasoning behind Sling’s single-binary, no-control-plane design, see the Sling blog.

Frequently asked questions

Do I need a GCS bucket to load into BigQuery with Sling?

No, but it helps at scale. With gc_bucket set, Sling stages batches in GCS and fires a BigQuery load job from there — the fastest bulk path. Without it, Sling stages locally and still loads in bulk; that’s how the 102,000-row load in this guide ran in 72 seconds with no bucket configured. For very large tables, add the bucket.

Do I have to pre-create the BigQuery tables?

No. Sling creates target tables on first run and adjusts them on later runs (adding columns, widening types) when the connection allows DDL. You do need to create the dataset once, because BigQuery won’t create datasets implicitly during a load job.

How does Sling handle MySQL types that don’t map cleanly to BigQuery?

It applies a built-in mapping — INT to INT64, DECIMAL to NUMERIC, VARCHAR to STRING, TIMESTAMP to TIMESTAMP, and so on. When a fully lossless mapping isn’t available, the value is rendered as a string so precision is never dropped silently. Override any column with a columns: block in the stream.

Can I run incremental loads without a timestamp column?

Yes, if the table has a monotonically increasing key. Set that column as the update_key and Sling tracks the high-water mark against the target. If there’s neither a timestamp nor a monotonic key, you’re limited to full-refresh or snapshot modes.

Where does Sling store the incremental checkpoint?

For database targets like BigQuery, there’s no separate state file — Sling reads MAX(update_key) directly from the target table on each run. The target is the checkpoint, which means a restored backup or a manual reload self-corrects the next time the job runs.

How do I replicate a whole MySQL database into BigQuery?

Use a wildcard stream (demo.*) with {stream_table} in the target object. Every table is replicated into a same-named BigQuery table. Override individual tables by listing them explicitly below the wildcard.