Effortless MySQL to BigQuery Data Migration with Sling

Slinger avatar
Slinger
Cover for Effortless MySQL to BigQuery Data Migration with Sling

Introduction

Moving data from MySQL databases to Google BigQuery traditionally involves complex ETL processes, custom scripts, and significant engineering effort. Organizations often face challenges such as:

  • Setting up and maintaining data extraction processes from MySQL
  • Managing authentication and permissions across platforms
  • Handling schema compatibility and data type conversions
  • Implementing efficient data loading into BigQuery
  • Monitoring and maintaining the data pipeline
  • Dealing with incremental updates and schema changes

According to industry research, setting up a traditional data pipeline between MySQL and BigQuery can take weeks or even months, requiring specialized knowledge of both platforms and custom code development. This complexity often leads to increased costs, maintenance overhead, and potential reliability issues.

Sling simplifies this entire process by providing a streamlined, configuration-based approach that eliminates the need for custom code and complex infrastructure setup. With Sling, you can:

  • Configure connections with simple environment variables or CLI commands
  • Automatically handle schema mapping and data type conversions
  • Optimize performance with built-in batch processing and parallel execution
  • Monitor and manage replications through both CLI and web interface
  • Implement incremental updates with minimal configuration

In this guide, we’ll walk through the process of setting up a MySQL to BigQuery replication using Sling, demonstrating how to overcome common challenges and implement an efficient data pipeline in minutes rather than days or weeks.

Installation

Getting started with Sling is straightforward. You can install it using various package managers depending on your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by running:

# Check sling version
sling --version

For more detailed installation instructions and options, visit the installation guide.

Setting Up Connections

Before we can start replicating data, we need to configure our source (MySQL) and target (BigQuery) connections. Sling provides multiple ways to manage connections, including environment variables, the sling conns command, and a YAML configuration file.

MySQL Connection Setup

For MySQL, you’ll need:

  • Host and port information
  • Username and password
  • Database name
  • Schema (optional)
  • SSL/TLS configuration (if required)

Here’s how to set up the MySQL connection:

  1. Using the sling conns set Command
# Set up MySQL connection using CLI
sling conns set mysql_source type=mysql \
  host=<host> \
  user=<user> \
  password=<password> \
  database=<database> \
  port=3306
  1. Using Environment Variables
# Set up using connection URL format
export MYSQL_SOURCE='mysql://user:password@host:3306/database?tls=skip-verify'
  1. Using Sling Environment YAML

Add to your ~/.sling/env.yaml:

connections:
  mysql_source:
    type: mysql
    host: your_host
    user: your_username
    password: your_password
    database: your_database
    port: 3306
    schema: public    # optional

BigQuery Connection Setup

For BigQuery, you’ll need:

  • Google Cloud project ID
  • Service account credentials with appropriate permissions
  • Dataset information
  • Google Cloud Storage bucket (for data transfer)

Here’s how to set up the BigQuery connection:

  1. Using the sling conns set Command
# Set up BigQuery connection using CLI
sling conns set bigquery_target type=bigquery \
  project=<project> \
  dataset=<dataset> \
  gc_bucket=<gc_bucket> \
  key_file=/path/to/service.account.json \
  location=US
  1. Using Environment Variables
# Set up using service account JSON content
export GC_KEY_BODY='{"type": "service_account", ...}'
export BIGQUERY_TARGET='{type: bigquery, project: <project>, dataset: <dataset>, gc_bucket: <gc_bucket>}'
  1. Using Sling Environment YAML

Add to your ~/.sling/env.yaml:

connections:
  bigquery_target:
    type: bigquery
    project: your-project
    dataset: your_dataset
    gc_bucket: your-bucket
    key_file: /path/to/service.account.json
    location: US

Testing Connections

After setting up your connections, it’s important to verify they work correctly:

# Test MySQL connection
sling conns test mysql_source

# Test BigQuery connection
sling conns test bigquery_target

# List available tables in MySQL
sling conns discover mysql_source

You can also manage your connections through the Sling Platform’s web interface:

Sling Connections Management

For more details about connection configuration, visit the environment documentation.

Data Replication Methods

Sling provides multiple ways to replicate data from MySQL to BigQuery. Let’s explore both CLI-based and YAML-based approaches, starting from simple configurations to more advanced use cases.

Using CLI Flags

The quickest way to start a replication is using CLI flags. Here are two examples:

Basic CLI Example

This example shows how to replicate a single table with default settings:

# Replicate a single table from MySQL to BigQuery
sling run \
  --src-conn mysql_source \
  --src-stream "analytics.daily_sales" \
  --tgt-conn bigquery_target \
  --tgt-object "ANALYTICS.DAILY_SALES" \
  --tgt-options '{ "column_casing": "upper", "add_new_columns": true }'

Advanced CLI Example

This example demonstrates more advanced options including column selection and incremental updates:

# Replicate with advanced options
sling run \
  --src-conn mysql_source \
  --src-stream "analytics.customer_orders" \
  --select "order_id, customer_id, order_date, total_amount" \
  --tgt-conn bigquery_target \
  --tgt-object "ANALYTICS.CUSTOMER_ORDERS" \
  --mode incremental \
  --primary-key order_id \
  --update-key order_date

For more CLI flag options, visit the CLI flags documentation.

Using YAML Configuration

For more complex replication scenarios, YAML configuration files provide better maintainability and reusability. Let’s look at two examples:

Basic YAML Example

Create a file named mysql_to_bigquery.yaml:

# Define source and target connections
source: mysql_source
target: bigquery_target

# Default settings for all streams
defaults:
  mode: full-refresh
  target_options:
    column_casing: snake

# Define streams to replicate
streams:
  analytics.products:
    object: ANALYTICS.PRODUCTS
    primary_key: product_id

  analytics.orders:
    object: ANALYTICS.ORDERS
    primary_key: order_id
    mode: incremental
    update_key: order_date
    select:
      - order_id
      - customer_id
      - order_date
      - total_amount
      - status

Advanced YAML Example

Here’s a more complex example that demonstrates additional features:

source: mysql_source
target: bigquery_target

defaults:
  mode: incremental
  target_options:
    column_casing: upper
    add_new_columns: true

streams:
  analytics.customers:
    object: ANALYTICS.CUSTOMERS
    primary_key: id
    update_key: updated_at
    select:
      - id
      - email
      - first_name
      - last_name
      - created_at
      - updated_at
    target_options:
      table_keys:
        unique:
          - email

  analytics.orders_*:
    object: ANALYTICS.ORDERS
    primary_key: order_id
    update_key: order_date
    columns:
      total_amount: float64
      status: string
    target_options:
      add_new_columns: true
      adjust_column_type: true

To run a replication using a YAML file:

# Run the replication
sling run -r mysql_to_bigquery.yaml

For more information about YAML configuration options, see:

Using Sling Platform

While the CLI is powerful for local development and automation, the Sling Platform provides a user-friendly interface for managing and monitoring your data pipelines.

Creating Replications in the UI

The Sling Platform features a visual editor for creating and managing replications:

Sling Replication Editor

Monitoring Executions

Track the progress and performance of your replications in real-time:

Sling Execution Monitoring

Getting Started

To get started with Sling:

  1. Install the Sling CLI using the instructions above
  2. Set up your MySQL and BigQuery connections
  3. Create a simple replication using either CLI flags or YAML
  4. Monitor the replication progress
  5. Explore more advanced features as needed

For more examples and detailed documentation:

The Sling community is here to help! Join us on Discord or check out our GitHub repository for more resources and support.