Exporting from MySQL and Loading into S3 with Sling

Slinger avatar
Slinger
Cover for Exporting from MySQL and Loading into S3 with Sling

The Challenge of Data Migration

Moving data from MySQL to Amazon S3 traditionally involves complex processes and multiple tools. Organizations often face challenges such as:

  • Writing and maintaining custom scripts using multiple libraries
  • Managing database connections and AWS credentials
  • Handling different file formats (CSV, Parquet, JSON)
  • Implementing error handling and retries
  • Setting up monitoring and logging
  • Dealing with schema changes and data type mappings
  • Managing incremental updates

According to various online discussions and Stack Overflow posts, developers frequently struggle with these challenges, leading to brittle solutions that require constant maintenance. Let’s explore how Sling simplifies this entire process with its modern approach to data movement.

Getting Started with Sling

Before we dive into exporting data from MySQL to S3, let’s set up Sling on your system. Sling offers multiple installation methods to suit different operating systems and preferences.

Installation

Choose the installation method that matches your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by checking its version:

# Check Sling version
sling --version

Setting Up Connections

Before we can start transferring data, we need to configure our source (MySQL) and target (S3) connections. Sling provides multiple ways to manage connections securely.

MySQL Connection Setup

For MySQL, we need to ensure our database user has the appropriate permissions to read the data we want to replicate. Here’s how to create a user with the necessary permissions:

-- Create a new user for Sling
CREATE USER 'sling'@'%' IDENTIFIED BY '<password>';

-- Grant read permissions for the source data
GRANT SELECT ON <source_schema>.* TO 'sling'@'%';

Now we can set up the MySQL connection in Sling using any of these methods:

Using sling conns set Command

# Set up MySQL connection using command line
sling conns set mysql_source type=mysql host=<host> user=<user> database=<database> password=<password> port=3306

Using Environment Variables

# Set up MySQL connection using environment variables
export MYSQL_SOURCE='mysql://myuser:mypass@host:3306/mydatabase?tls=skip-verify'

Using Sling Environment File

Add to your ~/.sling/env.yaml:

connections:
  mysql_source:
    type: mysql
    host: localhost
    database: mydatabase
    user: myuser
    password: mypassword
    port: 3306

S3 Connection Setup

For S3, you’ll need to provide AWS credentials and bucket information. Here’s how to set up the connection:

Using sling conns set Command

# Set up S3 connection using AWS credentials
sling conns set s3_target type=s3 bucket=my-bucket access_key_id=YOUR_ACCESS_KEY secret_access_key=YOUR_SECRET_KEY region=us-east-1

Using Environment Variables

# Set S3 connection using environment variables
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
export AWS_REGION=us-east-1

Using Sling Environment File

Add to your ~/.sling/env.yaml:

connections:
  s3_target:
    type: s3
    bucket: my-bucket
    access_key_id: YOUR_ACCESS_KEY
    secret_access_key: YOUR_SECRET_KEY
    region: us-east-1

Testing Connections

After setting up your connections, verify them using Sling’s connection management commands:

# List all configured connections
sling conns list

# Test MySQL connection
sling conns test mysql_source

# Test S3 connection
sling conns test s3_target

# List available tables in MySQL
sling conns discover mysql_source

Basic Data Export with CLI Flags

The quickest way to start exporting data from MySQL to S3 is using Sling’s CLI flags. This method is perfect for simple transfers and testing your setup.

Simple Export Example

Here’s a basic example of exporting a MySQL table to different formats in S3:

# Export to CSV in S3
sling run \
  --src-conn mysql_source \
  --src-stream "users" \
  --tgt-conn s3_target \
  --tgt-object "data/users.csv"

# Export to JSON in S3
sling run \
  --src-conn mysql_source \
  --src-stream "users" \
  --tgt-conn s3_target \
  --tgt-object "data/users.json"

# Export to Parquet in S3
sling run \
  --src-conn mysql_source \
  --src-stream "users" \
  --tgt-conn s3_target \
  --tgt-object "data/users.parquet"

Advanced Export with Options

For more control over the export process, you can use additional options:

# Advanced export with source and target options
sling run \
  --src-conn mysql_source \
  --src-stream "users" \
  --select "id, name, email, created_at, updated_at" \
  --where "updated_at > '2024-01-01'" \
  --tgt-conn s3_target \
  --tgt-object "data/users.parquet" \
  --tgt-options '{
    "compression": "snappy",
    "file_max_bytes": 104857600
  }' \
  --mode incremental \
  --primary-key "id" \
  --update-key "updated_at"

Advanced Data Export with YAML Configuration

For more complex scenarios or when you need to export multiple tables with specific configurations, using a YAML configuration file is recommended. This approach provides better maintainability and version control.

Basic YAML Configuration

Here’s a simple example that exports multiple tables to different formats:

# mysql_to_s3.yaml
source: mysql_source
target: s3_target

streams:
  # Export users table to CSV
  users:
    object: "data/users/users.csv"
    mode: incremental
    primary_key: [id]
    update_key: updated_at
    select: [id, name, email, created_at, updated_at]
    target_options:
      format: csv
      datetime_format: "YYYY-MM-DDTHH:mm:ss"

  # Export orders table to Parquet
  orders:
    object: "data/orders/{stream_table}_{run_timestamp}.parquet"
    mode: incremental
    primary_key: [order_id]
    update_key: modified_at
    
    source_options:
      table_keys:
        primary: [order_id]
      add_new_columns: true
    
    target_options:
      format: parquet
      compression: snappy
      file_max_bytes: 104857600

Complex YAML Configuration

Here’s a more advanced configuration that demonstrates additional features:

# mysql_to_s3_advanced.yaml
source: mysql_source
target: s3_target

streams:
  # Export customer data with transformations
  'mysql.customers':
    object: "data/customers/{stream_table}/{date}/data.json"
    mode: incremental
    primary_key: [customer_id]
    update_key: last_modified
    where: "status = 'active'"
    
    target_options:
      format: json
      datetime_format: "YYYY-MM-DDTHH:mm:ss"
      column_casing: snake

  # Export transaction data with partitioning
  transactions:
    object: "data/transactions/{part_year}/{part_month}/"
    mode: incremental
    primary_key: [transaction_id]
    update_key: created_at
    
    target_options:
      format: parquet
      compression: snappy
      file_max_bytes: 104857600

To run these YAML configurations:

# Run the basic configuration
sling run -r mysql_to_s3.yaml

# Run the advanced configuration
sling run -r mysql_to_s3_advanced.yaml

Using Sling Platform

While the CLI is excellent for development and testing, Sling Platform provides a comprehensive web interface for managing your data operations at scale. The platform offers:

Sling Platform Editor

  • Visual connection management
  • YAML configuration editor with syntax highlighting
  • Job scheduling and monitoring
  • Team collaboration features
  • Audit logging and history tracking

The platform makes it easy to:

  1. Set up and manage connections visually
  2. Create and edit replication configurations
  3. Monitor job execution and performance
  4. Schedule recurring exports
  5. Collaborate with team members

Next Steps

Now that you’ve learned how to export MySQL data to S3 using Sling, here are some resources to help you go further:

Whether you’re just getting started with data migration or looking to optimize your existing workflows, Sling provides the tools and flexibility you need to succeed. Start with the CLI for local development, then scale up to the platform as your needs grow.