Export MySQL Data to S3 Parquet Files with Sling

Slinger

The Challenge of Modern Data Pipelines

Building and maintaining data pipelines can be a complex endeavor. When you need to export data from MySQL to AWS S3 in Parquet format, you typically face several challenges:

  • Setting up and maintaining infrastructure
  • Writing and testing ETL code
  • Managing data formats and schema changes
  • Handling incremental updates
  • Monitoring and error handling

Sling reduces this work to a CLI and a YAML configuration file. You declare the source, the target, and the streams to copy, and Sling takes care of reading from MySQL, converting to Parquet, and uploading to S3.

Getting Started with Sling

To begin using Sling, you’ll need to install the CLI tool. The installation process is straightforward:

# Install Sling CLI on macOS using Homebrew
brew install slingdata-io/sling/sling

# Install on Linux/WSL using curl
curl -L https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz | tar -xz && sudo mv sling /usr/local/bin/

# Verify installation
sling --version

After installation, you’ll need to configure your connections. Sling makes this process simple with the sling conns set command:

# Set up MySQL connection
sling conns set mysql_source url="mysql://user:pass@host:3306/dbname"

# Set up S3 connection (S3 connections take key=value properties rather than a URL)
sling conns set s3_target type=s3 bucket=bucket_name access_key_id=access_key secret_access_key="secret_key"
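
Before writing a replication file, it can help to confirm that both connections actually work. The sketch below assumes the connection names used above (mysql_source and s3_target) and relies on the `sling conns list` and `sling conns test` subcommands. The `FAILED` variable is only a helper for this example.

```shell
# Connection names created earlier with `sling conns set`.
CONNS="mysql_source s3_target"
FAILED=""

if command -v sling >/dev/null 2>&1; then
  sling conns list || true          # list every connection Sling can see
  for conn in $CONNS; do
    # `sling conns test` tries to connect with the stored credentials.
    sling conns test "$conn" || FAILED="$FAILED $conn"
  done
else
  echo "sling is not on PATH; install it first" >&2
  FAILED="$CONNS"
fi

# Report any connection that could not be verified.
[ -z "$FAILED" ] || echo "connections needing attention: $FAILED" >&2
```

If a connection shows up under "needing attention", re-run the matching `sling conns set` command with corrected credentials before moving on.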

Exporting MySQL to S3 with Sling CLI

The heart of Sling’s functionality lies in its YAML-based replication configuration. Here’s a complete example that exports MySQL data to S3 in Parquet format:

# mysql_to_s3.yaml - Export MySQL tables to S3 as Parquet
source: mysql_source
target: s3_target

streams:
  # Use wildcard to replicate all tables in the schema
  'my_schema.*':
    # Target S3 path using stream_table variable
    object: 'data/{stream_table}'
    
    # Configure how to handle the replication
    mode: incremental
    primary_key: [id]
    update_key: updated_at
    
    # Target specific options
    target_options:
      format: parquet
      compression: snappy
      file_max_bytes: 104857600  # 100 MiB (100 * 1024 * 1024 bytes)
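
When some tables need different settings, one possible layout is to move the shared options into a `defaults` block and override them per stream. The sketch below writes such a file from the shell. The file name mysql_to_s3_defaults.yaml and the table my_schema.audit_log are hypothetical, and it assumes that an explicitly named stream takes precedence over the wildcard, so check the replication documentation for your Sling version before relying on it.

```shell
# Compute the per-file size limit instead of hard-coding it: 100 MiB in bytes.
FILE_MAX_BYTES=$((100 * 1024 * 1024))

# Write a replication file with shared defaults and one per-table override.
cat > mysql_to_s3_defaults.yaml <<EOF
source: mysql_source
target: s3_target

# Settings applied to every stream unless a stream overrides them
defaults:
  object: 'data/{stream_table}'
  mode: incremental
  primary_key: [id]
  update_key: updated_at
  target_options:
    format: parquet
    compression: snappy
    file_max_bytes: ${FILE_MAX_BYTES}

streams:
  # All tables in the schema inherit the defaults
  'my_schema.*':

  # Hypothetical table with no updated_at column: reload it fully each run
  'my_schema.audit_log':
    mode: full-refresh
EOF

echo "wrote mysql_to_s3_defaults.yaml (file_max_bytes=${FILE_MAX_BYTES})"
```

Keeping the arithmetic in the script makes the limit easy to change, for example to 256 MiB, without recalculating the byte count by hand.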

To run the replication:

# Execute the replication
sling run -r mysql_to_s3.yaml
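
For scheduled or unattended runs, a small wrapper can make failures easier to spot. This is a minimal sketch, assuming the sling CLI exits with a non-zero status when a run fails. The function name `run_replication` and its return codes for a missing file or missing CLI are choices made for this example, not part of Sling.

```shell
# Run a replication file and report the outcome.
# Assumes the sling CLI exits non-zero when a run fails.
run_replication() {
  config="$1"
  if [ ! -f "$config" ]; then
    echo "replication file not found: $config" >&2
    return 2
  fi
  if ! command -v sling >/dev/null 2>&1; then
    echo "sling is not on PATH" >&2
    return 127
  fi
  if sling run -r "$config"; then
    echo "replication succeeded: $config"
  else
    status=$?   # exit status of the failed sling run
    echo "replication failed (exit $status): $config" >&2
    return "$status"
  fi
}
```

Calling `run_replication mysql_to_s3.yaml` from a cron job or CI step then propagates the exit status to whatever is supervising the job.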

This configuration will:

  • Connect to your MySQL database
  • Read every table in my_schema that matches the my_schema.* wildcard
  • Convert each table to Parquet format with Snappy compression
  • Upload each table to your S3 bucket under its own prefix, for example data/users for a users table
  • Track changes using the updated_at column for incremental updates
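
To make the mapping from tables to S3 prefixes concrete, the following illustration mimics how the {stream_table} placeholder in the object setting resolves. It does not call Sling, which performs the real substitution itself. The table names are hypothetical and `expand_object` is a helper written only for this example.

```shell
# Illustration only: mimic how the {stream_table} placeholder maps tables to prefixes.
# The table names below are hypothetical.
object_template='data/{stream_table}'

expand_object() {
  # Replace the literal placeholder with the table name passed as $1.
  printf '%s\n' "$object_template" | sed "s/{stream_table}/$1/"
}

for table in users orders order_items; do
  echo "my_schema.$table -> $(expand_object "$table")"
done
```

Each table matched by the wildcard gets its own prefix, so adding a table to the schema does not require changing the replication file.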

Using Sling Platform

While the CLI is powerful for local development and testing, Sling Platform provides a comprehensive web interface for managing your data operations at scale. The platform offers:

  • Visual connection management
  • YAML configuration editor with syntax highlighting
  • Job scheduling and monitoring
  • Team collaboration features

Sling Platform Connections

Setting up connections in the platform is straightforward through the visual interface. You can manage both MySQL and S3 connections, along with credentials, in a secure and centralized way.

Sling Platform Editor

The built-in YAML editor provides syntax highlighting and validation, making it easy to create and modify your replication configurations. You can test your configurations directly in the platform before deploying them to production.

Next Steps

You have now seen how to export MySQL data to S3 as Parquet with Sling, from installing the CLI and configuring connections to running an incremental replication.

Whether you’re just getting started with data pipelines or looking to optimize your existing workflows, Sling provides the tools and flexibility you need to succeed. Start with the CLI for local development, then scale up to the platform as your needs grow.