The Challenge of Data Migration
Moving data from MySQL to Amazon S3 traditionally involves complex processes and multiple tools. Organizations often face challenges such as:
- Writing and maintaining custom scripts using multiple libraries
- Managing database connections and AWS credentials
- Handling different file formats (CSV, Parquet, JSON)
- Implementing error handling and retries
- Setting up monitoring and logging
- Dealing with schema changes and data type mappings
- Managing incremental updates
According to various online discussions and Stack Overflow posts, developers frequently struggle with these challenges, leading to brittle solutions that require constant maintenance. Let’s explore how Sling simplifies this entire process with its modern approach to data movement.
Getting Started with Sling
Before we dive into exporting data from MySQL to S3, let’s set up Sling on your system. Sling offers multiple installation methods to suit different operating systems and preferences.
Installation
Choose the installation method that matches your operating system:
# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling
# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
&& tar xf sling_linux_amd64.tar.gz \
&& rm -f sling_linux_amd64.tar.gz \
&& chmod +x sling
# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling
# Install using Python pip
pip install sling
After installation, verify that Sling is properly installed by checking its version:
# Check Sling version
sling --version
Setting Up Connections
Before we can start transferring data, we need to configure our source (MySQL) and target (S3) connections. Sling provides multiple ways to manage connections securely.
MySQL Connection Setup
For MySQL, we need to ensure our database user has the appropriate permissions to read the data we want to replicate. Here’s how to create a user with the necessary permissions:
-- Create a new user for Sling
CREATE USER 'sling'@'%' IDENTIFIED BY '<password>';
-- Grant read permissions for the source data
GRANT SELECT ON <source_schema>.* TO 'sling'@'%';
Now we can set up the MySQL connection in Sling using any of these methods:
Using sling conns set
Command
# Set up MySQL connection using command line
sling conns set mysql_source type=mysql host=<host> user=<user> database=<database> password=<password> port=3306
Using Environment Variables
# Set up MySQL connection using environment variables
export MYSQL_SOURCE='mysql://myuser:mypass@host:3306/mydatabase?tls=skip-verify'
Using Sling Environment File
Add to your ~/.sling/env.yaml
:
connections:
mysql_source:
type: mysql
host: localhost
database: mydatabase
user: myuser
password: mypassword
port: 3306
S3 Connection Setup
For S3, you’ll need to provide AWS credentials and bucket information. Here’s how to set up the connection:
Using sling conns set
Command
# Set up S3 connection using AWS credentials
sling conns set s3_target type=s3 bucket=my-bucket access_key_id=YOUR_ACCESS_KEY secret_access_key=YOUR_SECRET_KEY region=us-east-1
Using Environment Variables
# Set S3 connection using environment variables
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
export AWS_REGION=us-east-1
Using Sling Environment File
Add to your ~/.sling/env.yaml
:
connections:
s3_target:
type: s3
bucket: my-bucket
access_key_id: YOUR_ACCESS_KEY
secret_access_key: YOUR_SECRET_KEY
region: us-east-1
Testing Connections
After setting up your connections, verify them using Sling’s connection management commands:
# List all configured connections
sling conns list
# Test MySQL connection
sling conns test mysql_source
# Test S3 connection
sling conns test s3_target
# List available tables in MySQL
sling conns discover mysql_source
Basic Data Export with CLI Flags
The quickest way to start exporting data from MySQL to S3 is using Sling’s CLI flags. This method is perfect for simple transfers and testing your setup.
Simple Export Example
Here’s a basic example of exporting a MySQL table to different formats in S3:
# Export to CSV in S3
sling run \
--src-conn mysql_source \
--src-stream "users" \
--tgt-conn s3_target \
--tgt-object "data/users.csv"
# Export to JSON in S3
sling run \
--src-conn mysql_source \
--src-stream "users" \
--tgt-conn s3_target \
--tgt-object "data/users.json"
# Export to Parquet in S3
sling run \
--src-conn mysql_source \
--src-stream "users" \
--tgt-conn s3_target \
--tgt-object "data/users.parquet"
Advanced Export with Options
For more control over the export process, you can use additional options:
# Advanced export with source and target options
sling run \
--src-conn mysql_source \
--src-stream "users" \
--select "id, name, email, created_at, updated_at" \
--where "updated_at > '2024-01-01'" \
--tgt-conn s3_target \
--tgt-object "data/users.parquet" \
--tgt-options '{
"compression": "snappy",
"file_max_bytes": 104857600
}' \
--mode incremental \
--primary-key "id" \
--update-key "updated_at"
Advanced Data Export with YAML Configuration
For more complex scenarios or when you need to export multiple tables with specific configurations, using a YAML configuration file is recommended. This approach provides better maintainability and version control.
Basic YAML Configuration
Here’s a simple example that exports multiple tables to different formats:
# mysql_to_s3.yaml
source: mysql_source
target: s3_target
streams:
# Export users table to CSV
users:
object: "data/users/users.csv"
mode: incremental
primary_key: [id]
update_key: updated_at
select: [id, name, email, created_at, updated_at]
target_options:
format: csv
datetime_format: "YYYY-MM-DDTHH:mm:ss"
# Export orders table to Parquet
orders:
object: "data/orders/{stream_table}_{run_timestamp}.parquet"
mode: incremental
primary_key: [order_id]
update_key: modified_at
source_options:
table_keys:
primary: [order_id]
add_new_columns: true
target_options:
format: parquet
compression: snappy
file_max_bytes: 104857600
Complex YAML Configuration
Here’s a more advanced configuration that demonstrates additional features:
# mysql_to_s3_advanced.yaml
source: mysql_source
target: s3_target
streams:
# Export customer data with transformations
'mysql.customers':
object: "data/customers/{stream_table}/{date}/data.json"
mode: incremental
primary_key: [customer_id]
update_key: last_modified
where: "status = 'active'"
target_options:
format: json
datetime_format: "YYYY-MM-DDTHH:mm:ss"
column_casing: snake
# Export transaction data with partitioning
transactions:
object: "data/transactions/{part_year}/{part_month}/"
mode: incremental
primary_key: [transaction_id]
update_key: created_at
target_options:
format: parquet
compression: snappy
file_max_bytes: 104857600
To run these YAML configurations:
# Run the basic configuration
sling run -r mysql_to_s3.yaml
# Run the advanced configuration
sling run -r mysql_to_s3_advanced.yaml
Using Sling Platform
While the CLI is excellent for development and testing, Sling Platform provides a comprehensive web interface for managing your data operations at scale. The platform offers:
- Visual connection management
- YAML configuration editor with syntax highlighting
- Job scheduling and monitoring
- Team collaboration features
- Audit logging and history tracking
The platform makes it easy to:
- Set up and manage connections visually
- Create and edit replication configurations
- Monitor job execution and performance
- Schedule recurring exports
- Collaborate with team members
Next Steps
Now that you’ve learned how to export MySQL data to S3 using Sling, here are some resources to help you go further:
- Replication Concepts - Learn more about Sling’s replication capabilities
- Replication Modes - Understand different replication modes
- Source Options - Explore MySQL source configuration options
- Target Options - Learn about S3 and file format target options
- Runtime Variables - Use variables in your configurations
- Platform Getting Started - Get started with Sling Platform
- CLI Getting Started - Learn more about the Sling CLI
Whether you’re just getting started with data migration or looking to optimize your existing workflows, Sling provides the tools and flexibility you need to succeed. Start with the CLI for local development, then scale up to the platform as your needs grow.