Loading Local JSON Data into MySQL Using Sling

Slinger avatar
Slinger
Cover for Loading Local JSON Data into MySQL Using Sling

Introduction

Data migration from local JSON files to MySQL databases is a common requirement in modern data workflows. However, traditional approaches often involve writing custom scripts, dealing with data type conversions, and managing complex ETL processes. This can be time-consuming and error-prone, especially when handling large datasets or complex JSON structures.

Enter Sling: a modern data movement tool that simplifies this process significantly. In this article, we’ll explore how to use Sling to efficiently migrate data from local JSON files to MySQL databases, making the process both straightforward and reliable.

Understanding the Challenge

Traditional approaches to migrating JSON data to MySQL often involve multiple steps:

  1. Writing custom Python or Node.js scripts to parse JSON
  2. Handling data type conversions manually
  3. Creating and managing database connections
  4. Implementing error handling and retry logic
  5. Setting up proper batching for performance

These steps can become quite complex, especially when dealing with nested JSON structures or when you need to maintain and update the migration process over time. Let’s see how Sling simplifies this entire workflow.

Getting Started with Sling

Before we dive into the data migration process, let’s get Sling installed and configured on your system. Sling provides multiple installation methods to suit different operating systems and preferences.

Installation

You can install Sling using any of the following methods:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by checking its version:

# Check Sling version
sling --version

For more detailed installation instructions, visit the Sling installation guide.

Setting Up Connections

Now that we have Sling installed, let’s configure our connections. We’ll need to set up two connections:

  1. A local connection for reading JSON files
  2. A MySQL connection for the target database

Using the CLI to Set Up Connections

The easiest way to set up connections is using the sling conns set command. Here’s how to set up the MySQL connection:

# Set up MySQL connection
sling conns set MYSQL type=mysql host=localhost database=mydatabase user=myuser password=mypassword

Using Environment Variables

You can also set up connections using environment variables:

# Set MySQL connection using environment variables
export MYSQL='mysql://user:pass@localhost:3306/mydatabase'

Using YAML Configuration

For more complex setups or when managing multiple connections, you can use a YAML configuration file at ~/.sling/env.yaml:

# Connection configuration in env.yaml
connections:
  MYSQL:
    type: mysql
    host: localhost
    database: mydatabase
    user: myuser
    password: mypassword

For more details about connection configuration, refer to the environment documentation. For MySQL-specific connection options and configurations, see the MySQL documentation.

Data Migration with Sling CLI

Sling provides two main approaches for data migration:

  1. Using CLI flags for quick, one-off migrations
  2. Using YAML configuration files for more complex, repeatable workflows

Let’s start with the CLI approach.

Basic CLI Example

Here’s a simple example of loading a JSON file into MySQL:

# Load a simple JSON file into MySQL
sling run \
  --src-stream 'file:///path/to/users.json' \
  --src-options '{flatten: true}' \
  --tgt-conn mysql_target \
  --tgt-object 'myschema.users' \
  --mode full-refresh

In this example:

  • --src-stream: Path to the JSON file
  • --src-options: JSON options for source, including flatten to handle nested structures
  • --tgt-conn: Target MySQL connection
  • --tgt-object: Target schema and table name
  • --mode: Replication mode (full-refresh in this case)

You can also pipe JSON data directly into Sling using stdin:

# Load JSON data using stdin
cat /path/to/users.json | sling run \
  --src-options '{flatten: true}' \
  --tgt-conn MYSQL \
  --tgt-object 'myschema.users' \
  --mode full-refresh

Advanced CLI Example

Here’s a more complex example that includes additional options:

# Load JSON data with advanced options
sling run \
  --src-stream 'file:///path/to/transactions.json' \
  --src-options '{
    flatten: true,
    empty_as_null: true,
    datetime_format: "YYYY-MM-DD"
  }' \
  --tgt-conn mysql_target \
  --tgt-object 'myschema.transactions' \
  --mode full-refresh \
  --tgt-options '{ table_keys: { primary: [transaction_id] } }'

This example includes:

  • Source options:
    • flatten: Flattens nested JSON structures
    • empty_as_null: Converts empty strings to NULL
    • datetime_format: Specifies the format for parsing dates
  • Target options:
    • primary-key: Defines the primary key column

Using Replication YAML

For more complex scenarios or when you want to maintain your data pipeline configuration in version control, using a YAML configuration file is recommended. Let’s look at some examples.

Basic Replication Example

Create a file named local_to_mysql.yaml:

source: local
target: mysql_target

defaults:
  mode: full-refresh
  object: '{target_schema}.{stream_table}'

streams:
  customers:
    stream: 'file:///path/to/customers.json'
    object: myschema.customers
    options:
      flatten: true
      empty_as_null: true

  orders:
    stream: 'file:///path/to/orders.json'
    object: myschema.orders
    options:
      flatten: true

Run the replication with:

sling run -r local_to_mysql.yaml

Advanced Replication Example

Here’s a more complex example that uses runtime variables and advanced options:

source: local
target: mysql_target

defaults:
  mode: full-refresh
  options:
    flatten: true
    empty_as_null: true
    datetime_format: "YYYY-MM-DDTHH:mm:ssZ"

streams:
  transactions:
    stream: 'file://{folder}/*.json'
    object: '{stream_file_folder}.{stream_file_name}'   # create a table for each file

env:
  folder: ${MY_FOLDER}  # put env var 'MY_FOLDER' into var 'folder'

Run this configuration with runtime variables:

export MY_FOLDER='/path/to/folder'
sling run -r local_to_mysql.yaml

For more information about runtime variables, check out the runtime variables documentation.

Sling Platform Overview

While the CLI is powerful for local development and simple workflows, the Sling Platform provides a comprehensive web-based interface for managing your data operations at scale. Let’s look at some key features of the platform.

Key Components

The Sling Platform consists of several components that work together to provide a complete data movement solution:

  1. Web Interface: A user-friendly interface for managing connections, creating replications, and monitoring jobs
  2. Agents: Workers that execute data operations in your infrastructure
  3. Scheduling: Built-in job scheduling capabilities
  4. Monitoring: Real-time monitoring and alerting for your data operations

Getting Started with the Platform

To get started with the Sling Platform:

  1. Sign up at platform.slingdata.io
  2. Install and configure a Sling agent in your infrastructure
  3. Set up your connections through the web interface
  4. Create and schedule your replications

For more information about the platform, visit the Sling Platform documentation.

Best Practices and Tips

When working with JSON to MySQL migrations using Sling, keep these best practices in mind:

  1. Source Data Preparation:

    • Use flatten when dealing with nested JSON structures
    • Set empty_as_null to handle empty strings appropriately
    • Use jmespath to extract specific data when needed
  2. Target Configuration:

    • Always specify table_keys for proper data updates
    • Use add_new_columns to handle schema evolution
    • Configure appropriate table_ddl options for MySQL
  3. Performance Optimization:

    • Use multiple streams for parallel processing
    • Configure appropriate batch sizes
    • Monitor and adjust connection pool settings
  4. Maintenance and Monitoring:

    • Use version control for your replication configurations
    • Implement proper logging and monitoring
    • Regularly test your data pipelines

Conclusion

Sling provides a powerful and flexible solution for migrating data from local JSON files to MySQL databases. Whether you’re working with simple flat files or complex nested structures, Sling’s features make the process straightforward and maintainable.

Key benefits of using Sling include:

  • Simplified data pipeline setup
  • Automatic handling of JSON structures
  • Flexible configuration options
  • Scale from CLI to platform as needed

Next Steps

To continue your journey with Sling:

  1. Visit the Sling documentation for detailed guides
  2. Join the community for support and updates
  3. Explore the Sling Platform for enterprise features

Remember to check out more examples in our documentation: