Export SQL Server to Parquet Files

Slinger avatar
Slinger
Cover for Export SQL Server to Parquet Files

Export SQL Server to Parquet Files

Setting up data pipelines has traditionally been a complex and time-consuming process. From managing dependencies and writing custom scripts to handling errors and monitoring the process, there are numerous challenges to overcome. This is especially true when working with enterprise databases like Microsoft SQL Server and modern file formats like Parquet.

Enter Sling - a modern data movement platform that dramatically simplifies this process. In this guide, we’ll walk through how to efficiently export SQL Server tables to Parquet files on Azure Storage, demonstrating how Sling turns what would typically be a complex pipeline into a straightforward configuration.

We’ll cover everything from installation and setup to creating and running your first replication, using both the CLI for automation and the Platform UI for visual management. By the end of this guide, you’ll have a robust, production-ready pipeline for exporting your SQL Server data to Parquet files.

Getting Started with Sling

Getting Sling up and running is straightforward. You can use either the CLI for automation and scripting, or the Platform UI for visual management. Let’s start with installing the CLI:

Mac Installation

If you’re using a Mac, you can install Sling using Homebrew:

# Install using Homebrew
brew install slingdata-io/sling/sling

# Verify the installation
sling -h

Windows Installation

For Windows users, installation is available through Scoop:

# Add the Sling bucket
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git

# Install Sling
scoop install sling

# Verify the installation
sling -h

Linux Installation

On Linux systems, you can download and install Sling directly:

# Download the latest version
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz'

# Extract and install
tar xf sling_linux_amd64.tar.gz
rm -f sling_linux_amd64.tar.gz
chmod +x sling

# Verify the installation
./sling -h

Once Sling is installed, you’ll have access to its powerful data movement capabilities through the command line interface.

Basic Setup Requirements

Before we can start moving data, we need:

  1. A Microsoft SQL Server instance with the tables you want to export
  2. An Azure Storage account with a container for storing the Parquet files
  3. Appropriate credentials for both services

Make sure you have these ready before proceeding to the next section. For more detailed installation instructions and requirements, visit the getting started guide.

Setting Up Connections

Sling needs to know how to connect to both your SQL Server database and Azure Storage. Let’s set up these connections:

SQL Server Connection

First, let’s set up the SQL Server connection. You can do this via the CLI:

# Set up SQL Server connection
sling conns set sqlserver type=sqlserver \
  host="host" \
  port="port" \
  username="username" \
  password="password" \
  database="mydb"

Or using environment variables:

# Set connection using environment variables
export SLING_CONN_SQLSERVER="sqlserver://username:password@host:port?database=mydb"

See the SQL Server connection documentation for more details.

Azure Storage Connection

Next, set up the Azure Storage connection:

# Set up Azure Storage connection
sling conns set azure type=azure \
  storage_account="account_name" \
  storage_key="account_key" \
  container="container"

Or using environment variables:

# Set connection using environment variables
export SLING_CONN_AZURE="azure://account_name:account_key@container"

See the Azure Storage connection documentation for more details.

Testing Connections

After setting up both connections, it’s good practice to test them:

# Test SQL Server connection
sling conns test sqlserver

# Test Azure Storage connection
sling conns test azure

You can also manage your connections through the Sling Platform UI, which provides a user-friendly interface for setting up and testing connections:

Sling Platform Connections

For more details about connection configuration, refer to the SQL Server and Azure Storage documentation.

Creating the Replication Configuration

Now that our connections are set up, we can create a replication configuration to export SQL Server tables to Parquet files. We’ll create a file called sqlserver_to_azure.yaml:

# Source SQL Server connection
source: sqlserver

# Target Azure Storage connection
target: azure

# Default settings for all streams
defaults:
  # Target options for Parquet files
  target_options:
    format: parquet
    compression: snappy
    file_max_rows: 1000000
  
  mode: full-refresh

# Define the tables to export using a wildcard
streams:
  'sales.*':
    # Target path in Azure Storage using stream_table variable
    object: 'data/sales/{stream_table}.parquet'

This configuration:

  1. Specifies the source (SQL Server) and target (Azure Storage) connections
  2. Sets default options for Parquet file output
  3. Defines multiple streams (tables) to export
  4. Includes an example of using custom SQL for data transformation

You can also create and edit replications through the Sling Platform UI:

Sling Platform Editor

For more details about replication configuration options, see the replication documentation.

Running the Replication

With our configuration ready, we can now run the replication to export our SQL Server tables to Parquet files. There are two ways to do this:

Using the CLI

To run the replication using the CLI:

# Run the replication
sling run -r sqlserver_to_azure.yaml

You can also run specific streams:

# Run only the sales.orders stream
sling run -r sqlserver_to_azure.yaml --stream sales.orders

Using the Platform

The Sling Platform provides a visual interface for running and monitoring replications. You can:

  1. Create a new job from your replication
  2. Schedule it to run automatically
  3. Monitor its progress
  4. View detailed logs and statistics

The Platform also provides features like:

  • Job scheduling and automation
  • Error notifications
  • Performance monitoring
  • Team collaboration

For more information about running replications, see the replication modes documentation.

Next Steps

Now that you have a working pipeline for exporting SQL Server tables to Parquet files, here are some next steps to explore:

  1. Explore Advanced Features

  2. Try the Platform

    • Sign up for the Sling Platform
    • Set up automated jobs
    • Monitor your data pipelines
  3. Get Support

    • Join our community
    • Read the documentation
    • Contact our support team

With Sling, you’ve simplified what would typically be a complex data pipeline into a maintainable, scalable solution. Whether you’re using the CLI for automation or the Platform for visual management, you now have the tools to efficiently move your SQL Server data to Parquet files on Azure Storage.