Export SQL Server to Parquet Files
Setting up data pipelines has traditionally been a complex and time-consuming process. From managing dependencies and writing custom scripts to handling errors and monitoring the process, there are numerous challenges to overcome. This is especially true when working with enterprise databases like Microsoft SQL Server and modern file formats like Parquet.
Enter Sling - a modern data movement platform that dramatically simplifies this process. In this guide, we’ll walk through how to efficiently export SQL Server tables to Parquet files on Azure Storage, demonstrating how Sling turns what would typically be a complex pipeline into a straightforward configuration.
We’ll cover everything from installation and setup to creating and running your first replication, using both the CLI for automation and the Platform UI for visual management. By the end of this guide, you’ll have a robust, production-ready pipeline for exporting your SQL Server data to Parquet files.
Getting Started with Sling
Getting Sling up and running is straightforward. You can use either the CLI for automation and scripting, or the Platform UI for visual management. Let’s start with installing the CLI:
Mac Installation
If you’re using a Mac, you can install Sling using Homebrew:
# Install using Homebrew
brew install slingdata-io/sling/sling
# Verify the installation
sling -h
Windows Installation
For Windows users, installation is available through Scoop:
# Add the Sling bucket
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
# Install Sling
scoop install sling
# Verify the installation
sling -h
Linux Installation
On Linux systems, you can download and install Sling directly:
# Download the latest version
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz'
# Extract and install
tar xf sling_linux_amd64.tar.gz
rm -f sling_linux_amd64.tar.gz
chmod +x sling
# Verify the installation
./sling -h
Once Sling is installed, you’ll have access to its powerful data movement capabilities through the command line interface.
Basic Setup Requirements
Before we can start moving data, we need:
- A Microsoft SQL Server instance with the tables you want to export
- An Azure Storage account with a container for storing the Parquet files
- Appropriate credentials for both services
Make sure you have these ready before proceeding to the next section. For more detailed installation instructions and requirements, visit the getting started guide.
Setting Up Connections
Sling needs to know how to connect to both your SQL Server database and Azure Storage. Let’s set up these connections:
SQL Server Connection
First, let’s set up the SQL Server connection. You can do this via the CLI:
# Set up SQL Server connection
sling conns set sqlserver type=sqlserver \
host="host" \
port="port" \
username="username" \
password="password" \
database="mydb"
Or using environment variables:
# Set connection using environment variables
export SLING_CONN_SQLSERVER="sqlserver://username:password@host:port?database=mydb"
See the SQL Server connection documentation for more details.
Azure Storage Connection
Next, set up the Azure Storage connection:
# Set up Azure Storage connection
sling conns set azure type=azure \
storage_account="account_name" \
storage_key="account_key" \
container="container"
Or using environment variables:
# Set connection using environment variables
export SLING_CONN_AZURE="azure://account_name:account_key@container"
See the Azure Storage connection documentation for more details.
Testing Connections
After setting up both connections, it’s good practice to test them:
# Test SQL Server connection
sling conns test sqlserver
# Test Azure Storage connection
sling conns test azure
You can also manage your connections through the Sling Platform UI, which provides a user-friendly interface for setting up and testing connections:
For more details about connection configuration, refer to the SQL Server and Azure Storage documentation.
Creating the Replication Configuration
Now that our connections are set up, we can create a replication configuration to export SQL Server tables to Parquet files. We’ll create a file called sqlserver_to_azure.yaml
:
# Source SQL Server connection
source: sqlserver
# Target Azure Storage connection
target: azure
# Default settings for all streams
defaults:
# Target options for Parquet files
target_options:
format: parquet
compression: snappy
file_max_rows: 1000000
mode: full-refresh
# Define the tables to export using a wildcard
streams:
'sales.*':
# Target path in Azure Storage using stream_table variable
object: 'data/sales/{stream_table}.parquet'
This configuration:
- Specifies the source (SQL Server) and target (Azure Storage) connections
- Sets default options for Parquet file output
- Defines multiple streams (tables) to export
- Includes an example of using custom SQL for data transformation
You can also create and edit replications through the Sling Platform UI:
For more details about replication configuration options, see the replication documentation.
Running the Replication
With our configuration ready, we can now run the replication to export our SQL Server tables to Parquet files. There are two ways to do this:
Using the CLI
To run the replication using the CLI:
# Run the replication
sling run -r sqlserver_to_azure.yaml
You can also run specific streams:
# Run only the sales.orders stream
sling run -r sqlserver_to_azure.yaml --stream sales.orders
Using the Platform
The Sling Platform provides a visual interface for running and monitoring replications. You can:
- Create a new job from your replication
- Schedule it to run automatically
- Monitor its progress
- View detailed logs and statistics
The Platform also provides features like:
- Job scheduling and automation
- Error notifications
- Performance monitoring
- Team collaboration
For more information about running replications, see the replication modes documentation.
Next Steps
Now that you have a working pipeline for exporting SQL Server tables to Parquet files, here are some next steps to explore:
Explore Advanced Features
- Learn about source options for customizing data extraction
- Discover target options for fine-tuning file output
- Use runtime variables for dynamic configurations
Try the Platform
- Sign up for the Sling Platform
- Set up automated jobs
- Monitor your data pipelines
Get Support
- Join our community
- Read the documentation
- Contact our support team
With Sling, you’ve simplified what would typically be a complex data pipeline into a maintainable, scalable solution. Whether you’re using the CLI for automation or the Platform for visual management, you now have the tools to efficiently move your SQL Server data to Parquet files on Azure Storage.