Migrating Data from SQL Server to DuckDB with Sling

Slinger avatar
Slinger
Cover for Migrating Data from SQL Server to DuckDB with Sling

Introduction

In today’s data-driven landscape, organizations often need to migrate data between different database systems to optimize their analytics and processing workflows. SQL Server, while powerful for transactional workloads, may not be the most efficient choice for analytical queries. This is where DuckDB comes in, offering lightning-fast analytical processing capabilities. However, moving data between these systems traditionally requires complex ETL pipelines and significant engineering effort. Enter Sling, a modern data movement tool that simplifies this process dramatically.

Understanding SQL Server and DuckDB

SQL Server is a robust relational database management system that excels at:

  • ACID compliance for transactional workloads
  • Enterprise-grade security features
  • High availability and disaster recovery
  • Complex stored procedures and triggers
  • Integration with Microsoft ecosystem

DuckDB, on the other hand, is an innovative in-process SQL OLAP database that offers:

  • Columnar-vectorized query execution
  • Exceptional analytical query performance
  • Zero configuration setup
  • Efficient data compression
  • Seamless integration with Python and R

The Challenge of Data Migration

Traditional approaches to migrating data from SQL Server to DuckDB often involve:

  1. Writing custom scripts to extract data
  2. Managing data type conversions
  3. Handling large dataset limitations
  4. Dealing with schema changes
  5. Setting up intermediate storage
  6. Managing incremental updates

This complexity can lead to:

  • Extended development time
  • Resource-intensive processes
  • Error-prone implementations
  • Difficult maintenance

Introducing Sling as a Solution

Sling simplifies this entire process by providing:

  • Direct connectivity between SQL Server and DuckDB
  • Automated schema mapping and creation
  • Efficient data streaming
  • Built-in data type conversion
  • Simple configuration options
  • Both CLI and UI-based approaches

In this guide, we’ll walk through the process of using Sling to efficiently migrate your data from SQL Server to DuckDB, covering both basic and advanced scenarios.

Getting Started with Sling

Before we dive into data migration, let’s set up Sling on your system. Sling offers multiple installation methods to suit different operating systems and preferences.

Installation

Choose the installation method that matches your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by checking its version:

# Check Sling version
sling --version

Initial Configuration

Sling uses a configuration directory to store connection details and other settings. The configuration directory is typically located at:

  • Linux/macOS: ~/.sling/
  • Windows: C:\Users\<username>\.sling\

The first time you run Sling, it will automatically create this directory and a default configuration file. You can also specify a custom location using the SLING_HOME_DIR environment variable.

For more detailed installation instructions and configuration options, visit the Sling CLI Getting Started Guide.

Setting Up Database Connections

Before we can migrate data, we need to configure our source (SQL Server) and target (DuckDB) connections. Sling provides multiple ways to set up and manage connections securely.

Configuring SQL Server Connection

SQL Server connections in Sling require your server credentials and connection details. Here’s how to set them up:

Using Environment Variables

The simplest way is to use environment variables:

# Set SQL Server connection using environment variable
export SQLSERVER_SOURCE='sqlserver://username:password@host:port/database?encrypt=true'

Using the Sling CLI

Alternatively, use the sling conns set command:

# Set up SQL Server connection with individual parameters
sling conns set SQLSERVER_SOURCE type=sqlserver host=localhost port=1433 database=mydb username=myuser password=mypass encrypt=true

# Or use a connection URL
sling conns set SQLSERVER_SOURCE url="sqlserver://username:password@host:port/database?encrypt=true"

Using the Sling Environment File

You can also add the connection details to your ~/.sling/env.yaml file:

connections:
  SQLSERVER_SOURCE:
    type: sqlserver
    host: localhost
    port: 1433
    database: mydb
    username: myuser
    password: mypass
    options:
      encrypt: true

Setting Up DuckDB Connection

DuckDB connections in Sling are straightforward as they primarily involve specifying a file path:

Using the Sling CLI

# Set up DuckDB connection
sling conns set DUCKDB_TARGET type=duckdb path=/path/to/analytics.duckdb

Using Environment Variables

# Set DuckDB connection using environment variable
export DUCKDB_TARGET='duckdb:///path/to/analytics.duckdb'

Using the Sling Environment File

Add to your ~/.sling/env.yaml:

connections:
  DUCKDB_TARGET:
    type: duckdb
    path: /path/to/analytics.duckdb

Testing Connections

After setting up your connections, it’s important to verify them:

# List all configured connections
sling conns list

# Test SQL Server connection
sling conns test SQLSERVER_SOURCE

# Test DuckDB connection
sling conns test DUCKDB_TARGET

# Discover available tables in SQL Server
sling conns discover SQLSERVER_SOURCE

For more details about connection configuration and options, refer to:

Basic Data Replication with CLI

Once you have your connections set up, you can start replicating data from SQL Server to DuckDB using Sling’s CLI flags. Let’s explore various replication scenarios.

Simple Table Replication

The most basic way to replicate data is using the sling run command with source and target specifications:

# Replicate a single table from SQL Server to DuckDB
sling run \
  --src-conn SQLSERVER_SOURCE \
  --src-stream "customers" \
  --tgt-conn DUCKDB_TARGET \
  --tgt-object "main.customers"

Using Custom SQL Queries

You can use custom SQL queries to transform or filter data during replication:

# Replicate with a custom SQL query
sling run \
  --src-conn SQLSERVER_SOURCE \
  --src-stream "SELECT id, name, email, created_at FROM customers WHERE created_at > '2024-01-01'" \
  --tgt-conn DUCKDB_TARGET \
  --tgt-object "main.recent_customers" \
  --tgt-options '{ "column_casing": "snake", "table_keys": { "unique": ["email"] } }'

For more information about CLI flags and options, visit the CLI Flags Overview.

Advanced Replication with YAML

While CLI flags are great for simple operations, YAML configurations provide more flexibility and reusability for complex replication scenarios. Let’s explore how to use YAML configurations for data migration.

Basic YAML Configuration

Here’s a basic example that replicates multiple tables:

# sqlserver_to_duckdb.yaml
source: SQLSERVER_SOURCE
target: DUCKDB_TARGET

defaults:
  mode: full-refresh
  target_options:
    add_new_columns: true
    column_casing: snake

streams:
  # Replicate customers table
  dbo.customers:
    object: main.customers
    primary_key: [customer_id]
    target_options:
      table_keys:
        unique: [email]
    
  # Replicate orders table
  dbo.orders:
    object: main.orders
    mode: incremental
    primary_key: [order_id]

Complex Multi-Stream Example

Here’s a more complex example that showcases advanced features:

# complex_sqlserver_to_duckdb.yaml
source: SQLSERVER_SOURCE
target: DUCKDB_TARGET

defaults:
  mode: incremental
  target_options:
    add_new_columns: true
    column_casing: snake

streams:
  dbo.customers:
    object: main.customers
    primary_key: [customer_id]
    update_key: last_modified_date
    select: [ -sensitive_data ]  # Exclude this column
    target_options:
      add_new_columns: true
      column_casing: snake
      table_keys:
        unique: [email]
    
  dbo.orders:
    object: main.orders
    primary_key: [order_id]
    update_key: order_date
    sql: |
      SELECT 
        o.*,
        c.customer_name,
        c.customer_email
      FROM dbo.orders o
      JOIN dbo.customers c ON o.customer_id = c.customer_id
      WHERE o.order_date >= '2024-01-01'
    
  dbo.order_items:
    object: main.order_items
    primary_key: [order_id, item_id]
    update_key: last_modified_date

To run these replications:

# Run the basic replication
sling run -r sqlserver_to_duckdb.yaml

# Run the complex replication
sling run -r complex_sqlserver_to_duckdb.yaml

For more information about configuration options, visit:

The Sling Platform

While the CLI is powerful for developers and automation scenarios, the Sling Platform provides a user-friendly interface for managing data migrations visually. Let’s explore how to use the platform for SQL Server to DuckDB migrations.

Visual Replication Editor

The Sling Platform features an intuitive replication editor that makes it easy to configure and manage your data migrations:

Sling Platform Editor

Key features of the editor include:

  • Visual stream configuration
  • Syntax highlighting for SQL queries
  • Real-time validation
  • Connection management
  • Version control integration

Monitoring and Execution

The platform provides comprehensive monitoring and execution capabilities:

Sling Platform Execution

Benefits of using the platform include:

  • Real-time progress monitoring
  • Detailed execution logs
  • Performance metrics
  • Error tracking and alerts
  • Historical execution data

Platform Components

The Sling Platform consists of several key components:

  1. Connection Manager

    • Secure credential storage
    • Connection testing and validation
    • Discovery of available tables and schemas
  2. Replication Designer

    • Visual stream configuration
    • SQL query builder
    • Schema mapping
    • Options configuration
  3. Execution Engine

    • Parallel processing
    • Automatic retries
    • Resource management
    • Error handling
  4. Monitoring Dashboard

    • Real-time metrics
    • Historical trends
    • Alert configuration
    • Log aggregation

For more information about the Sling Platform, visit:

Getting Started and Next Steps

Now that we’ve covered both CLI and Platform approaches to migrating data from SQL Server to DuckDB, here are some recommended next steps:

  1. Start Small

    • Begin with a single table migration
    • Test different replication modes
    • Validate data accuracy
  2. Optimize Performance

    • Adjust batch sizes
    • Configure table keys
    • Fine-tune SQL queries
  3. Automate and Monitor

    • Set up scheduled replications
    • Configure alerts
    • Monitor performance metrics
  4. Explore Advanced Features

    • Custom transformations
    • Complex mappings
    • Multi-stream configurations

Additional Resources

To learn more about Sling and its capabilities:

  1. Documentation

  2. Examples

  3. Community and Support