Exporting SQL Server to BigQuery Using Sling

Slinger avatar
Slinger
Cover for Exporting SQL Server to BigQuery Using Sling

Introduction

Setting up a data pipeline from SQL Server to BigQuery traditionally involves numerous complex steps and considerations. Organizations often struggle with infrastructure setup, authentication management, schema compatibility, and maintaining optimal performance. This complexity can lead to increased development time, higher costs, and potential reliability issues.

Sling simplifies this entire process by providing a streamlined, zero-infrastructure approach to data replication. With Sling, you can:

  • Configure connections with simple environment variables or CLI commands
  • Automatically handle schema mapping and data type conversions
  • Optimize performance with built-in batch processing and parallel execution
  • Monitor and manage replications through both CLI and web interface

In this guide, we’ll walk through the process of setting up a SQL Server to BigQuery replication using Sling, demonstrating how to overcome common challenges and implement an efficient data pipeline in minutes rather than days or weeks.

Installation

Getting started with Sling is straightforward. You can install it using various package managers depending on your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz'   && tar xf sling_linux_amd64.tar.gz   && rm -f sling_linux_amd64.tar.gz   && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by running:

# Check sling version
sling --version

For more detailed installation instructions and options, visit the installation guide.

Setting Up Connections

Before we can start replicating data, we need to set up our source SQL Server and target BigQuery connections. Sling provides multiple ways to manage connections, including environment variables and the sling conns command.

SQL Server Connection

For SQL Server, you’ll need the following information:

  • Server host and port
  • Database name
  • Username and password
  • Additional connection parameters (if needed)

You can set up the SQL Server connection using environment variables:

# Set SQL Server connection using environment variables
export SQLSERVER="sqlserver://username:password@host:port/database?options"

Alternatively, use the sling conns set command:

# Set SQL Server connection using CLI
sling conns set sqlserver url="sqlserver://username:password@host:port/database?options"

BigQuery Connection

For BigQuery, you’ll need:

  • Google Cloud project ID
  • Service account credentials
  • Dataset information

Set up the BigQuery connection:

# Set BigQuery connection using CLI
sling conns set BIGQUERY type=bigquery project=<project> dataset=<dataset> gc_bucket=<gc_bucket> key_file=/path/to/service.account.json location=<location>

Testing Connections

After setting up your connections, it’s important to verify they work correctly:

# Test SQL Server connection
sling conns test sqlserver

# Test BigQuery connection
sling conns test bigquery

You can also list available streams (tables) in your SQL Server database:

# List available tables in SQL Server
sling conns discover sqlserver

For more details about connection configuration, visit the environment documentation.

Creating the Replication Configuration

Sling uses YAML files to define replication configurations. Let’s create a configuration file called sqlserver_to_bigquery.yaml that will handle our data replication:

# Define source and target connections
source: sqlserver
target: bigquery

# Default settings for all streams
defaults:
  target_options:
    # Automatically add new columns if they appear in source
    add_new_columns: true
    # Automatically convert casing to snake
    column_casing: snake
  mode: full-refresh

# Define the streams to replicate
streams:
  # Use wildcard to replicate all tables in the 'sales' schema
  'sales.*':
    object: 'mydataset.{stream_table}'

  # Incremental replication with primary key
  "sales.orders":
    object: "mydataset.orders"
    mode: incremental
    primary_key: ["order_id"]
    update_key: "last_modified_date"
    
  # Replication with custom SQL query
  "custom_orders":
    object: "mydataset.filtered_orders"
    mode: full-refresh
    sql: |
      SELECT 
        o.order_id,
        o.order_date,
        c.customer_name,
        p.product_name,
        o.quantity,
        o.total_amount
      FROM sales.orders o
      JOIN sales.customers c ON o.customer_id = c.customer_id
      JOIN sales.products p ON o.product_id = p.product_id
      WHERE o.order_date >= '2023-01-01'

Running the Replication

Once you have your configuration file ready, you can start the replication using the Sling CLI:

# Run the replication
sling run -r sqlserver_to_bigquery.yaml

You can also run specific streams:

# Run only the customers table replication
sling run -r sqlserver_to_bigquery.yaml --stream sales.customers

Advanced Configuration Options

Sling provides various options to customize your replication process. Here are some common configurations:

streams:
  "sales.transactions":
    object: "mydataset.transactions"
    mode: incremental
    primary_key: ["transaction_id"]
    update_key: "transaction_date"
    
    # Source-specific options
    source_options:
      # Limit the number of rows per batch
      batch_limit: 10000
      
    # Target-specific options
    target_options:
      # Execute SQL before replication starts
      pre_sql: "TRUNCATE TABLE mydataset.transactions_staging"
      # Execute SQL after replication completes
      post_sql: |
        MERGE mydataset.transactions_main t
        USING mydataset.transactions s
        ON t.transaction_id = s.transaction_id
        WHEN MATCHED THEN UPDATE SET ...
        WHEN NOT MATCHED THEN INSERT ...

For more detailed information about replication configuration options, visit:

Understanding Sling Platform Components

While the CLI provides powerful command-line capabilities, Sling also offers a comprehensive web-based platform that simplifies data replication management through a visual interface.

Visual Configuration Editor

The Sling Platform includes a sophisticated configuration editor that makes it easy to create and modify replication configurations through a user-friendly interface.

Sling Configuration Editor

The editor provides:

  • Syntax highlighting for YAML configurations
  • Auto-completion for connection names and options
  • Real-time validation of your configuration
  • Easy access to documentation and examples

Connection Management

The platform provides a centralized interface for managing all your connections, making it easy to organize and maintain your data sources and destinations.

Sling Connections Management

Key features include:

  • Secure credential management
  • Connection testing and validation
  • Stream discovery and exploration
  • Role-based access control

Execution Monitoring

Monitor your replications in real-time with detailed execution statistics and logs.

Sling Execution Monitoring

The monitoring interface provides:

  • Real-time progress tracking
  • Detailed performance metrics
  • Error logging and troubleshooting
  • Historical execution records

Platform Benefits

The Sling Platform offers several advantages over CLI-only usage:

  • Team collaboration and sharing
  • Centralized configuration management
  • Scheduled executions
  • Monitoring and alerting
  • Audit logging

For more information about the Sling Platform, visit the platform documentation.

Next Steps and Resources

Now that you have a solid understanding of how to use Sling for SQL Server to BigQuery replication, here are some additional resources to help you get the most out of the platform:

Additional Examples

Database Connection Guides

Advanced Topics

Sling supports many other databases and storage systems. Check out our complete list of supported connections to explore more possibilities for your data movement needs.