Introduction
Setting up a data pipeline from SQL Server to BigQuery traditionally involves numerous complex steps and considerations. Organizations often struggle with infrastructure setup, authentication management, schema compatibility, and maintaining optimal performance. This complexity can lead to increased development time, higher costs, and potential reliability issues.
Sling simplifies this entire process by providing a streamlined, zero-infrastructure approach to data replication. With Sling, you can:
- Configure connections with simple environment variables or CLI commands
- Automatically handle schema mapping and data type conversions
- Optimize performance with built-in batch processing and parallel execution
- Monitor and manage replications through both CLI and web interface
In this guide, we’ll walk through the process of setting up a SQL Server to BigQuery replication using Sling, demonstrating how to overcome common challenges and implement an efficient data pipeline in minutes rather than days or weeks.
Installation
Getting started with Sling is straightforward. You can install it using various package managers depending on your operating system:
# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling
# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' && tar xf sling_linux_amd64.tar.gz && rm -f sling_linux_amd64.tar.gz && chmod +x sling
# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling
# Install using Python pip
pip install sling
After installation, verify that Sling is properly installed by running:
# Check sling version
sling --version
For more detailed installation instructions and options, visit the installation guide.
Setting Up Connections
Before we can start replicating data, we need to set up our source SQL Server and target BigQuery connections. Sling provides multiple ways to manage connections, including environment variables and the sling conns
command.
SQL Server Connection
For SQL Server, you’ll need the following information:
- Server host and port
- Database name
- Username and password
- Additional connection parameters (if needed)
You can set up the SQL Server connection using environment variables:
# Set SQL Server connection using environment variables
export SQLSERVER="sqlserver://username:password@host:port/database?options"
Alternatively, use the sling conns set
command:
# Set SQL Server connection using CLI
sling conns set sqlserver url="sqlserver://username:password@host:port/database?options"
BigQuery Connection
For BigQuery, you’ll need:
- Google Cloud project ID
- Service account credentials
- Dataset information
Set up the BigQuery connection:
# Set BigQuery connection using CLI
sling conns set BIGQUERY type=bigquery project=<project> dataset=<dataset> gc_bucket=<gc_bucket> key_file=/path/to/service.account.json location=<location>
Testing Connections
After setting up your connections, it’s important to verify they work correctly:
# Test SQL Server connection
sling conns test sqlserver
# Test BigQuery connection
sling conns test bigquery
You can also list available streams (tables) in your SQL Server database:
# List available tables in SQL Server
sling conns discover sqlserver
For more details about connection configuration, visit the environment documentation.
Creating the Replication Configuration
Sling uses YAML files to define replication configurations. Let’s create a configuration file called sqlserver_to_bigquery.yaml
that will handle our data replication:
# Define source and target connections
source: sqlserver
target: bigquery
# Default settings for all streams
defaults:
target_options:
# Automatically add new columns if they appear in source
add_new_columns: true
# Automatically convert casing to snake
column_casing: snake
mode: full-refresh
# Define the streams to replicate
streams:
# Use wildcard to replicate all tables in the 'sales' schema
'sales.*':
object: 'mydataset.{stream_table}'
# Incremental replication with primary key
"sales.orders":
object: "mydataset.orders"
mode: incremental
primary_key: ["order_id"]
update_key: "last_modified_date"
# Replication with custom SQL query
"custom_orders":
object: "mydataset.filtered_orders"
mode: full-refresh
sql: |
SELECT
o.order_id,
o.order_date,
c.customer_name,
p.product_name,
o.quantity,
o.total_amount
FROM sales.orders o
JOIN sales.customers c ON o.customer_id = c.customer_id
JOIN sales.products p ON o.product_id = p.product_id
WHERE o.order_date >= '2023-01-01'
Running the Replication
Once you have your configuration file ready, you can start the replication using the Sling CLI:
# Run the replication
sling run -r sqlserver_to_bigquery.yaml
You can also run specific streams:
# Run only the customers table replication
sling run -r sqlserver_to_bigquery.yaml --stream sales.customers
Advanced Configuration Options
Sling provides various options to customize your replication process. Here are some common configurations:
streams:
"sales.transactions":
object: "mydataset.transactions"
mode: incremental
primary_key: ["transaction_id"]
update_key: "transaction_date"
# Source-specific options
source_options:
# Limit the number of rows per batch
batch_limit: 10000
# Target-specific options
target_options:
# Execute SQL before replication starts
pre_sql: "TRUNCATE TABLE mydataset.transactions_staging"
# Execute SQL after replication completes
post_sql: |
MERGE mydataset.transactions_main t
USING mydataset.transactions s
ON t.transaction_id = s.transaction_id
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...
For more detailed information about replication configuration options, visit:
Understanding Sling Platform Components
While the CLI provides powerful command-line capabilities, Sling also offers a comprehensive web-based platform that simplifies data replication management through a visual interface.
Visual Configuration Editor
The Sling Platform includes a sophisticated configuration editor that makes it easy to create and modify replication configurations through a user-friendly interface.
The editor provides:
- Syntax highlighting for YAML configurations
- Auto-completion for connection names and options
- Real-time validation of your configuration
- Easy access to documentation and examples
Connection Management
The platform provides a centralized interface for managing all your connections, making it easy to organize and maintain your data sources and destinations.
Key features include:
- Secure credential management
- Connection testing and validation
- Stream discovery and exploration
- Role-based access control
Execution Monitoring
Monitor your replications in real-time with detailed execution statistics and logs.
The monitoring interface provides:
- Real-time progress tracking
- Detailed performance metrics
- Error logging and troubleshooting
- Historical execution records
Platform Benefits
The Sling Platform offers several advantages over CLI-only usage:
- Team collaboration and sharing
- Centralized configuration management
- Scheduled executions
- Monitoring and alerting
- Audit logging
For more information about the Sling Platform, visit the platform documentation.
Next Steps and Resources
Now that you have a solid understanding of how to use Sling for SQL Server to BigQuery replication, here are some additional resources to help you get the most out of the platform:
Additional Examples
Database Connection Guides
Advanced Topics
Sling supports many other databases and storage systems. Check out our complete list of supported connections to explore more possibilities for your data movement needs.