Loading Local CSV Files to SQL Server with Sling

Introduction

Loading data from CSV files into SQL Server databases is a common requirement in data engineering and analytics workflows. However, traditional methods often involve multiple steps, complex configurations, and potential compatibility issues. This guide will show you how to use Sling to simplify and automate this process.

Traditional approaches to loading CSV files into SQL Server might involve:

  • Using SQL Server Integration Services (SSIS)
  • Writing custom scripts with Python or other languages
  • Manual import/export wizards
  • Bulk Copy Program (BCP) utility

These methods often require significant setup time and maintenance overhead, and they can be error-prone. Sling offers a more streamlined approach with:

  • Simple command-line interface
  • Automated data type inference
  • Built-in error handling
  • Support for various file formats and encodings
  • Efficient bulk loading capabilities

Let’s explore how to use Sling to build an efficient data pipeline from your local CSV files to SQL Server.

Understanding Sling

Sling is a modern data movement and transformation platform designed to simplify data operations. It provides both a powerful CLI tool and a comprehensive platform for managing data workflows between various sources and destinations.

Key Features

When working with local CSV files and SQL Server, Sling offers several key features:

  • Automated Type Inference: Sling automatically detects column data types from your CSV files
  • Flexible Configuration: Support for various CSV formats, delimiters, and encodings
  • Efficient Loading: Uses bulk loading capabilities for optimal performance
  • Data Transformation: Apply transformations during the loading process
  • Error Handling: Built-in error detection and reporting
  • Progress Monitoring: Real-time progress tracking and logging

Architecture Overview

Sling operates with two main components:

  1. Sling CLI: A command-line tool for local development and automation
  2. Sling Platform: A web-based interface for managing and monitoring data operations

For loading CSV files to SQL Server, you can use either component. The CLI is perfect for local development and automation, while the Platform provides a user-friendly interface for team collaboration and monitoring.

Installation and Setup

Before we can start loading data, we need to install and configure Sling. Let’s go through the process step by step.

Installing Sling CLI

Sling provides multiple installation options depending on your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed:

# Check Sling version
sling --version

For more detailed installation instructions, visit the Getting Started guide.

Environment Setup

Sling uses a configuration file to store connection details. The configuration file is typically located at:

  • Linux/Mac: ~/.sling/env.yaml
  • Windows: C:\Users\<username>\.sling\env.yaml

You can also use environment variables for connection details, which is useful in CI/CD pipelines or when you want to keep credentials separate from configuration files.
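
The file itself is a simple YAML map of connection names to their properties. Here is a minimal skeleton with placeholder values; a complete SQL Server entry is shown in the next section:

# ~/.sling/env.yaml
connections:
  MY_CONNECTION:
    type: <connection type>
    # connection-specific properties go here (host, user, password, etc.)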

Configuring Connections

To transfer data from CSV files to SQL Server, we need to configure a SQL Server connection. The local file connection is built in, so no extra setup is needed on the source side.

Configuring SQL Server Connection

For SQL Server, you’ll need to provide connection details such as host, database, username, and password (see the SQL Server connection guide). Here are different ways to set up the connection:

# Using sling conns set
sling conns set MSSQL type=sqlserver host=<host> user=<user> database=<database> password=<password> port=<port>

# Or using a connection URL
sling conns set MSSQL url="sqlserver://myuser:mypassword@host:1433?database=mydatabase"

Alternatively, you can add the connection details to your env.yaml:

connections:
  MSSQL:
    type: sqlserver
    host: <host>
    user: <user>
    port: <port>
    instance: <instance>
    database: <database>
    schema: <schema>
    password: <password>
    encrypt: 'true'
    trust_server_certificate: 'true'

Or use an environment variable:

export MSSQL='sqlserver://myuser:mypassword@host:1433?database=mydatabase'

Testing Connections

After setting up your connections, it’s good practice to test them:

# Test the local connection (local is built-in)
sling conns test LOCAL

# Test the SQL Server connection
sling conns test MSSQL
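
You can also list every connection Sling currently knows about, which is a quick way to confirm that your env.yaml entries and environment variables are being picked up:

# List all configured connections
sling conns list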

For more details about connection configuration, see the Environment documentation and SQL Server connection guide.

Basic Data Transfer

Once your connections are set up, you can start transferring data from your CSV files to SQL Server. Let’s look at some basic examples using CLI flags.

Simple CSV to SQL Server Transfer

The most basic way to transfer data is using the sling run command with CLI flags:

# Load a single CSV file to a SQL Server table
sling run \
  --src-conn LOCAL \
  --src-stream "file:///path/to/your/data.csv" \
  --tgt-conn MSSQL \
  --tgt-object "dbo.my_table"

# You can also pipe data directly
cat /path/to/your/data.csv | sling run \
  --tgt-conn MSSQL \
  --tgt-object "dbo.my_table"

Configuring Load Options

You can customize the load behavior using additional flags:

# Load with specific options
sling run \
  --src-conn LOCAL \
  --src-stream "file:///path/to/your/data.csv" \
  --src-options '{ delimiter: "|", empty_as_null: true }' \
  --tgt-conn MSSQL \
  --tgt-object "dbo.my_table" \
  --tgt-options '{ column_casing: snake, add_new_columns: true }' \
  --mode full-refresh

Common source options for CSV files include:

  • delimiter: Specify the CSV delimiter (default is comma)
  • empty_as_null: Treat empty strings as NULL values
  • datetime_format: Specify the format for date/time fields
  • header: Whether the file has a header row (default is true)

Common target options for SQL Server include:

  • column_casing: Control the case of column names (snake, lower, upper)
  • add_new_columns: Automatically add new columns if they appear in the source
  • table_keys: Specify primary key columns
  • batch_limit: Control the batch size for loading

For more details about available options, see the Source Options and Target Options documentation.

Advanced Configuration

For more complex data loading scenarios, Sling supports using YAML configuration files. This approach provides better reusability and more advanced features.

Using Replication YAML

Create a file named local_to_sqlserver.yaml with your replication configuration:

source: LOCAL
target: MSSQL

# Default settings for all streams
defaults:
  mode: full-refresh
  source_options:
    empty_as_null: true
    datetime_format: "YYYY-MM-DD HH:mm:ss"
  target_options:
    column_casing: snake
    add_new_columns: true

# Define your data streams
streams:
  # Single file example
  "file:///path/to/customers.csv":
    object: dbo.customers
    primary_key: [customer_id]
    transforms:
      - column: email
        type: lower
      - column: created_at
        type: timestamp

  # Multiple files example using wildcard
  "file:///path/to/orders/*.csv":
    object: dbo.orders
    primary_key: [order_id]
    source_options:
      delimiter: "|"
    target_options:
      table_keys:
        primary: [order_id]
        unique: [order_number]

  # Complex example with runtime variables
  "file:///path/to/transactions/{date}.csv":
    object: "dbo.transactions_{stream_file_name}"
    mode: incremental
    update_key: transaction_date
    columns:
      transaction_id: string
      amount: decimal
      transaction_date: timestamp

Run the replication using:

sling run -r local_to_sqlserver.yaml
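
During development it can be useful to run only one stream from the file, or to turn on debug logging. Assuming your Sling version supports the --streams selector and the -d flag, a sketch:

# Run only the customers stream from the replication file
sling run -r local_to_sqlserver.yaml --streams "file:///path/to/customers.csv"

# Run the whole replication with debug-level logging
sling run -r local_to_sqlserver.yaml -d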

Understanding Runtime Variables

Sling supports various runtime variables that you can use in your configuration:

  • {stream_file_name}: The name of the current file being processed
  • {stream_table}: The name of the current table
  • {date}: Current date in YYYY-MM-DD format
  • {timestamp}: Current timestamp

For more information about runtime variables, see the Runtime Variables documentation.

Advanced Features

The YAML configuration supports several advanced features:

  1. Data Transformations:
transforms:
  email: lower
  time: set_timezone("America/New_York")
  2. Column Mapping and Types:
columns:
  original_column: new_column_name
  amount: decimal(10,2)
  created_at: timestamp
  3. Table Keys:
target_options:
  table_keys:
    primary: [id]
    unique: [email]
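
These pieces can be combined on a single stream. A minimal sketch, assuming a hypothetical file and column names:

streams:
  "file:///path/to/users.csv":   # hypothetical source file
    object: dbo.users
    columns:                     # explicit column types
      user_id: string
      amount: decimal(10,2)
      created_at: timestamp
    transforms:                  # lowercase the email column
      email: lower
    target_options:
      table_keys:
        primary: [user_id]
        unique: [email]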

For more examples and detailed configuration options, visit the Replication documentation.

Sling Platform Features

While the CLI is powerful for local development and automation, the Sling Platform provides a user-friendly interface for managing and monitoring your data operations.

Platform Overview

The Sling Platform offers:

  • Visual interface for creating and managing connections
  • Drag-and-drop replication configuration
  • Real-time monitoring and logging
  • Team collaboration features
  • Scheduling and automation capabilities

Managing Connections

The Platform provides a centralized way to manage your connections:

Sling Platform Connections

You can:

  • Create and edit connections
  • Test connection health
  • Share connections with team members
  • Monitor connection usage

Creating Replications

The Platform’s visual editor makes it easy to create and manage replications:

Sling Platform Editor

Features include:

  • Visual stream configuration
  • Real-time validation
  • Version control
  • Collaboration tools

Monitoring and Scheduling

The Platform provides comprehensive monitoring capabilities:

Sling Platform Jobs

You can:

  • Monitor job progress in real-time
  • View detailed logs and statistics
  • Set up alerts and notifications
  • Schedule recurring jobs

For more information about the Sling Platform, visit the Platform documentation.

Best Practices

When using Sling to load CSV files into SQL Server, consider these best practices:

Performance Optimization

  1. Batch Size: Adjust the batch_limit in target options based on your data volume and system resources
  2. Indexing: Create appropriate indexes on the target table after loading data
  3. File Organization: Use consistent file naming and directory structure for better management
  4. Data Types: Explicitly define column types when possible to avoid type inference overhead
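
As a concrete illustration of points 1 and 4, a hedged sketch of a stream definition (table name, column types, and batch size are illustrative):

streams:
  "file:///path/to/large_file.csv":
    object: dbo.large_table
    # explicit types skip inference on wide or very large files
    columns:
      id: bigint
      amount: decimal(18,2)
      created_at: timestamp
    target_options:
      # tune to your data volume and system resources
      batch_limit: 100000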

Error Handling

  1. Validation: Use source options like empty_as_null and datetime_format to handle common data issues
  2. Logging: Enable detailed logging for troubleshooting
  3. Backup: Keep source files until successful load confirmation
  4. Testing: Test with a sample of your data before running full loads
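
For points 2 and 4, a quick sanity check could look like this sketch; the --limit flag and the test table name are assumptions to adapt to your Sling version and environment:

# Load a small sample with debug logging before running the full load
sling run -d --limit 1000 \
  --src-conn LOCAL \
  --src-stream "file:///path/to/your/data.csv" \
  --tgt-conn MSSQL \
  --tgt-object "dbo.my_table_test"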

Security

  1. Credentials: Use environment variables or the Sling Platform for sensitive information
  2. Access Control: Follow the principle of least privilege for database users
  3. Encryption: Enable encryption for SQL Server connections
  4. Audit: Monitor and log data access patterns
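
For example, credentials can stay out of configuration files entirely by combining the environment-variable approach shown earlier with encryption enabled; the encrypt query parameter here is an assumption based on the YAML properties covered above:

# Keep credentials in the environment and request an encrypted connection
export MSSQL='sqlserver://myuser:mypassword@host:1433?database=mydatabase&encrypt=true'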

Next Steps

To continue your journey with Sling:

  1. Documentation: Review the Getting Started guide, the SQL Server connection guide, and the Source and Target Options references.

  2. Examples: Explore the Replication documentation for more file-to-database configuration examples.

  3. Community and Support: Reach out to the Sling community when you need help or want to share feedback.

  4. Platform: Try the Sling Platform for team collaboration, scheduling, and monitoring.

Start with simple transfers and gradually explore more advanced features as you become comfortable with the tool. Remember that the Sling community is always available to help you succeed in your data integration journey.