Export Data from Snowflake to Local Files with Sling

Introduction

In today’s data-driven world, Snowflake has emerged as a powerful cloud data warehouse solution, offering scalability, performance, and flexibility for organizations of all sizes. However, there are many scenarios where you need to export data from Snowflake to your local environment - whether for analysis, backup, development, or integration with other tools. While Snowflake provides native options for data export, setting up an efficient and automated pipeline can be complex and time-consuming.

The Challenge of Snowflake Data Export

Traditional approaches to exporting data from Snowflake to local files often involve:

  • Writing custom scripts using SnowSQL or programming languages
  • Managing database connections and authentication
  • Handling data type conversions for different file formats
  • Implementing proper error handling and retry mechanisms
  • Setting up pagination for large datasets
  • Ensuring data consistency during export
  • Managing file format-specific requirements

This complexity is further compounded when dealing with multiple tables, incremental updates, or different file formats. Developers frequently end up with brittle, hand-rolled scripts that break on edge cases and require constant maintenance.

How Sling Simplifies the Process

Sling provides a modern, streamlined approach to database exports with:

  • Simple command-line interface for quick exports
  • Support for multiple file formats (CSV, JSON, Parquet)
  • YAML-based configuration for complex operations
  • Built-in connection management
  • Automatic data type handling
  • Support for incremental updates
  • Efficient handling of large datasets

Let’s dive into how you can use Sling to simplify your Snowflake to local file export workflows.

Getting Started with Sling

Before we dive into exporting data from Snowflake to various file formats, let’s set up Sling on your system. Sling offers multiple installation methods to suit different operating systems and preferences.

Installation

Choose the installation method that matches your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by checking its version:

# Check Sling version
sling --version

For more detailed installation instructions, visit the Sling CLI Getting Started Guide.

Initial Setup

Sling uses a configuration directory to store connection details and other settings. The configuration directory is typically located at:

  • Linux/macOS: ~/.sling/
  • Windows: C:\Users\<username>\.sling\

The first time you run Sling, it will automatically create this directory and a default configuration file. You can also specify a custom location using the SLING_HOME_DIR environment variable.
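For example, to keep Sling's configuration in a custom directory (the path below is purely illustrative):

# Point Sling at a custom configuration directory
export SLING_HOME_DIR=/opt/sling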

Understanding Sling’s Architecture

Sling’s architecture is designed for efficient data movement:

  1. Connection Management

    • Secure credential storage
    • Multiple connection methods support
    • Connection pooling and reuse
  2. Data Processing

    • Streaming data transfer
    • Automatic type conversion
    • Configurable batch sizes
  3. Monitoring and Control

    • Real-time progress tracking
    • Detailed logging
    • Error handling and recovery

For more detailed configuration options and best practices, refer to the Sling CLI Environment documentation.

Setting Up Connections

Before we can start exporting data, we need to configure our source (Snowflake) and target (local storage) connections. Let’s look at what you’ll need for each system and how to set them up.

Snowflake Connection Setup

To connect to Snowflake, you’ll need:

  • Account identifier (e.g., xy12345.us-east-1)
  • Username and password
  • Warehouse name
  • Database name
  • Schema name (optional, defaults to PUBLIC)

You can find these details in your Snowflake account settings or by asking your database administrator.

Using Environment Variables

The simplest way to set up your Snowflake connection is through environment variables:

# Set Snowflake connection using environment variable
export SNOWFLAKE='snowflake://username:password@account/database/schema?warehouse=compute_wh'
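Once the variable is set, its name acts as the connection name, so you can reference it directly in Sling commands (assuming the variable is visible to the Sling process):

# Test the connection defined by the environment variable
sling conns test SNOWFLAKE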

Using the Sling CLI

For a more maintainable approach, use the sling conns set command:

# Set up Snowflake connection with individual parameters
sling conns set snowflake_source type=snowflake account=xy12345.us-east-1 user=myuser password=mypassword database=mydb warehouse=compute_wh

# Or use a connection URL
sling conns set snowflake_source url="snowflake://username:password@account/database/schema?warehouse=compute_wh"

Using the Sling Environment File

For a permanent configuration, add the connection details to your ~/.sling/env.yaml file:

connections:
  snowflake_source:
    type: snowflake
    account: xy12345.us-east-1
    user: myuser
    password: mypassword
    database: mydb
    warehouse: compute_wh
    schema: public  # optional

Local Storage Connection

For local storage (CSV, JSON, or Parquet files), Sling automatically configures a default connection named LOCAL. You don’t need any additional configuration for accessing local files.

Understanding Local Storage in Sling

When working with local files in Sling:

  • Use the file:// prefix to specify file paths
  • Paths can be relative or absolute
  • Format-specific options can be configured in the replication settings
  • File permissions are based on the user running Sling
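For instance, the same export can target a relative or an absolute path (a sketch; the absolute form assumes a Unix-style home directory):

# Relative path, resolved from the directory where Sling runs
sling run --src-conn snowflake_source --src-stream "public.users" --tgt-object "file://exports/users.csv"

# Absolute path
sling run --src-conn snowflake_source --src-stream "public.users" --tgt-object "file:///home/myuser/exports/users.csv"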

Testing Connections

After setting up your connections, it’s important to verify they work correctly:

# List all configured connections
sling conns list

# Test Snowflake connection
sling conns test snowflake_source

# Test local connection
sling conns test local

# Discover available tables in Snowflake
sling conns discover snowflake_source

For more details about connection configuration and options, refer to the Sling connections documentation.

Basic Data Export with CLI

Once your connections are set up, you can start exporting data from Snowflake to various file formats using Sling’s CLI flags. Let’s explore different approaches, from simple exports to more advanced configurations.

Simple Table Export

The most basic way to export data is using the sling run command with source and target specifications. Here are examples for different file formats:

# Export to CSV
sling run \
  --src-conn snowflake_source \
  --src-stream "public.users" \
  --tgt-object "file://exports/users.csv"

# Export to JSON
sling run \
  --src-conn snowflake_source \
  --src-stream "public.users" \
  --tgt-object "file://exports/users.json"

# Export to Parquet
sling run \
  --src-conn snowflake_source \
  --src-stream "public.users" \
  --tgt-object "file://exports/users.parquet"

These commands will:

  1. Connect to your Snowflake database
  2. Read all data from the “users” table
  3. Create the exports directory if it doesn’t exist
  4. Save the data in the specified format

Using Custom SQL Queries

For more control over the exported data, you can use custom SQL queries with format-specific options:

# Export to CSV with custom options
sling run \
  --src-conn snowflake_source \
  --src-stream "SELECT id, name, email, created_at FROM users WHERE status = 'active'" \
  --tgt-object "file://exports/active_users.csv" \
  --tgt-options '{ "delimiter": "|", "datetime_format": "YYYY-MM-DDTHH:mm:ss" }'

# Export to JSON with custom options and filtering
sling run \
  --src-conn snowflake_source \
  --src-stream "SELECT * FROM orders WHERE created_at > DATEADD(day, -7, CURRENT_DATE())" \
  --tgt-object "file://exports/recent_orders.json" \
  --tgt-options '{ "datetime_format": "YYYY-MM-DDTHH:mm:ss" }'

# Export to Parquet with advanced options
sling run \
  --src-conn snowflake_source \
  --src-stream "SELECT o.*, c.name as customer_name FROM orders o JOIN customers c ON o.customer_id = c.id" \
  --tgt-object "file://exports/orders_with_customers.parquet" \
  --tgt-options '{ "compression": "snappy" }'

Understanding CLI Options

Let’s break down the key CLI options available:

Source Options

  • --src-conn: The name of your Snowflake connection
  • --src-stream: The table name or SQL query to execute
  • --src-options: JSON-formatted options for the source connection

Target Options

  • --tgt-object: The destination file path with format extension
  • --tgt-options: JSON-formatted options for the target file

Common target options include:

  • For CSV:

    • delimiter: Field separator (default: ",")
    • datetime_format: Format for datetime values
    • null_value: String to use for NULL values
  • For JSON:

    • datetime_format: Format for datetime values
    • pretty: Boolean to enable pretty printing
  • For Parquet:

    • compression: Compression codec (snappy, gzip, etc.)
    • row_group_size: Number of rows per group
    • file_max_bytes: Maximum file size before splitting
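Several of these options can be combined in a single --tgt-options JSON object. For example, a sketch reusing the CSV options listed above:

# Export to CSV with several target options combined
sling run \
  --src-conn snowflake_source \
  --src-stream "public.users" \
  --tgt-object "file://exports/users.csv" \
  --tgt-options '{ "delimiter": "|", "null_value": "NULL", "datetime_format": "YYYY-MM-DDTHH:mm:ss" }'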

For a complete list of available options and their descriptions, refer to the Sling documentation on source and target options.

Advanced Data Export with Replication YAML

While CLI flags are great for simple transfers, YAML configuration files provide more flexibility and reusability for complex data transfer scenarios. Let’s explore how to use YAML configurations with Sling.

Basic Multi-Stream Example

Create a file named snowflake_to_local.yaml with the following content:

# Basic configuration for exporting multiple tables
source: snowflake_source
target: local

defaults:
  mode: full-refresh
  target_options:
    datetime_format: "YYYY-MM-DDTHH:mm:ss"

streams:
  # Export users table with specific columns
  users:
    sql: "SELECT id, name, email, created_at FROM users"
    object: "file://exports/users.csv"
    target_options:
      delimiter: "|"

  # Export orders with custom filtering
  orders:
    sql: "SELECT * FROM orders WHERE status = 'completed'"
    object: "file://exports/orders.json"

  # Export customer data with advanced options
  public.customers:
    object: "file://exports/customers.parquet"
    target_options:
      compression: "snappy"

To run this replication:

# Execute the replication configuration
sling run -r snowflake_to_local.yaml

Advanced Configuration Example

Here’s a more complex example that demonstrates advanced features:

# Advanced configuration with runtime variables and transformations
source: snowflake_source
target: local

env:
  start_date: ${START_DATE}
  end_date: ${END_DATE}

defaults:
  mode: full-refresh
  source_options:
    datetime_format: "YYYY-MM-DDTHH:mm:ss"
  target_options:
    file_max_bytes: 100000000

streams:
  # Export sales data with runtime variables
  sales:
    sql: >
      SELECT s.*, p.name as product_name, c.name as customer_name
      FROM sales s
      JOIN products p ON s.product_id = p.id
      JOIN customers c ON s.customer_id = c.id
      WHERE s.created_at >= '{start_date}'
    object: "file://exports/sales/{YYYY}_{MM}_{DD}.parquet"
    target_options:
      compression: "snappy"

  # Export inventory with column selection and transformation
  public.inventory:
    select:
      - product_id
      - quantity
      - last_updated
      - warehouse_id
    object: "file://exports/inventory/{stream_table}.csv"
    target_options:
      delimiter: ","

  # Export customer analytics with complex query
  customer_analytics:
    sql: >
      WITH monthly_stats AS (
        SELECT 
          customer_id,
          DATE_TRUNC('month', order_date) as month,
          COUNT(*) as order_count,
          SUM(total_amount) as total_spent
        FROM orders
        GROUP BY 1, 2
      )
      SELECT 
        c.*,
        ms.month,
        ms.order_count,
        ms.total_spent
      FROM customers c
      LEFT JOIN monthly_stats ms ON c.id = ms.customer_id
    object: "file://exports/analytics/customer_stats_{date_yyyy_mm}.json"
    target_options:
      datetime_format: "YYYY-MM-DDTHH:mm:ss"
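To supply the runtime variables, set them in the environment when launching the replication (the dates are illustrative, and the configuration above is assumed to be saved as snowflake_to_local_advanced.yaml):

# Pass runtime variables into the replication
START_DATE=2024-01-01 END_DATE=2024-01-31 sling run -r snowflake_to_local_advanced.yaml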

This advanced configuration demonstrates:

  • Using runtime variables ({YYYY}, {MM}, {DD}, {date_yyyy_mm}, {stream_table})
  • Complex SQL queries with joins and CTEs
  • Column selection and filtering
  • Dynamic file naming
  • Format-specific options for each stream

Understanding Replication YAML Components

Source Configuration

  • sql: A custom SQL query to run instead of reading the table directly
  • select: A list of columns to include from the source table
  • mode: The load mode for the stream (for example, full-refresh)
  • source_options: Source-specific options such as datetime_format

Target Configuration

  • object: Destination file path with optional variables
  • target_options: Format-specific options for the target file

Runtime Variables

Sling supports various runtime variables that can be used in paths and queries:

  • {YYYY}, {MM}, {DD}: The current year, month, and day
  • {date_yyyy_mm}: The current date formatted as year_month
  • {stream_table}: Name of the current stream/table
  • {start_date}, {end_date}: Custom variables defined in the env section
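For example, a target object path can combine several of these variables (a sketch based on the patterns used above):

object: "file://exports/{stream_table}/{YYYY}_{MM}_{DD}.csv"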

For more details about replication configuration and options, refer to the Sling replication documentation.

Getting Started Checklist

Now that we’ve covered the various aspects of using Sling for Snowflake to local file exports, let’s summarize the key steps to get started:

  1. Install Sling

    • Choose the appropriate installation method for your OS
    • Verify the installation with sling --version
    • Review the installation guide
  2. Configure Connections

    • Set up your Snowflake connection
    • Test the connection with sling conns test
    • Explore available tables with sling conns discover
  3. Start Simple

    • Begin with basic CLI commands
    • Export a single table to CSV
    • Experiment with different file formats
  4. Scale Up

    • Create replication YAML files
    • Use runtime variables
    • Implement incremental updates (see the sketch below)
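As a first step toward incremental exports, a stream can declare an incremental mode and an update key. Below is a minimal sketch, assuming the orders table has an updated_at timestamp column; verify incremental behavior for file targets in the Sling documentation:

# incremental_orders.yaml (hypothetical file name)
source: snowflake_source
target: local

streams:
  public.orders:
    object: "file://exports/orders.csv"
    mode: incremental
    update_key: updated_at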

Additional Resources

For more information and support, see the Sling documentation and the slingdata-io/sling-cli repository on GitHub.

Start building your data pipelines with Sling today and experience the simplicity of modern data movement.