Exporting from Snowflake to Snowflake with Sling

Slinger avatar
Slinger
Cover for Exporting from Snowflake to Snowflake with Sling

Introduction

Moving data between Snowflake accounts or databases is a common requirement in modern data architectures. Whether you’re migrating data between environments, sharing data with partners, or setting up data replication pipelines, the process traditionally involves complex ETL workflows, custom scripts, and significant engineering effort.

Traditional approaches to Snowflake-to-Snowflake data transfer often require:

  • Writing custom Python scripts using Snowflake Connector
  • Managing authentication and permissions across accounts
  • Implementing error handling and retry mechanisms
  • Setting up staging areas for data transfer
  • Monitoring and maintaining the data pipeline
  • Dealing with schema changes and data type mappings

According to various online discussions, organizations often struggle with these challenges, leading to brittle solutions that require constant maintenance.

Enter Sling: a modern data movement tool that simplifies this entire process. With its intuitive CLI and powerful platform features, Sling transforms what would typically be a complex undertaking into a straightforward configuration-based approach. In this comprehensive guide, we’ll explore how to use Sling to efficiently move data between Snowflake accounts, covering both simple and advanced scenarios.

Understanding Snowflake-to-Snowflake Data Transfer

Moving data between Snowflake accounts traditionally involves several complex steps and considerations. Let’s examine why this process can be challenging and how Sling addresses these challenges.

Traditional Methods and Their Complexities

The conventional approach to Snowflake data transfer often involves:

  1. Manual Export Process

    • Writing SQL queries to extract data
    • Managing large result sets
    • Handling data type conversions
    • Setting up intermediate storage
  2. Custom Scripts Development

    • Creating Python/Java scripts using Snowflake connectors
    • Implementing error handling and retries
    • Managing dependencies and versions
    • Maintaining documentation
  3. Resource Management

    • Configuring warehouse sizes
    • Optimizing query performance
    • Managing costs across accounts
    • Monitoring system resources
  4. Security Considerations

    • Managing credentials securely
    • Setting up appropriate roles and permissions
    • Ensuring data encryption in transit
    • Complying with data governance policies

How Sling Simplifies the Process

Sling addresses these challenges by providing:

  • Unified Configuration: Simple YAML-based setup for both source and target
  • Automated Handling: Built-in support for data types, schema changes, and error recovery
  • Efficient Transfer: Direct data movement without intermediate storage
  • Security Integration: Secure credential management and role-based access
  • Monitoring: Real-time progress tracking and error reporting
  • Scalability: Automatic handling of large datasets and parallel processing

Getting Started with Sling

Before we dive into data transfer configurations, let’s get Sling installed and set up on your system. Sling provides multiple installation methods to suit different operating systems and preferences.

Installation

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed:

# Check Sling version
sling --version

For more detailed installation instructions, visit the Sling CLI Getting Started Guide.

Setting Up Snowflake Connections

Before we can start transferring data, we need to configure connections to both our source and target Snowflake accounts. Sling provides multiple ways to manage these connections securely.

Connection Setup Methods

Using Environment Variables

The simplest way to set up Snowflake connections is through environment variables:

# Source Snowflake connection
export SNOWFLAKE_SOURCE='snowflake://user:[email protected]/database/schema?warehouse=compute_wh'

# Target Snowflake connection
export SNOWFLAKE_TARGET='snowflake://user:[email protected]/database/schema?warehouse=compute_wh'

Using the Sling CLI

For a more secure and maintainable approach, use Sling’s connection management commands:

# Set up source Snowflake connection
sling conns set snowflake_source type=snowflake \
  account=myaccount.region \
  user=myuser \
  password=mypassword \
  database=mydatabase \
  schema=myschema \
  warehouse=compute_wh

# Set up target Snowflake connection
sling conns set snowflake_target type=snowflake \
  account=myaccount.region \
  user=myuser \
  password=mypassword \
  database=mydatabase \
  schema=myschema \
  warehouse=compute_wh

Using YAML Configuration

For a more permanent setup, you can define your connections in the ~/.sling/env.yaml file:

# Environment configuration file
connections:
  snowflake_source:
    type: snowflake
    account: myaccount.region
    user: myuser
    password: mypassword
    database: mydatabase
    schema: myschema
    warehouse: compute_wh
    role: myrole  # optional

  snowflake_target:
    type: snowflake
    account: myaccount.region
    user: myuser
    password: mypassword
    database: mydatabase
    schema: myschema
    warehouse: compute_wh
    role: myrole  # optional

For more details on connection configuration options, visit the Snowflake Connection Documentation.

Verifying Connections

After setting up your connections, it’s important to verify that they work correctly:

# List all configured connections
sling conns list

# Test source connection
sling conns test snowflake_source

# Test target connection
sling conns test snowflake_target

Data Transfer Using CLI Flags

The quickest way to start transferring data between Snowflake accounts is using Sling’s CLI flags. This approach is perfect for simple transfers or when you’re just getting started.

Basic Transfer Example

Here’s a simple example that transfers a single table from one Snowflake account to another:

# Transfer a single table using CLI flags
sling run \
  --src-conn snowflake_source \
  --tgt-conn snowflake_target \
  --src-stream "sales.orders" \
  --tgt-object "sales.orders_copy" \
  --tgt-options '{ "add_new_columns": true }'

Advanced Transfer Example

For more complex scenarios, you can use additional CLI flags to customize the transfer:

# Transfer multiple tables with transformations
sling run \
  --src-conn snowflake_source \
  --tgt-conn snowflake_target \
  --src-stream "select o.*, c.customer_name from sales.orders o join sales.customers c on o.customer_id = c.customer_id" \
  --tgt-object "analytics.enriched_orders" \
  --select: 'order_id,order_date,customer_id,customer_name,total_amount' \
  --tgt-options '{
    "add_new_columns": true,
    "column_casing": "snake"
  }'

For a complete overview of available CLI flags, visit the CLI Flags Documentation.

Creating Replication YAML Configurations

While CLI flags are great for simple transfers, YAML configurations provide a more maintainable and powerful way to define your data transfer pipelines. Let’s look at some examples of using YAML configurations for Snowflake-to-Snowflake transfers.

Basic YAML Configuration

Here’s a simple example that transfers multiple tables:

# snowflake_to_snowflake_basic.yaml
source: snowflake_source
target: snowflake_target

# Global options applied to all streams
defaults:
  primary_key: [id]  # Default primary key for all tables

  target_options:
    column_casing: snake  # Convert all column names to snake_case
    add_new_columns: true  # Add new columns if they appear in source

streams:
  sales.orders:
    description: "Orders table replication"
    mode: full-refresh
    primary_key: [order_id]
    select:
      - order_id
      - order_date
      - customer_id
      - total_amount
      - status
    transforms:
      "*": [replace_non_printable]  # Apply to all columns

  sales.customers:
    description: "Customers table replication"
    mode: incremental
    primary_key: [customer_id]
    update_key: last_modified
    select:
      - customer_id
      - customer_name
      - email
      - created_at
      - last_modified
    where: customer_id > 999
    transforms:
      email: [trim_space]
      "*": [replace_non_printable]

To run this replication:

# Run the replication using the YAML config
sling run -r snowflake_to_snowflake_basic.yaml

Advanced YAML Configuration

For more complex scenarios, here’s an example that includes transformations, custom SQL, and advanced options:

# snowflake_to_snowflake_advanced.yaml
source: snowflake_source
target: snowflake_target

# Environment variables for runtime
env:
  DAYS_TO_SYNC: "7"
  TARGET_SCHEMA: "analytics"

# Global options applied to all streams
defaults:
  mode: incremental
  primary_key: [id]
  update_key: modified_at

# Global source options
source_options:
  datetime_format: "YYYY-MM-DD HH:MI:SS"

# Global target options
target_options:
  add_new_columns: true
  column_casing: snake
  table_ddl:
    created_at: timestamp
    modified_at: timestamp
    is_active: boolean

# Replication level hooks
hooks:
  start:
    - type: query
      conn: snowflake_target
      sql: "create schema if not exists {TARGET_SCHEMA}"
  
  end:
    - type: http
      url: http://my.hook.com/on_finish

streams:
  # Orders with custom SQL and runtime variables
  sales.orders:
    sql: |
      select 
        o.order_id,
        o.order_date,
        o.total_amount,
        c.customer_id,
        c.customer_name,
        c.email,
        current_timestamp() as sync_timestamp
      from sales.orders o
      join sales.customers c on o.customer_id = c.customer_id
      where o.modified_at >= current_date - {DAYS_TO_SYNC}
    primary_key: [order_id]
    object: "{TARGET_SCHEMA}.enriched_orders"
    
    hooks:
      pre:
        - type: log
          message: "Starting orders sync"
      post:
        - type: query
          conn: snowflake_target
          sql: "grant select on {env.TARGET_SCHEMA}.enriched_orders to role analyst"

  # Customer metrics with aggregations
  sales.customer_metrics:
    sql: |
      select
        customer_id,
        count(distinct order_id) as total_orders,
        sum(total_amount) as lifetime_value,
        min(order_date) as first_order_date,
        max(order_date) as last_order_date,
        current_timestamp() as calculated_at
      from sales.orders
      group by customer_id
    primary_key: [customer_id]
    object: "{TARGET_SCHEMA}.customer_metrics"
    mode: full-refresh

To run this replication:

# Run the advanced replication
sling run -r snowflake_to_snowflake_advanced.yaml

For more details on replication configuration options and best practices, visit:

Using Sling Platform

While the CLI is powerful for development and automation, Sling Platform provides a user-friendly web interface for managing and monitoring your data transfers. Let’s explore how to use the Platform for Snowflake-to-Snowflake transfers.

Platform Components

The Sling Platform consists of several key components:

  1. Web Interface: A modern UI for managing connections, creating replications, and monitoring transfers
  2. Agents: Lightweight workers that execute data transfers in your infrastructure
  3. API: RESTful API for programmatic access and integration
  4. Monitoring: Real-time dashboards and alerts for transfer status

Setting Up Sling Platform

To get started with Sling Platform:

  1. Sign up at platform.slingdata.io
  2. Create a new project
  3. Install and configure a Sling Agent in your environment
  4. Set up your Snowflake connections through the UI

Creating Transfers in the Platform

The Platform provides an intuitive interface for:

  1. Connection Management: Securely store and manage your Snowflake credentials
  2. Replication Design: Visual editor for creating and testing replications
  3. Scheduling: Set up recurring transfers with flexible schedules
  4. Monitoring: Track progress and troubleshoot issues
  5. Team Collaboration: Share configurations and access controls

For detailed instructions on using the Platform, visit the Sling Platform Documentation.

Getting Started

Now that we’ve covered both CLI and Platform approaches, here’s how to get started with your Snowflake-to-Snowflake data transfer:

  1. Choose Your Approach:

    • Use CLI for development and simple transfers
    • Use Platform for team collaboration and ongoing operations
  2. Set Up Connections:

    • Configure source and target Snowflake credentials
    • Test connections to ensure proper access
  3. Create Your First Transfer:

    • Start with a simple table-to-table transfer
    • Test and validate the results
    • Gradually add more complex configurations
  4. Monitor and Optimize:

    • Track transfer performance
    • Adjust warehouse sizes if needed
    • Set up alerts for any issues

For more examples and detailed documentation, visit:

Conclusion

Moving data between Snowflake accounts doesn’t have to be complex. With Sling, you can:

  • Eliminate the need for custom scripts and complex ETL processes
  • Automate data transfers with simple configurations
  • Ensure reliable and efficient data movement
  • Monitor and manage transfers through CLI or Platform

Whether you’re performing one-time migrations or setting up ongoing replications, Sling provides the tools and flexibility you need to streamline your Snowflake data operations.

Get started today by installing Sling or signing up for Sling Platform.