Exporting Snowflake to BigQuery Using Sling

Slinger

Last updated: May 2026

The Challenge of Snowflake to BigQuery Data Migration

Moving data between cloud data warehouses like Snowflake and BigQuery traditionally involves complex ETL processes, custom scripts, and significant engineering effort. Common challenges include:

  • Setting up and maintaining data extraction processes from Snowflake
  • Managing data type compatibility between platforms
  • Implementing efficient data loading into BigQuery
  • Monitoring and maintaining the data pipeline
  • Handling incremental updates and schema changes

Sling simplifies this entire process by providing a streamlined, configuration-based approach that eliminates the need for custom code and complex infrastructure setup.

Installing Sling

Getting started with Sling is straightforward. You can install the CLI with a platform install script or with pip:

# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash

# Windows
irm https://slingdata.io/install.ps1 | iex

# Python
pip install sling

For more detailed installation instructions, visit the official documentation.

Setting Up Connections

Before we can start replicating data, we need to configure our Snowflake and BigQuery connections. Sling makes this process simple with its connection management system.

First, let’s set up our Snowflake connection:

# Set up Snowflake connection
export SNOWFLAKE_SOURCE="snowflake://${SNOWFLAKE_USER}:${SNOWFLAKE_PASSWORD}@${SNOWFLAKE_ACCOUNT}/${SNOWFLAKE_DATABASE}?warehouse=${SNOWFLAKE_WAREHOUSE}&role=${SNOWFLAKE_ROLE}"

# Test the connection
sling conns test snowflake_source
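
The URL form above is sensitive to characters such as @, : or / in the password, because they are URL delimiters. A minimal sketch of percent-encoding the password first, assuming Sling's URL parsing accepts percent-encoded credentials (this article does not confirm it). The names urlencode and SNOWFLAKE_PASSWORD_ENCODED are made up for the example, and the function assumes ASCII input:

```shell
# Percent-encode a string for use in the user:password part of a URL.
# Pure bash; assumes ASCII input.
urlencode() {
  local input="$1" out="" ch i
  for (( i = 0; i < ${#input}; i++ )); do
    ch="${input:i:1}"
    case "$ch" in
      [a-zA-Z0-9.~_-]) out+="$ch" ;;                    # unreserved: keep as-is
      *) printf -v ch '%%%02X' "'$ch"; out+="$ch" ;;    # everything else: %XX
    esac
  done
  printf '%s' "$out"
}

# Hypothetical usage: encode the password, then build the same URL as above.
SNOWFLAKE_PASSWORD_ENCODED="$(urlencode "$SNOWFLAKE_PASSWORD")"
export SNOWFLAKE_SOURCE="snowflake://${SNOWFLAKE_USER}:${SNOWFLAKE_PASSWORD_ENCODED}@${SNOWFLAKE_ACCOUNT}/${SNOWFLAKE_DATABASE}?warehouse=${SNOWFLAKE_WAREHOUSE}&role=${SNOWFLAKE_ROLE}"
```

If the connection test still fails after encoding, the problem is likely elsewhere (account identifier, role, or warehouse), so check those next.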

Next, let’s configure the BigQuery connection:

# Set up BigQuery connection
sling conns set bigquery_target type=bigquery project=<project> dataset=<dataset> key_file=/path/to/service.account.json

# Test the connection
sling conns test bigquery_target
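
If the connection test fails, a wrong or unreadable key_file path is a common cause. The following is a rough pre-flight check, not part of Sling: check_key_file is a hypothetical helper that only verifies the file is readable and declares itself a service-account key (Google service-account JSON keys carry "type": "service_account").

```shell
# Return 0 if the file looks like a Google service-account key, non-zero otherwise.
check_key_file() {
  local key_file="$1"
  [ -r "$key_file" ] || { echo "cannot read $key_file" >&2; return 1; }
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file" \
    || { echo "$key_file is not a service-account key" >&2; return 1; }
}

# Hypothetical usage:
#   check_key_file /path/to/service.account.json && sling conns test bigquery_target
```

This does not validate the credentials themselves; only the connection test does that.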

Creating a Snowflake to BigQuery Replication

Now that our connections are set up, we can create a replication configuration. Create a file named snowflake_to_bigquery.yaml with the following content:

# Define source and target connections
source: snowflake_source
target: bigquery_target

# Set default options for all streams
defaults:
  mode: full-refresh

# Define the tables to replicate
streams:
  # Replicate a single table
  "SALES.ORDERS":
    object: "sales_dataset.orders"
    primary_key: ["order_id"]
    
  # Replicate multiple tables using wildcards
  "SALES.*":
    object: "sales_dataset.{stream_table}"
    mode: incremental
    update_key: "last_modified_at"
    target_options:
      # BigQuery targets already load in bulk by default; set explicitly here for clarity
      use_bulk: true

For more detailed configuration options, refer to the replication documentation.

Running the Replication

With our configuration in place, we can now run the replication using the Sling CLI:

# Run the replication
sling run -r snowflake_to_bigquery.yaml
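
For scheduled or unattended runs it helps to keep a log per run and propagate the exit status. The wrapper below is a sketch under two assumptions: that sling exits non-zero when a replication fails (the usual CLI convention, not stated in this article), and that SLING_BIN and LOG_DIR are hypothetical knobs introduced only for this example.

```shell
# Run a replication file, keep a timestamped log, and return the CLI's exit status.
run_replication() {
  local config="$1"
  local sling_bin="${SLING_BIN:-sling}"   # hypothetical override for the binary
  local log_dir="${LOG_DIR:-./logs}"      # hypothetical log location
  local log_file status

  mkdir -p "$log_dir" || return 1
  log_file="$log_dir/$(basename "$config" .yaml)-$(date +%Y%m%dT%H%M%S).log"

  # Capture stdout and stderr in the log file.
  "$sling_bin" run -r "$config" >"$log_file" 2>&1
  status=$?

  if [ "$status" -ne 0 ]; then
    echo "replication failed (exit $status), see $log_file" >&2
  fi
  return "$status"
}

# Hypothetical usage:
#   run_replication snowflake_to_bigquery.yaml || exit 1
```

Returning the original status lets a scheduler such as cron or a CI job decide how to alert or retry.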

The Sling Platform

While the CLI provides powerful functionality for data replication, the Sling Platform offers a comprehensive UI-based solution for managing your data pipelines at scale.

The platform provides:

  • Visual replication configuration
  • Real-time monitoring and logging
  • Team collaboration features
  • Scheduled executions
  • Agent management for distributed workloads

Best Practices and Tips

To get the most out of your Snowflake to BigQuery replications:

  1. Use incremental mode for large tables that update frequently
  2. Implement appropriate primary keys for data integrity
  3. Leverage bulk loading for better performance
  4. Monitor replication logs regularly
  5. Use runtime variables for flexible configurations

Frequently Asked Questions

Does Sling pull data from Snowflake using UNLOAD to a stage, or does it stream rows over the wire?

Sling streams rows over the standard Snowflake driver and buffers them in batches before pushing to BigQuery. There’s no Snowflake stage or external table involved, which keeps the setup simple. Because every row passes through the machine running Sling, very large tables benefit from running it on hardware close to the BigQuery region.

How does Sling map Snowflake’s VARIANT and OBJECT columns to BigQuery?

Variant, object, and array columns are serialized to JSON strings during extraction and land in BigQuery as STRING by default. If you want them as JSON in BigQuery, run a post-load SQL step that casts the column with SAFE.PARSE_JSON() into a new column or view.
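
As an illustration of that post-load step, the sketch below writes a view definition to a file. The names are hypothetical: sales_dataset.orders stands for a table Sling loaded, payload for a STRING column that came from a Snowflake VARIANT, and orders_json for the new view.

```shell
# Write the post-load SQL to a file (all table and column names are hypothetical).
cat > orders_json_view.sql <<'SQL'
CREATE OR REPLACE VIEW sales_dataset.orders_json AS
SELECT
  * EXCEPT (payload),
  SAFE.PARSE_JSON(payload) AS payload  -- SAFE. yields NULL instead of an error on malformed JSON
FROM sales_dataset.orders;
SQL

# Run it with the bq CLI once the replication has finished:
#   bq query --use_legacy_sql=false "$(cat orders_json_view.sql)"
```

A view keeps the original STRING column untouched in the base table, so a later full refresh by Sling does not need to know about the JSON column.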

Can I replicate a Snowflake share without copying data into my own database first?

Yes. As long as the Snowflake role on the connection has IMPORTED PRIVILEGES on the share, you can address the shared database and schema directly in your stream names. Sling reads from the share the same way it reads from any other database.

What’s the right approach for handling case-sensitive Snowflake identifiers?

Snowflake stores unquoted identifiers in uppercase. Sling preserves the source casing by default, so streams like SALES.ORDERS keep their uppercase form. Set target_options.column_casing: snake if you want lower_snake_case columns on the BigQuery side, which is the BigQuery convention.
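
A sketch of where that option could sit, applying it to every stream through defaults. The file name snowflake_to_bigquery_snake.yaml is made up for this example; the connection and stream names are the ones used earlier in this article.

```shell
# Write a replication file that lower-snake-cases column names on the target.
cat > snowflake_to_bigquery_snake.yaml <<'YAML'
source: snowflake_source
target: bigquery_target

defaults:
  mode: full-refresh
  target_options:
    # Convert source column names to lower_snake_case in BigQuery
    column_casing: snake

streams:
  "SALES.ORDERS":
    object: "sales_dataset.orders"
YAML

# Then run it as usual:
#   sling run -r snowflake_to_bigquery_snake.yaml
```

Setting the option under defaults avoids repeating it per stream; a stream-level target_options block could still override it.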

Will Sling create the target dataset in BigQuery automatically?

Sling will create the target tables, but the dataset itself must exist before the run starts. This is by design because dataset creation involves location and billing decisions that Sling shouldn’t make for you. Create the dataset once, then point your replications at it.
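
One way to do that one-time step is with Google's bq command-line tool. The helper below is a sketch: ensure_dataset is a hypothetical name, and the project, dataset, and location in the usage line are placeholders to replace with your own.

```shell
# Create the dataset only if it does not exist yet.
# $1: dataset reference in project:dataset form, $2: BigQuery location
ensure_dataset() {
  local dataset_ref="$1" location="$2"
  if bq show --dataset "$dataset_ref" >/dev/null 2>&1; then
    echo "dataset $dataset_ref already exists"
  else
    bq --location="$location" mk --dataset "$dataset_ref"
  fi
}

# Hypothetical usage:
#   ensure_dataset my-project:sales_dataset US
```

Choosing the location explicitly matters because a dataset's location cannot be changed after creation.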

How can I throttle the load on Snowflake during a large initial backfill?

Use source_options.batch_limit to cap rows per batch, and keep the default parallelism so that streams run sequentially. You can also point the replication at a smaller Snowflake warehouse so it auto-suspends quickly if the run pauses.

Does the use_bulk: true option actually change anything for BigQuery targets?

BigQuery loads are already done via the bulk load API by default, so use_bulk: true is effectively a no-op for this target. You can safely omit it in BigQuery replications; it’s only meaningful for targets that have both row-by-row and bulk paths.