Export PostgreSQL Data to BigQuery Using Sling

Slinger avatar
Slinger
Cover for Export PostgreSQL Data to BigQuery Using Sling

Export PostgreSQL Data to BigQuery Using Sling

Moving data from PostgreSQL to Google BigQuery is a common requirement in modern data architectures. Whether you’re consolidating data warehouses, setting up analytics pipelines, or migrating databases, having a reliable and efficient tool is crucial. Sling provides a streamlined solution for this exact scenario.

Sling is a modern data movement and transformation platform that makes it simple to transfer data between different systems. With its powerful CLI tool and intuitive platform interface, you can set up PostgreSQL to BigQuery replications in minutes, not hours.

Key features that make Sling perfect for Postgres to BigQuery transfers:

  • Simple CLI and UI interfaces
  • Support for various replication modes
  • Automatic schema mapping
  • Built-in data type conversions
  • Progress monitoring and logging
  • Scalable performance

Installation

Getting started with Sling is straightforward. You can install the CLI tool using various package managers:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by running:

# Check Sling version
sling --version

The CLI tool provides all the functionality needed for database migrations, while the Sling Platform offers a web interface for team collaboration and monitoring. You can sign up for the platform at slingdata.io.

Connection Setup

Before we can start transferring data, we need to set up connections to both PostgreSQL and BigQuery. Let’s configure these connections step by step.

PostgreSQL Connection

You can set up a PostgreSQL connection using either the CLI or the platform interface. Here’s how to do it using the CLI:

# Set up PostgreSQL connection using connection string
sling conns set postgres_source url="postgresql://user:pass@host:5432/dbname?sslmode=require"

# Or set up using individual parameters
sling conns set postgres_source type=postgres host=host.example.com user=myuser password=mypass database=mydb port=5432

BigQuery Connection

For BigQuery, you’ll need a Google Cloud service account with appropriate permissions. Here’s how to set up the connection:

# Set up BigQuery connection with service account key file
sling conns set bigquery_target type=bigquery project=my-project dataset=my_dataset gc_bucket=my-bucket key_file=/path/to/service-account.json

# Or set up using service account JSON content in environment variable
export GC_KEY_BODY='{"type": "service_account", ...}'
sling conns set bigquery_target type=bigquery project=my-project dataset=my_dataset gc_bucket=my-bucket

Testing Connections

Once you’ve set up both connections, it’s important to test them to ensure they’re working correctly:

# Test PostgreSQL connection
sling conns test postgres_source

# Test BigQuery connection
sling conns test bigquery_target

Here’s how connections appear in the Sling Platform interface:

Connections in Sling Platform

Data Replication Examples

Let’s look at different ways to replicate data from PostgreSQL to BigQuery using Sling. We’ll cover both CLI commands and YAML configurations.

Basic CLI Commands

The simplest way to start a replication is using the CLI command:

# Replicate a single table with full refresh
sling run --src-conn postgres_source --src-stream "public.users" --tgt-conn bigquery_target --tgt-object "users" --mode full-refresh

# Replicate with specific columns
sling run --src-conn postgres_source --src-stream "public.users" --tgt-conn bigquery_target --tgt-object "users" --select "id,name,email,created_at"

YAML Configuration Examples

For more complex replications, YAML configurations provide better maintainability and version control. Here are some examples:

# using wildcards and stream variables
source: postgres_source
target: bigquery_target
defaults:
  mode: full-refresh

streams:
  'public.*':
    object: 'public.{stream_table}'
# using defaults and specific streams
source: postgres_source
target: bigquery_target

defaults:
  mode: full-refresh
  object: "public.{stream_table}"

streams: 
  # use defaults for all streams
  public.users:
  public.orders:
    primary_key: [order_id]
  public.products:
# incremental
source: postgres_source
target: bigquery_target

streams:
  public.orders:
    update_key: updated_at
    primary_key: [order_id]

To run these YAML configurations:

# Run the replication
sling run -r postgres-bigquery.yaml

Replication Modes

Sling supports different replication modes to suit your needs:

  1. Full Refresh Mode:
mode: full-refresh

This mode completely replaces the target table with source data.

  1. Incremental Mode:
mode: incremental
primary_key: [id]
update_key: updated_at

This mode only updates changed records based on a primary key.

  1. Snapshot Mode:
mode: snapshot

This mode adds new records without modifying existing ones. See the Replication docs for more details.

Conclusion

Sling provides a powerful and flexible solution for moving data from PostgreSQL to BigQuery. Its combination of simple CLI commands and comprehensive YAML configurations makes it suitable for both quick migrations and complex data pipelines.

Key takeaways from this guide:

  • Sling offers both CLI and Platform interfaces for managing data transfers
  • Multiple replication modes support different use cases
  • YAML configurations provide version-controlled, repeatable replications
  • Built-in features like data type conversion and transformation simplify the process

To learn more about Sling’s capabilities:

Start using Sling today to streamline your PostgreSQL to BigQuery data movement workflows.