Introduction
Moving data between Google BigQuery and PostgreSQL traditionally involves complex ETL processes, custom scripts, and significant engineering effort. Organizations often face challenges such as:
- Setting up and maintaining data extraction processes from BigQuery
- Managing authentication and permissions across platforms
- Handling schema compatibility and data type conversions
- Implementing efficient data loading into PostgreSQL
- Monitoring and maintaining the data pipeline
- Dealing with incremental updates and schema changes
According to industry research, setting up a traditional data pipeline between BigQuery and PostgreSQL can take weeks or even months, requiring specialized knowledge of both platforms and custom code development. Common approaches include:
- Writing custom Python scripts using libraries like pandas and sqlalchemy
- Using ETL tools that require extensive configuration and maintenance
- Implementing Apache Airflow DAGs with custom operators
- Developing and maintaining complex data transformation logic
These approaches often lead to:
- Increased development and maintenance costs
- Complex error handling and retry mechanisms
- Difficulty in handling schema changes
- Performance bottlenecks
- Limited monitoring and observability
Sling simplifies this entire process by providing a streamlined, configuration-based approach that eliminates the need for custom code and complex infrastructure setup. With Sling, you can:
- Configure connections with simple environment variables or CLI commands
- Automatically handle schema mapping and data type conversions
- Optimize performance with built-in batch processing and parallel execution
- Monitor and manage replications through both CLI and web interface
- Implement incremental updates with minimal configuration
In this guide, we’ll walk through the process of setting up a BigQuery to PostgreSQL replication using Sling, demonstrating how to overcome common challenges and implement an efficient data pipeline in minutes rather than days or weeks.
Installation
Getting started with Sling is straightforward. You can install it using various package managers depending on your operating system:
# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling
# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
&& tar xf sling_linux_amd64.tar.gz \
&& rm -f sling_linux_amd64.tar.gz \
&& chmod +x sling
# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling
# Install using Python pip
pip install sling
After installation, verify that Sling is properly installed by running:
# Check sling version
sling --version
For more detailed installation instructions and options, visit the installation guide.
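If you script your setup (for example in CI), a small guard like the following can record whether Sling is available before anything else runs. This is an optional sketch, not part of Sling itself; the sling_status variable name is just illustrative.

```shell
# Optional guard for setup scripts: record whether sling is on PATH,
# so automation can fail fast before attempting a replication.
if command -v sling >/dev/null 2>&1; then
  sling_status="installed ($(sling --version))"
else
  sling_status="missing"
fi
echo "sling: $sling_status"
```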
Setting Up Connections
Before we can start replicating data, we need to configure our source (BigQuery) and target (PostgreSQL) connections. Sling provides multiple ways to manage connections, including environment variables, the sling conns command, and a YAML configuration file.
BigQuery Connection Setup
For BigQuery, you’ll need:
- Google Cloud project ID
- Service account credentials with appropriate permissions
- Dataset information
- Google Cloud Storage bucket (for data transfer)
You can set up the BigQuery connection in several ways:
- Using the sling conns set command
# Set up BigQuery connection using CLI
sling conns set bigquery_source type=bigquery \
project=<project> \
dataset=<dataset> \
gc_bucket=<gc_bucket> \
key_file=/path/to/service.account.json \
location=<location>
- Using Environment Variables
# Set up using service account JSON content
export GC_KEY_BODY='{"type": "service_account", ...}'
export BIGQUERY_SOURCE='{type: bigquery, project: <project>, dataset: <dataset>, gc_bucket: <gc_bucket>}'
- Using Sling Environment YAML
Create or edit ~/.sling/env.yaml:
connections:
bigquery_source:
type: bigquery
project: your-project
dataset: your_dataset
gc_bucket: your-bucket
key_file: /path/to/service.account.json
location: US # optional
PostgreSQL Connection Setup
For PostgreSQL, you’ll need:
- Host address
- Port number (default: 5432)
- Database name
- Username and password
- Schema (optional)
- SSL mode (if required)
Here’s how to set up the PostgreSQL connection:
- Using the sling conns set command
# Set up PostgreSQL connection using CLI
sling conns set postgres_target type=postgres \
host=<host> \
user=<user> \
database=<database> \
password=<password> \
port=<port> \
schema=<schema>
# Or use connection URL format
sling conns set postgres_target url="postgresql://user:password@host:5432/database?sslmode=require"
- Using Environment Variables
# Set up using connection URL format
export POSTGRES_TARGET='postgresql://user:password@host:5432/database?sslmode=require'
- Using Sling Environment YAML
Add to your ~/.sling/env.yaml:
connections:
postgres_target:
type: postgres
host: your-host
user: your-username
password: your-password
database: your-database
port: 5432
schema: public
sslmode: require # optional
Testing Connections
After setting up your connections, it’s important to verify they work correctly:
# Test BigQuery connection
sling conns test bigquery_source
# Test PostgreSQL connection
sling conns test postgres_target
# List available tables in BigQuery
sling conns discover bigquery_source
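In automation, it can help to gate a pipeline on both connection tests passing. Below is a hypothetical pre-flight script built around the sling conns test command shown above; the runner argument is swappable so the demo at the bottom can be dry-run (with `true` standing in for the sling binary) even where the CLI is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: run `sling conns test` against every
# connection this guide uses and report failures.
check_conns() {
  local runner="$1"; shift
  local failed=0
  for conn in "$@"; do
    if "$runner" conns test "$conn" >/dev/null 2>&1; then
      echo "OK: $conn"
    else
      echo "FAILED: $conn"
      failed=1
    fi
  done
  return "$failed"
}

# Real usage: check_conns sling bigquery_source postgres_target
# Dry run, with `true` standing in for the sling binary:
check_conns true bigquery_source postgres_target
```

The function's exit code is nonzero if any connection fails, so it composes cleanly with `set -e` or an explicit `|| exit 1` in deployment scripts.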
You can also manage your connections through the Sling Platform's web interface.
For more details about connection configuration, visit the environment documentation.
Data Replication Methods
Sling provides multiple ways to replicate data from BigQuery to PostgreSQL. Let’s explore both CLI-based and YAML-based approaches, starting from simple configurations to more advanced use cases.
Using CLI Flags
The quickest way to start a replication is using CLI flags. Here are two examples:
Basic CLI Example
This example shows how to replicate a single table with default settings:
# Replicate a single table from BigQuery to PostgreSQL
sling run \
--src-conn bigquery_source \
--src-stream "analytics.daily_sales" \
--tgt-conn postgres_target \
--tgt-object "analytics.daily_sales" \
--tgt-options '{ "column_casing": "snake" }'
Advanced CLI Example
This example demonstrates more advanced options including column selection and incremental updates:
# Replicate with advanced options
sling run \
--src-conn bigquery_source \
--src-stream "analytics.customer_orders" \
--select "order_id, customer_id, order_date, total_amount" \
--tgt-conn postgres_target \
--tgt-object "analytics.customer_orders" \
--mode incremental \
--primary-key order_id \
--update-key order_date \
--tgt-options '{ "column_casing": "snake", "add_new_columns": true, "table_keys": { "unique": ["order_id"] } }'
For more CLI flag options, visit the CLI flags documentation.
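Transient network or quota errors can interrupt a long-running load, so CLI invocations like the ones above are often wrapped in a retry loop. Here is a minimal sketch of such a wrapper; the retry function and backoff values are illustrative, not a Sling feature.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper: re-run a flaky command with exponential
# backoff. Useful around `sling run` when transient errors occur.
retry() {
  local attempts="$1"; shift
  local delay=1
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      echo "Attempt $i of $attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  return 1
}

# Real usage:
# retry 3 sling run -r bigquery_to_postgres.yaml
```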
Using YAML Configuration
For more complex replication scenarios, YAML configuration files provide better maintainability and reusability. Let’s look at two examples:
Basic YAML Example
Create a file named bigquery_to_postgres.yaml
:
# Define source and target connections
source: bigquery_source
target: postgres_target
# Default settings for all streams
defaults:
mode: full-refresh
target_options:
column_casing: snake
add_new_columns: true
# Define streams to replicate
streams:
analytics.daily_sales:
object: analytics.daily_sales
primary_key: [date, product_id]
analytics.customer_orders:
object: analytics.{stream_table}
primary_key: order_id
Run the replication:
# Run the replication using YAML config
sling run -r bigquery_to_postgres.yaml
Advanced YAML Example
Here’s a more complex example that demonstrates various features including runtime variables, custom SQL, and multiple streams:
source: bigquery_source
target: postgres_target
env:
DATE: ${DATE} # from env var
defaults:
mode: incremental
target_options:
column_casing: snake
add_new_columns: true
streams:
# Stream with custom SQL and runtime variables
analytics.orders_{DATE}:
object: analytics.orders
sql: |
SELECT *
FROM analytics.orders
WHERE DATE(created_at) = '{DATE}'
primary_key: order_id
update_key: updated_at
# Stream with column selection and transforms
analytics.customers:
object: analytics.customers
select:
- customer_id
- first_name
- last_name
- email
- -internal_notes # exclude this column
transforms:
email: [lower, trim]
primary_key: customer_id
target_options:
table_keys:
primary: [customer_id]
unique: [email]
# Stream with wildcard pattern
analytics.events_*:
object: analytics.events
mode: full-refresh
primary_key: event_id
Run the replication with runtime variables:
# Run the replication with a specific date
export DATE=2024-02-10
sling run -r bigquery_to_postgres.yaml
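The same DATE variable makes backfills straightforward: loop over a date range and invoke the replication once per day. The helper below is a sketch, not part of Sling; it assumes GNU date, and the DRY_RUN flag (on by default here) only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical backfill helper: re-run the date-parameterized replication
# once per day in a range. Assumes GNU date is available.
set -u

START="2024-02-01"
END="2024-02-03"
DRY_RUN=${DRY_RUN:-1}

current="$START"
while [[ ! "$current" > "$END" ]]; do
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "DATE=$current sling run -r bigquery_to_postgres.yaml"
  else
    DATE="$current" sling run -r bigquery_to_postgres.yaml
  fi
  current=$(date -d "$current + 1 day" +%F)  # advance one day (GNU date)
done
```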
For more details about replication configuration, visit the replication documentation.
Sling Platform UI
While the CLI provides powerful functionality for local development and automation, the Sling Platform offers a comprehensive web interface for managing and monitoring your data replications at scale.
Platform Overview
The Sling Platform provides:
- Visual interface for creating and managing data workflows
- Team collaboration features
- Monitoring and alerting
- Centralized connection management
- Job scheduling and orchestration
- Agent-based architecture for secure execution
Managing Connections
The platform provides an intuitive interface for managing your connections. You can:
- Create and edit connections with a visual form
- Test connections directly from the UI
- View and manage connection permissions
- Share connections with team members
Visual Replication Editor
The platform includes a powerful visual editor for creating and managing replications. Features include:
- Visual stream configuration
- Syntax highlighting for SQL and YAML
- Real-time validation
- Version control integration
Monitoring and Execution
Track your replication jobs with detailed execution information.
The platform provides:
- Real-time execution monitoring
- Detailed logs and error messages
- Performance metrics and statistics
- Historical execution data
For more information about the Sling Platform, visit the platform documentation.
Getting Started
Now that we’ve covered the various aspects of using Sling for BigQuery to PostgreSQL data replication, here’s a quick guide to get you started:
Install Sling
- Choose the appropriate installation method for your system
- Verify the installation with sling --version
Set Up Connections
- Configure BigQuery source connection
- Configure PostgreSQL target connection
- Test both connections using sling conns test
Start Simple
- Begin with a basic CLI command to replicate a single table
- Monitor the replication process
- Verify the data in PostgreSQL
Scale Up
- Create a YAML configuration file for multiple streams
- Add incremental updates and transformations
- Implement runtime variables for flexibility
Consider Platform Features
- Sign up for the Sling Platform for advanced features
- Set up team access and permissions
- Configure monitoring and alerts
Best Practices
Connection Management
- Use environment variables or YAML files for connection configuration
- Keep credentials secure and never commit them to version control
- Use separate connections for development and production
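One common way to apply these practices is to keep connection exports in an untracked env file that is sourced before each run. The file name .sling.env and the placeholder connection URL below are illustrative only.

```shell
# Hypothetical pattern for keeping credentials out of version control:
# store exports in an untracked file and source it before invoking sling.
cat > .sling.env <<'EOF'
export POSTGRES_TARGET='postgresql://user:password@host:5432/database?sslmode=require'
EOF

# Make sure the file is ignored by git (append only if not already listed):
grep -qxF ".sling.env" .gitignore 2>/dev/null || echo ".sling.env" >> .gitignore

# Load the credentials into the current shell before running sling:
source .sling.env
```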
Replication Configuration
- Start with simple configurations and gradually add complexity
- Use YAML files for better maintainability
- Document your configurations with clear descriptions
Performance Optimization
- Use appropriate batch sizes for your data volume
- Implement incremental updates when possible
- Monitor and adjust configurations based on performance metrics
Monitoring and Maintenance
- Regularly check replication logs
- Set up alerts for failed replications
- Keep Sling updated to the latest version
Next Steps
To learn more about Sling's capabilities, explore the official documentation.
For additional examples and community support:
- Join the Sling Discord Community
- Follow Sling on GitHub
- Contact [email protected] for assistance