Moving CSV Data from AWS S3 to PostgreSQL with Sling

Slinger avatar
Slinger
Cover for Moving CSV Data from AWS S3 to PostgreSQL with Sling

Introduction

Moving data from cloud storage to databases efficiently is a critical requirement in modern data architectures. When dealing with CSV files stored in Amazon S3 and PostgreSQL databases, organizations often face challenges that require significant development effort.

Traditional approaches like custom scripts, AWS Data Pipeline, or ETL tools often involve:

  • Complex error handling and maintenance
  • High development and licensing costs
  • Limited flexibility and scalability
  • Manual implementation of monitoring

Sling provides a streamlined solution to these challenges, offering built-in support for S3 and PostgreSQL connections, automatic CSV parsing, efficient bulk loading, and comprehensive monitoring.

Setting Up Connections

Before we can start moving data, we need to configure our source (S3) and target (PostgreSQL) connections.

S3 Connection Setup

You can set up an S3 connection using environment variables or the CLI:

# Set AWS credentials
export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
export AWS_REGION="us-west-2"  # optional

# Create S3 connection using CLI
sling conns set S3 type=s3 bucket="my-data-bucket"

PostgreSQL Connection Setup

For PostgreSQL, you can use environment variables or the CLI:

# Set PostgreSQL connection using environment variable
export POSTGRES_DB='postgresql://user:password@host:5432/database'

# Or use the CLI
sling conns set POSTGRES url='postgresql://user:password@host:5432/database'

Verifying Connections

After setting up your connections, verify them:

# List all configured connections
sling conns list

# Test S3 connection
sling conns test S3

# Test PostgreSQL connection
sling conns test POSTGRES

Basic Data Transfer with CLI Flags

The quickest way to start transferring data from S3 to PostgreSQL is using Sling’s CLI flags.

Simple Transfer Example

# Transfer a single CSV file to PostgreSQL
sling run \
  --src-conn S3 \
  --src-stream "data/users.csv" \
  --tgt-conn POSTGRES \
  --tgt-object "public.users"

Advanced CLI Options

# Transfer with custom options and transformations
sling run \
  --src-conn S3 \
  --src-stream "data/*.csv" \
  --src-options '{ 
    "delimited": ",",
    "empty_as_null": true,
    "datetime_format": "YYYY-MM-DD HH:mm:ss"
  }' \
  --tgt-conn POSTGRES \
  --tgt-object "public.{stream_file_name}" \
  --tgt-options '{
    "column_casing": "snake",
    "add_new_columns": true,
    "table_keys": { "primary": ["id"] }
  }' \
  --mode incremental \
  --update-key "updated_at"

Advanced Configuration with YAML

For more complex data transfer scenarios, YAML configurations provide better maintainability and reusability.

Basic YAML Example

Create a file named s3_to_postgres.yaml:

source: S3
target: POSTGRES

defaults:
  mode: full-refresh
  target_options:
    add_new_columns: true
    column_casing: snake

streams:
  # Daily sales data
  data/sales/daily/*.csv:
    object: analytics.daily_sales
    source_options:
      delimited: ","
    primary_key: [id]
    
  # Customer data
  data/customers/current.csv:
    object: analytics.customers
    source_options:
      delimited: ","
      empty_as_null: true
    primary_key: [customer_id]

Complex YAML Example

Here’s a more advanced configuration with multiple streams and transformations:

source: S3
target: POSTGRES

defaults:
  mode: incremental
  source_options:
    delimited: ","
    empty_as_null: true
    datetime_format: "YYYY-MM-DD HH:mm:ss"
  target_options:
    add_new_columns: true
    column_casing: snake
    table_keys:
      primary: [id]
      unique: [email]

streams:
  # Customer data with specific columns
  data/customers/{YYYY}_{MM}_{DD}.csv:
    object: analytics.customers
    select:
      - customer_id
      - name
      - email
      - status
      - created_at
      - updated_at
    update_key: updated_at

  # Orders with custom SQL and options
  data/orders/{part_year}/{part_month}.csv:
    object: analytics.orders
    primary_key: [order_id]
    update_key: created_at

Run the replication using:

# Execute the replication
sling run -r s3_to_postgres.yaml

The Sling Platform

While the CLI is powerful for local development and simple workflows, the Sling Platform provides a comprehensive web-based interface for managing your data operations at scale. Let’s explore the platform’s capabilities.

Platform Overview

Sling Platform UI

The Sling Platform consists of several key components that work together to provide a complete data movement solution:

Web Interface

  • User-friendly interface for managing connections
  • Visual replication builder
  • Real-time monitoring dashboard
  • Team collaboration features

Agents

  • Distributed workers that execute data operations
  • Run in your infrastructure
  • Secure and scalable deployment
  • Automatic updates and maintenance

Scheduling

  • Built-in job scheduling
  • Dependency management
  • Retry policies
  • Notification system

Monitoring

  • Real-time execution tracking
  • Performance metrics
  • Error reporting
  • Audit logging

Getting Started with the Platform

To start using the Sling Platform:

  1. Sign up at platform.slingdata.io
  2. Install and configure a Sling agent in your infrastructure
  3. Set up your connections through the web interface
  4. Create and schedule your replications

For more information about the platform, visit:

Conclusion

Moving data from S3 to PostgreSQL doesn’t have to be complicated. Sling provides a powerful yet simple solution that can handle everything from basic file transfers to complex data pipelines. Whether you’re using the CLI for local development or the Platform for enterprise-scale operations, Sling offers the flexibility and features you need.

Next Steps

To take your data integration to the next level:

  1. Install Sling and set up your connections
  2. Start with simple transfers using CLI flags
  3. Progress to replication YAML for complex scenarios
  4. Consider the Platform for enterprise needs
  5. Join the Sling community for support and updates

For more examples and detailed documentation, visit: