Export Data from Cloudflare D1 to Local Files with Sling

Slinger avatar
Slinger
Cover for Export Data from Cloudflare D1 to Local Files with Sling

Introduction

In today’s cloud-native world, managing data across different environments is a common challenge. Cloudflare D1, a serverless SQL database that runs on the edge, is gaining popularity for its seamless integration with Cloudflare Workers. However, there are times when you need to export data from D1 to your local environment for analysis, backup, or development purposes. This is where Sling comes in, offering a streamlined solution for data movement between D1 and local files in various formats (CSV, JSON, or Parquet).

The Challenge of D1 Data Export

Traditional approaches to exporting data from D1 to local files often involve:

  • Writing custom scripts using programming languages
  • Managing database connections and authentication
  • Handling data type conversions for different file formats
  • Implementing proper error handling
  • Setting up pagination for large datasets
  • Ensuring data consistency during export
  • Managing file format-specific requirements

How Sling Simplifies the Process

Sling provides a modern approach to database exports with:

  • Simple command-line interface for quick exports
  • Support for multiple file formats (CSV, JSON, Parquet)
  • YAML-based configuration for complex operations
  • Built-in connection management
  • Automatic data type handling
  • Support for incremental updates

Let’s dive into how you can use Sling to simplify your D1 to local file export workflows.

Getting Started with Sling

Before we dive into exporting data from D1 to various file formats, let’s set up Sling on your system. Sling offers multiple installation methods to suit different operating systems and preferences.

Installation

Choose the installation method that matches your operating system:

# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling

# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Install using Python pip
pip install sling

After installation, verify that Sling is properly installed by checking its version:

# Check Sling version
sling --version

For more detailed installation instructions, visit the Sling CLI Getting Started Guide.

Initial Setup

Sling uses a configuration directory to store connection details and other settings. The configuration directory is typically located at:

  • Linux/macOS: ~/.sling/
  • Windows: C:\Users\<username>\.sling\

The first time you run Sling, it will automatically create this directory and a default configuration file. You can also specify a custom location using the SLING_HOME_DIR environment variable.

Understanding Sling’s Architecture

Sling’s architecture is designed for efficient data movement:

  1. Connection Management

    • Secure credential storage
    • Multiple connection methods support
    • Connection pooling and reuse
  2. Data Processing

    • Streaming data transfer
    • Automatic type conversion
    • Configurable batch sizes
  3. Monitoring and Control

    • Real-time progress tracking
    • Detailed logging
    • Error handling and recovery

For more detailed configuration options and best practices, refer to the Sling CLI Environment documentation.

Setting Up Connections

Before we can start exporting data, we need to configure our source (D1) and target (local storage) connections. Let’s look at what you’ll need for each system and how to set them up.

D1 Connection Setup

To connect to Cloudflare D1, you’ll need:

  • Cloudflare Account ID
  • API Token with appropriate permissions
  • D1 Database name

You can find these details in your Cloudflare dashboard under the Workers & Pages section.

Using Environment Variables

The simplest way to set up your D1 connection is through environment variables:

# Set D1 connection using environment variable
export D1_SOURCE='d1://account_id:api_token@database_name'

Using the Sling CLI

For a more maintainable approach, use the sling conns set command:

# Set up D1 connection with individual parameters
sling conns set d1_source type=d1 account_id=your_account_id api_token=your_api_token database=your_database_name

# Or use a connection URL
sling conns set d1_source url="d1://account_id:api_token@database_name"

Using the Sling Environment File

For a permanent configuration, add the connection details to your ~/.sling/env.yaml file:

connections:
  d1_source:
    type: d1
    account_id: your_account_id
    api_token: your_api_token
    database: your_database_name

Local Storage Connection

For local storage (CSV files), Sling automatically configures a default connection named LOCAL. You don’t need any additional configuration for accessing local files.

Understanding Local Storage in Sling

When working with local files in Sling:

  • Use the file:// prefix to specify file paths
  • Paths can be relative or absolute
  • CSV-specific options can be configured in the replication settings
  • File permissions are based on the user running Sling

Testing Connections

After setting up your connections, it’s important to verify they work correctly:

# List all configured connections
sling conns list

# Test D1 connection
sling conns test d1_source

# Test local connection
sling conns test local

# Discover available tables in D1
sling conns discover d1_source

For more details about connection configuration and options, refer to:

Basic Data Export with CLI

Once your connections are set up, you can start exporting data from D1 to various file formats using Sling’s CLI flags. Let’s explore different approaches, from simple exports to more advanced configurations.

Simple Table Export

The most basic way to export data is using the sling run command with source and target specifications. Here are examples for different file formats:

# Export to CSV
sling run \
  --src-conn d1_source \
  --src-stream "users" \
  --tgt-object "file://exports/users.csv"

# Export to JSON
sling run \
  --src-conn d1_source \
  --src-stream "users" \
  --tgt-object "file://exports/users.json"

# Export to Parquet
sling run \
  --src-conn d1_source \
  --src-stream "users" \
  --tgt-object "file://exports/users.parquet"

These commands will:

  1. Connect to your D1 database
  2. Read all data from the “users” table
  3. Create the exports directory if it doesn’t exist
  4. Save the data in the specified format

Using Custom SQL Queries

For more control over the exported data, you can use custom SQL queries with format-specific options:

# Export to CSV with custom options
sling run \
  --src-conn d1_source \
  --src-stream "SELECT id, name, email, created_at FROM users WHERE status = 'active'" \
  --tgt-object "file://exports/active_users.csv" \
  --src-options '{ "datetime_format": "YYYY-MM-DD HH:mm:ss" }' \
  --tgt-options '{ "delimiter": "|" }'

# Export to JSON with custom options
sling run \
  --src-conn d1_source \
  --src-stream "SELECT * FROM users WHERE created_at > '2024-01-01'" \
  --tgt-object "file://exports/recent_users.json"

# Export to Parquet with custom options
sling run \
  --src-conn d1_source \
  --src-stream "SELECT * FROM orders WHERE status = 'completed'" \
  --tgt-object "file://exports/completed_orders.parquet" \
  --tgt-options '{ "compression": "snappy", "row_group_size": 100000 }'

Advanced CLI Options

For more complex scenarios, you can use additional flags to customize the export:

# Advanced CSV export
sling run \
  --src-conn d1_source \
  --src-stream "orders" \
  --tgt-object "file://exports/orders_{run_timestamp}.csv" \
  --select "id,customer_id,total_amount,status,created_at" \
  --mode incremental \
  --update-key "created_at" \
  --src-options '{ "batch_size": 1000, "datetime_format": "YYYY-MM-DD HH:mm:ss" }' \
  --tgt-options '{ "file_max_bytes": 104857600, "column_casing": "snake", "delimiter": "|" }'

# Advanced JSON export
sling run \
  --src-conn d1_source \
  --src-stream "products" \
  --tgt-object "file://exports/products_{run_timestamp}.json" \
  --select "id,name,category,price,inventory" \
  --mode incremental \
  --update-key "updated_at" \
  --src-options '{ "batch_size": 1000 }' \
  --tgt-options '{ "file_max_bytes": 104857600 }'

# Advanced Parquet export
sling run \
  --src-conn d1_source \
  --src-stream "transactions" \
  --tgt-object "file://exports/transactions_{run_timestamp}.parquet" \
  --select "id,user_id,amount,type,status,created_at" \
  --mode incremental \
  --update-key "created_at" \
  --src-options '{ "batch_size": 5000 }' \
  --tgt-options '{ "compression": "snappy", "row_group_size": 100000 }'

These advanced examples include:

  • Dynamic file naming with timestamps
  • Column selection
  • Incremental mode with update key
  • Batch size configuration
  • Date format specification
  • File size limits
  • Format-specific options:
    • CSV: delimiter, column casing
    • JSON: array format, pretty printing
    • Parquet: compression, row group size

For more information about CLI flags and options, visit:

Using YAML Configuration

While CLI flags are great for quick exports, YAML configurations provide a more maintainable and repeatable way to define your data exports. Let’s explore how to use YAML configurations with different file formats.

Basic YAML Configuration

Create a file named d1_to_files.yaml with configurations for different formats:

# Basic configuration for exporting to multiple formats
source: d1_source
target: local

streams:
  # CSV export
  users_csv:
    object: file://exports/users.csv
    source_options:
      datetime_format: YYYY-MM-DD HH:mm:ss
      empty_as_null: true
    target_options:
      column_casing: snake
      delimiter: "|"

  # JSON export
  users_json:
    object: file://exports/users.json
    source_options:
      empty_as_null: true

  # Parquet export
  users_parquet:
    object: file://exports/users.parquet
    source_options:
      batch_size: 5000
    target_options:
      compression: snappy
      row_group_size: 100000

Advanced YAML Configuration

For more complex scenarios, you can create a more detailed configuration:

# Advanced configuration with multiple streams and formats
source: d1_source
target: local

streams:
  # CSV export with specific columns and filtering
  users_csv:
    object: file://exports/users_{run_timestamp}.csv
    sql: 
      select
        id,
        first_name,
        last_name,
        email,
        created_at,
        updated_at
      from my_users
      where deleted_at IS NULL
    source_options:
      datetime_format: YYYY-MM-DD HH:mm:ss
      batch_size: 1000
    target_options:
      file_max_bytes: 104857600
      column_casing: snake
      delimiter: "|"

  # JSON export with incremental updates
  orders_json:
    object: file://exports/orders_{run_timestamp}.json
    mode: incremental
    update_key: updated_at
    source_options:
      batch_size: 500
      empty_as_null: true
    target_options:
      file_max_bytes: 104857600

  # Parquet export with custom options
  transactions_parquet:
    object: file://exports/transactions_{run_timestamp}.parquet
    mode: incremental
    update_key: created_at
    source_options:
      batch_size: 5000
      datetime_format: YYYY-MM-DD HH:mm:ss
    target_options:
      compression: snappy
      row_group_size: 100000

Using Runtime Variables

Sling supports runtime variables in YAML configurations for dynamic behavior:

# Configuration with runtime variables
source: d1_source
target: local

env:
  start_date: ${START_DATE}
  end_date: ${END_DATE}
  format: ${FORMAT}

streams:
  # Export all tables in a schema
  my_schema.*:
    object: file://exports/{stream_table}/{format}/{run_timestamp}.
    where: "created_at >= '{start_date}' AND created_at < '{end_date}'"
    target_options:
      format: '{format}'
      datetime_format: YYYY-MM-DD HH:mm:ss
      
      # CSV-specific options
      delimiter: ","
      
      # Parquet-specific options
      compression: snappy
      row_group_size: 100000

Run this configuration with runtime variables:

# Set environment variables
export START_DATE=2024-01-01
export END_DATE=2024-02-01

# Run with runtime variables for CSV
FORMAT=csv sling run -r d1_to_files.yaml

# Run with runtime variables for JSON
FORMAT=json sling run -r d1_to_files.yaml

# Run with runtime variables for Parquet
FORMAT=parquet sling run -r d1_to_files.yaml

Next Steps

Now that you have your D1 to local file export workflow set up with Sling, here are some ways to take your implementation further.

Additional Resources

  1. Documentation

  2. Examples and Tutorials

  3. Connection Types

Getting Help

Join the Sling community to get help and share experiences: