Introduction
In today’s cloud-native world, managing data across different environments is a common challenge. Cloudflare D1, a serverless SQL database built on SQLite and running on Cloudflare’s network, is gaining popularity for its seamless integration with Cloudflare Workers. However, there are times when you need to export data from D1 to your local environment for analysis, backup, or development purposes. This is where Sling comes in, offering a streamlined way to move data from D1 into local files in various formats (CSV, JSON, or Parquet).
The Challenge of D1 Data Export
Traditional approaches to exporting data from D1 to local files often involve:
- Writing custom scripts using programming languages
- Managing database connections and authentication
- Handling data type conversions for different file formats
- Implementing proper error handling
- Setting up pagination for large datasets
- Ensuring data consistency during export
- Managing file format-specific requirements
How Sling Simplifies the Process
Sling provides a modern approach to database exports with:
- Simple command-line interface for quick exports
- Support for multiple file formats (CSV, JSON, Parquet)
- YAML-based configuration for complex operations
- Built-in connection management
- Automatic data type handling
- Support for incremental updates
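To give a sense of what this looks like in practice, a full table export boils down to a single command (the same workflow is covered step by step below):
# One-off export of a D1 table to a local CSV file
sling run --src-conn d1_source --src-stream "users" --tgt-object "file://exports/users.csv"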
Let’s dive into how you can use Sling to simplify your D1 to local file export workflows.
Getting Started with Sling
Before we dive into exporting data from D1 to various file formats, let’s set up Sling on your system. Sling offers multiple installation methods to suit different operating systems and preferences.
Installation
Choose the installation method that matches your operating system:
# Install using Homebrew (macOS)
brew install slingdata-io/sling/sling
# Install using curl (Linux)
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
&& tar xf sling_linux_amd64.tar.gz \
&& rm -f sling_linux_amd64.tar.gz \
&& chmod +x sling
# Install using Scoop (Windows)
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling
# Install using Python pip
pip install sling
After installation, verify that Sling is properly installed by checking its version:
# Check Sling version
sling --version
For more detailed installation instructions, visit the Sling CLI Getting Started Guide.
Initial Setup
Sling uses a configuration directory to store connection details and other settings. The configuration directory is typically located at:
- Linux/macOS: ~/.sling/
- Windows: C:\Users\<username>\.sling\
The first time you run Sling, it will automatically create this directory and a default configuration file. You can also specify a custom location using the SLING_HOME_DIR environment variable.
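For example, to keep Sling’s settings in a project-specific location, set the variable before running any commands (the path below is only illustrative):
# Point Sling at a custom configuration directory (example path)
export SLING_HOME_DIR=/opt/sling-config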
Understanding Sling’s Architecture
Sling’s architecture is designed for efficient data movement:
Connection Management
- Secure credential storage
- Multiple connection methods support
- Connection pooling and reuse
Data Processing
- Streaming data transfer
- Automatic type conversion
- Configurable batch sizes
Monitoring and Control
- Real-time progress tracking
- Detailed logging
- Error handling and recovery
For more detailed configuration options and best practices, refer to the Sling CLI Environment documentation.
Setting Up Connections
Before we can start exporting data, we need to configure our source (D1) and target (local storage) connections. Let’s look at what you’ll need for each system and how to set them up.
D1 Connection Setup
To connect to Cloudflare D1, you’ll need:
- Cloudflare Account ID
- API Token with appropriate permissions
- D1 Database name
You can find these details in your Cloudflare dashboard under the Workers & Pages section.
Using Environment Variables
The simplest way to set up your D1 connection is through environment variables:
# Set D1 connection using environment variable
export D1_SOURCE='d1://account_id:api_token@database_name'
Using the Sling CLI
For a more maintainable approach, use the sling conns set command:
# Set up D1 connection with individual parameters
sling conns set d1_source type=d1 account_id=your_account_id api_token=your_api_token database=your_database_name
# Or use a connection URL
sling conns set d1_source url="d1://account_id:api_token@database_name"
Using the Sling Environment File
For a permanent configuration, add the connection details to your ~/.sling/env.yaml file:
connections:
  d1_source:
    type: d1
    account_id: your_account_id
    api_token: your_api_token
    database: your_database_name
Local Storage Connection
For local storage (CSV, JSON, or Parquet files), Sling automatically configures a default connection named LOCAL. You don’t need any additional configuration for accessing local files.
Understanding Local Storage in Sling
When working with local files in Sling:
- Use the file:// prefix to specify file paths (see the path examples below)
- Paths can be relative or absolute
- CSV-specific options can be configured in the replication settings
- File permissions are based on the user running Sling
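For instance, both of these target paths are valid with sling run; the absolute form follows the usual file URI convention, and the location shown is just an example:
# Relative path, resolved from the directory where sling is invoked
--tgt-object "file://exports/users.csv"
# Absolute path (example location; note the three slashes)
--tgt-object "file:///home/analyst/exports/users.csv"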
Testing Connections
After setting up your connections, it’s important to verify they work correctly:
# List all configured connections
sling conns list
# Test D1 connection
sling conns test d1_source
# Test local connection
sling conns test local
# Discover available tables in D1
sling conns discover d1_source
For more details about connection configuration and options, refer to the Sling connections documentation.
Basic Data Export with CLI
Once your connections are set up, you can start exporting data from D1 to various file formats using Sling’s CLI flags. Let’s explore different approaches, from simple exports to more advanced configurations.
Simple Table Export
The most basic way to export data is using the sling run command with source and target specifications. Here are examples for different file formats:
# Export to CSV
sling run \
--src-conn d1_source \
--src-stream "users" \
--tgt-object "file://exports/users.csv"
# Export to JSON
sling run \
--src-conn d1_source \
--src-stream "users" \
--tgt-object "file://exports/users.json"
# Export to Parquet
sling run \
--src-conn d1_source \
--src-stream "users" \
--tgt-object "file://exports/users.parquet"
These commands will:
- Connect to your D1 database
- Read all data from the “users” table
- Create the exports directory if it doesn’t exist
- Save the data in the specified format
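After the run finishes, it’s worth a quick spot-check of the output before building anything on top of it:
# Peek at the first few rows of the exported CSV
head -n 5 exports/users.csv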
Using Custom SQL Queries
For more control over the exported data, you can use custom SQL queries with format-specific options:
# Export to CSV with custom options
sling run \
--src-conn d1_source \
--src-stream "SELECT id, name, email, created_at FROM users WHERE status = 'active'" \
--tgt-object "file://exports/active_users.csv" \
--src-options '{ "datetime_format": "YYYY-MM-DD HH:mm:ss" }' \
--tgt-options '{ "delimiter": "|" }'
# Export to JSON with custom options
sling run \
--src-conn d1_source \
--src-stream "SELECT * FROM users WHERE created_at > '2024-01-01'" \
--tgt-object "file://exports/recent_users.json"
# Export to Parquet with custom options
sling run \
--src-conn d1_source \
--src-stream "SELECT * FROM orders WHERE status = 'completed'" \
--tgt-object "file://exports/completed_orders.parquet" \
--tgt-options '{ "compression": "snappy", "row_group_size": 100000 }'
Advanced CLI Options
For more complex scenarios, you can use additional flags to customize the export:
# Advanced CSV export
sling run \
--src-conn d1_source \
--src-stream "orders" \
--tgt-object "file://exports/orders_{run_timestamp}.csv" \
--select "id,customer_id,total_amount,status,created_at" \
--mode incremental \
--update-key "created_at" \
--src-options '{ "batch_size": 1000, "datetime_format": "YYYY-MM-DD HH:mm:ss" }' \
--tgt-options '{ "file_max_bytes": 104857600, "column_casing": "snake", "delimiter": "|" }'
# Advanced JSON export
sling run \
--src-conn d1_source \
--src-stream "products" \
--tgt-object "file://exports/products_{run_timestamp}.json" \
--select "id,name,category,price,inventory" \
--mode incremental \
--update-key "updated_at" \
--src-options '{ "batch_size": 1000 }' \
--tgt-options '{ "file_max_bytes": 104857600 }'
# Advanced Parquet export
sling run \
--src-conn d1_source \
--src-stream "transactions" \
--tgt-object "file://exports/transactions_{run_timestamp}.parquet" \
--select "id,user_id,amount,type,status,created_at" \
--mode incremental \
--update-key "created_at" \
--src-options '{ "batch_size": 5000 }' \
--tgt-options '{ "compression": "snappy", "row_group_size": 100000 }'
These advanced examples include:
- Dynamic file naming with timestamps
- Column selection
- Incremental mode with update key
- Batch size configuration
- Date format specification
- File size limits
- Format-specific options:
  - CSV: delimiter, column casing
  - JSON: array format, pretty printing
  - Parquet: compression, row group size
For more information about CLI flags and options, see the Sling CLI documentation.
Using YAML Configuration
While CLI flags are great for quick exports, YAML configurations provide a more maintainable and repeatable way to define your data exports. Let’s explore how to use YAML configurations with different file formats.
Basic YAML Configuration
Create a file named d1_to_files.yaml with configurations for different formats:
# Basic configuration for exporting to multiple formats
source: d1_source
target: local

streams:
  # CSV export
  users_csv:
    object: file://exports/users.csv
    source_options:
      datetime_format: YYYY-MM-DD HH:mm:ss
      empty_as_null: true
    target_options:
      column_casing: snake
      delimiter: "|"

  # JSON export
  users_json:
    object: file://exports/users.json
    source_options:
      empty_as_null: true

  # Parquet export
  users_parquet:
    object: file://exports/users.parquet
    source_options:
      batch_size: 5000
    target_options:
      compression: snappy
      row_group_size: 100000
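With the file saved, a single command executes every stream it defines; the same -r flag is used for the other configurations in this post:
# Run all streams defined in the replication file
sling run -r d1_to_files.yaml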
Advanced YAML Configuration
For more complex scenarios, you can create a more detailed configuration:
# Advanced configuration with multiple streams and formats
source: d1_source
target: local

streams:
  # CSV export with specific columns and filtering
  users_csv:
    object: file://exports/users_{run_timestamp}.csv
    sql: |
      select
        id,
        first_name,
        last_name,
        email,
        created_at,
        updated_at
      from my_users
      where deleted_at IS NULL
    source_options:
      datetime_format: YYYY-MM-DD HH:mm:ss
      batch_size: 1000
    target_options:
      file_max_bytes: 104857600
      column_casing: snake
      delimiter: "|"

  # JSON export with incremental updates
  orders_json:
    object: file://exports/orders_{run_timestamp}.json
    mode: incremental
    update_key: updated_at
    source_options:
      batch_size: 500
      empty_as_null: true
    target_options:
      file_max_bytes: 104857600

  # Parquet export with custom options
  transactions_parquet:
    object: file://exports/transactions_{run_timestamp}.parquet
    mode: incremental
    update_key: created_at
    source_options:
      batch_size: 5000
      datetime_format: YYYY-MM-DD HH:mm:ss
    target_options:
      compression: snappy
      row_group_size: 100000
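The advanced configuration is run the same way. If a stream misbehaves, adding Sling's -d (debug) flag turns on verbose logging, which is usually enough to pinpoint the issue:
# Execute the advanced replication with debug output
sling run -r d1_to_files.yaml -d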
Using Runtime Variables
Sling supports runtime variables in YAML configurations for dynamic behavior:
# Configuration with runtime variables
source: d1_source
target: local

env:
  start_date: ${START_DATE}
  end_date: ${END_DATE}
  format: ${FORMAT}

streams:
  # Export all tables in a schema
  my_schema.*:
    object: file://exports/{stream_table}/{format}/{run_timestamp}.
    where: "created_at >= '{start_date}' AND created_at < '{end_date}'"
    target_options:
      format: '{format}'
      datetime_format: YYYY-MM-DD HH:mm:ss
      # CSV-specific options
      delimiter: ","
      # Parquet-specific options
      compression: snappy
      row_group_size: 100000
Run this configuration with runtime variables:
# Set environment variables
export START_DATE=2024-01-01
export END_DATE=2024-02-01
# Run with runtime variables for CSV
FORMAT=csv sling run -r d1_to_files.yaml
# Run with runtime variables for JSON
FORMAT=json sling run -r d1_to_files.yaml
# Run with runtime variables for Parquet
FORMAT=parquet sling run -r d1_to_files.yaml
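Because this layout writes each table into its own per-format subdirectory, a recursive listing is a quick way to confirm what each run produced:
# List every file generated under the exports directory
find exports/ -type f | sort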
Next Steps
Now that you have your D1 to local file export workflow set up with Sling, here are some ways to take your implementation further.
Additional Resources
For deeper coverage, explore the Sling documentation, its examples and tutorials, and the connection type reference.
Getting Help
Join the Sling community to get help and share experiences:
- GitHub Discussions
- Discord Community
- Contact the Sling support team by email