Introduction
Loading large tables into SQL Server is one of the places ETL tools quietly fall over. The TDS protocol works, but row-by-row inserts are slow at scale, and bcp requires you to leave Python and write to disk first. ADBC — Arrow Database Connectivity — was designed to fix that: a single columnar protocol with native bulk-load support across drivers.
Sling supports SQL Server through both the standard TDS path and the ADBC path. This article walks through the ADBC path end to end with a working lab:
- A SQL Server 2022 container.
- A CLI pipeline that loads a CSV into SQL Server via ADBC.
- A Python script that generates 100k rows as multiple parquet files, pushes them into a SeaweedFS S3 bucket, lets Sling replicate the whole prefix into SQL Server via ADBC, and reads the result back through stream_arrow() for native Arrow types.
All output below is from an actual run. The full source lives in the Sling CLI repository at tests/pipelines/adbc/sqlserver. The connector docs are at docs.slingdata.io/connections/database-connections/sqlserver#adbc.
Why ADBC for SQL Server
Three concrete reasons to use the ADBC path over the default TDS driver:
- Bulk load is built in. The Microsoft ADBC driver pushes rows as Arrow columnar batches. No bcp shell-out, no temp files.
- Type fidelity from Arrow sources. When the source is already Arrow (DuckDB, Parquet, Polars), ADBC carries types through without a round-trip to SQL strings.
- One driver, one configuration. The same dbc install <name> mechanism installs drivers for SQL Server, Postgres, MySQL, DuckDB, Snowflake, and Trino. Sling auto-detects the installed drivers; you flip a single flag.
The trade-off: the ADBC driver manager is an external dependency you have to install once. After that, it’s transparent.
Setting Up the Lab
Sling needs four things on the host: the binary itself, the dbc driver manager, a SQL Server instance, and (for the Python part) an S3 endpoint.
Install Sling
# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash
# Python (for the Python example below)
pip install 'sling[arrow]'
The [arrow] extra is needed for stream_arrow() in the Python example.
Install the dbc driver manager
The dbc CLI ships ADBC drivers. Install it once, then ask it for the SQL Server driver:
curl -LsSf https://dbc.columnar.tech/install.sh | sh
dbc install mssql
This drops libadbc_driver_mssql.* somewhere Sling can find it (~/.dbc/drivers/). Sling auto-resolves the path at connection time.
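Since Sling resolves the driver path automatically, the only thing worth checking by hand is that the install actually landed. A stdlib-only sketch, assuming the default ~/.dbc/drivers/ location mentioned above (the helper name is ours, not part of dbc or Sling):

```python
from pathlib import Path

def find_mssql_driver(base: Path = Path.home() / ".dbc" / "drivers") -> list[Path]:
    """Return any ADBC SQL Server driver libraries found under `base`."""
    if not base.exists():
        return []
    # dbc names the library libadbc_driver_mssql.so / .dylib / .dll per platform
    return sorted(base.glob("libadbc_driver_mssql.*"))

if __name__ == "__main__":
    found = find_mssql_driver()
    print(found or "no driver found - run `dbc install mssql` first")
```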
Start SQL Server and SeaweedFS
For the lab, one docker-compose.yaml brings up both: SQL Server 2022 on port 51444, and SeaweedFS exposing an S3-compatible API on port 18333.
# docker-compose.yaml
services:
mssql:
image: mcr.microsoft.com/mssql/server:2022-latest
container_name: sling-adbc-mssql
ports:
- "51444:1433"
environment:
ACCEPT_EULA: "Y"
MSSQL_SA_PASSWORD: "AdbcPipeline123!"
MSSQL_PID: "Developer"
healthcheck:
test: ["CMD-SHELL", "/opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P 'AdbcPipeline123!' -C -Q 'select 1'"]
interval: 5s
retries: 30
seaweedfs:
image: chrislusf/seaweedfs:latest
container_name: sling-adbc-seaweedfs
ports:
- "18333:8333"
- "19333:9333"
command: server -s3 -dir=/data
volumes:
- seaweedfs_data:/data
volumes:
seaweedfs_data:
docker compose up -d
docker inspect sling-adbc-mssql --format='{{.State.Health.Status}}'
# wait until → healthy
SeaweedFS’s server -s3 mode runs anonymous-write by default. That’s fine for a local lab. For production, configure credentials with weed shell and the s3.configure command.
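Before pointing Sling at the endpoint, it can be useful to confirm SeaweedFS is actually answering on port 18333. A stdlib-only probe, assuming the lab endpoint from the compose file (any HTTP response, even a 403, counts as alive):

```python
import urllib.error
import urllib.request

def s3_alive(url: str = "http://localhost:18333", timeout: float = 2.0) -> bool:
    """True if anything speaks HTTP at `url` within `timeout` seconds."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # got an HTTP status back - the server is up
    except OSError:
        return False  # connection refused / timed out

if __name__ == "__main__":
    print("seaweedfs up:", s3_alive())
```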
Configuring the Sling Connections
Sling reads connection details from ~/.sling/env.yaml, environment variables, or sling conns set. We need two: one for SQL Server (with use_adbc: true) and one for SeaweedFS (a standard S3 connection pointed at the local endpoint).
sling conns set MSSQL_ADBC type=sqlserver \
host=localhost port=51444 user=sa password='AdbcPipeline123!' \
database=master encrypt=disable use_adbc=true
sling conns set SEAWEED_S3 type=s3 endpoint=http://localhost:18333 \
bucket=pipeline access_key_id=any secret_access_key=any
Or in ~/.sling/env.yaml:
connections:
MSSQL_ADBC:
type: sqlserver
host: localhost
port: 51444
user: sa
password: 'AdbcPipeline123!'
database: master
encrypt: disable
use_adbc: true
SEAWEED_S3:
type: s3
endpoint: http://localhost:18333
bucket: pipeline
access_key_id: any
secret_access_key: any
Test them:
sling conns test MSSQL_ADBC # → INF success!
sling conns test SEAWEED_S3 # → INF success!
Behind the scenes, the use_adbc: true flag on a regular type: sqlserver connection routes both reads and writes through the ADBC driver. Flip it off and the same connection falls back to the standard TDS path; everything else in your replication or pipeline stays the same.
CLI Pipeline: CSV → ADBC
The CLI side of the lab stays simple: write a CSV, load it into dbo.adbc_pipeline via ADBC, read it back, verify the rows. The whole thing runs as a Sling pipeline so the setup, the load, and the assertions live in one file:
steps:
- type: write
to: file:///tmp/adbc_sqlserver_pipeline.csv
content: |
id,name,score
1,alice,98.5
2,bob,82.0
3,charlie,75.25
4,dana,91.75
5,eve,88.0
- type: query
connection: MSSQL_ADBC
query: |
if object_id('dbo.adbc_pipeline', 'U') is not null drop table dbo.adbc_pipeline
- replication:
source: LOCAL
target: MSSQL_ADBC
defaults:
mode: full-refresh
streams:
file:///tmp/adbc_sqlserver_pipeline.csv:
object: dbo.adbc_pipeline
- type: query
connection: MSSQL_ADBC
query: select id, name, score from dbo.adbc_pipeline order by id
into: rows
- type: check
check: length(store.rows) == 5
failure_message: "Expected 5 rows, got {length(store.rows)}"
Two things worth noting:
- The replication: block embedded inside the pipeline is a regular Sling replication. The same YAML body runs unchanged outside the pipeline.
- Sling’s hooks (query, check, log, write) are first-class steps in pipelines, so the lab’s setup, assertions, and cleanup all stay in one file.
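The check step's assertion is easy to mirror offline. A small stdlib sketch that parses the same CSV content the first pipeline step writes and applies the equivalent of length(store.rows) == 5:

```python
import csv
import io

# Same content the pipeline's write step produces.
CSV_CONTENT = """\
id,name,score
1,alice,98.5
2,bob,82.0
3,charlie,75.25
4,dana,91.75
5,eve,88.0
"""

rows = list(csv.DictReader(io.StringIO(CSV_CONTENT)))
assert len(rows) == 5  # same condition as the pipeline's check step
print(rows[0])  # csv values arrive as strings; typing happens on load
```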
Run it:
sling run -p tests/pipelines/adbc/sqlserver/p.41.adbc_sqlserver.yaml
Real output (trimmed):
INF Sling CLI | https://slingdata.io
INF Wrote /tmp/adbc_sqlserver_pipeline.csv
INF Sling Replication | LOCAL -> MSSQL_ADBC | file:///tmp/adbc_sqlserver_pipeline.csv
INF connecting to target database (sqlserver)
INF reading from source file system (file)
INF writing to target database [mode: full-refresh]
INF created table "dbo"."adbc_pipeline"
INF inserted 5 rows into "dbo"."adbc_pipeline" in 0 secs [33 r/s]
INF execution succeeded
INF ADBC write completed
INF Loaded rows: [{"id":1,"name":"alice","score":"98.500000"},{"id":2,"name":"bob","score":"82.000000"},...]
INF ADBC SQL Server pipeline test PASSED
A single replication, an ADBC read, three check assertions, all under a second. The Python script below shows the same flow at scale — 100k rows via S3 — with native Arrow types on the read-back.
Python: 100k Rows via S3 and stream_arrow()
The Python side does the more interesting thing — the shape you’d actually see in production. The script generates 100,000 rows in DuckDB, writes them as four parquet files into a SeaweedFS S3 bucket via PyArrow, and lets Sling replicate the full S3 prefix into SQL Server via ADBC. Then it reads back through stream_arrow() so the rows arrive as native Arrow RecordBatch objects instead of stringified CSV-shaped dicts.
Save it as run_pipeline.py — the inline script header tells uv which dependencies to fetch:
# /// script
# requires-python = ">=3.10"
# dependencies = ["sling[arrow]", "duckdb", "pyarrow", "boto3"]
# ///
import io

import boto3
import duckdb
import pyarrow as pa
from pyarrow import parquet as pq
from sling import Mode, Replication, ReplicationStream, Sling
from sling.hooks import HookQuery

MSSQL_CONN = "MSSQL_ADBC"
S3_CONN = "SEAWEED_S3"
S3_BUCKET = "pipeline"
S3_PREFIX = "adbc-py/"
ROWS_PER_FILE = 25_000
NUM_FILES = 4  # 100k rows total

# 1. Generate 100k rows in DuckDB, split into 4 parquet files via pyarrow,
#    push each to SeaweedFS S3 under the same prefix.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:18333",
    aws_access_key_id="any",
    aws_secret_access_key="any",
    region_name="us-east-1",
)
try:
    s3.create_bucket(Bucket=S3_BUCKET)
except Exception:
    pass  # bucket already exists

# Clear any prior run.
for obj in s3.list_objects_v2(Bucket=S3_BUCKET, Prefix=S3_PREFIX).get("Contents", []):
    s3.delete_object(Bucket=S3_BUCKET, Key=obj["Key"])

print(f"generating {NUM_FILES * ROWS_PER_FILE:,} rows in pyarrow...")
con = duckdb.connect()
for n in range(NUM_FILES):
    start_id = n * ROWS_PER_FILE + 1
    end_id = start_id + ROWS_PER_FILE
    arrow_tbl: pa.Table = con.execute(f"""
        select i as id,
               'user_' || i::varchar as name,
               round(i * 1.5, 2) as score,
               'cat_' || ((i % 10)::varchar) as category
        from range({start_id}, {end_id}) t(i)
    """).arrow()
    buf = io.BytesIO()
    pq.write_table(arrow_tbl, buf, compression="snappy")
    buf.seek(0)
    key = f"{S3_PREFIX}part-{n+1:04d}.parquet"
    s3.upload_fileobj(buf, S3_BUCKET, key)
    print(f"wrote s3://{S3_BUCKET}/{key} ({arrow_tbl.num_rows:,} rows)")

# 2. Replicate the S3 prefix → SQL Server via ADBC.
print(f"replicating s3://{S3_BUCKET}/{S3_PREFIX} -> {MSSQL_CONN} via ADBC (full-refresh)...")
Replication(
    source=S3_CONN,
    target=MSSQL_CONN,
    streams={
        f"{S3_BUCKET}/{S3_PREFIX}": ReplicationStream(
            mode=Mode.FULL_REFRESH,
            object="dbo.adbc_pipeline_py",
        ),
    },
    hooks={
        "start": [HookQuery(
            connection=MSSQL_CONN,
            query="if object_id('dbo.adbc_pipeline_py','U') is not null drop table dbo.adbc_pipeline_py",
        )],
    },
).run()

# 3. Read back through ADBC with stream_arrow().
print("reading back via ADBC stream_arrow() (first 5 rows by id)...")
stream = Sling(
    src_conn=MSSQL_CONN,
    src_stream="select top 5 id, name, score, category from dbo.adbc_pipeline_py order by id",
).stream_arrow()
batches = list(stream)
tbl = pa.Table.from_batches(batches)
print(f"got {len(batches)} arrow batch(es), {tbl.num_rows} rows")
print(f"schema: {', '.join(f'{f.name}: {f.type}' for f in tbl.schema)}")
for row in tbl.to_pylist():
    print(f"  {row}")

# 4. Verify the count.
cnt_batches = list(Sling(
    src_conn=MSSQL_CONN,
    src_stream="select count(*) as cnt from dbo.adbc_pipeline_py",
).stream_arrow())
total = pa.Table.from_batches(cnt_batches).to_pylist()[0]["cnt"]
print(f"total rows: {total:,}")
assert total == NUM_FILES * ROWS_PER_FILE
print("ADBC SQL Server Python pipeline PASSED")
Run it without setting up a virtualenv:
uv run run_pipeline.py
Actual output:
generating 100,000 rows in pyarrow...
wrote s3://pipeline/adbc-py/part-0001.parquet (25,000 rows)
wrote s3://pipeline/adbc-py/part-0002.parquet (25,000 rows)
wrote s3://pipeline/adbc-py/part-0003.parquet (25,000 rows)
wrote s3://pipeline/adbc-py/part-0004.parquet (25,000 rows)
replicating s3://pipeline/adbc-py/ -> MSSQL_ADBC via ADBC (full-refresh)...
INF connecting to source file system (s3)
INF connecting to target database (sqlserver)
INF reading from source file system (s3)
INF writing to target database [mode: full-refresh]
INF created table "dbo"."adbc_pipeline_py"
INF inserted 100,000 rows into "dbo"."adbc_pipeline_py" in 3 secs [33,333 r/s]
reading back via ADBC stream_arrow() (first 5 rows by id)...
got 1 arrow batch(es), 5 rows
schema: id: int64, name: string, score: double, category: string
{'id': 1, 'name': 'user_1', 'score': 1.5, 'category': 'cat_1'}
{'id': 2, 'name': 'user_2', 'score': 3.0, 'category': 'cat_2'}
{'id': 3, 'name': 'user_3', 'score': 4.5, 'category': 'cat_3'}
{'id': 4, 'name': 'user_4', 'score': 6.0, 'category': 'cat_4'}
{'id': 5, 'name': 'user_5', 'score': 7.5, 'category': 'cat_5'}
total rows: 100,000
ADBC SQL Server Python pipeline PASSED
A few things worth calling out:
- One Sling replication, N parquet files. The stream key pipeline/adbc-py/ resolves as a prefix on the S3 connection; Sling picks up every parquet file underneath. Add more files and the next run just loads them.
- stream_arrow() returns Arrow RecordBatches. id is an int64, score is a double, not a stringified version. That is the difference that makes downstream Polars / DuckDB / pandas conversions essentially free. The plain stream() method returns a Python-dict iterator with string-shaped values; use it when you want to iterate row-by-row without an Arrow dependency.
- DuckDB → PyArrow is a zero-copy hop. con.execute(sql).arrow() returns the result as a pa.Table directly. From there, pq.write_table() produces real columnar parquet, not pandas-shaped parquet, which matters when the receiving end is also Arrow-native.
When to Use ADBC vs. the Default Path
Sling’s default SQL Server path is the go-mssqldb TDS driver, which is fine for most workloads — connect, run queries, stream results. ADBC is the right call in three situations:
- Bulk loads from Arrow-native sources. Loading from DuckDB, Parquet, or a Polars DataFrame? ADBC keeps the data in Arrow format end to end and bulk-inserts using SQL Server’s columnar bulk path. The go-mssqldb route would re-serialize through SQL strings.
- Big writes you’d otherwise reach for bcp for. ADBC’s bulk insert is faster than row-by-row TDS and doesn’t need an external bcp invocation, which means it stays inside one process and one transaction boundary.
- Read-heavy analytics. Sling’s ADBC reads return Arrow record batches; downstream Polars / DuckDB / pandas conversions are essentially free.
The default path is still the right answer for short interactive queries, smaller writes, anywhere you want to avoid the dbc driver install, or anywhere you need features the ADBC driver doesn’t expose yet (Kerberos, certain Azure auth modes).
Wrapping Up
Two flags carry the difference between Sling-with-TDS and Sling-with-ADBC: dbc install mssql and use_adbc: true. Same connection, same replications, same hooks — only the protocol underneath changes. From there, the same YAML pipeline drops into Python through the sling library, and Sling.stream_arrow() gives you native Arrow RecordBatches when the next hop is also columnar.
The full lab — Docker compose file (SQL Server + SeaweedFS), CLI pipeline, Python script — is in the sling-cli repo. Clone it, docker compose up, run the pipeline, then run the Python script. Both should finish in under ten seconds against the local containers.
For Sling’s full SQL Server connector docs (TDS and ADBC together), see docs.slingdata.io/connections/database-connections/sqlserver. Sling Platform handles the same workloads with a UI, scheduling, and observability on top — see slingdata.io/platform. For local CLI work past the free tier, slingdata.io/cli-pro covers higher concurrency and longer-running pipelines.


