Most Oracle-to-Snowflake migrations don’t fail on the easy stuff. NUMBER, VARCHAR2, and DATE land cleanly. The trouble starts at the LOB columns: a 40 KB CLOB, a BLOB holding a PDF or a serialized protobuf, an XMLTYPE that nobody on the team has actually queried in two years. These columns are where homegrown ETL pipelines tend to silently corrupt data. A UTF-8 decode applied to binary bytes. An XML payload truncated at 4000 chars. An NCLOB whose emoji rendered as ??. By the time anyone notices, the source has rolled forward and the bad rows are gone.
This guide walks through replicating Oracle’s special types into Snowflake with Sling: CLOB, NCLOB, BLOB, RAW, and XMLTYPE. We’ll cover the default mapping, when you need the RAWTOHEX escape hatch for true binary fidelity, and how to verify that every byte actually made it across. Every code block and log line below is from a real run.
Why LOB replication is hard
Oracle and Snowflake disagree on what bytes mean.
- CLOB and NCLOB are character data. Oracle stores them in the database charset and the national charset respectively; Snowflake stores VARCHAR as UTF-8. The same Unicode codepoint sequence will produce different byte hashes on each side.
- BLOB and RAW are binary data. Snowflake has a native BINARY type that maps cleanly, but only if Sling tells Snowflake it’s binary. If those columns get mapped to VARCHAR, every byte goes through a UTF-8 decode, and any byte sequence that isn’t valid UTF-8 gets replaced with U+FFFD. Silent and irreversible.
- XMLTYPE is Oracle-specific. There is no Snowflake equivalent, so you have to project it to text on the source side.
Sling handles the default mapping for you. CLOB, NCLOB, and XMLTYPE go to VARCHAR on Snowflake. BLOB and RAW go to BINARY. The escape hatch (selecting RAWTOHEX(blob_col) in your source SQL) is there when you want to land binary as a readable hex string instead, or when you’re targeting a system that doesn’t have a binary type at all.
Prerequisites
# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash
# Windows (PowerShell)
irm https://slingdata.io/install.ps1 | iex
# Python
pip install sling
sling --version
Oracle source connections also need the Oracle Instant Client installed on the machine running Sling. See the Oracle connection docs for the platform-specific install.
Setting up the connections
Either set the connections inline via environment variables, or add them to ~/.sling/env.yaml once and reuse them.
# ~/.sling/env.yaml
connections:
  ORACLE:
    type: oracle
    host: your-oracle-host
    port: 1521
    user: your_user
    password: your_password
    service_name: your_service

  SNOWFLAKE:
    type: snowflake
    account: your-account
    user: your_user
    password: your_password
    database: your_database
    schema: PUBLIC
    warehouse: COMPUTE_WH
    role: your_role
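If you’d rather skip the file, the same connections can be supplied as URL-style environment variables. A sketch; the exact URL parameters per connector are documented in the Sling connection docs, so treat these as illustrative:

# Inline alternative to env.yaml (a sketch; verify parameter names per connector)
export ORACLE='oracle://your_user:your_password@your-oracle-host:1521/your_service'
export SNOWFLAKE='snowflake://your_user:your_password@your-account/your_database?schema=PUBLIC&warehouse=COMPUTE_WH&role=your_role'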
Test both before going further:
sling conns test ORACLE
sling conns test SNOWFLAKE
A source table with every type that’s likely to bite you
Build a small Oracle table covering the five types: two CLOBs of different sizes, an NCLOB with non-ASCII, a BLOB, a fixed-length RAW, and an XMLTYPE. Add a row of nulls too, since null handling is the other place LOB pipelines tend to break.
CREATE TABLE ORACLE.SLING_TEST_LOBS (
  id         NUMBER(10) PRIMARY KEY,
  name       VARCHAR2(100) NOT NULL,
  clob_small CLOB,
  clob_large CLOB,
  nclob_col  NCLOB,
  blob_col   BLOB,
  raw_col    RAW(16),
  xml_col    XMLTYPE
);
INSERT INTO ORACLE.SLING_TEST_LOBS VALUES (
  1,
  'row_one',
  'small clob text - hello world',                                     -- clob_small
  RPAD('A', 8000, 'B'),                                                -- clob_large (RPAD past 4000 chars in SQL needs MAX_STRING_SIZE=EXTENDED)
  'unicode nclob café 中文 🚀',                                          -- nclob_col: non-ASCII plus an astral-plane emoji
  UTL_RAW.CAST_TO_RAW('binary-as-text-payload'),                       -- blob_col
  HEXTORAW('DEADBEEFCAFEBABE0011223344556677'),                        -- raw_col: 16 bytes
  XMLTYPE('<root><item id="1">Test</item><value>100</value></root>')   -- xml_col
);
-- ... plus a row with a 40 KB CLOB and a row of NULLs
COMMIT;
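The elided rows aren’t reproduced here. As one hedged sketch, the 40 KB CLOB row can be built in PL/SQL with DBMS_LOB.WRITEAPPEND; the blob and raw values below mirror the row_two output you’ll see in the verification section, while the NCLOB text is illustrative:

-- A sketch for the elided rows; adjust payloads to taste
DECLARE
  v_clob CLOB;
BEGIN
  INSERT INTO ORACLE.SLING_TEST_LOBS
  VALUES (2, 'row_two', 'second row', EMPTY_CLOB(), 'ascii nclob',
          UTL_RAW.CAST_TO_RAW('another payload'),
          HEXTORAW('0102030405060708090A0B0C0D0E0F10'), NULL)
  RETURNING clob_large INTO v_clob;

  -- 10 x 4000 chars = 40 KB of CLOB content, appended via the LOB locator
  FOR i IN 1 .. 10 LOOP
    DBMS_LOB.WRITEAPPEND(v_clob, 4000, RPAD('X', 4000, 'X'));
  END LOOP;

  -- Row of NULLs for the null-handling check
  INSERT INTO ORACLE.SLING_TEST_LOBS
  VALUES (3, 'row_nulls', NULL, NULL, NULL, NULL, NULL, NULL);
  COMMIT;
END;
/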
Replication 1: the default mapping
The simplest possible Oracle-to-Snowflake replication. No source SQL, no column overrides, just point Sling at the table.
source: ORACLE
target: SNOWFLAKE

defaults:
  mode: full-refresh

streams:
  ORACLE.SLING_TEST_LOBS:
    object: PUBLIC.SLING_TEST_LOBS_DIRECT
Run it:
sling run -r replication.yaml
Real output from the run:
INF Sling Replication | ORACLE -> SNOWFLAKE | ORACLE.SLING_TEST_LOBS
INF connecting to source database (oracle)
INF connecting to target database (snowflake)
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table "PUBLIC"."SLING_TEST_LOBS_DIRECT_TMP"
INF streaming data
INF created table "PUBLIC"."SLING_TEST_LOBS_DIRECT"
INF inserted 2003 rows into "PUBLIC"."SLING_TEST_LOBS_DIRECT" in 27 secs [73 r/s] [9.9 MB]
INF execution succeeded
That’s 2003 rows: 3 hand-crafted fidelity rows plus a stress payload of 2000 random rows generated inside Oracle. The full pipeline that produces this output lives in the sling-cli repo at tests/pipelines/p.31.oracle_special_types.yaml. Copy it as a starting point for your own verification harness.
Inspect the resulting Snowflake table:
sling conns exec SNOWFLAKE -q "describe table PUBLIC.SLING_TEST_LOBS_DIRECT"
The columns Sling lands by default:
| Oracle type | Snowflake type |
|---|---|
| CLOB | VARCHAR |
| NCLOB | VARCHAR |
| XMLTYPE | VARCHAR |
| BLOB | BINARY |
| RAW(n) | BINARY |
This is the mapping you want. The two to double-check are BLOB and RAW. If you see a VARCHAR in their place, you’re on an older Sling build that mapped BLOB → text, and any non-UTF-8 byte in the payload will have been mangled by the time it hit Snowflake. Upgrade to a recent Sling, drop the target table, and re-run.
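To make that double-check mechanical instead of visual, a query against Snowflake’s INFORMATION_SCHEMA works. A sketch, using the lab table’s names; it returns zero rows when the mapping is healthy:

-- Flag binary-source columns that landed as text (no rows = healthy)
SELECT column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'PUBLIC'
  AND table_name   = 'SLING_TEST_LOBS_DIRECT'
  AND column_name IN ('BLOB_COL', 'RAW_COL')
  AND data_type   <> 'BINARY';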
Replication 2: when you need hex on the wire
Sometimes you don’t want a BINARY column on Snowflake. Maybe the downstream consumer is a dashboard tool that doesn’t render binary nicely, or you’re staging into a system with no binary type at all. In that case, encode the binary as hex on the source side and let it land as plain text:
source: ORACLE
target: SNOWFLAKE

defaults:
  mode: full-refresh

streams:
  ORACLE.SLING_TEST_LOBS_HEX_VIEW:
    object: PUBLIC.SLING_TEST_LOBS_HEX
    sql: |-
      SELECT
        id,
        name,
        clob_small,
        clob_large,
        nclob_col,
        RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 1)) AS blob_hex,
        RAWTOHEX(raw_col) AS raw_hex,
        t.xml_col.getClobVal() AS xml_text
      FROM ORACLE.SLING_TEST_LOBS t
Two things to notice:
- DBMS_LOB.SUBSTR(blob_col, 2000, 1) chunks the BLOB before RAWTOHEX, because RAWTOHEX on a full LOB will blow Oracle’s 4000-byte VARCHAR2 limit. Bump 2000 up to the actual max size of your payload, or chunk into multiple columns if you exceed it (see the sketch below).
- t.xml_col.getClobVal() is how you project XMLTYPE to a regular character LOB. The t. alias is mandatory; Oracle refuses the bare xml_col.getClobVal() form.
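When payloads run past a single 2000-byte chunk, the same pattern extends to multiple hex columns that you can reassemble with CONCAT on the Snowflake side. A sketch; the column names are illustrative:

-- Chunked hex projection (a sketch); later chunks are NULL for short payloads
SELECT
  id,
  RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 1))    AS blob_hex_1,  -- bytes 1-2000
  RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 2001)) AS blob_hex_2   -- bytes 2001-4000
FROM ORACLE.SLING_TEST_LOBS t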
Run:
INF Sling Replication | ORACLE -> SNOWFLAKE | ORACLE.SLING_TEST_LOBS_HEX_VIEW
INF inserted 2003 rows into "PUBLIC"."SLING_TEST_LOBS_HEX" in 18 secs [106 r/s] [10 MB]
INF execution succeeded
Verification: did every byte actually make it?
Row counts agreeing is necessary but not sufficient. For LOB columns you want a byte-level check. The pattern is: compute a hash on each side of the wire and compare.
For BLOB and RAW, the cleanest cross-database check is the hex representation. On Oracle:
SELECT
  LOWER(RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 1))) AS hex_blob,
  LOWER(RAWTOHEX(raw_col)) AS hex_raw
FROM ORACLE.SLING_TEST_LOBS
ORDER BY id;
On Snowflake, TO_VARCHAR against a BINARY column gives you uppercase hex. Lowercase it and the two sides line up:
SELECT
  LOWER(TO_VARCHAR(BLOB_COL)) AS hex_blob,
  LOWER(TO_VARCHAR(RAW_COL)) AS hex_raw
FROM PUBLIC.SLING_TEST_LOBS_DIRECT
ORDER BY ID;
Real output, side by side for our hand-crafted rows:
Oracle source:
row_one : hex_blob = 62696e6172792d61732d746578742d7061796c6f6164
hex_raw = deadbeefcafebabe0011223344556677
row_two : hex_blob = 616e6f74686572207061796c6f6164
hex_raw = 0102030405060708090a0b0c0d0e0f10
Snowflake DIRECT:
row_one : hex_blob = 62696e6172792d61732d746578742d7061796c6f6164
hex_raw = deadbeefcafebabe0011223344556677
row_two : hex_blob = 616e6f74686572207061796c6f6164
hex_raw = 0102030405060708090a0b0c0d0e0f10
Byte-for-byte equal. The same check on the 2000 random-row stress payload also matches. The per-column hex sums collapse to identical aggregates on each side:
| Aggregate | Oracle source | Snowflake DIRECT |
|---|---|---|
| SUM_LEN_BLOB | 512000 | 512000 |
| SUM_HEX_BLOB | 2391745240069 | 2391745240069 |
| SUM_LEN_RAW | 32000 | 32000 |
| SUM_HEX_RAW | 2378812438710 | 2378812438710 |
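If you want this check scripted rather than eyeballed, one option is to dump the hex projection from both sides and diff the files. A sketch; it assumes sling conns exec emits comparable tabular output, so you may need to normalize headers and whitespace first:

# Dump the RAW column as lowercase hex from both sides, then diff
sling conns exec ORACLE -q "SELECT id, LOWER(RAWTOHEX(raw_col)) AS hex_raw FROM ORACLE.SLING_TEST_LOBS ORDER BY id" > oracle_raw.txt
sling conns exec SNOWFLAKE -q "SELECT ID, LOWER(TO_VARCHAR(RAW_COL)) AS HEX_RAW FROM PUBLIC.SLING_TEST_LOBS_DIRECT ORDER BY ID" > snowflake_raw.txt
diff oracle_raw.txt snowflake_raw.txt && echo "raw_col matches"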
For CLOB and XMLTYPE, the MD5 of the first 4000 chars works as a fingerprint:
| Fingerprint | Oracle | Snowflake DIRECT |
|---|---|---|
| md5_clob_small (row1) | eefd5f78... | eefd5f78... |
| md5_clob_large (row1) | dd21e48b... | dd21e48b... |
| md5_xml (row1) | 23541a37... | 23541a37... |
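The queries behind those fingerprints look roughly like this. A sketch: STANDARD_HASH needs Oracle 12c or later, and the hashes only line up when the Oracle database charset is AL32UTF8, so both sides are hashing UTF-8 bytes:

-- Oracle: MD5 over the first 4000 chars (RAW result rendered as lowercase hex)
SELECT id,
       LOWER(RAWTOHEX(STANDARD_HASH(DBMS_LOB.SUBSTR(clob_small, 4000, 1), 'MD5'))) AS md5_clob_small
FROM ORACLE.SLING_TEST_LOBS
ORDER BY id;

-- Snowflake: MD5 already returns lowercase hex
SELECT ID,
       MD5(SUBSTR(CLOB_SMALL, 1, 4000)) AS md5_clob_small
FROM PUBLIC.SLING_TEST_LOBS_DIRECT
ORDER BY ID;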
The one caveat: NCLOB
There’s one case where the hashes will not match across the wire, and it’s worth understanding before you flag it as a bug.
Oracle stores NCLOB in the national character set, which is typically AL16UTF16 (UTF-16). Snowflake stores all character data as UTF-8. Both encodings can represent the same Unicode codepoints, but the byte sequences differ. MD5("hello") against UTF-16 bytes is not the same hash as MD5("hello") against UTF-8 bytes. Same text, different byte hash.
The fair cross-database check for NCLOB is a character-length comparison, not a byte hash:
-- Oracle: character length (counted in UTF-16 code units for NCLOB)
SELECT DBMS_LOB.GETLENGTH(nclob_col) FROM ORACLE.SLING_TEST_LOBS;
-- Snowflake: character length (counted in Unicode codepoints)
SELECT LENGTH(NCLOB_COL) FROM PUBLIC.SLING_TEST_LOBS_DIRECT;
For ASCII-only NCLOBs in our 2000-row stress payload, the codepoint sums match exactly. For our hand-crafted “unicode nclob café 中文 🚀” row, the rocket emoji surfaces a second wrinkle: Oracle counts it as 2 UTF-16 code units while Snowflake counts it as 1 Unicode codepoint, so the lengths differ by 1. The actual text is identical. You just have to pick which counting convention you trust, and document it for the team.
Common pitfalls
A few failure modes worth knowing about up front, all from the real lab.
BLOB landed as VARCHAR. Symptom: random characters where there should be hex, or visible replacement characters (U+FFFD, often rendered as ?) in the data. Cause: older Sling versions mapped blob → text. Fix: upgrade Sling, drop the target column or table, and re-run. If the column already exists in Snowflake typed as VARCHAR, Sling won’t auto-promote it to BINARY; either let Sling recreate the table (mode: full-refresh does this) or ALTER TABLE it yourself.
ORA-00904: invalid identifier on xml_col.getClobVal(). Cause: Oracle requires a table alias for XMLTYPE method calls. Always write FROM ORACLE.SLING_TEST_LOBS t and reference the column as t.xml_col.getClobVal().
ORA-06502: numeric or value error on RAWTOHEX(blob_col). Cause: the BLOB exceeds the 4000-byte VARCHAR2 cap. Wrap with DBMS_LOB.SUBSTR to chunk before hex-encoding, or land the BLOB as BINARY and skip the hex step entirely.
NCLOB hash mismatch with all ASCII content. Not a bug. Oracle UTF-16 vs Snowflake UTF-8 byte representation. Compare codepoint counts instead.
You actually want a BLOB column to land as text because it genuinely holds UTF-8 text payloads. Not a failure mode, but it needs the per-column override:
streams:
  ORACLE.MY_TABLE:
    columns:
      my_blob: text
This pins the legacy mapping for that one column without affecting the rest of the table.
When to use which approach
- Binary semantics matter (PDFs, serialized objects, signed payloads): use the default mapping. Let BLOB and RAW land as BINARY. Verify with LOWER(TO_VARCHAR(...)).
- Downstream consumer doesn’t speak BINARY: use the RAWTOHEX source SQL pattern from replication 2.
- BLOB column actually holds UTF-8 text: use the per-column columns: { my_blob: text } override.
- XMLTYPE on any target: project to a CLOB on the Oracle side with t.xml_col.getClobVal(). Don’t rely on driver-level XMLTYPE handling.
Putting it together
For a production-shaped pipeline, the recipe is:
- Inventory the LOB columns in your source schema with SELECT column_name, data_type FROM all_tab_columns WHERE owner = ... AND data_type IN ('CLOB', 'NCLOB', 'BLOB', 'RAW', 'XMLTYPE').
- Decide on default mapping versus hex-text per column, based on what your downstream consumers can ingest.
- Land the first replication into a staging schema with mode: full-refresh. Run the byte-level verification described above against a representative sample of rows.
- Promote to incremental mode once verification passes (a sketch follows this list). For tables with LOBs, incremental on a last_updated column is usually fine. Sling re-fetches the full LOB content per changed row, so there’s no partial-update concern.
- Keep the verification queries around as a scheduled job. LOB drift is the kind of thing that goes unnoticed for weeks if nobody is looking.
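The incremental promotion looks like this. A sketch: it assumes id is the primary key and that the source table carries a last_updated timestamp column:

source: ORACLE
target: SNOWFLAKE

streams:
  ORACLE.SLING_TEST_LOBS:
    object: PUBLIC.SLING_TEST_LOBS
    mode: incremental
    primary_key: [id]
    update_key: last_updated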
For more on replication configuration and advanced options, see:
- Replication concepts
- Source options, including sql: overrides and column typing
- Target options: add_new_columns, adjust_column_type, column_casing
- Oracle connection details
- Snowflake connection details
Conclusion
LOB replication isn’t where most Oracle-to-Snowflake guides spend much time, which is why so many homegrown pipelines silently lose data on these columns. Sling’s default mapping handles the common cases without ceremony: BLOB and RAW to BINARY, character LOBs and XMLTYPE to VARCHAR. The RAWTOHEX source-SQL pattern covers the rest, and the verification queries above give you a way to prove the bytes actually made it across instead of trusting that they did.
If you’re migrating off Fivetran HVR or a similar batch tool and these columns were part of why you’re looking around, this is the path. Run it against your own schema, compare the hashes, and the LOB question stops being scary.


