Most Oracle-to-Snowflake migrations don’t fail on the easy stuff. NUMBER, VARCHAR2, and DATE land cleanly. The trouble starts at the LOB columns: a 40 KB CLOB, a BLOB holding a PDF or a serialized protobuf, an XMLTYPE that nobody on the team has actually queried in two years. These columns are where homegrown ETL pipelines tend to silently corrupt data. A UTF-8 decode applied to binary bytes. An XML payload truncated at 4000 chars. An NCLOB whose emoji rendered as ??. By the time anyone notices, the source has rolled forward and the bad rows are gone.
This guide walks through replicating Oracle’s special types into Snowflake with Sling: CLOB, NCLOB, BLOB, RAW, and XMLTYPE. We’ll cover the default mapping, when you need the RAWTOHEX escape hatch for true binary fidelity, and how to verify that every byte actually made it across. Every code block and log line below is from a real run.
Why LOB replication is hard
Oracle and Snowflake disagree on what bytes mean.
- CLOB and NCLOB are character data. Oracle stores them in the database charset and the national charset respectively; Snowflake stores VARCHAR as UTF-8. The same Unicode codepoint sequence will produce different byte hashes on each side.
- BLOB and RAW are binary data. Snowflake has a native BINARY type that maps cleanly, but only if Sling tells Snowflake it’s binary. If those columns get mapped to VARCHAR, every byte goes through a UTF-8 decode, and any byte sequence that isn’t valid UTF-8 gets replaced with U+FFFD. Silent and irreversible.
- XMLTYPE is Oracle-specific. There is no Snowflake equivalent, so you have to project it to text on the source side.
Sling handles the default mapping for you. CLOB, NCLOB, and XMLTYPE go to VARCHAR on Snowflake. BLOB and RAW go to BINARY. The escape hatch (selecting RAWTOHEX(blob_col) in your source SQL) is there when you want to land binary as a readable hex string instead, or when you’re targeting a system that doesn’t have a binary type at all.
Prerequisites
# macOS / Linux
curl -fsSL https://slingdata.io/install.sh | bash
# Windows (PowerShell)
irm https://slingdata.io/install.ps1 | iex
# Python
pip install sling
sling --version
Oracle source connections also need the Oracle Instant Client installed on the machine running Sling. See the Oracle connection docs for the platform-specific install.
Setting up the connections
Either set the connections inline via environment variables, or add them to ~/.sling/env.yaml once and reuse them.
# ~/.sling/env.yaml
connections:
  ORACLE:
    type: oracle
    host: your-oracle-host
    port: 1521
    user: your_user
    password: your_password
    service_name: your_service

  SNOWFLAKE:
    type: snowflake
    account: your-account
    user: your_user
    password: your_password
    database: your_database
    schema: PUBLIC
    warehouse: COMPUTE_WH
    role: your_role
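If you’d rather skip the file, the same connections can be supplied as URL-style environment variables. A sketch; the exact URL parameters per connector are documented in the Sling connection docs, so treat these as illustrative:

# Inline alternative to env.yaml (a sketch; verify parameter names per connector)
export ORACLE='oracle://your_user:your_password@your-oracle-host:1521/your_service'
export SNOWFLAKE='snowflake://your_user:your_password@your-account/your_database?schema=PUBLIC&warehouse=COMPUTE_WH&role=your_role'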
Test both before going further:
sling conns test ORACLE
sling conns test SNOWFLAKE
A source table with every type that’s likely to bite you
Build a small Oracle table covering the five types: two CLOBs of different sizes, an NCLOB with non-ASCII, a BLOB, a fixed-length RAW, and an XMLTYPE. Add a row of nulls too, since null handling is the other place LOB pipelines tend to break.
CREATE TABLE ORACLE.SLING_TEST_LOBS (
  id         NUMBER(10) PRIMARY KEY,
  name       VARCHAR2(100) NOT NULL,
  clob_small CLOB,
  clob_large CLOB,
  nclob_col  NCLOB,
  blob_col   BLOB,
  raw_col    RAW(16),
  xml_col    XMLTYPE
);
INSERT INTO ORACLE.SLING_TEST_LOBS VALUES (
  1,
  'row_one',
  'small clob text - hello world',                                     -- clob_small
  RPAD('A', 8000, 'B'),                                                -- clob_large (RPAD past 4000 chars in SQL needs MAX_STRING_SIZE=EXTENDED)
  'unicode nclob café 中文 🚀',                                          -- nclob_col: non-ASCII plus an astral-plane emoji
  UTL_RAW.CAST_TO_RAW('binary-as-text-payload'),                       -- blob_col
  HEXTORAW('DEADBEEFCAFEBABE0011223344556677'),                        -- raw_col: 16 bytes
  XMLTYPE('<root><item id="1">Test</item><value>100</value></root>')   -- xml_col
);
-- ... plus a row with a 40 KB CLOB and a row of NULLs
COMMIT;
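The elided rows aren’t reproduced here. As one hedged sketch, the 40 KB CLOB row can be built in PL/SQL with DBMS_LOB.WRITEAPPEND; the blob and raw values below mirror the row_two output you’ll see in the verification section, while the NCLOB text is illustrative:

-- A sketch for the elided rows; adjust payloads to taste
DECLARE
  v_clob CLOB;
BEGIN
  INSERT INTO ORACLE.SLING_TEST_LOBS
  VALUES (2, 'row_two', 'second row', EMPTY_CLOB(), 'ascii nclob',
          UTL_RAW.CAST_TO_RAW('another payload'),
          HEXTORAW('0102030405060708090A0B0C0D0E0F10'), NULL)
  RETURNING clob_large INTO v_clob;

  -- 10 x 4000 chars = 40 KB of CLOB content, appended via the LOB locator
  FOR i IN 1 .. 10 LOOP
    DBMS_LOB.WRITEAPPEND(v_clob, 4000, RPAD('X', 4000, 'X'));
  END LOOP;

  -- Row of NULLs for the null-handling check
  INSERT INTO ORACLE.SLING_TEST_LOBS
  VALUES (3, 'row_nulls', NULL, NULL, NULL, NULL, NULL, NULL);
  COMMIT;
END;
/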
Replication 1: the default mapping
The simplest possible Oracle-to-Snowflake replication. No source SQL, no column overrides, just point Sling at the table.
source: ORACLE
target: SNOWFLAKE

defaults:
  mode: full-refresh

streams:
  ORACLE.SLING_TEST_LOBS:
    object: PUBLIC.SLING_TEST_LOBS_DIRECT
Run it:
sling run -r replication.yaml
Real output from the run:
INF Sling Replication | ORACLE -> SNOWFLAKE | ORACLE.SLING_TEST_LOBS
INF connecting to source database (oracle)
INF connecting to target database (snowflake)
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table "PUBLIC"."SLING_TEST_LOBS_DIRECT_TMP"
INF streaming data
INF created table "PUBLIC"."SLING_TEST_LOBS_DIRECT"
INF inserted 2003 rows into "PUBLIC"."SLING_TEST_LOBS_DIRECT" in 27 secs [73 r/s] [9.9 MB]
INF execution succeeded
That’s 2003 rows: 3 hand-crafted fidelity rows plus a stress payload of 2000 random rows generated inside Oracle. The full pipeline that produces this output lives in the sling-cli repo at tests/pipelines/p.31.oracle_special_types.yaml. Copy it as a starting point for your own verification harness.
Inspect the resulting Snowflake table:
sling conns exec SNOWFLAKE -q "describe table PUBLIC.SLING_TEST_LOBS_DIRECT"
The columns Sling lands by default:
| Oracle type | Snowflake type |
|---|---|
| CLOB | VARCHAR |
| NCLOB | VARCHAR |
| XMLTYPE | VARCHAR |
| BLOB | BINARY |
| RAW(n) | BINARY |
This is the mapping you want. The two to double-check are BLOB and RAW. If you see a VARCHAR in their place, you’re on an older Sling build that mapped BLOB → text, and any non-UTF-8 byte in the payload will have been mangled by the time it hit Snowflake. Upgrade to a recent Sling, drop the target table, and re-run.
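To make that double-check mechanical instead of visual, a query against Snowflake’s INFORMATION_SCHEMA works. A sketch, using the lab table’s names; it returns zero rows when the mapping is healthy:

-- Flag binary-source columns that landed as text (no rows = healthy)
SELECT column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'PUBLIC'
  AND table_name   = 'SLING_TEST_LOBS_DIRECT'
  AND column_name IN ('BLOB_COL', 'RAW_COL')
  AND data_type   <> 'BINARY';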
Replication 2: when you need hex on the wire
Sometimes you don’t want a BINARY column on Snowflake. Maybe the downstream consumer is a dashboard tool that doesn’t render binary nicely, or you’re staging into a system with no binary type at all. In that case, encode the binary as hex on the source side and let it land as plain text:
source: ORACLE
target: SNOWFLAKE

defaults:
  mode: full-refresh

streams:
  ORACLE.SLING_TEST_LOBS_HEX_VIEW:
    object: PUBLIC.SLING_TEST_LOBS_HEX
    sql: |-
      SELECT
        id,
        name,
        clob_small,
        clob_large,
        nclob_col,
        RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 1)) AS blob_hex,
        RAWTOHEX(raw_col) AS raw_hex,
        t.xml_col.getClobVal() AS xml_text
      FROM ORACLE.SLING_TEST_LOBS t
Two things to notice:
- DBMS_LOB.SUBSTR(blob_col, 2000, 1) chunks the BLOB before RAWTOHEX, because RAWTOHEX on a full LOB will blow Oracle’s 4000-byte VARCHAR2 limit. Bump 2000 up to the actual max size of your payload, or chunk into multiple columns if you exceed it (see the sketch below).
- t.xml_col.getClobVal() is how you project XMLTYPE to a regular character LOB. The t. alias is mandatory; Oracle refuses the bare xml_col.getClobVal() form.
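When payloads run past a single 2000-byte chunk, the same pattern extends to multiple hex columns that you can reassemble with CONCAT on the Snowflake side. A sketch; the column names are illustrative:

-- Chunked hex projection (a sketch); later chunks are NULL for short payloads
SELECT
  id,
  RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 1))    AS blob_hex_1,  -- bytes 1-2000
  RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 2001)) AS blob_hex_2   -- bytes 2001-4000
FROM ORACLE.SLING_TEST_LOBS t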
Run:
INF Sling Replication | ORACLE -> SNOWFLAKE | ORACLE.SLING_TEST_LOBS_HEX_VIEW
INF inserted 2003 rows into "PUBLIC"."SLING_TEST_LOBS_HEX" in 18 secs [106 r/s] [10 MB]
INF execution succeeded
Verification: did every byte actually make it?
Row counts agreeing is necessary but not sufficient. For LOB columns you want a byte-level check. The pattern is: compute a hash on each side of the wire and compare.
For BLOB and RAW, the cleanest cross-database check is the hex representation. On Oracle:
SELECT
  LOWER(RAWTOHEX(DBMS_LOB.SUBSTR(blob_col, 2000, 1))) AS hex_blob,
  LOWER(RAWTOHEX(raw_col)) AS hex_raw
FROM ORACLE.SLING_TEST_LOBS
ORDER BY id;
On Snowflake, TO_VARCHAR against a BINARY column gives you uppercase hex. Lowercase it and the two sides line up:
SELECT
  LOWER(TO_VARCHAR(BLOB_COL)) AS hex_blob,
  LOWER(TO_VARCHAR(RAW_COL)) AS hex_raw
FROM PUBLIC.SLING_TEST_LOBS_DIRECT
ORDER BY ID;
Real output, side by side for our hand-crafted rows:
Oracle source:
row_one : hex_blob = 62696e6172792d61732d746578742d7061796c6f6164
hex_raw = deadbeefcafebabe0011223344556677
row_two : hex_blob = 616e6f74686572207061796c6f6164
hex_raw = 0102030405060708090a0b0c0d0e0f10
Snowflake DIRECT:
row_one : hex_blob = 62696e6172792d61732d746578742d7061796c6f6164
hex_raw = deadbeefcafebabe0011223344556677
row_two : hex_blob = 616e6f74686572207061796c6f6164
hex_raw = 0102030405060708090a0b0c0d0e0f10
Byte-for-byte equal. The same check on the 2000 random-row stress payload also matches. The per-column hex sums collapse to identical aggregates on each side:
| Aggregate | Oracle source | Snowflake DIRECT |
|---|---|---|
| SUM_LEN_BLOB | 512000 | 512000 |
| SUM_HEX_BLOB | 2391745240069 | 2391745240069 |
| SUM_LEN_RAW | 32000 | 32000 |
| SUM_HEX_RAW | 2378812438710 | 2378812438710 |
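If you want this check scripted rather than eyeballed, one option is to dump the hex projection from both sides and diff the files. A sketch; it assumes sling conns exec emits comparable tabular output, so you may need to normalize headers and whitespace first:

# Dump the RAW column as lowercase hex from both sides, then diff
sling conns exec ORACLE -q "SELECT id, LOWER(RAWTOHEX(raw_col)) AS hex_raw FROM ORACLE.SLING_TEST_LOBS ORDER BY id" > oracle_raw.txt
sling conns exec SNOWFLAKE -q "SELECT ID, LOWER(TO_VARCHAR(RAW_COL)) AS HEX_RAW FROM PUBLIC.SLING_TEST_LOBS_DIRECT ORDER BY ID" > snowflake_raw.txt
diff oracle_raw.txt snowflake_raw.txt && echo "raw_col matches"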
For CLOB and XMLTYPE, the MD5 of the first 4000 chars works as a fingerprint:
| Fingerprint | Oracle | Snowflake DIRECT |
|---|---|---|
| md5_clob_small (row1) | eefd5f78... | eefd5f78... |
| md5_clob_large (row1) | dd21e48b... | dd21e48b... |
| md5_xml (row1) | 23541a37... | 23541a37... |
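The queries behind those fingerprints look roughly like this. A sketch: STANDARD_HASH needs Oracle 12c or later, and the hashes only line up when the Oracle database charset is AL32UTF8, so both sides are hashing UTF-8 bytes:

-- Oracle: MD5 over the first 4000 chars (RAW result rendered as lowercase hex)
SELECT id,
       LOWER(RAWTOHEX(STANDARD_HASH(DBMS_LOB.SUBSTR(clob_small, 4000, 1), 'MD5'))) AS md5_clob_small
FROM ORACLE.SLING_TEST_LOBS
ORDER BY id;

-- Snowflake: MD5 already returns lowercase hex
SELECT ID,
       MD5(SUBSTR(CLOB_SMALL, 1, 4000)) AS md5_clob_small
FROM PUBLIC.SLING_TEST_LOBS_DIRECT
ORDER BY ID;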
The one caveat: NCLOB
There’s one case where the hashes will not match across the wire, and it’s worth understanding before you flag it as a bug.
Oracle stores NCLOB in the national character set, which is typically AL16UTF16 (UTF-16). Snowflake stores all character data as UTF-8. Both encodings can represent the same Unicode codepoints, but the byte sequences differ. MD5("hello") against UTF-16 bytes is not the same hash as MD5("hello") against UTF-8 bytes. Same text, different byte hash.
The fair cross-database check for NCLOB is a character-length comparison, not a byte hash:
-- Oracle: character length (counted in UTF-16 code units for NCLOB)
SELECT DBMS_LOB.GETLENGTH(nclob_col) FROM ORACLE.SLING_TEST_LOBS;
-- Snowflake: character length (counted in Unicode codepoints)
SELECT LENGTH(NCLOB_COL) FROM PUBLIC.SLING_TEST_LOBS_DIRECT;
For ASCII-only NCLOBs in our 2000-row stress payload, the codepoint sums match exactly. For our hand-crafted “unicode nclob café 中文 🚀” row, the rocket emoji surfaces a second wrinkle: Oracle counts it as 2 UTF-16 code units while Snowflake counts it as 1 Unicode codepoint, so the lengths differ by 1. The actual text is identical. You just have to pick which counting convention you trust, and document it for the team.
Common pitfalls
A few failure modes worth knowing about up front, all from the real lab.
BLOB landed as VARCHAR. Symptom: random characters where there should be hex, or visible replacement characters (U+FFFD, often rendered as ?) in the data. Cause: older Sling versions mapped blob → text. Fix: upgrade Sling, drop the target column or table, and re-run. If the column already exists in Snowflake typed as VARCHAR, Sling won’t auto-promote it to BINARY; either let Sling recreate the table (mode: full-refresh does this) or ALTER TABLE it yourself.
ORA-00904: invalid identifier on xml_col.getClobVal(). Cause: Oracle requires a table alias for XMLTYPE method calls. Always write FROM ORACLE.SLING_TEST_LOBS t and reference the column as t.xml_col.getClobVal().
ORA-06502: numeric or value error on RAWTOHEX(blob_col). Cause: the BLOB exceeds the 4000-byte VARCHAR2 cap. Wrap with DBMS_LOB.SUBSTR to chunk before hex-encoding, or land the BLOB as BINARY and skip the hex step entirely.
NCLOB hash mismatch with all ASCII content. Not a bug. Oracle UTF-16 vs Snowflake UTF-8 byte representation. Compare codepoint counts instead.
You actually want a BLOB column to land as text because it genuinely holds UTF-8 text payloads. Not a failure mode, but it needs the per-column override:
streams:
  ORACLE.MY_TABLE:
    columns:
      my_blob: text
This pins the legacy mapping for that one column without affecting the rest of the table.
When to use which approach
- Binary semantics matter (PDFs, serialized objects, signed payloads): use the default mapping. Let BLOB and RAW land as BINARY. Verify with LOWER(TO_VARCHAR(...)).
- Downstream consumer doesn’t speak BINARY: use the RAWTOHEX source SQL pattern from replication 2.
- BLOB column actually holds UTF-8 text: use the per-column columns: { my_blob: text } override.
- XMLTYPE on any target: project to a CLOB on the Oracle side with t.xml_col.getClobVal(). Don’t rely on driver-level XMLTYPE handling.
Putting it together
For a production-shaped pipeline, the recipe is:
- Inventory the LOB columns in your source schema with SELECT column_name, data_type FROM all_tab_columns WHERE owner = ... AND data_type IN ('CLOB', 'NCLOB', 'BLOB', 'RAW', 'XMLTYPE').
- Decide on default mapping versus hex-text per column, based on what your downstream consumers can ingest.
- Land the first replication into a staging schema with mode: full-refresh. Run the byte-level verification described above against a representative sample of rows.
- Promote to incremental mode once verification passes (a sketch follows this list). For tables with LOBs, incremental on a last_updated column is usually fine. Sling re-fetches the full LOB content per changed row, so there’s no partial-update concern.
- Keep the verification queries around as a scheduled job. LOB drift is the kind of thing that goes unnoticed for weeks if nobody is looking.
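The incremental promotion looks like this. A sketch: it assumes id is the primary key and that the source table carries a last_updated timestamp column:

source: ORACLE
target: SNOWFLAKE

streams:
  ORACLE.SLING_TEST_LOBS:
    object: PUBLIC.SLING_TEST_LOBS
    mode: incremental
    primary_key: [id]
    update_key: last_updated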
For more on replication configuration and advanced options, see:
- Replication concepts
- Source options, including sql: overrides and column typing
- Target options: add_new_columns, adjust_column_type, column_casing
- Oracle connection details
- Snowflake connection details
Conclusion
LOB replication isn’t where most Oracle-to-Snowflake guides spend much time, which is why so many homegrown pipelines silently lose data on these columns. Sling’s default mapping handles the common cases without ceremony: BLOB and RAW to BINARY, character LOBs and XMLTYPE to VARCHAR. The RAWTOHEX source-SQL pattern covers the rest, and the verification queries above give you a way to prove the bytes actually made it across instead of trusting that they did.
If you’re migrating off Fivetran HVR or a similar batch tool and these columns were part of why you’re looking around, this is the path. Run it against your own schema, compare the hashes, and the LOB question stops being scary.


