Fix row to recordbatch conversion errors #36500
Open
DAlperin wants to merge 8 commits into
Conversation
Three runtime errors surface when a parallel-workload run drives an Iceberg sink with a wider mix of column types:

1. "Datum Int16 does not match builder Int32Builder" — Iceberg has no smallint, so the writer context arrow schema (derived from the iceberg schema) uses Int32 while Materialize rows still carry Datum::Int16. Add a lossless Int16 -> Int32 promotion in ArrowColumn::append_datum, mirroring the existing UInt16 -> Int32 case.
2. "Field 'value' missing extension metadata" — Materialize names map fields entries/keys/values, but iceberg-rust's arrow conversion names them key_value/key/value. merge_field_metadata_recursive matched by name and silently dropped the value field's extension metadata, so ArrowBuilder later failed when constructing the inner builder. Match the map entries struct positionally instead.
3. "Failed to create EqualityDeleteWriterConfig: field_id N not found" — the planner accepted Range types as Iceberg equality delete keys, but ranges lower into Iceberg structs and iceberg-rust's RecordBatchProjector skips nested fields, so the equality field id is unreachable at runtime. Drop Range from the allow-list so the failure is caught at sink creation instead.
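The Int16 -> Int32 promotion in item 1 can be sketched in plain Rust. This is a hedged, simplified stand-in: `Datum` and `Builder` here are toy versions of the Materialize types (the real `ArrowColumn::append_datum` matches many more variants against real arrow-rs builders), but the widening logic is the same — `i16` and `u16` both fit losslessly in `i32`.

```rust
// Simplified stand-ins for Materialize's Datum and the arrow Int32Builder.
enum Datum {
    Int16(i16),
    UInt16(u16),
    Int32(i32),
}

enum Builder {
    Int32(Vec<i32>), // stand-in for arrow's Int32Builder
}

fn append_datum(builder: &mut Builder, datum: Datum) -> Result<(), String> {
    match (builder, datum) {
        // Iceberg has no 16-bit integer type, so the arrow schema derived
        // from the iceberg schema uses Int32; widen 16-bit datums losslessly.
        (Builder::Int32(v), Datum::Int16(x)) => {
            v.push(i32::from(x));
            Ok(())
        }
        (Builder::Int32(v), Datum::UInt16(x)) => {
            v.push(i32::from(x));
            Ok(())
        }
        (Builder::Int32(v), Datum::Int32(x)) => {
            v.push(x);
            Ok(())
        }
    }
}
```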
Explain why positional matching in `merge_map_entries_metadata` is correct by citing the Arrow `Schema.fbs` definition of `Map` as `List<entries: Struct<key, value>>` with non-enforced field names.
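A hedged sketch of why the positional merge is sound: Arrow's `Schema.fbs` defines `Map` as `List<entries: Struct<key, value>>` and explicitly does not enforce the child field names, so "entries"/"keys"/"values" (Materialize) and "key_value"/"key"/"value" (iceberg-rust) describe the same physical layout and can only be reconciled by position. `Field` below is a toy stand-in for `arrow_schema::Field`, and the merge direction (copy missing metadata from ours onto theirs) is an assumption for illustration.

```rust
use std::collections::HashMap;

// Toy stand-in for arrow_schema::Field: a name, key/value metadata,
// and child fields (the map's entries struct has two children).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
    children: Vec<Field>,
}

// Copy extension metadata from the materialize-shaped field onto the
// iceberg-shaped one, pairing children by index rather than by name.
fn merge_map_entries_metadata(ours: &Field, theirs: &mut Field) {
    for (src, dst) in ours.children.iter().zip(theirs.children.iter_mut()) {
        for (k, v) in &src.metadata {
            dst.metadata.entry(k.clone()).or_insert_with(|| v.clone());
        }
        // Recurse so nested maps/structs under key/value are merged too.
        merge_map_entries_metadata(src, dst);
    }
}
```

A name-based merge would pair "values" with nothing on the iceberg side (which calls it "value") and silently drop the extension metadata — exactly the failure described in item 2 above.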
Pins the three runtime failures fixed in the previous commit:

- `merge_map_entries_preserves_value_extension_metadata`: unit test that builds a materialize-shaped map field (entries/keys/values) and an iceberg-shaped one (key_value/key/value) and asserts the merge copies the value field's extension metadata positionally.
- `test/iceberg/key-validation.td`: adds a Range key rejection block alongside the existing Map/List rejections.
- `test/iceberg/catalog.td`: adds smallint and map[text=>text] sinks exercising the Int16->Int32 promotion and the entries-struct metadata merge end-to-end against a real Iceberg catalog.
DuckDB's iceberg_scan returns 0 rows for map-valued tables in the versions we test against, so the round-trip via map_keys/map_values was not actually exercising the metadata merge — the assertion just failed with no actionable signal. Check mz_sink_statuses for `running` instead: without `merge_map_entries_metadata` the sink stalls with "Field 'value' missing extension metadata" during ArrowBuilder construction, which is exactly the regression we want to pin.
builder_for_datatype was hard-coding MapFieldNames::default() (entries/keys/values) when constructing the MapBuilder, regardless of what the surrounding Schema actually said. For Iceberg the schema's map fields are key_value/key/value (preserved by merge_map_entries_metadata), so the resulting MapArray's nested DataType disagreed with the schema and RecordBatch::try_new rejected every row — the sink stalled silently and iceberg_scan saw an empty table.

Read entry/key/value names off the schema's entries struct so the MapArray matches whichever convention the caller chose. COPY TO S3 PARQUET keeps building its schema with the arrow-rs defaults, so its output is unchanged.

Also restores the DuckDB iceberg_scan assertion on the map_table sink in catalog.td now that the round-trip actually works.
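The fix can be sketched as reading the names off the schema rather than taking defaults. This is a hedged sketch with toy types: `MapType` stands in for the entries field of arrow's `DataType::Map`, and `MapFieldNames` mirrors the shape of `arrow_array::builder::MapFieldNames` (whose defaults are "entries"/"keys"/"values"); `field_names_from_schema` is a hypothetical helper, not the actual Materialize function.

```rust
// Toy stand-in for the map's entries struct as found in the schema.
// For an iceberg-derived schema these would be "key_value"/"key"/"value".
struct MapType {
    entries_name: String,
    key_name: String,
    value_name: String,
}

// Mirrors the shape of arrow_array::builder::MapFieldNames.
#[derive(Debug, PartialEq)]
struct MapFieldNames {
    entry: String,
    key: String,
    value: String,
}

// Derive the builder's field names from the schema instead of hard-coding
// defaults, so the built MapArray's nested DataType agrees with the schema
// and RecordBatch::try_new accepts the column.
fn field_names_from_schema(map: &MapType) -> MapFieldNames {
    MapFieldNames {
        entry: map.entries_name.clone(),
        key: map.key_name.clone(),
        value: map.value_name.clone(),
    }
}
```

The design point is that the schema is the single source of truth: whichever naming convention the caller chose flows through to the builder, instead of the builder imposing its own.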
def- reviewed May 11, 2026
Contributor
Thanks for adding tests. I will rebase my iceberg sink test in parallel-workload on top of this and #36499 and report if anything else falls over. Edit: Just noticed you made it a draft again, so I'll wait until it's un-drafted before doing that.
Contributor
New failure in https://buildkite.com/materialize/nightly/builds/16384:
Follow-up to some of the complicated type work.
Fixes SS-144, SS-143, SS-142