Fix row to recordbatch conversion errors #36500

Open

DAlperin wants to merge 8 commits into main from claude/fix-recordbatch-conversion-tFyg9

Conversation

@DAlperin
Member

Follow-up to some of the complicated type work.

Three runtime errors surface when a parallel-workload run drives an Iceberg sink with a wider mix of column types:

  1. "Datum Int16 does not match builder Int32Builder" — Iceberg has no smallint, so the writer context arrow schema (derived from the iceberg schema) uses Int32 while Materialize rows still carry Datum::Int16. Add a lossless Int16 -> Int32 promotion in ArrowColumn::append_datum, mirroring the existing UInt16 -> Int32 case.

  2. "Field 'value' missing extension metadata" — Materialize names map fields entries/keys/values, but iceberg-rust's arrow conversion names them key_value/key/value. merge_field_metadata_recursive matched by name and silently dropped the value field's extension metadata, so ArrowBuilder later failed when constructing the inner builder. Match the map entries struct positionally instead.

  3. "Failed to create EqualityDeleteWriterConfig: field_id N not found" — the planner accepted Range types as Iceberg equality delete keys, but ranges lower into Iceberg structs and iceberg-rust's RecordBatchProjector skips nested fields, so the equality field id is unreachable at runtime. Drop Range from the allow-list so the failure is caught at sink creation instead.

Fixes SS-144, SS-143, SS-142

claude added 3 commits May 11, 2026 01:33
Explain why positional matching in `merge_map_entries_metadata` is correct
by citing the Arrow `Schema.fbs` definition of `Map` as
`List<entries: Struct<key, value>>` with non-enforced field names.
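
To illustrate the cited definition, a hedged arrow-rs snippet (not code from this PR): both map types below share the physical layout `List<entries: Struct<key, value>>` and differ only in the non-enforced field names, so matching their fields by name finds nothing in common:

```rust
use std::sync::Arc;
use arrow::datatypes::{DataType, Field, Fields};

fn string_map(entry: &str, key: &str, value: &str) -> DataType {
    DataType::Map(
        Arc::new(Field::new(
            entry,
            DataType::Struct(Fields::from(vec![
                Field::new(key, DataType::Utf8, false), // map keys are non-null
                Field::new(value, DataType::Utf8, true),
            ])),
            false,
        )),
        false, // keys_sorted
    )
}

fn main() {
    let materialize_shaped = string_map("entries", "keys", "values");
    let iceberg_shaped = string_map("key_value", "key", "value");
    // Same layout, different names: name-based matching is the wrong join key.
    assert_ne!(materialize_shaped, iceberg_shaped);
}
```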
Pins the three runtime failures fixed in the previous commit:

- `merge_map_entries_preserves_value_extension_metadata`: unit test that
  builds a materialize-shaped map field (entries/keys/values) and an
  iceberg-shaped one (key_value/key/value) and asserts the merge copies
  the value field's extension metadata positionally.
- `test/iceberg/key-validation.td`: adds a Range key rejection block
  alongside the existing Map/List rejections.
- `test/iceberg/catalog.td`: adds smallint and map[text=>text] sinks
  exercising the Int16->Int32 promotion and the entries-struct metadata
  merge end-to-end against a real Iceberg catalog.
@DAlperin DAlperin requested a review from def- May 11, 2026 02:13
@DAlperin DAlperin requested review from a team as code owners May 11, 2026 02:13
claude added 2 commits May 11, 2026 04:28
DuckDB's iceberg_scan returns 0 rows for map-valued tables in the
versions we test against, so the round-trip via map_keys/map_values was
not actually exercising the metadata merge — the assertion just failed
with no actionable signal. Check mz_sink_statuses for `running` instead:
without `merge_map_entries_metadata` the sink stalls with "Field 'value'
missing extension metadata" during ArrowBuilder construction, which is
exactly the regression we want to pin.
@DAlperin DAlperin marked this pull request as draft May 11, 2026 05:16
builder_for_datatype was hard-coding MapFieldNames::default()
(entries/keys/values) when constructing the MapBuilder, regardless of
what the surrounding Schema actually said. For Iceberg the schema's map
fields are key_value/key/value (preserved by merge_map_entries_metadata),
so the resulting MapArray's nested DataType disagreed with the schema and
RecordBatch::try_new rejected every row — the sink stalled silently and
iceberg_scan saw an empty table.

Read entry/key/value names off the schema's entries struct so the
MapArray matches whichever convention the caller chose. COPY TO S3
PARQUET keeps building its schema with the arrow-rs defaults, so its
output is unchanged.

Also restores the DuckDB iceberg_scan assertion on the map_table sink in
catalog.td now that the round-trip actually works.
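
A hedged sketch of the shape of that change, using arrow-rs's `MapBuilder`/`MapFieldNames`; `names_from_schema` and `string_map_builder` are hypothetical helpers, not Materialize's actual `builder_for_datatype`:

```rust
use arrow::array::{MapBuilder, MapFieldNames, StringBuilder};
use arrow::datatypes::DataType;

/// Read the entry/key/value names off the schema's map type instead of
/// assuming the arrow-rs defaults (entries/keys/values).
fn names_from_schema(map_type: &DataType) -> Option<MapFieldNames> {
    let DataType::Map(entries, _keys_sorted) = map_type else {
        return None;
    };
    let DataType::Struct(kv) = entries.data_type() else {
        return None;
    };
    Some(MapFieldNames {
        entry: entries.name().to_string(),
        key: kv.first()?.name().to_string(),
        value: kv.get(1)?.name().to_string(),
    })
}

fn string_map_builder(map_type: &DataType) -> MapBuilder<StringBuilder, StringBuilder> {
    // With an Iceberg-shaped schema this yields key_value/key/value, so the
    // built MapArray's DataType agrees with the schema and
    // RecordBatch::try_new accepts the columns.
    MapBuilder::new(
        names_from_schema(map_type),
        StringBuilder::new(),
        StringBuilder::new(),
    )
}
```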
@def- (Contributor) left a comment
Thanks for adding tests. I will rebase my iceberg sink test in parallel-workload on top of this and #36499 and report if anything else falls over. Edit: Just noticed you made it a draft again, so I'll wait until it's un-drafted before doing that.

@DAlperin DAlperin marked this pull request as ready for review May 11, 2026 14:44
@DAlperin DAlperin requested a review from a team as a code owner May 11, 2026 14:44
@def- (Contributor) commented May 11, 2026

New failure in https://buildkite.com/materialize/nightly/builds/16384:

Sinks in stalled state: icesink-18: iceberg: failed to convert row to recordbatch: Failed to add insert row to builder: cannot represent decimal value 494699.19575454644 in column with scale 10
