viirya · 2026-03-03T00:28:52Z

What changes were proposed in this pull request?

ArrowWriter.sizeInBytes() and SliceBytesArrowOutputProcessorImpl.getBatchBytes() both accumulated per-column buffer sizes (each an Int) into an Int accumulator. When the total exceeds Int.MaxValue (about 2 GB), the sum silently wraps negative. The byte-limit checks controlled by spark.sql.execution.arrow.maxBytesPerBatch and spark.sql.execution.arrow.maxBytesPerOutputBatch then behave incorrectly and can allow oversized batches through.

Fix by changing both accumulators and return types to Long.
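
For illustration, here is a minimal, self-contained sketch of the overflow and of the widened accumulator. It is not the actual Spark code: the object name, the helper names, the sample column sizes, and the shape of the limit check are all assumptions made up for this example.

```scala
object ArrowBatchBytesOverflowSketch {
  // Hypothetical per-column buffer sizes: three columns of 1 GiB each.
  val columnSizes: Seq[Int] = Seq.fill(3)(1 << 30)

  // Pre-fix shape: Int sizes summed into an Int accumulator.
  // The running total wraps once it passes Int.MaxValue.
  def sizeInBytesInt(sizes: Seq[Int]): Int = {
    var total = 0
    for (s <- sizes) total += s
    total
  }

  // Post-fix shape: the accumulator and the return type are Long,
  // so each Int size is widened before the addition.
  def sizeInBytesLong(sizes: Seq[Int]): Long = {
    var total = 0L
    for (s <- sizes) total += s
    total
  }

  // Hypothetical byte-limit check; the real check in Spark may differ.
  def exceedsLimit(batchBytes: Long, maxBytes: Long): Boolean =
    maxBytes > 0 && batchBytes >= maxBytes

  def main(args: Array[String]): Unit = {
    val limit = 256L * 1024 * 1024 // assumed 256 MiB limit
    val wrapped = sizeInBytesInt(columnSizes)
    val widened = sizeInBytesLong(columnSizes)
    println(s"Int total:  $wrapped, exceeds limit: ${exceedsLimit(wrapped, limit)}")
    println(s"Long total: $widened, exceeds limit: ${exceedsLimit(widened, limit)}")
  }
}
```

With these assumed inputs the Int total is negative, so a "size >= limit" comparison never fires, while the Long total correctly exceeds the limit.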

Why are the changes needed?

Fix possible overflow when calculating Arrow batch bytes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6 [email protected]

viirya · 2026-03-03T00:35:08Z

This issue is reported by @sunchao.

sunchao · 2026-03-03T00:41:56Z

Thanks @viirya !

dongjoon-hyun

+1, LGTM.

`ArrowWriter.sizeInBytes()` and `SliceBytesArrowOutputProcessorImpl.getBatchBytes()` both accumulated per-column buffer sizes (each an `Int`) into an `Int` accumulator. When the total exceeds 2 GB the sum silently wraps negative, causing the byte-limit checks controlled by `spark.sql.execution.arrow.maxBytesPerBatch` and `spark.sql.execution.arrow.maxBytesPerOutputBatch` to behave incorrectly and potentially allow oversized batches through.

Fix by changing both accumulators and return types to `Long`.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

HyukjinKwon approved these changes Mar 3, 2026

sunchao approved these changes Mar 3, 2026

zhengruifeng approved these changes Mar 3, 2026

dongjoon-hyun approved these changes Mar 3, 2026

viirya force-pushed the fix-arrow-batch-bytes-overflow branch from 6a9089f to edca3d5 on March 3, 2026 03:10

yaooqinn approved these changes Mar 3, 2026

viirya force-pushed the fix-arrow-batch-bytes-overflow branch from edca3d5 to 8611e31 on March 3, 2026 06:16

[SPARK-55802][SQL] Fix integer overflow when computing Arrow batch bytes #54584

viirya wants to merge 1 commit into apache:master from viirya:fix-arrow-batch-bytes-overflow

6 participants
