andygrove · 2026-03-02T17:53:38Z

Summary

Replace the 4-node expression tree (Cast(Decimal256→Decimal128, BinaryExpr(op, Cast(Decimal128→Decimal256, left), Cast(Decimal128→Decimal256, right)))) used for Decimal128 add/sub/mul that may overflow with a single fused WideDecimalBinaryExpr that performs i256 register arithmetic directly
Reduces per-batch allocation from 4 intermediate arrays (3 Decimal256 @ 32 bytes/elem + 1 Decimal128 @ 16 bytes/elem = 112 bytes/elem) to 1 output array (16 bytes/elem)
Add criterion benchmark comparing old vs fused approach

Benchmark results (8192 element batches)

Case	Old	Fused	Speedup
add (same scale)	171 µs	57 µs	3.0x
add (diff scale)	173 µs	57 µs	3.0x
multiply	361 µs	305 µs	1.2x
subtract	173 µs	58 µs	3.0x

How it works

WideDecimalBinaryExpr evaluates left/right children, performs add/sub/mul using i256 intermediates via arrow::compute::kernels::arity::try_binary, applies scale adjustment with HALF_UP rounding, checks precision bounds, and outputs a single Decimal128 array. Follows the same pattern as decimal_div in div.rs.
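The per-element work described above can be sketched roughly as follows. This is a scaled-down illustration using only the standard library, not the PR's code: `i64` stands in for the Decimal128 storage type and `i128` stands in for the `i256` intermediate. The names `rescale_half_up` and `wide_add` are hypothetical, and the sketch assumes scales small enough that powers of ten fit in `i128`.

```rust
/// Divide `value` by 10^`drop`, rounding half away from zero (HALF_UP).
fn rescale_half_up(value: i128, drop: u32) -> i128 {
    if drop == 0 {
        return value;
    }
    let divisor = 10i128.pow(drop);
    let quotient = value / divisor;
    let remainder = value % divisor;
    // A remainder of at least half the divisor rounds away from zero.
    if remainder.abs() >= divisor / 2 {
        quotient + value.signum()
    } else {
        quotient
    }
}

/// Hypothetical fused add: widen, align scales, add, rescale, bounds-check.
/// Returns None when the result does not fit in `out_precision` digits.
fn wide_add(
    left: i64,
    left_scale: u32,
    right: i64,
    right_scale: u32,
    out_precision: u32,
    out_scale: u32,
) -> Option<i64> {
    // Align both operands to the larger scale in the wide type.
    let max_scale = left_scale.max(right_scale);
    let l = left as i128 * 10i128.pow(max_scale - left_scale);
    let r = right as i128 * 10i128.pow(max_scale - right_scale);
    let sum = l + r;

    // Adjust from the aligned scale to the requested output scale.
    let adjusted = if out_scale >= max_scale {
        sum * 10i128.pow(out_scale - max_scale)
    } else {
        rescale_half_up(sum, max_scale - out_scale)
    };

    // Precision check: |result| must be below 10^precision.
    if adjusted.abs() < 10i128.pow(out_precision) {
        i64::try_from(adjusted).ok()
    } else {
        None
    }
}
```

For example, adding 123.45 (scale 2) and 0.678 (scale 3) with an output scale of 2 gives 124.128 in the wide type, which rounds to 124.13. The point of the fusion is that the widened values live only in local variables, so no intermediate array is materialized.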

Overflow handling matches existing behavior:

Ansi mode: returns ArrowError::ComputeError
Legacy/Try mode: uses i128::MAX sentinel + null_if_overflow_precision
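A minimal sketch of the two overflow paths, assuming plain slices instead of Arrow arrays. The `EvalMode` enum, `fits`, and `check_overflow` are hypothetical names for illustration, and the second pass is only a stand-in for what `null_if_overflow_precision` does on a real Decimal128 array.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum EvalMode {
    Ansi,
    Legacy,
    Try,
}

/// True when `v` fits in a decimal with `precision` digits (precision <= 38).
fn fits(v: i128, precision: u32) -> bool {
    v.unsigned_abs() < 10u128.pow(precision)
}

/// Hypothetical overflow handling over a batch of already-computed results.
fn check_overflow(
    values: &[i128],
    precision: u32,
    mode: EvalMode,
) -> Result<Vec<Option<i128>>, String> {
    // Pass 1: ANSI mode fails fast; other modes write a sentinel that can
    // never fit, because i128::MAX exceeds 10^38.
    let mut raw = Vec::with_capacity(values.len());
    for &v in values {
        if fits(v, precision) {
            raw.push(v);
        } else if mode == EvalMode::Ansi {
            return Err(format!("value {v} overflows precision {precision}"));
        } else {
            raw.push(i128::MAX);
        }
    }
    // Pass 2: turn every out-of-range value (including sentinels) into null.
    Ok(raw
        .into_iter()
        .map(|v| if fits(v, precision) { Some(v) } else { None })
        .collect())
}
```

With precision 3, the input `[5, 1000]` yields an error in `Ansi` mode and `[Some(5), None]` in `Legacy` or `Try` mode.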

Test plan

11 new Rust unit tests (add/sub/mul same/different scales, HALF_UP rounding, overflow in both modes, null propagation, edge cases)
cargo clippy --all-targets --workspace -- -D warnings passes
cargo test passes
Existing JVM tests (CometExpressionSuite) pass unchanged

Replace the 4-node expression tree (Cast→BinaryExpr→Cast→Cast) used for Decimal128 arithmetic that may overflow with a single fused expression that performs i256 register arithmetic directly. This reduces per-batch allocation from 4 intermediate arrays (112 bytes/elem) to 1 output array (16 bytes/elem). The new WideDecimalBinaryExpr evaluates children, performs add/sub/mul using i256 intermediates via try_binary, applies scale adjustment with HALF_UP rounding, checks precision bounds, and outputs a single Decimal128 array. Follows the same pattern as decimal_div.

Add benchmark comparing old Cast->BinaryExpr->Cast chain vs fused WideDecimalBinaryExpr for Decimal128 add/sub/mul. Covers four cases: add with same scale, add with different scales, multiply, and subtract.

andygrove · 2026-03-02T17:55:52Z

@sqlbenchmark run tpch --iterations 3

andygrove · 2026-03-02T18:44:22Z

@sqlbenchmark run tpch --iterations 3

Eliminate redundant CheckOverflow when wrapping WideDecimalBinaryExpr (which already handles overflow). Fuse Cast(Decimal128→Decimal128) + CheckOverflow into a single DecimalRescaleCheckOverflow expression that rescales and validates precision in one pass.
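The idea of rescaling and validating in one pass might look like the sketch below. It is an illustration under assumptions, not the PR's implementation: `rescale_check` is a hypothetical name, values are plain `Option<i128>` rather than an Arrow array, HALF_UP rounding on scale reduction is assumed, and overflow is mapped to null (the non-ANSI behavior).

```rust
/// Hypothetical single-pass rescale plus precision check.
/// Each input is visited once; no intermediate rescaled buffer is built.
fn rescale_check(
    values: &[Option<i128>],
    from_scale: u32,
    to_scale: u32,
    precision: u32,
) -> Vec<Option<i128>> {
    let bound = 10u128.pow(precision);
    values
        .iter()
        .map(|v| {
            let v = (*v)?; // nulls propagate unchanged
            let rescaled = if to_scale >= from_scale {
                // Scale up; overflow of i128 itself also becomes null.
                v.checked_mul(10i128.pow(to_scale - from_scale))?
            } else {
                // Scale down with assumed HALF_UP rounding.
                let divisor = 10i128.pow(from_scale - to_scale);
                let (q, r) = (v / divisor, v % divisor);
                if r.abs() >= divisor / 2 { q + v.signum() } else { q }
            };
            // Validate precision on the same element before moving on.
            (rescaled.unsigned_abs() < bound).then_some(rescaled)
        })
        .collect()
}
```

Going from scale 3 to scale 2 at precision 4, the value 12.345 becomes 12.35, while 99.995 rounds up to 100.00, which needs five digits and is therefore nulled.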

andygrove · 2026-03-02T20:49:24Z

@sqlbenchmark run tpch --iterations 3

sqlbenchmark · 2026-03-02T21:47:04Z

Benchmark job comet-pr-3619-c3986837690 failed due to an error.

andygrove added 2 commits March 2, 2026 10:55

feat: add criterion benchmark for wide decimal binary expr

d7495bd

andygrove force-pushed the wide-decimal-binary-expr branch from cb52636 to d7495bd Compare March 2, 2026 17:55

andygrove force-pushed the wide-decimal-binary-expr branch from 5a21500 to 91092a6 Compare March 2, 2026 18:46

feat: fused WideDecimalBinaryExpr for Decimal128 add/sub/mul #3619

andygrove wants to merge 3 commits into apache:main from andygrove:wide-decimal-binary-expr
