feat: fused WideDecimalBinaryExpr for Decimal128 add/sub/mul#3619
Draft
andygrove wants to merge 3 commits intoapache:mainfrom
Draft
feat: fused WideDecimalBinaryExpr for Decimal128 add/sub/mul#3619andygrove wants to merge 3 commits intoapache:mainfrom
andygrove wants to merge 3 commits intoapache:mainfrom
Conversation
Replace the 4-node expression tree (Cast→BinaryExpr→Cast→Cast) used for Decimal128 arithmetic that may overflow with a single fused expression that performs i256 register arithmetic directly. This reduces per-batch allocation from 4 intermediate arrays (112 bytes/elem) to 1 output array (16 bytes/elem). The new WideDecimalBinaryExpr evaluates children, performs add/sub/mul using i256 intermediates via try_binary, applies scale adjustment with HALF_UP rounding, checks precision bounds, and outputs a single Decimal128 array. Follows the same pattern as decimal_div.
Add benchmark comparing old Cast->BinaryExpr->Cast chain vs fused WideDecimalBinaryExpr for Decimal128 add/sub/mul. Covers four cases: add with same scale, add with different scales, multiply, and subtract.
cb52636 to
d7495bd
Compare
Member
Author
|
@sqlbenchmark run tpch --iterations 3 |
1 similar comment
Member
Author
|
@sqlbenchmark run tpch --iterations 3 |
Eliminate redundant CheckOverflow when wrapping WideDecimalBinaryExpr (which already handles overflow). Fuse Cast(Decimal128→Decimal128) + CheckOverflow into a single DecimalRescaleCheckOverflow expression that rescales and validates precision in one pass.
5a21500 to
91092a6
Compare
Member
Author
|
@sqlbenchmark run tpch --iterations 3 |
|
Benchmark job |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cast(Decimal256→Decimal128, BinaryExpr(op, Cast(Decimal128→Decimal256, left), Cast(Decimal128→Decimal256, right)))) used for Decimal128 add/sub/mul that may overflow with a single fusedWideDecimalBinaryExprthat performs i256 register arithmetic directlyBenchmark results (8192 element batches)
How it works
WideDecimalBinaryExprevaluates left/right children, performs add/sub/mul usingi256intermediates viaarrow::compute::kernels::arity::try_binary, applies scale adjustment with HALF_UP rounding, checks precision bounds, and outputs a singleDecimal128array. Follows the same pattern asdecimal_divindiv.rs.Overflow handling matches existing behavior:
ArrowError::ComputeErrori128::MAXsentinel +null_if_overflow_precisionTest plan
cargo clippy --all-targets --workspace -- -D warningspassescargo testpassesCometExpressionSuite) pass unchanged