GH-49420: [C++][Gandiva] Fix castVARCHAR memory allocation and len<=0 handling #49421
Open
dmitry-chirkov-dremio wants to merge 2 commits into apache:main from
Author
Let me go through the first-timer hurdles (like pre-commit clang-format failures).
69b39cf to 61e5ec2
Author
Pushed clang-format fixes.
Author
Additional context from benchmarking expression evaluation for the scenario I was troubleshooting: LPAD+castVARCHAR
3-column expression with nested LPAD/CASE/CONCAT:
Rationale for this change
The castVARCHAR functions in Gandiva have memory allocation inefficiencies and missing edge case handling. See GH-49420 for details.

What changes are included in this PR?
Functional fixes:
- bool: Remove unused 5-byte arena allocation; return string literal directly
- int32/int64: Add handling for len=0 (return empty string) and len<0 (set error)

Memory allocation optimizations:
- int32/int64: Use stack buffer with digit-pair conversion, allocate only min(len, actual_size) bytes
- date64: Allocate only min(len, 10) bytes upfront (output is always "YYYY-MM-DD")
- float32/float64: Allocate only min(len, 24) bytes upfront (max output length)

Code cleanup:
Are these changes tested?
Yes. Added tests for len=0 and len<0 edge cases for int64, date64, float32, float64, and bool types. All existing Gandiva tests pass. Ad hoc performance benchmarking was performed both via direct expression evaluation and via query execution in Dremio.

Are there any user-facing changes?
No. Users will see reduced memory usage and proper error messages for invalid len parameter values.
Note: Error messages for negative len remain different between precompiled ("Output buffer length can't be negative") and interpreted ("Buffer length cannot be negative") code paths, preserving existing behavior.
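
For readers who have not opened the diff, the int32/int64 items above (len=0, len<0, stack buffer with digit-pair conversion, allocating only min(len, actual_size) bytes) can be illustrated roughly as follows. This is a standalone sketch, not the code in this PR: `SketchContext` and `CastVarcharInt64Sketch` are hypothetical names standing in for Gandiva's execution context and the precompiled function, and the digit pairs are computed arithmetically here, whereas a real implementation may use a lookup table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the execution context: owns output buffers
// and records an error message. Not Gandiva's real API.
struct SketchContext {
  std::vector<std::unique_ptr<char[]>> buffers;
  std::string error;

  char* Allocate(int32_t size) {
    buffers.push_back(std::make_unique<char[]>(static_cast<size_t>(size)));
    return buffers.back().get();
  }
};

// Sketch of an int64 -> varchar cast that truncates to at most `len` bytes.
const char* CastVarcharInt64Sketch(SketchContext* ctx, int64_t value,
                                   int64_t len, int32_t* out_len) {
  assert(ctx != nullptr && out_len != nullptr);
  *out_len = 0;
  if (len < 0) {
    // Message taken from the precompiled path quoted in the PR description.
    ctx->error = "Output buffer length can't be negative";
    return "";
  }
  if (len == 0) return "";  // empty string, no allocation

  // Magnitude as unsigned so INT64_MIN does not overflow on negation.
  uint64_t mag = value < 0 ? 0 - static_cast<uint64_t>(value)
                           : static_cast<uint64_t>(value);

  // "-9223372036854775808" is 20 characters, the longest possible output.
  char stack_buf[20];
  int pos = static_cast<int>(sizeof(stack_buf));

  // Fill the stack buffer from the right, two digits per division.
  while (mag >= 100) {
    uint64_t pair = mag % 100;
    mag /= 100;
    stack_buf[--pos] = static_cast<char>('0' + pair % 10);
    stack_buf[--pos] = static_cast<char>('0' + pair / 10);
  }
  if (mag >= 10) {
    stack_buf[--pos] = static_cast<char>('0' + mag % 10);
    stack_buf[--pos] = static_cast<char>('0' + mag / 10);
  } else {
    stack_buf[--pos] = static_cast<char>('0' + mag);
  }
  if (value < 0) stack_buf[--pos] = '-';

  // Allocate only what will actually be returned: min(len, actual_size).
  int32_t actual = static_cast<int32_t>(sizeof(stack_buf)) - pos;
  int32_t n = static_cast<int32_t>(std::min<int64_t>(len, actual));
  char* out = ctx->Allocate(n);
  std::memcpy(out, stack_buf + pos, static_cast<size_t>(n));
  *out_len = n;
  return out;
}
```

In this sketch the arena is touched only after the final length is known, so a short `len` or a small value never reserves more bytes than it returns, and the len<=0 cases return without allocating at all.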