GH-49420: [C++][Gandiva] Fix castVARCHAR memory allocation and len<=0 handling #49421

Open
dmitry-chirkov-dremio wants to merge 2 commits into apache:main from dmitry-chirkov-dremio:gandiva-castvarchar-optimization

dmitry-chirkov-dremio commented Mar 2, 2026

Rationale for this change

The castVARCHAR functions in Gandiva have memory allocation inefficiencies and missing edge case handling. See GH-49420 for details.

What changes are included in this PR?

Functional fixes:

  • bool: Remove unused 5-byte arena allocation; return string literal directly
  • int32/int64: Add handling for len=0 (return empty string) and len<0 (set error)

Memory allocation optimizations:

  • int32/int64: Use stack buffer with digit-pair conversion, allocate only min(len, actual_size) bytes
  • date64: Allocate only min(len, 10) bytes upfront (output is always "YYYY-MM-DD")
  • float32/float64: Allocate only min(len, 24) bytes upfront (max output length)
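
For readers unfamiliar with the digit-pair technique mentioned for int32/int64, here is a minimal standalone sketch. It is not the PR's implementation: all names are hypothetical, `std::string` stands in for the arena allocation, and the len<0 error path is omitted for brevity. The idea is to format into a fixed stack buffer two digits at a time, then copy only min(len, actual_size) bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Two-character decimal representations of 00..99, laid out back to back.
static const char kDigitPairs[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

constexpr int kMaxInt64Chars = 20;  // "-9223372036854775808" has 20 characters

// Fills buf from the back and returns the number of characters written;
// the text occupies the last n bytes of buf.
inline int FormatInt64Backwards(int64_t value, char (&buf)[kMaxInt64Chars]) {
  // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
  uint64_t mag = value < 0 ? 0 - static_cast<uint64_t>(value)
                           : static_cast<uint64_t>(value);
  int pos = kMaxInt64Chars;
  while (mag >= 100) {  // emit two digits per division
    const unsigned idx = static_cast<unsigned>(mag % 100) * 2;
    mag /= 100;
    buf[--pos] = kDigitPairs[idx + 1];
    buf[--pos] = kDigitPairs[idx];
  }
  if (mag >= 10) {
    const unsigned idx = static_cast<unsigned>(mag) * 2;
    buf[--pos] = kDigitPairs[idx + 1];
    buf[--pos] = kDigitPairs[idx];
  } else {
    buf[--pos] = static_cast<char>('0' + mag);
  }
  if (value < 0) buf[--pos] = '-';
  return kMaxInt64Chars - pos;
}

// Hypothetical cast: the only heap allocation is min(len, actual_size) bytes.
inline std::string CastVarcharInt64Sketch(int64_t value, int64_t len) {
  if (len <= 0) return std::string();  // error reporting for len < 0 omitted
  char buf[kMaxInt64Chars];
  const int n = FormatInt64Backwards(value, buf);
  const std::size_t out_len =
      static_cast<std::size_t>(std::min<int64_t>(len, n));
  return std::string(buf + (kMaxInt64Chars - n), out_len);
}
```

With this sketch, `CastVarcharInt64Sketch(12345, 3)` yields `"123"` and `CastVarcharInt64Sketch(12345, 10)` yields `"12345"`. The same min(len, max_size) sizing applies to the date64 and float cases, where the maximum output length is known upfront.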

Code cleanup:

  • Extract common code into helper macros to reduce duplication

Are these changes tested?

Yes. Added tests for the len=0 and len<0 edge cases for the int64, date64, float32, float64, and bool types. All existing Gandiva tests pass. Ad hoc performance benchmarking was done both through direct expression evaluation and through query execution in Dremio.

Are there any user-facing changes?

No API changes. Users will see reduced memory usage and proper error messages for invalid len parameter values.
Note: Error messages for negative len remain different between precompiled ("Output buffer length can't be negative") and interpreted ("Buffer length cannot be negative") code paths, preserving existing behavior.

dmitry-chirkov-dremio commented Mar 2, 2026

Let me work through the usual first-timer hurdles (such as pre-commit clang-format failures).
See GH-49347 for the 'aws/core/utils/pagination/Paginator.h' file not found failure.

dmitry-chirkov-dremio force-pushed the gandiva-castvarchar-optimization branch from 69b39cf to 61e5ec2 on March 2, 2026 14:53

dmitry-chirkov-dremio commented

Pushed clang-format fixes

dmitry-chirkov-dremio commented

Additional context from benchmarking expression evaluation for the scenario I was troubleshooting:

LPAD+castVARCHAR

  • Before: LPAD(castVARCHAR(int)) = 1.85x overhead vs LPAD(string)
  • After: LPAD(castVARCHAR(int)) = 1.59x overhead vs LPAD(string)
  • Improvement: ~14% reduction in the relative cost of LPAD(castVARCHAR(int)) (1.85x to 1.59x)

3-column expression with nested LPAD/CASE/CONCAT:

  • Before: ~1350 ms
  • After: ~857 ms
  • Improvement: ~37% less time (roughly a 1.58x speedup)
