perf: Optimize lpad, rpad for scalar args #20657
Open
neilconway wants to merge 2 commits into apache:main
Which issue does this PR close?
Rationale for this change
lpad and rpad are commonly called with constant (scalar) target length and fill arguments, e.g. lpad(column, 20, '0'). We can special-case this scenario to improve performance by avoiding the overhead of make_scalar_function, and also by precomputing the padding buffer and reusing it for each row.

For scalar args, this improves performance by ~65% for ASCII inputs and ~41% for Unicode inputs.
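To make the idea concrete, here is a minimal standalone sketch of a scalar fast path for left-padding. It is not the code in this PR: `lpad_scalar` and `MAX_FAST_PATH_LEN` are hypothetical names, it works on plain `&str` slices rather than Arrow arrays, and it assumes the size cap is applied to the target length in characters (the PR's exact unit is not shown here). The padding buffer and its character-boundary offsets are built once, and each row then copies a prefix of that buffer.

```rust
/// Assumed cap for the fast path (hypothetical; the real limit's unit may differ).
const MAX_FAST_PATH_LEN: usize = 8 * 1024;

/// Hypothetical helper: left-pad every row to `target_len` characters with `fill`.
/// Returns `None` when the target is too large, so a caller could fall back
/// to a general per-row implementation.
fn lpad_scalar(rows: &[&str], target_len: usize, fill: &str) -> Option<Vec<String>> {
    if target_len > MAX_FAST_PATH_LEN {
        return None;
    }

    // Precompute once: `target_len` characters of repeated fill, plus the
    // byte offset after each character so we can slice on char boundaries.
    let fill_chars: Vec<char> = fill.chars().collect();
    let mut padding = String::new();
    let mut offsets = vec![0usize];
    if !fill_chars.is_empty() {
        for i in 0..target_len {
            padding.push(fill_chars[i % fill_chars.len()]);
            offsets.push(padding.len());
        }
    }

    let out = rows
        .iter()
        .map(|s| {
            let n = s.chars().count();
            if n >= target_len {
                // Input is already long enough: truncate to target_len characters.
                s.chars().take(target_len).collect::<String>()
            } else if fill_chars.is_empty() {
                // Nothing to pad with: return the input unchanged.
                s.to_string()
            } else {
                // Reuse the shared buffer: copy only the prefix this row needs.
                let pad_bytes = offsets[target_len - n];
                let mut row = String::with_capacity(pad_bytes + s.len());
                row.push_str(&padding[..pad_bytes]);
                row.push_str(s);
                row
            }
        })
        .collect();
    Some(out)
}
```

With these assumptions, `lpad_scalar(&["hi"], 5, "xy")` yields `Some(vec!["xyxhi"])`, and a target length above the cap yields `None`. An ASCII-only variant could skip the offsets table entirely, since byte and character positions coincide.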
What changes are included in this PR?
- Add a fast path for lpad and rpad that precomputes a padding buffer. We only use the fast path if the requested pad length is reasonably small (<= 8KB), to avoid using too much memory on a scratch buffer.
- Add try_as_scalar_str and try_as_scalar_i64 helpers.
- Use target_len consistently instead of length, because the latter is ambiguous.
- Make rpad and lpad more similar by removing needless and probably unintended differences between the two functions. We could go further and refactor them to remove the redundancy, but I won't attempt that for now.
Are these changes tested?
Yes; covered by existing tests. Added new benchmarks.
Are there any user-facing changes?
No.
AI usage
Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.