[Enhancement] Support Big-endian encoding for primary key in range-distribution table in share data mode. #68191
base: main
@cursor review |
…stribution table in share data mode. Why I'm doing: In the current implementation, we always encode the primary key without the BIG_ENDIAN transformation if there is only one pk column and it is not a string type. This encoding does not preserve key order, which is bad for range-distribution tables in shared-data mode. What I'm doing: We introduce the PrimaryKeyEncodingType enum class to stay compatible with existing code. V1: encode as before. V2: always encode as big-endian. Currently, V2 is supported only for range-distribution tables in shared-data mode. Signed-off-by: srlch <[email protected]>
[Java-Extensions Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass: 47 / 58 (81.03%)
Cursor Bugbot has reviewed your changes and found 1 potential issue.
  }
  auto clone_pk_column = _pk_column->clone_empty();
- TRY_CATCH_BAD_ALLOC(PrimaryKeyEncoder::encode(_pkey_schema, data, 0, data.num_rows(), clone_pk_column.get()));
+ TRY_CATCH_BAD_ALLOC(PrimaryKeyEncoder::encode(_pkey_schema, data, 0, data.num_rows(), clone_pk_column.get(), pk_encoding_type));
Cached column type may mismatch encoding type
Low Severity
The _pk_column member is cached (created only when null, at lines 42-49), but pk_encoding_type is re-fetched on every call via tablet.primary_key_encoding_type(). Since each call creates a new Tablet object, if the metadata changes between calls, the cached _pk_column type (e.g., Int32Column for V1) could mismatch the new encoding type (V2 expects BinaryColumn). The encode() function has a DCHECK at line 430 of primary_key_encoder.cpp that would fail in debug builds; release builds would have undefined behavior.
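One way to address the concern above is to remember which encoding type the cached column was built for and rebuild it when the type differs. The sketch below only illustrates that idea: `PkColumnCache`, `ColumnKind`, `Column`, and `make_pk_column` are hypothetical stand-ins, not StarRocks classes, and the enum is redeclared locally with the ORIGINAL and BIG_ENDIAN values named in the PR description.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for illustration only; these are not the StarRocks types.
enum class PrimaryKeyEncodingType { ORIGINAL, BIG_ENDIAN };
enum class ColumnKind { FIXED_WIDTH, BINARY };

struct Column {
    ColumnKind kind;
};

// Assumed mapping, taken from the review comment: a single non-string key
// stays fixed-width under ORIGINAL and becomes a binary column under BIG_ENDIAN.
inline std::unique_ptr<Column> make_pk_column(PrimaryKeyEncodingType type) {
    ColumnKind kind = (type == PrimaryKeyEncodingType::BIG_ENDIAN) ? ColumnKind::BINARY
                                                                   : ColumnKind::FIXED_WIDTH;
    return std::unique_ptr<Column>(new Column{kind});
}

class PkColumnCache {
public:
    // Returns the cached column, rebuilding it when it does not exist yet or
    // was built for a different encoding type.
    const Column& get(PrimaryKeyEncodingType type) {
        if (!_pk_column || _cached_type != type) {
            _pk_column = make_pk_column(type);
            _cached_type = type;
        }
        return *_pk_column;
    }

private:
    std::unique_ptr<Column> _pk_column;
    PrimaryKeyEncodingType _cached_type = PrimaryKeyEncodingType::ORIGINAL;
};
```

Whether a rebuild like this is needed in practice depends on whether a tablet's encoding type can actually change during the lifetime of the object that holds the cache, which the review comment does not establish.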
Why I'm doing:
In the current implementation, we always encode the primary key without the BIG_ENDIAN transformation if there is only one pk column and it is not a string type. This encoding does not preserve key order, which is bad for range-distribution tables in shared-data mode.
What I'm doing:
We introduce the PrimaryKeyEncodingType enum class to stay compatible with existing code. ORIGINAL type: encode as before.
BIG_ENDIAN type: always encode as big-endian. Currently, we only support the BIG_ENDIAN encoding type for range-distribution tables in shared-data mode.
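To illustrate why a big-endian encoding preserves key order under bytewise comparison, here is a minimal standalone sketch for a single `int32_t` key. The function names are hypothetical and this is not the `PrimaryKeyEncoder` implementation; it also assumes that signed values have their sign bit flipped before the bytes are emitted, so that negative keys sort before non-negative ones.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helpers for illustration; not the StarRocks PrimaryKeyEncoder API.

// Copies the value's in-memory bytes unchanged (host byte order).
std::string encode_raw(int32_t v) {
    std::string out(sizeof(v), '\0');
    std::memcpy(&out[0], &v, sizeof(v));
    return out;
}

// Order-preserving encoding: flip the sign bit, then emit the most
// significant byte first.
std::string encode_big_endian(int32_t v) {
    uint32_t u = static_cast<uint32_t>(v) ^ 0x80000000u;
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>((u >> (24 - 8 * i)) & 0xFFu);
    }
    return out;
}

// Bytewise comparison of two encoded keys of equal length.
bool bytes_less(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    return std::memcmp(a.data(), b.data(), a.size()) < 0;
}
```

With these helpers, `bytes_less(encode_big_endian(a), encode_big_endian(b))` agrees with `a < b` for any two `int32_t` values. On a little-endian host, `bytes_less(encode_raw(256), encode_raw(1))` is true even though 256 > 1, which is the ordering problem a range-distribution table cannot tolerate.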
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: