@srlch
Contributor

@srlch srlch commented Jan 21, 2026

Why I'm doing:

In the current implementation, the primary key is always encoded without the big-endian transformation when there is only one PK column and it is not a string type. This encoding does not preserve key order, which is a problem for range-distribution tables in shared-data mode.
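
A minimal sketch of why byte order matters here, assuming keys are compared bytewise (as with `memcmp`) and the host is little-endian. The helper names below are hypothetical and are not taken from this PR. The sign-bit flip is a common technique for making signed integers sort correctly as bytes; it is an assumption for this illustration, not a description of what `PrimaryKeyEncoder` does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copy the host representation unchanged.
// On a little-endian host the least significant byte comes first.
std::string encode_native(int32_t v) {
    std::string out(sizeof(v), '\0');
    std::memcpy(&out[0], &v, sizeof(v));
    return out;
}

// Hypothetical helper: flip the sign bit so negatives sort before
// non-negatives, then write the most significant byte first.
std::string encode_big_endian(int32_t v) {
    uint32_t u = static_cast<uint32_t>(v) ^ 0x80000000u;
    std::string out(sizeof(u), '\0');
    for (size_t i = 0; i < sizeof(u); ++i) {
        out[i] = static_cast<char>((u >> (8 * (sizeof(u) - 1 - i))) & 0xFFu);
    }
    return out;
}

// Bytewise comparison of two equal-length encoded keys.
bool bytes_less(const std::string& a, const std::string& b) {
    return std::memcmp(a.data(), b.data(), a.size()) < 0;
}

void demo() {
    // Big-endian encoding keeps numeric order under bytewise comparison.
    assert(bytes_less(encode_big_endian(1), encode_big_endian(256)));
    assert(bytes_less(encode_big_endian(-1), encode_big_endian(0)));
}
```

On a little-endian host, `encode_native(1)` is `01 00 00 00` and `encode_native(256)` is `00 01 00 00`, so a bytewise comparison places 256 before 1. That is the ordering problem a range-distributed table cannot tolerate.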

What I'm doing:

We introduce the PrimaryKeyEncodingType enum class to stay compatible with existing code.
ORIGINAL type: encode keys the previous way.
BIG_ENDIAN type: always encode keys in big-endian form. Currently, the BIG_ENDIAN encoding type is supported only for range-distribution tables in shared-data mode.
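
The selection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the enumerator names `V1` and `V2` follow the commit message later in this conversation, and `uses_raw_key` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Enum name from the PR description; enumerator names follow the commit message.
enum class PrimaryKeyEncodingType : uint8_t {
    V1 = 0,  // previous behaviour
    V2 = 1,  // always big-endian
};

// Hypothetical helper: returns true when the key is stored without the
// big-endian transformation. Under V1 that happens for a single
// non-string PK column; under V2 it never happens.
bool uses_raw_key(PrimaryKeyEncodingType type, std::size_t num_pk_columns,
                  bool first_column_is_string) {
    if (type == PrimaryKeyEncodingType::V2) {
        return false;
    }
    return num_pk_columns == 1 && !first_column_is_string;
}
```

Keeping the previous behaviour as an explicit enum value means existing tablets can continue to read keys written before the change, while new range-distribution tables opt into the order-preserving form.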

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@srlch srlch requested review from a team as code owners January 21, 2026 02:17
@mergify mergify bot assigned srlch Jan 21, 2026
@github-actions github-actions bot added the 4.1 label Jan 21, 2026
@srlch srlch requested a review from a team as a code owner January 21, 2026 07:08
@StarRocks-Reviewer

@cursor review

srlch added 2 commits January 26, 2026 17:04
…stribution table in share data mode.

Why I'm doing:
In the current implementation, the primary key is always encoded without the big-endian transformation when there is
only one PK column and it is not a string type. This encoding does not preserve key order, which is a problem for
range-distribution tables in shared-data mode.

What I'm doing:
We introduce the PrimaryKeyEncodingType enum class to stay compatible with existing code.
V1: encode keys the previous way.
V2: always encode keys in big-endian form.
Currently, the V2 encoding type is supported only for range-distribution tables in shared-data mode.

Signed-off-by: srlch <[email protected]>
@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[BE Incremental Coverage Report]

pass : 47 / 58 (81.03%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 src/storage/local_tablet_reader.cpp | 0 | 1 | 00.00% | [119] |
| 🔵 src/storage/primary_key_encoder.cpp | 11 | 17 | 64.71% | [255, 256, 408, 409, 410, 411] |
| 🔵 src/storage/memtable.cpp | 2 | 3 | 66.67% | [72] |
| 🔵 src/storage/primary_index.cpp | 5 | 7 | 71.43% | [1073, 1233] |
| 🔵 src/storage/lake/rowset_update_state.cpp | 5 | 6 | 83.33% | [329] |
| 🔵 src/storage/lake/update_compaction_state.cpp | 3 | 3 | 100.00% | [] |
| 🔵 src/storage/rowset_column_update_state.cpp | 4 | 4 | 100.00% | [] |
| 🔵 src/storage/schema_change.cpp | 2 | 2 | 100.00% | [] |
| 🔵 src/storage/lake/delta_writer.cpp | 5 | 5 | 100.00% | [] |
| 🔵 src/storage/rowset_update_state.cpp | 3 | 3 | 100.00% | [] |
| 🔵 src/storage/lake/update_manager.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/storage/tablet_reader.cpp | 2 | 2 | 100.00% | [] |
| 🔵 src/storage/persistent_index.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/storage/lake/lake_primary_index.cpp | 3 | 3 | 100.00% | [] |

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

  }
  auto clone_pk_column = _pk_column->clone_empty();
- TRY_CATCH_BAD_ALLOC(PrimaryKeyEncoder::encode(_pkey_schema, data, 0, data.num_rows(), clone_pk_column.get()));
+ TRY_CATCH_BAD_ALLOC(PrimaryKeyEncoder::encode(_pkey_schema, data, 0, data.num_rows(), clone_pk_column.get(), pk_encoding_type));
Cached column type may mismatch encoding type

Low Severity

The _pk_column member is cached (created only when null at line 42-49), but pk_encoding_type is re-fetched on every call via tablet.primary_key_encoding_type(). Since each call creates a new Tablet object, if metadata changes between calls, the cached _pk_column type (e.g., Int32Column for V1) could mismatch the new encoding type (V2 expects BinaryColumn). The encode() function has a DCHECK at line 430 of primary_key_encoder.cpp that would fail in debug builds; release builds would have undefined behavior.
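
One way to guard against the mismatch described above is to remember which encoding type the cached column was built for and rebuild it when the type changes. The sketch below only illustrates that idea: `KeyColumn` and `CachedKeyColumn` are hypothetical stand-ins, not StarRocks classes, and the `V1`/`V2` names follow the commit message.

```cpp
#include <cstdint>
#include <memory>
#include <string>

enum class PrimaryKeyEncodingType : uint8_t { V1, V2 };

// Hypothetical stand-in for the encoded-key column; only records its layout.
struct KeyColumn {
    std::string layout;  // "int32" for a raw V1 key, "binary" for a V2 key
};

class CachedKeyColumn {
public:
    // Returns the cached column, rebuilding it when the requested encoding
    // type differs from the one the cache was built for.
    const KeyColumn& get(PrimaryKeyEncodingType type) {
        if (!_column || _type != type) {
            const char* layout = (type == PrimaryKeyEncodingType::V1) ? "int32" : "binary";
            _column = std::make_unique<KeyColumn>(KeyColumn{layout});
            _type = type;
            ++_rebuilds;
        }
        return *_column;
    }

    int rebuilds() const { return _rebuilds; }

private:
    std::unique_ptr<KeyColumn> _column;
    PrimaryKeyEncodingType _type = PrimaryKeyEncodingType::V1;
    int _rebuilds = 0;
};
```

Repeated calls with the same type reuse the cached column, so the guard only costs one comparison per call. Whether such a guard is needed depends on whether the encoding type of a tablet can actually change after creation, which this conversation does not establish.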
