[Enhancement] Support Big-endian encoding for primary key in range-distribution table in share data mode. #68191

srlch · 2026-01-21T02:17:35Z

Why I'm doing:

In the current implementation, we always encode the primary key without the BIG_ENDIAN transformation if there is only one pk column and it is not a string type. Such an encoding cannot preserve the key order, which is a problem for range-distribution tables in shared-data mode, because range distribution relies on comparing encoded keys.

What I'm doing:

We introduce the PrimaryKeyEncodingType enum class to stay compatible with existing code.
ORIGINAL type: encode as before.
BIG_ENDIAN type: always encode in big-endian form. Currently, we only support the BIG_ENDIAN encoding type for range-distribution tables in shared-data mode.
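
To illustrate why byte order matters here, the following is a minimal sketch and not the actual PrimaryKeyEncoder code. The helper names are hypothetical, and flipping the sign bit for signed integers is an assumption about how an order-preserving encoding is typically built. It shows that comparing little-endian bytes can disagree with numeric order, while comparing big-endian bytes agrees with it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: writes the integer least significant byte first,
// which is the in-memory layout on a little-endian machine.
std::string encode_little_endian(int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>((u >> (8 * i)) & 0xFF);
    }
    return out;
}

// Hypothetical helper: flips the sign bit (assumption, so that negative
// values sort before positive ones) and writes most significant byte first.
std::string encode_big_endian(int32_t v) {
    uint32_t u = static_cast<uint32_t>(v) ^ 0x80000000u;
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>((u >> (8 * (3 - i))) & 0xFF);
    }
    return out;
}

// Bytewise comparison of two encoded keys of equal length.
bool bytes_less(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    return std::memcmp(a.data(), b.data(), a.size()) < 0;
}
```

For example, 1 < 256 numerically, but the little-endian bytes of 1 (`01 00 00 00`) compare greater than those of 256 (`00 01 00 00`). The big-endian forms compare in the expected order.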

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
- This pr needs auto generate documentation
This is a backport pr

Bugfix cherry-pick branch check:

chatgpt-codex-connector · 2026-01-21T02:17:42Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

StarRocks-Reviewer · 2026-01-21T07:09:20Z

@cursor review

be/src/storage/lake/rowset_update_state.cpp

StarRocks-Reviewer · 2026-01-26T03:24:23Z

@cursor review

be/test/storage/primary_key_encoder_v2_test.cpp

be/src/storage/memtable.cpp

…stribution table in share data mode. Why I'm doing: In the current implementation, we always encode the primary key without the BIG_ENDIAN transformation if there is only one pk column and it is not a string type. Such an encoding cannot preserve the key order, which is a problem for range-distribution tables in shared-data mode. What I'm doing: We introduce the PrimaryKeyEncodingType enum class to stay compatible with existing code. V1: encode as before. V2: always encode in big-endian form. Currently, we only support the V2 encoding type for range-distribution tables in shared-data mode. Signed-off-by: srlch <[email protected]>

Signed-off-by: srlch <[email protected]>

StarRocks-Reviewer · 2026-01-26T09:33:14Z

@cursor review

github-actions · 2026-01-26T09:37:29Z

[Java-Extensions Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2026-01-26T09:37:33Z

[FE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2026-01-26T09:38:57Z

[BE Incremental Coverage Report]

✅ pass : 47 / 58 (81.03%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	src/storage/local_tablet_reader.cpp	0	1	00.00%	[119]
🔵	src/storage/primary_key_encoder.cpp	11	17	64.71%	[255, 256, 408, 409, 410, 411]
🔵	src/storage/memtable.cpp	2	3	66.67%	[72]
🔵	src/storage/primary_index.cpp	5	7	71.43%	[1073, 1233]
🔵	src/storage/lake/rowset_update_state.cpp	5	6	83.33%	[329]
🔵	src/storage/lake/update_compaction_state.cpp	3	3	100.00%	[]
🔵	src/storage/rowset_column_update_state.cpp	4	4	100.00%	[]
🔵	src/storage/schema_change.cpp	2	2	100.00%	[]
🔵	src/storage/lake/delta_writer.cpp	5	5	100.00%	[]
🔵	src/storage/rowset_update_state.cpp	3	3	100.00%	[]
🔵	src/storage/lake/update_manager.cpp	1	1	100.00%	[]
🔵	src/storage/tablet_reader.cpp	2	2	100.00%	[]
🔵	src/storage/persistent_index.cpp	1	1	100.00%	[]
🔵	src/storage/lake/lake_primary_index.cpp	3	3	100.00%	[]

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

cursor · 2026-01-26T09:41:15Z

be/src/storage/lake/pk_tablet_sst_writer.cpp

    }
    auto clone_pk_column = _pk_column->clone_empty();
-    TRY_CATCH_BAD_ALLOC(PrimaryKeyEncoder::encode(_pkey_schema, data, 0, data.num_rows(), clone_pk_column.get()));
+    TRY_CATCH_BAD_ALLOC(PrimaryKeyEncoder::encode(_pkey_schema, data, 0, data.num_rows(), clone_pk_column.get(), pk_encoding_type));


Cached column type may mismatch encoding type

Low Severity

The _pk_column member is cached (created only when null, at lines 42-49), but pk_encoding_type is re-fetched on every call via tablet.primary_key_encoding_type(). Since each call creates a new Tablet object, if the metadata changes between calls, the cached _pk_column type (e.g., Int32Column for V1) could mismatch the new encoding type (V2 expects BinaryColumn). The encode() function has a DCHECK at line 430 of primary_key_encoder.cpp that would fail in debug builds; release builds would have undefined behavior.
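
One possible way to guard against this is sketched below with stand-in types: remember which encoding type the cached column was built for and rebuild it when the type changes. The `Column` classes, `create_pk_column`, and `PkColumnCache` are hypothetical and are not StarRocks APIs; only the enum name and its two values come from the PR description.

```cpp
#include <memory>

// Enum name and values as given in the PR description.
enum class PrimaryKeyEncodingType { ORIGINAL, BIG_ENDIAN };

// Stand-in column hierarchy (hypothetical, for illustration only).
struct Column {
    virtual ~Column() = default;
    virtual bool is_binary() const = 0;
};
struct Int32Column : Column {
    bool is_binary() const override { return false; }
};
struct BinaryColumn : Column {
    bool is_binary() const override { return true; }
};

// Hypothetical factory: assumes a single int32 pk column, which keeps its
// native type under ORIGINAL and becomes a binary column under BIG_ENDIAN.
std::unique_ptr<Column> create_pk_column(PrimaryKeyEncodingType type) {
    if (type == PrimaryKeyEncodingType::BIG_ENDIAN) {
        return std::make_unique<BinaryColumn>();
    }
    return std::make_unique<Int32Column>();
}

// Cache that stores the encoding type next to the column, so a changed
// encoding type invalidates the cached column instead of reusing it.
class PkColumnCache {
public:
    Column* get(PrimaryKeyEncodingType type) {
        if (!_column || _type != type) {
            _column = create_pk_column(type);
            _type = type;
            ++_rebuilds;
        }
        return _column.get();
    }
    int rebuilds() const { return _rebuilds; }

private:
    std::unique_ptr<Column> _column;
    PrimaryKeyEncodingType _type = PrimaryKeyEncodingType::ORIGINAL;
    int _rebuilds = 0;
};
```

Whether a rebuild or a hard error is the right response to a changed encoding type depends on whether the metadata is expected to change at all during the writer's lifetime.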

srlch requested review from a team as code owners January 21, 2026 02:17

mergify bot assigned srlch Jan 21, 2026

github-actions bot added the 4.1 label Jan 21, 2026

srlch requested a review from a team as a code owner January 21, 2026 07:08

cursor bot reviewed Jan 21, 2026

be/src/storage/lake/rowset_update_state.cpp (outdated)

srlch force-pushed the range_endian branch from f2bd025 to fd8a642 Compare January 26, 2026 03:23

cursor bot reviewed Jan 26, 2026

be/test/storage/primary_key_encoder_v2_test.cpp (outdated)

be/src/storage/memtable.cpp

srlch added 2 commits January 26, 2026 17:04

fix and add ut

d4f5e00

Signed-off-by: srlch <[email protected]>

srlch force-pushed the range_endian branch from fd8a642 to d4f5e00 Compare January 26, 2026 09:32

cursor bot reviewed Jan 26, 2026

2 participants
