Add Kafka ingestion support for subset partitions#17587

Open
xiangfu0 wants to merge 2 commits into apache:master from xiangfu0:kafka-subset-partitions

Conversation

@xiangfu0 xiangfu0 commented Jan 27, 2026

Summary

Add support for Kafka partition-subset realtime ingestion so Pinot can assign and consume only selected topic partitions for a table.

Changes

  • Add stream.kafka.partition.ids parser/validation utilities in pinot-kafka-base to interpret configured partition subsets.
  • Update controller assignment logic (segment and instance selectors) to support partition-group assignment across subset partitions.
  • Update Kafka metadata providers (pinot-kafka-3.0 and pinot-kafka-4.0) to:
    • honor configured partition subsets in partition counts/group metadata
    • validate configured IDs against topic metadata
    • support stable subset-based partition-group behavior
  • Add unit tests for subset parsing and Kafka metadata-provider partition selection.
  • Add quickstart examples for split-topic ingestion:
    • fineFoodReviews-part-0
    • fineFoodReviews-part-1
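
The parsing and validation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's `KafkaPartitionSubsetUtils`: the class and method names are hypothetical, and it assumes that `stream.kafka.partition.ids` holds a comma-separated list of non-negative integers and that a blank value means "consume every partition".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not the class added by this PR. */
final class PartitionSubsetSketch {
  private PartitionSubsetSketch() {
  }

  /**
   * Parses a comma-separated value such as "5, 0,2" into sorted, de-duplicated IDs.
   * Returns an empty list for a null or blank value (assumed to mean "all partitions").
   */
  static List<Integer> parsePartitionIds(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return Collections.emptyList();
    }
    Set<Integer> ids = new TreeSet<>();
    for (String token : raw.split(",")) {
      String trimmed = token.trim();
      if (trimmed.isEmpty()) {
        throw new IllegalArgumentException("Empty partition ID in: " + raw);
      }
      int id;
      try {
        id = Integer.parseInt(trimmed);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid partition ID '" + trimmed + "' in: " + raw, e);
      }
      if (id < 0) {
        throw new IllegalArgumentException("Negative partition ID " + id + " in: " + raw);
      }
      ids.add(id);
    }
    return new ArrayList<>(ids);
  }

  /** Rejects configured IDs that are not present in the topic's partition metadata. */
  static void validateAgainstTopic(List<Integer> configuredIds, Set<Integer> topicPartitionIds) {
    List<Integer> missing = new ArrayList<>(configuredIds);
    missing.removeAll(topicPartitionIds);
    if (!missing.isEmpty()) {
      throw new IllegalStateException("Configured partition IDs not found in topic: " + missing);
    }
  }
}
```

Sorting the IDs in a `TreeSet` is one way to get a stable ordering regardless of how the list is written in the table config; whether the PR orders them the same way is not stated here.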

@xiangfu0 xiangfu0 requested a review from Copilot January 27, 2026 16:30
@xiangfu0 xiangfu0 added feature release-notes kafka ingestion labels Jan 27, 2026

Copilot AI left a comment

Pull request overview

This PR adds support for configuring Kafka ingestion to consume only a subset of topic partitions via the stream.kafka.partition.ids configuration property. This enables multiple tables to share a single Kafka topic by consuming different partitions.

Changes:

  • Added stream.kafka.partition.ids configuration property and parsing utilities
  • Modified Kafka metadata providers (2.0 and 3.0) to validate and respect partition subsets
  • Updated instance assignment logic to support non-contiguous partition IDs
  • Added comprehensive unit tests and example configurations
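
To make the shared-topic scenario concrete, here is a small sketch of two stream config maps that point at the same topic with different partition subsets. Only the `stream.kafka.partition.ids` key comes from this PR; the topic name `sharedTopic`, the comma-separated value format, the helper names, and the overlap check itself are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of two tables sharing one Kafka topic. */
final class SharedTopicConfigSketch {
  static final String PARTITION_IDS_KEY = "stream.kafka.partition.ids";

  private SharedTopicConfigSketch() {
  }

  /** Builds a minimal stream config map for one table (other required keys omitted). */
  static Map<String, String> streamConfig(String topic, String partitionIds) {
    Map<String, String> config = new HashMap<>();
    config.put("streamType", "kafka");
    config.put("stream.kafka.topic.name", topic);
    config.put(PARTITION_IDS_KEY, partitionIds);
    return config;
  }

  /** Reads the configured subset, assuming a comma-separated list of integers. */
  static Set<Integer> configuredIds(Map<String, String> streamConfig) {
    Set<Integer> ids = new TreeSet<>();
    String raw = streamConfig.getOrDefault(PARTITION_IDS_KEY, "");
    for (String token : raw.split(",")) {
      if (!token.trim().isEmpty()) {
        ids.add(Integer.parseInt(token.trim()));
      }
    }
    return ids;
  }

  /** Returns true when two tables would consume at least one common partition. */
  static boolean subsetsOverlap(Map<String, String> a, Map<String, String> b) {
    return !Collections.disjoint(configuredIds(a), configuredIds(b));
  }
}
```

For example, `streamConfig("sharedTopic", "0,2")` and `streamConfig("sharedTopic", "1,3")` describe two tables that split a four-partition topic without overlap. Whether overlapping subsets should be rejected is a deployment choice that this PR description does not address.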

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pinot-kafka-base/KafkaStreamConfigProperties.java | Defines new PARTITION_IDS constant for subset configuration |
| pinot-kafka-base/KafkaPartitionSubsetUtils.java | Implements parsing, validation, and deduplication of partition ID lists |
| pinot-kafka-base/KafkaPartitionSubsetUtilsTest.java | Comprehensive unit tests for partition ID parsing |
| pinot-kafka-base/KafkaPartitionLevelStreamConfig.java | Exposes stream config map for partition subset utilities |
| pinot-kafka-2.0/KafkaStreamMetadataProvider.java | Overrides partition methods to validate and return subset partitions |
| pinot-kafka-3.0/KafkaStreamMetadataProvider.java | Mirrors 2.0 implementation for Kafka 3.0 compatibility |
| pinot-kafka-2.0/KafkaPartitionLevelConsumerTest.java | Tests subset validation and partition count/ID fetching |
| pinot-kafka-2.0/README.md | Documents the partition subset feature |
| InstanceReplicaGroupPartitionSelector.java | Supports explicit partition IDs in instance assignment |
| ImplicitRealtimeTablePartitionSelector.java | Fetches and uses stream partition IDs for instance assignment |
| RealtimeSegmentAssignment.java | Updates segment assignment to handle non-contiguous partition IDs |
| InstanceAssignmentTest.java | Tests single-partition subset with non-zero ID |
| QuickStartBase.java | Adds fineFoodReviews-part-0 and fineFoodReviews-part-1 examples |
| examples/stream/subsetPartitions/* | Example configuration and documentation |
| examples/stream/fineFoodReviews-part-/ | Demo tables consuming single partitions |

codecov-commenter commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 64.77273% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.30%. Comparing base (7f70124) to head (c7bace2).
⚠️ Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...in/stream/kafka30/KafkaStreamMetadataProvider.java | 40.90% | 25 Missing and 1 partial ⚠️ |
| .../core/realtime/PinotLLCRealtimeSegmentManager.java | 76.19% | 5 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17587      +/-   ##
============================================
+ Coverage     63.21%   63.30%   +0.09%     
  Complexity     1454     1454              
============================================
  Files          3183     3186       +3     
  Lines        191500   191628     +128     
  Branches      29289    29316      +27     
============================================
+ Hits         121048   121304     +256     
+ Misses        60986    60834     -152     
- Partials       9466     9490      +24     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.21% <64.77%> (+0.02%) ⬆️ |
| java-21 | 63.25% <64.77%> (+0.07%) ⬆️ |
| temurin | 63.30% <64.77%> (+0.09%) ⬆️ |
| unittests | 63.29% <64.77%> (+0.09%) ⬆️ |
| unittests1 | 55.61% <ø> (+0.03%) ⬆️ |
| unittests2 | 34.20% <64.77%> (+0.07%) ⬆️ |

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from ab92eca to 3760d06 Compare January 28, 2026 05:45
@xiangfu0 xiangfu0 requested a review from Copilot January 28, 2026 10:07

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 3760d06 to 4235903 Compare January 28, 2026 13:01
@xiangfu0 xiangfu0 requested a review from Copilot January 28, 2026 13:01

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from 0379c91 to 0374b0b Compare January 28, 2026 17:18
@xiangfu0 xiangfu0 requested a review from Copilot January 29, 2026 13:34

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 9220944 to d045fa9 Compare January 29, 2026 16:09
@xiangfu0 xiangfu0 requested a review from Copilot January 29, 2026 16:25

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 03fed47 to 3855de1 Compare February 3, 2026 16:36
@xiangfu0 xiangfu0 requested a review from Jackie-Jiang February 3, 2026 16:39
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 3855de1 to 548b943 Compare February 5, 2026 11:29

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 8566b36 to 7ba5447 Compare February 24, 2026 18:54
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from bac1ecf to 32fa5a1 Compare February 26, 2026 15:26
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 32fa5a1 to 159ba84 Compare February 27, 2026 21:51

@Jackie-Jiang Jackie-Jiang left a comment

Discussed offline. Changing the InstancePartitions format can be risky because it is involved in partition handling, where numPartitions is used.
Alternatively, we can use a different setup that avoids changing it: multiple tables share the same InstancePartitions, and each consumes a subset of the partitions. The only change needed is then on the segment creation side, where we create segments only for the configured partitions. Everything else works automatically.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 49b4b3e to 5f1d26f Compare February 28, 2026 08:24
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from ca98a53 to 7202b65 Compare March 1, 2026 03:19
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 702cbdf to ad40be3 Compare March 2, 2026 21:21
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from ad40be3 to a5e8de3 Compare March 2, 2026 23:26
… instance assignment

Instead of keying instance partitions by subset partition IDs, use the total Kafka
partition count so that subset tables share the same InstancePartitions layout as
full-consumption tables. The only change on the segment creation side is that
computePartitionGroupMetadata() already filters to the configured subset.

Key changes:
- fetchPartitionCount()/fetchPartitionIds() return total Kafka partition count/IDs
  so ImplicitRealtimeTablePartitionSelector assigns instances for all N partitions
- PinotLLCRealtimeSegmentManager uses instancePartitions.getNumPartitions() (= total N)
  as numPartitions for segment ZK metadata, ensuring correct broker query routing
- Revert unnecessary changes to InstancePartitions, ImplicitRealtimeTablePartitionSelector,
  InstanceReplicaGroupPartitionSelector (only keep null-safety NPE fixes), and
  RealtimeSegmentAssignment (original modulo segmentPartitionId % numPartitions
  works correctly since all N partition slots are now present in instance partitions)
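
A short worked example may help show why keying on the total partition count keeps the original modulo correct. The sketch below is standalone arithmetic, not Pinot code: `slotFor` is a hypothetical name for the `segmentPartitionId % numPartitions` lookup mentioned above, and the four-partition topic with subset `{1, 3}` is an assumed scenario.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical arithmetic demo; does not use any Pinot classes. */
final class SubsetRoutingSketch {
  private SubsetRoutingSketch() {
  }

  /** Maps a segment's partition ID to an instance-partitions slot via modulo. */
  static int slotFor(int segmentPartitionId, int numPartitions) {
    return segmentPartitionId % numPartitions;
  }

  public static void main(String[] args) {
    int totalKafkaPartitions = 4;               // N: the topic's full partition count (assumed)
    List<Integer> subset = Arrays.asList(1, 3); // partitions this table consumes (assumed)

    for (int id : subset) {
      System.out.println("partition " + id
          + ": slot " + slotFor(id, totalKafkaPartitions) + " when numPartitions = N"
          + ", slot " + slotFor(id, subset.size()) + " when numPartitions = subset size");
    }
  }
}
```

With `numPartitions = 4`, partitions 1 and 3 map to slots 1 and 3, the same slots a full-consumption table would use. With `numPartitions = 2` (the subset size), both map to slot 1, which is the kind of collision that sharing the full N-slot layout avoids.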
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from a5e8de3 to c7bace2 Compare March 3, 2026 06:41
  streamConfigs.forEach(_flushThresholdUpdateManager::clearFlushThresholdUpdater);
  InstancePartitions instancePartitions = getConsumingInstancePartitions(tableConfig);
- int numPartitionGroups = consumeMeta.size();
+ int numPartitionGroups = getPartitionCountForRouting(streamConfigs, consumeMeta.size());

Can you calculate this on the caller side and pass it in? This info should be available when calculating the consumeMeta
