
Conversation


@Shekharrajak commented Sep 1, 2025

Ref https://issues.apache.org/jira/browse/FLINK-38287

This implementation adds Kafka 4.x share group semantics to Flink's Kafka connector while maintaining full backward compatibility with existing code. The changes follow KIP-932 (share groups) and the FLIP-27 source architecture, and use implicit-mode acknowledgement.

This directly addresses use cases where:

  1. Multiple consumers need to process records efficiently in parallel from one or more topics.
  2. Messages need explicit acknowledgment/release (to avoid reprocessing or to allow retries).
  3. Scaling Flink ML/LLM workloads is critical: shifting Kafka coordination and assignment logic to the broker side would simplify today's complex Flink source management, making consumption more efficient, scalable, and far less error-prone.

Operational benefits:
  • Higher throughput: ShareGroupHeartbeat helps in queue-like, maximum-throughput scenarios. Share groups distribute messages at the record level, not the partition level, so multiple readers can consume from the same topic while Kafka coordinates message distribution.
  • Better availability and flexible scaling: consumer assignment logic is simpler on the server side, and rebalancing frequency is minimized.

Let's discuss the design and how checkpointing will work when we use the KafkaShareConsumer API from Kafka 4.1.
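To make the intended developer surface concrete, here is a minimal usage sketch in Java. The builder methods (`setShareGroupId`, `setTopics`, `setValueOnlyDeserializer`) are assumptions modeled on the existing `KafkaSource` builder, not the final API of this PR:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ShareGroupSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical builder surface; the import for KafkaShareGroupSource
        // would come from this PR's connector module.
        KafkaShareGroupSource<String> source = KafkaShareGroupSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setShareGroupId("flink-share-group")   // share group, not a classic consumer group
                .setTopics("test-topic")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Parallelism can exceed the partition count: the broker distributes
        // records (not partitions) across the readers of the share group.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Share Group Source")
                .setParallelism(4)
                .print();

        env.execute("Share Group Example");
    }
}
```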


Add comprehensive support for Kafka Share Groups with queue-like message
distribution semantics. This implementation provides an alternative to
traditional partition-based consumption by distributing messages at the
record level across multiple consumers.

Key features:
- KafkaShareGroupSource using KafkaShareConsumer API (Kafka 4.1.0+)
- Automatic message distribution without partition assignment
- Share group configuration validation and error handling
- Flink SQL table integration with 'kafka-sharegroup' connector
- Comprehensive metrics and monitoring support
- Proper split management adapted for share group semantics

Components added:
- KafkaShareGroupSource: Main source implementation
- KafkaShareGroupSourceReader: Share group-aware source reader
- KafkaShareConsumerSplitReader: Split reader using KafkaShareConsumer
- KafkaShareGroupFetcherManager: Custom fetcher for share groups
- KafkaShareGroupDynamicTableFactory: SQL integration
- KafkaShareGroupSourceMetrics: Metrics collection
- Comprehensive test suite

Configuration improvements:
- Remove incompatible properties (enable.auto.commit, auto.offset.reset)
- Add share group specific validation (sketched below)
- Enhanced version detection for Kafka 4.1.0+ features
- Proper split registration synchronization
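A minimal sketch of the property sanitization described above, assuming a standalone helper class (the class name and error handling are illustrative; only the two incompatible keys come from this PR's text):

```java
import java.util.Properties;

// Illustrative only: drops consumer properties that conflict with share group
// semantics and requires a group id, mirroring the validation described above.
final class ShareGroupPropertyValidator {

    private static final String[] INCOMPATIBLE_KEYS = {
            "enable.auto.commit",  // delivery state is broker-managed for share groups
            "auto.offset.reset"    // consumers do not control offsets in share groups
    };

    static Properties sanitize(Properties userProps) {
        Properties props = new Properties();
        props.putAll(userProps);
        for (String key : INCOMPATIBLE_KEYS) {
            props.remove(key); // silently drop rather than fail
        }
        if (!props.containsKey("group.id")) {
            throw new IllegalArgumentException(
                    "A share group id ('group.id') is required for share group sources");
        }
        return props;
    }
}
```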
This commit adds full support for Kafka Share Groups (KIP-932) in Flink,
enabling message-level consumption with parallelism beyond partition limits.

Key features:
- Topic-based splits instead of partition-based for share group semantics
- Multiple readers per topic with automatic message distribution
- Proper Flink FLIP-27 connector architecture integration
- Support for subtasks > partitions use cases
- Built-in metrics and state management

Core components added:
- KafkaShareGroupSplit: Topic-based split implementation (see the sketch below)
- KafkaShareGroupEnumerator: Assigns topics to multiple readers
- KafkaShareGroupSplitReader: Uses KafkaShareConsumer.subscribe()
- KafkaShareGroupRecordEmitter: Handles deserialization
- Updated fetcher manager and source reader for proper integration

Removed obsolete KafkaShareConsumerSplitReader in favor of new architecture.
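A sketch of what a topic-based split can look like under FLIP-27, assuming the shape implied by the component list above (the actual KafkaShareGroupSplit may carry more state):

```java
import org.apache.flink.api.connector.source.SourceSplit;

// Topic-level split: unlike a partition split, it does not pin the reader to a
// partition; the broker decides which records each share consumer receives.
public class TopicShareGroupSplit implements SourceSplit {

    private final String topic;
    private final int readerIndex; // several readers can hold a split for the same topic

    public TopicShareGroupSplit(String topic, int readerIndex) {
        this.topic = topic;
        this.readerIndex = readerIndex;
    }

    @Override
    public String splitId() {
        return topic + "-" + readerIndex;
    }

    public String getTopic() {
        return topic;
    }
}
```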
- Update dependency scopes from 'provided' to 'compile' for standalone execution
- Add slf4j-simple logging dependency for debugging
- Upgrade Java target from 10 to 11 for better compatibility
- Update E2E test dependencies for share group testing
- Clean up .gitignore to properly exclude .idea directory
- Remove obsolete META-INF factory service file

These changes enable proper standalone testing and development of the
Kafka Share Group connector with source parallelism beyond partition limits.
- Enable explicit acknowledgment mode for precise record control
- Add checkpoint-based acknowledgment mechanism to prevent data loss
- Implement record release on checkpoint failures for redelivery
- Add proper error handling for InvalidRecordStateException
- Include memory leak prevention with pending record cleanup
- Add comprehensive logging for debugging acknowledgment flow
- Add checkpoint start notification to associate records with checkpoint ID
- Enhance checkpoint completion to trigger record acknowledgment
- Add checkpoint abortion handling to release records for redelivery
- Coordinate acknowledgment timing with the Flink checkpoint lifecycle (sketched below)
- Bridge communication between source reader and split readers
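A self-contained sketch of the acknowledgment/checkpoint coordination listed above; class and method names are illustrative, not the PR's actual classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Tracks which delivered records belong to which checkpoint, so they can be
// acknowledged on checkpoint completion or released for redelivery on abort.
final class CheckpointAckCoordinator<R> {

    private final Map<Long, List<R>> pendingByCheckpoint = new HashMap<>();
    private final List<R> emittedSinceLastCheckpoint = new ArrayList<>();

    void onRecordEmitted(R record) {
        emittedSinceLastCheckpoint.add(record);
    }

    // Called from snapshotState(checkpointId): associate everything emitted
    // so far with this checkpoint.
    void onCheckpointStart(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(emittedSinceLastCheckpoint));
        emittedSinceLastCheckpoint.clear();
    }

    // Called from notifyCheckpointComplete: these records are now durable
    // downstream, so it is safe to acknowledge them to the broker.
    List<R> takeRecordsToAcknowledge(long checkpointId) {
        List<R> records = pendingByCheckpoint.remove(checkpointId);
        return records == null ? List.of() : records;
    }

    // Called from notifyCheckpointAborted: release the records so the broker
    // can redeliver them, preserving at-least-once semantics.
    List<R> takeRecordsToRelease(long checkpointId) {
        List<R> records = pendingByCheckpoint.remove(checkpointId);
        return records == null ? List.of() : records;
    }
}
```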
…anager

- Add checkpoint start, complete, and abort notification methods
- Include comprehensive INFO-level logging for checkpoint tracking
- Establish coordination interface between source reader and split readers
- Provide foundation for future fetcher-level checkpoint coordination
- Add proper logger for fetcher manager operations
- Add recordSuccessfulCommit() method for tracking successful acknowledgments
- Add recordFailedCommit() method for tracking failed acknowledgments
- Include debug logging for commit operation visibility
- Complete metrics interface required by enhanced split reader
- Enable monitoring of acknowledgment success and failure rates (see the sketch below)
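A small Java sketch of those two methods against Flink's MetricGroup API; the counter names are assumptions, only the method names come from the commit:

```java
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

// Illustrative metrics holder; the PR's KafkaShareGroupSourceMetrics may
// register different names or additional metrics.
final class ShareGroupCommitMetrics {

    private final Counter successfulCommits;
    private final Counter failedCommits;

    ShareGroupCommitMetrics(MetricGroup group) {
        this.successfulCommits = group.counter("commitsSucceeded");
        this.failedCommits = group.counter("commitsFailed");
    }

    void recordSuccessfulCommit() {
        successfulCommits.inc();
    }

    void recordFailedCommit() {
        failedCommits.inc();
    }
}
```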
- Replace full record storage with memory-bounded caching approach
- Add configurable cache memory limit (flink.share.group.cache.max.memory.bytes)
- Implement record release under memory pressure to maintain at-least-once semantics
- Follow Pulsar connector pattern for lightweight acknowledgment metadata
- Add automatic cache cleanup and memory pressure handling
- Reduce memory usage from ~1KB per record to ~50 bytes per record
- Implement acknowledgment metadata storage following Pulsar connector pattern
- Add lightweight AcknowledgmentMetadata class for efficient checkpointing (sketched below)
- Update snapshotState to store metadata instead of full records
- Enhance notifyCheckpointComplete with metadata-based acknowledgment
- Add proper cleanup of acknowledgment metadata after checkpoint completion
- Maintain split state tracking for finished splits acknowledgment
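A sketch of what such a lightweight metadata record might look like; the field set is an assumption consistent with the ~50-bytes-per-record claim above:

```java
import java.io.Serializable;

// Checkpoint state stores only these few fields per record instead of the
// full ConsumerRecord payload, keeping snapshot size memory-bounded.
final class AcknowledgmentMetadata implements Serializable {
    private static final long serialVersionUID = 1L;

    final String topic;
    final int partition;
    final long offset;

    AcknowledgmentMetadata(String topic, int partition, long offset) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
    }
}
```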
- Add acknowledgment metadata tracking for Pulsar-style checkpointing
- Implement addPendingAcknowledgment method for offset tracking
- Add getLatestAcknowledgmentMetadata for checkpoint integration
- Include clearPendingAcknowledgments for cleanup after commit
- Add pending record count tracking for monitoring
- Maintain minimal state while supporting metadata-only approach
- Implement acknowledgeMessages method for metadata-only acknowledgment
- Add import for AcknowledgmentMetadata class
- Maintain compatibility with SourceReader acknowledgment pattern
- Support checkpoint notification infrastructure
- Log acknowledgment operations for debugging and monitoring
- Add ShareGroupBatchManager to control polling and store complete record batches in checkpoint state for crash recovery (see the sketch below)
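A minimal sketch of the batch-gating idea behind ShareGroupBatchManager; the class name and the max-pending-batches knob are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Gates polling so the reader never holds more un-snapshotted batches than it
// can persist, working within the share consumer's commit constraints.
final class BatchGate<R> {

    private final Deque<List<R>> unsnapshottedBatches = new ArrayDeque<>();
    private final int maxPendingBatches;

    BatchGate(int maxPendingBatches) {
        this.maxPendingBatches = maxPendingBatches;
    }

    // The split reader checks this before each poll().
    boolean mayFetch() {
        return unsnapshottedBatches.size() < maxPendingBatches;
    }

    void onBatchFetched(List<R> batch) {
        unsnapshottedBatches.addLast(new ArrayList<>(batch));
    }

    // Called from snapshotState: everything buffered goes into checkpoint
    // state so a crash cannot lose records that were already delivered.
    List<List<R>> drainForSnapshot() {
        List<List<R>> snapshot = new ArrayList<>(unsnapshottedBatches);
        unsnapshottedBatches.clear();
        return snapshot;
    }
}
```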

Enhanced batch management, checkpoint snapshots, and recovery operations with structured logging

Enhance KafkaShareGroupSourceReader with structured checkpoint lifecycle logging
Enhanced fetch failures, acknowledgment errors, and checkpoint abort scenarios with detailed context

boring-cyborg bot commented Sep 1, 2025

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

```diff
 <groupId>org.apache.flink</groupId>
 <artifactId>flink-streaming-java</artifactId>
-<scope>provided</scope>
+<scope>compile</scope>
```
@Shekharrajak (Author):

These changes can be reverted.

```java
 * <p>This test validates builder functionality, error handling, and property management
 * for Kafka share group source construction.
 */
@DisplayName("KafkaShareGroupSourceBuilder Tests")
```
@Shekharrajak (Author):

Test case improvements are needed; I will check and update them accordingly.

```diff
 <confluent.version>7.9.2</confluent.version>
 <flink.version>2.0.0</flink.version>
-<kafka.version>3.9.1</kafka.version>
+<kafka.version>4.1.0</kafka.version>
```
@Shekharrajak (Author):

Kafka 4.1.0 is not yet released but is expected to be available for download; in the meantime, testing is done by adding it to the classpath.

```java
 * Manages batches of records from the Kafka share consumer for checkpoint persistence.
 * Controls when new batches can be fetched to work within the share consumer's auto-commit constraints.
 */
public class ShareGroupBatchManager<K, V>
```
@Shekharrajak (Author):

ListCheckpointed will help store records that have been polled but not yet processed in Flink's persistent checkpoint state; this ensures that in case of any failure or crash, we still process the records that we read and acknowledged.
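A hedged sketch of the recovery side of that idea: records restored from checkpoint state are replayed before any fresh polling. This mirrors the ListCheckpointed pattern referenced above; a FLIP-27 reader would achieve the same via its split state. Names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Replays checkpointed-but-unprocessed records ahead of newly polled ones.
final class ReplayBuffer<R> {

    private final Deque<R> replayQueue = new ArrayDeque<>();

    // Called on restore with the records stored in the last checkpoint.
    void restore(List<R> checkpointedRecords) {
        replayQueue.addAll(checkpointedRecords);
    }

    boolean hasReplay() {
        return !replayQueue.isEmpty();
    }

    // Drain replayed records first; only then fall back to polling Kafka.
    R next() {
        return replayQueue.pollFirst();
    }
}
```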

@Shekharrajak changed the title from "[WIP] Kafka 4.x Queue Semantics support in Flink Connector Kafka" to "[WIP] Kafka 4.x Queue Semantics support" on Sep 1, 2025.
@Shekharrajak (Author):

Using Flink SQL, we can do some validation:

```sql
CREATE TABLE kafka_share_source (
    message STRING
) WITH (
    'connector' = 'kafka-sharegroup',
    'bootstrap.servers' = 'localhost:9092',
    'share-group-id' = 'flink-sql-test-group',
    'topic' = 'test-topic',
    'format' = 'raw',
    'source.parallelism' = '4'  -- 4 subtasks regardless of partition count
);

SELECT * FROM kafka_share_source;
```


```diff
-<flink.version>2.1.0</flink.version>
-<kafka.version>4.0.0</kafka.version>
+<flink.version>2.0.0</flink.version>
+<kafka.version>4.1.0</kafka.version>
```


PR 190 is also upping the Kafka client level to 4.1. If we do it here, then we should amend the NOTICE as per PR 190. fyi @tomncooper


jnh5y commented Sep 16, 2025

As a high-level note, since share groups do not use transactions, there will be some possibility for reprocessing messages. Is that ok for your use cases?

Generally, do you have any performance numbers to show that this consumer is faster? (Of course, since transactions are not available, I could imagine it being a little bit faster anyhow...)
