This page describes how Apache Fluss distributes and replicates data across a cluster. It covers bucket-based data distribution, partition management, the replication model, and how Fluss ensures high availability through leader-follower replication and the In-Sync Replica (ISR) mechanism.
Fluss distributes data using a two-level hierarchy: a table can be split into partitions, and each table or partition is divided into buckets.
Data is replicated across multiple TabletServer nodes to ensure fault tolerance. Each bucket has one leader replica and zero or more follower replicas. The leader handles all read and write operations, while followers replicate data from the leader using a dedicated fetcher mechanism (fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:150-153).
Every table in Fluss is divided into a fixed number of buckets specified at table creation time. If no bucket count is specified, the cluster uses default.bucket.number (fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java:74-82). Records are distributed to buckets using a deterministic hash function applied to the bucket key. The ReplicaManager on the TabletServer is responsible for managing these physical data structures and their lifecycles (fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java:86-109).
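The hash-based bucket assignment described above can be illustrated with a minimal sketch. The class and method names are hypothetical, and Arrays.hashCode is only a stand-in: the text does not say which hash function Fluss applies to the bucket key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not a Fluss class: maps an encoded bucket key to a bucket index. */
final class BucketAssignerSketch {

    /** Returns a bucket index in [0, numBuckets) for the given encoded bucket key. */
    static int bucketFor(byte[] bucketKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Assumption: Arrays.hashCode stands in for the hash function Fluss actually uses.
        int hash = Arrays.hashCode(bucketKey);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numBuckets);
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        int first = bucketFor(key, 8);
        int second = bucketFor(key, 8);
        // The same key always lands in the same bucket.
        System.out.println("bucket=" + first + ", stable=" + (first == second));
    }
}
```

Because the mapping depends on the bucket count, a fixed number of buckets per table keeps the assignment of existing keys stable.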
The following diagram shows how the CoordinatorServer and TabletServer interact to manage data distribution via state machines and metadata storage in ZooKeeper.
[Diagram: Data Distribution Code Map]
Sources: fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java:95-107, fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java:89-109, fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:147-163, fluss-server/src/main/java/org/apache/fluss/server/kv/KvTablet.java:107-150
For tables that maintain key-value state, a replica holds a LogTablet for the changelog and, if the replica is the leader, a KvTablet for the current state (fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:150-152).
Otherwise, a replica holds only a LogTablet (fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:152-153).
Fluss uses a leader-based replication model. The CoordinatorServer manages the lifecycle of buckets and replicas through state transitions.
The ReplicaManager on each TabletServer handles the physical instantiation of replicas. It uses NotifyLeaderAndIsrData to transition replicas between leader and follower roles (fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java:74-75).
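As a rough illustration of a role transition, the sketch below models a replica that switches between leader and follower when it receives a leadership notification. It assumes the notification carries a leader id and a leader epoch, and that stale epochs are ignored. The text does not confirm either detail for NotifyLeaderAndIsrData, and every name here is hypothetical.

```java
/** Hypothetical model of a replica's role; none of these names are Fluss APIs. */
final class ReplicaRoleSketch {
    enum Role { LEADER, FOLLOWER }

    private Role role = Role.FOLLOWER;
    private int leaderEpoch = -1;

    /**
     * Applies a leadership notification. Returns false and changes nothing when the
     * notified epoch is not newer than the epoch already applied (assumed fencing rule).
     */
    boolean apply(int notifiedEpoch, int notifiedLeaderId, int localServerId) {
        if (notifiedEpoch <= leaderEpoch) {
            return false; // stale or duplicate notification
        }
        leaderEpoch = notifiedEpoch;
        role = (notifiedLeaderId == localServerId) ? Role.LEADER : Role.FOLLOWER;
        return true;
    }

    Role role() {
        return role;
    }

    int leaderEpoch() {
        return leaderEpoch;
    }

    public static void main(String[] args) {
        ReplicaRoleSketch replica = new ReplicaRoleSketch();
        replica.apply(1, 7, 7); // local server 7 becomes leader at epoch 1
        replica.apply(2, 8, 7); // server 8 takes over at epoch 2, so server 7 becomes follower
        boolean applied = replica.apply(1, 7, 7); // delayed epoch-1 message is ignored
        System.out.println(replica.role() + " epoch=" + replica.leaderEpoch() + " applied=" + applied);
        // prints: FOLLOWER epoch=2 applied=false
    }
}
```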
Followers fetch data from the leader through a ReplicaFetcherThread (fluss-server/src/test/java/org/apache/fluss/server/replica/fetcher/ReplicaFetcherThreadTest.java:137-142).
[Diagram: Replica Leadership Transition]
Sources: fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java:112-113, fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java:74-75, fluss-server/src/test/java/org/apache/fluss/server/replica/fetcher/ReplicaFetcherThreadTest.java:137-142
The ISR is the set of replicas that are currently synchronized with the leader.
Followers pull data from the leader via ReplicaFetcherThread.
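This section defines the high watermark as the maximum offset replicated to every replica in the ISR. A minimal sketch of that definition follows, assuming the leader knows each replica's log end offset. The class, method, and parameter names are hypothetical and do not come from the Fluss code base.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not a Fluss class: derives a high watermark from ISR offsets. */
final class HighWatermarkSketch {

    /**
     * Returns the smallest log end offset among ISR members, which is the largest
     * offset that every in-sync replica has reached.
     */
    static long highWatermark(Set<Integer> isr, Map<Integer, Long> logEndOffsets) {
        if (isr.isEmpty()) {
            throw new IllegalArgumentException("ISR must not be empty");
        }
        long hw = Long.MAX_VALUE;
        for (int serverId : isr) {
            Long logEndOffset = logEndOffsets.get(serverId);
            if (logEndOffset == null) {
                throw new IllegalArgumentException("no offset known for replica " + serverId);
            }
            hw = Math.min(hw, logEndOffset);
        }
        return hw;
    }

    public static void main(String[] args) {
        // Server 3 lags far behind but is not in the ISR, so it does not hold back the result.
        Set<Integer> isr = Set.of(1, 2);
        Map<Integer, Long> offsets = Map.of(1, 120L, 2, 100L, 3, 40L);
        System.out.println(highWatermark(isr, offsets)); // prints 100
    }
}
```

The example also shows why removing a lagging follower from the ISR matters: once it is out of the set, its offset no longer limits the high watermark.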
Replicas use the leaderEpoch to prevent data inconsistency during network partitions (fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:24).
When a follower falls out of sync, an AdjustIsrRequest is sent to the coordinator to remove it from the ISR (fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java:51).
Replication progress is tracked by the highWatermark, which is the maximum offset that has been replicated to all replicas in the ISR (fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java:139).
For partitioned tables, Fluss supports both manual and automatic partitioning.
Partition keys may use types such as STRING, BIGINT, and DATE (fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java:72-78).
The AutoPartitionManager in the CoordinatorServer periodically creates new partitions based on the table's partition strategy (fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorServer.java:132).
Partitions are divided into TableBucket instances, each managed as a Replica (fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:148-152).
The RebalanceManager in the coordinator tracks the number of replicas and leaders on each TabletServer to identify imbalances (fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorServer.java:36). Rebalancing moves TableBucketReplica instances to different servers to even out the load. The coordinator generates a RebalanceTask, which is persisted in ZooKeeper (fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java:115).
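The per-server counting that the RebalanceManager relies on can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and using the gap between the busiest and the idlest server as the imbalance measure is an assumption, not the documented Fluss criterion.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper, not a Fluss class: counts placements per server. */
final class RebalanceSketch {

    /** Counts how many placements (replicas or leaders) each known server holds. */
    static Map<Integer, Integer> countPerServer(Set<Integer> servers, Collection<Integer> placements) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int server : servers) {
            counts.put(server, 0); // servers holding nothing must still be counted
        }
        for (int server : placements) {
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    /** Assumed imbalance measure: difference between the highest and lowest count. */
    static int spread(Map<Integer, Integer> counts) {
        if (counts.isEmpty()) {
            return 0;
        }
        return Collections.max(counts.values()) - Collections.min(counts.values());
    }

    public static void main(String[] args) {
        Set<Integer> servers = Set.of(1, 2, 3);
        // Three buckets with two replicas each, flattened to the hosting server ids.
        List<Integer> replicas = List.of(1, 2, 1, 3, 1, 2);
        // Server 1 leads all three buckets.
        List<Integer> leaders = List.of(1, 1, 1);
        System.out.println("replica spread=" + spread(countPerServer(servers, replicas)));
        System.out.println("leader spread=" + spread(countPerServer(servers, leaders)));
        // prints: replica spread=2, then leader spread=3
    }
}
```

Counting replicas and leaders separately matters because a server can hold an average number of replicas while still leading most of them.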
Fluss utilizes server rack information to ensure that replicas of the same bucket are not all placed on the same physical rack.
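One way to spread a bucket's replicas across racks is to pick servers rack by rack in round-robin order. The sketch below shows that idea under stated assumptions: the names are hypothetical, and the text does not describe the placement algorithm Fluss actually uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not a Fluss class: rack-aware replica placement. */
final class RackAwarePlacementSketch {

    /** Picks replicationFactor servers, cycling over racks so replicas span as many racks as possible. */
    static List<Integer> place(Map<Integer, String> serverToRack, int replicationFactor) {
        if (replicationFactor < 1 || replicationFactor > serverToRack.size()) {
            throw new IllegalArgumentException("replicationFactor must be between 1 and the server count");
        }
        // Group servers by rack; sorted maps make the result deterministic.
        Map<String, Deque<Integer>> byRack = new TreeMap<>();
        for (Map.Entry<Integer, String> entry : new TreeMap<>(serverToRack).entrySet()) {
            byRack.computeIfAbsent(entry.getValue(), rack -> new ArrayDeque<>()).add(entry.getKey());
        }
        List<Integer> chosen = new ArrayList<>();
        // Each pass takes at most one server per rack before any rack is reused.
        while (chosen.size() < replicationFactor) {
            for (Deque<Integer> servers : byRack.values()) {
                if (chosen.size() == replicationFactor) {
                    break;
                }
                Integer next = servers.poll();
                if (next != null) {
                    chosen.add(next);
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<Integer, String> racks = Map.of(1, "a", 2, "a", 3, "b", 4, "c");
        System.out.println(place(racks, 3)); // prints [1, 3, 4]
    }
}
```

With three racks and a replication factor of three, each replica lands on a different rack. A second server from a rack is only chosen once every rack has contributed one.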
Each TabletServer reports its rack property during registration with ZooKeeper (fluss-server/src/main/java/org/apache/fluss/server/tablet/TabletServer.java:106).
If the server rack information is invalid, an InvalidServerRackInfoException is thrown (fluss-server/src/main/java/org/apache/fluss/server/tablet/TabletServer.java:103-105).
Sources: fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java:74-92, fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java:45-115, fluss-server/src/main/java/org/apache/fluss/server/replica/ReplicaManager.java:74-142, fluss-server/src/main/java/org/apache/fluss/server/replica/Replica.java:147-153, fluss-server/src/main/java/org/apache/fluss/server/tablet/TabletServer.java:95-106, fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorServer.java:120-154