[Bug fix] Fault-Domain-Aware Instance Assignment failing rebalance with minimize data movement #17799
J-HowHuang wants to merge 3 commits into apache:master
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17799      +/-   ##
============================================
+ Coverage       63.21%   63.24%   +0.03%
- Complexity       1454     1456       +2
============================================
  Files            3183     3186       +3
  Lines          191500   191615     +115
  Branches        29289    29315      +26
============================================
+ Hits           121048   121183     +135
+ Misses          60986    60947      -39
- Partials         9466     9485      +19
yashmayya
left a comment
LGTM, thanks for the fix @J-HowHuang! Is it possible to add a small test for this case?
Also, IIUC, does this mean that all steady state rebalances (no instance additions / removals) with minimizeDataMovement currently fail when the instance selector is FDAwareInstancePartitionSelector?
public void fill(Map<Integer, LinkedHashSet<String>> faultDomainToCandidateInstancesMap) {
  // skip filling if there is no candidate instance, which can happen when minimize data movement is enabled and
  // no new instances are added to any pool
  if (faultDomainToCandidateInstancesMap.values().stream().allMatch(HashSet::isEmpty)) {
nit: Set::isEmpty or Collection::isEmpty would be more idiomatic rather than using HashSet::isEmpty for a LinkedHashSet.
@yashmayya Added the test and verified that it would fail without the fix.
Description
For tables using FD_AWARE_INSTANCE_PARTITION_SELECTOR as their partition selector in the instance assignment config, rebalance is likely to fail when minimizeDataMovement=true if the instances did not change in any pool. A java.util.NoSuchElementException is thrown here because the map is empty:
pinot/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/assignment/instance/FDAwareInstancePartitionSelector.java
Lines 250 to 255 in 1ebc1af
Reproduce
Run quickstart offline. For table airlineStats_OFFLINE, remove its tierConfigs and add the following instanceAssignmentConfigMap:
Run rebalance with minimize data movement enabled.
Results in
Controller log:
Change
Add a check at the start of FDAwareInstancePartitionSelector$ReplicaGroupBasedAssignmentState.fill to see whether there is any candidate instance to fill; if there is none, skip the fill.
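The guard described above can be illustrated with a small standalone sketch. This is not the Pinot implementation: the class FillGuardSketch and both pick... methods are hypothetical stand-ins, and the unguarded method only assumes that the original failure came from reading an element out of an empty candidate set. It uses Collection::isEmpty, as suggested in the review, rather than HashSet::isEmpty.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;

public class FillGuardSketch {
  // Hypothetical stand-in for unguarded fill logic: reads the first candidate of the first fault domain.
  // Throws java.util.NoSuchElementException when the map or that candidate set is empty.
  public static String pickCandidateUnguarded(Map<Integer, LinkedHashSet<String>> faultDomainToCandidates) {
    return faultDomainToCandidates.values().iterator().next().iterator().next();
  }

  // Guarded variant: returns early when no fault domain has any candidate instance.
  // allMatch is also true for an empty map, so that case is skipped as well.
  public static Optional<String> pickCandidateGuarded(Map<Integer, LinkedHashSet<String>> faultDomainToCandidates) {
    if (faultDomainToCandidates.values().stream().allMatch(Collection::isEmpty)) {
      return Optional.empty();
    }
    // At least one set is non-empty here, so scan for its first element.
    for (LinkedHashSet<String> candidates : faultDomainToCandidates.values()) {
      if (!candidates.isEmpty()) {
        return Optional.of(candidates.iterator().next());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Steady state: every fault domain is present but has no new candidate instance.
    Map<Integer, LinkedHashSet<String>> steadyState = new LinkedHashMap<>();
    steadyState.put(0, new LinkedHashSet<>());
    steadyState.put(1, new LinkedHashSet<>());
    System.out.println(pickCandidateGuarded(steadyState));

    // One new instance (made-up name) appears in fault domain 1.
    steadyState.get(1).add("Server_1");
    System.out.println(pickCandidateGuarded(steadyState));
  }
}
```

In the steady-state case the guarded method returns an empty Optional instead of throwing, which mirrors the intent of skipping the fill when there is nothing to assign.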