@whitehawk commented Oct 17, 2025

Implement cleanup and rollback options for the cluster shrink

In this patch:

  1. A new option, '--clean', is added to the ggrebalance tool for the cluster
    shrink.
  2. A new option, '--rollback', is added to the ggrebalance tool for the cluster
    shrink.
  3. A new option, '--non-interactive-mode', is added to the ggrebalance tool. It
    is needed for automated testing of cleanup scenarios that would otherwise wait
    for user confirmation.
  4. Because the existing 'main' and the new 'rollback' shrink workflows share
    functionality, the shrink code is reorganized to reduce code duplication:
    a. New functions used by both the 'main' and 'rollback' workflows are
    introduced (for example, 'prepare_shrink_schema()' and 'rebalance_tables()').
    b. All logic for handling the ggrebalance schema is moved to a separate class,
    'RebalanceSchema', in 'rebalance_commons.py'.
  5. A new entity, 'Plan', is added. It passes the required shrink configuration of
    the target cluster to the shrink engine. The plan is stored in the rebalance
    schema and is used by the 'rollback' workflow and when recovering from an
    interrupted shrink. It is added for the following reasons:
    a. As stated above, it is needed for rollback. When the user starts a rollback,
    they do not specify the target segment count that was used by the preceding
    shrink. This information must therefore be saved during the shrink for later
    use.
    b. When the user re-enters the shrink procedure from an interrupted state, the
    shrink must restart with the same target segment count that was originally
    specified. Otherwise the cluster may end up in an invalid configuration, with
    tables shrunk to different segment counts. Letting the user specify the target
    segment count on re-entry would open the way to such error-prone scenarios.
    So specifying the segment count is forbidden when re-entering an interrupted
    shrink or starting a rollback; the plan saved at the very first launch is used
    instead.
    c. According to the current design, a later phase will introduce a Planner
    entity that performs planning for all shrink, expand, and rebalance operations.
    Its output Plan will be the input to the shrink engine, so this change is
    aligned with the overall design.
  6. New behave test cases are added. They cover not only the 'cleanup' and
    'rollback' flows but also the existing 'main' shrink flow, because the
    correctness of rollback cannot be guaranteed without proving that the 'main'
    flow works correctly. The existing test case is renamed to 'test 2.4' and moved
    next to the new tests that cover similar functionality.
  7. New steps that verify that the shrunk segments are actually down are added to
    mgmt_utils.py. 'SegmentIsShutDown' also gets a small change, which is needed to
    check that the mirror is down.
  8. To recover correctly when the shrink is interrupted while stopping the shrunk
    segments, a new class, 'SegmentStopAfterShrink', is introduced. It wraps
    'SegmentStop' with a check of whether the segment is actually still running.
    Without it, if the shrink was re-entered after an interrupted launch had already
    shut down some segments, trying to shut those segments down again produced an
    error.
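Items 1 to 3 add command-line options and a way to skip user confirmation. The sketch below shows one way such options and a confirmation gate could be wired with Python's `argparse`. Only the three option names come from this PR; the `confirm()` helper, its injectable `ask` callback, and the assumption that non-interactive mode treats every prompt as answered "yes" are hypothetical and not taken from the ggrebalance source.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Declare the three options named in the PR; help texts are illustrative."""
    parser = argparse.ArgumentParser(prog="ggrebalance")
    parser.add_argument("--clean", action="store_true",
                        help="clean up after a cluster shrink")
    parser.add_argument("--rollback", action="store_true",
                        help="roll back a cluster shrink")
    # argparse stores this flag as args.non_interactive_mode
    parser.add_argument("--non-interactive-mode", action="store_true",
                        help="do not wait for user confirmation")
    return parser


def confirm(args: argparse.Namespace, question: str,
            ask: Callable[[str], str] = input) -> bool:
    """HYPOTHETICAL helper: ask for confirmation unless non-interactive mode is on.

    ASSUMPTION: in non-interactive mode every prompt counts as answered "yes",
    which is what lets automated tests drive cleanup scenarios unattended.
    """
    if args.non_interactive_mode:
        return True
    answer = ask(f"{question} [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing `ask` as a parameter (defaulting to the built-in `input`) keeps the interactive path testable too: a test can inject a canned answer instead of reading from stdin.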
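Item 5 describes saving a Plan at the first shrink launch and reusing it on rollback or re-entry, while forbidding a user-supplied segment count in those cases. The sketch below illustrates that rule under stated assumptions: the `target_segment_count` field, the `PlanStore` class, and `resolve_plan()` are hypothetical names, and an in-memory `sqlite3` table stands in for the rebalance schema purely to keep the example runnable.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    # HYPOTHETICAL field: the PR only says the plan carries the shrink configuration.
    target_segment_count: int


class PlanStore:
    """Stand-in for the rebalance schema; holds at most one saved plan."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS plan "
            "(id INTEGER PRIMARY KEY CHECK (id = 1), body TEXT NOT NULL)")

    def save(self, plan: Plan) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.execute("INSERT INTO plan (id, body) VALUES (1, ?)",
                              (json.dumps(asdict(plan)),))

    def load(self) -> Optional[Plan]:
        row = self.conn.execute("SELECT body FROM plan WHERE id = 1").fetchone()
        return Plan(**json.loads(row[0])) if row else None


def resolve_plan(store: PlanStore, requested_count: Optional[int],
                 rollback: bool) -> Plan:
    """Return the plan to execute, enforcing the re-entry/rollback rule."""
    saved = store.load()
    if saved is not None or rollback:
        # Re-entry or rollback: the original target must not be overridden.
        if requested_count is not None:
            raise ValueError("segment count cannot be set on re-entry or rollback")
        if saved is None:
            raise RuntimeError("no saved plan to roll back")
        return saved
    # First launch: the user must say what to shrink to, and we persist it.
    if requested_count is None:
        raise ValueError("target segment count is required for a new shrink")
    plan = Plan(target_segment_count=requested_count)
    store.save(plan)
    return plan
```

Keeping the check in a single function means the first launch, a re-entry, and a rollback all go through the same decision, so a second launch can never silently pick a different target segment count.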
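Item 8 makes stopping the shrunk segments safe to repeat after an interruption. The sketch below shows the wrapping idea in isolation. The class names match the PR, but their constructors and the `running` set of data directories are hypothetical stand-ins: the real classes check and stop actual segment processes, and their signatures are not shown in this description.

```python
from typing import Callable, Set


class SegmentStop:
    """HYPOTHETICAL stand-in for the real 'SegmentStop' operation.

    It fails when asked to stop a segment that is already down, which is the
    behaviour that broke re-entry. 'running' models the live data directories.
    """

    def __init__(self, datadir: str, running: Set[str]) -> None:
        self.datadir = datadir
        self.running = running

    def run(self) -> None:
        if self.datadir not in self.running:
            raise RuntimeError(f"segment {self.datadir} is not running")
        self.running.discard(self.datadir)


class SegmentStopAfterShrink:
    """Wrap SegmentStop with a check of whether the segment is still running."""

    def __init__(self, datadir: str, running: Set[str],
                 is_running: Callable[[str], bool]) -> None:
        self.datadir = datadir
        self.is_running = is_running
        self.inner = SegmentStop(datadir, running)

    def run(self) -> bool:
        """Return True if a stop was issued, False if the segment was already down."""
        if not self.is_running(self.datadir):
            return False  # an earlier, interrupted launch already stopped it
        self.inner.run()
        return True
```

For example, if an interrupted launch already stopped `/data/seg2` (a made-up path), re-running the wrapper over both `/data/seg2` and `/data/seg3` skips the first and stops only the second, instead of failing on the first.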

@whitehawk changed the title from "DRAFT - Adbdev 8485 2" to "Implement cleanup and rollback options for the cluster shrink" Oct 20, 2025
@whitehawk marked this pull request as ready for review October 20, 2025 05:15

@bimboterminator1 left a comment

Ready to approve

@whitehawk merged commit 14f4669 into feature/ADBDEV-6608 Oct 24, 2025
1 check passed
@whitehawk deleted the ADBDEV-8485-2 branch October 24, 2025 07:35