[bugfix] fix the race condition in waitForOperationCompletion #93

Merged
zhongkechen merged 1 commit into main from step-bug
Feb 13, 2026

@zhongkechen zhongkechen commented Feb 13, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Issue Link, if available

#72

Description

The bug is in the following lines:

if (phaser.getPhase() == ExecutionPhase.RUNNING.getValue()) {
    // Operation not done yet
    phaser.register();

Between the check that the phaser is still in the RUNNING phase and the registration of the new party, the phaser may already have advanced to the next phase. This is a classic time-of-check to time-of-use (TOCTOU) bug.

The bug can be reproduced by adding a delay between the check and the registration:

if (phaser.getPhase() == ExecutionPhase.RUNNING.getValue()) {
    try {
        Thread.sleep(2000);
    } catch (InterruptedException ignored) {
    }
    // Operation not done yet
    phaser.register();

With the delay, some test cases hang forever because a party is registered with the operation's phaser after the phase has already advanced, so the arrival it waits for never happens.
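The interleaving described above can also be replayed on a single thread, without relying on a sleep. The following is a minimal sketch, not the project's code: it assumes `phaser` is a `java.util.concurrent.Phaser`, that the operation holds one party and signals completion with `arrive()`, and that RUNNING is phase 0. The class name, the `RUNNING` constant, and the `waiterGetsStuck` helper are hypothetical. A bounded wait stands in for the indefinite hang so that the demo terminates.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CheckThenRegisterHang {
    // Hypothetical stand-in for ExecutionPhase.RUNNING.getValue().
    static final int RUNNING = 0;

    /**
     * Replays the check-then-register interleaving on one thread and reports
     * whether the waiter is still blocked after timeoutMillis.
     */
    static boolean waiterGetsStuck(long timeoutMillis) throws InterruptedException {
        // Assumption: the operation itself is the only registered party.
        Phaser phaser = new Phaser(1);

        // Waiter, time of check: the operation still looks RUNNING.
        boolean looksRunning = phaser.getPhase() == RUNNING;

        // Operation completes inside the window: the phase advances to 1.
        phaser.arrive();

        if (!looksRunning) {
            return false;
        }

        // Waiter, time of use: it registers against the phase after RUNNING.
        phaser.register();
        int arrivalPhase = phaser.arrive();
        try {
            // The operation never arrives again, so this phase cannot advance.
            phaser.awaitAdvanceInterruptibly(arrivalPhase, timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter stuck: " + waiterGetsStuck(200));
    }
}
```

Under these assumptions the waiter times out every time, which corresponds to the permanent hang seen in the affected test cases.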

The proposed fix is to call register first, which prevents the phase from advancing until the new party arrives, then check the phase and deregister immediately if the operation has already completed:

phaser.register();
if (phaser.getPhase() != ExecutionPhase.RUNNING.getValue()) {
    phaser.deregister();
} else {
    ...
}
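The register-then-check ordering can be sketched as a small runnable example. This is an illustration under assumptions rather than the merged implementation: it assumes `phaser` is a `java.util.concurrent.Phaser`, that the operation holds one party and completes by calling `arrive()`, and that RUNNING is phase 0. The class name, the `RUNNING` constant, and the body of the waiting branch are hypothetical. The sketch deregisters with `arriveAndDeregister()`, the deregistration method that `java.util.concurrent.Phaser` provides.

```java
import java.util.concurrent.Phaser;

public class RegisterThenCheck {
    // Hypothetical stand-in for ExecutionPhase.RUNNING.getValue().
    static final int RUNNING = 0;

    /** Returns true if the caller waited, false if the operation was already done. */
    static boolean waitForCompletion(Phaser phaser) {
        // Register first: from here on the current phase cannot advance
        // until this party has arrived as well.
        phaser.register();
        if (phaser.getPhase() != RUNNING) {
            // The operation finished before we registered; undo the registration.
            phaser.arriveAndDeregister();
            return false;
        }
        // Still RUNNING, and guaranteed to stay so until we arrive.
        phaser.arriveAndAwaitAdvance();
        // Leave the phaser so that later phases do not wait for this party.
        phaser.arriveAndDeregister();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: the operation completed before anyone waited.
        Phaser finished = new Phaser(1);
        finished.arrive();
        System.out.println("waited: " + waitForCompletion(finished));

        // Case 2: the operation completes while a waiter is registered.
        Phaser running = new Phaser(1);
        Thread waiter = new Thread(() -> waitForCompletion(running));
        waiter.start();
        while (running.getRegisteredParties() < 2) {
            Thread.sleep(1); // wait until the waiter has registered
        }
        running.arrive();    // the operation completes
        waiter.join();
        System.out.println("phase after completion: " + running.getPhase());
    }
}
```

Because the phase only advances once every registered party has arrived, a phase observed as RUNNING after `register()` stays RUNNING until the waiter itself arrives, which closes the window between the check and the registration.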

Demo/Screenshots

Added 2-second delays at different places in waitForOperationCompletion; the tests always succeeded, only more slowly.

Checklist

  • I have filled out every section of the PR template
  • I have thoroughly tested this change

Testing

Unit Tests

Have unit tests been written for these changes? The race condition could not be covered by a unit test.

Integration Tests

Have integration tests been written for these changes? The existing integration tests should run more smoothly than before.

Examples

Has a new example been added for the change? (if applicable)

@zhongkechen zhongkechen marked this pull request as ready for review February 13, 2026 20:47
@zhongkechen zhongkechen self-assigned this Feb 13, 2026

@maschnetwork maschnetwork left a comment

LGTM. Good elaboration of the problem and fix for now. Let's monitor if there are any other edge cases. Good that we have BaseOperation now here.

@zhongkechen zhongkechen merged commit 87ea164 into main Feb 13, 2026
12 of 13 checks passed
@zhongkechen zhongkechen deleted the step-bug branch February 13, 2026 23:02