
NIFI-15655 - Flaky System Test FlowSynchronizationIT.testUnnecessaryProcessorsAndConnectionsRemoved #10946

Open
pvillard31 wants to merge 3 commits into apache:main from pvillard31:NIFI-15655

Conversation

pvillard31 (Contributor) commented Mar 1, 2026

Summary

NIFI-15655 - Flaky System Test FlowSynchronizationIT.testUnnecessaryProcessorsAndConnectionsRemoved

[ERROR] Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 382.2 s <<< FAILURE! -- in org.apache.nifi.tests.system.clustering.FlowSynchronizationIT
[ERROR] org.apache.nifi.tests.system.clustering.FlowSynchronizationIT.testUnnecessaryProcessorsAndConnectionsRemoved -- Time elapsed: 312.5 s <<< ERROR!
java.util.concurrent.TimeoutException: testUnnecessaryProcessorsAndConnectionsRemoved() timed out after 5 minutes
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	Suppressed: java.lang.InterruptedException: sleep interrupted
		at java.base/java.lang.Thread.sleep0(Native Method)
		at java.base/java.lang.Thread.sleep(Thread.java:509)
		at org.apache.nifi.tests.system.NiFiSystemIT.waitFor(NiFiSystemIT.java:420)
		at org.apache.nifi.tests.system.NiFiSystemIT.waitFor(NiFiSystemIT.java:407)
		at org.apache.nifi.tests.system.clustering.FlowSynchronizationIT.testUnnecessaryProcessorsAndConnectionsRemoved(FlowSynchronizationIT.java:718)

During the disconnect/reconnect cycle, ZooKeeper leader election can transiently return an address without a port. Previously, this caused an ArrayIndexOutOfBoundsException which, although caught by the catch (Throwable) block in HeartbeatSendTask, produced an opaque error.
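A minimal sketch of the kind of validation this PR adds, assuming the leader address arrives as a `host:port` string. `LeaderAddress` and `parse` are hypothetical names, and `ProtocolException` below is a local stand-in for NiFi's own class (made unchecked here for brevity), not the actual NiFi code:

```java
// Local stand-in for NiFi's ProtocolException (assumption: unchecked here for brevity)
class ProtocolException extends RuntimeException {
    ProtocolException(final String message) {
        super(message);
    }
}

// Hypothetical value type for a "host:port" address reported by leader election
final class LeaderAddress {
    final String host;
    final int port;

    private LeaderAddress(final String host, final int port) {
        this.host = host;
        this.port = port;
    }

    static LeaderAddress parse(final String address) {
        if (address == null || address.isBlank()) {
            throw new ProtocolException("Leader address is empty; leader election may still be in progress");
        }
        // Split on the last ':' so everything before it is treated as the host
        final int separator = address.lastIndexOf(':');
        if (separator <= 0 || separator == address.length() - 1) {
            // A naive address.split(":")[1] would throw ArrayIndexOutOfBoundsException for this input
            throw new ProtocolException("Leader address [" + address + "] does not contain a port; leader election may still be in progress");
        }
        final String portText = address.substring(separator + 1);
        final int port;
        try {
            port = Integer.parseInt(portText);
        } catch (final NumberFormatException e) {
            throw new ProtocolException("Leader address [" + address + "] has a non-numeric port [" + portText + "]");
        }
        if (port < 1 || port > 65535) {
            throw new ProtocolException("Leader address [" + address + "] has an out-of-range port [" + port + "]");
        }
        return new LeaderAddress(address.substring(0, separator), port);
    }
}
```

The message names the offending value and hints that the condition may be transient, which is what makes the log line actionable compared with a bare index error.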

With this change, both layers now throw a ProtocolException with a clear message. The ProtocolException is still caught by the same catch (Throwable) handler in HeartbeatSendTask.run(), so the heartbeat task keeps running and the next cycle succeeds once leader election stabilizes. The clearer exception type and message make the transient condition easier to diagnose in logs.
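Why the catch (Throwable) matters: per the `ScheduledExecutorService` Javadoc, if any execution of a periodic task throws, its subsequent executions are suppressed. The sketch below is a hypothetical wrapper (not NiFi's `HeartbeatSendTask`) that logs and counts any failure so the next scheduled cycle still runs; the wrapped `Runnable` stands in for the real heartbeat send:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper illustrating the catch (Throwable) pattern; not NiFi's HeartbeatSendTask
final class ResilientHeartbeatTask implements Runnable {
    private final Runnable sendHeartbeat;
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();

    ResilientHeartbeatTask(final Runnable sendHeartbeat) {
        this.sendHeartbeat = sendHeartbeat;
    }

    @Override
    public void run() {
        try {
            sendHeartbeat.run();
            successes.incrementAndGet();
        } catch (final Throwable t) {
            // Swallowing the failure keeps a periodic schedule alive;
            // letting it escape would suppress every later execution of this task
            failures.incrementAndGet();
            System.err.println("Failed to send heartbeat, will retry next cycle: " + t);
        }
    }

    int successes() {
        return successes.get();
    }

    int failures() {
        return failures.get();
    }
}
```

Such a task would typically be registered with something like `executor.scheduleWithFixedDelay(task, 0, 5, TimeUnit.SECONDS)` (interval chosen only for illustration): an early cycle may fail while leader election settles, and a later cycle succeeds.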

Modified some tests to improve reliability and wait on the proper conditions.
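One common shape for such a wait condition is a deadline-bounded polling loop, so a test fails with a specific timeout instead of running until the overall 5-minute test limit. The `WaitUtil.waitFor` helper below is hypothetical; `NiFiSystemIT.waitFor`, which appears in the stack trace above, has its own signature and behavior:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not the NiFiSystemIT.waitFor implementation
final class WaitUtil {
    private WaitUtil() {
    }

    static void waitFor(final BooleanSupplier condition, final long timeoutMillis, final long pollMillis)
            throws InterruptedException, TimeoutException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        // Re-check the condition on every poll and give up once the deadline has passed
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

The point of the design is that the test waits on an observable state (for example, a node's reported connection state) rather than sleeping for a fixed interval and hoping the cluster has settled.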

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

pvillard31 marked this pull request as draft March 1, 2026 13:37
pvillard31 marked this pull request as ready for review March 1, 2026 14:51
exceptionfactory (Contributor) left a comment


Thanks for digging into this issue @pvillard31. Improving stability is very useful for some of these unstable system tests.

Including the sanity check looks helpful given that the Leader Election Manager is a framework extension point, but the note in the description about ZooKeeper behavior raises another question. Looking at the CuratorLeaderElectionManager, are there changes at that level that could prevent this from happening?

exceptionfactory (Contributor)

It looks like the Curator LeaderSelector can return a Participant with an empty id, which is one path that could lead to this issue. Tightening up that return handling in CuratorLeaderElectionManager seems like one candidate for improvement.
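A minimal sketch of tightening that return handling, assuming (as described in this comment) that `LeaderSelector.getLeader()` may return a placeholder `Participant` with an empty id when no leader is currently known. To keep the example self-contained, `Participant` below is a stand-in mirroring only Curator's `getId()` and `isLeader()` accessors, and `LeaderLookup.leaderId` is a hypothetical helper, not the committed change:

```java
import java.util.Optional;

// Stand-in mirroring the two accessors of Curator's Participant used here
final class Participant {
    private final String id;
    private final boolean leader;

    Participant(final String id, final boolean leader) {
        this.id = id;
        this.leader = leader;
    }

    String getId() {
        return id;
    }

    boolean isLeader() {
        return leader;
    }
}

final class LeaderLookup {
    private LeaderLookup() {
    }

    // Hypothetical helper: a missing, non-leader, or blank-id participant means "no leader known yet"
    static Optional<String> leaderId(final Participant participant) {
        if (participant == null || !participant.isLeader()) {
            return Optional.empty();
        }
        final String id = participant.getId();
        if (id == null || id.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(id);
    }
}
```

Returning `Optional.empty()` lets the caller treat "no leader yet" as a transient state instead of passing an empty string on to address parsing.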

pvillard31 (Contributor, Author)

That is a very good point, thanks for catching that @exceptionfactory - I pushed a commit to improve this code path on the curator side.


2 participants