[Backport release-1.32] Reclassify controller readiness failures in Autopilot #7008
Merged
twz123 merged 8 commits into k0sproject:release-1.32 on Jan 27, 2026
Remove the unnecessary interface, as well.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit d51b4bf)
(cherry picked from commit f10b794)
(cherry picked from commit 227b629)
There's no need to pass the targets as a struct field. Passing them as method parameters prevents possible race conditions if K0sUpdateReady is called concurrently.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 5ccb088)
(cherry picked from commit 46f0f1a)
(cherry picked from commit 8f1a38e)
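A minimal sketch of the idea behind this commit, not the actual k0s code: when targets are passed as a method parameter, the receiver holds no per-call state, so concurrent calls cannot overwrite each other's input. The `prober` and `target` types and the `probe` method are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// target is a hypothetical stand-in for a resolved controller target.
type target struct{ Name string }

// prober is a hypothetical probe type. It deliberately has no "targets"
// field, so there is nothing for concurrent callers to race on.
type prober struct{}

// probe receives its targets as a parameter and returns their names.
func (p *prober) probe(targets []target) []string {
	names := make([]string, 0, len(targets))
	for _, t := range targets {
		names = append(names, t.Name)
	}
	return names
}

func main() {
	p := &prober{}
	inputs := [][]target{
		{{Name: "controller-0"}},
		{{Name: "controller-1"}, {Name: "controller-2"}},
	}
	results := make([][]string, len(inputs))

	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in []target) {
			defer wg.Done()
			// Each call works on its own argument only.
			results[i] = p.probe(in)
		}(i, in)
	}
	wg.Wait()
	fmt.Println(results)
}
```

Had the targets been stored in a field before each call, the two goroutines above would have written to the same field without synchronization.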
And let the first failed probe cancel the context of the other concurrent probes to be fail-fast.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 67c73e1)
(cherry picked from commit cd98532)
(cherry picked from commit 1ab2664)
The type names are different from those used for plans as a whole. Moreover, there's no `MissingPlatform` state for plans.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 8b0a756)
(cherry picked from commit 2a1235b)
(cherry picked from commit 7ffd773)
MissingSignalNode and IncompleteTargets both represented "a resolved target no longer exists". Keeping both is confusing.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 0c9c921)
(cherry picked from commit 8cc72cb)
(cherry picked from commit 0aa92dd)
Just let it accept its direct dependencies.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 859d722)
(cherry picked from commit e20416d)
(cherry picked from commit 11c087c)
The previous implementation of the readiness probe didn't categorize the different errors that could occur when probing the controllers in any way. Callers could not distinguish between transient and fundamental problems.

This change classifies probe failures into the following categories:

- Target resolution failed (e.g. missing ControlNode)
- Readiness probe unsuccessful
- Everything else

The probe records the worst error encountered and cancels in-flight probes as soon as a target resolution failure is detected. Callers can then decide to treat those error categories differently.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit bf4bb4d)
(cherry picked from commit afb7660)
(cherry picked from commit 3c6b851)
Previously, problems with controller readiness would halt the execution of an Autopilot plan, requiring manual intervention. This was explicitly documented. However, it is anticipated that controllers won't always be ready, e.g. during restarts.

Don't let a readiness failure terminate plan execution. Instead, requeue the plan and retry later.

This renders the `InconsistentTargets` plan state unnecessary. The `IncompleteTargets` state serves the same purpose. The controller delegate now reports `Incomplete` rather than `Inconsistent`, which maps to `IncompleteTargets`. Remove the `InconsistentTargets` state and keep it only as a legacy state in the docs.

Note that a missing internal IP address is now considered a transient error. Therefore, a controller that temporarily loses its IP address will not fail the plan permanently.

The quorumsafety test has been updated to better reflect the intended outcome. It stops a real controller, verifies that the plan remains in a schedulable state as long as not all controllers are ready, restarts the controller, and asserts that the plan completes. This avoids reliance on a fabricated ControlNode with no IP address.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 0b4f1a6)
(cherry picked from commit ab141d3)
(cherry picked from commit caf4bcc)
jnummelin approved these changes on Jan 27, 2026
Backport to release-1.32: See: