[Backport release-1.32] Reclassify controller readiness failures in Autopilot #7008

Merged
twz123 merged 8 commits into k0sproject:release-1.32 from twz123:backport-6996-to-release-1.32
Jan 27, 2026

Conversation

Remove the unnecessary interface, as well.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit d51b4bf)
(cherry picked from commit f10b794)
(cherry picked from commit 227b629)
There's no need to pass the targets as a struct field. Passing them as
method parameters prevents possible race conditions if K0sUpdateReady is
called concurrently.
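
The difference can be illustrated with a minimal sketch. The names `target`, `readinessProbe` and `countTargets` are hypothetical and are not taken from the k0s code base. The point is only that a receiver without per-call state can be shared between goroutines, because each call gets its targets as an argument.

```go
package main

import (
	"fmt"
	"sync"
)

// target is a hypothetical stand-in for a resolved probe target.
type target struct{ name string }

// readinessProbe is a hypothetical probe type. It holds no per-call state,
// so a single value can safely be shared between goroutines.
type readinessProbe struct{}

// countTargets receives the targets as a parameter instead of reading them
// from a struct field, so concurrent calls cannot see each other's targets.
func (readinessProbe) countTargets(targets []target) int {
	return len(targets)
}

// demo calls the same probe concurrently with different target lists.
func demo() []int {
	var probe readinessProbe
	inputs := [][]target{
		{{name: "controller-0"}},
		{{name: "controller-0"}, {name: "controller-1"}},
	}
	results := make([]int, len(inputs))

	var wg sync.WaitGroup
	for i, targets := range inputs {
		wg.Add(1)
		go func(i int, targets []target) {
			defer wg.Done()
			// Each goroutine writes to its own slice index only.
			results[i] = probe.countTargets(targets)
		}(i, targets)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(demo())
}
```

Had the targets been stored in a field that each call overwrites, the two goroutines above would race on that field.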

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 5ccb088)
(cherry picked from commit 46f0f1a)
(cherry picked from commit 8f1a38e)
And let the first failed probe cancel the context of the other
concurrent probes to be fail-fast.
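
A fail-fast fan-out of this kind might look roughly like the following sketch, which uses only the standard library. `probeFunc` and `probeAll` are hypothetical names, and the actual implementation in this change may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// probeFunc is a hypothetical signature for a single readiness probe.
type probeFunc func(ctx context.Context) error

// probeAll runs all probes concurrently. The first probe that fails cancels
// the shared context, so the remaining probes can return early. The error of
// that first failing probe is returned.
func probeAll(ctx context.Context, probes []probeFunc) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, probe := range probes {
		wg.Add(1)
		go func(probe probeFunc) {
			defer wg.Done()
			if err := probe(ctx); err != nil {
				// Only the first failure is recorded; it also cancels
				// the context observed by all other probes.
				once.Do(func() {
					firstErr = err
					cancel()
				})
			}
		}(probe)
	}
	wg.Wait()
	return firstErr
}

// demo runs one failing probe next to two probes that block until canceled.
func demo() error {
	failing := func(context.Context) error {
		return errors.New("controller not ready")
	}
	blocking := func(ctx context.Context) error {
		<-ctx.Done() // returns only after a sibling probe has failed
		return ctx.Err()
	}
	return probeAll(context.Background(), []probeFunc{blocking, failing, blocking})
}

func main() {
	fmt.Println(demo())
}
```

Without the cancellation, `demo` would never return, because the blocking probes wait on the context.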

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 67c73e1)
(cherry picked from commit cd98532)
(cherry picked from commit 1ab2664)
The type names are different from those used for plans as a whole.
Moreover, there's no `MissingPlatform` state for plans.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 8b0a756)
(cherry picked from commit 2a1235b)
(cherry picked from commit 7ffd773)
MissingSignalNode and IncompleteTargets both represented "a resolved
target no longer exists". Keeping both is confusing.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 0c9c921)
(cherry picked from commit 8cc72cb)
(cherry picked from commit 0aa92dd)
Just let it accept its direct dependencies.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 859d722)
(cherry picked from commit e20416d)
(cherry picked from commit 11c087c)
The previous implementation of the readiness probe didn't categorize
the different errors that could occur when probing the controllers in
any way. Callers could not distinguish between transient and fundamental
problems.

This change classifies probe failures into the following categories:

- Target resolution failed (e.g. missing ControlNode)
- Readiness probe unsuccessful
- Everything else

The probe records the worst error encountered and cancels in-flight
probes as soon as a target resolution failure is detected. Callers can
then decide to treat those error categories differently.
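
One way to picture the classification and the "worst error" bookkeeping is the sketch below. All identifiers are hypothetical. The severity order is also an assumption: the message only implies that a target resolution failure ranks highest, and the relative order of the other two categories is chosen here purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// failureKind is a hypothetical severity ranking; higher values are worse.
// Assumption: only the top rank of target resolution failures follows from
// the commit message.
type failureKind int

const (
	kindNotReady   failureKind = iota // readiness probe unsuccessful
	kindOther                         // everything else
	kindUnresolved                    // target resolution failed
)

// probeError is a hypothetical error type that tags a cause with a kind.
type probeError struct {
	kind failureKind
	err  error
}

func (e *probeError) Error() string { return e.err.Error() }
func (e *probeError) Unwrap() error { return e.err }

// classify returns the kind of err. Untagged errors count as "everything else".
func classify(err error) failureKind {
	var pe *probeError
	if errors.As(err, &pe) {
		return pe.kind
	}
	return kindOther
}

// worst returns whichever of the two errors ranks as more severe.
func worst(a, b error) error {
	switch {
	case a == nil:
		return b
	case b == nil:
		return a
	case classify(b) > classify(a):
		return b
	default:
		return a
	}
}

// demo folds a few sample failures into the single worst one.
func demo() error {
	failures := []error{
		&probeError{kind: kindNotReady, err: errors.New("controller-0: not ready")},
		errors.New("connection reset"),
		&probeError{kind: kindUnresolved, err: errors.New("controller-1: ControlNode not found")},
	}
	var result error
	for _, err := range failures {
		result = worst(result, err)
	}
	return result
}

func main() {
	err := demo()
	fmt.Println(classify(err) == kindUnresolved, err)
}
```

A caller can then branch on `classify(err)` instead of treating every failure the same way.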

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit bf4bb4d)
(cherry picked from commit afb7660)
(cherry picked from commit 3c6b851)
Previously, problems with controller readiness would halt the execution
of an Autopilot plan, requiring manual intervention. This was explicitly
documented. However, it is anticipated that controllers won't always
be ready, e.g. during restarts.

Don't let a readiness failure terminate plan execution. Instead,
requeue the plan and retry later.
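
As a rough sketch of the new behavior, a handler could map a readiness failure to a delayed retry instead of a terminal state. The `result` type, the `errNotReady` sentinel, the ten-second delay, and the handling of all other errors are assumptions made for this example. They are not taken from the Autopilot code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical sentinel for an unsuccessful readiness probe.
var errNotReady = errors.New("controllers not ready")

// result is a hypothetical stand-in for a reconcile result.
type result struct {
	requeueAfter time.Duration // retry later when greater than zero
	failed       bool          // plan execution stops when true
}

// handleReadiness decides what to do with the outcome of a readiness probe.
// Assumption: a readiness failure is retried, and anything else is treated
// as fatal here only to contrast the two paths.
func handleReadiness(err error) result {
	switch {
	case err == nil:
		return result{}
	case errors.Is(err, errNotReady):
		return result{requeueAfter: 10 * time.Second}
	default:
		return result{failed: true}
	}
}

// demo feeds a wrapped readiness failure through the handler.
func demo() result {
	err := fmt.Errorf("probing controller-0: %w", errNotReady)
	return handleReadiness(err)
}

func main() {
	fmt.Printf("%+v\n", demo())
}
```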

This renders the `InconsistentTargets` plan state unnecessary. The
`IncompleteTargets` state serves the same purpose. The controller
delegate now reports `Incomplete` rather than `Inconsistent`, which maps
to `IncompleteTargets`. Remove the `InconsistentTargets` state and keep
it only as a legacy state in the docs.

Note that a missing internal IP address is now considered a transient
error. Therefore, a controller that temporarily loses its IP address
will not fail the plan permanently. The quorumsafety test has been
updated to better reflect the intended outcome. It stops a real
controller, verifies that the plan remains in a schedulable state as
long as not all controllers are ready, restarts the controller, and
asserts that the plan completes. This avoids reliance on a fabricated
ControlNode with no IP address.

Signed-off-by: Tom Wieczorek <[email protected]>
(cherry picked from commit 0b4f1a6)
(cherry picked from commit ab141d3)
(cherry picked from commit caf4bcc)
@twz123 twz123 marked this pull request as ready for review January 22, 2026 13:45
@twz123 twz123 requested review from a team as code owners January 22, 2026 13:45
@twz123 twz123 requested review from juanluisvaladas and kke January 22, 2026 13:45
@twz123 twz123 merged commit 52833cf into k0sproject:release-1.32 Jan 27, 2026
266 of 270 checks passed
@twz123 twz123 deleted the backport-6996-to-release-1.32 branch January 27, 2026 11:42