feat(upgrade): pausable node upgrade #924
Introduce the node upgrade pause feature by describing how to pause and unpause node upgrades. Signed-off-by: Zespre Chang <[email protected]>
w13915984028
left a comment
LGTM, thanks.
docs/upgrade/automatic.md
Outdated
    # Update the annotation to unpause the node
    $ kubectl -n harvester-system annotate --overwrite upgrades hvst-upgrade-6mcwv harvesterhci.io/node-upgrade-pause-map='{"charlie-1-tink-system":"unpause","charlie-2-tink-system":"pause","charlie-3-tink-system":"pause"}'
    upgrade.harvesterhci.io/hvst-upgrade-6mcwv annotate
Line 310 would be better removed.
docs/upgrade/automatic.md
Outdated
    :::info
    Harvester applies the pause-relevant configuration during upgrade initialization. After that, any further changes to the fields under the `nodeUpgradeOption` option will not affect the current upgrade and can only take effect upon the next upgrade.
Add a note here, or elsewhere, that while the upgrade is ongoing, any node that has not been upgraded yet can be freely added to or removed from the upgrade object's annotation; the upgrade makes its decisions based on the upgrade object.
I suppose this can be considered an advanced tip for now, since we don't have a counterpart UI component for it. The main entry point for the node upgrade pause feature is the `upgrade-config` setting.
ihcsim
left a comment
content lgtm - thanks
docs/advanced/settings.md
Outdated
    - `nodeUpgradeOption`: Options for the node upgrading phase.

    In the node upgrade phase, Harvester upgrades each node's RKE2 and operating system autonomously. Harvester tries to migrate all virtual machines running on the to-be-upgraded node to other nodes. Harvester will shut down any virtual machine considered non-live-migratable to ensure the node upgrade goes smoothly. The `nodeUpgradeOption` provides customizability for how Harvester should behave upon node upgrades.
autonomously
I think if we are trying to convey that the operation either completes fully or fails, then "atomic" might be closer than "autonomous". So maybe something like:
"The node upgrade is an atomic operation, which includes upgrading the node's RKE2 components and operating system. The upgrade is either fully completed or failed, with no half-finished state."
    hvst-upgrade-6mcwv 4h16

    # Update the annotation to unpause the node
    $ kubectl -n harvester-system annotate --overwrite upgrades hvst-upgrade-6mcwv harvesterhci.io/node-upgrade-pause-map='{"charlie-1-tink-system":"unpause","charlie-2-tink-system":"pause","charlie-3-tink-system":"pause"}'
To confirm, removing charlie-1-tink-system works too, right?
No. If you do so, the upgrade controller cannot tell which node has been removed from the annotation. As a result, the relevant machine-plan secret is not enqueued for reconciliation, and the upgrade stalls until you "touch" the corresponding machine-plan secret.
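The point of this reply is that a node is resumed by changing its value to "unpause" and keeping its key in the map. The following is a minimal Python sketch of building that annotation value. The helper names `unpause_node` and `annotation_argument` are hypothetical, and the starting map (all three nodes set to "pause") is an assumption; only the annotation key, the node names, and the "pause"/"unpause" values come from the command quoted in this thread.

```python
import json

# Annotation key taken from the kubectl command quoted in this thread.
PAUSE_MAP_ANNOTATION = "harvesterhci.io/node-upgrade-pause-map"


def unpause_node(pause_map, node):
    """Return a copy of the pause map with `node` set to "unpause".

    Hypothetical helper: the key is kept rather than deleted, because the
    reply above explains that removing it leaves the upgrade stalled.
    """
    if node not in pause_map:
        raise KeyError(f"{node} is not in the pause map")
    updated = dict(pause_map)  # leave the caller's map untouched
    updated[node] = "unpause"
    return updated


def annotation_argument(pause_map):
    """Render the key='value' argument for `kubectl annotate --overwrite`."""
    # Compact separators match the JSON layout in the quoted command.
    value = json.dumps(pause_map, separators=(",", ":"))
    return f"{PAUSE_MAP_ANNOTATION}='{value}'"


# Assumed starting state: every node is currently paused.
current = {
    "charlie-1-tink-system": "pause",
    "charlie-2-tink-system": "pause",
    "charlie-3-tink-system": "pause",
}

print(annotation_argument(unpause_node(current, "charlie-1-tink-system")))
```

The printed string can be passed as the last argument of the `kubectl -n harvester-system annotate --overwrite upgrades hvst-upgrade-6mcwv ...` command shown in the diff.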
jillian-maroket
left a comment
Review done
Co-authored-by: Jillian Maroket <[email protected]> Signed-off-by: Zespre Chang <[email protected]>
Pull request overview
This PR introduces documentation for the new node upgrade pause feature available in Harvester v1.7.0. The feature allows administrators to pause automatic node upgrades for specific nodes or all nodes in the cluster, enabling manual maintenance or verification tasks to be performed before resuming the upgrade process.
Key changes:
- Added comprehensive documentation for the `nodeUpgradeOption` setting configuration
- Provided step-by-step instructions for pausing and resuming node upgrades
- Included kubectl command examples and UI screenshots for visual confirmation
Reviewed changes
Copilot reviewed 2 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| docs/upgrade/automatic.md | Added new "Customize Node Upgrade" section with detailed instructions on pausing/resuming node upgrades, including kubectl examples and UI screenshots |
| docs/advanced/settings.md | Updated upgrade-config setting documentation to include the new nodeUpgradeOption field with its sub-fields (mode and pauseNodes), and reformatted the default value as JSON for better readability |
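For orientation, here is a rough Python sketch of how a `nodeUpgradeOption` fragment with the two sub-fields named above might be assembled. The flat nesting of `mode` and `pauseNodes` directly under `nodeUpgradeOption` is an assumption based only on this summary; the real `upgrade-config` schema may nest these fields differently, so check docs/advanced/settings.md before relying on it. The function name `node_upgrade_option` is hypothetical.

```python
import json
from typing import List, Optional


def node_upgrade_option(mode: str, pause_nodes: Optional[List[str]] = None) -> dict:
    """Build a `nodeUpgradeOption` fragment (assumed layout, not the verified schema)."""
    # "auto" and "manual" are the two mode values mentioned in this pull request.
    if mode not in ("auto", "manual"):
        raise ValueError(f"unsupported mode: {mode!r}")
    option = {"mode": mode}
    if pause_nodes:
        option["pauseNodes"] = list(pause_nodes)
    return {"nodeUpgradeOption": option}


# Example: manual mode, pausing a single node by name.
fragment = node_upgrade_option("manual", ["charlie-1-tink-system"])
print(json.dumps(fragment, indent=2))
```

The fragment would be merged into the rest of the `upgrade-config` value; the other fields of that setting are not shown here.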
    You can use the `nodeUpgradeOption` option in the [`upgrade-config`](advanced/settings.md#upgrade-config) setting to pause node upgrades.
    - Pause for all nodes in the cluster: Change the value of the `mode` field to `manual`.
    - Pause for specific nodes: List the node names in the `pauseNodes` field. Nodes not included in the list are automatically upgraded.
Copilot
AI
Dec 17, 2025
The documentation is incomplete regarding the relationship between mode and pauseNodes. Line 265 states you can pause specific nodes by listing them in pauseNodes, but according to settings.md line 858, this only works when mode is set to manual. When mode is auto, the pauseNodes field is ignored. This important prerequisite should be mentioned here to avoid confusion.
Original:
    - Pause for specific nodes: List the node names in the `pauseNodes` field. Nodes not included in the list are automatically upgraded.
Suggested:
    - Pause for specific nodes: When `mode` is set to `manual`, list the node names in the `pauseNodes` field. Nodes not included in the list are automatically upgraded. When `mode` is `auto`, the `pauseNodes` field is ignored.
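The interaction between `mode` and `pauseNodes` described in this comment can be sketched as a small decision function. This is an illustration of the documented behavior as read from this thread, not Harvester's controller code; the function name `is_node_paused` is hypothetical, and treating `manual` with an empty `pauseNodes` list as "pause every node" is an assumption drawn from the "Pause for all nodes" bullet.

```python
from typing import Iterable


def is_node_paused(mode: str, pause_nodes: Iterable[str], node: str) -> bool:
    """Decide whether `node` would be paused, per the rules stated in this thread."""
    if mode == "auto":
        # Per the review comment, pauseNodes is ignored when mode is auto.
        return False
    if mode != "manual":
        raise ValueError(f"unsupported mode: {mode!r}")
    names = set(pause_nodes)
    if not names:
        # Assumption: manual mode without a node list pauses all nodes.
        return True
    # Nodes not included in the list are upgraded automatically.
    return node in names


print(is_node_paused("manual", ["charlie-2-tink-system"], "charlie-2-tink-system"))
print(is_node_paused("manual", ["charlie-2-tink-system"], "charlie-1-tink-system"))
print(is_node_paused("auto", ["charlie-2-tink-system"], "charlie-2-tink-system"))
```

The three calls cover a listed node in manual mode, an unlisted node in manual mode, and a listed node in auto mode, which is the case the original wording left unclear.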
    ### Pausing Node Upgrades
    You can use the `nodeUpgradeOption` option in the [`upgrade-config`](advanced/settings.md#upgrade-config) setting to pause node upgrades.
Copilot
AI
Dec 17, 2025
The link path is incorrect. Since automatic.md is in the docs/upgrade directory and settings.md is in docs/advanced, the relative path should start with "../" to navigate up one directory level. The correct path should be "../advanced/settings.md#upgrade-config" (consistent with other references to settings.md in this file on lines 92 and 108).
Original:
    You can use the `nodeUpgradeOption` option in the [`upgrade-config`](advanced/settings.md#upgrade-config) setting to pause node upgrades.
Suggested:
    You can use the `nodeUpgradeOption` option in the [`upgrade-config`](../advanced/settings.md#upgrade-config) setting to pause node upgrades.
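The corrected path can be checked mechanically. The sketch below computes a relative link between two documentation files with Python's standard `posixpath.relpath`, assuming links are resolved relative to the directory of the file that contains them. The helper name `relative_doc_link` is hypothetical; the file paths and anchor are the ones named in this comment.

```python
import posixpath


def relative_doc_link(source_file: str, target_file: str, anchor: str = "") -> str:
    """Return the link from `source_file` to `target_file`, with an optional anchor."""
    # Links are assumed to resolve from the directory holding the source file.
    start = posixpath.dirname(source_file)
    link = posixpath.relpath(target_file, start=start)
    return f"{link}#{anchor}" if anchor else link


print(
    relative_doc_link(
        "docs/upgrade/automatic.md",
        "docs/advanced/settings.md",
        "upgrade-config",
    )
)
# → ../advanced/settings.md#upgrade-config
```

The result matches the suggested link, since automatic.md and settings.md live in sibling directories under docs.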
docs/advanced/settings.md
Outdated
    :::info important

    Upgrading of nodes listed in this field is _paused definitely_ until you take specific actions to [resume the process](upgrade/automatic.md#how-to-resume-a-node-to-continue-with-node-upgrade). Given that Harvester upgrades nodes sequentially, this implies that the entire upgrade progress is paused as well.
Copilot
AI
Dec 17, 2025
The phrase "paused definitely" appears to be a spelling error. The intended word is likely "paused indefinitely" (meaning paused for an unlimited time until action is taken), not "definitely" (meaning certainly).
Original:
    Upgrading of nodes listed in this field is _paused definitely_ until you take specific actions to [resume the process](upgrade/automatic.md#how-to-resume-a-node-to-continue-with-node-upgrade). Given that Harvester upgrades nodes sequentially, this implies that the entire upgrade progress is paused as well.
Suggested:
    Upgrading of nodes listed in this field is _paused indefinitely_ until you take specific actions to [resume the process](upgrade/automatic.md#how-to-resume-a-node-to-continue-with-node-upgrade). Given that Harvester upgrades nodes sequentially, this implies that the entire upgrade progress is paused as well.
Signed-off-by: Zespre Chang <[email protected]>
brandboat
left a comment
LGTM, thanks!
Introduce the node upgrade pause feature by describing how to pause and unpause node upgrades.
Problem:
The new node upgrade pause feature lacks documentation.
Solution:
Outline the new settings and provide steps to pause/unpause node upgrades.
Related Issue(s):
harvester/harvester#8980
Test plan:
Additional documentation or context