HEP: Migrate from Wicked to NetworkManager [skip ci] #9039
base: master
Force-pushed from 65c461a to 81a1c2a
The HEP looks to have covered the broad areas, thanks.
> With new installs, `/oem/90_custom.yaml` will include NetworkManager connection profiles instead of `ifcfg-*` files.
>
> With upgrades, the existing `/oem/90_custom.yaml` file will still include the old `ifcfg-*` files, which will be ignored by NetworkManager.
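To make the contrast concrete, here's a hedged sketch of how the old-style configuration is typically embedded in `/oem/90_custom.yaml` today. The stage name, file name, and values are illustrative assumptions, not text from the HEP:

```yaml
stages:
  initramfs:
  - files:
    # Wicked-era style: an ifcfg file written out by the cloud-init (yip) config.
    # Per the HEP text quoted above, an entry like this is simply ignored by
    # NetworkManager after an upgrade.
    - path: /etc/sysconfig/network/ifcfg-mgmt-bo
      content: |
        STARTMODE='onboot'
        BOOTPROTO='dhcp'
        BONDING_MASTER='yes'
        BONDING_SLAVE_0='ens5'
        MTU='1500'
```

On a new install, the equivalent entry would instead write a NetworkManager connection profile (keyfile) under `/etc/NetworkManager/system-connections/`; a sketch of that form appears further down in the thread.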
harvester.config might be outdated after several upgrades and the user's potential changes; the general upgrade practice is to generate the target networkmanager.yaml from the active 90_custom.yaml.
Requesting that the user manually make changes before/after the upgrade is not acceptable in most cases.
I understand the concern here about requesting manual changes, but there are some difficulties with generating NetworkManager config based on what's in 90_custom.yaml.
What we need, to generate NetworkManager config, is essentially the data in install.management_interface from harvester.config (i.e. whether it's DHCP or static IP, the list of interfaces to be bonded, the bond options, the MTU, the VLAN ID). This is all explicitly stated in harvester.config and can be reliably read, at least since Harvester v1.1 (earlier, in v1.0 and v0.3, the format was different, so if someone's got a system that old, that would be another problem with the approach I'm suggesting here :-/).
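For illustration, here's a hedged sketch of the kind of `install.management_interface` data being referred to; the values and the particular options shown are hypothetical, and only the general field names follow the post-v1.1 schema quoted later in this thread:

```yaml
install:
  management_interface:
    interfaces:
    - name: ens5
    - name: ens6
    method: static             # or "dhcp"
    ip: 192.168.1.10
    subnet_mask: 255.255.255.0
    gateway: 192.168.1.1
    bond_options:
      mode: balance-tlb
      miimon: 100
    mtu: 1500
    vlan_id: 100
```

Everything needed to emit a NetworkManager bond + VLAN configuration is present in one place, which is the point being made above.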
The data we need is not explicit or trivial to extract from 90_custom.yaml. Rather, it's spread over a bunch of `ifcfg-*` files embedded in that YAML. So we would have to read 90_custom.yaml, look for all the `files` directives, check those to see which ones were ifcfg files, parse them all, and figure out what the settings are based on that. This strikes me as complicated and potentially error-prone, especially if we have to handle every possible thing that someone could have added to an `ifcfg-*` file. If any of this were to break, the user would be forced to fix it manually.
Note that there's not a 1:1 mapping between ifcfg-* files and NetworkManager connection profiles, and SLE doesn't include any tools to perform this migration.
In the happy case (Harvester was originally installed with v1.1 or newer, and no manual changes have been made to management interface configuration via 90_custom.yaml), it's going to be much simpler to read harvester.config than trying to parse 90_custom.yaml.
In the unhappy case (Harvester originally installed with v1.0 or older, or manual changes made to management interface configuration via 90_custom.yaml), reading harvester.config may not give the correct configuration. But OTOH, trying to parse 90_custom.yaml can't be guaranteed to give a reliable result either.
We may need to document that users should ensure harvester.config matches the existing network definition before initiating the upgrade. This would only be needed if users have made post-install networking changes by editing the harvester-installer-generated Elemental cloud-init.
We also need to handle the schema change in harvester.config which was introduced in Harvester v1.1 and has been stable since.
However, in Harvester v1.0 our networking configuration looked like this:
```yaml
install:
  mode: create
  networks:
    harvester-mgmt:
      interfaces:
      - name: ens5
        hwAddr: "B8:CA:3A:6A:64:7C"
      method: dhcp
```
compared to post-v1.1:
```yaml
install:
  mode: create
  management_interface:
    interfaces:
    - name: ens5
      hwAddr: "B8:CA:3A:6A:64:7C"
    method: dhcp
```
We need to handle this as part of the upgrade, as we may have users who have been running since Harvester v1.0 and whose harvester.config has remained untouched.
A possible option would be to provide a utility/CLI which leverages the cloud-init generation logic and can be run in advance on each node in the cluster. This would help users review the generated network configuration before the actual upgrade, and would give them the opportunity to fine-tune harvester.config so that the generated network configuration matches the current network setup.
Just as a small mention on the Docs aspect 😄, thinking:
Since:
The user may also likely have persisted that change, in …
> No special user action is required. Everything should just appear to work as it currently does.
>
> #### Upgrades Where Post-installation Configuration Changes Have Been Made
I want to better understand the failure domains and possible scenarios. Basically, what is the worst thing that can happen and how do we recover? Presumably, if the switch over from wicked to nm failed, the problem will be very visible, right? The host loses connectivity, RKE2 reports the node status as NotReady then Unknown. Upgrade is halted. Likely, the admin can still access the host (via serial console?) to diagnose and repair.
If the host loses connectivity, does its role play a role (no pun intended) here? Will role promotion happen while the upgrade is happening? If yes, will the next promoted host be promoted to the correct version? Do we need to ask user to add another management host to ensure quorum in case of partition? Should upgrade be done on management hosts first? (I assume we are already doing this, but can't remember.)
Taking it a step further, is it possible for the (networking stack) upgrade to complete successfully, but user applications to be completely broken, say because some customization translation was amiss, they failed silently etc.? If yes, then what can user do to detect the problem mid-upgrade, before it takes down the entire user application layer? Is it possible to pause the upgrade?
Complete failure would leave the host without network connectivity as you say, and the admin would need to access it via remote console to diagnose and repair.
Role promotion should only happen if a host is actually removed. AFAIK that won't happen automatically if a host just becomes uncontactable for an extended period.
Could user apps be broken by a messed up network config? Uh.. I'm not sure. All we do statically (i.e. the stuff we're changing the configuration of here) is the management interface and how it comes up on boot. Other networks that might affect workloads (extra cluster networks, VM networks, the storage network) are all created by Harvester dynamically at runtime and thus shouldn't(!) be affected by this change.
> Complete failure would leave the host without network connectivity as you say, and the admin would need to access it via remote console to diagnose and repair.
Just to add to that -- it turns out this is (or should be) really easy to do if necessary since harvester/harvester-installer#1150 went in:
- Login via remote console and become root
- Edit `/oem/harvester.config` and make sure `install: management_interface: ....` specifies the network configuration you want
- Run `harvester-installer generate-network-yaml`. By default (i.e. with no other options specified) this will create `/oem/91_networkmanager.yaml` based on `/oem/harvester.config`.
- Reboot and enjoy your now functional network
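To give a rough idea of the result, here's a hedged sketch of what a generated `/oem/91_networkmanager.yaml` might contain; the stage name, connection name, and keyfile contents are illustrative assumptions rather than the actual output of `harvester-installer generate-network-yaml`:

```yaml
name: NetworkManager configuration
stages:
  initramfs:
  - files:
    - path: /etc/NetworkManager/system-connections/mgmt-bo.nmconnection
      permissions: 384          # 0600 -- NetworkManager ignores world-readable keyfiles
      owner: 0
      group: 0
      content: |
        [connection]
        id=mgmt-bo
        type=bond
        interface-name=mgmt-bo

        [bond]
        mode=balance-tlb
        miimon=100

        [ipv4]
        method=auto
```

A real bond setup would also need one port profile per bonded NIC (type=ethernet with master=mgmt-bo and slave-type=bond), and static addressing would use method=manual with address/gateway entries; those are omitted here for brevity.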
@irishgordo yeah, instead of tweaking …
@tserong for the Harvester ISO "Net Install" variant, will there be any extra considerations needed for the NetworkManager switch?
I don't believe so @irishgordo but we should probably give it a quick test run to make sure.
BTW there's a WIP docs PR at harvester/docs#897, which between the finished bits and the bits still marked "DOCS TODO" should hopefully cover all the relevant changes.
Signed-off-by: Tim Serong <[email protected]>
Force-pushed from 81a1c2a to c0b83ad
Problem:
We need to migrate from Wicked to NetworkManager in order to be able to update our base OS to SL Micro 6.x.
Solution:
Described in this HEP
Related Issue(s):
#3418
Test plan:
Included in this HEP