[16.0] Harden NI Reconciler to prevent and recover from dnsmasq host file conflicts#5384

Merged
rene merged 2 commits into lf-edge:16.0 from milan-zededa:16.0-harden-dnsmasq-configurator
Nov 12, 2025

Conversation

@milan-zededa

Contributor

Description

Backport of #5361

How to test and validate this PR

This issue was originally detected during internal automated testing. Re-running the same test suite should confirm whether the fix is effective.

To reproduce and verify manually:

  1. Onboard an EVE device.
  2. Deploy any application with at least one virtual network interface connected to a Local network instance.
  3. Immediately replace the application instance with a new one that has the same configuration but a different UUID. This “instant replacement” scenario is easy to trigger via Terraform or the controller API; it is harder to reproduce through the UI.
  4. Wait for the old instance to fully undeploy and for the new instance to reach the Online state.
  5. Delete the new application instance.
  6. Verify that no errors are reported for the Local network instance (i.e., the app was successfully disconnected from the NI).
  7. Confirm that new applications can still be deployed into the same network instance (i.e., the instance is not stuck in a broken state).

Changelog notes

  • Improved reliability of application networking during app replacement or redeployment.
  • Prevented DNS/DHCP config conflicts when multiple applications attempt to use the same display name.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR
  • I've added a reference link to the original PR
  • PR's title follows the template
  • I've checked the boxes above, or I've provided a good reason why I didn't check them.

…uring removal

Previously, when two applications accidentally shared the same DNS host
file (for example, due to a duplicate DisplayName), the removal of the second
application would fail because the shared file had already been deleted during
the removal of the first app. This failure caused the dnsmasq configuration
item to enter a broken state, preventing any new applications from being
deployed into the same network instance.

The only workaround in such cases was to redeploy the entire network instance
and all applications connected to it.

This change hardens the Dnsmasq configurator by handling missing DNS and DHCP
host files gracefully. If a host file is not found during removal, a warning is
logged, but the configuration item is not marked as failed.
Although the original issue only affected DNS host files, it is also prudent to
handle missing DHCP host files in the same way to ensure that their absence does
not leave the network instance in a permanently broken state.

Signed-off-by: Milan Lenco <[email protected]>
(cherry picked from commit 3536a1f)

Normally, deploying two applications with the same DisplayName is not allowed.
However, it is possible for a user to replace an existing app definition inside
EdgeDevConfig with a new one that has a different UUID but reuses the same
DisplayName.

Because shutting down the original app takes some time, zedmanager may start
bringing up the new app while the old one is still connected. In this window,
zedrouter may attempt to connect both apps to the same network instance.
Since dnsmasq DNS host files are named after the app DisplayName, both apps would
end up using the same file. Once the obsolete app is disconnected, it removes
the shared host file, breaking name-to-IP resolution for the new app
inside the NI.

To prevent this, zedrouter now disallows multiple apps with the same
DisplayName from being connected to a network instance simultaneously.
In such cases, the new app will be marked with an error until the old app is
fully removed. Once cleanup completes, the retry timer will reconnect the new
app and clear the error state.

Signed-off-by: Milan Lenco <[email protected]>
(cherry picked from commit 347b860)

@eriknordmark left a comment

Contributor

LGTM

@rene merged commit 16ce7ef into lf-edge:16.0 on Nov 12, 2025
44 of 45 checks passed

Labels

bug Something isn't working