@milan-zededa
Contributor

@milan-zededa milan-zededa commented Aug 21, 2025

Description

  • Refactored the LPS-handling code inside zedagent into a separate component, LocalCmdAgent. This helps mitigate the complexity and unorganized structure of the already large zedagent package.
  • No functional changes were introduced; the code was reorganized, cleaned up, and better documented.
  • Improved thread-safety for variables that previously lacked proper protection, reducing potential race conditions.
  • Added user-facing documentation for the Local Profile Server (LPS).
  • Added initial developer-facing documentation for zedagent, with focus on LPS handling.

How to test and validate this PR

No functional changes were made, but the entire LPS functionality should be retested.
Deploy and configure LPS, then test all the endpoints: https://github.com/lf-edge/eve-api/blob/main/PROFILE.md

  • try to change local profile
  • try to enable/disable radio silence
  • check app info published to LPS
  • try to restart and purge app via LPS
  • check device info published to LPS
  • try to shutdown all apps, shutdown the entire device and reboot the device
  • if the device has a cellular modem with a GNSS receiver (and good signal reception), check the location published to LPS

The same commands should also be tested with the Local Operator Console (LOC) in an air-gapped environment.

Note that I performed all these tests and additionally we have a small test suite in eden for LPS: https://github.com/lf-edge/eden/blob/master/tests/workflow/lps-loc.tests.txt

Changelog notes

  • Refactored Local Profile Server (LPS) handling in the zedagent microservice into a dedicated LocalCmdAgent component.
  • Added user-facing documentation for Local Profile Server (LPS).

PR Backports

This is just refactoring and does not need to be backported.

  • 14.5-stable: No
  • 13.4-stable: No

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR
  • I've checked the boxes above, or I've provided a good reason why I didn't check them.

@github-actions github-actions bot requested a review from uncleDecart August 21, 2025 10:13
@milan-zededa milan-zededa force-pushed the zedagent-lps-refactor branch 3 times, most recently from ba5f340 to 4b3b3fc Compare August 21, 2025 10:53
@milan-zededa milan-zededa force-pushed the zedagent-lps-refactor branch from 4b3b3fc to 5929bd9 Compare August 21, 2025 12:58
@milan-zededa
Contributor Author

@christoph-zededa
Contributor

@christoph-zededa Btw., in my refactoring I added a return statement here: https://github.com/milan-zededa/eve/blob/zedagent-lps-refactor/pkg/pillar/cmd/zedagent/localcommand.go#L43 I believe it was missing unintentionally: https://github.com/lf-edge/eve/blob/master/pkg/pillar/cmd/zedagent/localinfo.go#L714-L725

Sounds good. Once it is merged, I will do a backport of that fix.

@milan-zededa milan-zededa force-pushed the zedagent-lps-refactor branch from 5929bd9 to 4cbb4cd Compare August 21, 2025 15:19
christoph-zededa added a commit to christoph-zededa/eve that referenced this pull request Aug 27, 2025
issue has been found here:
lf-edge#5191 (comment)

Signed-off-by: Christoph Ostarek <[email protected]>
@OhmSpectator OhmSpectator added the main-quest The fate of the project rests on this PR. Prioritise review to advance the storyline! label Aug 28, 2025
Member

@OhmSpectator OhmSpectator left a comment

New docs are super useful, thanks! Could you please link them in https://github.com/lf-edge/eve/blob/master/docs/mkdocs/mkdocs.yml ?

@milan-zededa milan-zededa force-pushed the zedagent-lps-refactor branch from 4cbb4cd to 030506a Compare August 28, 2025 13:33
@milan-zededa
Contributor Author

milan-zededa commented Aug 28, 2025

New docs are super useful, thanks! Could you please link them in https://github.com/lf-edge/eve/blob/master/docs/mkdocs/mkdocs.yml ?

Sure, added link to mkdocs.yml

@github-actions github-actions bot requested a review from OhmSpectator August 28, 2025 13:34
@milan-zededa milan-zededa force-pushed the zedagent-lps-refactor branch from 030506a to e5afc0c Compare August 28, 2025 13:37
throttled bool
}

// newTaskTicker creates a new taskTicker with a randomized firing interval.
Contributor

The research paper from the 1980s says you need a random factor of 0.5 to 1.5 to avoid synchronization.

Contributor Author

@milan-zededa milan-zededa Sep 1, 2025

I can put anything there; in this PR I just kept what was already there:

if lc.beforeStart != nil {
	lc.beforeStart()
}
locked := lc.taskMx.TryRLock()
Contributor

Did we hold a (read) lock across HTTP invocations in the old/current implementation?
That would mean the read lock can be held for minutes.

Contributor Author

@milan-zededa milan-zededa Sep 1, 2025

We didn't (but then we had some potential race conditions in the old implementation).
I have now changed this so that Pause is not blocked by an HTTP request. I added this function:

// runInterruptible temporarily releases the task lock to allow Pause() to proceed,
// runs the provided callback, then re-acquires the lock. Returns true if a pause
// was triggered while the callback was running, indicating the caller should
// discard or retry the operation.
func (lc *taskControl) runInterruptible(callback func()) (wasPaused bool) {
...

It is now used to run HTTP operations without holding the lock, and to discard the results if a Pause occurred while the HTTP request was running.
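Since the function body is elided above, here is a rough, self-contained sketch of one way such a helper could work. It is not the implementation from this PR: the `pauseCount` field and the simplified `Pause` are assumptions. The sketch counts pauses under the write lock and compares the counter before and after the callback.

```go
package main

import (
	"fmt"
	"sync"
)

// taskControl is a simplified, hypothetical stand-in for the type
// discussed above; only taskMx is taken from the thread.
type taskControl struct {
	taskMx     sync.RWMutex // read-held by running tasks, write-held while paused
	pauseCount uint64       // written only with taskMx write-locked (assumption)
}

// Pause waits until no task holds the read lock and returns a function
// that resumes task execution.
func (tc *taskControl) Pause() (resume func()) {
	tc.taskMx.Lock()
	tc.pauseCount++
	return tc.taskMx.Unlock
}

// runInterruptible must be called with the read lock held. It releases
// the lock while callback runs, re-acquires it, and reports whether a
// Pause happened in between.
func (tc *taskControl) runInterruptible(callback func()) (wasPaused bool) {
	before := tc.pauseCount // safe: read lock is held
	tc.taskMx.RUnlock()
	callback() // e.g. a slow HTTP request; Pause() can proceed meanwhile
	tc.taskMx.RLock()
	return tc.pauseCount != before
}

func main() {
	tc := &taskControl{}
	tc.taskMx.RLock() // a task starts
	paused := tc.runInterruptible(func() {
		// Simulate a Pause/resume cycle arriving during the HTTP call.
		resume := tc.Pause()
		resume()
	})
	tc.taskMx.RUnlock()
	fmt.Println("discard result:", paused) // prints "discard result: true"
}
```

Because the counter is only written under the write lock and only read under the read lock, the RWMutex alone orders the accesses and no extra atomic is needed in this sketch.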

// GetLocalAppRestartCmd returns the most recent locally issued restart
// command for the given app, or an empty command if none exists.
func (lc *LocalCmdAgent) GetLocalAppRestartCmd(appUUID uuid.UUID) types.AppInstanceOpsCmd {
	lc.appCommandsMx.RLock()
Contributor

Are the callers ok with this potentially blocking for minutes when the http call needs to time out?

Contributor Author

@milan-zededa milan-zededa Sep 1, 2025

This one is not locked during HTTP call execution.
Only taskMx was, but I changed it to avoid Pause being blocked by HTTP operations: #5191 (comment)
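To make the point concrete, here is a reduced sketch (not the PR's code) of a getter guarded by its own RWMutex that is held only for the duration of a map lookup, so callers never wait on network I/O. The `cmdStore` and `opsCmd` types, the string key, and the `Counter` field are hypothetical stand-ins for LocalCmdAgent, types.AppInstanceOpsCmd and uuid.UUID.

```go
package main

import (
	"fmt"
	"sync"
)

// opsCmd is a hypothetical stand-in for types.AppInstanceOpsCmd.
type opsCmd struct {
	Counter uint32
}

// cmdStore is a hypothetical, reduced model of the state guarded by
// appCommandsMx.
type cmdStore struct {
	appCommandsMx sync.RWMutex
	restartCmds   map[string]opsCmd // keyed by app UUID string (assumption)
}

// setRestartCmd records the latest restart command for an app.
func (s *cmdStore) setRestartCmd(appUUID string, cmd opsCmd) {
	s.appCommandsMx.Lock()
	defer s.appCommandsMx.Unlock()
	if s.restartCmds == nil {
		s.restartCmds = make(map[string]opsCmd)
	}
	s.restartCmds[appUUID] = cmd
}

// getRestartCmd returns the stored command, or the zero value if none
// exists. The lock covers only the map read, never an HTTP request.
func (s *cmdStore) getRestartCmd(appUUID string) opsCmd {
	s.appCommandsMx.RLock()
	defer s.appCommandsMx.RUnlock()
	return s.restartCmds[appUUID]
}

func main() {
	s := &cmdStore{}
	s.setRestartCmd("app-1", opsCmd{Counter: 3})
	fmt.Println(s.getRestartCmd("app-1").Counter) // prints 3
	fmt.Println(s.getRestartCmd("app-2").Counter) // prints 0
}
```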

// from the primary controller is being applied. Or vice versa.
getconfigCtx.sideController.Lock()
defer getconfigCtx.sideController.Unlock()
resume := getconfigCtx.localCmdAgent.Pause()
Contributor

Can't this block for minutes if a local task is waiting for http to time out while holding the read lock (from startTask)?

Contributor Author

@milan-zededa milan-zededa Sep 1, 2025

Good point -- fixed: #5191 (comment)

@milan-zededa milan-zededa force-pushed the zedagent-lps-refactor branch 4 times, most recently from f0143a7 to fd53736 Compare September 3, 2025 12:14
christoph-zededa added a commit to christoph-zededa/eve that referenced this pull request Sep 3, 2025
issue has been found here:
lf-edge#5191 (comment)

Signed-off-by: Christoph Ostarek <[email protected]>
Contributor

@eriknordmark eriknordmark left a comment

LGTM, but please test with a config where the HTTP POST/GET times out due to unreachable/blocked connectivity to the server.

OhmSpectator pushed a commit that referenced this pull request Sep 8, 2025
issue has been found here:
#5191 (comment)

Signed-off-by: Christoph Ostarek <[email protected]>
@OhmSpectator
Member

Hmmm... LPS/LOC tests fail: --- FAIL: TestEdenScripts/dev_local_info (1549.20s)

Member

@OhmSpectator OhmSpectator left a comment

LOC/LPS tests are failing. Requesting a change to prevent accidental merging.

@milan-zededa
Contributor Author

LOC/LPS tests are failing. Requesting a change to prevent accidental merging.

I'm unable to reproduce these failures locally, so I temporarily added a new commit with some extra log messages that might help. Could you please restart the eden tests?

Member

@uncleDecart uncleDecart left a comment

kicking the tests!

Member

@OhmSpectator OhmSpectator left a comment

Run the tests

@milan-zededa
Contributor Author

@uncleDecart / @OhmSpectator When you get a chance, please restart the eden tests again. I submitted a small change which should fix the failing tests (but for now keeping the temporary commit with extra logs).

@OhmSpectator
Member

They should start once the build is done

@milan-zededa
Contributor Author

Looks like the failing test is fixed, so I removed the extra troubleshooting logs.

- Refactored the LPS-handling code inside zedagent into a separate
  component, LocalCmdAgent. This helps mitigate the complexity and
  unorganized structure of the already large zedagent package.
- No functional changes were introduced; the code was reorganized,
  cleaned up, and better documented.
- Improved thread-safety for variables that previously lacked proper
  protection, reducing potential race conditions.
- Added user-facing documentation for the Local Profile Server (LPS).
- Added initial developer-facing documentation for zedagent, with
  focus on LPS handling.

Signed-off-by: Milan Lenco <[email protected]>
@milan-zededa
Contributor Author

@OhmSpectator Finally everything is green. When you get a chance, please merge.

@OhmSpectator OhmSpectator merged commit 3ad0e22 into lf-edge:master Sep 18, 2025
73 of 75 checks passed
@OhmSpectator
Member

@OhmSpectator Finally everything is green. When you get a chance, please merge.

done

Labels

main-quest The fate of the project rests on this PR. Prioritise review to advance the storyline!

5 participants