bump eve-libs: switch EVE to nettrace offload tracing#5282

Merged
rene merged 5 commits into
lf-edge:masterfrom
kperakis-zededa:nettrace-oom-issue
Oct 11, 2025
Conversation

@kperakis-zededa

@kperakis-zededa kperakis-zededa commented Oct 6, 2025

Contributor

Switch EVE to nettrace offload tracing

Description

Network trace batches are streamed from eve-libs via a callback into a BoltDB file, and then
exported as a single JSON file at the end of the session.

What’s included:

  • Use offload tracing mode: nettrace batches → BoltDB → final JSON export.

  • Per-session artifacts under the configurable nettrace folder (default: /persist/nettrace)
    using a generated sessionUUID.

  • Automatic cleanup of stale DB/JSON files at boot.

  • JSON export produced at session end from the DB contents.
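
The offload flow listed above can be pictured with a small Go sketch. It is not the eve-libs or Pillar implementation: to stay self-contained it appends each batch to a plain JSON-lines file instead of BoltDB, and `traceBatch`, `offloadBatch`, `exportJSON`, and `runSession` are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"io"
	"os"
	"path/filepath"
)

// traceBatch is a hypothetical stand-in for one batch of trace records
// handed over by the tracing library's callback.
type traceBatch struct {
	Records []string `json:"records"`
}

// offloadBatch appends one batch to the session file so that it does not
// have to stay in RAM. ASSUMPTION: a JSON-lines file replaces BoltDB here.
func offloadBatch(path string, b traceBatch) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	if err := json.NewEncoder(f).Encode(b); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// exportJSON reads the stored batches back in order and merges them into
// a single JSON document, as done at the end of a session.
func exportJSON(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	merged := traceBatch{Records: []string{}}
	dec := json.NewDecoder(f)
	for {
		var b traceBatch
		err := dec.Decode(&b)
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		merged.Records = append(merged.Records, b.Records...)
	}
	return json.Marshal(merged)
}

// runSession offloads two batches to a temporary file and returns the
// exported JSON document as a string.
func runSession() (string, error) {
	dir, err := os.MkdirTemp("", "nettrace-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "session.jsonl")
	batches := []traceBatch{
		{Records: []string{"dial"}},
		{Records: []string{"dns", "http"}},
	}
	for _, b := range batches {
		if err := offloadBatch(path, b); err != nil {
			return "", err
		}
	}
	out, err := exportJSON(path)
	return string(out), err
}
```

Calling `runSession()` writes both batches to disk first and only builds the merged document at export time, which is the property the offload mode relies on to keep memory use flat while a session is running.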

How to test and validate this PR

Here’s a simple, practical validation scenario you can use:

  1. Deploy a download that pulls a large file (>15 GB) from a cloud provider (AWS/GCP).

  2. While the download runs, monitor EVE memory (zedbox service RSS) and confirm it stays low and stable (up to 230 MB). This verifies that offload mode is working and nettrace is not accumulating data in RAM.

  3. After the download completes, verify that the nettrace.json file exists.

  4. Open the JSON and confirm it has the usual sections and values (as before this patch), e.g.: description, traceBeginAt, traceEndAt, dials, dnsQueries, httpRequests, tcpConns, udpConns, tlsTunnels.
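
For step 2, one way to watch the resident set size is to read the `VmRSS` line from `/proc/<pid>/status` of the zedbox process. The sketch below only parses the status text and compares it with a limit. The helper names are hypothetical, and treating the 230 MB bound as 230 MiB is an assumption made for the comparison.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseVmRSSKiB extracts the VmRSS value (reported by Linux in kB) from
// the text of a /proc/<pid>/status file.
func parseVmRSSKiB(status string) (int, error) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "VmRSS:"))
		if len(fields) == 0 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.Atoi(fields[0])
	}
	return 0, errors.New("VmRSS not found")
}

// withinLimit reports whether the RSS is at or below limitMiB.
// ASSUMPTION: the limit is interpreted in MiB (1 MiB = 1024 KiB).
func withinLimit(rssKiB, limitMiB int) bool {
	return rssKiB <= limitMiB*1024
}
```

In practice the status text would come from `os.ReadFile` on the process's status file, sampled periodically for the duration of the download.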

Changelog notes

Improves EVE memory usage during heavy network activity by offloading trace metadata to disk.
EVE now retains only a small, capped window of network metadata in RAM (≈100 MB) and writes the rest to disk, preventing memory growth during large transfers (e.g., big file downloads) while still producing the final nettrace report.
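
The idea of a capped in-memory window can be sketched as follows. This is a simplified illustration, not the eve-libs implementation: `cappedWindow` and its fields are hypothetical, the cap is expressed in raw bytes, and the `flush` callback stands in for whatever writes the data to disk.

```go
package main

// cappedWindow keeps records in memory until adding another one would
// exceed capBytes, then hands the whole window to flush and starts over.
// ASSUMPTION: names and the flush-everything policy are illustrative only.
type cappedWindow struct {
	capBytes int
	used     int
	records  [][]byte
	flush    func(records [][]byte) error
}

// add stores rec in the window, flushing the current contents first if
// the new record would push the window past its cap.
func (w *cappedWindow) add(rec []byte) error {
	if w.used+len(rec) > w.capBytes && len(w.records) > 0 {
		if err := w.flush(w.records); err != nil {
			return err
		}
		w.records, w.used = nil, 0
	}
	w.records = append(w.records, rec)
	w.used += len(rec)
	return nil
}
```

With this policy the memory held by the window never grows beyond the cap plus one record, regardless of how long the transfer runs.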

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

Comment thread pkg/pillar/types/global.go Outdated

// NetDumpEnable : enable publishing of network diagnostics (as tgz archives to /persist/netdump).
NetDumpEnable GlobalSettingKey = "netdump.enable"
// NetTraceFolder global setting key

Contributor

This file is also vendored in the following packages:

  • pkg/newlog
  • pkg/recovertpm/src
  • pkg/edgeview
  • pkg/installer/src
  • pkg/wwan/mmagent
  • pkg/vtpm/swtpm-vtpm

Please, update them as well. I think @eriknordmark missed this update from his PR to change from kubevirt to k....

Contributor

Btw, just saw for Edgeview we already have the update: #5281. But you will need to update anyways....

Contributor Author

@rene done!

@kperakis-zededa kperakis-zededa force-pushed the nettrace-oom-issue branch 2 times, most recently from f7f61ba to 53b33f8 Compare October 6, 2025 08:46
@kperakis-zededa kperakis-zededa requested a review from rene October 6, 2025 09:23

@OhmSpectator OhmSpectator left a comment

Member

Wow, this brings a lot of new things into EVE. etcd and an embedded database (bbolt) are major changes to our architecture.
Has @eriknordmark already approved it?

Anyway, if we’re bringing them in:

Please split the changes into separate, well-described commits explaining what and why each part was changed. Right now it all comes in a single commit, which makes it hard to review.

Add documentation for all the new components and changes - at least short notes on how they fit into the system and how they’re supposed to interact. A basic explanation for developers would help a lot.

@kperakis-zededa

kperakis-zededa commented Oct 6, 2025

Contributor Author

Wow, this brings a lot of new things into EVE. etcd and an embedded database (bbolt) are major changes to our architecture. Has @eriknordmark already approved it?

Anyway, if we’re bringing them in:

Please split the changes into separate, well-described commits explaining what and why each part was changed. Right now it all comes in a single commit, which makes it hard to review.

Add documentation for all the new components and changes - at least short notes on how they fit into the system and how they’re supposed to interact. A basic explanation for developers would help a lot.

@OhmSpectator

EVE is a client of the nettrace library, which lives in the eve-libs repo, and the PR there is already merged (lf-edge/eve-libs#36).

I’ve updated the nettrace docs there as well.

bbolt was chosen and added in eve-libs. In EVE it is only used when the offload option is enabled; otherwise tracing stays in memory as before.

There are also some additions on the EVE side, based on the code review in the eve-libs repo:

  1. Added configurable paths for nettrace DB/JSON.

  2. Added cleanup of stale DB/JSON files on reboot.

  3. Aligned with eve-libs review feedback (naming, options, session UUID, etc.).

No other architectural changes on the EVE side.

On commits, I can split them if needed, but given the urgency I’d prefer to keep this as a single PR.

@rene @OhmSpectator your thoughts?

@OhmSpectator

Member

Single PR is totally fine, but it needs more context to be clear.
Ok, I'll look through the original PR.
Nevertheless, I see that etcd and the DB are introduced here; they are not part of eve-libs (though they are used), so this should be documented somewhere in the EVE docs.

@shjala

shjala commented Oct 6, 2025

Member

@kperakis-zededa could you please split this PR into multiple incremental commits? It is hard to review this way.

@OhmSpectator OhmSpectator requested a review from Copilot October 6, 2025 12:25

Copilot AI left a comment


Pull Request Overview

This PR optimizes nettrace memory usage by implementing a disk-based storage system that limits in-memory metadata and offloads traces to persistent storage when thresholds are exceeded.

Key changes:

  • Introduces configurable nettrace folder path via global settings
  • Replaces in-memory JSON marshaling with BoltDB-based batch storage for network traces
  • Adds automatic cleanup functionality for old nettrace files

Reviewed Changes

Copilot reviewed 23 out of 81 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/pillar/types/global.go | Adds NetTraceFolder global setting with default path |
| pkg/pillar/netdump/netdump.go | Modifies netdump to read traces from disk files instead of in-memory JSON |
| pkg/pillar/go.mod | Updates eve-libs dependency and adds bbolt for database storage |
| pkg/pillar/controllerconn/send.go | Implements BoltDB-based trace storage with session UUIDs and disk offloading |
| pkg/pillar/cmd/zedagent/zedagent.go | Adds cleanup function for old nettrace files |
| Multiple cmd files | Updates function calls to include netTraceFolder parameter |


Comment thread pkg/pillar/controllerconn/send.go
Comment thread pkg/pillar/controllerconn/send.go Outdated
@github-actions github-actions Bot requested a review from OhmSpectator October 6, 2025 14:24
@kperakis-zededa kperakis-zededa changed the title Reduce nettrace memory usage by limiting in-memory metadata and offlo… Switch EVE to nettrace offload tracing Oct 6, 2025
@kperakis-zededa

Contributor Author

Single PR is totally fine, but it needs more context to be clear. Ok, I'll look through the original PR. Nevertheless, I see that etcd and the DB are introduced here; they are not part of eve-libs (though they are used), so this should be documented somewhere in the EVE docs.

@OhmSpectator @rene I checked all the README.md files but didn’t find anything about nettrace. Could you suggest where I should document it? Should I add a section to the Pillar README.md (and if so, under which heading), or create a new topic? Also, should I include more details about the current nettrace functionality and how EVE interacts with eve-libs, etc.

@milan-zededa

Contributor

Single PR is totally fine, but it needs more context to be clear. Ok, I'll look through the original PR. Nevertheless, I see that etcd and the DB are introduced here; they are not part of eve-libs (though they are used), so this should be documented somewhere in the EVE docs.

@OhmSpectator @rene I checked all the README.md files but didn’t find anything about nettrace. Could you suggest where I should document it? Should I add a section to the Pillar README.md (and if so, under which heading), or create a new topic? Also, should I include more details about the current nettrace functionality and how EVE interacts with eve-libs, etc.

Please update the documentation here: https://github.com/lf-edge/eve/blob/master/docs/DEVICE-CONNECTIVITY.md#netdump-and-nettrace

@kperakis-zededa

Contributor Author

Single PR is totally fine, but it needs more context to be clear. Ok, I'll look through the original PR. Nevertheless, I see that etcd and the DB are introduced here; they are not part of eve-libs (though they are used), so this should be documented somewhere in the EVE docs.

@OhmSpectator @rene I checked all the README.md files but didn’t find anything about nettrace. Could you suggest where I should document it? Should I add a section to the Pillar README.md (and if so, under which heading), or create a new topic? Also, should I include more details about the current nettrace functionality and how EVE interacts with eve-libs, etc.

Please update the documentation here: https://github.com/lf-edge/eve/blob/master/docs/DEVICE-CONNECTIVITY.md#netdump-and-nettrace

Thanks @milan-zededa

@rene rene changed the title Switch EVE to nettrace offload tracing bump eve-libs: switch EVE to nettrace offload tracing Oct 6, 2025
@rene

rene commented Oct 6, 2025

Contributor

@kperakis-zededa, @milan-zededa already answered your question about the documentation. I just added "bump eve-libs:" to the PR's title to make it clearer that we are bumping eve-libs with the reworked nettrace.

@rene

rene commented Oct 6, 2025

Contributor

@kperakis-zededa, you need to bump the pkg/recovertpm hash:

Error: /home/runner/work/eve/eve/pkg/debug/Dockerfile uses lfedge/eve-recovertpm:8a7aceb428f78a1f0717e15b7d69998ea84071cf but 5de54f94c0a60ea7aea7ee7681c92a74ed378638 is built in this repo


// deleteOldNetTraceFiles removes old nettrace files
func deleteOldNetTraceFiles(gcp *types.ConfigItemValueMap) {
log.Noticef("cleanupNettraceFiles")

@milan-zededa milan-zededa Oct 9, 2025

Contributor

Would it be useful for troubleshooting to put this log at the end of the function and log filenames of all the files that were removed?

Contributor Author

fixed.
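
The suggestion above (log once at the end, with the names of the files that were actually removed) might look roughly like the sketch below. It is not the Pillar code: it uses the standard `log` package rather than EVE's logger, the helper names are hypothetical, the `nettrace_` prefix and `.json` suffix follow the netdump snippet quoted in this PR, and the `.db` suffix is an assumption.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
	"strings"
)

// isNetTraceArtifact reports whether name looks like a per-session
// nettrace artifact. ASSUMPTION: the ".db" suffix is a guess.
func isNetTraceArtifact(name string) bool {
	if !strings.HasPrefix(name, "nettrace_") {
		return false
	}
	return strings.HasSuffix(name, ".json") || strings.HasSuffix(name, ".db")
}

// removeStaleNetTraceFiles deletes matching files in dir and returns the
// names that were removed, logging them in a single message at the end.
func removeStaleNetTraceFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var removed []string
	for _, e := range entries {
		if e.IsDir() || !isNetTraceArtifact(e.Name()) {
			continue
		}
		if err := os.Remove(filepath.Join(dir, e.Name())); err != nil {
			return removed, err
		}
		removed = append(removed, e.Name())
	}
	log.Printf("removed %d stale nettrace files: %v", len(removed), removed)
	return removed, nil
}
```

Returning the list of removed names keeps the function easy to test and leaves the choice of log level to the caller.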

@github-actions github-actions Bot requested a review from milan-zededa October 9, 2025 09:07
Comment thread pkg/pillar/netdump/netdump.go Outdated
traceInJSON, err := json.MarshalIndent(req.NetTrace, "", " ")
if nt, ok := req.NetTrace.(nettrace.HTTPTrace); ok {
file := "nettrace_" + nt.SessionUUID + ".json"
filePath := netTraceFolderPath + "/" + file

Contributor

Just a suggestion to use path.Join here

Contributor Author

fixed.
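
A minimal sketch of the suggested change is shown below. The function name is hypothetical, and it uses `filepath.Join` from `path/filepath` rather than `path.Join`; for slash-separated paths on Linux the two produce the same result.

```go
package main

import "path/filepath"

// netTraceJSONPath builds the per-session JSON path from the nettrace
// folder and the session UUID. ASSUMPTION: the function name is made up;
// the file-name pattern follows the snippet quoted in this review thread.
func netTraceJSONPath(folder, sessionUUID string) string {
	return filepath.Join(folder, "nettrace_"+sessionUUID+".json")
}
```

Unlike plain string concatenation, the join cleans the result, so a folder configured with a trailing slash does not lead to a doubled separator.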

@milan-zededa

milan-zededa commented Oct 9, 2025

Contributor

After the download completes verify that the nettrace.json file exists.

In this testing instruction step, I would just recommend mentioning that this file is inside a netdump tar and where to find it.
(and tester should maybe also check the content a bit, to make sure everything is in there: tcp traces, udp traces, https traces, etc.)
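
A quick content check along these lines could be scripted. The sketch below assumes the sections named in the PR description are top-level keys of the exported JSON, which this thread does not confirm; `expectedSections` and `missingSections` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"sort"
)

// expectedSections lists the sections named in the PR description.
// ASSUMPTION: they appear as top-level keys of the exported JSON.
var expectedSections = []string{
	"description", "traceBeginAt", "traceEndAt", "dials", "dnsQueries",
	"httpRequests", "tcpConns", "udpConns", "tlsTunnels",
}

// missingSections returns, in sorted order, the expected keys that are
// absent from the top level of the JSON document in data.
func missingSections(data []byte) ([]string, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(data, &top); err != nil {
		return nil, err
	}
	var missing []string
	for _, key := range expectedSections {
		if _, ok := top[key]; !ok {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing, nil
}
```

An empty result means every listed section is present; it says nothing about whether the sections contain the expected traces, so a manual look at the values is still worthwhile.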

@milan-zededa milan-zededa left a comment

Contributor

Just a few last suggestions, otherwise LGTM

@milan-zededa milan-zededa left a comment

Contributor

Reviewed latest changes

@OhmSpectator OhmSpectator left a comment

Member

Looks fine.
But please create a ticket to address the potential issue of the final .json file growing in size.
Also, @rucoder, please check that the struct changes here don’t break the TUI.

@rene rene left a comment

Contributor

LGTM

@rene

rene commented Oct 10, 2025

Contributor

@rucoder, FWIW, I did some tests as well and the TUI is working fine:
image

@milan-zededa

Contributor

@rene Could you please check the runner used for Go tests? It looks like it has no space left?

@rene

rene commented Oct 10, 2025

Contributor

@rene Could you please check the runner used for Go tests? It looks like it has no space left?

@milan-zededa we don't run Go tests in our runners, which is why it is running out of disk space. @kperakis-zededa, however, ran the tests locally and posted the logs here; that is how we handle such cases....

It includes the eve-libs library for the offload nettrace procedure.

Signed-off-by: Konstantinos Perakis <[email protected]>
It includes the bbolt database library for the offload nettrace procedure.

Signed-off-by: Konstantinos Perakis <[email protected]>
It includes the Pillar changes for the offload nettrace procedure.

Here are the main changes:

- Use offload tracing mode: nettrace batches → BoltDB → final JSON export.

- Per-session artifacts under nettrace folder (/persist/nettrace)
using a generated sessionUUID.

- Automatic cleanup of stale DB/JSON files at boot.

- JSON export produced at session end from the DB contents.

Signed-off-by: Konstantinos Perakis <[email protected]>
- Update documentation
- Update Dockerfile
- Update tests

Signed-off-by: Konstantinos Perakis <[email protected]>
- Add the nettrace folder for the update interfaces functionality.

Signed-off-by: Konstantinos Perakis <[email protected]>
@rene rene merged commit 254c70a into lf-edge:master Oct 11, 2025
42 of 44 checks passed

Labels

main-quest The fate of the project rests on this PR. Prioritise review to advance the storyline!


6 participants