bump eve-libs: switch EVE to nettrace offload tracing #5282
// NetDumpEnable : enable publishing of network diagnostics (as tgz archives to /persist/netdump).
NetDumpEnable GlobalSettingKey = "netdump.enable"
// NetTraceFolder global setting key
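Here is a minimal sketch of how a folder setting like this could be consumed, assuming a plain map stands in for the pillar global config. The `GlobalSettingKey` type, the `settings` map, and the key string `"nettrace.folder"` are hypothetical placeholders (the diff is cut off before the real value); only the default `/persist/nettrace` comes from the PR description.

```go
package main

import "fmt"

// GlobalSettingKey is a local stand-in for the string-typed key seen in the
// diff; it is not imported from pillar/types.
type GlobalSettingKey string

const (
	// NetDumpEnable is copied from the diff context.
	NetDumpEnable GlobalSettingKey = "netdump.enable"
	// NetTraceFolder: HYPOTHETICAL key string, the diff is cut off before
	// the real value.
	NetTraceFolder GlobalSettingKey = "nettrace.folder"
)

// defaultNetTraceFolder is the default given in the PR description.
const defaultNetTraceFolder = "/persist/nettrace"

// settings is a simplified, hypothetical stand-in for the global config map.
type settings map[GlobalSettingKey]string

// netTraceFolder returns the configured folder, falling back to the default
// when the key is missing or set to an empty string.
func (s settings) netTraceFolder() string {
	if v, ok := s[NetTraceFolder]; ok && v != "" {
		return v
	}
	return defaultNetTraceFolder
}

func main() {
	fmt.Println(settings{}.netTraceFolder())                          // default
	fmt.Println(settings{NetTraceFolder: "/tmp/nt"}.netTraceFolder()) // override
}
```

Falling back on an empty value as well as a missing key keeps a blank setting from turning into a relative path.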
This file is also vendored in the following packages:
- pkg/newlog
- pkg/recovertpm/src
- pkg/edgeview
- pkg/installer/src
- pkg/wwan/mmagent
- pkg/vtpm/swtpm-vtpm
Please update them as well. I think @eriknordmark missed this update in his PR to change from kubevirt to k....
By the way, I just saw that Edgeview already has the update (#5281), but you will need to update it anyway....
f7f61ba to 53b33f8
OhmSpectator left a comment
Wow, this brings a lot of new things into EVE. etcd and an embedded database (bbolt) are major changes to our architecture.
Has @eriknordmark already approved it?
Anyway, if we’re bringing them in:
- Please split the changes into separate, well-described commits explaining what was changed in each part and why. Right now it all comes in a single commit, which makes it hard to review.
- Add documentation for all the new components and changes: at least short notes on how they fit into the system and how they are supposed to interact. A basic explanation for developers would help a lot.
EVE is a client of the nettrace library, which lives in the eve-libs repo, and that PR is already merged (lf-edge/eve-libs#36). I’ve updated the nettrace docs there as well. Bbolt was chosen and added in eve-libs. In EVE it is only used when the offload option is enabled; otherwise, tracing stays in memory as before. There are also some additions in EVE based on the code review of the eve-libs repo:
There are no other architectural changes on the EVE side. As for commits, I can split them if needed, but given the urgency I’d prefer to keep this as a single PR. @rene @OhmSpectator, your thoughts?
A single PR is totally fine, but it needs more context to be clear.
@kperakis-zededa, could you please split this PR into multiple incremental commits? It is hard to review this way.
Pull Request Overview
This PR optimizes nettrace memory usage by implementing a disk-based storage system that limits in-memory metadata and offloads traces to persistent storage when thresholds are exceeded.
Key changes:
- Introduces configurable nettrace folder path via global settings
- Replaces in-memory JSON marshaling with BoltDB-based batch storage for network traces
- Adds automatic cleanup functionality for old nettrace files
Reviewed Changes
Copilot reviewed 23 out of 81 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/pillar/types/global.go | Adds NetTraceFolder global setting with default path |
| pkg/pillar/netdump/netdump.go | Modifies netdump to read traces from disk files instead of in-memory JSON |
| pkg/pillar/go.mod | Updates eve-libs dependency and adds bbolt for database storage |
| pkg/pillar/controllerconn/send.go | Implements BoltDB-based trace storage with session UUIDs and disk offloading |
| pkg/pillar/cmd/zedagent/zedagent.go | Adds cleanup function for old nettrace files |
| Multiple cmd files | Updates function calls to include netTraceFolder parameter |
53b33f8 to 4bdeec6
4bdeec6 to a1d6b18
@OhmSpectator @rene I checked all the README.md files but didn’t find anything about nettrace. Could you suggest where I should document it? Should I add a section to the Pillar README.md (and if so, under which heading), or create a new topic? Also, should I include more details about the current nettrace functionality and how EVE interacts with eve-libs?
Please update the documentation here: https://github.com/lf-edge/eve/blob/master/docs/DEVICE-CONNECTIVITY.md#netdump-and-nettrace
Thanks @milan-zededa
@kperakis-zededa, @milan-zededa already answered your question about the documentation. I just added "bump eve-libs:" to the PR's title to make it clearer that we are bumping eve-libs with the reworked nettrace.
@kperakis-zededa, you need to bump
// deleteOldNetTraceFiles removes old nettrace files
func deleteOldNetTraceFiles(gcp *types.ConfigItemValueMap) {
	log.Noticef("cleanupNettraceFiles")
Would it be useful for troubleshooting to put this log at the end of the function and log the filenames of all the files that were removed?
traceInJSON, err := json.MarshalIndent(req.NetTrace, "", " ")
if nt, ok := req.NetTrace.(nettrace.HTTPTrace); ok {
	file := "nettrace_" + nt.SessionUUID + ".json"
	filePath := netTraceFolderPath + "/" + file
Just a suggestion: use path.Join here.
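For illustration, a small sketch of the difference. `filepath.Join` is the standard-library package intended for OS paths; on Linux it behaves the same as `path.Join`. `netTraceFilePath` is a hypothetical helper name, not a function from the PR.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// netTraceFilePath builds the per-session JSON path from the diff above.
// filepath.Join cleans the result, so a trailing slash on the configured
// folder does not produce a double slash as plain "+" concatenation does.
func netTraceFilePath(folder, sessionUUID string) string {
	return filepath.Join(folder, "nettrace_"+sessionUUID+".json")
}

func main() {
	folder := "/persist/nettrace/"                   // note the trailing slash
	fmt.Println(folder + "/" + "nettrace_1234.json") // /persist/nettrace//nettrace_1234.json
	fmt.Println(netTraceFilePath(folder, "1234"))    // /persist/nettrace/nettrace_1234.json
}
```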
In this testing-instruction step, I would recommend mentioning that this file is inside a netdump tar and where to find it.
milan-zededa left a comment
Just a few last suggestions; otherwise, LGTM.
42459cc to c0be463
milan-zededa left a comment
Reviewed latest changes
OhmSpectator left a comment
Looks fine.
But please create a ticket to address the potential issue of the final .json file growing in size.
Also, @rucoder, please check that the struct changes here don’t break the TUI.
@rucoder, FWIW, I did some tests as well, and the TUI is working fine.
@rene, could you please check the runner used for Go tests? It looks like it has run out of disk space.
@milan-zededa, we don't run Go tests on our own runners; that's why it is running out of disk space. However, @kperakis-zededa ran the tests locally and posted the logs here, which is how we handle such cases....
c0be463 to 609da14
It includes the eve-libs library for the offload nettrace procedure.
Signed-off-by: Konstantinos Perakis <[email protected]>

It includes the bbolt database library for the offload nettrace procedure.
Signed-off-by: Konstantinos Perakis <[email protected]>

It includes the Pillar changes for the offload nettrace procedure. Here are the main changes:
- Use offload tracing mode: nettrace batches → BoltDB → final JSON export.
- Per-session artifacts under the nettrace folder (/persist/nettrace) using a generated sessionUUID.
- Automatic cleanup of stale DB/JSON files at boot.
- JSON export produced at session end from the DB contents.
Signed-off-by: Konstantinos Perakis <[email protected]>

- Update documentation
- Update Dockerfile
- Update tests
Signed-off-by: Konstantinos Perakis <[email protected]>

- Add the nettrace folder for the update interfaces functionality.
Signed-off-by: Konstantinos Perakis <[email protected]>
609da14 to a1a0084

Switch EVE to nettrace offload tracing
Description
Network trace batches are streamed from eve-libs via a callback into a BoltDB file and then exported as a single JSON file at the end of the session.
What’s included:
- Use offload tracing mode: nettrace batches → BoltDB → final JSON export.
- Per-session artifacts under the configurable nettrace folder (default: /persist/nettrace), using a generated sessionUUID.
- Automatic cleanup of stale DB/JSON files at boot.
- JSON export produced at session end from the DB contents.
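A sketch of the per-session naming, assuming a random version-4 UUID generated with `crypto/rand` (the PR may use a UUID library instead). The `.json` name follows the `"nettrace_" + SessionUUID + ".json"` pattern from the reviewed diff; the `.db` name and the `sessionArtifacts`/`newSessionArtifacts` helpers are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"path/filepath"
)

// newSessionUUID returns a random (version 4) UUID string using only the
// standard library; this is an assumption, not the PR's implementation.
func newSessionUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// sessionArtifacts groups the per-session files under the nettrace folder.
type sessionArtifacts struct {
	UUID     string
	DBPath   string // HYPOTHETICAL name for the BoltDB file
	JSONPath string // "nettrace_" + UUID + ".json", as in the reviewed diff
}

// newSessionArtifacts derives both file paths from one fresh session UUID.
func newSessionArtifacts(folder string) (sessionArtifacts, error) {
	id, err := newSessionUUID()
	if err != nil {
		return sessionArtifacts{}, err
	}
	return sessionArtifacts{
		UUID:     id,
		DBPath:   filepath.Join(folder, "nettrace_"+id+".db"),
		JSONPath: filepath.Join(folder, "nettrace_"+id+".json"),
	}, nil
}

func main() {
	a, err := newSessionArtifacts("/persist/nettrace")
	if err != nil {
		panic(err)
	}
	fmt.Println(a.DBPath)
	fmt.Println(a.JSONPath)
}
```

Sharing one prefix across both artifacts keeps the boot-time cleanup a simple name match.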
How to test and validate this PR
Here’s a simple, practical validation scenario you can use:
1. Deploy a download that pulls a large file (>15 GB) from a cloud provider (AWS/GCP).
2. While the download runs, monitor EVE memory (zedbox service RSS) and confirm that it stays stable and low (230 MB at most). This verifies that offload mode is working and that nettrace is not accumulating data in RAM.
3. After the download completes, verify that the nettrace.json file exists.
4. Open the JSON and confirm that it has the usual sections and values (as before this patch), e.g.: description, traceBeginAt, traceEndAt, dials, dnsQueries, httpRequests, tcpConns, udpConns, tlsTunnels.
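To make the last step repeatable, a small checker could be used. This sketch assumes the section names listed above are top-level keys of the exported JSON file; the `checktrace` program name and the `missingSections` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// requiredSections are the section names listed in the validation steps.
// ASSUMPTION: they appear as top-level keys of the exported JSON.
var requiredSections = []string{
	"description", "traceBeginAt", "traceEndAt", "dials", "dnsQueries",
	"httpRequests", "tcpConns", "udpConns", "tlsTunnels",
}

// missingSections parses the trace and returns the required keys it lacks.
func missingSections(data []byte) ([]string, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(data, &top); err != nil {
		return nil, err
	}
	var missing []string
	for _, key := range requiredSections {
		if _, ok := top[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checktrace <nettrace JSON file>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	missing, err := missingSections(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid JSON:", err)
		os.Exit(1)
	}
	if len(missing) > 0 {
		fmt.Println("missing sections:", missing)
		os.Exit(1)
	}
	fmt.Println("all expected sections present")
}
```

Usage, after extracting the trace from the netdump archive: `go run checktrace.go /path/to/nettrace_<sessionUUID>.json`.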
Changelog notes
Improves EVE memory usage during heavy network activity by offloading trace metadata to disk.
EVE now retains only a small, capped window of network metadata in RAM (≈100 MB) and writes the rest to disk, preventing memory growth during large transfers (e.g., big file downloads) while still producing the final nettrace report.
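The capped-window idea can be illustrated with a small buffer that spills to a sink once a byte budget is reached. This is a hypothetical sketch, not the eve-libs implementation: `spillBuffer` and its `flush` callback are illustrative names, and a tiny budget replaces the ≈100 MB cap to keep the example readable.

```go
package main

import "fmt"

// spillBuffer keeps trace records in memory until adding another one would
// exceed maxBytes; it then hands the buffered window to flush (for example a
// disk writer) and starts a new window. A single record larger than maxBytes
// is still accepted, so the bound is approximate.
type spillBuffer struct {
	maxBytes int
	size     int
	records  [][]byte
	flush    func([][]byte) error
}

// Add buffers rec, spilling the current window first if rec would not fit.
func (b *spillBuffer) Add(rec []byte) error {
	if len(b.records) > 0 && b.size+len(rec) > b.maxBytes {
		if err := b.Flush(); err != nil {
			return err
		}
	}
	b.records = append(b.records, rec)
	b.size += len(rec)
	return nil
}

// Flush hands any buffered records to the sink and resets the window.
func (b *spillBuffer) Flush() error {
	if len(b.records) == 0 {
		return nil
	}
	if err := b.flush(b.records); err != nil {
		return err
	}
	b.records, b.size = nil, 0
	return nil
}

func main() {
	spills := 0
	buf := &spillBuffer{
		maxBytes: 10,
		flush: func(recs [][]byte) error {
			spills++
			fmt.Printf("spill %d: %d record(s)\n", spills, len(recs))
			return nil
		},
	}
	for _, r := range []string{"aaaa", "bbbb", "cccc", "dddd"} {
		if err := buf.Add([]byte(r)); err != nil {
			panic(err)
		}
	}
	// End of session: write out whatever is still buffered.
	if err := buf.Flush(); err != nil {
		panic(err)
	}
}
```

Setting `maxBytes` to about 100 MB would mirror the cap mentioned in the changelog note; resident memory then stays near that budget regardless of how large the transfer grows.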