newlog: refactor getMemlogMsg and add testing #5293

Merged
rene merged 4 commits into lf-edge:master from europaul:newlog-refactor-get-memlog on Oct 11, 2025

Conversation

@europaul europaul commented Oct 7, 2025

Contributor

Description

  • refactored getMemlogMsg.go to make the code more testable.
  • added logs as they come from memlog socket as test data.
  • added tests for other functions, so that it's clearer what functionality they have.
  • added a special case for logs coming from vector, as they were confused in format with other key=value logs, which led to incorrect parsing.

PR dependencies

None

How to test and validate this PR

Testing is covered by the tests added in this PR and existing Eden tests.

Changelog notes

N/A since the functionality wasn't really changed

PR Backports

No need.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

@europaul europaul requested a review from deitch as a code owner October 7, 2025 16:59
@europaul europaul force-pushed the newlog-refactor-get-memlog branch from a84e7a7 to f03db79 Compare October 7, 2025 17:00
@europaul europaul added the main-quest The fate of the project rests on this PR. Prioritise review to advance the storyline! label Oct 7, 2025
}

func TestParseLevelTimeMsg(t *testing.T) {
t.Parallel()

Contributor

I got this:

=== CONT  TestParseLevelTimeMsg
==================
WARNING: DATA RACE
Write at 0x0000037ba270 by goroutine 50:
  github.com/lf-edge/eve/pkg/newlog/cmd.TestGzipParsing()
      /home/christoph/projects/eve-3/pkg/newlog/cmd/writelogFile_test.go:18 +0x64
  testing.tRunner()
      /usr/lib/golang/src/testing/testing.go:1792 +0x225
  testing.(*T).Run.gowrap1()
      /usr/lib/golang/src/testing/testing.go:1851 +0x44

Previous write at 0x0000037ba270 by goroutine 49:
  github.com/lf-edge/eve/pkg/newlog/cmd.TestGetTimestampFromGzipName()
      /home/christoph/projects/eve-3/pkg/newlog/cmd/newlogd_test.go:31 +0x36f
  testing.tRunner()
      /usr/lib/golang/src/testing/testing.go:1792 +0x225
  testing.(*T).Run.gowrap1()
      /usr/lib/golang/src/testing/testing.go:1851 +0x44

Goroutine 50 (running) created at:
  testing.(*T).Run()
      /usr/lib/golang/src/testing/testing.go:1851 +0x8f2
  testing.runTests.func1()
      /usr/lib/golang/src/testing/testing.go:2279 +0x85
  testing.tRunner()
      /usr/lib/golang/src/testing/testing.go:1792 +0x225
  testing.runTests()
      /usr/lib/golang/src/testing/testing.go:2277 +0x96c
  testing.(*M).Run()
      /usr/lib/golang/src/testing/testing.go:2142 +0xeea
  main.main()
      _testmain.go:69 +0x164

Goroutine 49 (running) created at:
  testing.(*T).Run()
      /usr/lib/golang/src/testing/testing.go:1851 +0x8f2
  testing.runTests.func1()
      /usr/lib/golang/src/testing/testing.go:2279 +0x85
  testing.tRunner()
      /usr/lib/golang/src/testing/testing.go:1792 +0x225
  testing.runTests()
      /usr/lib/golang/src/testing/testing.go:2277 +0x96c
  testing.(*M).Run()
      /usr/lib/golang/src/testing/testing.go:2142 +0xeea
  main.main()
      _testmain.go:69 +0x164
==================

with go test -v -race -parallel 1 it does not fail

Contributor Author

nice catch! thank you!
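
For readers following the thread: the race above comes from two parallel tests writing the same package-level variable. The PR resolves it by removing parallel execution. The sketch below shows a different option that the PR does not use: serialising access with a mutex and restoring the old value afterwards. The names `uploadDir`, `withUploadDir` and `runConcurrently` are hypothetical and do not come from newlogd.

```go
package main

import (
	"fmt"
	"sync"
)

// uploadDir stands in for a package-level variable that several tests
// overwrite. The name and value are hypothetical.
var uploadDir = "default-dir"

// uploadDirMu serialises every access to uploadDir.
var uploadDirMu sync.Mutex

// withUploadDir sets uploadDir to dir, runs fn with the current value, and
// restores the previous value. The mutex is held for the whole call, so two
// callers never overlap.
func withUploadDir(dir string, fn func(current string)) {
	uploadDirMu.Lock()
	defer uploadDirMu.Unlock()

	old := uploadDir
	uploadDir = dir
	// Deferred calls run last-in first-out: restore first, then unlock.
	defer func() { uploadDir = old }()

	fn(uploadDir)
}

// runConcurrently starts one goroutine per directory. Each goroutine swaps the
// global through withUploadDir, and the function returns how many of them
// observed exactly the value they had set.
func runConcurrently(dirs []string) int {
	var wg sync.WaitGroup
	var okMu sync.Mutex
	ok := 0
	for _, d := range dirs {
		wg.Add(1)
		go func(dir string) {
			defer wg.Done()
			withUploadDir(dir, func(current string) {
				if current == dir {
					okMu.Lock()
					ok++
					okMu.Unlock()
				}
			})
		}(d)
	}
	wg.Wait()
	return ok
}

func main() {
	n := runConcurrently([]string{"/tmp/a", "/tmp/b", "/tmp/c"})
	fmt.Println(n, uploadDir)
}
```

The trade-off is that every test touching the variable has to go through the helper; dropping `t.Parallel()`, as the PR does, is the smaller change.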

@christoph-zededa

Contributor

Don't you want to add?

diff --git a/Makefile b/Makefile
index 186ff6477..ec26dde81 100644
--- a/Makefile
+++ b/Makefile
@@ -508,6 +508,7 @@ test: $(LINUXKIT) pkg/pillar | $(DIST)
        make -C eve-tools/bpftrace-compiler test
        make -C pkg/dnsmasq test
        make -C pkg/debug test
+       go test -C pkg/newlog/cmd/ -v -race
        $(QUIET): $@: Succeeded
 
 test-profiling:


const (
- ansi = "[\u0009\u001B\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[a-zA-Z\\d]*)*)?\u0007)|(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PRZcf-ntqry=><~]))"
+ ansi = "\u001B\\[[0-9;]*[A-Za-z]|\u001B[\\(\\)\\[\\]#;?]*[A-Za-z0-9]|\u009B[0-9;]*[A-Za-z]"

Member

Why was the regex simplified here? I'm not saying the new one is worse... but while checking it, I did find cases where both patterns stumble on real-ish sequences.

For example, \x1b]0;build@host:~/repo (main)\x07 (OSC title set by shells/build tools inside guest VM/containers) leaves ;build@host...\x07 with both;
\x1b]8;;https://example.com/file?time=2025-10-08\x1b\\ (OSC-8 hyperlink emitted by many modern CLIs and terminals, also possible log line from guest application) leaves the whole URL, including time=, with the new regex, while the old one oddly eats the first h;
\x1b[3~ (CSI with ~ final is common from key handling/cursor keys in interactive tools) is fully removed by the old regex but leaves a stray ~ with the new;
and \x1bP$q...\x1b\\ (DCS block seen in terminal/multiplexer negotiations like tmux/screen/kitty/WezTerm) slips past both except for the leading ESC P.

The only case that is handled fine by both: colours.

See here: https://go.dev/play/p/9Bbkn3hG9Uh

So I can’t claim the simplification is strictly worse, but it clearly trades one set of failure modes for another - and neither variant properly handles full OSC/DCS.

So... What was the motivation? ))

@europaul europaul Oct 8, 2025

Contributor Author

the old regex was incorrectly removing tabs and the next character, e.g. Col1\tCol2\tCol3 -> Col1ol2ol3
you can also see the same behavior in your hyperlink testcase - the first h of https is removed.

I agree that the new regex is also not perfect, it leaves some garbage chars, but I think it's better to have some artifacts in the result than remove useful info.

speaking of the real-world scenario: the testdata/memlog file that I provided has some guest_vm logs that come from an nginx container. The only escape char that's present there is handled correctly by both old and new regex:

"root@shim ~# \u001b[6nExecuting \"/docker-entrypoint.sh\" \"nginx\" \"-g\" \"daemon off;\""
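
A small program to reproduce the cases discussed in this thread. The two patterns are copied from the diff above; the helper names `stripOld` and `stripNew` and the choice of inputs are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	// Pattern removed by the PR (note the tab, \u0009, in the first class).
	oldANSI = "[\u0009\u001B\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[a-zA-Z\\d]*)*)?\u0007)|(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PRZcf-ntqry=><~]))"
	// Pattern introduced by the PR.
	newANSI = "\u001B\\[[0-9;]*[A-Za-z]|\u001B[\\(\\)\\[\\]#;?]*[A-Za-z0-9]|\u009B[0-9;]*[A-Za-z]"
)

var (
	oldRe = regexp.MustCompile(oldANSI)
	newRe = regexp.MustCompile(newANSI)
)

// stripOld removes everything the old pattern matches.
func stripOld(s string) string { return oldRe.ReplaceAllString(s, "") }

// stripNew removes everything the new pattern matches.
func stripNew(s string) string { return newRe.ReplaceAllString(s, "") }

func main() {
	inputs := []string{
		"Col1\tCol2\tCol3",              // tab-separated text
		"\x1b[31mred\x1b[0m",            // colour codes
		"\x1b[3~",                       // CSI with ~ as final byte
		"root@shim ~# \x1b[6nExecuting", // sequence from the nginx log line
	}
	for _, in := range inputs {
		fmt.Printf("%q\n  old: %q\n  new: %q\n", in, stripOld(in), stripNew(in))
	}
}
```

Printing with `%q` makes leftover control characters visible, which helps when comparing the two patterns on further inputs such as the OSC and DCS sequences mentioned above.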

@europaul

europaul commented Oct 8, 2025

Contributor Author

> Don't you want to add?
>
> diff --git a/Makefile b/Makefile
> index 186ff6477..ec26dde81 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -508,6 +508,7 @@ test: $(LINUXKIT) pkg/pillar | $(DIST)
>         make -C eve-tools/bpftrace-compiler test
>         make -C pkg/dnsmasq test
>         make -C pkg/debug test
> +       go test -C pkg/newlog/cmd/ -v -race
>         $(QUIET): $@: Succeeded
>
>  test-profiling:

yes, I didn't know it was missing, thanks!

Comment thread pkg/newlog/cmd/getMemlogMsg.go Outdated
Comment on lines +195 to +197
// Don't upload 'kube' container logs (they can be found in /persist/kubelog for detail)
if logInfo.Source == "kube" {
return false

Contributor Author

@andrewd-zededa @zedi-pramodh @naiming-zededa I'm not sure who added this if-statement...
Is it about kube logs not being sent to the controller? Or should they not be processed by newlogd at all?

Contributor

Contributor Author

thanks @christoph-zededa :)

@europaul europaul force-pushed the newlog-refactor-get-memlog branch from babe1c5 to 82cf361 Compare October 9, 2025 16:23
Comment thread pkg/newlog/cmd/getMemlogMsg.go Outdated
// based on the configured log levels.
func shouldSendToRemote(logInfo Loginfo, logFromApp bool) bool {
// Don't upload 'kube' container logs (they can be found in /persist/kubelog for detail)
if logInfo.Source == "kube" {

Member

Previously, kube logs never reached logChan <- entry because we returned early. Now, we only set a flag to prevent them from going to the remote sink. But they enter the local pipeline. If there are many kube logs, this can be a big change, as we might overwhelm local storage.

To stay safe, I suggest mimicking the old behaviour and not handling these logs locally either. And if we do change it, we should test with a KubeVirt build (unfortunately, we don’t have any KubeVirt tests integrated into the pipeline).

More broadly, this looks like a risky workaround. If the intent is to skip handling KubeVirt logs, why are they going through memlogd in the first place?

Contributor Author

yap, that's exactly why I'm trying to find the original author to figure out the intention behind this if-statement #5293 (comment)

Contributor Author

@andrewd-zededa @zedi-pramodh @naiming-zededa do you know what's happening here?

@OhmSpectator OhmSpectator Oct 10, 2025

Member

I see that the change was introduced by @naiming-zededa here: 05f2182

Contributor Author

From the discussion that @christoph-zededa found (https://github.com/lf-edge/eve/pull/4412/files#r1829993370) it seems like kube indeed generates too many logs. However if they are written to /persist/kubelog I wonder if they are rotated there 🤔

Either way the proper way to filter them out by source would be to add the filter to vector's config instead of doing it in newlogd

Member

I would recommend creating a dedicated pull request for that change. From what I understand, you were only looking to make a minor refactoring. It would be a good idea to create an issue regarding the KubeVirt log rotation and assign it to one of the KubeVirt experts. To be honest, I find it concerning that the logs are not being rotated; however, this is not your immediate responsibility.

Contributor

> I wonder if they are rotated there

according to #4412 (comment) "we rotate them and keep 3 copies with 5M max to each of them."

Member

Anyway, I would roll back to the original behaviour here. Otherwise, it's not just a refactoring.

Contributor Author

I agree, rolled back to the original behavior

Contributor

I added this because kube generated too much data, even just the error messages from the Kubernetes system. If we have to debug that, we'll use 'collectinfo' or get onto the device. So skipping these logs in the device logging here is fine.
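
To make the difference discussed in this thread concrete, here is a minimal sketch of the two behaviours: dropping kube entries before they reach the channel (the behaviour the PR was rolled back to) versus only flagging them as not-for-remote (the intermediate revision). The `entry` type and the function names are hypothetical and much simpler than the real newlogd code.

```go
package main

import "fmt"

// entry is a hypothetical, trimmed-down log record used only for this sketch.
type entry struct {
	Source       string
	Msg          string
	SendToRemote bool
}

// drain reads a closed channel into a slice.
func drain(ch <-chan entry) []entry {
	var out []entry
	for e := range ch {
		out = append(out, e)
	}
	return out
}

// dropEarly skips entries whose source is "kube" before they are sent on the
// channel, so nothing downstream (local or remote) ever sees them.
func dropEarly(in []entry) []entry {
	logChan := make(chan entry, len(in))
	for _, e := range in {
		if e.Source == "kube" {
			continue
		}
		e.SendToRemote = true
		logChan <- e
	}
	close(logChan)
	return drain(logChan)
}

// flagOnly sends every entry on the channel and only marks kube entries as
// not destined for the remote sink, so they still enter the local pipeline.
func flagOnly(in []entry) []entry {
	logChan := make(chan entry, len(in))
	for _, e := range in {
		e.SendToRemote = e.Source != "kube"
		logChan <- e
	}
	close(logChan)
	return drain(logChan)
}

func main() {
	in := []entry{
		{Source: "pillar", Msg: "a"},
		{Source: "kube", Msg: "b"},
		{Source: "guest_vm", Msg: "c"},
	}
	fmt.Println(len(dropEarly(in)), len(flagOnly(in)))
}
```

With `flagOnly`, the volume written locally grows with the number of kube entries, which is the storage concern raised above.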

if len(time1) == 2 && strings.HasPrefix(time1[1], "\"") {
time2 := strings.Split(time1[1], "\"")
- if len(time2) == 3 {
+ if len(time2) >= 3 {

Member

Why do we need it here? The last time we changed it was when memlogd changed the format. Do we have a format change again?

Contributor Author

no, it just makes the parsing more robust - see, for example, the TestParseLevelTimeMsg test case "Time without quotes (not parsed)"

Member

Is it possible to have it without quotes?...

Contributor Author

from what I see in testdata/memlog it's always with quotes
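
A minimal sketch of the quoted-time extraction shown in the hunk above. Only the two `if` conditions and the `strings.Split` on `"` come from the diff; the function name `parseTime`, the `strings.SplitN` on `time=` that produces `time1`, and the sample inputs are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTime returns the quoted value that follows time= in content, or an
// empty string when there is no time attribute or the value is not quoted.
// It is a hypothetical stand-in for the time part of parseLevelTimeMsg.
func parseTime(content string) string {
	// Assumption: the remainder after the first "time=" is what gets inspected.
	time1 := strings.SplitN(content, "time=", 2)
	if len(time1) == 2 && strings.HasPrefix(time1[1], "\"") {
		// Splitting on the quote yields: "", the value, and everything after
		// the closing quote (possibly split further by later quotes).
		time2 := strings.Split(time1[1], "\"")
		if len(time2) >= 3 {
			return time2[1]
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", parseTime(`time="2025-10-08T10:00:00Z" level=info msg="started"`))
	fmt.Printf("%q\n", parseTime(`time=2025-10-08T10:00:00Z level=info`))
}
```

In this sketch a line with another quoted attribute after the time splits into five parts rather than three, so a strict `== 3` check would reject it while `>= 3` accepts it. An unquoted time is not parsed, matching the test case named above.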

Comment thread pkg/newlog/cmd/getMemlogMsg.go Outdated
func parseMemlogEntry(rawBytes []byte) (inputEntry, error) {
var logEntry MemlogLogEntry
if err := json.Unmarshal(rawBytes, &logEntry); err != nil {
return inputEntry{}, err

Member

Why did we remove the warning from here? It may be useful...

Contributor Author

you're right, it was more verbose than just the error - I'll bring it back

Contributor Author

Done
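
As an illustration of the point in this thread, the sketch below logs a warning that includes the raw payload before returning the unmarshal error, so that a bad line can be identified later. The `memlogEntry` struct, its JSON field names, the function name `parseEntry` and the warning text are all hypothetical; the real `MemlogLogEntry` and the wording of the restored warning are not shown in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// memlogEntry is a hypothetical subset of a memlog record.
type memlogEntry struct {
	Time   string `json:"time"`
	Source string `json:"source"`
	Msg    string `json:"msg"`
}

// parseEntry decodes one raw JSON line. On failure it logs a warning that
// contains the offending payload and returns a wrapped error.
func parseEntry(raw []byte) (memlogEntry, error) {
	var e memlogEntry
	if err := json.Unmarshal(raw, &e); err != nil {
		log.Printf("warning: cannot unmarshal memlog entry %q: %v", raw, err)
		return memlogEntry{}, fmt.Errorf("unmarshal memlog entry: %w", err)
	}
	return e, nil
}

func main() {
	e, err := parseEntry([]byte(`{"time":"2025-10-08T10:00:00Z","source":"pillar","msg":"hello"}`))
	fmt.Println(e.Source, err)

	_, err = parseEntry([]byte(`not json`))
	fmt.Println(err != nil)
}
```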

if len(level1) == 2 {
level2 := strings.Split(level1[1], " ")
- level = level2[0]
+ level = strings.ToLower(level2[0])

Member

Why do we need ToLower now?

Contributor Author

I think it's good to have it here, since it unifies the format and doesn't rely on logrus's support for it (switch strings.ToLower(lvl) {)

some components print their log level as INFO instead of info
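
A minimal sketch of the level extraction with the added lowercasing. The `strings.Split` on a space and the `strings.ToLower` call come from the hunk above; the function name `parseLevel` and the `strings.SplitN` on `level=` that produces `level1` are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLevel returns the lowercased token that follows level= in content, or
// an empty string when there is no level attribute. It is a hypothetical
// stand-in for the level part of parseLevelTimeMsg.
func parseLevel(content string) string {
	// Assumption: only the text after the first "level=" is inspected.
	level1 := strings.SplitN(content, "level=", 2)
	if len(level1) == 2 {
		// The level ends at the next space.
		level2 := strings.Split(level1[1], " ")
		return strings.ToLower(level2[0])
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", parseLevel(`time="2025-10-08T10:00:00Z" level=INFO msg="started"`))
	fmt.Printf("%q\n", parseLevel(`plain text without attributes`))
}
```

Normalising at parse time means later comparisons against level names can use lowercase constants without caring how the emitting component spelled them.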

}

// Returns level, time and msg if the string contains those attr=val
func parseLevelTimeMsg(content string) (level string, timeStr string, msg string) {

Member

I hope the functions work identically, but I'm still a bit worried about the undocumented changes in the logic... I already had a problem with this part in the past.

@europaul europaul Oct 10, 2025

Contributor Author

I think that's why it's good that we added the unit tests - this way everyone can see which cases are covered

Refactored to make the code more testable.
Added logs as they come from memlog socket as test data.
Added tests for other function, so that it's clearer what
functionality they have.
Added a special case for logs coming from vector, as they
were confused in format with other key=value logs, which
led to incorrect parsing.

Signed-off-by: Paul Gaiduk <[email protected]>

Avoid copying unnecessary things (like the Dockerfile) when building
newlog. This way we can benefit from Docker layer caching better.

Signed-off-by: Paul Gaiduk <[email protected]>

TestGetFileInfo depends on global vars being constant at least for the
duration of the test and other tests in the package modify them. That's
why running it in parallel with other tests is unsafe.

Signed-off-by: Paul Gaiduk <[email protected]>

Add race detection to tests and remove parallel execution, due to the
found race condition.

Signed-off-by: Paul Gaiduk <[email protected]>

@europaul europaul force-pushed the newlog-refactor-get-memlog branch from 82cf361 to e8648d6 Compare October 10, 2025 16:41

@OhmSpectator OhmSpectator left a comment

Member

Should be fine.

@rene rene merged commit a714191 into lf-edge:master Oct 11, 2025
47 of 48 checks passed

5 participants