newlog: refactor getMemlogMsg and add testing #5293
}

func TestParseLevelTimeMsg(t *testing.T) {
	t.Parallel()
I got this:
=== CONT TestParseLevelTimeMsg
==================
WARNING: DATA RACE
Write at 0x0000037ba270 by goroutine 50:
github.com/lf-edge/eve/pkg/newlog/cmd.TestGzipParsing()
/home/christoph/projects/eve-3/pkg/newlog/cmd/writelogFile_test.go:18 +0x64
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1792 +0x225
testing.(*T).Run.gowrap1()
/usr/lib/golang/src/testing/testing.go:1851 +0x44
Previous write at 0x0000037ba270 by goroutine 49:
github.com/lf-edge/eve/pkg/newlog/cmd.TestGetTimestampFromGzipName()
/home/christoph/projects/eve-3/pkg/newlog/cmd/newlogd_test.go:31 +0x36f
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1792 +0x225
testing.(*T).Run.gowrap1()
/usr/lib/golang/src/testing/testing.go:1851 +0x44
Goroutine 50 (running) created at:
testing.(*T).Run()
/usr/lib/golang/src/testing/testing.go:1851 +0x8f2
testing.runTests.func1()
/usr/lib/golang/src/testing/testing.go:2279 +0x85
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1792 +0x225
testing.runTests()
/usr/lib/golang/src/testing/testing.go:2277 +0x96c
testing.(*M).Run()
/usr/lib/golang/src/testing/testing.go:2142 +0xeea
main.main()
_testmain.go:69 +0x164
Goroutine 49 (running) created at:
testing.(*T).Run()
/usr/lib/golang/src/testing/testing.go:1851 +0x8f2
testing.runTests.func1()
/usr/lib/golang/src/testing/testing.go:2279 +0x85
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1792 +0x225
testing.runTests()
/usr/lib/golang/src/testing/testing.go:2277 +0x96c
testing.(*M).Run()
/usr/lib/golang/src/testing/testing.go:2142 +0xeea
main.main()
_testmain.go:69 +0x164
==================
with go test -v -race -parallel 1 it does not fail
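For reference, the trace shows two tests writing the same package-level variable while both run in parallel. This is not what the PR ends up doing (it drops t.Parallel() instead), but here is a minimal sketch of another way to keep such writes from overlapping. The names logDir, logDirMu and withLogDir and the paths are hypothetical and do not come from newlogd.

```go
package main

import (
	"fmt"
	"sync"
)

// logDir stands in for a package-level variable that several tests
// overwrite. The name and the default value are hypothetical.
var (
	logDir   = "/tmp/default"
	logDirMu sync.Mutex
)

// withLogDir sets logDir for the duration of fn and restores the previous
// value afterwards. Holding the mutex serializes all callers, so two
// goroutines never write the variable at the same time.
func withLogDir(dir string, fn func()) {
	logDirMu.Lock()
	defer logDirMu.Unlock()
	prev := logDir
	logDir = dir
	defer func() { logDir = prev }() // runs before the Unlock above
	fn()
}

func main() {
	var wg sync.WaitGroup
	seen := make([]string, 2)
	for i, dir := range []string{"/tmp/a", "/tmp/b"} {
		wg.Add(1)
		go func(i int, dir string) {
			defer wg.Done()
			// Each goroutine sees its own value while it holds the lock.
			withLogDir(dir, func() { seen[i] = logDir })
		}(i, dir)
	}
	wg.Wait()
	fmt.Println(seen, logDir) // [/tmp/a /tmp/b] /tmp/default
}
```

The trade-off is that every test touching the variable has to go through the helper, which is why simply not running these tests in parallel is the smaller change.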
nice catch! thank you!
Don't you want to add?

  const (
-     ansi = "[\u0009\u001B\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[a-zA-Z\\d]*)*)?\u0007)|(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PRZcf-ntqry=><~]))"
+     ansi = "\u001B\\[[0-9;]*[A-Za-z]|\u001B[\\(\\)\\[\\]#;?]*[A-Za-z0-9]|\u009B[0-9;]*[A-Za-z]"
Why was the regex simplified here? I'm not saying the new one is worse... but while checking it, I did find cases where both patterns stumble on real-ish sequences.
For example, \x1b]0;build@host:~/repo (main)\x07 (OSC title set by shells/build tools inside guest VM/containers) leaves ;build@host...\x07 with both;
\x1b]8;;https://example.com/file?time=2025-10-08\x1b\\ (OSC-8 hyperlink emitted by many modern CLIs and terminals, also possible log line from guest application) leaves the whole URL, including time=, with the new regex, while the old one oddly eats the first h;
\x1b[3~ (CSI with ~ final is common from key handling/cursor keys in interactive tools) is fully removed by the old regex but leaves a stray ~ with the new;
and \x1bP$q...\x1b\\ (DCS block seen in terminal/multiplexer negotiations like tmux/screen/kitty/WezTerm) slips past both except for the leading ESC P.
The only case that is handled fine by both: colours.
See here: https://go.dev/play/p/9Bbkn3hG9Uh
So I can’t claim the simplification is strictly worse, but it clearly trades one set of failure modes for another - and neither variant properly handles full OSC/DCS.
So... What was the motivation? ))
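To make the failure modes above concrete, here is a rough sketch of a pattern that treats OSC, DCS and CSI as separate alternatives. It is not the regex from this PR and not a proposal for the final one. The names ansiSketch and stripSketch are hypothetical, and the pattern deliberately ignores the single-byte C1 forms (such as \u009B) and sequences that are never terminated.

```go
package main

import (
	"fmt"
	"regexp"
)

// ansiSketch is a hypothetical pattern, not the one used in newlogd.
var ansiSketch = regexp.MustCompile(
	`\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)` + // OSC: ESC ] ... ended by BEL or ESC \
		`|\x1bP[^\x1b]*\x1b\\` + // DCS: ESC P ... ended by ESC \
		`|\x1b\[[0-?]*[ -/]*[@-~]`, // CSI: parameter bytes, intermediates, one final byte
)

// stripSketch removes every sequence matched by ansiSketch.
func stripSketch(s string) string {
	return ansiSketch.ReplaceAllString(s, "")
}

func main() {
	inputs := []string{
		"Col1\tCol2\tCol3",                         // tabs must survive
		"\x1b[31mred\x1b[0m",                       // colours
		"\x1b]0;build@host:~/repo (main)\x07$ ls",  // OSC title
		"\x1b]8;;https://example.com/file?time=2025-10-08\x1b\\link\x1b]8;;\x1b\\", // OSC-8 hyperlink
		"\x1b[3~",                                  // CSI with ~ final
		"\x1bP$q\x1b\\" + "rest",                   // DCS block
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, stripSketch(in))
	}
}
```

Because each alternative begins with ESC followed by a distinct introducer, a plain tab or printable character is never consumed, at the cost of leaving malformed sequences in place.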
the old regex was incorrectly removing tabs and the next character, e.g. Col1\tCol2\tCol3 -> Col1ol2ol3
you can also see the same behavior in your hyperlink testcase - the first h of https is removed.
I agree that the new regex is also not perfect, it leaves some garbage chars, but I think it's better to have some artifacts in the result than remove useful info.
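A small program to reproduce the tab behaviour, using the two patterns exactly as they appear in the diff above. Only the wrapper names (oldANSI, newANSI, stripOld, stripNew) are mine.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the diff in this thread.
const (
	oldANSI = "[\u0009\u001B\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[a-zA-Z\\d]*)*)?\u0007)|(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PRZcf-ntqry=><~]))"
	newANSI = "\u001B\\[[0-9;]*[A-Za-z]|\u001B[\\(\\)\\[\\]#;?]*[A-Za-z0-9]|\u009B[0-9;]*[A-Za-z]"
)

var (
	oldRe = regexp.MustCompile(oldANSI)
	newRe = regexp.MustCompile(newANSI)
)

// stripOld and stripNew are hypothetical wrappers for the comparison.
func stripOld(s string) string { return oldRe.ReplaceAllString(s, "") }
func stripNew(s string) string { return newRe.ReplaceAllString(s, "") }

func main() {
	in := "Col1\tCol2\tCol3"
	// The old pattern accepts a tab (\u0009) as the introducer, so it eats
	// the tab and the character after it.
	fmt.Printf("old: %q\n", stripOld(in)) // old: "Col1ol2ol3"
	// The new pattern only starts at ESC or \u009B, so tabs are untouched.
	fmt.Printf("new: %q\n", stripNew(in)) // new: "Col1\tCol2\tCol3"
}
```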
speaking of the real-world scenario: the testdata/memlog file that I provided has some guest_vm logs that come from an nginx container. The only escape char that's present there is handled correctly by both old and new regex:
"root@shim ~# \u001b[6nExecuting \"/docker-entrypoint.sh\" \"nginx\" \"-g\" \"daemon off;\""
yes, I didn't know it was missing, thanks!
// Don't upload 'kube' container logs (they can be found in /persist/kubelog for detail)
if logInfo.Source == "kube" {
	return false
@andrewd-zededa @zedi-pramodh @naiming-zededa I'm not sure who added this if-statement...
Is it about kube logs not being sent to the controller? Or should they not be processed by newlogd at all?
I was too curious to wait ...
https://github.com/lf-edge/eve/pull/4412/files#r1829993370
// based on the configured log levels.
func shouldSendToRemote(logInfo Loginfo, logFromApp bool) bool {
	// Don't upload 'kube' container logs (they can be found in /persist/kubelog for detail)
	if logInfo.Source == "kube" {
Previously, kube logs never reached logChan <- entry because we returned early. Now, we only set a flag to prevent them from going to the remote sink. But they enter the local pipeline. If there are many kube logs, this can be a big change, as we might overwhelm local storage.
To stay safe, I suggest mimicking the old behaviour and not handling these logs locally either. And if we do change it, we should test with a KubeVirt build (unfortunately, we don’t have any KubeVirt tests integrated into the pipeline).
More broadly, this looks like a risky workaround. If the intent is to skip handling KubeVirt logs, why are they going through memlogd in the first place?
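To illustrate the difference described above, here is a simplified sketch of the two behaviours. The entry type, its fields and both function names are hypothetical. The real code also takes the configured log levels into account, which this sketch leaves out.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a parsed log entry.
type entry struct {
	Source       string
	Msg          string
	SendToRemote bool
}

// enqueueDropKube models the original behaviour as described in the thread:
// kube entries are dropped before they reach the channel.
func enqueueDropKube(ch chan<- entry, e entry) bool {
	if e.Source == "kube" {
		return false
	}
	e.SendToRemote = true
	ch <- e
	return true
}

// enqueueFlagKube models the behaviour under review: kube entries are only
// flagged as "not for remote" but still enter the local pipeline.
func enqueueFlagKube(ch chan<- entry, e entry) bool {
	e.SendToRemote = e.Source != "kube"
	ch <- e
	return true
}

func main() {
	in := []entry{
		{Source: "kube", Msg: "pod started"},
		{Source: "zedbox", Msg: "hello"},
	}
	dropCh := make(chan entry, len(in))
	flagCh := make(chan entry, len(in))
	for _, e := range in {
		enqueueDropKube(dropCh, e)
		enqueueFlagKube(flagCh, e)
	}
	fmt.Println("early return:", len(dropCh), "entries queued") // 1
	fmt.Println("flag only:   ", len(flagCh), "entries queued") // 2
}
```

With the flag-only variant every kube entry still costs a channel send and whatever local processing follows, which is the volume concern raised here.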
yep, that's exactly why I'm trying to find the original author to figure out the intention behind this if-statement: #5293 (comment)
@andrewd-zededa @zedi-pramodh @naiming-zededa do you know what's happening here?
I see that the change was introduced by @naiming-zededa here: 05f2182
From the discussion that @christoph-zededa found (https://github.com/lf-edge/eve/pull/4412/files#r1829993370) it seems like kube indeed generates too many logs. However if they are written to /persist/kubelog I wonder if they are rotated there 🤔
Either way the proper way to filter them out by source would be to add the filter to vector's config instead of doing it in newlogd
I would recommend creating a dedicated pull request for that change. From what I understand, you were only looking to make a minor refactoring. It would be a good idea to create an issue regarding the KubeVirt log rotation and assign it to one of the KubeVirt experts. To be honest, I find it concerning that the logs might not be rotated; however, this is not your immediate responsibility.
> I wonder if they are rotated there
according to #4412 (comment) "we rotate them and keep 3 copies with 5M max to each of them."
Anyway, I would roll back to the original behaviour here. Otherwise, it's not just a refactoring.
I agree, rolled back to the original behavior
I added this because it generated too much data, even just the error messages from the Kubernetes system. If we have to debug that, we'll use 'collectinfo' or get onto the device. So skipping these logs in the device logging here is fine.
  if len(time1) == 2 && strings.HasPrefix(time1[1], "\"") {
      time2 := strings.Split(time1[1], "\"")
-     if len(time2) == 3 {
+     if len(time2) >= 3 {
Why do we need it here? The last time we changed it was when memlogd changed the format. Do we have a format change again?
no, it just makes the parsing more robust - see the test case TestParseLevelTimeMsg.Time without quotes (not parsed) for example
Is it possible to have it without quotes?...
from what I see in testdata/memlog it's always with quotes
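For anyone following along, a standalone sketch of the quoted check. The helper name extractTime and the sample inputs are hypothetical. The thread does not show how time1 is produced, so the SplitN call below is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTime is a hypothetical helper modelled on the quoted lines.
// It returns the quoted value after time= and whether one was found.
func extractTime(content string) (string, bool) {
	// Assumption: the real code splits on "time=" in a similar way.
	time1 := strings.SplitN(content, "time=", 2)
	if len(time1) == 2 && strings.HasPrefix(time1[1], "\"") {
		// Splitting on the quote yields: "", <value>, <rest...>
		time2 := strings.Split(time1[1], "\"")
		if len(time2) >= 3 {
			return time2[1], true
		}
	}
	return "", false
}

func main() {
	for _, in := range []string{
		`time="2025-10-08T10:00:00Z"`,             // exactly two quotes
		`time="2025-10-08T10:00:00Z" msg="hello"`, // more quotes later in the line
		`time=2025-10-08T10:00:00Z`,               // no quotes: not parsed
	} {
		ts, ok := extractTime(in)
		fmt.Printf("%q -> %q %v\n", in, ts, ok)
	}
}
```

Under these assumptions the second input splits into five pieces, so a strict `== 3` check would reject it while `>= 3` still picks up the timestamp. The unquoted input is rejected by the HasPrefix check either way.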
func parseMemlogEntry(rawBytes []byte) (inputEntry, error) {
	var logEntry MemlogLogEntry
	if err := json.Unmarshal(rawBytes, &logEntry); err != nil {
		return inputEntry{}, err
Why did we remove the warning from here? It may be useful...
you're right, it was more verbose than just the error - I'll bring it back
  if len(level1) == 2 {
      level2 := strings.Split(level1[1], " ")
-     level = level2[0]
+     level = strings.ToLower(level2[0])
Why do we need ToLower now?
I think it's good to have it here, since it unifies the format and doesn't rely on logrus's support for it (some components print their log level as INFO instead of info).
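A tiny sketch of the effect, built around the two quoted lines. The helper name parseLevel is hypothetical, and the SplitN call that produces level1 is an assumption, since the thread does not show it.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLevel is a hypothetical helper: it returns the value after level=
// up to the next space, lower-cased so INFO and info compare equal.
func parseLevel(content string) string {
	// Assumption: the real code splits on "level=" in a similar way.
	level1 := strings.SplitN(content, "level=", 2)
	if len(level1) == 2 {
		level2 := strings.Split(level1[1], " ")
		return strings.ToLower(level2[0])
	}
	return ""
}

func main() {
	fmt.Println(parseLevel("level=INFO msg=started")) // info
	fmt.Println(parseLevel("level=info msg=started")) // info
	fmt.Println(parseLevel("msg=started") == "")      // true
}
```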
}

// Returns level, time and msg if the string contains those attr=val
func parseLevelTimeMsg(content string) (level string, timeStr string, msg string) {
I hope the functions work identically, but I'm still a bit worried about the undocumented changes in the logic... I already had a problem with this part in the past.
I think that's why it's good that we added the unit tests - this way everyone can see which cases are covered
Refactored to make the code more testable. Added logs as they come from memlog socket as test data. Added tests for other function, so that it's clearer what functionality they have. Added a special case for logs coming from vector, as they were confused in format with other key=value logs, which led to incorrect parsing. Signed-off-by: Paul Gaiduk <[email protected]>
Avoid copying unnecessary things (like the Dockerfile) when building newlog. This way we can benefit from Docker layer caching better. Signed-off-by: Paul Gaiduk <[email protected]>
TestGetFileInfo depends on global vars being constant at least for the duration of the test and other tests in the package modify them. That's why running it in parallel with other tests is unsafe. Signed-off-by: Paul Gaiduk <[email protected]>
Add race detection to tests and remove parallel execution, due to the found race condition. Signed-off-by: Paul Gaiduk <[email protected]>
Description
Refactor getMemlogMsg.go to make the code more testable.
PR dependencies
None
How to test and validate this PR
Testing is covered by the tests added in this PR and existing Eden tests.
Changelog notes
N/A since the functionality wasn't really changed
PR Backports
No need.
Checklist