conmon: handle multi-line logging #436
I haven't tested this yet.

conmon/conmon.c
    ptrdiff_t line_len = buf - line_end;

    /* Write the (timestamp, stream, line) tuple. */
    if (write(fd, tsbuf, TSBUFLEN-1) < 0) {
Is this making sure we drop the NULL terminator that snprintf writes? One of the original issues is that the NULL terminator (00) causes string matching to fail.
TSBUFLEN-1 doesn't contain the null terminator. But I could switch to strlen(tsbuf) if you prefer.
Nah it's fine
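For reference, the property this thread relies on: snprintf returns the number of characters it formatted, not counting the terminating NUL, so writing exactly that many bytes (or strlen(tsbuf), which is the same value when nothing was truncated) keeps the 00 byte out of the log. A minimal sketch, using a hypothetical write_prefix helper rather than conmon's actual code:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not conmon's API): format "<timestamp> <stream> "
 * into buf and write only the formatted characters to fd, never the NUL.
 * Returns the number of bytes written, or -1 on error or truncation. */
ssize_t write_prefix(int fd, char *buf, size_t buflen,
                     const char *timestamp, const char *stream)
{
    /* snprintf returns the formatted length, excluding the trailing NUL. */
    int len = snprintf(buf, buflen, "%s %s ", timestamp, stream);
    if (len < 0 || (size_t)len >= buflen)
        return -1; /* encoding error, or the prefix did not fit */

    /* Write exactly len bytes: buf[len] is the NUL and stays out of the log. */
    return write(fd, buf, (size_t)len);
}
```

Using the return value instead of a fixed TSBUFLEN-1 also stays correct if the prefix length ever changes, at the cost of one truncation check.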
I'll run k8s tests with this tomorrow afternoon (my timezone).

/me just realised the output spacing was wrong. Pushed a fix and squashed.

@cyphar There are bugs in this right now. It returns an empty line when queried through kubectl logs. Details: File contents:

Yeah, sorry @mrunalp, this code was quite wrong before. I've now tested it with quite a few test cases (here's a sample), so it should work now: Will output:

@cyphar Okay, will retest this. Thanks!
conmon/conmon.c
    /* Log all output to logfd. */
    if (write(logfd, buf, num_read) != num_read) {
        nwarn("partial/failed write (logFd)");
    if (write_k8s_log(logfd, "stdout", buf, num_read) < 0) {
I think we should add buf[num_read] = '\0'. Otherwise we see trailing garbage in the logs:
[conmon:i]: read a chunk: (fd=5) '3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564/tmp/conmon-term.XXXXXXXX'
I don't think we should modify the buffer like that (it means the logs won't actually match what the program wrote). IMO the fix is to change how we log the whole "read a chunk" thing. To be honest, that was a debugging measure and we should drop it.
Can you verify whether the actual log file has incorrect data in it, or just the stderr log from conmon? To be honest, ninfo is currently reading out-of-bounds and we need to stop doing that, so I hope that's the only issue here.
@cyphar Adding the '\0' is only to make the debug logs better :) I think we will be fine even if we just drop the ninfo like you said. The actual log files looked good in my manual testing with a few different types of pods. Unfortunately, the e2e tests ran into unrelated issues on my machine that I am still debugging, so I will ask @runcom to run the suite on his machine.
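One way to keep a debug line without writing into buf is the printf precision form "%.*s", which reads at most the given number of bytes and does not need a NUL terminator. A minimal sketch, assuming a hypothetical format_chunk helper standing in for the ninfo call:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical debug helper: describe a chunk that is NOT NUL-terminated.
 * "%.*s" stops after num_read bytes, so nothing past the chunk is read and
 * the caller's buffer is left untouched. Assumes num_read fits in an int,
 * which holds for a small fixed-size read buffer. */
int format_chunk(char *out, size_t outlen, int fd,
                 const char *buf, ssize_t num_read)
{
    return snprintf(out, outlen, "read a chunk: (fd=%d) '%.*s'",
                    fd, (int)num_read, buf);
}
```

An embedded NUL byte in the chunk would still cut the debug line short, which is another argument for dropping this output, as suggested above.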
    };

    int set_k8s_timestamp(char *buf, ssize_t buflen, const char *stream_type)
    /* strlen("1997-03-25T13:20:42+01:00") + 1 */
Would prefer RFC3339Nano if we have a choice here...
@mikebrow Yeah, we can do that for sure. Just want to get this correctness patch in first :)
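If the format is switched later, an RFC 3339 timestamp with nanosecond precision can be built from a struct timespec (for example one filled by clock_gettime(CLOCK_REALTIME, &ts)) with gmtime_r and strftime. This is a sketch under stated assumptions: it always emits UTC with a "Z" suffix (the comment above uses a +01:00 offset instead), it always prints nine fractional digits, and format_rfc3339_nano is a hypothetical name. Go's time.RFC3339Nano layout trims trailing zeros from the fraction, so the output is not byte-for-byte identical to Go's.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for gmtime_r */
#endif
#include <stdio.h>
#include <time.h>

/* "2017-04-10T12:34:56.123456789Z" is 30 characters, plus the NUL. */
#define TS_NANO_LEN 31

/* Hypothetical helper: format ts (in UTC) as RFC 3339 with a fixed
 * 9-digit nanosecond fraction. Returns the string length, or -1. */
int format_rfc3339_nano(char *buf, size_t buflen, const struct timespec *ts)
{
    struct tm tm;
    if (gmtime_r(&ts->tv_sec, &tm) == NULL)
        return -1;

    /* Seconds-precision part, e.g. "2017-04-10T12:34:56" (19 chars). */
    char base[20];
    if (strftime(base, sizeof base, "%Y-%m-%dT%H:%M:%S", &tm) == 0)
        return -1;

    /* Append the zero-padded nanoseconds and the UTC designator. */
    int len = snprintf(buf, buflen, "%s.%09ldZ", base, (long)ts->tv_nsec);
    if (len < 0 || (size_t)len >= buflen)
        return -1;
    return len;
}
```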
Testing this out with k8s right now (restarted Travis too).

As far as k8s testing is concerned, this PR LGTM :) (109/121 is a great result)

(Testing with the latest k8s master source seems fine as well.) Weirdly enough, tests fail with https://travis-ci.org/kubernetes-incubator/cri-o/jobs/220157466#L2775 (I've never seen that before).

I'm super confused about why Travis is broken; the previous commit passed the test cases (and now I'm worried about re-running the old commit). There's some problem with seccomp, though...

@cyphar Yeah, weird. The first failure seems to be ExecSync-related.

It's definitely a real failure; I'm just confused about which commits hit master between the two test runs and caused the breakage. I'll take a look today.

Ah, I think I know why. It's because

@cyphar Another option is to strip this out before we send back the ExecSyncResponse. We will probably need to do that to return stderr separately when we add a separate pipe for it. However, we can do this in a follow-on.

There was another error which is that In addition,

@mrunalp I will switch to stripping after this is merged and we do the separate pipes. The tests pass now, so I'm squashing.
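The stripping approach discussed above amounts to removing the first two space-separated fields from each log line before filling in the ExecSyncResponse. A minimal sketch, assuming lines of the form "<timestamp> <stream> <content>" separated by single spaces; strip_log_prefix is a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: for a line "<timestamp> <stream> <content>", return a
 * pointer to <content> inside line (no copy). If stream is non-NULL, the
 * stream name is copied into it (truncated to streamlen - 1 chars).
 * Returns NULL if the line lacks the two prefix fields. */
const char *strip_log_prefix(const char *line, char *stream, size_t streamlen)
{
    const char *ts_end = strchr(line, ' ');
    if (ts_end == NULL)
        return NULL;

    const char *stream_start = ts_end + 1;
    const char *stream_end = strchr(stream_start, ' ');
    if (stream_end == NULL)
        return NULL;

    if (stream != NULL && streamlen > 0) {
        size_t n = (size_t)(stream_end - stream_start);
        if (n >= streamlen)
            n = streamlen - 1;
        memcpy(stream, stream_start, n);
        stream[n] = '\0';
    }
    return stream_end + 1;
}
```

Returning the stream name as well would let a caller route "stderr" lines into a separate field once the streams are split.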
Previously we returned an internal error result when a program had a non-zero exit code, which was incorrect. Fix this as well as change the tests to actually check the "ExitCode" response from ExecSync (rather than expecting ocic-ctr to return an internal error).

Signed-off-by: Aleksa Sarai <[email protected]>
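The distinction this commit draws can be illustrated with plain POSIX process handling: a non-zero exit status is an ordinary result to report, and only failures of the machinery itself (fork, waitpid) count as errors. A minimal sketch with a hypothetical run_and_get_exit_code helper; it is not cri-o's ExecSync code. Mapping a failed exec to 127 and a fatal signal to 128 plus the signal number follows the usual shell convention and is an assumption here.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run argv and store how it ended in *exit_code.
 * Returns 0 whenever the child ran, even if it exited non-zero;
 * returns -1 only for infrastructure failures (fork/waitpid). */
int run_and_get_exit_code(char *const argv[], int *exit_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed, e.g. executable not found */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status))
        *exit_code = WEXITSTATUS(status);      /* normal exit, any code */
    else if (WIFSIGNALED(status))
        *exit_code = 128 + WTERMSIG(status);   /* killed by a signal */
    else
        return -1;
    return 0;
}
```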
The CRI requires us to prepend (timestamp, stream) to every line of the output, and it's quite likely (especially in the !terminal case) that we will read more than one line of output in the read loop.

So, we need to write out each line separately with the prepended timestamps. Doing this the simple way (the final part of the buffer is written partially if it doesn't end in a newline) makes the code much simpler, with the downside that if we ever switch to multiple streams for output we'll have to rewrite parts of this.

In addition, drop the debugging output of cri-o for each chunk read so we stop spamming stderr. We can do this now because 8a928d0 ("oci: make ExecSync with ExitCode != 0 act properly") actually fixed how ExecSync was being handled (especially in regards to this patch).

Fixes: 1dc4c87 ("conmon: add timestamps to logs")
Signed-off-by: Aleksa Sarai <[email protected]>
Will run k8s on this one last time, assuming Travis is green.

@runcom In particular, can you make sure that

How would you do this? Should we write other integration tests for this, or do the fixes you made ensure this case works? Otherwise, I'm just going to run the k8s tests, though I don't know whether they exercise this code path (ExecSync).

@runcom Oh, I meant for you to just do

@runcom Tests are green btw.
Running k8s tests right now!

No regression in k8s. LGTM

/cc @mrunalp

Actually, in the above cases it failed because it couldn't find the executable.

Looks fine from this test. It should be in stderr, but that will be fixed once we have a separate stderr pipe.

Yup, I'm working on that patch at the moment.
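A separate stderr pipe means giving the child two pipes and draining both until EOF, so neither stream can block the other. A minimal sketch using poll(2), assuming the caller wants stdout and stderr captured into separate buffers, as an ExecSync-style response with distinct fields would need; run_capture_split is a hypothetical name, and output beyond each buffer's capacity is silently truncated.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Append len bytes to a NUL-terminated buffer (cap >= 1), truncating when full. */
static void append(char *dst, size_t cap, size_t *used, const char *src, size_t len)
{
    size_t room = cap - 1 - *used;
    if (len > room)
        len = room;
    memcpy(dst + *used, src, len);
    *used += len;
    dst[*used] = '\0';
}

/* Hypothetical sketch: run argv with stdout and stderr on separate pipes.
 * Returns the child's exit code, or -1 on failure or death by signal. */
int run_capture_split(char *const argv[], char *out_buf, size_t out_cap,
                      char *err_buf, size_t err_cap)
{
    int pipes[2][2];
    if (pipe(pipes[0]) < 0)
        return -1;
    if (pipe(pipes[1]) < 0) {
        close(pipes[0][0]);
        close(pipes[0][1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        for (int i = 0; i < 2; i++) { close(pipes[i][0]); close(pipes[i][1]); }
        return -1;
    }
    if (pid == 0) {
        /* Child: stdout -> pipes[0], stderr -> pipes[1]. */
        dup2(pipes[0][1], STDOUT_FILENO);
        dup2(pipes[1][1], STDERR_FILENO);
        for (int i = 0; i < 2; i++) { close(pipes[i][0]); close(pipes[i][1]); }
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent keeps only the read ends, so EOF arrives when the child exits. */
    close(pipes[0][1]);
    close(pipes[1][1]);

    struct pollfd fds[2] = {
        { .fd = pipes[0][0], .events = POLLIN },
        { .fd = pipes[1][0], .events = POLLIN },
    };
    char *bufs[2] = { out_buf, err_buf };
    size_t caps[2] = { out_cap, err_cap };
    size_t used[2] = { 0, 0 };
    out_buf[0] = err_buf[0] = '\0';
    int open_fds = 2;
    char chunk[4096];

    while (open_fds > 0) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        for (int i = 0; i < 2; i++) {
            if (fds[i].fd < 0 || fds[i].revents == 0)
                continue;
            ssize_t n = read(fds[i].fd, chunk, sizeof chunk);
            if (n > 0) {
                append(bufs[i], caps[i], &used[i], chunk, (size_t)n);
            } else if (n == 0 || errno != EINTR) {
                /* EOF or read error: stop watching this stream. */
                close(fds[i].fd);
                fds[i].fd = -1; /* poll() ignores negative fds */
                open_fds--;
            }
        }
    }
    for (int i = 0; i < 2; i++)
        if (fds[i].fd >= 0)
            close(fds[i].fd);

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```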
LGTM |
The CRI requires us to prepend (timestamp, stream) to every line of the
output, and it's quite likely (especially in the !terminal case) that we
will read more than one line of output in the read loop.
So, we need to write out each line separately with the prepended
timestamps. Doing this the simple way (the final part of the buffer is
written partially if it doesn't end in a newline) makes the code much
simpler, with the downside that if we ever switch to multiple streams
for output we'll have to rewrite parts of this.
Alternative to #430.
Fixes: 1dc4c87 ("conmon: add timestamps to logs")
Signed-off-by: Aleksa Sarai <[email protected]>
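The "simple way" described above can be sketched as follows: split each chunk on '\n', prefix every line, and give a trailing fragment without a newline its own prefix, so its continuation in the next read gets a fresh one. The helper names, the caller-supplied timestamp string and the "<timestamp> <stream> " layout are assumptions for illustration, not conmon's write_k8s_log. write_all also retries short writes, the case the earlier nwarn("partial/failed write (logFd)") check was reporting.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical sketch: buf is NOT NUL-terminated. Each line (newline
 * included) is written as "<timestamp> <stream> <line>". A trailing fragment
 * without '\n' is written with its own prefix, as a partial line. */
int write_prefixed_lines(int fd, const char *timestamp, const char *stream,
                         const char *buf, size_t buflen)
{
    while (buflen > 0) {
        const char *nl = memchr(buf, '\n', buflen);
        /* Line length runs from buf to just past the newline, or to the end. */
        size_t line_len = nl ? (size_t)(nl - buf) + 1 : buflen;

        if (write_all(fd, timestamp, strlen(timestamp)) < 0 ||
            write_all(fd, " ", 1) < 0 ||
            write_all(fd, stream, strlen(stream)) < 0 ||
            write_all(fd, " ", 1) < 0 ||
            write_all(fd, buf, line_len) < 0)
            return -1;

        buf += line_len;
        buflen -= line_len;
    }
    return 0;
}
```

For example, a single read of "one\ntwo\nthr" with timestamp "TS" and stream "stdout" produces three records, the last one partial: "TS stdout one\n", "TS stdout two\n" and "TS stdout thr".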