Temporally order userspace print statements with kernel debug statements #3327
userspace libraries to use device 0xa (print log) rather than 0x1 (console) for printing system calls.
This was discussed in the core call on December 16th: The conclusion is that synchronous writes to the console (the code in this PR) should be considered an alternative implementation of the console system call API. Tock will therefore have two different implementations of the console system calls. One (the existing one today) performs asynchronous writes of arbitrary length, while another (the one in this PR) performs synchronous writes of finite length. This will require some changes to the API, as now the number of bytes written must be passed back to userspace so it knows whether the whole buffer was written. I am going to start working on this. |
It looks like there is a complication: the newlib implementation of printf assumes that
There is also kind of a third option, which is to re-implement newlib's printf. I'll put this on the agenda for the call tomorrow. |
3369: debug_writer_component: make debug buffer size configurable; double default debug buffer size r=hudson-ayers a=hudson-ayers

### Pull Request Overview

This pull request makes the debug buffer size configurable via an optional macro argument, and doubles the default size of the buffer to 2kB. I modified the Hifive1 to use the configurable parameter so that it still uses a 1kB buffer, since it is highly RAM constrained. This PR was inspired by #3327, which will ultimately require a much larger buffer (~8kB) for boards that use it.

### Testing Strategy

This pull request was tested by compiling.

### TODO or Help Wanted

Are any other boards sufficiently RAM constrained as to require a smaller buffer than 2kB?

### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make prepush`.

Co-authored-by: Hudson Ayers <[email protected]>
I am finally getting back to this. The call on 12/16/22 concluded that this new console and the old one should have the same system call device number, and be alternative implementations. This leads to the complication that the two behave differently. One (asynchronously) prints the entire offered string, while the other prints potentially a truncated subset. This means that we have two options for how the userspace library behaves:
Looking through the call notes, I don't see a clear resolution. I suggested that we should do #2, but I don't see substantial discussion or agreement, so want to make sure that's the right conclusion before wrapping this up. |
+1 for approach #2 |
Option 3: have a state machine in the kernel console implementation which prints full lines from applications, possibly using a line buffer in the app's grant. Benefit vs. option 1: the state machine is implemented once and shared across apps. Benefit vs. option 2: it prints full lines. There are certain corner cases I can think of:
|
Option 1 resembles what our Ti50 console does right now - we line buffer in userspace and issue I'm curious what y'all think of Option 3 offered by Vadim. |
Just getting back to this. I think I would need to understand the proposed semantics better. @vsukhoml You have described an implementation, not an interface. What is the desired behavior in response to a large printf that is larger than the available space in the kernel buffer? The current options are:
From your description, I can't tell if you want: b) is not possible without console locking or blocking within the kernel. Console locking would mean that other prints would be lost while it is locked (there is no way to buffer them). If the desired behavior is 3a), this will be an increase in code size, as it will require an alternative debug!() implementation that checks how much space is left, and outputs up to the last newline that can fit in that space. |
@vsukhoml and I discussed the desired semantics. They are temporal ordering and atomicity of a line (up to a newline) up to a finite length L. If userspace tries to print a line longer than L, the kernel will print up to L and return to userspace how much was printed. If userspace prints a buffer that is < L bytes and has more than one newline in it, it's up to the kernel on how many lines it prints (it could print 1, 2, or N). |
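The semantics described above (a synchronous write that accepts at most L bytes and reports how many it took, with userspace retrying the remainder) can be sketched roughly as follows. This is only an illustrative model: `DebugBuffer`, `write`, `drain`, and `write_all` are hypothetical names, not Tock or libtock-c APIs, and a real process would yield while the serial port drains rather than draining the buffer itself.

```rust
/// Hypothetical model of a fixed-capacity kernel debug buffer.
/// Not a Tock type; it exists only to illustrate the semantics.
struct DebugBuffer {
    data: Vec<u8>,
    capacity: usize,
}

impl DebugBuffer {
    fn new(capacity: usize) -> Self {
        DebugBuffer { data: Vec::new(), capacity }
    }

    /// Synchronous write: copies as many bytes as fit, capped at `max_len`
    /// (the finite length L), and returns the number of bytes accepted.
    fn write(&mut self, buf: &[u8], max_len: usize) -> usize {
        let free = self.capacity - self.data.len();
        let n = buf.len().min(max_len).min(free);
        self.data.extend_from_slice(&buf[..n]);
        n
    }

    /// Stands in for the serial port emptying the buffer.
    fn drain(&mut self) -> Vec<u8> {
        std::mem::take(&mut self.data)
    }
}

/// Userspace side: keep offering the unwritten tail until every byte
/// has been accepted. Returns everything that reached the "serial port".
fn write_all(dbg: &mut DebugBuffer, mut buf: &[u8], max_len: usize) -> Vec<u8> {
    assert!(max_len > 0 && dbg.capacity > 0, "no progress would be possible");
    let mut out = Vec::new();
    while !buf.is_empty() {
        let n = dbg.write(buf, max_len);
        buf = &buf[n..];
        // Assumption: a real process would yield here and be woken later;
        // the sketch simply drains the buffer to make room.
        out.extend(dbg.drain());
    }
    out
}
```

The key point is that the return value of `write` is what lets userspace detect truncation; without it, the caller cannot know whether the whole buffer was printed.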
…erspersed kernel writes.
One update. libtock-c's
I talked with @vsukhoml and we concluded this means the kernel console driver shouldn't worry about newlines, merely ordered and atomic writes. As console writes no longer guarantee writing every byte, libtock-c will require some updates to The code now assures atomicity of writes up to 200 bytes (by waiting for 200 bytes to be available in the debug buffer) and ensures that userspace writes do not interleave. This means that kernel debug writes can interleave at 200 byte granularity if the debug buffer is full. I've tested the current code with two userspace writers and interleaved kernel writes (logging the system calls of the writes themselves). I'll test it some more in the next week. |
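As a rough sketch of the atomicity rule described in this comment: a chunk is only started once 200 bytes are free in the debug buffer, so a chunk is never split by a kernel write. The names `ATOMIC_SIZE` and `next_chunk` are made up for illustration and the actual capsule logic may differ.

```rust
/// Assumed atomic write granularity, taken from the 200-byte figure above.
const ATOMIC_SIZE: usize = 200;

/// Decides how many bytes of a pending userspace write to copy now.
/// Returns `None` when the writer must wait (fewer than ATOMIC_SIZE bytes
/// are free), otherwise the length of the next chunk to write atomically.
fn next_chunk(free: usize, remaining: usize) -> Option<usize> {
    if free < ATOMIC_SIZE {
        // Not enough room to guarantee an uninterrupted chunk: retry later.
        None
    } else {
        Some(remaining.min(ATOMIC_SIZE))
    }
}
```

Under this rule a write longer than 200 bytes is emitted as several chunks, which is where kernel debug output can interleave when the buffer is full.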
On another note, I think I've also found some bugs in the newlib printf implementation. Writes that are between 1400-1800 bytes or so (without a newline) behave erratically, sometimes truncating the first 1024 bytes. It's deterministic based on the length, but different lengths behave differently. E.g., writes longer than 1480 bytes do not print the first 1024 bytes, while writes 1600-2000 or so do not print anything at all. I will look at the source to try to get to the bottom of this. |
We discussed this on the core call today, and @bradjc suggested it was likely a malloc failure issue within newlib. He's correct; if you increase the size of the process heap, then prints within this range complete correctly. |
Might be a reason to switch from newlib to https://github.com/vsukhoml/noc :-) - I tried to add all functionality required by libtock-c. |
Your buffering scheme is definitely much better for an embedded system. :) |
in-order console logging.
bors try |
tryBuild failed: |
Is the status of this that it is ready for final review now? |
Not quite yet. Right now, this requires updates to libtock-c. I think I can change the logic to always complete writes before issuing a callback, which would not require an update to libtock-c. |
writes from userspace before issuing an upcall.
This is now ready for review. |
Looks good!
bors r+ |
Pull Request Overview
Currently it is possible for userspace `printf` (or `putstr`) calls to be re-ordered with respect to kernel `debug!` calls. This means that a kernel `debug!` call made before a userspace print operation can appear later in the console output. This can make debugging difficult, as it is challenging to correctly correlate kernel and userspace debug output.

This problem is caused by how serial port output is queued. `debug` operations synchronously write to a circular queue, which is then asynchronously written to the serial port. The userspace `putstr` API is technically asynchronous, but `putstr` and `printf` yield until completion, so they appear synchronous to userspace.

There are two problems. First, when there are multiple writers, their ordering is independent of their arrival. The first writer will start writing, and subsequent writers are queued. The order of this queue depends on the order in which they were initialized (it's a fixed order linked list determined at initialization), not the order of operations. So if writers A, B and C write in that order, it is possible that A, C, B will be the output.

Second, the fact that debug uses a circular queue that's written out asynchronously breaks any ordering of its operations. For example, suppose the kernel calls `debug` for an upcall to a userspace process, but this can't be written because the serial port is busy writing something else (e.g., another userspace printf). In the upcall, the userspace process calls `printf`. The code for `printf` issues another `debug` call. The text of these two debug calls (one from the upcall, one from the printf) will be written together when it's `debug`'s turn to write, and the actual `printf` may occur either before or after them.

The basic issue is that kernel logging and userspace print statements use different UART clients, which write independently of one another.
This PR provides a ConsoleOrdered capsule (name should be different) that provides the same writing interface as `Console` but uses `debug` to write to the same circular buffer as the kernel. This means that kernel prints and userspace print operations have the correct ordering. The one edge case is that userspace print operations are queued and have an inserted delay until completion. Since `debug` is synchronous, there is no way to exert backpressure besides polling. The current approach simply writes as much as it can to the debug log, and then tries to write more after a timeout (which is configurable when instantiating the component). So this means that in real time it is possible for userspace to issue a printf system call, have that write delayed because the debug buffer was full, then see kernel `debug` operations that occurred in wall time later appear earlier in the log. However, since the userspace process is blocked, none of these `debug` operations could be in response to its operations. In theory, the userspace process could have been rescheduled before issuing the `printf`. So ConsoleOrdered does not provide true wall clock time ordering of print operations, but assuming that userspace prints are blocking, it maintains causal ordering.

The system call driver maintains ordering between userspace printf operations by tracking a sequence number. Each call to the write system call increments a sequence number, which is then stored in that client. When a client (process) finishes sending, the driver looks for the client with a write pending that has the lowest sequence number. This handles wraparound for 32-bit values, under the assumption that there are at most 2^31 writes outstanding at any time.
The imix board has been configured to use this new capsule. The ConsoleOrdered capsule provides the same system call interface and device ID as the Console capsule.
Testing Strategy
This pull request was tested on imix by running two userspace processes which both write to the serial port with `printf` in a loop. The printfs are large, such that `newlibc` breaks them into multiple writes. The capsule prints each write in its entirety before interleaving another write. Adding `debug!` statements into the kernel has those statements appear in temporal order with respect to the userspace `printf` writes.

TODO or Help Wanted
This needs to incorporate reading.
Documentation Updated
Updated the relevant files in `/docs`, or no updates are required.
Formatting
Ran `make prepush`.