Temporally order userspace print statements with kernel debug statements #3327

phil-levis · 2022-11-12T05:11:55Z

Pull Request Overview

Currently it is possible for userspace printf (or putstr) calls to be reordered with respect to kernel debug! calls. This means that a kernel debug! call made before a userspace print operation can appear later in the console output. This makes debugging difficult, because it is hard to correctly correlate kernel and userspace debug output.

This problem is caused by how serial port output is queued. debug operations synchronously write to a circular queue, which is then asynchronously written to the serial port. The userspace putstr API is technically asynchronous, but putstr and printf yield until completion, so they appear synchronous to userspace.

There are two problems. First, when there are multiple writers, their ordering is independent of their arrival. The first writer will start writing, and subsequent writers are queued. The order of this queue depends on the order in which they were initialized (it's a fixed order linked list determined at initialization), not the order of operations. So if writers A, B and C write in that order, it is possible that A, C, B will be the output.
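The A, C, B reordering can be reproduced with a small model. This is only a sketch under stated assumptions: `fixed_order_output` is a hypothetical function, not the Tock multiplexer, and it assumes the first writer finds the port idle while all later writers arrive before it finishes.

```rust
/// Hypothetical model of a multiplexer that picks the next writer by
/// scanning a fixed list (initialization order) rather than arrival order.
/// `init_order` is the fixed list; `arrivals` is the order of the writes.
fn fixed_order_output(init_order: &[char], arrivals: &[char]) -> Vec<char> {
    let mut output = Vec::new();
    let mut iter = arrivals.iter().copied();

    // The first writer finds the port idle and starts writing at once.
    if let Some(first) = iter.next() {
        output.push(first);
    }

    // Assumption: every other writer arrives while the first is in flight,
    // so each is only marked as pending.
    let mut pending: Vec<char> = iter.collect();

    // On each completion, scan the fixed list from its head and service
    // the first writer that has a write pending.
    while !pending.is_empty() {
        match init_order.iter().copied().find(|w| pending.contains(w)) {
            Some(w) => {
                output.push(w);
                pending.retain(|p| *p != w);
            }
            // A pending writer that is not in the list is never serviced.
            None => break,
        }
    }
    output
}
```

With the list initialized as A, C, B and writes arriving as A, B, C, the model services C before B.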

Second, the fact that debug uses a circular queue that's written out asynchronously breaks any ordering of its operations. For example, suppose the kernel calls debug for an upcall to a userspace process, but this can't be written because the serial port is busy writing something else (e.g., another userspace printf). In the upcall, the userspace process calls printf. The code for printf issues another debug call. The text of these two debug calls (one from the upcall, one from the printf) will be written together when it's debug's turn to write, and the actual printf may occur either before or after them.

The basic issue is that kernel logging and userspace print statements use different UART clients, which write independently of one another.

This PR provides a ConsoleOrdered capsule (name should be different) that provides the same writing interface as Console but uses debug to write to the same circular buffer as kernel debug output. This means that kernel prints and userspace print operations have the correct ordering. The one edge case is that userspace print operations are queued and may have a delay inserted before they complete. Since debug is synchronous, there is no way to exert backpressure besides polling. The current approach simply writes as much as it can to the debug log, and then tries to write more after a timeout (which is configurable when instantiating the component). So in real time it is possible for userspace to issue a printf system call, have that write delayed because the debug buffer was full, and then see kernel debug operations that occurred later in wall-clock time appear earlier in the log. However, since the userspace process is blocked, none of these debug operations could be in response to its operations. In theory, the userspace process could have been rescheduled before issuing the printf. So ConsoleOrdered does not provide true wall-clock ordering of print operations, but assuming that userspace prints are blocking, it maintains causal ordering.
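The "write as much as fits, then retry after a timeout" approach can be sketched as follows. `DebugQueue` and `write_step` are hypothetical names used for illustration only; they model the bounded debug buffer and one step of the polling loop, and are not the capsule's actual types.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the kernel debug buffer: a bounded byte queue.
struct DebugQueue {
    buf: VecDeque<u8>,
    capacity: usize,
}

impl DebugQueue {
    fn new(capacity: usize) -> Self {
        DebugQueue { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Bytes that can still be queued before the buffer is full.
    fn free(&self) -> usize {
        self.capacity - self.buf.len()
    }

    /// Copies as many bytes as currently fit and returns that count.
    fn write_some(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.free());
        self.buf.extend(data[..n].iter().copied());
        n
    }

    /// Models the serial port draining up to `n` bytes from the queue.
    fn drain(&mut self, n: usize) -> Vec<u8> {
        let n = n.min(self.buf.len());
        self.buf.drain(..n).collect()
    }
}

/// One polling step: queue what fits and return the unwritten tail.
/// The caller would retry with that tail once the timeout fires.
fn write_step<'a>(queue: &mut DebugQueue, remaining: &'a [u8]) -> &'a [u8] {
    let accepted = queue.write_some(remaining);
    &remaining[accepted..]
}
```

Because the process stays blocked until the tail is empty, any kernel output queued between two steps cannot be a response to that process's later actions, which is the causal-ordering argument made above.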

The system call driver maintains ordering between userspace printf operations by tracking a sequence number. Each call to the write system call increments a sequence number, which is then stored in that client. When a client (process) finishes sending, the driver looks for the client with a write pending that has the lowest sequence number. This handles wraparound for 32-bit values, under the assumption that there are at most 2^31 writes outstanding at any time.
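One common way to compare 32-bit sequence numbers across wraparound is serial-number arithmetic, sketched below. This is an illustration under that assumption, not necessarily the code in the PR; `seq_before` and `oldest_pending` are hypothetical names, and the `(client index, sequence number)` pairs stand in for per-process state.

```rust
/// Returns true if sequence number `a` was issued before `b`.
/// Valid only while the two are fewer than 2^31 increments apart.
fn seq_before(a: u32, b: u32) -> bool {
    // The wrapped difference, read as a signed value, is negative
    // exactly when `a` precedes `b` within that window.
    (a.wrapping_sub(b) as i32) < 0
}

/// Picks the client whose pending write has the oldest sequence number.
/// Each entry is a hypothetical (client index, sequence number) pair.
fn oldest_pending(pending: &[(usize, u32)]) -> Option<usize> {
    let mut best: Option<(usize, u32)> = None;
    for &(client, seq) in pending {
        match best {
            // Keep the current best unless this entry is strictly older.
            Some((_, best_seq)) if !seq_before(seq, best_seq) => {}
            _ => best = Some((client, seq)),
        }
    }
    best.map(|(client, _)| client)
}
```

For example, a write stamped `u32::MAX` is treated as older than one stamped `0`, because the counter wrapped between them.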

The imix board has been configured to use this new capsule. The ConsoleOrdered capsule provides the same system call interface and device ID as the Console capsule.

Testing Strategy

This pull request was tested on imix by running two userspace processes which both write to the serial port with printf in a loop. The printfs are large, such that newlib breaks them into multiple writes. The capsule prints each write in its entirety before interleaving another write. Adding debug! statements into the kernel has those statements appear in temporal order with respect to the userspace printf writes.

TODO or Help Wanted

This needs to incorporate reading.

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make prepush.


boards/components/src/print_log.rs

capsules/src/print_log.rs

kernel/src/debug.rs

capsules/src/driver.rs

boards/components/src/debug_writer.rs

boards/components/src/print_log.rs

boards/imix/src/main.rs

capsules/src/print_log.rs

phil-levis · 2023-01-05T21:41:17Z

This was discussed in the core call on December 16th:

https://github.com/tock/tock/blob/912f18de861164afd51817b2d57164f1d7b32346/doc/wg/core/notes/core-notes-2022-12-16.md

The conclusion is that synchronous writes to the console (the code in this PR) should be considered an alternative implementation of the console system call API. Tock will therefore have two different implementations of the console system calls. One (the existing one today) performs asynchronous writes of arbitrary length, while another (the one in this PR) performs synchronous writes of finite length. This will require some changes to the API, as now the number of bytes written must be passed back to userspace so it knows whether the whole buffer was written. I am going to start working on this.

phil-levis · 2023-01-06T00:58:25Z

It looks like there is a complication: the newlib implementation of printf assumes that putnstr writes the entire buffer; printf returns the size of the complete buffer regardless of what putnstr returns. I see two options going forward:

1. Calls to printf always return the buffer size passed, even if it is not fully output.
2. Calls to printf always output the entire buffer, but the write is no longer atomic: other debug statements may be interleaved inside the printf (although not from that process).

There is also kind of a third option, which is to re-implement newlib's printf.

I'll put this on the agenda for the call tomorrow.

3369: debug_writer_component: make debug buffer size configurable; double default debug buffer size r=hudson-ayers a=hudson-ayers

### Pull Request Overview

This pull request makes the debug buffer size configurable via an optional macro argument, and doubles the default size of the buffer to 2kB. I modified the Hifive1 to use the configurable parameter so that it still uses a 1kB buffer, since it is highly RAM constrained. This PR was inspired by #3327, which will ultimately require a much larger buffer (~8kB) for boards that use it.

### Testing Strategy

This pull request was tested by compiling.

### TODO or Help Wanted

Are any other boards sufficiently RAM constrained as to require a smaller buffer than 2kB?

### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make prepush`.

Co-authored-by: Hudson Ayers <[email protected]>

phil-levis · 2023-02-11T00:10:21Z

@kupiakos @bradjc

I am finally getting back to this. The call on 12/16/22 concluded that this new console and the old one should have the same system call device number, and be alternative implementations. This leads to the complication that the two behave differently. One (asynchronously) prints the entire offered string, while the other prints potentially a truncated subset. This means that we have two options for how the userspace library behaves:

1. Have a write state machine in userspace, so that large prints may be broken into multiple smaller ones that the kernel can handle; userspace makes multiple print calls until the entire buffer is printed. Implication: a print from userspace will not be atomic and can have other writes interleaved within it, and userspace becomes more complex/bigger.
2. Truncate long prints from userspace: they atomically print as much as they can in the kernel buffer and return. Implication: prints might be incomplete (a warning is printed when this happens).
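The difference between the two behaviors can be sketched with a mock kernel write. Everything here is hypothetical: `kernel_write` stands in for a bounded console write that accepts at most `limit` bytes per call and reports how many it took, and `log` records each write as the kernel saw it.

```rust
/// Hypothetical bounded write: accepts at most `limit` bytes per call,
/// records what it accepted, and returns the number of bytes taken.
fn kernel_write(log: &mut Vec<Vec<u8>>, data: &[u8], limit: usize) -> usize {
    let n = data.len().min(limit);
    log.push(data[..n].to_vec());
    n
}

/// First behavior: userspace loops until the whole buffer is written.
/// Returns the number of write calls made. Other output may land
/// between any two of those calls, so the print is not atomic.
fn print_all(log: &mut Vec<Vec<u8>>, data: &[u8], limit: usize) -> usize {
    let mut calls = 0;
    let mut rest = data;
    while !rest.is_empty() {
        let n = kernel_write(log, rest, limit);
        calls += 1;
        if n == 0 {
            break; // No progress is possible; avoid spinning forever.
        }
        rest = &rest[n..];
    }
    calls
}

/// Second behavior: one atomic write, truncated to what fits.
/// Returns the number of bytes that were dropped.
fn print_truncating(log: &mut Vec<Vec<u8>>, data: &[u8], limit: usize) -> usize {
    let n = kernel_write(log, data, limit);
    data.len() - n
}
```

With a 10-byte buffer and a 4-byte limit, the looping version makes three calls, while the truncating version makes one call and drops six bytes.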

Looking through the call notes, I don't see a clear resolution. I suggested that we should do #2, but I don't see substantial discussion or agreement, so want to make sure that's the right conclusion before wrapping this up.

hudson-ayers · 2023-02-13T17:36:37Z

+1 for approach #2

phil-levis · 2023-02-18T00:17:22Z

@kupiakos @vsukhoml -- option 1 or option 2?

vsukhoml · 2023-02-18T22:27:28Z

Option 3: have a state machine in kernel console implementation which will print full lines from applications, possibly use line buffer in apps grant. Benefit vs option 1 - state machine is implemented once and shared across apps. Benefit vs. option 2 - print full lines.

There are certain corner cases I can think of:

Adding timestamps - probably time stamps should be added at the time of first symbol of string received by driver no matter when last symbol / line feed is received.
Probably need a flush call to force print to support prints with control codes which overwrite previously printed characters. E.g. some dynamic/animated prompt with / - \ | transition.

kupiakos · 2023-02-24T16:37:28Z

Option 1 resembles what our Ti50 console does right now - we line buffer in userspace and issue blocking_command syscalls for every line (grouping multiple lines into a single syscall when available). We've managed to strip down most of the size cost from doing this, but it's still there and duplicated per-app. An issue I have with approach 2 is how are userspace applications supposed to know the max amount they can send to the console at a time without data being dropped? For cases where we want to print a lot of data and guarantee it's all sent, we'd need to have some sort of breakup system in userspace anyways.

I'm curious what y'all think of Option 3 offered by Vadim.

phil-levis · 2023-03-03T21:46:44Z

@vsukhoml @kupiakos

Just getting back to this. I think I would need to understand the proposed semantics better. @vsukhoml You have described an implementation, not an interface. What is the desired behavior in response to a large printf that is larger than the available space in the kernel buffer? The current options are:

1. Prints the entire printf but other prints (concurrent, ordered) may be interspersed with it. I.e., if a print P2 appears between the start of print P1 and the end of print P1, this means it was invoked after P1 was invoked and before P1 returned.
2. Prints as much of the printf as it can atomically, truncating it if there is not enough space.

From your description, I can't tell if you want:
3a. Prints the entire print, other prints may be interspersed with it, but interspersals will always occur at line boundaries. Note that kernel prints may be truncated, as they are currently.
3b. Prints the entire printf without any interspersed prints.

3b) is not possible without console locking or blocking within the kernel. Console locking would mean that other prints would be lost while it is locked (there is no way to buffer them).

If the desired behavior is 3a), this will be an increase in code size, as it will require an alternative debug!() implementation that checks how much space is left, and outputs up to the last newline that can fit in that space.

phil-levis · 2023-03-17T17:09:13Z

@vsukhoml and I discussed the desired semantics. They are temporal ordering and atomicity of a line (up to a newline) up to a finite length L. If userspace tries to print a line longer than L, the kernel will print up to L and return to userspace how much was printed. If userspace prints a buffer that is < L bytes and has more than one newline in it, it's up to the kernel on how many lines it prints (it could print 1, 2, or N).


phil-levis · 2023-03-23T05:03:51Z

One update. libtock-c's newlib semantics for printf are:

An output string that contains multiple newlines is printed one line at a time (each line is a separate invocation of putnstr)
The maximum length of a single print is 1024 characters; if more than 1023 characters precede the newline, it prints them in 1024-character chunks, and the last chunk ends in the newline

I talked with @vsukhoml and we concluded this means the kernel console driver shouldn't worry about newlines, merely ordered and atomic writes. As console writes no longer guarantee writing every byte, libtock-c will require some updates to sys.c.

The code now assures atomicity of writes up to 200 bytes (by waiting for 200 bytes to be available in the debug buffer) and ensures that userspace writes do not interleave. This means that kernel debug writes can interleave at 200 byte granularity if the debug buffer is full. I've tested the current code with two userspace writers and interleaved kernel writes (logging the system calls of the writes themselves). I'll test it some more in the next week.
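The 200-byte atomicity rule can be sketched as an admission check. This is a simplified illustration, not the capsule's code: `ATOMIC_WRITE_SIZE`, `admit`, and `chunk_sizes` are hypothetical names, and the check assumes the driver only proceeds once a full 200 bytes are free in the debug buffer.

```rust
/// Assumed atomic write granularity, taken from the 200 bytes cited above.
const ATOMIC_WRITE_SIZE: usize = 200;

/// Decides how many bytes of a pending userspace write may be copied now.
/// Returns None when the driver should wait and poll again later.
fn admit(free_space: usize, remaining: usize) -> Option<usize> {
    if remaining == 0 || free_space < ATOMIC_WRITE_SIZE {
        None
    } else {
        Some(remaining.min(ATOMIC_WRITE_SIZE))
    }
}

/// Sizes of the atomic pieces a write of `len` bytes is split into.
/// Kernel debug output may appear between pieces, never inside one.
fn chunk_sizes(len: usize) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut remaining = len;
    while remaining > 0 {
        let n = remaining.min(ATOMIC_WRITE_SIZE);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}
```

Under these assumptions a 450-byte write is emitted as pieces of 200, 200, and 50 bytes.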

phil-levis · 2023-03-23T05:06:05Z

On another note, I think I've also found some bugs in the newlib printf implementation. Writes that are between 1400-1800 bytes or so (without a newline) behave erratically, sometimes truncating the first 1024 bytes. It's deterministic based on the length, but different lengths behave differently. E.g., writes longer than 1480 bytes do not print the first 1024 bytes, while writes 1600-2000 or so do not print anything at all. I will look at the source to try to get to the bottom of this.

phil-levis · 2023-03-24T16:55:42Z

> On another note, I think I've also found some bugs in the newlib printf implementation. Writes that are between 1400-1800 bytes or so (without a newline) behave erratically, sometimes truncating the first 1024 bytes. It's deterministic based on the length, but different lengths behave differently. E.g., writes longer than 1480 bytes do not print the first 1024 bytes, while writes 1600-2000 or so do not print anything at all. I will look at the source to try to get to the bottom of this.

We discussed this on the core call today, and @bradjc suggested it was likely a malloc failure issue within newlib. He's correct; if you increase the size of the process heap, then prints within this range complete correctly.

vsukhoml · 2023-03-24T17:08:22Z

Might be a reason to switch from newlib to https://github.com/vsukhoml/noc :-) - I tried to add all functionality required by libtock-c.


phil-levis · 2023-03-24T21:11:28Z

Your buffering scheme is definitely much better for an embedded system. :)


phil-levis · 2023-03-24T23:16:20Z

bors try

bors · 2023-03-24T23:17:23Z

try

Build failed:

ci-build (ubuntu-latest)

hudson-ayers · 2023-03-29T05:13:33Z

Is the status of this that it is ready for final review now?

phil-levis · 2023-03-29T06:04:53Z

> Is the status of this that it is ready for final review now?

Not quite yet. Right now, this requires updates to libtock-c. I think I can change the logic to always complete writes before issuing a callback, which would not require an update to libtock-c.


phil-levis · 2023-03-31T22:53:59Z

@hudson-ayers @kupiakos

This is now ready for review.

bradjc

Looks good!

bradjc · 2023-04-17T17:24:24Z

bors r+

bors · 2023-04-17T17:53:06Z

Build succeeded:

phil-levis added 3 commits November 11, 2022 11:53

Happens-before logging.

0d98395

Synchronous/in-order print log implementation. Requires changing userspace libraries to use device 0xa (print log) rather than 0x1 (console) for printing system calls.

4d98b66

10 ms callback, not 1s.

db2514b

github-actions bot added component kernel labels Nov 12, 2022

Formatting and removing warnings.

5d939e1

phil-levis marked this pull request as ready for review November 19, 2022 06:41

bradjc reviewed Nov 21, 2022

boards/components/src/print_log.rs (outdated)

capsules/src/print_log.rs (outdated)

capsules/src/print_log.rs (outdated)

hudson-ayers reviewed Nov 29, 2022


phil-levis added 2 commits December 14, 2022 11:44

Addressing comments on PR.

c6aef0b

Formatting

1e7f15a

hudson-ayers mentioned this pull request Jan 6, 2023

debug_writer_component: make debug buffer size configurable; double default debug buffer size #3369

Merged

2 tasks

Proper write size handling.

1222e6d

phil-levis added 2 commits March 17, 2023 13:54

Ordered console log following specification based on conversation with Vadim.

c6c5467

Ordered console working. Tested with two application writers, and interspersed kernel writes.

514980f

Clean up components.

c95e98a

phil-levis added 4 commits March 24, 2023 13:22

Factor out parameters from constant into state variables, to allow simpler configuration.

1064e79

Remove PrintLog device; ConsoleOrdered is now the same device as Console.

deb74e9

Formatting.

9fad6d4

Merge branch 'master' of github.com:tock/tock into total_order_log

f25be28

Update all boards to new Writer::write interface to support in-order console logging.

93e9ea2

github-actions bot added the WG-OpenTitan In the purview of the OpenTitan working group. label Mar 24, 2023

bors bot added a commit that referenced this pull request Mar 24, 2023

Try #3327:

bf891be

Typo.

ae4477f

phil-levis added 3 commits March 31, 2023 11:24

Remove partial writes from ordered console; the console now completes writes from userspace before issuing an upcall.

24ae2b8

Formatting.

3373cc5

Improve the comments. Clean up a little formatting.

377ef65

phil-levis added 4 commits April 7, 2023 09:04

Merge branch 'master' of github.com:tock/tock into total_order_log

5a93e97

Add receive functionality.

16cd33d

Improve comment.

b07e741

Formatting.

9a4db7f

bradjc approved these changes Apr 12, 2023


bradjc added the last-call Final review period for a pull request. label Apr 13, 2023

bors bot merged commit 5a42ae9 into master Apr 17, 2023

bors bot deleted the total_order_log branch April 17, 2023 17:53
