
phil-levis
Contributor

@phil-levis phil-levis commented Nov 12, 2022

Pull Request Overview

Currently it is possible for userspace printf (or putstr) calls to be reordered with respect to kernel debug! calls. This means that a kernel debug! call made before a userspace print operation can appear later in the console output. This makes debugging difficult, because kernel and userspace debug output cannot be reliably correlated.

This problem is caused by how serial port output is queued. debug operations synchronously write to a circular queue, which is then asynchronously written to the serial port. The userspace putstr API is technically asynchronous, but putstr and printf yield until completion, so they appear synchronous to userspace.

There are two problems. First, when there are multiple writers, the order in which they are serviced is independent of the order in which they arrive. The first writer starts writing, and subsequent writers are queued. The order of this queue depends on the order in which the writers were initialized (it is a fixed-order linked list determined at initialization), not the order of their operations. So if writers A, B and C write in that order, the output may be A, C, B.
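The first problem can be illustrated with a small model. This is only a sketch under stated assumptions, not the Tock UART mux code: `output_order` is a hypothetical helper, writers are represented as characters, and all writers after the first are assumed to arrive while the first is still transmitting.

```rust
/// Models a mux that serves queued writers by scanning a fixed,
/// initialization-ordered list rather than by arrival order.
/// Hypothetical helper for illustration; not part of Tock.
fn output_order(init_order: &[char], arrivals: &[char]) -> Vec<char> {
    let mut out = Vec::new();
    if let Some((&first, rest)) = arrivals.split_first() {
        // The first arrival finds the port idle and starts immediately.
        out.push(first);
        // Everyone else is queued; when the port frees up, the mux walks
        // the fixed list, so initialization order decides who goes next.
        for &writer in init_order {
            if rest.contains(&writer) {
                out.push(writer);
            }
        }
    }
    out
}
```

With an initialization order of A, C, B and writes arriving as A, B, C, this model produces A, C, B, which is the reordering described above.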

Second, debug uses a circular queue that is written out asynchronously, which breaks the ordering of its operations with respect to other writers. For example, suppose the kernel calls debug for an upcall to a userspace process, but this text can't be written yet because the serial port is busy writing something else (e.g., another userspace printf). In the upcall, the userspace process calls printf. The code for printf issues another debug call. The text of these two debug calls (one from the upcall, one from the printf) will be written together when it is debug's turn to write, and the actual printf output may appear either before or after them.

The basic issue is that kernel logging and userspace print statements use different UART clients, which write independently of one another.

This PR provides a ConsoleOrdered capsule (name should be different) that provides the same writing interface as Console but uses debug to write to the same circular buffer as the kernel. This means that kernel prints and userspace print operations have the correct ordering. The one edge case is that userspace print operations are queued and may be delayed until they complete. Since debug is synchronous, there is no way to exert backpressure besides polling. The current approach simply writes as much as it can to the debug log, and then tries to write more after a timeout (which is configurable when instantiating the component). So in real time it is possible for userspace to issue a printf system call, have that write delayed because the debug buffer was full, and then see kernel debug operations that occurred later in wall-clock time appear earlier in the log. However, since the userspace process is blocked, none of these debug operations could be in response to its operations. In theory, the userspace process could have been rescheduled before issuing the printf. So ConsoleOrdered does not provide true wall-clock ordering of print operations, but assuming that userspace prints are blocking, it maintains causal ordering.
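The "write as much as fits, then retry after a timeout" behavior can be sketched as follows. This is an illustrative model using `std`, not the capsule's code: `DebugQueue`, `PendingPrint`, and `try_progress` are hypothetical names, and the timer is represented simply by calling `try_progress` again.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the kernel debug buffer: a bounded byte queue.
struct DebugQueue {
    buf: VecDeque<u8>,
    capacity: usize,
}

impl DebugQueue {
    fn new(capacity: usize) -> Self {
        DebugQueue { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Copies as many bytes as currently fit and returns that count.
    fn write_partial(&mut self, data: &[u8]) -> usize {
        let free = self.capacity - self.buf.len();
        let n = free.min(data.len());
        self.buf.extend(data[..n].iter().copied());
        n
    }

    /// Models the serial port draining up to `n` bytes asynchronously.
    fn drain(&mut self, n: usize) -> Vec<u8> {
        let n = n.min(self.buf.len());
        self.buf.drain(..n).collect()
    }
}

/// A userspace write that may need several attempts to complete.
struct PendingPrint<'a> {
    data: &'a [u8],
    written: usize,
}

impl<'a> PendingPrint<'a> {
    fn new(data: &'a [u8]) -> Self {
        PendingPrint { data, written: 0 }
    }

    /// Called for the initial write and again on each retry timeout.
    /// Returns true once every byte has been queued.
    fn try_progress(&mut self, queue: &mut DebugQueue) -> bool {
        self.written += queue.write_partial(&self.data[self.written..]);
        self.written == self.data.len()
    }
}
```

Because the process stays blocked until `try_progress` reports completion, any kernel output queued in between cannot have been caused by that process, which is the causal-ordering argument made above.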

The system call driver maintains ordering between userspace printf operations by tracking a sequence number. Each call to the write system call increments a sequence number, which is then stored in that client. When a client (process) finishes sending, the driver looks for the client with a pending write that has the lowest sequence number. The comparison handles wraparound of the 32-bit value, under the assumption that there are at most 2^31 writes outstanding at any time.
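One common way to compare wrapping sequence numbers is serial-number arithmetic: subtract with wraparound and interpret the result as signed. The sketch below assumes that technique and assumes fewer than 2^31 outstanding writes; `seq_before`, `PendingWrite`, and `next_writer` are hypothetical names, and the PR's actual comparison may differ.

```rust
/// Returns true if sequence number `a` was issued before `b`,
/// treating the 32-bit counter as circular.
fn seq_before(a: u32, b: u32) -> bool {
    // The wrapped difference, reinterpreted as signed, is negative
    // exactly when `a` precedes `b` by less than 2^31.
    (a.wrapping_sub(b) as i32) < 0
}

/// Hypothetical per-process record of a queued write.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PendingWrite {
    process: usize,
    seq: u32,
}

/// Picks the pending write with the oldest sequence number, if any.
fn next_writer(pending: &[PendingWrite]) -> Option<PendingWrite> {
    let mut oldest: Option<PendingWrite> = None;
    for w in pending {
        oldest = match oldest {
            // Keep the current candidate unless `w` is strictly older.
            Some(o) if !seq_before(w.seq, o.seq) => Some(o),
            _ => Some(*w),
        };
    }
    oldest
}
```

Note that a write stamped `u32::MAX` is treated as older than one stamped `0`, which is what keeps ordering intact when the counter wraps.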

The imix board has been configured to use this new capsule. The ConsoleOrdered capsule provides the same system call interface and device ID as the Console capsule.

Testing Strategy

This pull request was tested on imix by running two userspace processes which both write to the serial port with printf in a loop. The printfs are large, such that newlib breaks them into multiple writes. The capsule prints each write in its entirety before interleaving another write. When debug! statements are added to the kernel, they appear in temporal order with respect to the userspace printf writes.

TODO or Help Wanted

This needs to incorporate reading.

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

userspace libraries to use device 0xa (print log) rather than 0x1 (console)
for printing system calls.
@phil-levis phil-levis marked this pull request as ready for review November 19, 2022 06:41
@phil-levis
Contributor Author

This was discussed in the core call on December 16th:

https://github.com/tock/tock/blob/912f18de861164afd51817b2d57164f1d7b32346/doc/wg/core/notes/core-notes-2022-12-16.md

The conclusion is that synchronous writes to the console (the code in this PR) should be considered an alternative implementation of the console system call API. Tock will therefore have two different implementations of the console system calls. One (the existing one today) performs asynchronous writes of arbitrary length, while another (the one in this PR) performs synchronous writes of finite length. This will require some changes to the API, as now the number of bytes written must be passed back to userspace so it knows whether the whole buffer was written. I am going to start working on this.

@phil-levis
Contributor Author

phil-levis commented Jan 6, 2023

It looks like there is a complication: the newlib implementation of printf assumes that putnstr writes the entire buffer; printf returns the size of the complete buffer regardless of what putnstr returns. I see two options going forward:

  1. Calls to printf always return the buffer size passed, even if it is not fully outputted.
  2. Calls to printf always output the entire buffer, but the write is no longer atomic: other debug statements may be interleaved inside the printf (although not from that process).

There is also kind of a third option, which is to re-implement newlib's printf.

I'll put this on the agenda for the call tomorrow.

bors bot added a commit that referenced this pull request Jan 18, 2023
3369: debug_writer_component: make debug buffer size configurable; double default debug buffer size r=hudson-ayers a=hudson-ayers

### Pull Request Overview

This pull request makes the debug buffer size configurable via an optional macro argument, and doubles the default size of the buffer to 2kB. I modified the Hifive1 to use the configurable parameter so that it still uses a 1kB buffer, since it is highly RAM constrained.

This PR was inspired by #3327, which will ultimately require a much larger buffer (~8kB) for boards that use it.

### Testing Strategy

This pull request was tested by compiling.


### TODO or Help Wanted

Are any other boards sufficiently RAM constrained as to require a smaller buffer than 2kB?


### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make prepush`.


Co-authored-by: Hudson Ayers <[email protected]>
@phil-levis
Contributor Author

phil-levis commented Feb 11, 2023

@kupiakos @bradjc

I am finally getting back to this. The call on 12/16/22 concluded that this new console and the old one should have the same system call device number, and be alternative implementations. This leads to the complication that the two behave differently. One (asynchronously) prints the entire offered string, while the other prints potentially a truncated subset. This means that we have two options for how the userspace library behaves:

  1. Have a write state machine in userspace that breaks large prints into multiple smaller ones that the kernel can handle; userspace makes multiple print calls until the entire buffer is printed. Implication: a print from userspace is not atomic and can have other writes interleaved within it, and userspace becomes more complex and larger.
  2. Truncate long prints from userspace: they atomically print as much as fits in the kernel buffer and return. Implication: prints might be incomplete (a warning is printed when this happens).

Looking through the call notes, I don't see a clear resolution. I suggested that we should do #2, but I don't see substantial discussion or agreement, so want to make sure that's the right conclusion before wrapping this up.
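To make the difference between the two options concrete, here is a minimal sketch of what each would look like on the userspace side. The `write` closure is a hypothetical stand-in for the console write system call and is assumed to return the number of bytes the kernel accepted; the real libtock-c code is C and is structured differently.

```rust
/// Option 1: keep issuing writes until the whole buffer is accepted.
/// Other output may be interleaved between the individual calls.
fn write_all<F: FnMut(&[u8]) -> usize>(mut write: F, buf: &[u8]) -> usize {
    let mut offset = 0;
    while offset < buf.len() {
        let n = write(&buf[offset..]);
        if n == 0 {
            // Nothing was accepted; stop rather than spin forever.
            break;
        }
        offset += n;
    }
    offset
}

/// Option 2: issue a single write and report whether it was truncated.
fn write_truncating<F: FnMut(&[u8]) -> usize>(mut write: F, buf: &[u8]) -> (usize, bool) {
    let n = write(buf);
    (n, n < buf.len())
}
```

Option 1 costs a loop and its state in every application; option 2 keeps userspace trivial but pushes the "was everything printed?" question onto the caller.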

@hudson-ayers
Contributor

+1 for approach #2

@phil-levis
Contributor Author

@kupiakos @vsukhoml -- option 1 or option 2?

@vsukhoml
Contributor

Option 3: have a state machine in the kernel console implementation which prints full lines from applications, possibly using a line buffer in the app's grant. Benefit vs. option 1: the state machine is implemented once and shared across apps. Benefit vs. option 2: full lines are printed.

There are certain corner cases I can think of:

  1. Adding timestamps: a timestamp should probably reflect the time at which the driver received the first symbol of the string, no matter when the last symbol / line feed is received.
  2. A flush call is probably needed to force printing, to support prints with control codes which overwrite previously printed characters, e.g. a dynamic/animated prompt with a / - \ | transition.

@kupiakos
Contributor

Option 1 resembles what our Ti50 console does right now - we line buffer in userspace and issue blocking_command syscalls for every line (grouping multiple lines into a single syscall when available). We've managed to strip down most of the size cost from doing this, but it's still there and duplicated per-app. An issue I have with approach 2 is how are userspace applications supposed to know the max amount they can send to the console at a time without data being dropped? For cases where we want to print a lot of data and guarantee it's all sent, we'd need to have some sort of breakup system in userspace anyways.

I'm curious what y'all think of Option 3 offered by Vadim.

@phil-levis
Contributor Author

phil-levis commented Mar 3, 2023

@vsukhoml @kupiakos

Just getting back to this. I think I would need to understand the proposed semantics better. @vsukhoml You have described an implementation, not an interface. What is the desired behavior in response to a large printf that is larger than the available space in the kernel buffer? The current options are:

  1. Prints the entire printf but other prints (concurrent, ordered) may be interspersed with it. I.e., if a print P2 appears between the start of print P1 and the end of print P1, this means it was invoked after P1 was invoked and before P1 returned.
  2. Prints as much of the printf as it can atomically, truncating it if there is not enough space.

From your description, I can't tell if you want:
  3a. Prints the entire printf; other prints may be interspersed with it, but interspersals will always occur at line boundaries. Note that kernel prints may be truncated, as they are currently.
  3b. Prints the entire printf without any interspersed prints.

3b) is not possible without console locking or blocking within the kernel. Console locking would mean that other prints would be lost while it is locked (there is no way to buffer them).

If the desired behavior is 3a), this will be an increase in code size, as it will require an alternative debug!() implementation that checks how much space is left, and outputs up to the last newline that can fit in that space.
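The extra check that 3a would require can be sketched as a pure function. This is an assumption about how such a check might look, not code from the PR: `line_bounded_len` is a hypothetical name, and returning 0 when no complete line fits (meaning "wait and retry") is one possible policy among several.

```rust
/// Given `free` bytes of space, returns how many leading bytes of `data`
/// to emit so that output only ever stops at a line boundary.
fn line_bounded_len(data: &[u8], free: usize) -> usize {
    let limit = free.min(data.len());
    if limit == data.len() {
        // Everything fits, so no split is needed.
        return limit;
    }
    // Otherwise cut just after the last newline that fits.
    match data[..limit].iter().rposition(|&b| b == b'\n') {
        Some(i) => i + 1,
        // No complete line fits; emit nothing for now.
        None => 0,
    }
}
```

For example, with 7 bytes free and the input `"ab\ncd\nef"`, only the first two lines (6 bytes) would be emitted in this model.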

@phil-levis
Contributor Author

@vsukhoml and I discussed the desired semantics. They are temporal ordering and atomicity of a line (up to a newline) up to a finite length L. If userspace tries to print a line longer than L, the kernel will print up to L and return to userspace how much was printed. If userspace prints a buffer that is < L bytes and has more than one newline in it, it's up to the kernel on how many lines it prints (it could print 1, 2, or N).

@phil-levis
Contributor Author

One update. libtock-c's newlib semantics for printf are:

  • An output string that contains multiple newlines is printed one line at a time (each line is a separate invocation of putnstr)
  • The maximum length of a single print is 1024 characters; if more than 1023 characters precede the newline, it prints them in 1024-character chunks, and the last chunk ends in a newline

I talked with @vsukhoml and we concluded this means the kernel console driver shouldn't worry about newlines, merely ordered and atomic writes. As console writes no longer guarantee writing every byte, libtock-c will require some updates to sys.c.
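The splitting behavior described in the two bullets above can be modeled as follows. This is a sketch of the observed semantics only, not newlib's or libtock-c's implementation; `split_writes` and `MAX_CHUNK` are hypothetical names, and each returned slice stands for one putnstr invocation.

```rust
/// Maximum size of a single print, per the semantics described above.
const MAX_CHUNK: usize = 1024;

/// Splits an output string into the pieces that would each be passed to
/// a separate write: one per line, with long lines cut into chunks.
fn split_writes(out: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let mut rest = out;
    while !rest.is_empty() {
        // End of the current line (including its newline), or all that is left.
        let line_end = rest
            .iter()
            .position(|&b| b == b'\n')
            .map_or(rest.len(), |i| i + 1);
        let n = line_end.min(MAX_CHUNK);
        let (head, tail) = rest.split_at(n);
        chunks.push(head);
        rest = tail;
    }
    chunks
}
```

Since userspace already delivers at most one line or one 1024-byte chunk per call, the kernel driver only needs to order the writes and keep each one atomic.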

The code now ensures atomicity of writes up to 200 bytes (by waiting for 200 bytes to be available in the debug buffer) and ensures that userspace writes do not interleave. This means that kernel debug writes can interleave at 200-byte granularity if the debug buffer is full. I've tested the current code with two userspace writers and interleaved kernel writes (logging the system calls of the writes themselves). I'll test it some more in the next week.
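The "wait for 200 bytes" rule can be expressed as a small decision function. This is a sketch under the assumption that the driver checks free space before each copy; `ATOMIC_SIZE` and `bytes_to_write_now` are hypothetical names, not identifiers from the PR.

```rust
/// Largest unit that is guaranteed to be written without interleaving.
const ATOMIC_SIZE: usize = 200;

/// Decides how many bytes to copy into the debug buffer right now.
/// Returns None when the driver should wait and retry later.
fn bytes_to_write_now(free: usize, remaining: usize) -> Option<usize> {
    if free >= ATOMIC_SIZE {
        // Enough room for a full atomic unit: copy one unit (or the tail).
        Some(remaining.min(ATOMIC_SIZE))
    } else {
        // Not enough room yet; kernel debug output may be queued meanwhile.
        None
    }
}
```

Kernel debug output can slip in only between units, which is where the 200-byte interleaving granularity comes from.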

@phil-levis
Contributor Author

On another note, I think I've also found some bugs in the newlib printf implementation. Writes that are between 1400-1800 bytes or so (without a newline) behave erratically, sometimes truncating the first 1024 bytes. It's deterministic based on the length, but different lengths behave differently. E.g., writes longer than 1480 bytes do not print the first 1024 bytes, while writes 1600-2000 or so do not print anything at all. I will look at the source to try to get to the bottom of this.

@phil-levis
Contributor Author

> On another note, I think I've also found some bugs in the newlib printf implementation. Writes that are between 1400-1800 bytes or so (without a newline) behave erratically, sometimes truncating the first 1024 bytes. It's deterministic based on the length, but different lengths behave differently. E.g., writes longer than 1480 bytes do not print the first 1024 bytes, while writes 1600-2000 or so do not print anything at all. I will look at the source to try to get to the bottom of this.

We discussed this on the core call today, and @bradjc suggested it was likely a malloc failure issue within newlib. He's correct; if you increase the size of the process heap, then prints within this range complete correctly.

@vsukhoml
Contributor

Might be a reason to switch from newlib to https://github.com/vsukhoml/noc :-) - I tried to add all functionality required by libtock-c.

@phil-levis
Contributor Author

Your buffering scheme is definitely much better for an embedded system. :)

@github-actions github-actions bot added the WG-OpenTitan In the purview of the OpenTitan working group. label Mar 24, 2023
@phil-levis
Contributor Author

bors try

bors bot added a commit that referenced this pull request Mar 24, 2023
@bors
Contributor

bors bot commented Mar 24, 2023

try

Build failed:

@hudson-ayers
Contributor

Is the status of this that it is ready for final review now?

@phil-levis
Contributor Author

> Is the status of this that it is ready for final review now?

Not quite yet. Right now, this requires updates to libtock-c. I think I can change the logic to always complete writes before issuing a callback, which would not require an update to libtock-c.

@phil-levis
Contributor Author

@hudson-ayers @kupiakos

This is now ready for review.

Contributor

@bradjc bradjc left a comment

Looks good!

@bradjc bradjc added the last-call Final review period for a pull request. label Apr 13, 2023
@bradjc
Contributor

bradjc commented Apr 17, 2023

bors r+

@bors
Contributor

bors bot commented Apr 17, 2023

@bors bors bot merged commit 5a42ae9 into master Apr 17, 2023
@bors bors bot deleted the total_order_log branch April 17, 2023 17:53
Labels
component kernel last-call Final review period for a pull request. WG-OpenTitan In the purview of the OpenTitan working group.
5 participants