lschuermann
Member

Pull Request Overview

The LiteUART peripheral driver's transmit code contained a bug: in one scenario the driver does not expect an interrupt and requests a deferred call instead, but the hardware generates an interrupt anyway. When this happens, the kernel panics with a message such as the following:

panicked at 'no tx buffer', chips/litex/src/uart.rs:246:44
        Kernel version [...]

---| No debug queue found. You can set it with the DebugQueue
component.

---| LiteX configuration for LiteX on Arty A7 |---
---| RISC-V Machine State |---

The root cause is that the LiteUART hardware peripheral uses a FIFO buffer of a certain depth. Transmission of bytes starts as soon as data is fed into the FIFO. A TX event (interrupt) is asserted on a falling edge of the txfull signal, i.e. as soon as the hardware can accept more data after having been at capacity.

The driver as implemented in Tock checks, in a loop, whether there is still space in the transmission FIFO (!txfull) and places bytes into the FIFO accordingly. As soon as either the FIFO is full (txfull == true) OR the entire buffer has been written to the FIFO, the loop is aborted. Following that, the code evaluates whether the hardware will issue an interrupt, either to continue sending more data or to inform the client about the finished transmission. If the buffer has been fully written and txfull was never observed as asserted, the hardware is assumed not to generate an interrupt.
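The loop and the decision that follows it can be sketched roughly as below. This is a simplified model for illustration, not the code in chips/litex/src/uart.rs: `MockFifo`, `NextStep` and `transmit_before_fix` are hypothetical names, and the mock FIFO never drains.

```rust
/// Hypothetical stand-in for the LiteUART TX FIFO; not the Tock register API.
/// This mock never drains, so it cannot show the race, only the basic logic.
struct MockFifo {
    depth: usize,
    level: usize,
}

impl MockFifo {
    /// Mirrors the txfull signal: true while the FIFO is at capacity.
    fn txfull(&self) -> bool {
        self.level == self.depth
    }

    /// Queues one byte; the driver must only call this when !txfull.
    fn write_byte(&mut self, _byte: u8) {
        assert!(!self.txfull(), "write into a full FIFO");
        self.level += 1;
    }
}

#[derive(Debug, PartialEq)]
enum NextStep {
    /// The hardware is expected to raise a TX interrupt later.
    WaitForInterrupt,
    /// No interrupt is expected; the driver requests a deferred call.
    DeferredCall,
}

/// Sketch of the logic as described above, before the fix (assumed shape).
fn transmit_before_fix(fifo: &mut MockFifo, buf: &[u8], pos: &mut usize) -> NextStep {
    // Fill the FIFO until it is full or the whole buffer has been queued.
    while *pos < buf.len() && !fifo.txfull() {
        fifo.write_byte(buf[*pos]);
        *pos += 1;
    }

    // Only the current value of txfull is consulted here.
    if fifo.txfull() {
        NextStep::WaitForInterrupt
    } else {
        NextStep::DeferredCall
    }
}
```

With a FIFO depth of 4, a 2-byte buffer ends in `DeferredCall`, while a 6-byte buffer stops after 4 bytes and ends in `WaitForInterrupt`.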

However, the hardware might transmit some data between the driver inserting a byte into the FIFO and the driver checking whether the FIFO is full. In that case txfull is not asserted during the check, even though the FIFO was temporarily at capacity.

During normal transmission this is not an issue, since the driver will simply fill the free space with additional data. However, when this occurs on the last byte to send, the driver assumes that the FIFO capacity has not been reached, while the hardware was in fact temporarily at capacity (txfull == true) and sent at least one byte before the driver read txfull (which then reports txfull == false). As specified, the TX interrupt is asserted on this falling edge, so both an interrupt and a deferred call are delivered, causing the aforementioned panic.
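The interleaving described above can be replayed with a small model. Everything here is an assumption made for illustration: `FifoModel`, its fields and `last_byte_race` are hypothetical, and the real event latching lives in the LiteX hardware description, not in Rust.

```rust
/// Hypothetical model of the LiteUART TX FIFO plus its TX event latch.
struct FifoModel {
    depth: usize,
    level: usize,
    /// Latched when txfull falls from true to false.
    tx_event_pending: bool,
}

impl FifoModel {
    fn new(depth: usize) -> Self {
        assert!(depth > 0, "the model assumes a non-empty FIFO depth");
        FifoModel { depth, level: 0, tx_event_pending: false }
    }

    fn txfull(&self) -> bool {
        self.level == self.depth
    }

    /// Driver side: queue one byte into the FIFO.
    fn write_byte(&mut self) {
        assert!(!self.txfull(), "write into a full FIFO");
        self.level += 1;
    }

    /// Hardware side: one byte leaves the FIFO. If the FIFO was full,
    /// this is a falling edge of txfull and the TX event is latched.
    fn hardware_sends_byte(&mut self) {
        if self.level == 0 {
            return;
        }
        let was_full = self.txfull();
        self.level -= 1;
        if was_full {
            self.tx_event_pending = true;
        }
    }
}

/// Replays the race on the last byte of a buffer.
/// Returns (txfull as read by the driver, TX event pending).
fn last_byte_race() -> (bool, bool) {
    let mut fifo = FifoModel::new(2);
    fifo.write_byte();
    fifo.write_byte(); // last byte: the FIFO is now at capacity
    fifo.hardware_sends_byte(); // hardware drains a byte before the driver looks
    (fifo.txfull(), fifo.tx_event_pending)
}
```

In this model the driver reads txfull == false while the TX event is already pending, which is the combination that leads to both an interrupt and a deferred call.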

This commit introduces an additional check to determine whether an interrupt will be delivered: if either txfull is asserted (an interrupt will be generated as soon as transmission of at least one byte completes) OR the TX event is already asserted (txfull was temporarily asserted), an interrupt will be delivered and no deferred call is issued.
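As a minimal sketch, the new decision boils down to an OR over the two conditions. The names `NextStep` and `next_step_after_fill` are hypothetical, and the two booleans stand in for whatever register reads the actual driver performs.

```rust
#[derive(Debug, PartialEq)]
enum NextStep {
    /// The hardware will raise (or has already latched) a TX interrupt.
    WaitForInterrupt,
    /// No interrupt is expected; the driver requests a deferred call.
    DeferredCall,
}

/// Decision after the fill loop, following the check described above.
/// `txfull`: the FIFO is currently at capacity.
/// `tx_event_pending`: the TX event is already asserted, i.e. txfull
/// was asserted at some point and has fallen again.
fn next_step_after_fill(txfull: bool, tx_event_pending: bool) -> NextStep {
    if txfull || tx_event_pending {
        NextStep::WaitForInterrupt
    } else {
        NextStep::DeferredCall
    }
}
```

Only the case where neither condition holds still results in a deferred call, so the interrupt handler and the deferred call no longer both run for the same transmission.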


I've known of this bug for some time, though it was hard to track down:

  • it occurs (in my UART "fuzzing" tests) on average once every 4-6h
  • it reliably vanishes once you try to debug it, even when using I/Os (probably a typical Heisenbug)
  • it only reliably occurs when the UART baudrate is relatively high compared to the CPU frequency (e.g. 100MHz / 1MBaud), making a simulation in Verilator difficult
  • the LiteUART hardware peripheral is designed to be trivial and have low implementation complexity, but this makes the interrupt handling logic in the driver easy to mess up

Testing Strategy

This pull request was tested by running, for approx. 20 hours on a 1MBaud UART, a kernel/app combination which previously managed to cause the panic. There's a good chance this fixes it, as it makes sense when looking at the Verilog/Migen side of things. There might be additional issues generating further unexpected interrupts, though I'm pretty confident this solves the issues and I have the interrupts under control now 😄.

TODO or Help Wanted

N/A

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

Signed-off-by: Leon Schuermann <[email protected]>
@lschuermann lschuermann added the bug label May 6, 2021
@ppannuto
Member

ppannuto commented May 7, 2021

bors r+

Awesome debugging 👍🏼

@bors
Contributor

bors bot commented May 7, 2021

@bors bors bot merged commit a08ca25 into tock:master May 7, 2021