Time redesign v3 #2089
Co-Authored-By: Niklas Adolfsson <[email protected]>
precludes low power operation.
tests when alarms are significantly delayed. It submits alarms with a "now" (the 'reference' parameter) in the past, such that some alarms are in the future and others have already passed.
implementation.
|
My latest output from hifive1b hardware after some fixes:
The "now" lines are when ...
It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt for C comes through, the virtualizer does appear to detect that both C and B should fire. I'm guessing the machine timer fires for B right away, but that interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts. |
This is super helpful, thank you. I will look tomorrow. |
Whatever is going on is more insidious than just this. It also looks like whenever an interrupt fires, all three Alarms fire even if some of them are in the future (hence the huge diff values, which represent small negative numbers). These point at a common problem, incorrectly calculating if an Alarm X has expired at time T. I recreated these cases on imix (ordering of Alarms) and it works fine. I am wondering if this has something to do with Ticks64 and if its inequality tests are wrong; I am going to test on OpenTitan. |
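To make the "huge diff values" concrete: on an unsigned wrapping counter, subtracting a firing time that is still slightly in the future wraps around to a value near the top of the range. The following is a minimal sketch assuming a 32-bit counter; the function names are hypothetical and are not taken from the Tock source.

```rust
/// Ticks from `expected` to `now` on a wrapping 32-bit counter.
pub fn diff_ticks(now: u32, expected: u32) -> u32 {
    now.wrapping_sub(expected)
}

/// Reinterpret the wrapped difference as signed, which exposes
/// "small negative" values hiding behind huge unsigned ones.
pub fn signed_diff(now: u32, expected: u32) -> i32 {
    diff_ticks(now, expected) as i32
}
```

An alarm that fires 3 ticks before its expected time shows up as a diff of u32::MAX - 2 when printed unsigned, and as -3 when read as signed.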
when there are long running computations in the kernel that start alarms.
The first bug related to deciding if a new alarm was sooner than the
current earliest alarm. The old logic checked if the current earliest
expiration didn't fall within the new alarm's interval. The bug here
is what happens if the current earliest was set earlier in this
execution of the kernel loop (e.g., at boot, setting up the alarm test)
but by the time the newer alarm is started it has already expired.
So,
    |----|          earliest
             |----| new
in the old logic the new alarm would be seen as earlier. The issue is
that not being within the interval of the newer alarm means the existing
expiration is either earlier or later; there's now a test to check that
it's later (by looking at now).
The second bug related to deciding if an alarm had expired. The logic
was using the alarm compare value as "now" rather than the real now.
This ran into problems similar to those above: if there was a long
computation path in the kernel, such that the hardware alarm had expired
before a later alarm was entered, then this later alarm would not have
the expiration time within its interval. This use of a virtual "now"
based on expiration time was intended to handle some edge cases which
the rest of the logic handles correctly, so I moved this back to using
the real "now" rather than a virtual "now" from the past.
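The expiration test described in this commit message can be sketched as a wrapping interval check. This is an illustration only, assuming a 32-bit counter and that the reference is never after "now"; within_range and has_expired are hypothetical names for this sketch, not the virtualizer's actual code.

```rust
/// True if `t` lies in the half-open wrapping interval [start, end).
pub fn within_range(t: u32, start: u32, end: u32) -> bool {
    if start <= end {
        start <= t && t < end
    } else {
        // The interval wraps past u32::MAX.
        t >= start || t < end
    }
}

/// An alarm set at `reference` for `dt` ticks has expired once `now`
/// is no longer inside [reference, reference + dt).
pub fn has_expired(reference: u32, dt: u32, now: u32) -> bool {
    !within_range(now, reference, reference.wrapping_add(dt))
}
```

Passing a stale compare value as "now" reproduces the second bug: for an alarm with reference 200 and dt 100, has_expired(200, 100, 150) reports the alarm as expired, while the real time has_expired(200, 100, 210) correctly reports it as pending.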
|
I think the bugs were related to how fast the RISC-V clocks run compared to the CPU speed. The fact that the tests start in the boot sequence exacerbates this, as there's a long-running computation. The key problem was this: Alarm B is very short, and the time until Alarm C is long enough that its reference (start point) is outside the interval of B. The core logic in the virtualizer assumed this wouldn't happen. If it saw that the current earliest expiration fell outside the interval of an alarm A, it assumed this meant that alarm A had expired. I added logic to handle the case where A may be in the future and not just the past. Please try now. @bradjc |
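One way to avoid the ambiguity described here is to compare alarms by the time each has left, measured from the real "now", instead of testing whether one expiration falls inside another alarm's interval. This is a different formulation from the virtualizer's logic, shown only as a sketch; the Alarm struct and both functions are hypothetical, and it assumes a 32-bit counter with each reference at or before "now".

```rust
/// A virtual alarm described by its reference time and interval.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Alarm {
    pub reference: u32,
    pub dt: u32,
}

/// Ticks left until `alarm` fires, measured from the real `now`;
/// 0 if it has already expired.
pub fn remaining(alarm: Alarm, now: u32) -> u32 {
    let elapsed = now.wrapping_sub(alarm.reference);
    alarm.dt.saturating_sub(elapsed)
}

/// Pick the alarm that should fire first. An already-expired alarm
/// has 0 ticks left, so it always sorts ahead of a future one.
pub fn earliest(alarms: &[Alarm], now: u32) -> Option<Alarm> {
    alarms.iter().copied().min_by_key(|a| remaining(*a, now))
}
```

With a short alarm B (reference 100, dt 5) and a later alarm C (reference 400, dt 1000), calling earliest at now = 400 returns B even though B's expiration lies outside C's interval.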
|
Ok, looks good on hifive1b. Although I'm not sure I can check the box until #2116 is merged, since without that the timers don't work. |
cc @alistair23 I think the only thing blocking this now is the apollo3 test
|
It's currently working on Apollo3 |
|
Never mind, I was using the old libtock-c. It seems to work with command_num 6:
    int main(void) {
      putnstr_async(hello, sizeof(hello), nop, NULL);
      delay_ms(0);
      putnstr_async(hello, sizeof(hello), nop, NULL);
      return 0;
    } |
|
bors r+ |
Yes, the next step is to upgrade libtock-c to use the new command. |
Pull Request Overview
This pull request updates the time HIL to address a series of bug reports with the previous API (#1651, #1691, #1513). It also incorporates proposed changes by @gendx to generalize the width of counters/alarms/timers with an associated type rather than assume 32 bits (#1521).
This has been implemented on all of the chips. It has been tested for the 24-bit nRF52 series, the 32-bit SAM4L, and 64-bit OpenTitan.
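The idea of generalizing counter width through an associated type can be sketched as follows. This is a simplified illustration, not the HIL itself: the trait methods, the Ticks24 and Ticks32 types, and FixedClock are assumptions made for this example, and the real definitions are in the linked trd-time.md document.

```rust
use std::fmt::Debug;

/// A tick value of some fixed width; arithmetic wraps at that width.
pub trait Ticks: Copy + PartialEq + Debug {
    fn from_u32(v: u32) -> Self;
    fn into_u32(self) -> u32;
    fn wrapping_add(self, other: Self) -> Self;
    fn wrapping_sub(self, other: Self) -> Self;
}

const MASK24: u32 = 0x00FF_FFFF;

/// 24-bit ticks, stored in the low bits of a u32.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Ticks24(u32);

impl Ticks for Ticks24 {
    fn from_u32(v: u32) -> Self { Ticks24(v & MASK24) }
    fn into_u32(self) -> u32 { self.0 }
    fn wrapping_add(self, o: Self) -> Self { Ticks24(self.0.wrapping_add(o.0) & MASK24) }
    fn wrapping_sub(self, o: Self) -> Self { Ticks24(self.0.wrapping_sub(o.0) & MASK24) }
}

/// 32-bit ticks.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Ticks32(u32);

impl Ticks for Ticks32 {
    fn from_u32(v: u32) -> Self { Ticks32(v) }
    fn into_u32(self) -> u32 { self.0 }
    fn wrapping_add(self, o: Self) -> Self { Ticks32(self.0.wrapping_add(o.0)) }
    fn wrapping_sub(self, o: Self) -> Self { Ticks32(self.0.wrapping_sub(o.0)) }
}

/// A time source whose counter width is chosen by the implementation.
pub trait Time {
    type Ticks: Ticks;
    fn now(&self) -> Self::Ticks;
}

/// Generic code works for any width: compute when `dt` from now expires.
pub fn expiration<T: Time>(clock: &T, dt: T::Ticks) -> T::Ticks {
    clock.now().wrapping_add(dt)
}

/// A stand-in clock that always reports the same time, for illustration.
pub struct FixedClock<T: Ticks>(pub T);

impl<T: Ticks> Time for FixedClock<T> {
    type Ticks = T;
    fn now(&self) -> T { self.0 }
}
```

The same expiration function wraps at 24 bits for a Ticks24 clock and at 32 bits for a Ticks32 clock, without the caller assuming either width.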
The overall design and summary of the traits is described in
https://github.com/tock/tock/blob/time-redesign-v3/doc/reference/trd-time.md
We will update this document and give it a TRD number when ready to merge.
There is also an update to the system call API: a new command for Alarm passes both a reference time and a dt. This new API is available in the timer_v3_updates branch of libtock-c.
Testing Strategy
This pull request was tested by compiling and testing on nrf52, SAM4L (imix), and OpenTitan (FPGA) boards. For imix and OT, it was tested using the multi_alarm_test and multi_timer_test tests in the kernel. On imix, it was tested in userspace by running a pair of multi_alarm_test processes.
I was not able to test the userspace alarm driver on OpenTitan: after struggling to get libtock-rs applications to run and libtock-c ones to compile, I gave up. This is an important test because the capsule is 32 bits and tries to automatically handle an underlying 64-bit Alarm.
TODO or Help Wanted
This pull request needs userspace testing on OT (to test that the 64-to-32 conversion works correctly for the userspace API). This PR updates the mtimer implementation to seed it with a value close to a 32-bit overflow, so you do not have to run the test for very long. Any userspace application that uses an alarm should be a good test.
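To show why seeding the counter near a 32-bit overflow shortens the test, here is a sketch of one plausible 64-to-32 conversion: keeping only the low 32 bits. Whether the capsule converts exactly this way is an assumption; all three function names are hypothetical.

```rust
/// Project a 64-bit tick count onto 32 bits by keeping the low 32 bits.
pub fn to_u32_ticks(t: u64) -> u32 {
    t as u32
}

/// Hypothetical seed: start the 64-bit counter `margin` ticks before
/// its low 32 bits wrap around to zero.
pub fn seed_near_overflow(margin: u32) -> u64 {
    (1u64 << 32) - margin as u64
}

/// Elapsed ticks as seen through the 32-bit view. This stays correct
/// across the wrap as long as fewer than 2^32 ticks have passed.
pub fn elapsed_u32(now: u64, reference: u64) -> u32 {
    to_u32_ticks(now).wrapping_sub(to_u32_ticks(reference))
}
```

With a margin of 1000 ticks, the 32-bit view wraps after only 1000 ticks of running time, so an alarm that spans the wrap is exercised almost immediately.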
This pull request needs kernel testing on
To test, you need to run a multi_alarm_test. I've added a multi_alarm_test for each board and modified each board's main.rs to invoke it. Double-check you see a call to multi_alarm_test::run_multi_alarm(mux_alarm).
This test starts 3 alarms (A, B, C). The dt of these alarms is random, with one in 11 alarms (randomly) having a dt of 0. A typical output of the test looks something like this (this is from OpenTitan):
The delay value is the dt set for the next invocation of this Alarm. The diff value is the number of ticks between the desired firing time and a call to now in the firing. Note that this value is large (e.g., 7 ticks above!) mostly because of these print statements: formatting the numbers takes significant cycles at these timescales.
The three things to look for to make sure the test is running properly are:
Documentation Updated
/docs, or no updates are required.
Formatting
make prepush.