@phil-levis phil-levis commented Sep 1, 2020

Pull Request Overview

This pull request updates the time HIL to address a series of bug reports with the previous API (#1651, #1691, #1513). It also incorporates proposed changes by @gendx to generalize the width of counters/alarms/timers with an associated type rather than assume 32 bits (#1521).
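As a rough illustration of what generalizing the width with an associated type can look like, here is a minimal sketch. The trait and type names (`Ticks`, `Ticks24`, `Ticks64`, `Time`, `elapsed`) are simplified assumptions made for this example; the PR's real trait definitions are in the linked trd-time.md document and may differ.

```rust
use std::fmt::Debug;

/// Hypothetical, simplified tick type: wrapping arithmetic at a fixed width.
pub trait Ticks: Copy + PartialEq + Debug {
    /// Number of significant bits in the underlying counter.
    fn width() -> u32;
    /// Subtraction that wraps at the counter width.
    fn wrapping_sub(self, other: Self) -> Self;
}

/// 24-bit ticks (e.g. a 24-bit hardware counter), stored in a u32.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Ticks24(u32);

impl Ticks24 {
    const MASK: u32 = 0x00FF_FFFF;

    /// Builds a value, discarding any bits above bit 23.
    pub fn new(value: u32) -> Self {
        Ticks24(value & Self::MASK)
    }
}

impl Ticks for Ticks24 {
    fn width() -> u32 {
        24
    }
    fn wrapping_sub(self, other: Self) -> Self {
        Ticks24(self.0.wrapping_sub(other.0) & Self::MASK)
    }
}

/// 64-bit ticks.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Ticks64(pub u64);

impl Ticks for Ticks64 {
    fn width() -> u32 {
        64
    }
    fn wrapping_sub(self, other: Self) -> Self {
        Ticks64(self.0.wrapping_sub(other.0))
    }
}

/// A time source declares its counter width through an associated type
/// instead of assuming 32 bits.
pub trait Time {
    type Ticks: Ticks;
    fn now(&self) -> Self::Ticks;
}

/// Width-independent helper: ticks elapsed since `since`.
pub fn elapsed<T: Time>(clock: &T, since: T::Ticks) -> T::Ticks {
    clock.now().wrapping_sub(since)
}
```

Code written against `Time` in this style never names a concrete integer type, so the same logic can run over a 24-bit, 32-bit, or 64-bit counter.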

This has been implemented on all of the chips. It has been tested on the 24-bit nRF52 series, the 32-bit SAM4L, and the 64-bit OpenTitan.

The overall design and summary of the traits is described in

https://github.com/tock/tock/blob/time-redesign-v3/doc/reference/trd-time.md

We will update this document and give it a TRD number when ready to merge.

There is also an update to the system call API: a new command for Alarm passes both a reference time and a dt. The new API is available on the timer_v3_updates branch of libtock-c.

Testing Strategy

This pull request was tested by compiling and testing on nrf52, SAM4L (imix), and OpenTitan (FPGA) boards. For imix and OT, it was tested using the multi_alarm_test and multi_timer_test tests in the kernel. On imix, it was tested in userspace by running a pair of multi_alarm_test processes.

I was not able to test the userspace alarm driver on OpenTitan -- after struggling to get libtock-rs applications to run and libtock-c ones to compile, I gave up. This is an important test because the capsule is 32 bits, and tries to automatically handle an underlying 64-bit Alarm.

TODO or Help Wanted

This pull request needs userspace testing on OT (to test that 64-to-32 conversion works correctly for the userspace API). This PR updates the mtimer implementation to seed it with a value close to a 32-bit overflow, so you do not have to run the test for very long. Any userspace application that uses an alarm should be a good test.
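To make the conversion concern concrete, here is one plausible way a 32-bit userspace reference could be mapped onto a 64-bit counter. This is a sketch under stated assumptions, not the capsule's code: the function names `widen_reference` and `expiration64` are hypothetical, and it assumes the reference was read less than 2^32 ticks before the kernel's `now`.

```rust
/// Reconstructs a 64-bit reference from the 32-bit value userspace passed.
/// Assumption: the reference was read less than 2^32 ticks before `now`.
fn widen_reference(now: u64, reference32: u32) -> u64 {
    const LOW_MASK: u64 = 0xFFFF_FFFF;
    // Start from the upper 32 bits of the current time.
    let candidate = (now & !LOW_MASK) | u64::from(reference32);
    if candidate > now {
        // The low 32 bits overflowed between the reference and now,
        // so the reference belongs to the previous 2^32-tick epoch.
        candidate.wrapping_sub(1u64 << 32)
    } else {
        candidate
    }
}

/// 64-bit expiration time for a (reference, dt) pair given in 32 bits.
fn expiration64(now: u64, reference32: u32, dt32: u32) -> u64 {
    widen_reference(now, reference32).wrapping_add(u64::from(dt32))
}
```

Seeding the counter just below a 32-bit overflow, as described above, exercises the `candidate > now` branch soon after boot instead of after 2^32 ticks.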

This pull request needs kernel testing on

To test, you need to run a multi_alarm_test. I've added a multi_alarm_test for each board and modified each board's main.rs to invoke it. Double-check you see a call to multi_alarm_test::run_multi_alarm(mux_alarm).

This test starts 3 alarms (A, B, C). The dt of these alarms is random, with one in 11 alarms (randomly) having a dt of 0. A typical output of the test looks something like this (this is from OpenTitan):

TestA@Ticks64(17033607736): Expected at Ticks64(17033607729) (diff = Ticks64(7)), setting alarm to Ticks64(17033616266) (delay = Ticks64(8537))
TestB: Alarm fired.
TestB@Ticks64(17033614398): Expected at Ticks64(17033614391) (diff = Ticks64(7)), setting alarm to Ticks64(17033626851) (delay = Ticks64(12462))
TestC: Alarm fired.
TestC@Ticks64(17033614581): Expected at Ticks64(17033614576) (diff = Ticks64(5)), setting alarm to Ticks64(17033618165) (delay = Ticks64(3592))
TestA: Alarm fired.
TestA@Ticks64(17033616273): Expected at Ticks64(17033616266) (diff = Ticks64(7)), setting alarm to Ticks64(17033629481) (delay = Ticks64(13214))
TestC: Alarm fired.
TestC@Ticks64(17033618172): Expected at Ticks64(17033618165) (diff = Ticks64(7)), setting alarm to Ticks64(17033626435) (delay = Ticks64(8270))

The delay value is the dt set for the next invocation of this Alarm. The diff value is the number of ticks between the desired firing time and a call to now in the firing. Note that this value is large (e.g., 7 ticks above!) mostly because of these print statements: formatting the numbers takes significant cycles at these timescales.

The three things to look for to make sure the test is running properly are:

  • All 3 Alarms are firing (one has not been lost or dropped or otherwise miscalculated)
  • The diff values are always positive
  • The diff values of alarm firings after a delay of 0 are not excessively high (they will be higher than for non-zero delays)

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

phil-levis and others added 30 commits March 6, 2020 13:23
tests when alarms are significantly delayed. It submits alarms
with a "now" (the 'reference' parameter) in the past, such that
some alarms are in the future and others have already passed.

bradjc commented Sep 18, 2020

My latest output from hifive1b hardware after some fixes:

ATE0-->ATE0
OK
AT+BLEINIT=0-->OK
AT+CWMODE=0-->OK

HiFive1 initialization complete.
Entering main loop.
Starting multi alarm test.
Starting random alarm test TestA.
now Ticks64(1938) dt Ticks64(39284) val 41183
TestA@Ticks64(1905): Expected at Ticks64(0) (diff = Ticks64(1905)), setting alarm to Ticks64(41183) (delay = Ticks64(39284))
Starting random alarm test TestB.
now Ticks64(2430) dt Ticks64(0) val 2430
TestB@Ticks64(2415): Expected at Ticks64(0) (diff = Ticks64(2415)), setting alarm to Ticks64(2415) (delay = Ticks64(0))
Starting random alarm test TestC.
now Ticks64(2675) dt Ticks64(3703) val 6359
TestC@Ticks64(2660): Expected at Ticks64(0) (diff = Ticks64(2660)), setting alarm to Ticks64(6359) (delay = Ticks64(3703))
here??
TestC: Alarm fired.
TestC@Ticks64(6445): Expected at Ticks64(6359) (diff = Ticks64(86)), setting alarm to Ticks64(25471) (delay = Ticks64(19029))
TestB: Alarm fired.
TestB@Ticks64(6544): Expected at Ticks64(2415) (diff = Ticks64(4129)), setting alarm to Ticks64(6544) (delay = Ticks64(0))
TestA: Alarm fired.
TestA@Ticks64(6641): Expected at Ticks64(41183) (diff = Ticks64(18446744073709517074)), setting alarm to Ticks64(10865) (delay = Ticks64(4228))
now Ticks64(6795) dt Ticks64(0) val 6795
here??
TestC: Alarm fired.
TestC@Ticks64(6882): Expected at Ticks64(25471) (diff = Ticks64(18446744073709533027)), setting alarm to Ticks64(41235) (delay = Ticks64(34355))
TestB: Alarm fired.
TestB@Ticks64(7001): Expected at Ticks64(6544) (diff = Ticks64(457)), setting alarm to Ticks64(7001) (delay = Ticks64(0))
TestA: Alarm fired.
TestA@Ticks64(7088): Expected at Ticks64(10865) (diff = Ticks64(18446744073709547839)), set
*** DEBUG BUFFER FULL ***

The "now" lines are when set_alarm() is called in machine_timer.rs, and the "here??" lines are when handle_interrupt() is called in machine_timer.rs.

It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt comes through for C, it looks like the virtualizer does detect that both C and B should be fired.

I'm guessing that the machine timer does fire for B right away, but the interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts.

@phil-levis

It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt comes through for C, it looks like the virtualizer does detect that both C and B should be fired.

I'm guessing that the machine timer does fire for B right away, but the interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts.

This is super helpful, thank you. I will look tomorrow.

@phil-levis

It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt comes through for C, it looks like the virtualizer does detect that both C and B should be fired.

I'm guessing that the machine timer does fire for B right away, but the interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts.

Whatever is going on is more insidious than just this. It also looks like whenever an interrupt fires, all three Alarms fire even if some of them are in the future (hence the huge diff values, which represent small negative numbers). Both symptoms point at a common problem: incorrectly calculating whether an Alarm X has expired at time T.

I recreated these cases on imix (ordering of Alarms) and it works fine. I am wondering if this has something to do with Ticks64 and if its inequality tests are wrong; I am going to test on OpenTitan.
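The huge diff values in the hifive1b log are what wrapping subtraction on an unsigned 64-bit value produces when an alarm fires before its expected time. A small sketch, using plain `u64` arithmetic and hypothetical helper names (`diff` and `signed_diff` are not from the test code):

```rust
/// Ticks between the expected firing time and `now`, using wrapping
/// subtraction as an unsigned 64-bit tick type would.
fn diff(now: u64, expected: u64) -> u64 {
    now.wrapping_sub(expected)
}

/// The same difference reinterpreted as signed, which makes an early
/// firing show up as a negative number.
fn signed_diff(now: u64, expected: u64) -> i64 {
    now.wrapping_sub(expected) as i64
}
```

With the TestA values from the log (now = 6641, expected = 41183), `diff` yields 18446744073709517074, which is 2^64 - 34542, and `signed_diff` yields -34542: the alarm fired 34542 ticks early.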

when there are long running computations in the kernel that start alarms.

The first bug related to deciding if a new alarm was sooner than the
current earliest alarm. The old logic checked if the current earliest
expiration didn't fall within the new alarm's interval. The bug here
is what happens if the current earliest was set earlier in this
execution of the kernel loop (e.g., at boot, setting up the alarm test)
but by the time the newer alarm is started it has already expired.
So,

|----|         earliest
       |----|  new

in the old logic the new alarm would be seen as earlier. The issue is
that not being within the interval of the newer alarm means the existing
alarm is either earlier or later; there's now a test to check that it's
later (by looking at now).
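A minimal sketch of this first bug, assuming 32-bit ticks. The names (`Alarm`, `within_range`, `new_is_sooner_old`, `new_is_sooner_fixed`) are hypothetical and the virtualizer's actual code may be structured differently; the sketch only mirrors the logic described in the commit message.

```rust
/// True if `value` lies in the half-open wrapping interval [start, end).
fn within_range(value: u32, start: u32, end: u32) -> bool {
    value.wrapping_sub(start) < end.wrapping_sub(start)
}

#[derive(Clone, Copy)]
struct Alarm {
    reference: u32, // tick at which the alarm was started
    dt: u32,        // ticks from `reference` until expiration
}

impl Alarm {
    fn expiration(&self) -> u32 {
        self.reference.wrapping_add(self.dt)
    }
}

/// Old-style check: the new alarm counts as sooner whenever the earliest
/// expiration falls outside the new alarm's interval.
fn new_is_sooner_old(earliest: Alarm, new: Alarm) -> bool {
    !within_range(earliest.expiration(), new.reference, new.expiration())
}

/// Fixed-style check: the earliest alarm must also still be pending at
/// `now`; if it has already expired, it stays the most urgent one.
fn new_is_sooner_fixed(now: u32, earliest: Alarm, new: Alarm) -> bool {
    let earliest_pending =
        within_range(now, earliest.reference, earliest.expiration());
    earliest_pending
        && !within_range(earliest.expiration(), new.reference, new.expiration())
}
```

For the case in the diagram, take `earliest` with reference 100 and dt 50 and `new` with reference 200 and dt 50, evaluated at now = 200. The old check reports the new alarm as sooner even though the earliest alarm expired at tick 150; the fixed check does not.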

The second bug related to deciding if an alarm had expired. The logic
was using the alarm compare value as "now" rather than the real now.
This ran into problems similar to those above: if there was a long computation
path in the kernel, such that the hardware alarm had expired before a
later alarm was entered, then this later alarm would not have the
expiration time within its interval. This use of a virtual "now" based
on expiration time was intended to handle some edge cases which the
rest of the logic handles correctly, so I moved this back to using
the real "now" rather than a virtual "now" from the past.
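The second bug can be sketched the same way, again assuming 32-bit ticks and hypothetical helper names (`within_range`, `has_expired`). The numbers in the note below are illustrative and are not claimed to reproduce the exact hifive1b trace.

```rust
/// True if `value` lies in the half-open wrapping interval [start, end).
fn within_range(value: u32, start: u32, end: u32) -> bool {
    value.wrapping_sub(start) < end.wrapping_sub(start)
}

/// An alarm started at `reference` for `dt` ticks has expired once `now`
/// is no longer inside [reference, reference + dt).
/// Assumption: `now` is never before `reference` in wrapping order.
fn has_expired(now: u32, reference: u32, dt: u32) -> bool {
    !within_range(now, reference, reference.wrapping_add(dt))
}
```

Suppose the hardware compare value was 2415 and a later alarm was entered with reference 2660 and dt 3703. Passing the compare value as a virtual "now" gives `has_expired(2415, 2660, 3703) == true`, because 2415 lies before the alarm's interval, so the alarm fires far too early. Passing a real "now" of 2700 gives `false`, and the alarm only reports expiry once the real time passes 6363.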

phil-levis commented Sep 19, 2020

I think that the bugs were related to the speed at which the RISCV clocks were going in comparison to their CPU speed. The fact that the tests start in the boot sequence exacerbates this, as there's a long-running computation. The key problem was this:

Alarm B    |--------|
Alarm C                   |------------------|

Alarm B is very short, and the time until Alarm C is long enough that its reference (start point) is outside the interval of B. The core logic in the virtualizer assumed this wouldn't happen. If it saw that the current earliest expiration fell outside the interval of an alarm A, it assumed this meant that alarm A had expired. I added logic to handle the case where A may be in the future and not just the past. Please try now. @bradjc


bradjc commented Sep 21, 2020

Ok looks good on hifive1b:

ATE0-->ATE0
OK
AT+BLEINIT=0-->OK
AT+CWMODE=0-->OK

HiFive1 initialization complete.
Entering main loop.
Starting multi alarm test.
Starting random alarm test TestA.
val 41172
TestA@Ticks64(1894): Expected at Ticks64(0) (diff = Ticks64(1894)), setting alarm to Ticks64(41172) (delay = Ticks64(39284))
Starting random alarm test TestB.
val 2356
TestB@Ticks64(2332): Expected at Ticks64(0) (diff = Ticks64(2332)), setting alarm to Ticks64(2332) (delay = Ticks64(0))
Starting random alarm test TestC.
TestC@Ticks64(2595): Expected at Ticks64(0) (diff = Ticks64(2595)), setting alarm to Ticks64(6294) (delay = Ticks64(3703))
int
TestB: Alarm fired.
TestB@Ticks64(2812): Expected at Ticks64(2332) (diff = Ticks64(480)), setting alarm to Ticks64(2812) (delay = Ticks64(0))
val 2903
int
TestB: Alarm fired.
TestB@Ticks64(3037): Expected at Ticks64(2812) (diff = Ticks64(225)), setting alarm to Ticks64(3037) (delay = Ticks64(0))
val 3143
int
TestB: Alarm fired.
TestB@Ticks64(3229): Expected at Ticks64(3
*** DEBUG BUFFER FULL ***
int
TestC: Alarm fired.
TestC@Ticks64(6370): Expected at Ticks64(6294) (diff = Ticks64(76)), setting alarm to Ticks64(25396) (delay = Ticks64(19029))
val 11678
int
TestB: Alarm fired.
TestB@Ticks64(11759): Expected at Ticks64(11678) (diff = Ticks64(81)), setting alarm to Ticks64(35535) (delay = Ticks64(23783))
val 25396
int
TestC: Alarm fired.
TestC@Ticks64(25472): Expected at Ticks64(25396) (diff = Ticks64(76)), setting alarm to Ticks64(59825) (delay = Ticks64(34355))
val 35535
int
TestB: Alarm fired.
TestB@Ticks64(35616): Expected at Ticks64(35535) (diff = Ticks64(81)), setting alarm to Ticks64(74719) (delay = Ticks64(39109))
val 41172
int
TestA: Alarm fired.
TestA@Ticks64(41252): Expected at Ticks64(41172) (diff = Ticks64(80)), setting alarm to Ticks64(45476) (delay = Ticks64(4228))
val 45476
int
TestA: Alarm fired.
TestA@Ticks64(45552): Expected at Ticks64(45476) (diff = Ticks64(76)), setting alarm to Ticks64(65103) (delay = Ticks64(19554))
val 59825
int
TestC: Alarm fired.
TestC@Ticks64(59912): Expected at Ticks64(59825) (diff = Ticks64(87)), setting alarm to Ticks64(109592) (delay = Ticks64(49681))
val 65103
int
TestA: Alarm fired.
TestA@Ticks64(65170): Expected at Ticks64(65103) (diff = Ticks64(67)), setting alarm to Ticks64(100048) (delay = Ticks64(34880))
val 74719
int
TestB: Alarm fired.
TestB@Ticks64(74780): Expected at Ticks64(74719) (diff = Ticks64(61)), setting alarm to Ticks64(78829) (delay = Ticks64(4053))
val 78829
int
TestB: Alarm fired.
TestB@Ticks64(78892): Expected at Ticks64(78829) (diff = Ticks64(63)), setting alarm to Ticks64(98268) (delay = Ticks64(19379))
val 98268
int
TestB: Alarm fired.
TestB@Ticks64(98340): Expected at Ticks64(98268) (diff = Ticks64(72)), setting alarm to Ticks64(133043) (delay = Ticks64(34705))
val 100048
int
TestA: Alarm fired.
TestA@Ticks64(100125): Expected at Ticks64(100048) (diff = Ticks64(77)), setting alarm to Ticks64(150330) (delay = Ticks64(50206))

Although I'm not sure I can check the box until #2116 is merged, since without that the timers don't work.

bradjc
bradjc previously approved these changes Sep 21, 2020

@hudson-ayers hudson-ayers left a comment

cc @alistair23 I think the only thing blocking this now is the apollo3 test

@alistair23

It's currently working on Apollo3

@bradjc bradjc added the last-call Final review period for a pull request. label Sep 22, 2020

bradjc commented Sep 22, 2020

@jrvanwhy @alevy


alexandruradovici commented Sep 23, 2020

This code still hangs up when using delay_ms(0) using stm32f4:

Never mind, I was using the old libtock-c, it seems to work with command_num 6

int main(void) {
  putnstr_async(hello, sizeof(hello), nop, NULL);
  delay_ms (0);
  putnstr_async(hello, sizeof(hello), nop, NULL);
  return 0;
}


bradjc commented Sep 23, 2020

bors r+


bors bot commented Sep 23, 2020

@bors bors bot merged commit 295157f into master Sep 23, 2020
@bors bors bot deleted the time-redesign-v3 branch September 23, 2020 21:49
@phil-levis

This code still hangs up when using delay_ms(0) using stm32f4:

Never mind, I was using the old libtock-c, it seems to work with command_num 6

int main(void) {
  putnstr_async(hello, sizeof(hello), nop, NULL);
  delay_ms (0);
  putnstr_async(hello, sizeof(hello), nop, NULL);
  return 0;
}

Yes, the next step is to upgrade libtock-c to use the new command.


Labels

  • last-call: Final review period for a pull request.
  • P-Significant: This is a substantial change that requires review from all core developers.
  • release-blocker: Issue or PR that must be resolved before the next release.
