Time redesign v3 #2089

phil-levis · 2020-09-01T04:29:03Z

Pull Request Overview

This pull request updates the time HIL to address a series of bug reports with the previous API (#1651, #1691, #1513). It also incorporates proposed changes by @gendx to generalize the width of counters/alarms/timers with an associated type rather than assume 32 bits (#1521).
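To illustrate what generalizing the counter width with an associated type can look like, here is a minimal sketch. It is not the HIL from this PR: the trait shapes, `from_u64`/`into_u64`, `FixedClock`, and `elapsed_since` are simplified, hypothetical names chosen for the example.

```rust
/// Simplified, hypothetical stand-in for a tick value of some fixed width.
pub trait Ticks: Copy + PartialEq + core::fmt::Debug {
    /// Number of significant bits in the counter.
    const WIDTH: u32;
    /// Build a tick value, truncating to the counter width.
    fn from_u64(v: u64) -> Self;
    fn into_u64(self) -> u64;
    /// Subtraction modulo 2^WIDTH.
    fn wrapping_sub(self, other: Self) -> Self {
        Self::from_u64(self.into_u64().wrapping_sub(other.into_u64()))
    }
}

#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Ticks24(u32);

impl Ticks for Ticks24 {
    const WIDTH: u32 = 24;
    fn from_u64(v: u64) -> Self {
        Ticks24((v & 0x00FF_FFFF) as u32)
    }
    fn into_u64(self) -> u64 {
        u64::from(self.0)
    }
}

#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Ticks64(u64);

impl Ticks for Ticks64 {
    const WIDTH: u32 = 64;
    fn from_u64(v: u64) -> Self {
        Ticks64(v)
    }
    fn into_u64(self) -> u64 {
        self.0
    }
}

/// A time source whose counter width is chosen by the implementation.
pub trait Time {
    type Ticks: Ticks;
    fn now(&self) -> Self::Ticks;
}

/// Fake clock that always reports the same time (illustration only).
pub struct FixedClock<T: Ticks>(pub T);

impl<T: Ticks> Time for FixedClock<T> {
    type Ticks = T;
    fn now(&self) -> T {
        self.0
    }
}

/// Generic code works for any width without assuming 32 bits.
pub fn elapsed_since<C: Time>(clock: &C, start: C::Ticks) -> C::Ticks {
    clock.now().wrapping_sub(start)
}
```

With this shape, the same `elapsed_since` handles a 24-bit counter that has wrapped and a 64-bit counter that has not.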

This has been implemented on all of the chips. It has been tested for the 24-bit nRF52 series, the 32-bit SAM4L, and 64-bit OpenTitan.

The overall design and summary of the traits is described in

https://github.com/tock/tock/blob/time-redesign-v3/doc/reference/trd-time.md

We will update this document and give it a TRD number when ready to merge.

There is also an update to the system call API: a new Alarm command passes both a reference time and a dt. The new command is available on the timer_v3_updates branch of libtock-c.
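A rough sketch of why passing a reference alongside dt helps, assuming 32-bit ticks. The `SetAlarm` struct and both functions are hypothetical names for illustration, not the actual system call interface.

```rust
/// Hypothetical arguments of an alarm command that carries both values.
pub struct SetAlarm {
    /// Tick count userspace read when it decided to set the alarm.
    pub reference: u32,
    /// Ticks after `reference` at which the alarm should fire.
    pub dt: u32,
}

impl SetAlarm {
    /// Expiration is anchored to the caller's reference, with wraparound.
    pub fn expiration(&self) -> u32 {
        self.reference.wrapping_add(self.dt)
    }
}

/// What a dt-only command would compute if the kernel anchored the alarm
/// to its own clock reading at the time the call is handled.
pub fn expiration_dt_only(kernel_now: u32, dt: u32) -> u32 {
    kernel_now.wrapping_add(dt)
}
```

If userspace reads the clock at tick 1000 and asks for dt = 50, but the kernel handles the call at tick 1030, the anchored version still targets 1050 while the dt-only version drifts to 1080.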

Testing Strategy

This pull request was tested by compiling and testing on nrf52, SAM4L (imix), and OpenTitan (FPGA) boards. For imix and OT, it was tested using the multi_alarm_test and multi_timer_test tests in the kernel. On imix, it was tested in userspace by running a pair of multi_alarm_test processes.

I was not able to test the userspace alarm driver on OpenTitan -- after struggling to get libtock-rs applications to run and libtock-c ones to compile, I gave up. This is an important test because the capsule is 32 bits and tries to automatically handle an underlying 64-bit Alarm.

TODO or Help Wanted

This pull request needs userspace testing on OT (to test that 64-to-32 conversion works correctly for the userspace API). This PR updates the mtimer implementation to seed it with a value close to a 32-bit overflow, so you do not have to run the test very long. Any userspace application that uses an alarm should be a good test.
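One way to picture the 64-to-32 conversion being tested is the sketch below. It is an assumption about how such a conversion could work, not the capsule's code: `to_ticks32` and `widen_expiration` are hypothetical helpers, and the widening assumes the 32-bit reference was taken at or before the current time and less than 2^32 ticks ago.

```rust
/// Truncate a 64-bit tick count to the 32 bits userspace sees.
pub fn to_ticks32(t: u64) -> u32 {
    t as u32
}

/// Reconstruct a 64-bit expiration from a 32-bit (reference, dt) pair.
/// Assumes `reference32` lies at or before `now64`, within 2^32 ticks.
pub fn widen_expiration(now64: u64, reference32: u32, dt32: u32) -> u64 {
    // How far behind "now" the reference is, modulo 2^32.
    let behind = u64::from(to_ticks32(now64).wrapping_sub(reference32));
    let reference64 = now64.wrapping_sub(behind);
    reference64.wrapping_add(u64::from(dt32))
}
```

A counter seeded just below a 32-bit overflow exercises the interesting case quickly: a reference of 0xFFFF_FFF0 read before the overflow, handled at 64-bit time 0x1_0000_0005, still widens to the intended 64-bit expiration.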

This pull request needs kernel testing on

To test, you need to run a multi_alarm_test. I've added a multi_alarm_test for each board and modified each board's main.rs to invoke it. Double-check you see a call to multi_alarm_test::run_multi_alarm(mux_alarm).

This test starts 3 alarms (A, B, C). The dt of these alarms is random, with one in 11 alarms (randomly) having a dt of 0. A typical output of the test looks something like this (this is from OpenTitan):

TestA@Ticks64(17033607736): Expected at Ticks64(17033607729) (diff = Ticks64(7)), setting alarm to Ticks64(17033616266) (delay = Ticks64(8537))
TestB: Alarm fired.
TestB@Ticks64(17033614398): Expected at Ticks64(17033614391) (diff = Ticks64(7)), setting alarm to Ticks64(17033626851) (delay = Ticks64(12462))
TestC: Alarm fired.
TestC@Ticks64(17033614581): Expected at Ticks64(17033614576) (diff = Ticks64(5)), setting alarm to Ticks64(17033618165) (delay = Ticks64(3592))
TestA: Alarm fired.
TestA@Ticks64(17033616273): Expected at Ticks64(17033616266) (diff = Ticks64(7)), setting alarm to Ticks64(17033629481) (delay = Ticks64(13214))
TestC: Alarm fired.
TestC@Ticks64(17033618172): Expected at Ticks64(17033618165) (diff = Ticks64(7)), setting alarm to Ticks64(17033626435) (delay = Ticks64(8270))

The delay value is the dt set for the next invocation of this Alarm. The diff value is the number of ticks between the desired firing time and a call to now in the firing. Note that this value is large (e.g., 7 ticks above!) mostly because of these print statements: formatting the numbers takes significant cycles at these timescales.

The three things to look for to make sure the test is running properly are:

All 3 Alarms are firing (one has not been lost or dropped or otherwise miscalculated)
The diff values are always positive
The diff values of alarm firings after a delay of 0 are not excessively high (they will be higher than for non-zero delays)
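The delay selection described above (a random dt, with roughly one in 11 being 0) could be sketched as follows. The xorshift generator and `next_delay` are stand-ins invented for this example; the kernel test uses its own source of randomness and its own delay range.

```rust
/// Tiny xorshift PRNG, used here only to make the sketch self-contained.
pub struct XorShift32(u32);

impl XorShift32 {
    pub fn new(seed: u32) -> Self {
        // xorshift must not start from 0, so substitute a fixed seed.
        XorShift32(if seed == 0 { 1 } else { seed })
    }

    pub fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Pick the next dt: about one draw in 11 yields 0, the rest fall in
/// 1..=max_delay. `max_delay` must be non-zero.
pub fn next_delay(rng: &mut XorShift32, max_delay: u32) -> u32 {
    if rng.next_u32() % 11 == 0 {
        0
    } else {
        1 + rng.next_u32() % max_delay
    }
}
```

Including dt = 0 on purpose forces the alarm path to handle an expiration that has already passed by the time it is set.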

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make prepush.

bradjc · 2020-09-18T18:29:12Z

My latest output from hifive1b hardware after some fixes:

ATE0-->ATE0
OK
AT+BLEINIT=0-->OK
AT+CWMODE=0-->OK

HiFive1 initialization complete.
Entering main loop.
Starting multi alarm test.
Starting random alarm test TestA.
now Ticks64(1938) dt Ticks64(39284) val 41183
TestA@Ticks64(1905): Expected at Ticks64(0) (diff = Ticks64(1905)), setting alarm to Ticks64(41183) (delay = Ticks64(39284))
Starting random alarm test TestB.
now Ticks64(2430) dt Ticks64(0) val 2430
TestB@Ticks64(2415): Expected at Ticks64(0) (diff = Ticks64(2415)), setting alarm to Ticks64(2415) (delay = Ticks64(0))
Starting random alarm test TestC.
now Ticks64(2675) dt Ticks64(3703) val 6359
TestC@Ticks64(2660): Expected at Ticks64(0) (diff = Ticks64(2660)), setting alarm to Ticks64(6359) (delay = Ticks64(3703))
here??
TestC: Alarm fired.
TestC@Ticks64(6445): Expected at Ticks64(6359) (diff = Ticks64(86)), setting alarm to Ticks64(25471) (delay = Ticks64(19029))
TestB: Alarm fired.
TestB@Ticks64(6544): Expected at Ticks64(2415) (diff = Ticks64(4129)), setting alarm to Ticks64(6544) (delay = Ticks64(0))
TestA: Alarm fired.
TestA@Ticks64(6641): Expected at Ticks64(41183) (diff = Ticks64(18446744073709517074)), setting alarm to Ticks64(10865) (delay = Ticks64(4228))
now Ticks64(6795) dt Ticks64(0) val 6795
here??
TestC: Alarm fired.
TestC@Ticks64(6882): Expected at Ticks64(25471) (diff = Ticks64(18446744073709533027)), setting alarm to Ticks64(41235) (delay = Ticks64(34355))
TestB: Alarm fired.
TestB@Ticks64(7001): Expected at Ticks64(6544) (diff = Ticks64(457)), setting alarm to Ticks64(7001) (delay = Ticks64(0))
TestA: Alarm fired.
TestA@Ticks64(7088): Expected at Ticks64(10865) (diff = Ticks64(18446744073709547839)), set
*** DEBUG BUFFER FULL ***

The "now" lines are when set_alarm() is called in machine_timer.rs, and the "here??" lines are when handle_interrupt() is called in machine_timer.rs.

It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt comes through for C, it looks like the virtualizer does detect that both C and B should be fired.

I'm guessing that the machine timer does fire for B right away, but the interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts.

phil-levis · 2020-09-19T06:16:39Z

> It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt comes through for C, it looks like the virtualizer does detect that both C and B should be fired.
>
> I'm guessing that the machine timer does fire for B right away, but the interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts.

This is super helpful, thank you. I will look tomorrow.

phil-levis · 2020-09-19T15:55:29Z

> It does still look like the virtualizer is calling set_alarm for C even though B should be earlier. When the interrupt comes through for C, it looks like the virtualizer does detect that both C and B should be fired.
>
> I'm guessing that the machine timer does fire for B right away, but the interrupt is not handled because timer C's alarm is set before the scheduler gets around to handling interrupts.

Whatever is going on is more insidious than just this. It also looks like whenever an interrupt fires, all three Alarms fire even if some of them are in the future (hence the huge diff values, which represent small negative numbers). Both symptoms point at a common problem: incorrectly calculating whether an Alarm X has expired at time T.
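The huge diff values can be decoded by reinterpreting the wrapped unsigned difference as a signed number. The sketch below uses plain `u64` values rather than the kernel's tick types, and `signed_diff` and `fired_early` are hypothetical helper names.

```rust
/// Difference between the firing time and the expected time, computed the
/// way an unsigned tick type would (wrapping), then viewed as signed.
pub fn signed_diff(now: u64, expected: u64) -> i64 {
    now.wrapping_sub(expected) as i64
}

/// True if the alarm fired before its expected time.
pub fn fired_early(now: u64, expected: u64) -> bool {
    signed_diff(now, expected) < 0
}
```

For the TestA line in the hifive1b log above (fired at 6641, expected at 41183), the wrapped difference is 18446744073709517074, which as a signed value is -34542: the alarm fired 34542 ticks early.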

I recreated these cases on imix (ordering of Alarms) and it works fine. I am wondering if this has something to do with Ticks64 and if its inequality tests are wrong; I am going to test on OpenTitan.

when there are long running computations in the kernel that start alarms.

The first bug related to deciding whether a new alarm was sooner than the current earliest alarm. The old logic checked whether the current earliest expiration fell outside the new alarm's interval. The bug is what happens if the current earliest alarm was set earlier in this execution of the kernel loop (e.g., at boot, setting up the alarm test) but has already expired by the time the newer alarm is started. So, given

|----| earliest |----| new

in the old logic the new alarm would be seen as earlier. The issue is that not being within the interval of the newer alarm means the existing alarm is either earlier or later; there's now a test to check that it's later (by looking at now).

The second bug related to deciding whether an alarm had expired. The logic was using the alarm compare value as "now" rather than the real now. This ran into problems similar to those above: if there was a long computation path in the kernel, such that the hardware alarm had expired before a later alarm was entered, then this later alarm would not have the expiration time within its interval. This use of a virtual "now" based on expiration time was intended to handle some edge cases which the rest of the logic handles correctly, so I moved this back to using the real "now" rather than a virtual "now" from the past.
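The second bug can be illustrated with a small sketch of an interval-based expiration check. This is a simplified model on `u32` ticks, not the virtualizer's code; `within_range` here is a free function written for the example, and `has_expired` is a hypothetical helper.

```rust
/// True if `value` lies in [start, end), accounting for wraparound.
pub fn within_range(value: u32, start: u32, end: u32) -> bool {
    value.wrapping_sub(start) < end.wrapping_sub(start)
}

/// An alarm set at `reference` with delay `dt` is treated as expired when
/// `now` is no longer inside [reference, reference + dt).
pub fn has_expired(now: u32, reference: u32, dt: u32) -> bool {
    !within_range(now, reference, reference.wrapping_add(dt))
}
```

Suppose the hardware compare value was 100, and a later alarm was entered with reference 150 and dt 20 while the real time is 160. Passing the stale compare value as "now" gives `has_expired(100, 150, 20) == true`, so the alarm fires early; passing the real time gives `has_expired(160, 150, 20) == false`. Note that the check only makes sense when `now` is at or after the alarm's reference, which is exactly what a "now" from the past violates.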

phil-levis · 2020-09-19T19:03:25Z

I think the bugs were related to the speed at which the RISC-V clocks run in comparison to their CPU speed. The fact that the tests start in the boot sequence exacerbates this, as there's a long-running computation. The key problem was this:

Alarm B    |--------|
Alarm C                   |------------------|

Alarm B is very short, and the time until Alarm C is long enough that its reference (start point) is outside the interval of B. The core logic in the virtualizer assumed this wouldn't happen. If it saw that the current earliest expiration fell outside the interval of an alarm A, it assumed this meant that alarm A had expired. I added logic to handle the case where A may be in the future and not just in the past. Please try now. @bradjc
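One way to avoid the ambiguity in the diagram above is to compare alarms by how many ticks each has left as measured from the real now. The sketch below shows that alternative technique; it is not the logic added in this PR, the `Alarm` struct and function names are hypothetical, and it assumes every alarm's reference is at or before `now`.

```rust
/// Hypothetical virtual alarm: fires `dt` ticks after `reference`.
pub struct Alarm {
    pub reference: u32,
    pub dt: u32,
}

/// Ticks left until the alarm fires, or 0 if it has already expired.
/// Assumes `alarm.reference` is not in the future relative to `now`.
pub fn remaining(now: u32, alarm: &Alarm) -> u32 {
    let elapsed = now.wrapping_sub(alarm.reference);
    if elapsed >= alarm.dt {
        0
    } else {
        alarm.dt - elapsed
    }
}

/// True if `a` should fire strictly before `b`, judged from the real now.
pub fn fires_before(now: u32, a: &Alarm, b: &Alarm) -> bool {
    remaining(now, a) < remaining(now, b)
}
```

With B as (reference 0, dt 10) and C as (reference 30, dt 100) at time 35, B has 0 ticks remaining and C has 95, so B is correctly ordered first even though B's expiration lies outside C's interval.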

bradjc · 2020-09-21T02:21:29Z

Ok looks good on hifive1b:

ATE0-->ATE0
OK
AT+BLEINIT=0-->OK
AT+CWMODE=0-->OK

HiFive1 initialization complete.
Entering main loop.
Starting multi alarm test.
Starting random alarm test TestA.
val 41172
TestA@Ticks64(1894): Expected at Ticks64(0) (diff = Ticks64(1894)), setting alarm to Ticks64(41172) (delay = Ticks64(39284))
Starting random alarm test TestB.
val 2356
TestB@Ticks64(2332): Expected at Ticks64(0) (diff = Ticks64(2332)), setting alarm to Ticks64(2332) (delay = Ticks64(0))
Starting random alarm test TestC.
TestC@Ticks64(2595): Expected at Ticks64(0) (diff = Ticks64(2595)), setting alarm to Ticks64(6294) (delay = Ticks64(3703))
int
TestB: Alarm fired.
TestB@Ticks64(2812): Expected at Ticks64(2332) (diff = Ticks64(480)), setting alarm to Ticks64(2812) (delay = Ticks64(0))
val 2903
int
TestB: Alarm fired.
TestB@Ticks64(3037): Expected at Ticks64(2812) (diff = Ticks64(225)), setting alarm to Ticks64(3037) (delay = Ticks64(0))
val 3143
int
TestB: Alarm fired.
TestB@Ticks64(3229): Expected at Ticks64(3
*** DEBUG BUFFER FULL ***
int
TestC: Alarm fired.
TestC@Ticks64(6370): Expected at Ticks64(6294) (diff = Ticks64(76)), setting alarm to Ticks64(25396) (delay = Ticks64(19029))
val 11678
int
TestB: Alarm fired.
TestB@Ticks64(11759): Expected at Ticks64(11678) (diff = Ticks64(81)), setting alarm to Ticks64(35535) (delay = Ticks64(23783))
val 25396
int
TestC: Alarm fired.
TestC@Ticks64(25472): Expected at Ticks64(25396) (diff = Ticks64(76)), setting alarm to Ticks64(59825) (delay = Ticks64(34355))
val 35535
int
TestB: Alarm fired.
TestB@Ticks64(35616): Expected at Ticks64(35535) (diff = Ticks64(81)), setting alarm to Ticks64(74719) (delay = Ticks64(39109))
val 41172
int
TestA: Alarm fired.
TestA@Ticks64(41252): Expected at Ticks64(41172) (diff = Ticks64(80)), setting alarm to Ticks64(45476) (delay = Ticks64(4228))
val 45476
int
TestA: Alarm fired.
TestA@Ticks64(45552): Expected at Ticks64(45476) (diff = Ticks64(76)), setting alarm to Ticks64(65103) (delay = Ticks64(19554))
val 59825
int
TestC: Alarm fired.
TestC@Ticks64(59912): Expected at Ticks64(59825) (diff = Ticks64(87)), setting alarm to Ticks64(109592) (delay = Ticks64(49681))
val 65103
int
TestA: Alarm fired.
TestA@Ticks64(65170): Expected at Ticks64(65103) (diff = Ticks64(67)), setting alarm to Ticks64(100048) (delay = Ticks64(34880))
val 74719
int
TestB: Alarm fired.
TestB@Ticks64(74780): Expected at Ticks64(74719) (diff = Ticks64(61)), setting alarm to Ticks64(78829) (delay = Ticks64(4053))
val 78829
int
TestB: Alarm fired.
TestB@Ticks64(78892): Expected at Ticks64(78829) (diff = Ticks64(63)), setting alarm to Ticks64(98268) (delay = Ticks64(19379))
val 98268
int
TestB: Alarm fired.
TestB@Ticks64(98340): Expected at Ticks64(98268) (diff = Ticks64(72)), setting alarm to Ticks64(133043) (delay = Ticks64(34705))
val 100048
int
TestA: Alarm fired.
TestA@Ticks64(100125): Expected at Ticks64(100048) (diff = Ticks64(77)), setting alarm to Ticks64(150330) (delay = Ticks64(50206))

Although I'm not sure I can check the box until #2116 is merged, since without that the timers don't work.

hudson-ayers

cc @alistair23 I think the only thing blocking this now is the apollo3 test

alistair23 · 2020-09-22T18:45:50Z

It's currently working on Apollo3

resolved

bradjc · 2020-09-22T19:40:34Z

@jrvanwhy @alevy

alexandruradovici · 2020-09-23T07:38:29Z

~~This code still hangs up when using delay_ms(0) using stm32f4:~~

Never mind, I was using the old libtock-c, it seems to work with command_num 6

int main(void) {
  putnstr_async(hello, sizeof(hello), nop, NULL);
  delay_ms (0);
  putnstr_async(hello, sizeof(hello), nop, NULL);
  return 0;
}

bradjc · 2020-09-23T21:41:15Z

bors r+

bors · 2020-09-23T21:49:27Z

Build succeeded:

phil-levis · 2020-09-24T15:48:30Z

> ~~This code still hangs up when using delay_ms(0) using stm32f4:~~
>
> Never mind, I was using the old libtock-c, it seems to work with command_num 6
>
> int main(void) {
>   putnstr_async(hello, sizeof(hello), nop, NULL);
>   delay_ms (0);
>   putnstr_async(hello, sizeof(hello), nop, NULL);
>   return 0;
> }

Yes, the next step is to upgrade libtock-c to use the new command.

phil-levis and others added 30 commits March 6, 2020 13:23

Size tool.

6b8393e

Update tools/print_tock_memory_usage.py

283d904

Co-Authored-By: Niklas Adolfsson <[email protected]>

Update tools/print_tock_memory_usage.py

8cd5a0b

Co-Authored-By: Niklas Adolfsson <[email protected]>

Factor out objdump executable to a global.

1fc8f80

Fix indentation.

0ae2820

Transition to llvm-objdump.

477834d

Completed up to Alarm (Section 4).

3b301e7

Skeleton of end.

1cbdc20

Rename function to within_range.

0e640db

Fleshing out chip requirements.

91dd621

Merge branch 'master' of github.com:tock/tock into time-redesign-v3

c35e6ca

Merge branch 'time-redesign-v3' of github.com:tock/tock into time-redesign-v3

47cc9ce

Time update. Don't require a Time with 1MHz Frequency because this precludes low power operation.

a84da93

Stated there are capsules for transforming between different Frequency and Tick values.

325a452

Formatting.

651c44a

Update Ticks::within_range API.

281c7b1

Merge branch 'master' of github.com:tock/tock into time-redesign-v3

2362879

New time HIL.

517201d

Merge branch 'time-redesign-v3' of github.com:phil-levis/tock into time-redesign-v3

72257bb

Incomplete -- imix not compiling yet.

ac07b53

AST on SAM4L tested. Added a new test, alarm_edge_cases.rs, which tests when alarms are significantly delayed. It submits alarms with a "now" (the 'reference' parameter) in the past, such that some alarms are in the future and others have already passed.

3807082

Update time TRD to reflect refinements to traits after implementation.

82ea9f5

First set of virtualizer tests passed.

51edc2b

Merge remote-tracking branch 'upstream/master' into time-redesign-v3

b10a3b5

make format

656c9e0

Fix compilation of boards/imix. Didn't test on hardware.

c69b955

Move implementation of ticks_from_seconds and related functions to the time HIL.

5d12046

Implement new time HIL for nrf5x/rtc.

8ebfd62

Implement the new time HIL for nrf5x/timer.

bfccfff

Build all nrf5x boards.

f9f2833

alistair23 previously requested changes Sep 18, 2020

chips/apollo3/src/stimer.rs

phil-levis added 2 commits September 19, 2020 11:47

Initialzie to 0xFFFF_00000 to make sure we always test wraparound quickly.

ea1cc7d

lschuermann requested changes Sep 19, 2020

kernel/src/hil/time.rs

phil-levis added 3 commits September 19, 2020 16:13

Formatting.

d10c776

Start the counter.

01ccedb

Formatting.

f201707

bradjc previously approved these changes Sep 21, 2020

phil-levis added 3 commits September 21, 2020 10:46

Improve comments.

62ae8e8

Merge branch 'master' of github.com:tock/tock into time-redesign-v3

75bb0a2

Formatting.

21c6dad

phil-levis dismissed bradjc’s stale review via 21c6dad September 21, 2020 18:26

hudson-ayers approved these changes Sep 22, 2020

bradjc approved these changes Sep 22, 2020

bradjc added the last-call label Sep 22, 2020

bors bot merged commit 295157f into master Sep 23, 2020

bors bot deleted the time-redesign-v3 branch September 23, 2020 21:49

phil-levis mentioned this pull request Dec 17, 2020

set timer_at to use relativce tics tock/libtock-c#75

Closed

11 participants
