sched: Add a watchdog #1887

alistair23 · 2020-05-28T23:36:02Z

Pull Request Overview

As part of #1882 we started talking about safe waits (#1552).

I have a PR with some initial safe wait support (#1886) that allows us to be interrupted while looping on a hardware condition. The idea is that, while entering low-power mode, we can skip the infinite loop if we get an interrupt.

#1886 doesn't address the large number of loops in the driver capsules: capsules don't have access to Chip, so it's more difficult for them to wait on interrupts. The harder part is deciding what to do when a wait "fails". What happens when returning to full power doesn't complete within a specific time, or when a bus fails to respond in a reasonable amount of time? How do we handle this gracefully and continue running in some capacity?
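A minimal sketch of the kind of bounded wait described above, assuming the driver can poll both the hardware condition and an "interrupt pending" flag. This is not the code from #1886; `safe_wait` and `WaitResult` are hypothetical names. Returning `TimedOut` instead of spinning forever hands the "what if the wait fails" question back to the caller.

```rust
/// Outcome of a bounded wait on a hardware condition (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum WaitResult {
    /// The hardware condition became true.
    Ready,
    /// An interrupt arrived first, so the caller can go and service it.
    Interrupted,
    /// The condition never became true within the polling budget.
    TimedOut,
}

/// Hypothetical "safe wait": poll `ready`, return early if an interrupt is
/// pending, and give up after `max_polls` iterations instead of looping forever.
pub fn safe_wait(
    mut ready: impl FnMut() -> bool,
    mut interrupt_pending: impl FnMut() -> bool,
    max_polls: u32,
) -> WaitResult {
    for _ in 0..max_polls {
        if ready() {
            return WaitResult::Ready;
        }
        if interrupt_pending() {
            return WaitResult::Interrupted;
        }
    }
    WaitResult::TimedOut
}
```

A watchdog is the backstop for loops that cannot be rewritten this way: if nothing bounds the wait, the timer eventually expires instead.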

This PR starts implementing watchdog functionality. It is not a replacement for safe waits (in many cases a safe wait would be more graceful); it is a general mechanism to help avoid infinite loops and to aid debugging. A watchdog has also come up before as a way to make Tock more reliable.

It is up to chips and boards to handle the watchdog interrupt and decide how to act.

Currently we tickle the watchdog when iterating over each process and at the start of each kernel loop.
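The tickle points can be sketched as below. This is a simplified model, not the PR's `sched.rs` code: the `WatchDog` trait, `Process` struct and `kernel_loop_once` are hypothetical stand-ins, and `CountingWatchDog` is only a test double.

```rust
use std::cell::Cell;

/// Hypothetical watchdog interface (names are illustrative only).
pub trait WatchDog {
    /// Configure and start the hardware timer.
    fn setup(&self) {}
    /// Restart the countdown ("tickle" or "feed" the watchdog).
    fn tickle(&self) {}
}

/// Stand-in for a process slot.
pub struct Process {
    pub id: usize,
}

/// One pass of a simplified kernel loop. The watchdog is tickled at the top of
/// the loop and again before each process is serviced, so it only expires if
/// a single step (one process, or a driver stuck in a loop) takes too long.
pub fn kernel_loop_once<W: WatchDog>(wd: &W, processes: &[Process], mut run: impl FnMut(&Process)) {
    wd.tickle();
    for p in processes {
        wd.tickle();
        run(p);
    }
}

/// Test double that counts how often the kernel tickles it.
pub struct CountingWatchDog {
    pub tickles: Cell<u32>,
}

impl WatchDog for CountingWatchDog {
    fn tickle(&self) {
        self.tickles.set(self.tickles.get() + 1);
    }
}
```

With three processes, one pass produces four tickles: one at the loop start plus one per process.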

Testing Strategy

None

TODO or Help Wanted

Feedback

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make format.
Fixed errors surfaced by make clippy.

kernel/src/platform/mod.rs

bradjc · 2020-06-12T18:09:53Z

My thought would be to add a watchdog trait, in line with systick, MPU, and UKB (UserspaceKernelBoundary). Then the kernel could manage this entirely, and a chip that wants to opt out could simply not provide a watchdog implementation.
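One way to read this suggestion, sketched under assumptions: the trait has no-op default methods, the unit type `()` is the opt-out implementation (as is commonly done for optional chip components), and the chip exposes its watchdog through an associated type. `Chip`, `NoWatchdogChip`, `HwChip` and `kernel_start` are hypothetical names for illustration.

```rust
use std::cell::Cell;

/// Hypothetical chip-level watchdog trait with no-op defaults.
pub trait WatchDog {
    fn setup(&self) {}
    fn tickle(&self) {}
    fn suspend(&self) {}
    fn resume(&self) {
        self.tickle();
    }
}

/// Opting out: the unit type keeps every default, so it does nothing.
impl WatchDog for () {}

/// Hypothetical chip trait exposing its watchdog as an associated type.
pub trait Chip {
    type WatchDog: WatchDog;
    fn watchdog(&self) -> &Self::WatchDog;
}

/// A chip that provides no watchdog.
pub struct NoWatchdogChip;

impl Chip for NoWatchdogChip {
    type WatchDog = ();
    fn watchdog(&self) -> &() {
        &()
    }
}

/// A (simulated) hardware watchdog and a chip that owns one.
pub struct HwWatchdog {
    pub running: Cell<bool>,
}

impl WatchDog for HwWatchdog {
    fn setup(&self) {
        self.running.set(true);
    }
    fn suspend(&self) {
        self.running.set(false);
    }
}

pub struct HwChip {
    pub wd: HwWatchdog,
}

impl Chip for HwChip {
    type WatchDog = HwWatchdog;
    fn watchdog(&self) -> &HwWatchdog {
        &self.wd
    }
}

/// Kernel-side code is identical for both chips.
pub fn kernel_start<C: Chip>(chip: &C) {
    chip.watchdog().setup();
}
```

The kernel never branches on whether a watchdog exists; the `()` implementation simply compiles away to nothing.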

alistair23 · 2020-06-23T20:04:13Z

Pushed an update where it's now a general trait like the MPU or systick.

hudson-ayers · 2020-07-01T00:46:05Z

I feel that whether to include a watchdog is more of a board-specific decision than a chip-specific one. For example, people maintaining out-of-tree boards should not be forced to also maintain an out-of-tree copy of an otherwise in-tree chip just because they have different thoughts on whether to use a watchdog. Maybe kernel_loop() could take a boolean that indicates whether to use the chip watchdog or the no-op unit-type version?
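The boolean idea could look roughly like this. It is a sketch under assumptions: `select_watchdog` is a hypothetical helper, the unit type `()` stands in for the no-op version, and `CountingWatchDog` is a test double.

```rust
use std::cell::Cell;

pub trait WatchDog {
    fn setup(&self) {}
    fn tickle(&self) {}
}

/// No-op implementation used when the board opts out.
impl WatchDog for () {}

/// Hypothetical: the board passes `use_watchdog` into the kernel, which then
/// uses either the chip's watchdog or the no-op unit implementation.
pub fn select_watchdog<'a, W: WatchDog>(chip_wd: &'a W, use_watchdog: bool) -> &'a dyn WatchDog {
    if use_watchdog {
        return chip_wd;
    }
    &()
}

/// Test double that counts tickles.
pub struct CountingWatchDog {
    pub tickles: Cell<u32>,
}

impl WatchDog for CountingWatchDog {
    fn tickle(&self) {
        self.tickles.set(self.tickles.get() + 1);
    }
}
```

A board could equally make the choice at compile time by handing the kernel `&()` directly; the runtime flag just mirrors the shape of the suggestion.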

Also, I am interested to see an example of how watchdog interrupts would be handled.

kernel/src/platform/watchdog.rs

kernel/src/sched.rs

kernel/src/platform/watchdog.rs

ppannuto · 2020-07-06T13:25:27Z

It is up to chips and boards to handle the watchdog interrupt and decide how to act.

I'm not quite sure how, in this interface, boards get to configure the watchdog. I imagine this is actually fairly important, as watchdogging is much more of a board-specific question than a chip-specific one.

I do think this is a pretty good interface for the chip part of the watchdog system.

alistair23 · 2020-07-07T19:42:15Z

That is a good point. Currently there is no hookup to boards. That is probably something that the Chips crate will have to expose to a board.

alistair23 · 2020-07-07T20:58:51Z

I have updated this so it passes the tests, removed the old watchdog HIL, and replaced it with this one.

Untested on the SAM4L, though.


hudson-ayers

One spelling issue, but I don't think it's blocking.

I do think that we ultimately want this to be configurable by boards rather than chips, but I don't want to block on that either. Eventually there will need to be some thought about how the handling of the watchdog interrupt can be configured by the board rather than the chip. That will probably require an additional function in the HIL (something like unsafe fn handle_interrupt(), which would be called within the chip interrupt handler but whose contents could be chosen by the board).
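A rough sketch of that split, assuming the board supplies a policy object and the chip only dispatches to it. `WatchdogInterruptPolicy`, `ExampleChip`, `LogAndContinue` and the IRQ number are all hypothetical; only the `unsafe fn handle_interrupt()` shape comes from the comment.

```rust
use std::cell::Cell;

/// Made-up interrupt number for the watchdog's early-warning interrupt.
pub const WATCHDOG_IRQ: u32 = 7;

/// Hypothetical board-supplied policy for the watchdog interrupt.
pub trait WatchdogInterruptPolicy {
    /// Called from the chip's interrupt handler; `unsafe` to match the
    /// signature floated in review.
    unsafe fn handle_interrupt(&self);
}

/// The chip owns interrupt dispatch but delegates the watchdog decision.
pub struct ExampleChip<'a, P: WatchdogInterruptPolicy> {
    policy: &'a P,
}

impl<'a, P: WatchdogInterruptPolicy> ExampleChip<'a, P> {
    pub fn new(policy: &'a P) -> Self {
        ExampleChip { policy }
    }

    /// Stand-in for the chip's interrupt service routine.
    pub fn service_interrupt(&self, irq: u32) {
        if irq == WATCHDOG_IRQ {
            // SAFETY: in this sketch the policy has no extra preconditions.
            unsafe { self.policy.handle_interrupt() }
        }
    }
}

/// Example board policy: count the warning and keep running.
pub struct LogAndContinue {
    pub warnings: Cell<u32>,
}

impl WatchdogInterruptPolicy for LogAndContinue {
    unsafe fn handle_interrupt(&self) {
        self.warnings.set(self.warnings.get() + 1);
    }
}
```

Because the policy is a type parameter, an out-of-tree board can swap its handler without forking the chip crate.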

kernel/src/platform/watchdog.rs

This PR starts implementing watchdog functionality. It is not a replacement for safe waits (in many cases a safe wait would be more graceful); it is a general mechanism to help avoid infinite loops and to aid debugging. A watchdog has also come up before as a way to make Tock more reliable.

It is up to chips (and in the future, hopefully, boards) to handle the watchdog interrupt and decide how to act.

Currently we tickle the watchdog when iterating over each process and at the start of each kernel loop.

Signed-off-by: Alistair Francis <[email protected]>

alistair23 · 2020-07-15T15:04:01Z

I have fixed the spelling issue.

I agree about configuring from boards. There are a few places where we would like to do this (see #1998) so it's something that should happen eventually. We just need to figure out a good way to do it.

hudson-ayers · 2020-07-15T15:12:57Z

I have fixed the spelling issue.

I don't think you pushed the commit.

alistair23 · 2020-07-15T15:21:52Z

I did, GitHub is just really slow.

bradjc · 2020-07-15T15:49:01Z

I think the tension here is that after the scheduler PR is merged a board will be able to select a cooperative scheduler, but not disable the watchdog (if the chip crate has enabled the watchdog). A watchdog timer is probably under the purview of the chip crate (I don't think that boards should have to decide what watchdog to use, and there probably is only one choice anyway). Perhaps this falls to the scheduler PR: just like the scheduler algorithm decides how to use the systick, maybe it should also decide how to use the watchdog.
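If the scheduler decided how the watchdog is used, it might look something like the sketch below. This is an assumption-laden illustration, not the scheduler trait from #1767: `Scheduler::wants_watchdog`, `start_kernel`, `RoundRobin`, `Cooperative` and `FakeWatchDog` are all hypothetical. The rationale sketched here is that a cooperative scheduler cannot preempt a process that never yields, so enabling the watchdog would turn a long-running process into a reset.

```rust
use std::cell::Cell;

pub trait WatchDog {
    fn setup(&self) {}
    fn tickle(&self) {}
}

/// Hypothetical: the scheduler reports whether it can guarantee the
/// watchdog is tickled often enough.
pub trait Scheduler {
    fn wants_watchdog(&self) -> bool;
}

pub struct RoundRobin;
impl Scheduler for RoundRobin {
    fn wants_watchdog(&self) -> bool {
        true
    }
}

/// Cannot preempt a process that never yields, so it opts out here.
pub struct Cooperative;
impl Scheduler for Cooperative {
    fn wants_watchdog(&self) -> bool {
        false
    }
}

/// The kernel starts the chip's watchdog only if the scheduler asks for it.
pub fn start_kernel<S: Scheduler, W: WatchDog>(sched: &S, wd: &W) {
    if sched.wants_watchdog() {
        wd.setup();
    }
}

/// Test double that records whether it was started.
pub struct FakeWatchDog {
    pub running: Cell<bool>,
}

impl WatchDog for FakeWatchDog {
    fn setup(&self) {
        self.running.set(true);
    }
}
```

This keeps the watchdog hardware choice in the chip crate while letting a board's scheduler choice decide whether it runs.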

Overall this PR looks good and moves things forward so we should merge it soon.

If we start using the watchdog in the future, it would be nice to add a function to the trait that allows the kernel to check if the last reset was due to the watchdog. This would help with debugging by allowing the kernel to somehow notify developers that the watchdog is triggering.
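The suggested reset-cause query could be as small as one extra trait method with a conservative default. The method name `last_reset_was_watchdog` and the `boot_warning` helper are hypothetical; on real hardware the implementation would read the chip's reset-cause register, which this sketch replaces with a plain field.

```rust
/// Hypothetical extension: let the kernel ask whether the most recent reset
/// was triggered by the watchdog.
pub trait WatchDog {
    fn tickle(&self) {}
    /// Default for chips that cannot tell.
    fn last_reset_was_watchdog(&self) -> bool {
        false
    }
}

/// At boot, turn a watchdog reset into a message a developer will notice.
pub fn boot_warning<W: WatchDog>(wd: &W) -> Option<&'static str> {
    if wd.last_reset_was_watchdog() {
        Some("kernel: previous reset was caused by the watchdog")
    } else {
        None
    }
}

/// Test double standing in for a reset-cause register.
pub struct FakeWatchDog {
    pub reset_by_watchdog: bool,
}

impl WatchDog for FakeWatchDog {
    fn last_reset_was_watchdog(&self) -> bool {
        self.reset_by_watchdog
    }
}
```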

bradjc · 2020-07-16T14:17:27Z

@alistair23 @hudson-ayers How do you want to merge this? This before or after #1767?

hudson-ayers · 2020-07-16T14:20:20Z

I think it is fine for this to go first, I can rebase #1767 after.

bradjc · 2020-07-16T21:18:04Z

bors r+

bors · 2020-07-16T21:25:23Z

Build succeeded:

2118: stm32f3: Watchdog Timers r=bradjc a=krady21

Pull Request Overview

This pull request adds both watchdogs supported by the stm32f3 boards. The difference between them is thoroughly described [here](https://electronics.stackexchange.com/questions/123080/independent-watchdog-iwdg-or-window-watchdog-wwdg). TL;DR: the independent watchdog is clocked by its own dedicated low-speed clock but is not as precise, while the window watchdog is more precise, has a configurable time window that can be used to detect early or late abnormalities, and can also generate an interrupt just before resetting.

At this point, someone could configure the board to use one, both, or neither of the two watchdogs. I don't know if there's a need for both, but I wanted to consult with you before deleting one or the other. I also tried to have this configuration done in the boards file, as discussed in a previous [pr](#1887 (comment)), but I am not entirely satisfied with how I did it.

Testing Strategy

This pull request was tested using the stm32f3discovery board.

TODO or Help Wanted

The main problem with both watchdogs is that neither provides a way to suspend it once started, except by a full system reset. Since the suspend and resume functions are unimplemented, sleeping in the kernel_loop [function](https://github.com/tock/tock/blob/ad9387a577405675b044d5bde85badf0274995c8/kernel/src/sched.rs#L495-L514) will probably end up causing a watchdog reset.

Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

Formatting

- [x] Ran `make prepush`.

Co-authored-by: Bogdan Grigoruta <[email protected]>
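Since the independent watchdog cannot be suspended, its timeout bounds how long the kernel may sleep. A minimal sketch of the timeout arithmetic, assuming the STM32F3 IWDG relation t = (4 << PR) * (RL + 1) / f_LSI with a prescaler exponent PR in 0..=6, a 12-bit reload RL, and a nominal 40 kHz LSI (the real LSI frequency varies noticeably between parts, so check the reference manual and datasheet). `iwdg_config` and `timeout_us` are hypothetical helpers, not code from #2118.

```rust
/// Largest value of the 12-bit IWDG reload register.
pub const RELOAD_MAX: u32 = 0xFFF;

/// Timeout in microseconds for prescaler exponent `pr` (divider 4 << pr)
/// and reload value `reload`, assuming t = (4 << pr) * (reload + 1) / lsi_hz.
pub fn timeout_us(pr: u8, reload: u16, lsi_hz: u32) -> u64 {
    let divider = 4u64 << pr;
    divider * (u64::from(reload) + 1) * 1_000_000 / u64::from(lsi_hz)
}

/// Pick the smallest (finest-resolution) prescaler that can represent
/// `timeout_ms`, returning `(pr, reload)`. Rounds down; returns `None` if
/// the timeout is zero or longer than the largest divider allows.
pub fn iwdg_config(timeout_ms: u32, lsi_hz: u32) -> Option<(u8, u16)> {
    for pr in 0u8..=6 {
        let divider = 4u64 << pr;
        // Counter ticks needed for the requested timeout at this divider.
        let ticks = u64::from(timeout_ms) * u64::from(lsi_hz) / (1000 * divider);
        if ticks >= 1 && ticks <= u64::from(RELOAD_MAX) + 1 {
            return Some((pr, (ticks - 1) as u16));
        }
    }
    None
}
```

Under these assumptions a 1 s timeout needs PR = 2 (divide by 16) and RL = 2499, and the longest reachable timeout is about 26.2 s, so any kernel sleep longer than the configured timeout would trigger the reset described above.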

bradjc added the rfc Issue designed for discussion and to solicit feedback. label May 29, 2020

bradjc reviewed May 29, 2020

kernel/src/platform/mod.rs Outdated

alistair23 force-pushed the alistair/watchdog branch from 8d6ee9f to 57557ec June 23, 2020 20:03

alistair23 closed this Jun 23, 2020

alistair23 reopened this Jun 23, 2020

hudson-ayers requested changes Jul 1, 2020

kernel/src/platform/watchdog.rs Outdated

ppannuto reviewed Jul 6, 2020

kernel/src/sched.rs Outdated

kernel/src/platform/watchdog.rs Outdated

kernel/src/platform/watchdog.rs Outdated

alistair23 force-pushed the alistair/watchdog branch from 57557ec to 64fd1f9 July 7, 2020 19:41

alistair23 force-pushed the alistair/watchdog branch from 64fd1f9 to f497b2b July 7, 2020 20:58

alistair23 marked this pull request as ready for review July 7, 2020 20:58

hudson-ayers previously approved these changes Jul 15, 2020

kernel/src/platform/watchdog.rs Outdated

alistair23 added 2 commits July 15, 2020 07:51

kernel/hil: Remove the watchdog HIL

f654c82

Signed-off-by: Alistair Francis <[email protected]>

alistair23 dismissed hudson-ayers’s stale review via f654c82 July 15, 2020 15:21

alistair23 force-pushed the alistair/watchdog branch from f497b2b to f654c82 July 15, 2020 15:21

hudson-ayers previously approved these changes Jul 15, 2020


bradjc added the P-Significant label (This is a substantial change that requires review from all core developers.) Jul 15, 2020

bradjc previously approved these changes Jul 15, 2020


bradjc added the last-call Final review period for a pull request. label Jul 16, 2020

Merge branch 'master' into pr/1887

8486a76

bradjc dismissed stale reviews from hudson-ayers and themself via 8486a76 July 16, 2020 14:36

bradjc approved these changes Jul 16, 2020


hudson-ayers approved these changes Jul 16, 2020


bors bot merged commit 14fe53a into tock:master Jul 16, 2020

alistair23 deleted the alistair/watchdog branch July 16, 2020 23:13

hudson-ayers mentioned this pull request Jul 21, 2020

Scheduler trait + transition all boards to round robin scheduler #1767

Merged

2 tasks

krady21 mentioned this pull request Sep 21, 2020

stm32f3: Watchdog Timers #2118

Merged

2 tasks


4 participants