Add Hardware Continuous Integration (HWCI) Framework for Writing Tests for Treadmill #4205

charles37 · 2024-10-16T17:56:11Z

Pull Request Overview

This pull request adds a new Hardware Continuous Integration (HWCI) system to the Tock OS project. It includes:

- A new hwci folder with Python scripts for testing Tock OS on hardware.
- Changes to the GitHub Actions runner configuration to support hardware testing.

The HWCI system provides a framework for running tests on actual hardware, including:

- Board abstractions for the nRF52 Development Kit and a mock board for testing
- A test harness for running various types of tests
- Utility classes for serial communication
- Example tests, including a "Hello World" test
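
To make the pieces above concrete, here is a minimal sketch of how a board abstraction, a mock board, and a "Hello World" test could fit together. Every name in it (`Board`, `MockBoard`, `wait_for_line`, `hello_world_test`, the app name `hello_app`) is a hypothetical illustration and is not taken from the hwci folder in this PR.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class Board(ABC):
    """Hypothetical board interface: flash an app, then read console output."""

    @abstractmethod
    def flash(self, app_name: str) -> None:
        ...

    @abstractmethod
    def read_line(self) -> Optional[str]:
        """Return the next console line, or None if no output is pending."""


class MockBoard(Board):
    """Mock board that replays canned output instead of talking to hardware."""

    def __init__(self, canned_output: dict[str, list[str]]):
        self._canned = canned_output
        self._pending: deque[str] = deque()

    def flash(self, app_name: str) -> None:
        # "Flashing" just queues the canned console lines for that app.
        self._pending = deque(self._canned.get(app_name, []))

    def read_line(self) -> Optional[str]:
        return self._pending.popleft() if self._pending else None


def wait_for_line(board: Board, expected: str, max_lines: int = 100) -> bool:
    """Scan up to max_lines of console output for the expected text."""
    for _ in range(max_lines):
        line = board.read_line()
        if line is None:
            return False
        if expected in line:
            return True
    return False


def hello_world_test(board: Board) -> bool:
    """Flash a (hypothetical) hello app and check its greeting appears."""
    board.flash("hello_app")
    return wait_for_line(board, "Hello World!")
```

Because the test only depends on the `Board` interface, the same function could in principle run against a mock or a real serial-backed board.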

Testing Strategy

This pull request was tested by:

- Running the example "Hello World" test on both the mock board and the nRF52 Development Kit.
- Verifying the GitHub Actions runner changes by running a test job on a hardware-equipped runner.
- Conducting unit tests for individual components of the HWCI system.

In The Future

- More comprehensive tests for different Tock OS features including GPIO tests.
- Integration with a wider range of hardware platforms.
- Documentation on how to add new tests and board support.

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make prepush.

Additional Notes

The HWCI system is designed to be extensible. Its aim is to make it easier for contributors to add support for new boards and create new types of tests.

Co-authored-by: Leon Schuermann <[email protected]>

lschuermann · 2024-10-18T17:00:44Z

FWIW, here's a successful workflow run with these scripts: https://github.com/treadmill-tb/tock/actions/runs/11350804975/job/31570037800

In case you can't view the logs, I posted the test-script ones to this gist: https://gist.github.com/lschuermann/174977a08a89adc419acf667b77bacb1

Co-authored-by: charles37 <[email protected]>

bradjc · 2024-10-18T23:27:17Z

I would like to see this in a separate repository, for a couple reasons. First, the python code framework doesn't need to be reviewed like kernel code. Second, these tests need more than just the kernel to run. Third, these tests really shouldn't be changing with other kernel patches.

Other than that, it looks good to me.

alevy · 2024-10-20T19:22:46Z

I agree with @bradjc. @charles37 we can discuss but can we close this PR and instead create a new repo for this in the tock organization?

charles37 · 2024-10-20T23:53:27Z

> I agree with @bradjc. @charles37 we can discuss but can we close this PR and instead create a new repo for this in the tock organization?

Yes, we can close this PR and create a separate repository for this. Thanks for the feedback!

lschuermann · 2024-10-21T13:11:35Z

@charles37 I created https://github.com/tock/tock-hardware-ci for this purpose, alongside @tock/hardware-ci-team, and given that team "maintain"-permissions on that repository. You should have received an invite to that team.

I like the idea of moving this to a different repository. We can use GitHub's reusable workflows to define a common, shared workflow in that repository and then import it in every repository that we want to run integration tests on. There's some guidance on how to import workflows from another repository here:
https://docs.github.com/en/actions/sharing-automations/reusing-workflows

We need to make sure that the workflow is parameterizable over which components should be fetched from their latest HEAD (e.g., for kernel tests, that should be libtock-c) and which components should be taken from a local checkout at the exact revision under test. The current code does not do this.
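
As a rough sketch of that parameterization, the function below pins the component under test to the local checkout and takes everything else from its latest HEAD. The names `ComponentSource` and `plan_checkouts` and the `"local"`/`"head"` labels are assumptions made for illustration, not part of the existing workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSource:
    name: str
    source: str  # "local" (exact revision under test) or "head" (latest upstream)


def plan_checkouts(components: list[str], under_test: str) -> list[ComponentSource]:
    """Pin the component under test to the local checkout; fetch the rest from HEAD."""
    if under_test not in components:
        raise ValueError(f"unknown component: {under_test}")
    return [
        ComponentSource(name, "local" if name == under_test else "head")
        for name in components
    ]
```

For a kernel PR this would be called with `under_test="tock"`, so that libtock-c is fetched from HEAD; a libtock-c PR would flip the two.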

bradjc · 2024-10-21T15:22:17Z

Do these tests block PRs from being merged?

Is there a way we can have some HWCI tests required and others not? I can imagine we would be able to detect which syscalls the userspace test uses, and maintain a list of stabilized syscall numbers, and if the test uses only a subset of that list then the test is required to pass. But for other tests we will be playing chicken and egg if the kernel PR changes the syscall interface for a non-stabilized syscall.
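
The subset check described here could be sketched as follows. This assumes the syscalls a test uses have already been extracted into a set of driver numbers; the function name `is_required` and the numbers in the usage note are placeholders, not real stabilized values.

```python
def is_required(test_drivers: set[int], stabilized_drivers: set[int]) -> bool:
    """A test is required only if every driver it uses is in the stabilized set."""
    return test_drivers <= stabilized_drivers
```

With a placeholder stabilized set of `{0x1, 0x2}`, a test using only `{0x1}` would be required, while a test using `{0x1, 0x9000}` would not, because `0x9000` is outside the stabilized set.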

lschuermann · 2024-10-21T16:15:48Z

> Do these tests block PRs from being merged?

Currently they do not. However, they arguably should.

> Is there a way we can have some HWCI tests required and others not?

Yes. We can have this infrastructure work as follows:

A test-prepare phase will analyze changes and come up with a test strategy, in more or less the following form:

boards:
  nrf52840dk_1:
    tests:
      multi_alarm_test:
        optional: false # fail the CI workflow if this test fails
      ble_passive_scanning:
        optional: true # simply issue a warning if this test fails

This part is yet to be built. Currently, we generate such a mapping statically, for a single nRF52840DK board.
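
A sketch of how such a mapping might be consumed is shown below. It mirrors the structure above as a plain Python dict (to avoid depending on a YAML parser) and splits each board's tests into required and optional lists. The function name `split_tests` and the choice to treat a missing `optional` key as required are assumptions.

```python
# Same shape as the mapping above, written as a Python dict.
STRATEGY = {
    "boards": {
        "nrf52840dk_1": {
            "tests": {
                "multi_alarm_test": {"optional": False},
                "ble_passive_scanning": {"optional": True},
            }
        }
    }
}


def split_tests(strategy: dict) -> dict[str, tuple[list[str], list[str]]]:
    """Map each board name to (required_tests, optional_tests)."""
    result = {}
    for board, cfg in strategy["boards"].items():
        required, optional = [], []
        for test, opts in cfg["tests"].items():
            # Assumption: a test is required unless explicitly marked optional.
            (optional if opts.get("optional", False) else required).append(test)
        result[board] = (required, optional)
    return result
```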

We generate one GitHub actions test-execute job per such board (for BOARD in $BOARDS), which will execute the list of tests assigned to that board.

We cannot dynamically create "steps" (foldable sections) in the job's output, or tolerate failures of individual tests.
Thus we can create two steps:
- Required tests, for which the job fails on any non-zero return code: for TEST in $REQ_TESTS; do $TEST; done
- Optional tests, where failures are ignored: for TEST in $OPT_TESTS; do $TEST || echo "Oh no, too bad!"; done
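
The same two-step policy can be sketched in Python, with tests modeled as callables that return True on success. The name `run_tests` is hypothetical, and unlike a shell step that aborts on the first non-zero return code, this sketch runs every required test and collects all failures.

```python
from typing import Callable

TestFn = Callable[[], bool]


def run_tests(required: dict[str, TestFn],
              optional: dict[str, TestFn]) -> tuple[list[str], list[str]]:
    """Return (failed_required, warnings); the job passes iff failed_required is empty."""
    # Required tests: any failure should fail the job.
    failed_required = [name for name, test in required.items() if not test()]
    # Optional tests: failures only produce warnings.
    warnings = [
        f"optional test failed: {name}"
        for name, test in optional.items()
        if not test()
    ]
    return failed_required, warnings
```

A caller would exit non-zero when the first list is non-empty and merely print the second.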

However, I question the utility of optional tests if they just issue a warning, except for the case you mentioned (where, temporarily, kernel & userspace go out of sync). I don't think we'd actively look for warnings in CI runs.

Instead, I think that we should always mark all tests as required by default, and then manually make a PR to https://github.com/tock/tock-hardware-ci to temporarily mark a test as optional, and reenable it when both sides are in sync again. Would that alleviate your concerns? @bradjc

bradjc · 2024-10-21T20:20:54Z

> However, I question the utility of optional tests if they just issue a warning, except for the case you mentioned (where, temporarily, kernel & userspace go out of sync). I don't think we'd actively look for warnings in CI runs.

Yeah that's a good point. For most PRs we want all checks, and it should be well understood which PRs we expect to break checks.

> Instead, I think that we should always mark all tests as required by default, and then manually make a PR to https://github.com/tock/tock-hardware-ci to temporarily mark a test as optional, and reenable it when both sides are in sync again. Would that alleviate your concerns? @bradjc

This seems complicated, but this is a complicated issue. I don't think we want to block progress on thinking about this at this point. It does seem like we are going to want to version all SyscallDrivers so the test suite knows when things are expected to be broken, or something like that, eventually. But for now we can just work around this challenge.

charles37 · 2024-10-23T18:08:38Z

I've implemented the requested changes, and it is now ready for merge. The reusable workflow logic has been tested in tock/tock-hardware-ci; however, the only way to test this workflow in the tock/tock-specific environment is to have this PR enter the merge queue so it goes through the repository-specific environment.

bradjc · 2024-10-23T18:49:11Z

How does this compare to an upstream action like what we use for the labeler? https://github.com/tock/tock/blob/master/.github/workflows/labeler.yml

If this is effectively the same, and if some day we publish the action and it's an easy change to use it, then this is great.

If it's different, then why don't we leave the full workflow in this repo? The tool (and python code) would be separate, still.

lschuermann · 2024-10-23T18:57:54Z

> If it's different, then why don't we leave the full workflow in this repo? The tool (and python code) would be separate, still.

I don't understand the first part of the question, but the workflow and the scripts are very much tied together. For instance, the workflow will want to run some analysis steps based on the Python files, then schedule jobs, and then invoke these scripts.

By having this split out into a reusable workflow and putting it next to the Python code, we make sure that those things are always in sync. Note that this workflow "shim" in this repository can still parameterize the invocation of the underlying workflow.

Also, this deduplicates code we'd need to maintain in all the other repositories that we'd add this workflow to, such as libtock-c and libtock-rs.

It's different from "importing" an action: it won't run as a single step, but really insert this workflow into the parent one, and run it in the context as if it were located in this repository. This mechanism is only there for code deduplication. Here's what this will look like:

bradjc · 2024-10-23T19:03:16Z

In a hypothetical future will we be able to have a workflow in Tock similar to this?

jobs:
  triage:
    permissions:
      contents: read  # for actions/labeler to determine modified files
      pull-requests: write  # for actions/labeler to add labels to PRs
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
      with:
        repo-token: "${{ secrets.GITHUB_TOKEN }}"

The key being uses: actions/[email protected]. If directly including a workflow file is a good analog to that without the commitment of upstreaming our github action then I support this. If directly including a workflow file is only a workaround for my earlier comment and not something we could ever do differently then I'm not so sure.

lschuermann · 2024-10-23T19:03:45Z

> however the only way to test this workflow for the tock/tock specific environment is to have this PR enter the merge queue so it goes through the repository specific environment

I'll note that this restriction does not have anything to do with our use of reusable workflows. It exists only because we can't yet run Treadmill jobs on PRs from untrusted remote repositories or untrusted branches without potentially leaking secrets. This will be addressed in the future.

lschuermann · 2024-10-23T19:07:01Z

> If directly including a workflow file is a good analog to that without the commitment of upstreaming our github action then I support this.

Yes, I think that including a reusable workflow is, modulo some UX differences, essentially analogous to importing a published action, without needing to actually publish anything. The benefit is that we don't need to modify Tock (or any other repositories) for changes internal to the testing infrastructure.

> In a hypothetical future will we be able to have a workflow in Tock similar to this?

Isn't this effectively what this PR is proposing right now?

We will probably not get rid of the base workflow in the tock-hardware-ci repository, because that is Tock-specific (but not specific to the kernel repository). We will also not want to use a published action, as that would mean that all the test output is collapsed into one "step" in that workflow execution.

In an ideal future, the structure we'd be aiming for is:

tock/tock: imports tock/tock-hardware-ci (with some parameters)
tock/libtock-c: imports tock/tock-hardware-ci (with some parameters)
tock/tock-hardware-ci: uses some published Treadmill action, and includes Tock-specific test setup & harnesses

bradjc

I think the summary is that what we are trying to do does not map to a single GitHub action and is instead really a collection of actions and some other configuration. It seems like this approach is better construed as a workflow that is used in different repos, which is what this PR does.

Perhaps someday we would create individual actions and make this more modular, but that isn't feasible right now and shouldn't be a blocker.

There is enough configurability for our use cases right now.

github-actions bot assigned alevy Oct 17, 2024

charles37 force-pushed the dev/tock-hardware-ci branch 3 times, most recently from 28bed42 to 57f511a on October 18, 2024 02:54

treadmill-ci: add Python-based hardware test framework

83e832c

Co-authored-by: Leon Schuermann <[email protected]>

charles37 force-pushed the dev/tock-hardware-ci branch from 57f511a to 83e832c on October 18, 2024 04:33

tools/hwci: move test helpers infrastructure to utils module

7d26342

Co-authored-by: charles37 <[email protected]>

charles37 added 3 commits October 23, 2024 14:01

tock-hardware-ci: switch workflow to external tock/tock-hardware-ci repo

560edd7

tock-hardware-ci: revert license checker changes

fd7a41c

tock-hardware-ci: restore license header

2dd64e7

lschuermann approved these changes Oct 23, 2024

bradjc approved these changes Oct 23, 2024

alevy approved these changes Oct 25, 2024

alevy added this pull request to the merge queue Oct 25, 2024

Merged via the queue into tock:master with commit 264421b Oct 25, 2024
12 checks passed

alevy deleted the dev/tock-hardware-ci branch October 25, 2024 19:02
