@lschuermann lschuermann commented Jun 14, 2020

Pull Request Overview

This pull request fixes #1914 by checking whether the memory an AppSlice points to still belongs to the same app instance (more precisely, the instance field in AppSlice).

Thanks to @alevy for implementing the main part of this which turned out to be working perfectly. The commits on top make sure that the returned AppSlice length is consistent with the AsRef and AsMut implementations and document this new behavior.
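The check described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions, not the kernel's actual code: the `Process` type, its `restart` method, and the way the process is passed explicitly to each accessor are all hypothetical simplifications. The only parts taken from this PR are the idea of an instance identifier stored in the `AppSlice` and the rule that the reported length and the `AsRef` / `AsMut` slices all collapse to length 0 once the instance no longer matches.

```rust
/// Hypothetical model of a process slot. `instance` changes on every restart,
/// while the backing memory region is reused by the new instance.
pub struct Process {
    instance: usize,
    memory: Vec<u8>,
}

impl Process {
    pub fn new(size: usize) -> Self {
        Process { instance: 0, memory: vec![0; size] }
    }

    /// Simulate a fault followed by a restart of the app.
    pub fn restart(&mut self) {
        self.instance += 1;
        self.memory.fill(0);
    }
}

/// Simplified AppSlice: remembers which process instance shared the buffer.
pub struct AppSlice {
    instance: usize,
    offset: usize,
    len: usize,
}

impl AppSlice {
    /// Create a slice over `len` bytes of process memory starting at `offset`.
    pub fn new(process: &Process, offset: usize, len: usize) -> Option<Self> {
        let end = offset.checked_add(len)?;
        if end > process.memory.len() {
            return None;
        }
        Some(AppSlice { instance: process.instance, offset, len })
    }

    /// The slice is only valid while the sharing instance is still alive.
    fn is_valid(&self, process: &Process) -> bool {
        self.instance == process.instance
    }

    /// Reported length: 0 once the sharing instance is gone.
    pub fn len(&self, process: &Process) -> usize {
        if self.is_valid(process) { self.len } else { 0 }
    }

    /// Immutable view, consistent with `len`.
    pub fn as_ref<'a>(&self, process: &'a Process) -> &'a [u8] {
        if self.is_valid(process) {
            &process.memory[self.offset..self.offset + self.len]
        } else {
            &[]
        }
    }

    /// Mutable view, consistent with `len`.
    pub fn as_mut<'a>(&self, process: &'a mut Process) -> &'a mut [u8] {
        if self.is_valid(process) {
            &mut process.memory[self.offset..self.offset + self.len]
        } else {
            &mut []
        }
    }
}
```

In this model, a slice created before `restart()` reports its full length through all three accessors, and reports 0 through all three afterwards, so a capsule can never reach memory that now belongs to a different app instance.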

Testing Strategy

This pull request was tested by developing a crude capsule which deliberately keeps an AppSlice out of a grant and does not accept a second allow. This ensures that the AppSlice is shared from the first app instance, but is used in the second app instance.

The test capsule (integrated with the nRF52840) along with the userspace libtock_c app can be found here.

The output validates that the AppSlice indeed returns a length of 0 and hands out an immutable / mutable slice of length 0.

Initialization complete. Entering main loop
NRF52 HW INFO: Variant: AAC0, Part: N52840, Package: QI, Ram: K256, Flash: K1024

--> AppSlice safety test app!
Allow buffer of length 64
AppSlice shared by app!
Tell capsule to print AppSlice info
Shared slice reports length 64
Shared slice as_ref length 64
Shared slice as_mut length 64

<fault app by button press>

--> AppSlice safety test app!
Allow buffer of length 64
AppSlice already shared, refusing to replace
Tell capsule to print AppSlice info
Shared slice reports length 0
Shared slice as_ref length 0
Shared slice as_mut length 0
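The shape of such a test capsule can be sketched as follows. This is a hypothetical stand-in, not the capsule linked above: `SharedBuffer`, `TestCapsule`, `allow`, and `reported_len` are illustrative names, and process instances are modeled as plain integers. The sketch only captures the two properties the test relies on: the buffer is kept in the capsule itself rather than in a per-process grant (so it survives an app restart), and a second allow is refused.

```rust
/// Hypothetical record of a shared buffer: which process instance shared it
/// and how long it was at the time.
#[derive(Clone, Copy)]
pub struct SharedBuffer {
    instance: usize,
    len: usize,
}

impl SharedBuffer {
    /// Length as seen while `current_instance` is running: 0 if the buffer
    /// was shared by an earlier instance of the app.
    pub fn len(&self, current_instance: usize) -> usize {
        if self.instance == current_instance { self.len } else { 0 }
    }
}

/// Hypothetical test capsule. The buffer lives in the capsule's own state,
/// not in a grant, so it is not cleaned up when the app restarts.
pub struct TestCapsule {
    shared: Option<SharedBuffer>,
}

impl TestCapsule {
    pub fn new() -> Self {
        TestCapsule { shared: None }
    }

    /// Accept the first allow only; later allows are deliberately refused.
    pub fn allow(&mut self, instance: usize, len: usize) -> Result<(), &'static str> {
        if self.shared.is_some() {
            return Err("AppSlice already shared, refusing to replace");
        }
        self.shared = Some(SharedBuffer { instance, len });
        Ok(())
    }

    /// What the capsule would print when asked for AppSlice info.
    pub fn reported_len(&self, current_instance: usize) -> Option<usize> {
        self.shared.map(|s| s.len(current_instance))
    }
}
```

Replaying the log with this model: instance 0 allows 64 bytes and the capsule reports 64; after the fault, instance 1's allow is refused and the capsule reports 0 for the buffer it still holds.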

TODO

We should probably run the release test suite along with the DebugProcessRestart capsule to make sure this doesn't break any existing capsules. I can verify this once this gets "yes, we indeed want this" feedback. 😄

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

    I don't believe there is a section in /docs that needs to elaborate on this behavior, and I hope the generated rustdoc is sufficient. If an update in /docs is indeed required, please point me to the respective section.

Formatting

  • Ran make prepush.

alevy previously approved these changes Jun 14, 2020

@alevy alevy left a comment

The documentation is a big bonus!

Will probably have to re-approve after fixing whatever is making the qemu ci test break.

Oh good catch!

@lschuermann lschuermann left a comment

I can't request changes on my own PR so please view this as a change request. 😆
Edit: fixed.

@lschuermann

Given the potentially high impact of this, I'll try to run the release test suite today. I'm not expecting anything to fail, but this way we'd be on the safe side.

lschuermann commented Jun 17, 2020

[WIP] tests on nRF52840DK (and on Hail where a test failed)

(Explanations shamelessly stolen from @ppannuto's nRF52840DK testing of Release 1.5)

I've changed the kernel to restart all apps in case of a fault and integrated the DebugProcessRestart capsule, bound to button 1 on the board. I'm expecting the apps to behave exactly the same after the manually triggered fault, as that would mean that all shared AppSlices are handled as expected. This may not uncover all potential issues, but it should give a good impression of the impact of this change.

  • All Boards
    • examples/sensors
    • examples/services/ble-env-sense and examples/services/ble-env-sense/test-with-sensors
      The previously experienced kernel restart was due to side effects of my test modification, likely binding to a button already in use.
      Panics on both nRF52840DK and Hail, but also on unmodified latest master, so unrelated.
    • examples/c_hello and examples/tests/printf_long
    • examples/tests/console_recv_short and examples/tests/console_recv_long
      [LONG] Error doing UART receive: -2
      The kernel process console is responsive, and behavior is identical on the faulted app.
    • examples/blink
    • examples/rot_client and examples/rot_service
      Having both client and service on Hail simply does nothing.
      • With only client: No rot13 service
      • Flashing the service makes my board unusable. Have to flash all zeros over kernel and app memory to revive it. Hangs in a loop, trying to print the initialization message:
        Ini\x374Ini\x374Ini\x374Ini\x374 etc.
        Using tockloader 4.0.0 for flashing kernel & managing apps.
    • examples/blink and examples/c_hello and examples/buttons
      After fault:
      No available GPIOTE interrupt channels
      Hello World!
      Though unlikely to be related to the AppSlice issue.
    • examples/lua-hello
    • examples/tests/console_timeout
    • examples/tests/malloc_test01
    • examples/tests/stack_size_test01
    • examples/tests/stack_size_test02
    • examples/tests/mpu_stack_growth
      Continuous loop (restart policy):
      stack: 0x200043e0 - buffer: 0x200043e0 - at_least: 0x   0
      write to 0x200043e0
      read from 0x200043e0
      value 33
      
    • examples/tests/mpu_walk_region
    • examples/tests/multi_alarm_test
    • examples/tests/adc
    • examples/tests/adc_continuous
    • examples/tutorials/05_ipc/led and examples/tutorials/05_ipc/rng and examples/tutorials/05_ipc/logic
    • examples/tests/gpio with mode set to 0
  • nRF specific
    • examples/ble_advertising
    • examples/ble_passive_scanning

[preliminary] conclusion

Although I haven't conducted all tests yet, I'm fairly confident that this doesn't break anything fundamental. I can continue the tests, but don't want to block this any further.

Any previously seen weird behavior is either reproducible on latest master (meaning I'll continue to investigate and potentially open other PRs/issues) or can be explained by my integrating DebugProcessRestart incorrectly.

@bradjc bradjc mentioned this pull request Jun 23, 2020
2 tasks
@bradjc bradjc added the labels last-call (Final review period for a pull request.) and P-Significant (This is a substantial change that requires review from all core developers.) Jun 24, 2020

bradjc commented Jun 25, 2020

bors r+


bors bot commented Jun 25, 2020

@bors bors bot merged commit 079f44a into tock:master Jun 25, 2020
bors bot added a commit that referenced this pull request Jul 10, 2020
2003: kernel: grant: do not pass T::default() r=alevy a=bradjc

This changes how grant allocation is structured in grant.rs so that the
allocation function does not take a copy of the data to be written when
creating a new grant. This data can be large if the grant region is
large, and it can cause a stack overflow when the allocate function is
called.

The change is pretty simple, but I'm not sure if there is some subtle reason to not do it this way.

Fixes the IPC stack overflow issue found in #1933. Replaces #1976.


### Testing Strategy

Running the hail app on hail.


### TODO or Help Wanted

n/a


### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make prepush`.


Co-authored-by: Brad Campbell <[email protected]>
@lschuermann lschuermann deleted the dev/safe-appslice branch December 17, 2020 07:00
Development

Successfully merging this pull request may close these issues.

AppSlices are not bounded to process instance lifetime