Implement `UserspaceKernelBoundary` version 2 #1318

bradjc · 2019-06-04T19:07:48Z

Pull Request Overview

This pull request is the answer to the question we had a while ago of "Are we confident in the UserlandKernelBoundary trait if we only support one architecture?" with the answer of "no".

After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make UserlandKernelBoundary more generic, this PR makes two general changes to the trait.

Rather than having a separate "getter" for retreiving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic.
The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having pop_stack_frame() at all was a bit of blur between the kernel loop and the context switch interface details as pop was only called with a yield().

What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code.

Other changes:

This PR also then updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code.
If UKB returns that the app called a syscall, then it must provide a valid syscall. If it cannot create a valid syscall (perhaps because the app is buggy) then it must return that the process faulted. Before, I believe, we just ignored invalid syscalls and allowed the app to proceed after effectively not doing anything. I think this is an improvement, but perhaps the old behavior was intentional.
This adds a new process state type called Unstarted. This is needed (well it is one way to address the issue I ran into) to allow process.rs to know if calling a function call for a process will be the first time that process has been executed. The cortex-m syscall code needs to know this because in that case it needs to create a new SVC stack frame rather than re-use an old one.

This PR also includes a patch to the cortex-m arch to support the new version of UKB.

The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before we do have to create the stack frame just like we used to.

In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substantial so who knows.

The other change is we now set state.psr and state.yield_pc after every syscall, and not just effectively after yield(). I don't think this has any effect.

Testing Strategy

This pull request was tested by running a few apps on hail. Given what happened last time I started messing around with this code, more is needed....

TODO or Help Wanted

We can probably wait to merge this until after context switching is fully working for RISC-V. I think these changes will be sufficient to support RISC-V, but there may be more that is needed. I did want to get this code in front of people, however, since it touches some low-level kernel features.

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make formatall.

bradjc · 2019-06-11T15:22:12Z

This is the relevant new interface:

tock/kernel/src/syscall.rs

Lines 52 to 146 in 3e1f738

    
           /// Why the process stopped executing and execution returned to the kernel. 
        
           #[derive(PartialEq)] 
        
           pub enum ContextSwitchReason { 
        
               /// Process called a syscall. Also returns the syscall and relevant values. 
        
               SyscallFired { syscall: Syscall }, 
        
               /// Process triggered the hardfault handler. 
        
               Fault, 
        
               /// Process exceeded its timeslice. 
        
               TimesliceExpired, 
        
               /// Process interrupted (e.g. by a hardware event) 
        
               Interrupted, 
        
           } 
        
           /// This trait must be implemented by the architecture of the chip Tock is 
        
           /// running on. It allows the kernel to manage switching to and from processes 
        
           /// in an architecture-agnostic manner. 
        
           pub trait UserspaceKernelBoundary { 
        
               /// Some architecture-specific struct containing per-process state that must 
        
               /// be kept while the process is not running. For example, for keeping CPU 
        
               /// registers that aren't stored on the stack. 
        
               type StoredState: Default + Copy; 
        
               /// Set the return value the process should see when it begins executing 
        
               /// again after the syscall. This will only be called after a process has 
        
               /// called a syscall. 
        
               /// 
        
               /// To help implementations, both the current stack pointer of the process 
        
               /// and the saved state for the process are provided. The `return_value` is 
        
               /// the value that should be passed to the process so that when it resumes 
        
               /// executing it knows the return value of the syscall it called. 
        
               unsafe fn set_syscall_return_value( 
        
                   &self, 
        
                   stack_pointer: *const usize, 
        
                   state: &mut Self::StoredState, 
        
                   return_value: isize, 
        
               ); 
        
               /// Set the function that the process should execute when it is resumed. 
        
               /// This has two major uses: 1) sets up the initial function call to 
        
               /// `_start` when the process is started for the very first time; 2) tells 
        
               /// the process to execute a callback function after calling `yield()`. 
        
               /// 
        
               /// ### Arguments 
        
               /// 
        
               /// - `stack_pointer` is the address of the stack pointer for the current 
        
               ///   app. 
        
               /// - `remaining_stack_memory` is the number of bytes below the 
        
               ///   `stack_pointer` that is allocated for the process. This value is 
        
               ///   checked by the implementer to ensure that there is room for this stack 
        
               ///   frame without overflowing the stack. 
        
               /// - `state` is the stored state for this process. 
        
               /// - `callback` is the function that should be executed when the process 
        
               ///   resumes. 
        
               /// - `first_function` is true if this is the first time this process is 
        
               ///   being run. This allows a `UserspaceKernelBoundary` implementation to 
        
               ///   assume there are no stack frames on the process's stack. 
        
               /// 
        
               /// ### Return 
        
               /// 
        
               /// Returns `Ok` or `Err` with the current address of the stack pointer for 
        
               /// the process. One reason for returning `Err` is that adding the function 
        
               /// call requires adding to the stack, and there is insufficient room on the 
        
               /// stack to add the function call. 
        
               unsafe fn set_process_function( 
        
                   &self, 
        
                   stack_pointer: *const usize, 
        
                   remaining_stack_memory: usize, 
        
                   state: &mut Self::StoredState, 
        
                   callback: process::FunctionCall, 
        
                   first_function: bool, 
        
               ) -> Result<*mut usize, *mut usize>; 
        
               /// Context switch to a specific process. 
        
               /// 
        
               /// This returns a tuple: 
        
               /// - The new stack pointer address of the process. 
        
               /// - Why the process stopped executing and switched back to the kernel. 
        
               unsafe fn switch_to_process( 
        
                   &self, 
        
                   stack_pointer: *const usize, 
        
                   state: &mut Self::StoredState, 
        
               ) -> (*mut usize, ContextSwitchReason); 
        
               /// Display any general information about the fault. 
        
               unsafe fn fault_fmt(&self, writer: &mut Write); 
        
               /// Display architecture specific (e.g. CPU registers or status flags) data 
        
               /// for a process identified by its stack pointer. 
        
               unsafe fn process_detail_fmt( 
        
                   &self, 
        
                   stack_pointer: *const usize, 
        
                   state: &Self::StoredState, 
        
                   writer: &mut Write, 
        
               ); 
        
           }

The initial version of the `UserlandKernelBoundary` trait abstracted details about context switching on cortex-m platforms out of process.rs, however, the trait was very much designed around the structure of how cortex-m handles switching between privilege modes. After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make `UserlandKernelBoundary` more generic, this commit makes two general changes to the trait. 1. Rather than having a separate "getter" for retreiving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic. 2. The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having `pop_stack_frame()` at all was a bit of blur between the kernel loop and the context switch interface details as pop was only called with a yield(). What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code. This commit also then updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code.

For portability reasons the syscall interface has been updated, and this commit implements it for cortex-m. The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before we do have to create the stack frame just like we used to. In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substancial so who knows. The other change is we now set state.psr and state.yield_pc after every syscall, and not just effectively after yield(). I don't think this has any effect.

bradjc · 2019-06-28T19:55:33Z

Tested on hail, multiple apps (hail, console, hello_loop, crash_dummy) seem to work fine.

ppannuto · 2019-07-03T22:49:23Z

I think it's a bit weird to include first_function in the UKB interface, when really it's a Cortex-M specific thing to need to know that. What do you think of moving that information in to the Cortex-M state instead: https://github.com/tock/tock/compare/syscall-v2...remove_first_function?expand=1 ?

I did leave the Unstarted state, as I could see that being useful for debugging, and is conceptually different that Yielded, but wouldn't be opposed to removing it either

(tested with the basics: blink/c_hello/hello_loop; worked fine)

Edit: 👍 to everything else

bradjc · 2019-07-09T18:45:28Z

I think it's a bit weird to include first_function in the UKB interface, when really it's a Cortex-M specific thing to need to know that.

Well, we use it in the RISC-V implementation as well as there is a architecture-independent difference between the starting an app for the first time and resuming an app because only in the latter case does it have any state on the stack or in registers. However, I think only the kernel knows this, and as such needs to pass it to the UKB implementation. What happens in your implementation if an app is restarted?

ppannuto · 2019-07-11T01:13:37Z

Hmm, so, my thinking here is definitely influenced by having just read the HotOS fork paper, but I'm wondering if the idea of (literally) restarting a process isn't a good one. It means that we would have to implement code that is able to correctly undo everything a process has ever done to its kernel state, which feels complicated and hard to get right.

I think it architecturally makes more sense to implement "Restart" behavior by completely tearing down the old process and then creating a new one, rather than trying to re-use anything.

Now, all of this is lives on the enormous caveat that currently we have don't have a way of destroying processes, but they are dynamically allocated at least, so at least half of the hard work is already done. Towards the mission of restart-able processes, I think we should focus effort on implementing tear-down, rather than thinking about code to "restart" an existing process.

To the question then of first_function, I think what we're really saying here is that conceptually there are chip/architecture-specific things that need to be part of process creation. What do you think of this interface, which says that more explicitly: https://github.com/tock/tock/compare/syscall-v2...syscall-new-process-interface?expand=1

Conceptually I think this is a bit nicer as it moves chip-specific process creation to part of the process creation logic rather than having it co-mingled with regular execution.

A side concern with (both) interfaces as-presented is the assumption that the only initialization stuff that will be necessary is stack related. Maybe that's sufficient. I played with a variation that passed a reference to the Process object itself, but in that case initialize_new_process can't do anything interesting because all of the Process fields are private. Probably an issue for v3 whenever something proves it needs access to more.

bradjc · 2019-07-11T20:37:46Z

To the question then of first_function, I think what we're really saying here is that conceptually there are chip/architecture-specific things that need to be part of process creation. What do you think of this interface, which says that more explicitly: https://github.com/tock/tock/compare/syscall-v2...syscall-new-process-interface?expand=1

I like this interface, and I agree it's better to be explicit about the functionality.

I think it architecturally makes more sense to implement "Restart" behavior by completely tearing down the old process and then creating a new one, rather than trying to re-use anything.

That would make that feature easier to maintain as Process changes. It does, however, incur a cost on each restart because everything has to be re-scanned and re-calculated. Additionally, I think it will make keeping per-process kernel state (e.g. how many times the process has been restarted) clumsy. I wonder if there isn't some hybrid approach, where all of the per-execution state can be encapsulated and then cleanly erased when the process crashes/ends, but all of the per-process state is preserved.

A side concern with (both) interfaces as-presented is the assumption that the only initialization stuff that will be necessary is stack related. ... Probably an issue for v3 whenever something proves it needs access to more.

Well that function gets StoredState as well, so it has access to the registers too. But in general I agree we can revise that interface if some other architecture shows it to be necessary in the future.

Add an explicit note about the hazard of funciton injection, namely that it will override any kernel return value (as the injected function will run first, and its return value will overwrite any kernel return value that has been set). In practice, this means that process function injection is only safe to call in response to syscalls without return values (i.e. only `yield` in the current syscall interface).

ppannuto

This is good to go for me at this point, but should have a few more eyes on it before hitting the button

bradjc · 2019-07-20T20:13:01Z

We didn't get to this in the call I don't think, but maybe it's time to move ahead with this? We have the release testing to help find if there are any issues, and the API changes are pretty internal and unlikely to affect code outside of this repo.

alevy · 2019-07-20T22:28:34Z

I agree @bradjc. If you want to wait for me to look through the code tomorrow. I'm also happy to put a timer on it, where if I don't comment in the next day or so, it counts as approving.

bradjc · 2019-07-22T11:27:10Z

bors r+

1318: Implement `UserspaceKernelBoundary` version 2 r=bradjc a=bradjc ### Pull Request Overview This pull request is the answer to the question we had a while ago of "Are we confident in the `UserlandKernelBoundary` trait if we only support one architecture?" with the answer of "no". After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make `UserlandKernelBoundary` more generic, this PR makes two general changes to the trait. 1. Rather than having a separate "getter" for retreiving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic. 2. The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having `pop_stack_frame()` at all was a bit of blur between the kernel loop and the context switch interface details as pop was only called with a yield(). What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code. Other changes: - This PR also then updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code. - If UKB returns that the app called a syscall, then it must provide a valid syscall. If it cannot create a valid syscall (perhaps because the app is buggy) then it must return that the process faulted. Before, I believe, we just ignored invalid syscalls and allowed the app to proceed after effectively not doing anything. I think this is an improvement, but perhaps the old behavior was intentional. - This adds a new process state type called `Unstarted`. This is needed (well it is one way to address the issue I ran into) to allow process.rs to know if calling a function call for a process will be the _first_ time that process has been executed. The cortex-m syscall code needs to know this because in that case it needs to create a new SVC stack frame rather than re-use an old one. This PR also includes a patch to the cortex-m arch to support the new version of UKB. The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before we do have to create the stack frame just like we used to. In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substantial so who knows. The other change is we now set `state.psr` and `state.yield_pc` after every syscall, and not just effectively after yield(). I don't think this has any effect. ### Testing Strategy This pull request was tested by running a few apps on hail. Given what happened last time I started messing around with this code, more is needed.... ### TODO or Help Wanted We can probably wait to merge this until after context switching is fully working for RISC-V. I _think_ these changes will be sufficient to support RISC-V, but there may be more that is needed. I did want to get this code in front of people, however, since it touches some low-level kernel features. ### Documentation Updated - [ ] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [x] Ran `make formatall`. Co-authored-by: Brad Campbell <[email protected]> Co-authored-by: Pat Pannuto <[email protected]>

bors · 2019-07-22T11:39:59Z

Build succeeded

continuous-integration/travis-ci/push

@sv2bb

1323: RISC-V: Add context switching r=bradjc a=bradjc ### Pull Request Overview This pull request implements the `UserlandKernelBoundary` trait for RISC-V platforms. This PR is towards tracking issue #1135. This depends on #1318 so those commits have been included here. That PR should be merged first, and then I can update this one. While I pulled things together for the PR, @sv2bb did much of this. ### Testing Strategy This pull request was tested by running blink and c_hello on the the arty-e21 FPGA based board. ### TODO or Help Wanted See inline comments. ### Documentation Updated - [x] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [x] Ran `make formatall`. Co-authored-by: Brad Campbell <[email protected]>

1531: Fix counting the number of syscalls a process has called. r=ppannuto a=bradjc This accidentally got removed when refactoring how context switches are handled. Regression likely from #1318. ### Testing Strategy I tested this by running the list command with the process console. ### TODO or Help Wanted n/a ### Documentation Updated - [x] Updated the relevant files in `/docs`, or no updates are required. ### Formatting - [x] Ran `make formatall`. Co-authored-by: Brad Campbell <[email protected]>

bradjc added kernel P-Significant This is a substancial change that requires review from all core developers. labels Jun 4, 2019

bradjc changed the title ~~Implement UserlandKernelBoundary version 2~~ Implement UserspaceKernelBoundary version 2 Jun 4, 2019

bradjc added 2 commits June 11, 2019 14:50

bradjc force-pushed the syscall-v2 branch from 3e1f738 to 1f5902d Compare June 11, 2019 18:50

e310: update to new UKB interface

42c48a5

bradjc force-pushed the syscall-v2 branch from 1f5902d to 42c48a5 Compare June 11, 2019 19:16

bradjc mentioned this pull request Jun 11, 2019

RISC-V: Add context switching #1323

Merged

2 tasks

ppannuto added 2 commits July 11, 2019 14:05

syscall: make process init an explicit interface

a6b10a0

ppannuto force-pushed the syscall-v2 branch from 483faea to c0b27f2 Compare July 11, 2019 21:50

ppannuto approved these changes Jul 11, 2019

View reviewed changes

alevy self-assigned this Jul 12, 2019

bors bot merged commit c0b27f2 into master Jul 22, 2019

bors bot deleted the syscall-v2 branch July 22, 2019 11:40

This was referenced Jan 7, 2020

Restarting Apps no longer works #1524

Closed

Fix restarting apps #1529

Merged

bradjc mentioned this pull request Jan 10, 2020

Fix counting the number of syscalls a process has called. #1531

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Implement `UserspaceKernelBoundary` version 2 #1318

Implement `UserspaceKernelBoundary` version 2 #1318

Uh oh!

bradjc commented Jun 4, 2019

Uh oh!

bradjc commented Jun 11, 2019

Uh oh!

bradjc commented Jun 28, 2019

Uh oh!

ppannuto commented Jul 3, 2019 •

edited

Loading

Uh oh!

bradjc commented Jul 9, 2019

Uh oh!

ppannuto commented Jul 11, 2019 •

edited

Loading

Uh oh!

bradjc commented Jul 11, 2019

Uh oh!

ppannuto left a comment

Uh oh!

bradjc commented Jul 20, 2019

Uh oh!

alevy commented Jul 20, 2019

Uh oh!

bradjc commented Jul 22, 2019

Uh oh!

bors bot commented Jul 22, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Implement UserspaceKernelBoundary version 2 #1318

Implement UserspaceKernelBoundary version 2 #1318

Uh oh!

Conversation

bradjc commented Jun 4, 2019

Pull Request Overview

Testing Strategy

TODO or Help Wanted

Documentation Updated

Formatting

Uh oh!

bradjc commented Jun 11, 2019

Uh oh!

bradjc commented Jun 28, 2019

Uh oh!

ppannuto commented Jul 3, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bradjc commented Jul 9, 2019

Uh oh!

ppannuto commented Jul 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bradjc commented Jul 11, 2019

Uh oh!

ppannuto left a comment

Choose a reason for hiding this comment

Uh oh!

bradjc commented Jul 20, 2019

Uh oh!

alevy commented Jul 20, 2019

Uh oh!

bradjc commented Jul 22, 2019

Uh oh!

bors bot commented Jul 22, 2019

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Implement `UserspaceKernelBoundary` version 2 #1318

Implement `UserspaceKernelBoundary` version 2 #1318

ppannuto commented Jul 3, 2019 •

edited

Loading

ppannuto commented Jul 11, 2019 •

edited

Loading