Implement UserspaceKernelBoundary version 2
#1318
Title changed from "UserlandKernelBoundary version 2" to "UserspaceKernelBoundary version 2"
This is the relevant new interface: Lines 52 to 146 in 3e1f738
The initial version of the `UserlandKernelBoundary` trait abstracted details about context switching on cortex-m platforms out of process.rs. However, the trait was very much designed around how cortex-m handles switching between privilege modes.
After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make `UserlandKernelBoundary` more generic, this commit makes two general changes to the trait.
1. Rather than having a separate "getter" for retrieving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic.
2. The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having `pop_stack_frame()` at all was a bit of a blur between the kernel loop and the context switch interface details, as pop was only called with a yield().
What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code.
This commit also updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs, as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code.
For portability reasons the syscall interface has been updated, and this commit implements it for cortex-m.
The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before, we do have to create the stack frame just like we used to.
In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substantial so who knows.
The other change is we now set state.psr and state.yield_pc after every syscall, and not just effectively after yield(). I don't think this has any effect.
Tested on hail, multiple apps (hail, console, hello_loop, crash_dummy) seem to work fine.
I think it's a bit weird to include I did leave the (tested with the basics: blink/c_hello/hello_loop; worked fine) Edit: 👍 to everything else |
Well, we use it in the RISC-V implementation as well, as there is an architecture-independent difference between starting an app for the first time and resuming an app: only in the latter case does it have any state on the stack or in registers. However, I think only the kernel knows this, and as such needs to pass it to the UKB implementation. What happens in your implementation if an app is restarted?
Hmm, so, my thinking here is definitely influenced by having just read the HotOS fork paper, but I'm wondering if the idea of (literally) restarting a process isn't a good one. It means that we would have to implement code that is able to correctly undo everything a process has ever done to its kernel state, which feels complicated and hard to get right. I think it architecturally makes more sense to implement "Restart" behavior by completely tearing down the old process and then creating a new one, rather than trying to re-use anything. Now, all of this lives on the enormous caveat that we currently don't have a way of destroying processes, but they are dynamically allocated at least, so at least half of the hard work is already done. Towards the mission of restart-able processes, I think we should focus effort on implementing tear-down, rather than thinking about code to "restart" an existing process. To the question then of Conceptually I think this is a bit nicer as it moves chip-specific process creation to part of the process creation logic rather than having it co-mingled with regular execution. A side concern with (both) interfaces as-presented is the assumption that the only initialization stuff that will be necessary is stack related. Maybe that's sufficient. I played with a variation that passed a reference to the
I like this interface, and I agree it's better to be explicit about the functionality.
That would make that feature easier to maintain as
Well that function gets |
Add an explicit note about the hazard of function injection, namely that it will override any kernel return value (as the injected function will run first, and its return value will overwrite any kernel return value that has been set). In practice, this means that process function injection is only safe to call in response to syscalls without return values (i.e. only `yield` in the current syscall interface).
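To make the hazard in this note concrete, here is a minimal sketch in a toy model. It assumes the syscall return value and the injected function's return value share one saved register; `SavedRegs`, `set_syscall_return_value`, `run_injected`, and `callback` are hypothetical names for illustration and are not the kernel's actual API.

```rust
/// Toy model of the registers saved for a stopped process.
/// Hypothetical type, for illustration only.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SavedRegs {
    /// Register that carries a return value back to the process.
    pub r0: usize,
}

/// The kernel records the return value of the syscall it just handled.
pub fn set_syscall_return_value(regs: &mut SavedRegs, value: usize) {
    regs.r0 = value;
}

/// Model of function injection: the injected function runs before the
/// process resumes its original code, and its own return value lands in
/// the same register, replacing whatever the kernel stored there.
pub fn run_injected(regs: &mut SavedRegs, f: fn(usize) -> usize, arg0: usize) {
    regs.r0 = f(arg0);
}

/// Example injected function: ignores its argument and returns 0.
pub fn callback(_arg0: usize) -> usize {
    0
}
```

In this model, a kernel return value of 7 stored before `run_injected` is gone once the callback has run, which is why injection is described as safe only after a syscall that has no return value.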
This is good to go for me at this point, but should have a few more eyes on it before hitting the button
We didn't get to this in the call I don't think, but maybe it's time to move ahead with this? We have the release testing to help find if there are any issues, and the API changes are pretty internal and unlikely to affect code outside of this repo.
I agree @bradjc, if you want to wait for me to look through the code tomorrow. I'm also happy to put a timer on it, where if I don't comment in the next day or so, it counts as approving.
bors r+
1318: Implement `UserspaceKernelBoundary` version 2 r=bradjc a=bradjc
### Pull Request Overview
This pull request is the answer to the question we had a while ago of "Are we confident in the `UserlandKernelBoundary` trait if we only support one architecture?" with the answer of "no".
After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make `UserlandKernelBoundary` more generic, this PR makes two general changes to the trait.
1. Rather than having a separate "getter" for retrieving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic.
2. The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having `pop_stack_frame()` at all was a bit of a blur between the kernel loop and the context switch interface details, as pop was only called with a yield().
What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code.
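As a rough illustration of change 1, the sketch below shows a context switch reason that carries the syscall with it, so the kernel loop needs no separate getter. The type and variant names here are simplified assumptions for illustration; see the linked interface in the PR for the real definitions.

```rust
/// Hypothetical, simplified syscall type (not the kernel's real definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Syscall {
    Yield,
    Command { driver: usize, arg: usize },
}

/// Why the process stopped running. The syscall travels inside the
/// reason, so one return value tells the kernel loop everything.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContextSwitchReason {
    SyscallFired { syscall: Syscall },
    Fault,
    Interrupted,
}

/// Sketch of how a kernel loop could dispatch on the returned reason.
pub fn describe(reason: ContextSwitchReason) -> &'static str {
    match reason {
        ContextSwitchReason::SyscallFired { syscall: Syscall::Yield } => "yield",
        ContextSwitchReason::SyscallFired { syscall: Syscall::Command { .. } } => "command",
        ContextSwitchReason::Fault => "fault",
        ContextSwitchReason::Interrupted => "interrupted",
    }
}
```

Because the match is exhaustive, the compiler would flag any new reason or syscall variant that the loop forgets to handle, which is one argument for bundling the two together.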
Other changes:
- This PR also then updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code.
- If UKB returns that the app called a syscall, then it must provide a valid syscall. If it cannot create a valid syscall (perhaps because the app is buggy) then it must return that the process faulted. Before, I believe, we just ignored invalid syscalls and allowed the app to proceed after effectively not doing anything. I think this is an improvement, but perhaps the old behavior was intentional.
- This adds a new process state type called `Unstarted`. This is needed (well it is one way to address the issue I ran into) to allow process.rs to know if calling a function call for a process will be the _first_ time that process has been executed. The cortex-m syscall code needs to know this because in that case it needs to create a new SVC stack frame rather than re-use an old one.
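The rule in the second bullet above (a syscall reason must carry a valid syscall, otherwise the process is reported as faulted) could be sketched as follows. The numbering in `decode` and all names are assumptions chosen for illustration, not taken from this PR.

```rust
/// Hypothetical set of syscalls, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Syscall {
    Yield,
    Subscribe,
    Command,
    Allow,
    Memop,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContextSwitchReason {
    SyscallFired { syscall: Syscall },
    Fault,
}

/// Decode a raw syscall number. The numbering is an assumption made
/// for this sketch.
pub fn decode(number: u8) -> Option<Syscall> {
    match number {
        0 => Some(Syscall::Yield),
        1 => Some(Syscall::Subscribe),
        2 => Some(Syscall::Command),
        3 => Some(Syscall::Allow),
        4 => Some(Syscall::Memop),
        _ => None,
    }
}

/// A number that cannot be decoded is reported as a fault instead of
/// being silently ignored.
pub fn reason_for(number: u8) -> ContextSwitchReason {
    match decode(number) {
        Some(syscall) => ContextSwitchReason::SyscallFired { syscall },
        None => ContextSwitchReason::Fault,
    }
}
```

With this shape there is no way to construct a "syscall fired" reason without a decoded syscall, so the old behavior of letting the app continue after an invalid syscall cannot happen by accident.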
This PR also includes a patch to the cortex-m arch to support the new version of UKB.
The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before we do have to create the stack frame just like we used to.
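A toy model of the frame handling described above: on the first run a frame is created, and on later calls the existing frame is overwritten in place rather than popped and pushed. The process stack is modelled as a `Vec` of frames, and every name below is hypothetical; the real cortex-m code manipulates the SVC frame on the process stack directly.

```rust
/// Hypothetical process states; `Unstarted` marks a process that has
/// never executed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Unstarted,
    Running,
    Yielded,
}

/// Toy stand-in for the SVC stack frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Frame {
    pub pc: usize,
    pub r0: usize,
}

pub struct Process {
    pub state: State,
    pub frames: Vec<Frame>,
}

/// Arrange for the process to run the function at `pc` with argument
/// `r0` the next time it is scheduled.
pub fn set_process_function(p: &mut Process, pc: usize, r0: usize) {
    let frame = Frame { pc, r0 };
    match (p.state, p.frames.last_mut()) {
        // Already ran at least once: re-use the old frame in place.
        (State::Running, Some(top)) | (State::Yielded, Some(top)) => *top = frame,
        // First run (or no frame exists yet): create the frame.
        _ => p.frames.push(frame),
    }
    p.state = State::Running;
}
```

Calling `set_process_function` twice on the same process leaves exactly one frame in this model, holding the values from the second call.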
In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substantial so who knows.
The other change is we now set `state.psr` and `state.yield_pc` after every syscall, and not just effectively after yield(). I don't think this has any effect.
### Testing Strategy
This pull request was tested by running a few apps on hail. Given what happened last time I started messing around with this code, more is needed....
### TODO or Help Wanted
We can probably wait to merge this until after context switching is fully working for RISC-V. I _think_ these changes will be sufficient to support RISC-V, but there may be more that is needed. I did want to get this code in front of people, however, since it touches some low-level kernel features.
### Documentation Updated
- [ ] Updated the relevant files in `/docs`, or no updates are required.
### Formatting
- [x] Ran `make formatall`.
Co-authored-by: Brad Campbell <[email protected]>
Co-authored-by: Pat Pannuto <[email protected]>
Build succeeded
1323: RISC-V: Add context switching r=bradjc a=bradjc
### Pull Request Overview
This pull request implements the `UserlandKernelBoundary` trait for RISC-V platforms. This PR is towards tracking issue #1135.
This depends on #1318 so those commits have been included here. That PR should be merged first, and then I can update this one.
While I pulled things together for the PR, @sv2bb did much of this.
### Testing Strategy
This pull request was tested by running blink and c_hello on the arty-e21 FPGA-based board.
### TODO or Help Wanted
See inline comments.
### Documentation Updated
- [x] Updated the relevant files in `/docs`, or no updates are required.
### Formatting
- [x] Ran `make formatall`.
Co-authored-by: Brad Campbell <[email protected]>
1531: Fix counting the number of syscalls a process has called. r=ppannuto a=bradjc
This accidentally got removed when refactoring how context switches are handled. Regression likely from #1318.
### Testing Strategy
I tested this by running the list command with the process console.
### TODO or Help Wanted
n/a
### Documentation Updated
- [x] Updated the relevant files in `/docs`, or no updates are required.
### Formatting
- [x] Ran `make formatall`.
Co-authored-by: Brad Campbell <[email protected]>