@heshamelmatary
Contributor

This is another attempt at addressing various discussions (with @lsf37, @Indanz, and @kent-mcleod) on the RFC and previous PRs. It's a stripped-down version with the following:

  • Only hybrid kernel
  • Enables running unmodified userspace binaries, C, Rust, and hybrid and/or purecap CHERI, all side-by-side simultaneously.
  • Cuts the LoC changes and commits to less than half (compared to [RFC-15] Add experimental CHERI support (hybrid kernel) #1344) in order to ease the review and upstreaming process.
  • Only targets standard CHERI-RISC-V [1]. Further CHERI platforms/archs will be added later if this PR is accepted.
  • No CHERI caps are passed in the IPC buffer and all of its types are kept as is.
  • No CHERI caps are passed during system calls at all.
  • Adds 3 new system calls for CHERI: read/write CHERI registers, and a WriteCapMem to write a CHERI cap to a remote protection domain's memory. The kernel is the only thing that can construct valid CHERI capabilities, and only if passed valid TCB+VSpace seL4 caps.
  • This port has been tested with Microkit on Codasip's QEMU and hardware platforms (x730) [2].

Please note the current standard CHERI-RISC-V is undergoing ARC reviews before being ratified; some things may change. This is a draft PR to resurrect the discussions and to serve as a reference implementation for the RFC.

[1] http://github.com/riscv/riscv-cheri
[2] https://codasip.com/solutions/riscv-processor-safety-security/cheri/x730-risc-v-application-processor/

Contributor

@Indanz Indanz left a comment

Overall much better than the other PR. Main concern is the vptr_t change, that's probably better done explicitly like you did with rword_t, so we can see where it is actually needed.

Edit: Forgot to mention, but WriteCapMem should probably require a page cap and perhaps the caller's vspace cap too. The logic is that if you could map and write to the memory yourself, you're also allowed to write to it via this system call. Without something like this, WriteCapMem would grant access to memory a task is not supposed to have.
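
The authority rule suggested here can be sketched as a simple check. This is a toy model and not seL4 code: `page_cap_model_t`, its fields, and `may_write_cap_mem` are hypothetical names chosen for illustration, and the real check would operate on seL4 frame caps and the target VSpace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a page capability held by the caller. */
typedef struct {
    uintptr_t vaddr;     /* virtual address the page is mapped at */
    uintptr_t size;      /* page size in bytes */
    bool      can_write; /* the cap carries write rights */
    bool      mapped;    /* the page is mapped in the target VSpace */
} page_cap_model_t;

/*
 * Allow the write only if the caller could have mapped and written the page
 * itself, and the capability-sized slot lies entirely inside that page.
 */
static bool may_write_cap_mem(const page_cap_model_t *page,
                              uintptr_t dest, uintptr_t cap_bytes)
{
    if (!page->mapped || !page->can_write) {
        return false;
    }
    if (cap_bytes == 0 || cap_bytes > page->size) {
        return false;
    }
    if (dest % cap_bytes != 0) {
        return false; /* capabilities are stored at capability-aligned addresses */
    }
    if (dest < page->vaddr) {
        return false;
    }
    return dest - page->vaddr <= page->size - cap_bytes;
}
```

With a writable 4 KiB page mapped at 0x1000 and 16-byte capabilities, this model accepts 0x1000 and 0x1ff0 and rejects 0x2000 (outside the page) and 0x1008 (misaligned).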

config.cmake Outdated
Comment on lines 3 to 4
# Copyright 2024, Capabilities Limited
# CHERI support contributed by Capabilities Limited was developed by Hesham Almatary
Contributor

Please remove all of these except for the ones in newly added files, thanks.

If everyone who ever edits a file slightly added their copyright to it, things would explode and become very cumbersome to maintain, even for a small project like seL4. We have git history for this kind of detail.

Contributor Author

Those copyright lines are only intended for new files and files with non-trivial changes, not for every edited file. Some were indeed mistakenly left in this PR after refactoring from previous PRs (hybrid, purecap, Morello, +10 CHERI platforms, etc.), like this file, where there were more non-trivial changes. I've done another iteration for this PR, and now only 13 files have them.

Comment on lines +72 to +71
#define LOAD lc
#define STORE sc
Contributor

No opinion yet on what's better to name differently, the full ones or the integer-only ones. But considering CHERI is the reason for this hassle, it might be clearer to swap them around. We'll see.

But while you're touching all these lines anyway, can't you get rid of or shorten the horrible LOAD_S and STORE_S?

Contributor Author

LOAD/STORE are just intended to load GPRs+CHERI CSRs, depending on the underlying architecture. Integer loads/stores are just used in a couple of places when CHERI is enabled (e.g., to save sstatus), so I just followed the common case.

I'm fine to shorten the macros. Any suggestions? LD/ST, LOAD/STORE, LR/SR?

Comment on lines +99 to +102
#if defined(CONFIG_HAVE_CHERI)
field FSR 12
#else
field FSR 5
padding 7
#endif
Contributor

Any downsides to having FSR always 12 bits? Or are the new bits at the bottom?

Contributor Author

Happy to make it always 12 in the future, but I've been trying to guard CHERI-specific changes when I can and hide it from verification in this PR when building the kernel with CHERI disabled.

Contributor

Fair enough, and I agree with that, that's the right attitude. But I was curious if this can be consolidated without downsides, to know what our options are. Then it could be done to reduce the difference and to avoid future pain. Verification changes for details like this should be very small.


#if defined(CONFIG_HAVE_CHERI)
typedef __uintcap_t rword_t;
typedef __uintcap_t vptr_t;
Contributor

Most vptr_t instances don't need to be CHERI pointers I think, as the kernel itself is still using normal pointers.

But I guess page tables still need to contain valid CHERI pointers even when disabled for kernel mode?

Contributor Author

True for vptr_t, but it didn't hurt. vptr_t only needs to be a capability in these cases:

  1. User's entry point for the root task
  2. User's IPC buffer pointer for the root task
  3. User's BootInfo pointer for the root task

We have two options for the above:

  1. Retype vptr_t to be __uintcap_t as I did in this PR. __uintcap_t is a type that's suggested to be used when the value could be either a capability pointer or an integer pointer, or just a normal integer.
  2. Change the existing types of the above 3 cases to rword_t (or even better, void *__capability, as they'll always need to hold capability pointers).
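
For reference, option 1 amounts to a type split along these lines. This is a minimal sketch built from the typedefs quoted in the diff above; the non-CHERI branch, `N_CONTEXT_REGS_EXAMPLE`, and `user_context_example_t` are illustrative assumptions and not code from the PR.

```c
#include <stdint.h>

typedef unsigned long word_t;   /* plain integer word, unchanged */

#if defined(CONFIG_HAVE_CHERI)
typedef __uintcap_t rword_t;    /* capability-width register word */
typedef __uintcap_t vptr_t;     /* user pointers held as capabilities */
#else
typedef word_t rword_t;         /* assumed: same as word_t without CHERI */
typedef word_t vptr_t;
#endif

/* Hypothetical register context for illustration; real layouts differ. */
#define N_CONTEXT_REGS_EXAMPLE 32
typedef struct {
    rword_t registers[N_CONTEXT_REGS_EXAMPLE];
} user_context_example_t;
```

Under option 2, `vptr_t` would stay an integer type and only the three root-task values would be declared as `rword_t`.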

> But I guess page tables still need to contain valid CHERI pointers even when disabled for kernel mode?

For the kernel, yes. But I'm not sure I understand your question or how it's related to vptr_t.

Contributor

I'd vote for rword_t (or whatever the final name will be).

Using void* for pointers to a different address space than the kernel's is absolutely wrong.

Contributor Author

Using void* for pointers (or pointer types in general, instead of integer types for pointers like pptr_t/vptr_t etc.) is IMO the right approach and better coding practice. It doesn't matter whether it's a different address space here: CHERI protection isn't expected in the kernel, nor are any of the user-level security implications we discussed before; the kernel is trusted never to get hacked, misuse these pointers, or dereference them, exactly as with its current usage of vptr_t. But that's another discussion, and I understand it will break verification if we try to change all current pointer types that use integer types (word_t) to use actual C pointer types (type *), as I did in the purecap kernel.

I'd use void *__user (that you recommended before) for any capability pointer held in the kernel AND is going to be exported to the user, but NEVER will be de-referenced by the kernel. This currently includes the above 3 cases I mentioned, and for the new system calls that construct pointer capabilities on behalf of the user.

Anyway, I tried to change the above 3 use cases to use rword_t and/or void *__user, but it's a bit disruptive, as those are passed down to other functions as well, so I'd have to make more unnecessary changes touching more files and code, which I've been trying to avoid.

exception_t handle_SysCheriWriteRegister(cap_t tcb_cap, word_t *ipc_buffer)
{
cap_t vRootCap;
void *__capability constructed_cap;
Contributor

Better to use rword_t instead of user space pointers in kernel space...

But I'll postpone detailed review for later, this is clearly a quick proof of concept.

Contributor Author

The rule of thumb I follow (suggested by the CHERI programming guide [1]) is to use rword_t/uintcap_t for things that may contain either capabilities or integers, and void *__capability for things that will always need to be capability pointers. Also, all of the builtin CHERI macros used expect void *__capability, so using rword_t would incur more unnecessary casting.

[1] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf

Contributor

They are talking about pointers that are valid within the same address space, not cross address space pointers you are dealing with here!

And you made wrapper functions anyway, they can do any casting necessary. Now you have all these senseless casts to void* by the callers, that's stupid.
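
The point about wrappers can be sketched as follows: the kernel-side value stays an `rword_t`, and the single cast to a capability pointer happens inside the helper. This is a hedged illustration only. `example_cap_address` is a hypothetical name, and the CHERI branch assumes a CHERI LLVM toolchain that provides `__uintcap_t` and `__builtin_cheri_address_get`.

```c
#include <stdint.h>

/* Fallback so the feature test also preprocesses on non-CHERI toolchains. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

typedef unsigned long word_t;

#if __has_feature(capabilities)
typedef __uintcap_t rword_t;
/* Assumed CHERI build: the cast lives here, not at every call site. */
static inline word_t example_cap_address(rword_t r)
{
    return (word_t)__builtin_cheri_address_get((void *__capability)r);
}
#else
typedef word_t rword_t;
/* Without CHERI, a register word is just its integer value. */
static inline word_t example_cap_address(rword_t r)
{
    return r;
}
#endif
```

Callers then pass `rword_t` values straight through, for example `example_cap_address(reg_value)`, with no `void *` cast of their own.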

Comment on lines +261 to +271
asm volatile(
#if defined(CONFIG_HAVE_CHERI)
"modesw.cap \n"
".option push \n"
".option capmode \n"
#endif
"csrr %0, " SSCRATCH "\n"
#if defined(CONFIG_HAVE_CHERI)
".option pop \n"
"modesw.int \n"
#endif
: "="ASM_REG_CONSTR(temp));
Contributor

@midnightveil midnightveil May 30, 2025

You've abstracted the inline assembly with these SSCRATCH and ASM_REG_CONSTR, and with the reg() macros too.

But then all the assembly gets ifdef'd anyway for the mode switches and various things.

Why not just use the appropriate names directly? You're already duplicating most of the assembly, and it would remove the indirection through the macros.

(Maybe other people disagree here, but I don't really see the point. If you didn't have the ifdef cheri in all the places you use assembly then they'd serve a point but it's already different everywhere)

Contributor Author

  1. Less code duplication; you have > 32 LOADs/STOREs in assembly. Not all assembly is ifdef'd, most of it isn't actually.
  2. Ease of future maintenance and less breaking. e.g., when in the future someone needs to change this save/restore assembly, they won't have to keep maintaining two separate ifdef blocks; one for CHERI, and one for non-CHERI, especially if they don't know much about CHERI.
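
To make the trade-off concrete, here is a minimal sketch of how string-valued mnemonic macros compose into assembly text, so that one source line serves both builds. The `lc`/`sc` mnemonics come from the quoted diff; the `REGBYTES` value of 16, the `STR` helper, and `restore_ra_example` are assumptions for illustration. Register operand naming under CHERI is handled by separate macros in the PR and is not modelled here.

```c
#if defined(CONFIG_HAVE_CHERI)
#define LOAD_S   "lc"   /* capability-width load, as in the quoted diff */
#define STORE_S  "sc"   /* capability-width store */
#define REGBYTES 16     /* assumed: 128-bit capabilities on RV64 */
#else
#define LOAD_S   "ld"   /* RV64 integer load */
#define STORE_S  "sd"   /* RV64 integer store */
#define REGBYTES 8
#endif

/* Two-level stringification so REGBYTES expands before being quoted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* One line of a hypothetical context-restore sequence: load slot 1 into ra. */
static const char restore_ra_example[] =
    LOAD_S " ra, (1*" STR(REGBYTES) ")(t0)\n";
```

In a non-CHERI build the string is `ld ra, (1*8)(t0)`, and with `CONFIG_HAVE_CHERI` defined the same line selects `lc` and a 16-byte slot. The alternative argued for in this thread is to write both variants out explicitly under `#if`.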

Contributor

But both come at the price of much less readable assembly, so I agree with @midnightveil. And hiding it behind macros makes people unaware that they are modifying multiple versions at once, so chances are higher that they accidentally break something for CHERI. In addition to normal RISC-V assembly, they also need to know CHERI-specific macros. So the burden is much greater with this mess than with a bit of very straightforward code duplication. And I say that as someone who in general hates code duplication.

@heshamelmatary
Contributor Author

@Indanz

> Overall much better than the other PR. Main concern is the vptr_t change, that's probably better done explicitly like you did with rword_t, so we can see where it is actually needed.

Great to know! vptr_t can change, no issue; it's just an implementation discussion. But generally speaking, it'd be great if you could comment on whether this addresses the higher-level design concerns/blockers you had before on the RFC.

> Edit: Forgot to mention, but WriteCapMem should probably require a page cap and perhaps the caller's vspace cap too. The logic is that if you could map and write to the memory yourself, you're also allowed to write to it via this system call. Without something like this, WriteCapMem would grant access to memory a task is not supposed to have.

Agreed. I think I also suggested passing a page capability to this system call before. I'll experiment with it.

@heshamelmatary
Contributor Author

heshamelmatary commented May 30, 2025 via email

@heshamelmatary
Contributor Author


This is now implemented: the caller passes an extra page cap for the page the capability is to be written to.

@heshamelmatary
Contributor Author

The preprocess failure looks trivial. It'd be interesting to run the verification tests and see if it fails at all.

This change adds a new rword_t type for variables that may hold CHERI
capabilities at any point in the kernel.

rword_t: register word type newly added to *always* hold
capability-width variables in CHERI mode (e.g., for hybrid/purecap
user-space register context).
- In non-CHERI mode, this is just an integer and corresponds to
unsigned long (word_t).
- In any CHERI mode, this is a capability-width type and
corresponds to CHERI's __uintcap_t.

word_t: is used by seL4 as a type that can hold anything (e.g.,
unsigned long), and is the most widely used type for mixed use cases such as
pointers, integers, registers, etc. It's left as it is, to only
represent "integers".

vptr_t: conventionally only holds user pointers. In order to support
both hybrid and purecap CHERI userspace, any user pointers held in
vptr_t need to be capabilities, hence this type is changed to
__uintcap_t when in CHERI-mode.

Signed-off-by: Hesham Almatary <[email protected]>

HW registers have another format and size when CHERI is enabled. This
needs to be able to hold full CHERI HW registers all the time when CHERI
is enabled.

Signed-off-by: Hesham Almatary <[email protected]>

This commit adds core files to manipulate CHERI capabilities and
does a basic port to architecture-independent code to build and run
the kernel in hybrid mode. It also adds shared system support in shared
CMake and C files for supported CHERI architectures.

In hybrid mode, any pointer created by the kernel and passed to the user
needs to be a CHERI capability for purecap users; this includes the
IPC buffer, hence it needs to be manually annotated as a CHERI
capability (with the __capability keyword).

A __user annotation is added and is defined as __capability. Using
__user is suggested for any pointer capability that holds a user address,
for better readability/practice. The kernel never dereferences
those. This is similar to the current kernel's usage of vptr_t,
where it holds integer pointers to the user but the kernel never
dereferences them.

When building with a CHERI toolchain in CHERI mode, the compiler defines
__has_feature(capabilities). The "__has_feature" macro does not exist
in some old GCC toolchains, so a macro is defined to 0 here just for
userspace backward compatibility (e.g., when compiling an seL4 user
program with an old non-CHERI toolchain on a CHERI-enabled seL4 kernel).
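
The fallback described above is the usual feature-test idiom. A minimal sketch follows; `EXAMPLE_HAVE_CHERI_CAPS` and `example_built_with_capabilities` are hypothetical names, and the exact header and spelling used in the PR may differ.

```c
/*
 * Older non-CHERI toolchains may not provide __has_feature, so define it
 * to report every feature as absent there.
 */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(capabilities)
#define EXAMPLE_HAVE_CHERI_CAPS 1   /* CHERI toolchain building in CHERI mode */
#else
#define EXAMPLE_HAVE_CHERI_CAPS 0   /* any other toolchain or mode */
#endif

/* Reports at run time which branch was selected at compile time. */
static inline int example_built_with_capabilities(void)
{
    return EXAMPLE_HAVE_CHERI_CAPS;
}
```

Without the `#ifndef` block, the `#if __has_feature(capabilities)` line would fail to preprocess on a toolchain that does not know the macro.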

Signed-off-by: Hesham Almatary <[email protected]>

@heshamelmatary heshamelmatary force-pushed the std-cheri-riscv branch 2 times, most recently from 893c635 to 7e097df Compare June 11, 2025 16:07

This is an architectural port of the standard CHERI-RISC-V [1], from
RISC-V International. The kernel is hybrid and minimal, but it enables
running the following userspace:
1- Unmodified binaries
2- Unmodified C or other projects (e.g., Rust)
3- Hybrid CHERI (e.g., code that uses void *__capability)
4- Purecap CHERI, for complete spatial memory safety

This port has been tested with Microkit on Codasip's QEMU and hardware
platforms (x730) [2].

[1] https://github.com/riscv/riscv-cheri
[2] https://codasip.com/solutions/riscv-processor-safety-security/cheri

Signed-off-by: Hesham Almatary <[email protected]>

This commit adds 3 new system calls if CHERI is enabled:

1- CheriWriteRegister: To construct/write a TCB's CHERI HW register from
decomposed CHERI capability fields passed from the user as integer
arguments.
2- CheriReadRegister: To read a TCB's CHERI HW register and return it to
the user as decomposed integer fields representing CHERI capability
fields.
3- CheriWriteMemoryCap: To construct/write a CHERI capability to
a TCB/VSpace (e.g., Microkit protection domain) from decomposed CHERI
capability fields passed from the user as integer arguments.

Rules:
- Only the kernel can construct valid CHERI caps
- No tagged CHERI caps are passed via syscall args, IPC buffer, or
syscall ret.
- Valid tagged CHERI caps are constructed only in the following
conditions:
  1- If the user passes *BOTH* valid TCB and VSpace seL4 caps to these
     system calls. This authorises the caller to construct a new tagged
     CHERI cap from the kernel's RootCheriCap.
  2- For CheriWriteRegister, if the src/dest register index is tagged
     *and* unsealed, and no valid VSpace cap is provided. The kernel
     will try to derive a CHERI cap off the destination CHERI HW reg.
     The requested CHERI cap must not violate CHERI rules or increase
     permissions or bounds of the destination CHERI register, otherwise
     an untagged cap will be written.
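
The derivation rule in condition 2 can be modelled in plain C. This is a toy sketch of the monotonicity check only: `cap_model_t` and `derive_cap_model` are hypothetical, the fields are not the real CHERI encoding, and actual hardware performs this check itself when deriving capabilities.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decomposed view of a capability (not the real encoding). */
typedef struct {
    uint64_t base;   /* inclusive lower bound */
    uint64_t top;    /* exclusive upper bound */
    uint32_t perms;  /* permission bit mask */
    bool     tag;    /* validity tag */
    bool     sealed;
} cap_model_t;

/*
 * Derive `req` from `parent`. The result stays tagged only if the parent is
 * tagged and unsealed and the request does not widen bounds or permissions.
 * Otherwise the fields are written with the tag cleared.
 */
static cap_model_t derive_cap_model(cap_model_t parent, cap_model_t req)
{
    bool ok = parent.tag && !parent.sealed
              && req.base <= req.top
              && req.base >= parent.base
              && req.top <= parent.top
              && (req.perms & ~parent.perms) == 0;

    req.tag = ok;
    req.sealed = false;
    return req;
}
```

For a parent covering [0x1000, 0x2000) with permissions 0x7, a request for [0x1100, 0x1200) with 0x3 stays tagged, while a request starting at 0x0800 or asking for permissions 0xf comes back untagged.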

CHERI-aware root task and servers (e.g., Microkit's monitor) must use
these system calls if CHERI is enabled in cases like:
- Creating a new thread and writing its entry point, stack, DDC, etc.
- Passing code, data, rodata CHERI caps to a newly created thread.
- Setting up a new thread's stack that may contain valid CHERI caps. For
example, a POSIX server setting up argv[], auxv[], etc.
- Setting up and writing ELF symbols that contain valid pointers. For
instance, Microkit's memory regions (that have map setvar), Microkit
protection domains' IPC buffer pointer address, etc.

For more details and design discussions, see [1].

[1] http://github.com/seL4/rfcs/pull/21

Signed-off-by: Hesham Almatary <[email protected]>