Goal: Create, schedule, and manage processes. Load and run ELF binaries.
Milestone: Can `clone()`, `exec()` a static ELF binary, and `wait4()` for it. The scheduler time-slices.
Estimated effort: 6-10 weeks (~2,100 lines, the sum of the per-task estimates in the summary table)
Prerequisites: Epic 1 complete (heap works, page mapping works, VMAs tracked)
T2.1 (process struct + PID + wait queue)
├── T2.2 (kernel stack + context switch asm)
│   └── T2.3 (kernel threads + scheduler)
│       └── T2.4 (timer preemption)
├── T2.5 (SYSCALL/SYSRET MSRs + dispatch table) [parallel with T2.2-T2.4]
├── T2.6 (ELF parser + loader) [parallel with T2.2-T2.4]
│   └── T2.7 (user-mode entry + user pointer helpers)
│       └── T2.8 (clone + fork)
│           └── T2.9 (execve)
│               └── T2.10 (exit + wait4 + process groups)
Define the core process data structure, a PID allocator, and a generic wait queue primitive used throughout the kernel.
- INTERFACES.md `Process` trait, `Pid` type
- DECISIONS.md D12 (WaitQueue design), D13 (lock ordering)

- `kernel/src/sched/mod.rs`
- `kernel/src/sched/process.rs`
- `kernel/src/sched/waitqueue.rs`
- `ProcessControlBlock` struct with fields:
  - `pid: Pid`
  - `parent_pid: Option<Pid>`
  - `state: ProcessState` (Ready, Running, Sleeping, Zombie)
  - `address_space: AddressSpace` (from mm)
  - `fd_table: FdTable` (placeholder: just the field, impl in Epic 3)
  - `kernel_stack: VirtAddr`
  - `context: CpuContext` (saved registers for context switch)
  - `exit_code: Option<i32>`
  - `cwd: String` (current working directory, default "/")
  - `pgid: Pid` (process group ID)
  - `sid: Pid` (session ID)
  - `children: Vec<Pid>`
  - `wait_queue: WaitQueue` (for parent waiting on this process)
- PID allocator: simple counter with `AtomicU32`, starting at 1
  - `allocate_pid() -> Pid`
  - PIDs are never reused (for simplicity; wrap at MAX_PID)
- Global process table: `BTreeMap<Pid, Arc<Mutex<ProcessControlBlock>>>`
  - `get_process(pid) -> Option<Arc<Mutex<ProcessControlBlock>>>`
  - `current_process() -> Arc<Mutex<ProcessControlBlock>>`
- `WaitQueue`: generic sleep/wake primitive (DECISIONS.md D12)
  - Used by: pipes (T3.9), wait4 (T2.10), accept (T5.7), nanosleep (T4.7)
  - `sleep()`: add current PID to waiters, set state to Sleeping, call `schedule()`
  - `wake_one()`: pop first waiter, set Ready, enqueue in scheduler
  - `wake_all()`: wake every waiter
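The allocator and wait-queue bookkeeping described above can be sketched on a host machine as follows. This is only an illustration written against `std`: the value of `MAX_PID`, the plain `u32` PID, and the `sleep(pid)` signature that takes the PID explicitly are assumptions of the sketch, and the state change plus `schedule()` call of the real `sleep()` are left out.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the kernel's `Pid` type.
pub type Pid = u32;

/// Assumed upper bound; the spec names MAX_PID but gives no value.
pub const MAX_PID: u32 = 32_768;

/// Monotonic PID counter that starts at 1 and wraps back to 1 after MAX_PID.
pub struct PidAllocator {
    next: AtomicU32,
}

impl PidAllocator {
    pub const fn new() -> Self {
        Self { next: AtomicU32::new(1) }
    }

    /// Returns the current counter value and advances it atomically.
    pub fn allocate_pid(&self) -> Pid {
        self.next
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some(if cur >= MAX_PID { 1 } else { cur + 1 })
            })
            .expect("closure always returns Some")
    }
}

/// Host-side model of the wait queue: only the waiter list is tracked.
pub struct WaitQueue {
    waiters: Mutex<VecDeque<Pid>>,
}

impl WaitQueue {
    pub fn new() -> Self {
        Self { waiters: Mutex::new(VecDeque::new()) }
    }

    /// Records `pid` as a waiter. The kernel version would also mark the
    /// PCB as Sleeping and call schedule(); both are omitted here.
    pub fn sleep(&self, pid: Pid) {
        self.waiters.lock().unwrap().push_back(pid);
    }

    /// Removes and returns the longest-waiting PID, if any.
    pub fn wake_one(&self) -> Option<Pid> {
        self.waiters.lock().unwrap().pop_front()
    }

    /// Removes every waiter and returns how many were woken.
    pub fn wake_all(&self) -> usize {
        let mut waiters = self.waiters.lock().unwrap();
        let count = waiters.len();
        waiters.clear();
        count
    }
}
```

The sketch uses a `VecDeque` so that waking the first waiter is O(1); the interface below keeps a `Vec<Pid>`, which works equally well for short queues.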
// sched/process.rs
use alloc::collections::BTreeMap;
use alloc::sync::Arc;
use alloc::vec::Vec;
use spin::Mutex;
#[repr(C)]
pub struct CpuContext {
pub rsp: u64,
pub rbp: u64,
pub rbx: u64,
pub r12: u64,
pub r13: u64,
pub r14: u64,
pub r15: u64,
pub rip: u64,
pub rflags: u64,
}
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProcessState {
Ready,
Running,
Sleeping,
Zombie,
}
pub struct ProcessControlBlock {
pub pid: Pid,
pub parent_pid: Option<Pid>,
pub state: ProcessState,
pub children: Vec<Pid>,
pub wait_queue: WaitQueue,
// ... other fields
pub context: CpuContext,
}
pub fn allocate_pid() -> Pid;
pub fn current_pid() -> Pid;
pub fn get_process(pid: Pid) -> Option<Arc<Mutex<ProcessControlBlock>>>;
pub fn current_process() -> Arc<Mutex<ProcessControlBlock>>;

// sched/waitqueue.rs
use alloc::vec::Vec;
use spin::Mutex;
pub struct WaitQueue {
waiters: Mutex<Vec<Pid>>,
}
impl WaitQueue {
pub fn new() -> Self;
/// Add current PID to waiters, set Sleeping, call schedule().
/// Caller must NOT hold the scheduler lock (lock ordering: D13).
pub fn sleep(&self);
/// Wake first waiter: pop PID, set Ready, enqueue in scheduler.
pub fn wake_one(&self) -> Option<Pid>;
/// Wake all waiters.
pub fn wake_all(&self) -> usize;
/// Number of waiters currently sleeping.
pub fn len(&self) -> usize;
}

- Unit tests (Tier 1):
  - Allocate 100 PIDs, all unique
  - PCB can be created with all fields
  - Process table insert/lookup works
  - WaitQueue: sleep adds PID to list, wake_one removes it, wake_all clears all
  - WaitQueue: wake_one on empty returns None
- `CpuContext` is `repr(C)` (will be accessed from assembly)
- `ProcessControlBlock` is `Send`

- Process data structures, PID management, `Arc<Mutex<>>` patterns, wait queue primitives
Allocate per-thread kernel stacks and implement the low-level register save/restore for switching between kernel threads.
- `kernel/src/mm/mapper.rs` (from T1.4)
- `kernel/src/mm/frame_allocator.rs` (from T1.2)
- `kernel/src/sched/process.rs` (CpuContext struct from T2.1)

- `kernel/src/sched/stack.rs`
- `kernel/src/arch/x86_64/context.rs`
Kernel Stack Allocation:
- Each kernel thread needs a stack (default 16 KiB = 4 pages)
- Stack region: `0xFFFF_A000_0000_0000` + (stack_id * 0x10000) for guard page spacing
- Guard page: one unmapped page below each stack to catch overflow
- `allocate_kernel_stack(stack_id) -> VirtAddr`: maps 4 pages, returns top of stack (highest address)
- `deallocate_kernel_stack(stack_id)`: unmaps pages, frees frames
Context Switch:
- `switch_context(old: &mut CpuContext, new: &CpuContext)`: naked function with inline asm
- Saves: `rsp`, `rbp`, `rbx`, `r12`, `r13`, `r14`, `r15`, `rflags` to `old`
- Restores the same from `new`
- Changes `rsp` to new thread's stack, jumps to new thread's `rip`
- Uses `#[naked]` function with `core::arch::asm!`
- Does NOT save/restore: `rax` (return value), `rcx`/`rdx`/`rsi`/`rdi`/`r8`-`r11` (caller-saved)
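The address arithmetic behind the stack region can be sketched as below. The region start, the 0x10000 stride and the 4-page size come from the requirements above; placing the guard page at the very start of each slot, and the `StackLayout` and `stack_layout` names, are assumptions of this sketch.

```rust
pub const PAGE_SIZE: u64 = 4096;
pub const KERNEL_STACK_SIZE: u64 = 4 * PAGE_SIZE; // 16 KiB
pub const KERNEL_STACK_REGION_START: u64 = 0xFFFF_A000_0000_0000;
/// Distance between consecutive stack slots.
pub const KERNEL_STACK_STRIDE: u64 = 0x10000;

/// Addresses that describe one stack slot (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct StackLayout {
    /// Unmapped page that turns an overflow into a page fault.
    pub guard_page: u64,
    /// Lowest mapped address of the stack.
    pub bottom: u64,
    /// One past the highest mapped byte; the initial RSP.
    pub top: u64,
}

/// Assumed placement: guard page first, then the four mapped pages.
pub fn stack_layout(stack_id: u64) -> StackLayout {
    let slot = KERNEL_STACK_REGION_START + stack_id * KERNEL_STACK_STRIDE;
    let bottom = slot + PAGE_SIZE;
    StackLayout {
        guard_page: slot,
        bottom,
        top: bottom + KERNEL_STACK_SIZE,
    }
}
```

With this layout each slot uses 5 of its 16 pages, so the unmapped remainder of a slot also separates neighbouring stacks.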
// sched/stack.rs
pub const KERNEL_STACK_SIZE: usize = 4 * PAGE_SIZE; // 16 KiB
pub const KERNEL_STACK_REGION_START: u64 = 0xFFFF_A000_0000_0000;
pub fn allocate_kernel_stack(
stack_id: usize,
mapper: &mut impl PageMapper,
allocator: &mut impl FrameAllocator,
) -> KernelResult<VirtAddr>;
pub fn deallocate_kernel_stack(
stack_id: usize,
mapper: &mut impl PageMapper,
allocator: &mut impl FrameAllocator,
) -> KernelResult<()>;

// arch/x86_64/context.rs
use crate::sched::process::CpuContext;
/// Switch from the current kernel thread to another.
/// SAFETY: Both contexts must be valid. The new context's rsp must point to a valid stack.
#[naked]
pub unsafe extern "C" fn switch_context(
old: *mut CpuContext,
new: *const CpuContext,
);

- Integration test (Tier 3): allocate a stack, write to it, verify guard page fault on overflow (a write below the lowest mapped page)
- Stack addresses are correctly aligned (16-byte aligned for x86_64 ABI)
- `deallocate` frees the frames (free frame count increases)
- Assembly compiles with `#[naked]` and `asm!` on nightly (Intel syntax chosen, documented)
- Integration test (Tier 3): create two contexts, switch between them, verify both run
- No register corruption (each thread sees its own register state)
- `CpuContext` field offsets match the assembly offsets (verified by `offset_of!` in tests)

- Kernel stack layout, guard pages, virtual address space management
- x86_64 inline assembly (`core::arch::asm!`), calling conventions, naked functions

- Reference: System V AMD64 ABI (callee-saved registers: rbx, rbp, r12-r15, rsp)
- Reference: `#[naked]` function requirements (no prologue/epilogue)
Create kernel threads that execute a function and then exit. Implement a round-robin scheduler to select the next runnable process.
- INTERFACES.md `Scheduler` trait
- `kernel/src/sched/process.rs` (from T2.1)
- `kernel/src/sched/stack.rs` (from T2.2)
- `kernel/src/arch/x86_64/context.rs` (from T2.2)
- DECISIONS.md D13 (lock ordering)

- `kernel/src/sched/thread.rs`
- `kernel/src/sched/scheduler.rs`
Kernel Threads:
- `spawn_kernel_thread(name, function)` creates a new PCB with:
  - Allocated kernel stack
  - `CpuContext.rip` pointing to a trampoline function
  - `CpuContext.rsp` pointing to top of kernel stack
- Trampoline: calls the function, then calls `thread_exit()` when it returns
- `thread_exit()`: marks process as Zombie, triggers reschedule
- Kernel thread runs in ring 0, shares kernel address space (no separate page tables)
- The first kernel thread is "idle": loops on `hlt`
Scheduler:
- Implement the `Scheduler` trait from INTERFACES.md
- Simple FIFO run queue using `VecDeque<Pid>`
- `enqueue(pid)`: add to back of queue
- `dequeue()`: remove from front
- `tick()`: decrement time slice counter; return true when it hits 0 (default: 10 ticks = ~100ms at 100Hz PIT)
- `schedule()`: top-level function: save current context, pick next, restore context, switch
- Skip zombie/sleeping processes in the queue

Lock ordering (D13): The scheduler lock is #1 in the global order. Never acquire it while holding a PCB lock or FD table lock. Code that calls `schedule()` must not hold any lock that a wakeup path might need.
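The run-queue and time-slice logic above can be modelled on the host as follows. This is a sketch under assumptions: process states are passed in as a plain `HashMap` instead of being read from the process table, the `Scheduler` trait from INTERFACES.md is not reproduced, and stale queue entries are simply dropped when encountered.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the kernel's `Pid` type.
pub type Pid = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProcessState {
    Ready,
    Running,
    Sleeping,
    Zombie,
}

pub struct RoundRobinScheduler {
    run_queue: VecDeque<Pid>,
    ticks_remaining: u32,
    time_slice: u32,
}

impl RoundRobinScheduler {
    pub fn new(time_slice: u32) -> Self {
        Self { run_queue: VecDeque::new(), ticks_remaining: time_slice, time_slice }
    }

    /// Adds a PID to the back of the FIFO run queue.
    pub fn enqueue(&mut self, pid: Pid) {
        self.run_queue.push_back(pid);
    }

    /// Pops PIDs from the front until one is Ready. Entries that are
    /// zombie, sleeping, or unknown are dropped rather than re-queued.
    pub fn dequeue(&mut self, states: &HashMap<Pid, ProcessState>) -> Option<Pid> {
        while let Some(pid) = self.run_queue.pop_front() {
            if states.get(&pid) == Some(&ProcessState::Ready) {
                return Some(pid);
            }
        }
        None
    }

    /// Called once per timer tick. Returns true when the slice is used up
    /// and resets the counter for the next process.
    pub fn tick(&mut self) -> bool {
        self.ticks_remaining = self.ticks_remaining.saturating_sub(1);
        if self.ticks_remaining == 0 {
            self.ticks_remaining = self.time_slice;
            true
        } else {
            false
        }
    }
}
```

Dropping stale entries on dequeue keeps the queue free of zombies; a sleeping process is re-enqueued by `wake_one()` or `wake_all()` when it becomes Ready again.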
// sched/thread.rs
pub fn spawn_kernel_thread(
name: &str,
entry: fn() -> (),
) -> KernelResult<Pid>;
pub fn thread_exit(exit_code: i32) -> !;

// sched/scheduler.rs
use alloc::collections::VecDeque;
pub struct RoundRobinScheduler {
run_queue: VecDeque<Pid>,
current: Option<Pid>,
ticks_remaining: u32,
time_slice: u32, // ticks per slice, default 10
}
impl Scheduler for RoundRobinScheduler { /* ... */ }
/// Main scheduling entry point. Called from timer interrupt or voluntary yield.
pub fn schedule();
/// Voluntarily yield the CPU to the next runnable process.
pub fn yield_now();

- Integration test (Tier 3): spawn 3 kernel threads, each prints its PID to serial, all 3 run
- Threads exit cleanly (no crash after function returns)
- Thread names appear in process table
- Unit tests (Tier 1):
  - Enqueue PIDs 1,2,3; dequeue returns 1,2,3 in order
  - Empty queue dequeue returns None
  - `tick()` returns true after `time_slice` ticks
  - Zombie PIDs are skipped
- Integration test (Tier 3): 3 kernel threads run round-robin, each gets CPU time
- Kernel thread lifecycle, stack setup for new threads, trampoline functions
- Scheduler design, round-robin algorithm, VecDeque, lock ordering
Wire the PIT timer interrupt to the scheduler for preemptive multitasking.
- `kernel/src/arch/x86_64/interrupts.rs` (timer handler from T0.11)
- `kernel/src/sched/scheduler.rs` (from T2.3)

- Updated `kernel/src/arch/x86_64/interrupts.rs`: timer calls scheduler
- Updated `kernel/src/sched/scheduler.rs`: handle timer-driven reschedule

- Timer interrupt handler calls `scheduler::tick()`
- If `tick()` returns true (time slice expired), call `schedule()`
- Context switch happens within the interrupt handler (save state to current PCB, restore next)
- Interrupts must be disabled during context switch (re-enabled after)
- PIT frequency: ~100Hz (default)
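// Updated timer handler
extern "x86-interrupt" fn timer_interrupt_handler(_stack_frame: InterruptStackFrame) {
    TICK_COUNT.fetch_add(1, Ordering::Relaxed);
    let slice_expired = scheduler::tick();
    // Acknowledge the interrupt before a possible context switch. If schedule()
    // switched away first, the EOI would be delayed until this thread runs again
    // and the PIC would deliver no timer interrupts in the meantime.
    // The PICS guard is a temporary, so the lock is released before schedule().
    unsafe { PICS.lock().notify_end_of_interrupt(InterruptIndex::Timer as u8); }
    if slice_expired {
        scheduler::schedule();
    }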
}

- Integration test (Tier 3): two CPU-bound kernel threads both make progress (neither starves)
- Timer ticks continue during context switch
- No deadlocks from nested locking (PIC lock vs scheduler lock)
- Each thread runs for approximately `time_slice` ticks before preemption
- Preemptive scheduling, interrupt-driven context switches, lock ordering
Configure the x86_64 SYSCALL/SYSRET instructions via MSRs and build the dispatch table routing syscall numbers to handler functions.
- `kernel/src/arch/x86_64/gdt.rs` (from T0.7): need kernel CS/SS selectors
- INTERFACES.md syscall number constants
- DECISIONS.md D5 (interrupts re-enabled inside syscalls)

- `kernel/src/arch/x86_64/syscall.rs`
- `kernel/src/syscall/mod.rs`
- `kernel/src/syscall/misc.rs`
MSR Setup:
- Write to MSRs:
  - `STAR` (0xC0000081): set kernel CS/SS and user CS/SS segment selectors
  - `LSTAR` (0xC0000082): set syscall entry point address
  - `SFMASK` (0xC0000084): set the IF bit in the mask, which clears RFLAGS.IF (disables interrupts) on syscall entry
  - `EFER` (0xC0000080): set SCE bit to enable SYSCALL instruction
- Syscall entry point: a naked function that:
  - `swapgs` to access per-CPU data
  - Save user RSP, switch to kernel stack
  - `sti` to re-enable interrupts (DECISIONS.md D5: blocking syscalls must be interruptible)
  - Save all caller-saved registers
  - Call the Rust syscall dispatcher
  - `cli` before restoring user state
  - Restore registers, `swapgs`, execute `sysretq`
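The `STAR` value is the easiest of these MSRs to get wrong, because SYSRET derives both user selectors from a single base. The sketch below computes the value and the resulting selectors. The GDT order used in the test (kernel code 0x08, kernel data 0x10, user data 0x18, user code 0x20) is an assumption, as are the `star_value` and `sysret_selectors` helper names; the actual selectors come from `gdt.rs`.

```rust
pub const MSR_EFER: u32 = 0xC000_0080;
pub const MSR_STAR: u32 = 0xC000_0081;
pub const MSR_LSTAR: u32 = 0xC000_0082;
pub const MSR_SFMASK: u32 = 0xC000_0084;

/// EFER bit 0: System Call Extensions enable.
pub const EFER_SCE: u64 = 1 << 0;
/// RFLAGS bit 9: interrupt flag. Writing this bit to SFMASK makes
/// SYSCALL clear IF on entry.
pub const RFLAGS_IF: u64 = 1 << 9;

/// Builds the STAR value.
/// Bits 47:32: SYSCALL loads CS from this selector and SS from selector + 8.
/// Bits 63:48: 64-bit SYSRET loads CS from base + 16 and SS from base + 8.
pub fn star_value(kernel_cs: u16, sysret_base: u16) -> u64 {
    ((sysret_base as u64) << 48) | ((kernel_cs as u64) << 32)
}

/// Returns the (CS, SS) selectors that a 64-bit SYSRET ends up loading,
/// including the RPL of 3.
pub fn sysret_selectors(sysret_base: u16) -> (u16, u16) {
    ((sysret_base + 16) | 3, (sysret_base + 8) | 3)
}
```

The consequence for the GDT is that the user data descriptor must sit directly below the user code descriptor, and the kernel data descriptor directly above the kernel code descriptor.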
Dispatch Table:
- Dispatch table: array of `Option<SyscallHandler>` indexed by syscall number
- `dispatch(num, arg1..arg6) -> isize`:
  - Look up handler in table
  - If found, call it, return result
  - If not found, return `-ENOSYS` (-38)
- Implement first stub syscalls (for testing):
  - `sys_getpid()`: return current PID
  - `sys_uname()`: fill in hardcoded utsname struct
  - `sys_write(fd, buf, count)`: if fd==1 or fd==2, write to serial (temporary)
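A host-runnable sketch of the table lookup follows. The `SyscallHandler` signature is inferred from the six-argument handlers used elsewhere in this epic, and the `sys_getpid` body is a placeholder that returns a fixed value; both are assumptions of the sketch.

```rust
/// Assumed handler signature: six register arguments, signed result.
pub type SyscallHandler = fn(usize, usize, usize, usize, usize, usize) -> isize;

pub const ENOSYS: isize = 38;
pub const MAX_SYSCALL: usize = 512;
/// Linux x86_64 syscall number for getpid.
pub const NR_GETPID: usize = 39;

/// Placeholder handler; the real one returns the current PID.
fn sys_getpid(_a1: usize, _a2: usize, _a3: usize, _a4: usize, _a5: usize, _a6: usize) -> isize {
    1
}

/// Builds the table at compile time; unregistered slots stay None.
const fn build_table() -> [Option<SyscallHandler>; MAX_SYSCALL] {
    let mut table: [Option<SyscallHandler>; MAX_SYSCALL] = [None; MAX_SYSCALL];
    table[NR_GETPID] = Some(sys_getpid as SyscallHandler);
    table
}

static SYSCALL_TABLE: [Option<SyscallHandler>; MAX_SYSCALL] = build_table();

/// Routes a syscall number to its handler, or returns -ENOSYS when the
/// number is out of range or has no handler registered.
pub fn dispatch(
    num: usize,
    a1: usize, a2: usize, a3: usize,
    a4: usize, a5: usize, a6: usize,
) -> isize {
    match SYSCALL_TABLE.get(num).copied().flatten() {
        Some(handler) => handler(a1, a2, a3, a4, a5, a6),
        None => -ENOSYS,
    }
}
```

Using `get` instead of direct indexing means a syscall number above 511 from user space yields `-ENOSYS` instead of a kernel panic.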
// arch/x86_64/syscall.rs
/// Initialize SYSCALL/SYSRET MSRs. Call once during boot.
pub fn init();
/// Naked syscall entry point.
/// On entry: RAX = syscall number, RDI/RSI/RDX/R10/R8/R9 = args
/// On exit: RAX = return value
///
/// IMPORTANT: After swapgs + kernel stack switch, execute `sti` to
/// re-enable interrupts before calling dispatch. Execute `cli` before
/// restoring user state and sysretq.
#[naked]
unsafe extern "C" fn syscall_entry();

// syscall/mod.rs
const MAX_SYSCALL: usize = 512;
static SYSCALL_TABLE: [Option<SyscallHandler>; MAX_SYSCALL] = {
let mut table = [None; MAX_SYSCALL];
table[nr::GETPID] = Some(misc::sys_getpid);
table[nr::UNAME] = Some(misc::sys_uname);
table[nr::WRITE] = Some(file::sys_write);
// ... filled in by later tasks
table
};
pub fn dispatch(num: usize, a1: usize, a2: usize, a3: usize,
                a4: usize, a5: usize, a6: usize) -> isize;

- MSR writes don't triple fault (verified by serial print after init)
- Assembly preserves all callee-saved registers
- User RSP is saved before switching to kernel stack
- Interrupts are disabled on entry (SFMASK clears IF), re-enabled after kernel stack switch (`sti`), disabled again before `sysretq` (`cli`)
- `SYSRET` correctly restores user RIP and RFLAGS
- Unknown syscall returns -38 (ENOSYS)
- `sys_getpid()` returns a positive integer
- `sys_write(1, "hello", 5)` outputs "hello" to serial
- Dispatch table has room for 512 entries
- Unit tests (Tier 1): dispatch known number calls handler, unknown returns ENOSYS

- x86_64 MSRs, SYSCALL/SYSRET mechanism, naked functions, inline assembly
- Syscall dispatch, function pointer tables, Linux syscall numbers

- Reference: AMD64 Architecture Manual Vol 2, Section 6.1 (SYSCALL/SYSRET)
- Reference: Intel SDM Vol 2B, SYSCALL instruction
- DECISIONS.md D5 for the `sti`/`cli` placement rationale
Parse ELF64 headers and program headers, then load PT_LOAD segments into a process's address space.
- `kernel/src/mm/mmap.rs` (from T1.10)
- `kernel/src/mm/address_space.rs` (from T1.9)

- `kernel/src/sched/elf.rs`
- `kernel/src/sched/loader.rs`
ELF Parser:
- Parse ELF64 header: verify magic, class (64-bit), data (little-endian), machine (x86_64)
- Parse program headers (PHDR): extract PT_LOAD segments with vaddr, memsz, filesz, offset, flags
- Parse entry point address
- Return structured data, not raw bytes
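The header checks listed above can be sketched as a small bounds-checked parser. The field offsets are those of the ELF64 file header; the `ElfError` enum and the `parse_header` name are hypothetical stand-ins for whatever `KernelResult` and `parse_elf` use, and program headers are not parsed here.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ElfError {
    Truncated,
    BadMagic,
    NotElf64,
    NotLittleEndian,
    WrongMachine,
}

#[derive(Debug, PartialEq)]
pub struct ElfHeader {
    pub entry_point: u64,
    pub phdr_offset: u64,
    pub phdr_count: u16,
    pub phdr_size: u16,
}

const EHDR_SIZE: usize = 64; // size of the ELF64 file header
const ELFCLASS64: u8 = 2;
const ELFDATA2LSB: u8 = 1;
const EM_X86_64: u16 = 0x3E;

fn u16_at(data: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([data[off], data[off + 1]])
}

fn u64_at(data: &[u8], off: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&data[off..off + 8]);
    u64::from_le_bytes(bytes)
}

/// Validates the identification bytes and machine type, then extracts
/// the fields the loader needs. Never indexes past the checked length.
pub fn parse_header(data: &[u8]) -> Result<ElfHeader, ElfError> {
    if data.len() < EHDR_SIZE {
        return Err(ElfError::Truncated);
    }
    if !data.starts_with(b"\x7fELF") {
        return Err(ElfError::BadMagic);
    }
    if data[4] != ELFCLASS64 {
        return Err(ElfError::NotElf64);
    }
    if data[5] != ELFDATA2LSB {
        return Err(ElfError::NotLittleEndian);
    }
    if u16_at(data, 18) != EM_X86_64 {
        return Err(ElfError::WrongMachine);
    }
    Ok(ElfHeader {
        entry_point: u64_at(data, 24), // e_entry
        phdr_offset: u64_at(data, 32), // e_phoff
        phdr_size: u16_at(data, 54),   // e_phentsize
        phdr_count: u16_at(data, 56),  // e_phnum
    })
}
```

Checking the length first is what makes truncated input an error instead of a panic, since every later index stays below 64.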
ELF Loader:
- `load_elf(elf_data, address_space) -> KernelResult<LoadedElf>`:
  - Parse ELF headers (using parser above)
  - For each PT_LOAD segment:
    - Calculate page-aligned vaddr and size
    - Map pages with appropriate flags (R/W/X -> PageTableFlags)
    - Copy file data (filesz bytes from elf_data at offset)
    - Zero-fill remainder (memsz - filesz) for BSS
  - Set up user stack (8 MiB, at top of user address space, e.g., `0x7FFF_FFFF_F000` downward)
  - Return entry point address
- Stack setup includes `argc`, `argv`, `envp`, `auxv` on the stack (Linux ABI)
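The per-segment arithmetic in the loader steps above (page alignment, copy length, BSS zero-fill) can be sketched as a pure function. `SegmentPlan` and `plan_segment` are hypothetical names for this illustration; the actual mapping and copying through `AddressSpace` are not shown.

```rust
pub const PAGE_SIZE: u64 = 4096;

pub const PF_X: u32 = 0x1;
pub const PF_W: u32 = 0x2;
pub const PF_R: u32 = 0x4;

/// What the loader has to do for one PT_LOAD segment (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct SegmentPlan {
    /// Page-aligned start of the mapping.
    pub map_start: u64,
    /// Bytes to map; always a multiple of PAGE_SIZE.
    pub map_len: u64,
    /// Bytes copied from the ELF image (filesz).
    pub copy_len: u64,
    /// Bytes zero-filled after the copied data (memsz - filesz, the BSS).
    pub zero_len: u64,
    pub writable: bool,
    pub executable: bool,
}

/// Returns None for malformed segments (memsz < filesz or address overflow).
pub fn plan_segment(vaddr: u64, filesz: u64, memsz: u64, flags: u32) -> Option<SegmentPlan> {
    if memsz < filesz {
        return None;
    }
    let end = vaddr.checked_add(memsz)?;
    // Round the start down and the end up to page boundaries.
    let map_start = vaddr & !(PAGE_SIZE - 1);
    let map_end = end.checked_add(PAGE_SIZE - 1)? & !(PAGE_SIZE - 1);
    Some(SegmentPlan {
        map_start,
        map_len: map_end - map_start,
        copy_len: filesz,
        zero_len: memsz - filesz,
        writable: flags & PF_W != 0,
        executable: flags & PF_X != 0,
    })
}
```

Using checked arithmetic matters here because `vaddr` and `memsz` come straight from an untrusted file.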
// sched/elf.rs
#[derive(Debug)]
pub struct ElfHeader {
pub entry_point: u64,
pub phdr_offset: u64,
pub phdr_count: u16,
pub phdr_size: u16,
}
#[derive(Debug)]
pub struct ProgramHeader {
pub segment_type: SegmentType,
pub offset: u64, // offset in file
pub vaddr: u64, // virtual address
pub filesz: u64, // size in file
pub memsz: u64, // size in memory (may be > filesz for BSS)
pub flags: SegmentFlags, // read/write/execute
pub align: u64,
}
#[derive(Debug, PartialEq)]
pub enum SegmentType { Null, Load, Dynamic, Interp, Note, Other(u32) }
bitflags::bitflags! {
pub struct SegmentFlags: u32 {
const EXECUTE = 0x1;
const WRITE = 0x2;
const READ = 0x4;
}
}
pub fn parse_elf(data: &[u8]) -> KernelResult<(ElfHeader, Vec<ProgramHeader>)>;

// sched/loader.rs
pub struct LoadedElf {
pub entry_point: VirtAddr,
pub stack_top: VirtAddr,
pub brk_start: VirtAddr, // end of loaded segments (for brk syscall)
}
pub fn load_elf(
elf_data: &[u8],
address_space: &mut AddressSpace,
argv: &[&str],
envp: &[&str],
allocator: &mut impl FrameAllocator,
) -> KernelResult<LoadedElf>;

- Unit tests (Tier 1), critical, agent must write these:
  - Parse a minimal valid ELF64 header (construct bytes manually or embed a small binary)
  - Reject ELF32 (wrong class)
  - Reject non-x86_64 machine
  - Parse 3 program headers, verify vaddr/filesz/memsz/flags
  - Handle truncated input gracefully (error, not panic)
  - Entry point extracted correctly
  - BSS detection: `memsz > filesz` -> zero-fill gap
- A minimal static ELF binary (write syscall + exit) loads and the entry point is correct
- Text segment is mapped as read + execute (not writable)
- Data/BSS segment is mapped as read + write
- Stack is mapped as read + write with guard page below
- `argv`, `envp`, `auxv` are on the stack in Linux ABI format
- Unit test (Tier 1): mock address space, verify correct mapping calls
- ELF64 format, binary parsing, bitflags
- ELF loading, Linux process startup (auxv), page flag mapping
- Reference: ELF64 spec (Tool Interface Standard, Chapter 4)
Drop from kernel mode (ring 0) to user mode (ring 3) to execute a loaded binary. Provide safe helpers for copying data between kernel and user address spaces.
- `kernel/src/arch/x86_64/gdt.rs` (user code/data selectors)
- `kernel/src/sched/loader.rs` (from T2.6)
- DECISIONS.md D8 (user pointer validation via page table walk)

- `kernel/src/arch/x86_64/usermode.rs`
- `kernel/src/mm/user_access.rs`
User-Mode Entry:
- Add user code segment and user data segment to GDT (if not already present)
- `enter_usermode(entry_point, user_stack_top)` (never returns):
  - Push SS (user data selector | RPL 3)
  - Push RSP (user stack top)
  - Push RFLAGS (with IF set: interrupts enabled in user mode)
  - Push CS (user code selector | RPL 3)
  - Push RIP (entry point)
  - Execute `iretq`
- This is a one-way transition: kernel re-enters via syscall or interrupt
User Pointer Helpers (DECISIONS.md D8):
- `copy_from_user(dst, user_src, len)`: copy bytes from user space to kernel buffer
- `copy_to_user(user_dst, src)`: copy bytes from kernel buffer to user space
- `copy_string_from_user(user_ptr, max_len)`: copy a null-terminated string from user space
- Validation: walk page tables to check that all pages in the range are mapped with USER flag
- Return `Err(EFAULT)` if any page is unmapped or lacks USER flag
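The validation walk can be sketched against a mock page table, in the spirit of the Tier 1 tests for this task. The `MockPageTable` map, the `PageFlags` struct and the `need_write` parameter (used for the `copy_to_user` direction) are assumptions of this sketch, not the kernel's page table API.

```rust
use std::collections::BTreeMap;

pub const PAGE_SIZE: usize = 4096;
pub const EFAULT: isize = 14;

#[derive(Debug, Clone, Copy)]
pub struct PageFlags {
    pub present: bool,
    pub user: bool,
    pub writable: bool,
}

/// Mock page table: virtual page number -> flags (hypothetical).
pub type MockPageTable = BTreeMap<usize, PageFlags>;

/// Checks every page touched by [addr, addr + len). Returns Err(EFAULT) if
/// a page is unmapped, lacks the USER flag, or (when `need_write` is set)
/// is not writable. An overflowing range is rejected as well.
pub fn validate_user_range(
    pt: &MockPageTable,
    addr: usize,
    len: usize,
    need_write: bool,
) -> Result<(), isize> {
    if len == 0 {
        return Ok(());
    }
    // Address of the last byte; overflow means the range wraps around.
    let last = addr.checked_add(len - 1).ok_or(EFAULT)?;
    for page in (addr / PAGE_SIZE)..=(last / PAGE_SIZE) {
        match pt.get(&page) {
            Some(f) if f.present && f.user && (!need_write || f.writable) => {}
            _ => return Err(EFAULT),
        }
    }
    Ok(())
}
```

Validating the whole range before copying any byte keeps a failed copy from leaving a partially written buffer behind.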
// arch/x86_64/usermode.rs
/// Transition to user mode. Never returns.
/// SAFETY: entry_point must be a valid user-space instruction address.
/// stack_top must be a valid user-space stack address.
pub unsafe fn enter_usermode(entry_point: VirtAddr, stack_top: VirtAddr) -> !;

// mm/user_access.rs
/// Copy `len` bytes from user-space `user_src` into kernel `dst`.
/// Returns EFAULT if any page in the range is not mapped with USER flag.
pub fn copy_from_user(dst: &mut [u8], user_src: usize, len: usize) -> KernelResult<()>;
/// Copy `src` bytes to user-space `user_dst`.
/// Returns EFAULT if any page in the range is not mapped with USER flag.
pub fn copy_to_user(user_dst: usize, src: &[u8]) -> KernelResult<()>;
/// Copy a null-terminated string from user-space, up to `max_len` bytes.
/// Returns EFAULT if any page is not mapped with USER flag.
/// Returns ENAMETOOLONG if no null terminator found within max_len.
pub fn copy_string_from_user(user_ptr: usize, max_len: usize) -> KernelResult<String>;

- After `enter_usermode`, code runs in ring 3 (verified by attempting a privileged instruction -> GPF)
- `SYSCALL` from user mode enters kernel (via T2.5 MSR setup)
- Interrupts work in user mode (timer still fires)
- Integration test (Tier 3): enter user mode, execute `syscall` for `sys_write`, verify output on serial
- `copy_from_user` succeeds for valid mapped user pages
- `copy_from_user` returns EFAULT for unmapped or kernel-only pages
- `copy_to_user` succeeds for valid writable user pages
- `copy_string_from_user` stops at null terminator, returns ENAMETOOLONG if none found
- Unit tests (Tier 1): mock page tables, verify validation logic

- Ring transitions, `iretq`, user/kernel mode, segment selectors
- Page table walking, user pointer validation
Implement sys_clone() as the primary process-creation syscall, with sys_fork() as a thin wrapper. Deep copy address space (no CoW).
- `kernel/src/sched/process.rs` (from T2.1)
- `kernel/src/mm/address_space.rs` (from T1.9)
- `kernel/src/mm/page_table.rs` (from T1.3)
- DECISIONS.md D1 (deep copy, no CoW), D2 (clone is the primitive)

- `kernel/src/syscall/process.rs`
- `sys_clone(flags, child_stack, ptid, ctid, tls)` is the real primitive:
  - Allocate new PID
  - Create new PCB, copy parent's fields
  - Deep copy parent's address space: allocate new frames, copy page contents (DECISIONS.md D1)
  - If `child_stack != 0`, use it as child's user RSP; otherwise clone parent's RSP
  - Copy file descriptor table (all open FDs shared via `Arc` clone)
  - Set child's return value to 0 (RAX = 0)
  - Set parent's return value to child PID
  - Add child to scheduler run queue
- `sys_fork()` = `sys_clone(SIGCHLD, 0, 0, 0, 0)`: a simple wrapper
- Child is `Ready`, parent continues `Running`
- Child inherits parent's pgid and sid
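The register and address-space handling in the clone steps above can be modelled with plain data structures. Everything here is a toy: `AddressSpace` is a map of page contents instead of page tables, the `Pcb` holds only the fields relevant to clone, and `clone_process` is a hypothetical helper, not the `sys_clone` entry point.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the kernel's `Pid` type.
pub type Pid = u32;

/// Toy address space: virtual page number -> page contents.
#[derive(Clone, Debug, PartialEq)]
pub struct AddressSpace {
    pub pages: BTreeMap<u64, Vec<u8>>,
}

#[derive(Clone, Debug)]
pub struct Pcb {
    pub pid: Pid,
    pub parent_pid: Option<Pid>,
    pub pgid: Pid,
    pub sid: Pid,
    pub user_rsp: u64,
    /// Syscall return value seen when the process returns to user mode.
    pub rax: u64,
    pub address_space: AddressSpace,
}

/// Builds the child PCB and sets both return values.
pub fn clone_process(parent: &mut Pcb, child_pid: Pid, child_stack: u64) -> Pcb {
    let child = Pcb {
        pid: child_pid,
        parent_pid: Some(parent.pid),
        pgid: parent.pgid, // inherited
        sid: parent.sid,   // inherited
        user_rsp: if child_stack != 0 { child_stack } else { parent.user_rsp },
        rax: 0, // the child observes 0
        // Deep copy: cloning the map duplicates every page buffer (no CoW).
        address_space: parent.address_space.clone(),
    };
    parent.rax = child_pid as u64; // the parent observes the child's PID
    child
}
```

In the kernel the deep copy allocates a fresh frame per mapped page; the `Vec<u8>` clone plays that role in this model.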
// syscall/process.rs
/// Primary process creation. musl libc calls this, not fork.
pub fn sys_clone(
flags: usize, // SIGCHLD | CLONE_* flags
child_stack: usize, // 0 = clone parent stack pointer
ptid: usize, // parent TID pointer (unused for now)
ctid: usize, // child TID pointer (unused for now)
tls: usize, // TLS base (unused for now)
_a6: usize,
) -> isize;
/// Convenience wrapper: sys_clone(SIGCHLD, 0, 0, 0, 0).
pub fn sys_fork(
_a1: usize, _a2: usize, _a3: usize,
_a4: usize, _a5: usize, _a6: usize,
) -> isize;

- After clone, child gets PID > parent PID
- Child and parent have separate address spaces (write in child doesn't affect parent)
- Child's RAX = 0, parent's RAX = child PID
- Both parent and child are scheduled
- `sys_fork()` produces identical behavior to `sys_clone(SIGCHLD, 0, 0, 0, 0)`
- `sys_clone` with non-zero `child_stack` sets child RSP correctly
- Integration test (Tier 3): clone, child writes to serial, parent waits
- Process creation, address space duplication, register manipulation for child return value
- clone() flags, Linux clone vs fork semantics
Replace the current process's address space with a new ELF binary.
- `kernel/src/sched/loader.rs` (from T2.6)
- `kernel/src/syscall/process.rs` (from T2.8)
- `kernel/src/mm/user_access.rs` (from T2.7)

- Updated `kernel/src/syscall/process.rs`: add sys_execve
- `sys_execve(path, argv, envp)`:
  - Copy path from user space using `copy_string_from_user` (from T2.7)
  - Copy argv/envp arrays from user space using `copy_from_user`
  - Read ELF data from the filesystem (or initramfs for now) and validate its headers before any teardown, so that a missing file or an invalid ELF can still return -ENOENT or -ENOEXEC to the caller
  - Destroy current address space (unmap all user pages, free frames)
  - Load new ELF into fresh address space (using T2.6 loader)
  - Set up new stack with argv, envp, auxv
  - Reset signal handlers to defaults
  - Close FDs with CLOEXEC flag
  - Set instruction pointer to new entry point
  - Does NOT return to caller: jumps to new program
- Path resolution deferred to Epic 3: for now, look up filename in a hardcoded initramfs table
pub fn sys_execve(
path_ptr: usize, // user pointer to null-terminated path
argv_ptr: usize, // user pointer to null-terminated array of string pointers
envp_ptr: usize, // user pointer to null-terminated array of string pointers
_a4: usize, _a5: usize, _a6: usize,
) -> isize; // only returns on error (negative errno)

- After execve, new program runs (different code at new entry point)
- Old address space is fully freed (frame count increases)
- Stack has correct argv/envp layout
- Invalid path returns -ENOENT
- Invalid ELF returns -ENOEXEC
- User pointers are validated via copy_from_user/copy_string_from_user (no raw dereference)
- execve semantics, address space teardown/rebuild, argv marshaling
Implement process termination, parent notification via wait queues, and minimal process group/session management for job control.
- `kernel/src/sched/process.rs` (from T2.1)
- `kernel/src/sched/scheduler.rs` (from T2.3)
- `kernel/src/sched/waitqueue.rs` (from T2.1)
- DECISIONS.md D13 (lock ordering)

- Updated `kernel/src/syscall/process.rs`: add sys_exit, sys_exit_group, sys_wait4
- Updated `kernel/src/syscall/process.rs`: add setpgid, getpgid, setsid, getpid, getppid
exit:
- `sys_exit(status)`:
  - Set process state to Zombie
  - Store exit status
  - Release address space (unmap all, free frames)
  - Close all file descriptors
  - Reparent children to PID 1 (init)
  - Wake parent via `parent.wait_queue.wake_one()` (using WaitQueue from T2.1)
  - Call scheduler to pick next process
wait4:
- `sys_wait4(pid, status_ptr, options)`:
  - If `pid == -1`: wait for any child
  - If `pid > 0`: wait for specific child
  - If child is Zombie: reap it (remove from process table), store status via `copy_to_user`, return child PID
  - If child is still running: `self.wait_queue.sleep()` until child exits
  - `WNOHANG` option: return 0 immediately if no zombie child
Process Groups:
- `sys_setpgid(pid, pgid)`: set process group ID
- `sys_getpgid(pid)`: get process group ID
- `sys_setsid()`: create new session, process becomes session leader + process group leader
- `sys_getpid()`, `sys_getppid()`: return PID and parent PID
- Default: child inherits parent's pgid and sid on fork/clone

Lock ordering (D13): In exit path, acquire scheduler lock first, then process table lock, then individual PCB lock. In wait4 path, the `WaitQueue.sleep()` call invokes `schedule()`, so the caller must not hold any PCB locks when calling `sleep()`. Acquire PCB lock only to check state, release it before sleeping.
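The decision logic of one `wait4` pass can be sketched as a pure function over the caller's children. The `WaitOutcome` enum and `wait4_step` name are hypothetical, the `NoChildren` case (which Linux reports as ECHILD) is an assumption not stated in the requirements, and the status encoding `(code & 0xff) << 8` is the Linux convention for a normal exit, assumed here.

```rust
use std::collections::BTreeMap;

/// Signed so that -1 can mean "any child".
pub type Pid = i32;

/// Linux value of the WNOHANG option bit.
pub const WNOHANG: usize = 1;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Running,
    /// Exited with this code and not yet reaped.
    Zombie(i32),
}

pub struct Child {
    pub state: State,
}

/// Result of a single pass over the children (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum WaitOutcome {
    /// A zombie was removed; `status` goes to user space via copy_to_user.
    Reaped { pid: Pid, status: i32 },
    /// WNOHANG was set and no child has exited: the syscall returns 0.
    NoZombieYet,
    /// Caller must drop all PCB locks, call wait_queue.sleep(), then retry.
    MustSleep,
    /// No matching child exists at all.
    NoChildren,
}

pub fn wait4_step(children: &mut BTreeMap<Pid, Child>, pid: Pid, options: usize) -> WaitOutcome {
    let candidates: Vec<Pid> = children
        .keys()
        .copied()
        .filter(|&c| pid == -1 || c == pid)
        .collect();
    if candidates.is_empty() {
        return WaitOutcome::NoChildren;
    }
    for c in candidates {
        let state = children[&c].state;
        if let State::Zombie(code) = state {
            children.remove(&c); // reap: the PID leaves the table
            return WaitOutcome::Reaped { pid: c, status: (code & 0xff) << 8 };
        }
    }
    if options & WNOHANG != 0 {
        WaitOutcome::NoZombieYet
    } else {
        WaitOutcome::MustSleep
    }
}
```

Returning `MustSleep` instead of sleeping inside the function mirrors the lock-ordering rule: the state check happens under the lock, and the sleep happens after the lock has been released.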
pub fn sys_exit(status: usize, _a2: usize, _a3: usize,
_a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_exit_group(status: usize, _a2: usize, _a3: usize,
_a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_wait4(pid: usize, status_ptr: usize, options: usize,
_a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_setpgid(pid: usize, pgid: usize,
_a3: usize, _a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_getpgid(pid: usize, _a2: usize, _a3: usize,
_a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_setsid(_a1: usize, _a2: usize, _a3: usize,
_a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_getpid(_a1: usize, _a2: usize, _a3: usize,
_a4: usize, _a5: usize, _a6: usize) -> isize;
pub fn sys_getppid(_a1: usize, _a2: usize, _a3: usize,
                   _a4: usize, _a5: usize, _a6: usize) -> isize;

- Process exits and becomes zombie
- Parent `wait4` reaps zombie and gets exit status
- `wait4(-1, ...)` returns first available zombie child
- `WNOHANG` returns 0 when no zombie children
- Orphaned children are reparented to PID 1
- `wait4` uses WaitQueue to sleep (not busy-poll)
- `getpid` returns current PID
- `getppid` returns parent PID (or 0 for init)
- `setsid` creates new session (sid = pid = pgid)
- `setpgid` changes process group of self or child
- Integration test (Tier 3): fork, child exits with status 42, parent waits and reads 42
- Unit tests (Tier 1) for pgid/sid logic
- Process lifecycle, zombie reaping, wait queues, parent-child relationships
- POSIX process groups, sessions, job control basics, lock ordering
| Task | Name | ~Lines | Deps | Parallelizable With |
|---|---|---|---|---|
| T2.1 | Process struct + PID + WaitQueue | 250 | Epic 1 | — |
| T2.2 | Kernel stack + context switch | 180 | T2.1 | T2.5, T2.6 |
| T2.3 | Kernel threads + scheduler | 270 | T2.2 | T2.5, T2.6 |
| T2.4 | Timer preemption | 60 | T2.3 | T2.5, T2.6 |
| T2.5 | SYSCALL/SYSRET + dispatch | 240 | Epic 1, T0.7 | T2.2-T2.4 |
| T2.6 | ELF parser + loader | 380 | Epic 1 (alloc) | T2.2-T2.5 |
| T2.7 | User-mode entry + user pointers | 150 | T2.6, T2.5 | — |
| T2.8 | clone() + fork() | 200 | T2.7, T2.3 | — |
| T2.9 | execve() | 150 | T2.8, T2.6 | — |
| T2.10 | exit + wait4 + process groups | 260 | T2.8 | — |
| Total | | ~2,140 | | |