Epic 2: Process Management — "It Runs Programs"

Goal: Create, schedule, and manage processes. Load and run ELF binaries.

Milestone: Can clone(), exec() a static ELF binary, and wait4() for it. The scheduler time-slices.

Estimated effort: 6-10 weeks (~2,100 lines, the sum of the per-task estimates in the Epic 2 Summary)

Prerequisites: Epic 1 complete (heap works, page mapping works, VMAs tracked)

Dependency Graph

T2.1 (process struct + PID + wait queue)
 ├── T2.2 (kernel stack + context switch asm)
 │    └── T2.3 (kernel threads + scheduler)
 │         └── T2.4 (timer preemption)
 ├── T2.5 (SYSCALL/SYSRET MSRs + dispatch table) [parallel with T2.2-T2.4]
 ├── T2.6 (ELF parser + loader) [parallel with T2.2-T2.4]
 │    └── T2.7 (user-mode entry + user pointer helpers)
 │         └── T2.8 (clone + fork)
 │              └── T2.9 (execve)
 │                   └── T2.10 (exit + wait4 + process groups)

T2.1: Process Struct + PID Allocator + Wait Queue

Objective

Define the core process data structure, a PID allocator, and a generic wait queue primitive used throughout the kernel.

Context Files

  • INTERFACES.md Process trait, Pid type
  • DECISIONS.md D12 (WaitQueue design), D13 (lock ordering)

Output Files

  • kernel/src/sched/mod.rs
  • kernel/src/sched/process.rs
  • kernel/src/sched/waitqueue.rs

Requirements

  • ProcessControlBlock struct with fields:
    • pid: Pid
    • parent_pid: Option<Pid>
    • state: ProcessState (Ready, Running, Sleeping, Zombie)
    • address_space: AddressSpace (from mm)
    • fd_table: FdTable (placeholder — just the field, impl in Epic 3)
    • kernel_stack: VirtAddr
    • context: CpuContext (saved registers for context switch)
    • exit_code: Option<i32>
    • cwd: String (current working directory, default "/")
    • pgid: Pid (process group ID)
    • sid: Pid (session ID)
    • children: Vec<Pid>
    • wait_queue: WaitQueue (this process sleeps here in wait4; an exiting child wakes it via parent.wait_queue)
  • PID allocator: simple counter with AtomicU32, starting at 1
    • allocate_pid() -> Pid
    • PIDs are not recycled when a process exits (for simplicity); the counter only wraps once it reaches MAX_PID
  • Global process table: BTreeMap<Pid, Arc<Mutex<ProcessControlBlock>>>
    • get_process(pid) -> Option<Arc<Mutex<ProcessControlBlock>>>
    • current_process() -> Arc<Mutex<ProcessControlBlock>>
  • WaitQueue — generic sleep/wake primitive (DECISIONS.md D12):
    • Used by: pipes (T3.9), wait4 (T2.10), accept (T5.7), nanosleep (T4.7)
    • sleep() — add current PID to waiters, set state to Sleeping, call schedule()
    • wake_one() — pop first waiter, set Ready, enqueue in scheduler
    • wake_all() — wake every waiter
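
The PID allocator requirement above can be sketched as follows. This is a minimal illustration rather than the project's implementation: it uses std atomics so it runs on a host (the kernel would use core::sync::atomic), and the MAX_PID value and the PidAllocator wrapper type are assumptions, since the document does not define them.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

pub type Pid = u32;

/// Hypothetical upper bound; the real MAX_PID is not specified in this document.
pub const MAX_PID: u32 = 32_768;

/// Hypothetical wrapper type around the AtomicU32 counter.
pub struct PidAllocator {
    next: AtomicU32,
}

impl PidAllocator {
    /// The counter starts at 1, as required.
    pub const fn new() -> Self {
        Self { next: AtomicU32::new(1) }
    }

    /// Return the next PID. Once MAX_PID has been handed out, the counter
    /// wraps back to 1. No check is made that a wrapped PID is still in use.
    pub fn allocate_pid(&self) -> Pid {
        // fetch_update retries until its compare-exchange succeeds and
        // returns the previous value, so concurrent callers get distinct PIDs.
        self.next
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some(if cur >= MAX_PID { 1 } else { cur + 1 })
            })
            .expect("closure always returns Some")
    }
}
```

A wrapped counter can collide with a live PID, so a fuller version would probably consult the process table before returning.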

Interface Contract

// sched/process.rs
use alloc::sync::Arc;
use spin::Mutex;
use alloc::collections::BTreeMap;

#[repr(C)]
pub struct CpuContext {
    pub rsp: u64,
    pub rbp: u64,
    pub rbx: u64,
    pub r12: u64,
    pub r13: u64,
    pub r14: u64,
    pub r15: u64,
    pub rip: u64,
    pub rflags: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProcessState {
    Ready,
    Running,
    Sleeping,
    Zombie,
}

pub struct ProcessControlBlock {
    pub pid: Pid,
    pub parent_pid: Option<Pid>,
    pub state: ProcessState,
    pub children: Vec<Pid>,
    pub wait_queue: WaitQueue,
    // ... other fields
    pub context: CpuContext,
}

pub fn allocate_pid() -> Pid;
pub fn current_pid() -> Pid;
pub fn get_process(pid: Pid) -> Option<Arc<Mutex<ProcessControlBlock>>>;
pub fn current_process() -> Arc<Mutex<ProcessControlBlock>>;
// sched/waitqueue.rs
use alloc::vec::Vec;
use spin::Mutex;

pub struct WaitQueue {
    waiters: Mutex<Vec<Pid>>,
}

impl WaitQueue {
    pub fn new() -> Self;

    /// Add current PID to waiters, set Sleeping, call schedule().
    /// Caller must NOT hold the scheduler lock (lock ordering: D13).
    pub fn sleep(&self);

    /// Wake first waiter: pop PID, set Ready, enqueue in scheduler.
    pub fn wake_one(&self) -> Option<Pid>;

    /// Wake all waiters.
    pub fn wake_all(&self) -> usize;

    /// Number of waiters currently sleeping.
    pub fn len(&self) -> usize;
}
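
To make the WaitQueue bookkeeping concrete, here is a host-runnable sketch of the queue half of the contract. It is an assumption-laden model: it uses std::sync::Mutex instead of spin::Mutex, and the hypothetical enqueue_waiter() stands in for sleep(), which in the kernel would also mark the PCB Sleeping and call schedule().

```rust
use std::sync::Mutex;

pub type Pid = u32;

pub struct WaitQueue {
    waiters: Mutex<Vec<Pid>>,
}

impl WaitQueue {
    pub fn new() -> Self {
        Self { waiters: Mutex::new(Vec::new()) }
    }

    /// Hypothetical helper: the bookkeeping part of sleep().
    /// The real sleep() would then set state to Sleeping and call schedule().
    pub fn enqueue_waiter(&self, pid: Pid) {
        self.waiters.lock().unwrap().push(pid);
    }

    /// Remove and return the oldest waiter (FIFO), or None if empty.
    /// The kernel version would also set it Ready and enqueue it in the scheduler.
    pub fn wake_one(&self) -> Option<Pid> {
        let mut w = self.waiters.lock().unwrap();
        if w.is_empty() { None } else { Some(w.remove(0)) }
    }

    /// Drain every waiter and report how many were woken.
    pub fn wake_all(&self) -> usize {
        // Take the list while holding the lock; the lock is released at the
        // end of this statement, before any per-waiter work would happen.
        let drained: Vec<Pid> = std::mem::take(&mut *self.waiters.lock().unwrap());
        drained.len()
    }

    pub fn len(&self) -> usize {
        self.waiters.lock().unwrap().len()
    }
}
```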

Acceptance Criteria

  1. Unit tests (Tier 1):
    • Allocate 100 PIDs, all unique
    • PCB can be created with all fields
    • Process table insert/lookup works
    • WaitQueue: sleep adds PID to list, wake_one removes it, wake_all clears all
    • WaitQueue: wake_one on empty returns None
  2. CpuContext is repr(C) (will be accessed from assembly)
  3. ProcessControlBlock is Send

Agent Skills

  • Process data structures, PID management, Arc<Mutex<>> patterns, wait queue primitives

T2.2: Kernel Stack Allocation + Context Switch Assembly

Objective

Allocate per-thread kernel stacks and implement the low-level register save/restore for switching between kernel threads.

Context Files

  • kernel/src/mm/mapper.rs (from T1.4)
  • kernel/src/mm/frame_allocator.rs (from T1.2)
  • kernel/src/sched/process.rs (CpuContext struct from T2.1)

Output Files

  • kernel/src/sched/stack.rs
  • kernel/src/arch/x86_64/context.rs

Requirements

Kernel Stack Allocation:

  • Each kernel thread needs a stack (default 16 KiB = 4 pages)
  • Stack region: 0xFFFF_A000_0000_0000 + (stack_id * 0x10000) for guard page spacing
  • Guard page: one unmapped page below each stack to catch overflow
  • allocate_kernel_stack(stack_id) -> VirtAddr — maps 4 pages, returns top of stack (highest address)
  • deallocate_kernel_stack(stack_id) — unmaps pages, frees frames
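
The address arithmetic for one stack slot might look like the sketch below. It assumes the guard page occupies the first page of each 0x10000-byte slot, with the four stack pages directly above it. The document only says the guard page sits below the stack, so the exact placement, the StackLayout type, and the KERNEL_STACK_SLOT_SIZE name are illustrative.

```rust
pub const PAGE_SIZE: u64 = 4096;
pub const KERNEL_STACK_PAGES: u64 = 4; // 16 KiB
pub const KERNEL_STACK_REGION_START: u64 = 0xFFFF_A000_0000_0000;
/// Hypothetical name for the 0x10000 per-stack spacing given in the requirements.
pub const KERNEL_STACK_SLOT_SIZE: u64 = 0x10000;

/// Hypothetical helper type describing one stack slot.
#[derive(Debug, PartialEq)]
pub struct StackLayout {
    pub guard_page: u64, // left unmapped
    pub bottom: u64,     // lowest mapped address
    pub top: u64,        // initial rsp (one past the highest mapped byte)
}

/// Compute the slot layout, or None if the address arithmetic would overflow.
pub fn stack_layout(stack_id: u64) -> Option<StackLayout> {
    let slot = stack_id
        .checked_mul(KERNEL_STACK_SLOT_SIZE)?
        .checked_add(KERNEL_STACK_REGION_START)?;
    let bottom = slot.checked_add(PAGE_SIZE)?;
    let top = bottom.checked_add(KERNEL_STACK_PAGES * PAGE_SIZE)?;
    Some(StackLayout { guard_page: slot, bottom, top })
}
```

Because every quantity is a multiple of the page size, the returned top is always 16-byte aligned.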

Context Switch:

  • switch_context(old: *mut CpuContext, new: *const CpuContext), a naked function with inline asm (raw pointers, matching the Interface Contract)
  • Saves: rsp, rbp, rbx, r12, r13, r14, r15, rflags to old
  • Restores the same from new
  • Changes rsp to new thread's stack, jumps to new thread's rip
  • Uses #[naked] function with core::arch::asm!
  • Does NOT save/restore: rax (return value), rcx/rdx/rsi/rdi/r8-r11 (caller-saved)

Interface Contract

// sched/stack.rs
pub const KERNEL_STACK_SIZE: usize = 4 * PAGE_SIZE; // 16 KiB
pub const KERNEL_STACK_REGION_START: u64 = 0xFFFF_A000_0000_0000;

pub fn allocate_kernel_stack(
    stack_id: usize,
    mapper: &mut impl PageMapper,
    allocator: &mut impl FrameAllocator,
) -> KernelResult<VirtAddr>;

pub fn deallocate_kernel_stack(
    stack_id: usize,
    mapper: &mut impl PageMapper,
    allocator: &mut impl FrameAllocator,
) -> KernelResult<()>;
// arch/x86_64/context.rs
use crate::sched::process::CpuContext;

/// Switch from the current kernel thread to another.
/// SAFETY: Both contexts must be valid. The new context's rsp must point to a valid stack.
#[naked]
pub unsafe extern "C" fn switch_context(
    old: *mut CpuContext,
    new: *const CpuContext,
);

Acceptance Criteria

  1. Integration test (Tier 3): allocate a stack, write to it, verify that overflowing it (writing below the lowest mapped page) faults on the guard page
  2. Stack addresses are correctly aligned (16-byte aligned for x86_64 ABI)
  3. deallocate frees the frames (frame count increases)
  4. Assembly compiles with #[naked] and asm! on nightly (Intel syntax chosen, documented)
  5. Integration test (Tier 3): create two contexts, switch between them, verify both run
  6. No register corruption (each thread sees its own register state)
  7. CpuContext field offsets match the assembly offsets (verified by offset_of! in tests)
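
Criterion 7 could be checked along these lines. The sketch copies the CpuContext definition from T2.1 and uses core::mem::offset_of! (stable since Rust 1.77); the helper names context_offsets() and context_size() are made up for the example.

```rust
use core::mem::{offset_of, size_of};

#[repr(C)]
pub struct CpuContext {
    pub rsp: u64,
    pub rbp: u64,
    pub rbx: u64,
    pub r12: u64,
    pub r13: u64,
    pub r14: u64,
    pub r15: u64,
    pub rip: u64,
    pub rflags: u64,
}

/// Hypothetical helper: byte offsets of every field, in declaration order.
/// The assembly in switch_context would hard-code these same numbers.
pub fn context_offsets() -> [usize; 9] {
    [
        offset_of!(CpuContext, rsp),
        offset_of!(CpuContext, rbp),
        offset_of!(CpuContext, rbx),
        offset_of!(CpuContext, r12),
        offset_of!(CpuContext, r13),
        offset_of!(CpuContext, r14),
        offset_of!(CpuContext, r15),
        offset_of!(CpuContext, rip),
        offset_of!(CpuContext, rflags),
    ]
}

/// Hypothetical helper: total size of the saved context in bytes.
pub fn context_size() -> usize {
    size_of::<CpuContext>()
}
```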

Agent Skills

  • Kernel stack layout, guard pages, virtual address space management
  • x86_64 inline assembly (core::arch::asm!), calling conventions, naked functions

Agent Prompt Context

  • Reference: System V AMD64 ABI (callee-saved registers: rbx, rbp, r12-r15, rsp)
  • Reference: #[naked] function requirements (no prologue/epilogue)

T2.3: Kernel Threads + Round-Robin Scheduler

Objective

Create kernel threads that execute a function and then exit. Implement a round-robin scheduler to select the next runnable process.

Context Files

  • INTERFACES.md Scheduler trait
  • kernel/src/sched/process.rs (from T2.1)
  • kernel/src/sched/stack.rs (from T2.2)
  • kernel/src/arch/x86_64/context.rs (from T2.2)
  • DECISIONS.md D13 (lock ordering)

Output Files

  • kernel/src/sched/thread.rs
  • kernel/src/sched/scheduler.rs

Requirements

Kernel Threads:

  • spawn_kernel_thread(name, function) creates a new PCB with:
    • Allocated kernel stack
    • CpuContext.rip pointing to a trampoline function
    • CpuContext.rsp pointing to top of kernel stack
    • Trampoline: calls the function, then calls thread_exit() when it returns
  • thread_exit() — marks process as Zombie, triggers reschedule
  • Kernel thread runs in ring 0, shares kernel address space (no separate page tables)
  • The first kernel thread is "idle" — loops on hlt

Scheduler:

  • Implement the Scheduler trait from INTERFACES.md
  • Simple FIFO run queue using VecDeque<Pid>
  • enqueue(pid) — add to back of queue
  • dequeue() — remove from front
  • tick() — decrement time slice counter; return true when it hits 0 (default: 10 ticks = ~100ms at 100Hz PIT)
  • schedule() — top-level function: save current context, pick next, restore context, switch
  • Skip zombie/sleeping processes in the queue

Lock ordering (D13): The scheduler lock is #1 in the global order. Never acquire it while holding a PCB lock or FD table lock. Code that calls schedule() must not hold any lock that a wakeup path might need.

Interface Contract

// sched/thread.rs
pub fn spawn_kernel_thread(
    name: &str,
    entry: fn() -> (),
) -> KernelResult<Pid>;

pub fn thread_exit(exit_code: i32) -> !;
// sched/scheduler.rs
use alloc::collections::VecDeque;

pub struct RoundRobinScheduler {
    run_queue: VecDeque<Pid>,
    current: Option<Pid>,
    ticks_remaining: u32,
    time_slice: u32,  // ticks per slice, default 10
}

impl Scheduler for RoundRobinScheduler { /* ... */ }

/// Main scheduling entry point. Called from timer interrupt or voluntary yield.
pub fn schedule();

/// Voluntarily yield the CPU to the next runnable process.
pub fn yield_now();
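
A host-testable sketch of the run-queue logic follows. It does not implement the Scheduler trait from INTERFACES.md, which is not shown in this document. The is_ready closure passed to dequeue() is an assumption standing in for a process-table state lookup, and the `current` field is omitted.

```rust
use std::collections::VecDeque;

pub type Pid = u32;

pub struct RoundRobinScheduler {
    run_queue: VecDeque<Pid>,
    ticks_remaining: u32,
    time_slice: u32,
}

impl RoundRobinScheduler {
    pub fn new(time_slice: u32) -> Self {
        Self { run_queue: VecDeque::new(), ticks_remaining: time_slice, time_slice }
    }

    /// Add a PID to the back of the queue.
    pub fn enqueue(&mut self, pid: Pid) {
        self.run_queue.push_back(pid);
    }

    /// Pop from the front until a runnable PID is found.
    /// Entries for zombie or sleeping processes are dropped along the way.
    pub fn dequeue(&mut self, is_ready: impl Fn(Pid) -> bool) -> Option<Pid> {
        while let Some(pid) = self.run_queue.pop_front() {
            if is_ready(pid) {
                return Some(pid);
            }
        }
        None
    }

    /// Count down the time slice; return true (and reset) when it expires.
    pub fn tick(&mut self) -> bool {
        self.ticks_remaining = self.ticks_remaining.saturating_sub(1);
        if self.ticks_remaining == 0 {
            self.ticks_remaining = self.time_slice;
            true
        } else {
            false
        }
    }
}
```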

Acceptance Criteria

  1. Integration test (Tier 3): spawn 3 kernel threads, each prints its PID to serial, all 3 run
  2. Threads exit cleanly (no crash after function returns)
  3. Thread names appear in process table
  4. Unit tests (Tier 1):
    • Enqueue PIDs 1,2,3; dequeue returns 1,2,3 in order
    • Empty queue dequeue returns None
    • tick() returns true after time_slice ticks
    • Zombie PIDs are skipped
  5. Integration test (Tier 3): 3 kernel threads run round-robin, each gets CPU time

Agent Skills

  • Kernel thread lifecycle, stack setup for new threads, trampoline functions
  • Scheduler design, round-robin algorithm, VecDeque, lock ordering

T2.4: Timer-Driven Preemption

Objective

Wire the PIT timer interrupt to the scheduler for preemptive multitasking.

Context Files

  • kernel/src/arch/x86_64/interrupts.rs (timer handler from T0.11)
  • kernel/src/sched/scheduler.rs (from T2.3)

Output Files

  • Updated kernel/src/arch/x86_64/interrupts.rs — timer calls scheduler
  • Updated kernel/src/sched/scheduler.rs — handle timer-driven reschedule

Requirements

  • Timer interrupt handler calls scheduler::tick()
  • If tick() returns true (time slice expired), call schedule()
  • Context switch happens within the interrupt handler (save state to current PCB, restore next)
  • Interrupts must be disabled during context switch (re-enabled after)
  • PIT frequency: ~100Hz (default)
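
For reference, the ~100Hz figure corresponds to a PIT reload value computed from the chip's 1,193,182 Hz input clock. The sketch below shows only the arithmetic; the function names are illustrative and programming the PIT ports is out of scope here.

```rust
/// Input clock of the legacy 8253/8254 PIT, in Hz.
pub const PIT_BASE_HZ: u32 = 1_193_182;

/// Reload value (divisor) for a target interrupt rate, rounded to nearest.
/// Returns None if the rate is zero or the divisor does not fit in 16 bits.
pub fn pit_divisor(target_hz: u32) -> Option<u16> {
    if target_hz == 0 {
        return None;
    }
    let div = (PIT_BASE_HZ + target_hz / 2) / target_hz;
    if div == 0 || div > u16::MAX as u32 {
        None
    } else {
        Some(div as u16)
    }
}

/// Length of a time slice in milliseconds for a given tick count and rate.
pub fn slice_ms(ticks: u32, hz: u32) -> Option<u32> {
    ticks.checked_mul(1000)?.checked_div(hz)
}
```

With a divisor of 11932 the actual rate is slightly under 100 Hz, which is why the requirement says "~100Hz".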

Interface Contract

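// Updated timer handler
extern "x86-interrupt" fn timer_interrupt_handler(_stack_frame: InterruptStackFrame) {
    TICK_COUNT.fetch_add(1, Ordering::Relaxed);

    // Acknowledge the interrupt before any context switch. schedule() may not
    // return to this frame until this thread is resumed, and an unacknowledged
    // timer IRQ would block further ticks for the thread that runs next.
    unsafe { PICS.lock().notify_end_of_interrupt(InterruptIndex::Timer as u8); }

    if scheduler::tick() {
        scheduler::schedule();
    }
}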

Acceptance Criteria

  1. Integration test (Tier 3): two CPU-bound kernel threads both make progress (neither starves)
  2. Timer ticks continue during context switch
  3. No deadlocks from nested locking (PIC lock vs scheduler lock)
  4. Each thread runs for approximately time_slice ticks before preemption

Agent Skills

  • Preemptive scheduling, interrupt-driven context switches, lock ordering

T2.5: SYSCALL/SYSRET MSR Setup + Dispatch Table

Objective

Configure the x86_64 SYSCALL/SYSRET instructions via MSRs and build the dispatch table routing syscall numbers to handler functions.

Context Files

  • kernel/src/arch/x86_64/gdt.rs (from T0.7) — need kernel CS/SS selectors
  • INTERFACES.md syscall number constants
  • DECISIONS.md D5 (interrupts re-enabled inside syscalls)

Output Files

  • kernel/src/arch/x86_64/syscall.rs
  • kernel/src/syscall/mod.rs
  • kernel/src/syscall/misc.rs

Requirements

MSR Setup:

  • Write to MSRs:
    • STAR (0xC0000081): set kernel CS/SS and user CS/SS segment selectors
    • LSTAR (0xC0000082): set syscall entry point address
    • SFMASK (0xC0000084): clear IF flag (disable interrupts) on syscall entry
    • EFER (0xC0000080): set SCE bit to enable SYSCALL instruction
  • Syscall entry point: a naked function that:
    • swapgs to access per-CPU data
    • Save user RSP, switch to kernel stack
    • sti to re-enable interrupts (DECISIONS.md D5 — blocking syscalls must be interruptible)
    • Save all caller-saved registers
    • Call the Rust syscall dispatcher
    • cli before restoring user state
    • Restore registers, swapgs, execute sysretq
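
The MSR values from the list above can be computed as in this sketch. The bit positions follow the architecture: SYSCALL loads CS from STAR[47:32] and SS from that value + 8; 64-bit SYSRET loads CS from STAR[63:48] + 16 and SS from STAR[63:48] + 8. The function names are illustrative, and the actual wrmsr calls and the GDT layout from T0.7 are not shown.

```rust
pub const MSR_EFER: u32 = 0xC000_0080;
pub const MSR_STAR: u32 = 0xC000_0081;
pub const MSR_LSTAR: u32 = 0xC000_0082;
pub const MSR_SFMASK: u32 = 0xC000_0084;

/// EFER.SCE (bit 0) enables the SYSCALL/SYSRET instructions.
pub const EFER_SCE: u64 = 1 << 0;
/// RFLAGS.IF (bit 9); setting it in SFMASK clears IF on syscall entry.
pub const RFLAGS_IF: u64 = 1 << 9;

/// Compose the STAR value from the kernel CS selector and the SYSRET base.
pub const fn star_value(kernel_cs: u16, sysret_base: u16) -> u64 {
    ((sysret_base as u64) << 48) | ((kernel_cs as u64) << 32)
}

/// Selectors the CPU derives on SYSCALL and 64-bit SYSRET.
pub const fn syscall_kernel_ss(kernel_cs: u16) -> u16 {
    kernel_cs + 8
}
pub const fn sysret_user_ss(sysret_base: u16) -> u16 {
    (sysret_base + 8) | 3
}
pub const fn sysret_user_cs(sysret_base: u16) -> u16 {
    (sysret_base + 16) | 3
}
```

This derivation constrains the GDT order: kernel data must directly follow kernel code, and user data must directly precede user 64-bit code.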

Dispatch Table:

  • Dispatch table: array of Option<SyscallHandler> indexed by syscall number
  • dispatch(num, arg1..arg6) -> isize:
    • Look up handler in table
    • If found, call it, return result
    • If not found, return -ENOSYS (-38)
  • Implement first stub syscalls (for testing):
    • sys_getpid() — return current PID
    • sys_uname() — fill in hardcoded utsname struct
    • sys_write(fd, buf, count) — if fd==1 or fd==2, write to serial (temporary)

Interface Contract

// arch/x86_64/syscall.rs

/// Initialize SYSCALL/SYSRET MSRs. Call once during boot.
pub fn init();

/// Naked syscall entry point.
/// On entry: RAX = syscall number, RDI/RSI/RDX/R10/R8/R9 = args
/// On exit: RAX = return value
///
/// IMPORTANT: After swapgs + kernel stack switch, execute `sti` to
/// re-enable interrupts before calling dispatch. Execute `cli` before
/// restoring user state and sysretq.
#[naked]
unsafe extern "C" fn syscall_entry();
// syscall/mod.rs
const MAX_SYSCALL: usize = 512;
static SYSCALL_TABLE: [Option<SyscallHandler>; MAX_SYSCALL] = {
    let mut table = [None; MAX_SYSCALL];
    table[nr::GETPID] = Some(misc::sys_getpid);
    table[nr::UNAME] = Some(misc::sys_uname);
    table[nr::WRITE] = Some(misc::sys_write); // temporary serial-only stub (fd 1 and 2)
    // ... filled in by later tasks
    table
};

pub fn dispatch(num: usize, a1: usize, a2: usize, a3: usize,
                a4: usize, a5: usize, a6: usize) -> isize;
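
A self-contained sketch of the table lookup is shown below. The SyscallHandler alias is assumed from the dispatch signature, the getpid number 39 is the Linux x86_64 value, and the stub handler returning 1 is a placeholder rather than the real sys_getpid.

```rust
/// Assumed handler signature: six raw arguments in, result or negative errno out.
pub type SyscallHandler = fn(usize, usize, usize, usize, usize, usize) -> isize;

pub const ENOSYS: isize = 38;
pub const MAX_SYSCALL: usize = 512;
pub const NR_GETPID: usize = 39; // Linux x86_64 syscall number

/// Placeholder stub; the real handler would return the current PID.
fn sys_getpid(_a1: usize, _a2: usize, _a3: usize,
              _a4: usize, _a5: usize, _a6: usize) -> isize {
    1
}

static SYSCALL_TABLE: [Option<SyscallHandler>; MAX_SYSCALL] = {
    let mut table: [Option<SyscallHandler>; MAX_SYSCALL] = [None; MAX_SYSCALL];
    table[NR_GETPID] = Some(sys_getpid as SyscallHandler);
    table
};

/// Look up the handler; out-of-range and empty slots both yield -ENOSYS.
pub fn dispatch(num: usize, a1: usize, a2: usize, a3: usize,
                a4: usize, a5: usize, a6: usize) -> isize {
    match SYSCALL_TABLE.get(num).copied().flatten() {
        Some(handler) => handler(a1, a2, a3, a4, a5, a6),
        None => -ENOSYS,
    }
}
```

Using get() rather than indexing means a syscall number at or beyond MAX_SYSCALL cannot panic the kernel.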

Acceptance Criteria

  1. MSR writes don't triple fault (verified by serial print after init)
  2. Assembly preserves all callee-saved registers
  3. User RSP is saved before switching to kernel stack
  4. Interrupts are disabled on entry (SFMASK clears IF), re-enabled after kernel stack switch (sti), disabled again before sysretq (cli)
  5. SYSRET correctly restores user RIP and RFLAGS
  6. Unknown syscall returns -38 (ENOSYS)
  7. sys_getpid() returns a positive integer
  8. sys_write(1, "hello", 5) outputs "hello" to serial
  9. Dispatch table has room for 512 entries
  10. Unit tests (Tier 1): dispatch known number calls handler, unknown returns ENOSYS

Agent Skills

  • x86_64 MSRs, SYSCALL/SYSRET mechanism, naked functions, inline assembly
  • Syscall dispatch, function pointer tables, Linux syscall numbers

Agent Prompt Context

  • Reference: AMD64 Architecture Manual Vol 2, Section 6.1 (SYSCALL/SYSRET)
  • Reference: Intel SDM Vol 2B, SYSCALL instruction
  • DECISIONS.md D5 for the sti/cli placement rationale

T2.6: ELF Parser + Loader

Objective

Parse ELF64 headers and program headers, then load PT_LOAD segments into a process's address space.

Context Files

  • kernel/src/mm/mmap.rs (from T1.10)
  • kernel/src/mm/address_space.rs (from T1.9)

Output Files

  • kernel/src/sched/elf.rs
  • kernel/src/sched/loader.rs

Requirements

ELF Parser:

  • Parse ELF64 header: verify magic, class (64-bit), data (little-endian), machine (x86_64)
  • Parse program headers (PHDR): extract PT_LOAD segments with vaddr, memsz, filesz, offset, flags
  • Parse entry point address
  • Return structured data, not raw bytes

ELF Loader:

  • load_elf(elf_data, address_space) -> KernelResult<LoadedElf>:
    • Parse ELF headers (using parser above)
    • For each PT_LOAD segment:
      • Calculate page-aligned vaddr and size
      • Map pages with appropriate flags (R/W/X -> PageTableFlags)
      • Copy file data (filesz bytes from elf_data at offset)
      • Zero-fill remainder (memsz - filesz) for BSS
    • Set up user stack (8 MiB, at top of user address space, e.g., 0x7FFF_FFFF_F000 downward)
    • Return entry point address
  • Stack setup includes argc, argv, envp, auxv on the stack (Linux ABI)
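
The page-alignment and BSS arithmetic for a single PT_LOAD segment might be sketched as follows. The SegmentPages type and its field names are assumptions for illustration; actual mapping and copying go through the mm layer from Epic 1.

```rust
pub const PAGE_SIZE: u64 = 4096;

/// Hypothetical result type for the per-segment calculation.
#[derive(Debug, PartialEq)]
pub struct SegmentPages {
    pub start: u64,      // page-aligned base of the mapping
    pub page_count: u64, // number of pages to map
    pub bss_start: u64,  // first byte to zero-fill (vaddr + filesz)
    pub bss_len: u64,    // memsz - filesz
}

/// Compute the mapping range for a segment, rejecting malformed sizes.
pub fn segment_pages(vaddr: u64, filesz: u64, memsz: u64) -> Option<SegmentPages> {
    if filesz > memsz {
        return None; // file data cannot exceed the in-memory size
    }
    let end = vaddr.checked_add(memsz)?;
    let start = vaddr & !(PAGE_SIZE - 1);
    let end_aligned = end.checked_add(PAGE_SIZE - 1)? & !(PAGE_SIZE - 1);
    Some(SegmentPages {
        start,
        page_count: (end_aligned - start) / PAGE_SIZE,
        bss_start: vaddr + filesz,
        bss_len: memsz - filesz,
    })
}
```

Note that a segment starting mid-page needs one more page than memsz alone suggests, which the round-down of start accounts for.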

Interface Contract

// sched/elf.rs
#[derive(Debug)]
pub struct ElfHeader {
    pub entry_point: u64,
    pub phdr_offset: u64,
    pub phdr_count: u16,
    pub phdr_size: u16,
}

#[derive(Debug)]
pub struct ProgramHeader {
    pub segment_type: SegmentType,
    pub offset: u64,      // offset in file
    pub vaddr: u64,       // virtual address
    pub filesz: u64,      // size in file
    pub memsz: u64,       // size in memory (may be > filesz for BSS)
    pub flags: SegmentFlags, // read/write/execute
    pub align: u64,
}

#[derive(Debug, PartialEq)]
pub enum SegmentType { Null, Load, Dynamic, Interp, Note, Other(u32) }

bitflags::bitflags! {
    pub struct SegmentFlags: u32 {
        const EXECUTE = 0x1;
        const WRITE   = 0x2;
        const READ    = 0x4;
    }
}

pub fn parse_elf(data: &[u8]) -> KernelResult<(ElfHeader, Vec<ProgramHeader>)>;
// sched/loader.rs
pub struct LoadedElf {
    pub entry_point: VirtAddr,
    pub stack_top: VirtAddr,
    pub brk_start: VirtAddr,  // end of loaded segments (for brk syscall)
}

pub fn load_elf(
    elf_data: &[u8],
    address_space: &mut AddressSpace,
    argv: &[&str],
    envp: &[&str],
    allocator: &mut impl FrameAllocator,
) -> KernelResult<LoadedElf>;
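
As an illustration of the header checks, the sketch below parses only the ELF64 file header. It returns a local ElfError enum instead of KernelResult, which is an assumption since the kernel's error type is defined elsewhere, and the function is named parse_header to avoid implying it is the full parse_elf. The field offsets are those of the ELF64 file header layout.

```rust
#[derive(Debug, PartialEq)]
pub enum ElfError { Truncated, BadMagic, NotElf64, NotLittleEndian, WrongMachine }

#[derive(Debug, PartialEq)]
pub struct ElfHeader {
    pub entry_point: u64,
    pub phdr_offset: u64,
    pub phdr_count: u16,
    pub phdr_size: u16,
}

const EHDR_SIZE: usize = 64;
const ELFCLASS64: u8 = 2;
const ELFDATA2LSB: u8 = 1;
const EM_X86_64: u16 = 62;

// Bounds-checked little-endian readers: errors instead of panics.
fn le_u16(data: &[u8], off: usize) -> Result<u16, ElfError> {
    let b = data.get(off..off + 2).ok_or(ElfError::Truncated)?;
    Ok(u16::from_le_bytes([b[0], b[1]]))
}

fn le_u64(data: &[u8], off: usize) -> Result<u64, ElfError> {
    let b = data.get(off..off + 8).ok_or(ElfError::Truncated)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(b);
    Ok(u64::from_le_bytes(buf))
}

pub fn parse_header(data: &[u8]) -> Result<ElfHeader, ElfError> {
    if data.len() < EHDR_SIZE { return Err(ElfError::Truncated); }
    if !data.starts_with(b"\x7fELF") { return Err(ElfError::BadMagic); }
    if data[4] != ELFCLASS64 { return Err(ElfError::NotElf64); }
    if data[5] != ELFDATA2LSB { return Err(ElfError::NotLittleEndian); }
    if le_u16(data, 18)? != EM_X86_64 { return Err(ElfError::WrongMachine); }
    Ok(ElfHeader {
        entry_point: le_u64(data, 24)?, // e_entry
        phdr_offset: le_u64(data, 32)?, // e_phoff
        phdr_size: le_u16(data, 54)?,   // e_phentsize
        phdr_count: le_u16(data, 56)?,  // e_phnum
    })
}
```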

Acceptance Criteria

  1. Unit tests (Tier 1) — critical, agent must write these:
    • Parse a minimal valid ELF64 header (construct bytes manually or embed a small binary)
    • Reject ELF32 (wrong class)
    • Reject non-x86_64 machine
    • Parse 3 program headers, verify vaddr/filesz/memsz/flags
    • Handle truncated input gracefully (error, not panic)
  2. Entry point extracted correctly
  3. BSS detection: memsz > filesz -> zero-fill gap
  4. A minimal static ELF binary (write syscall + exit) loads and the entry point is correct
  5. Text segment is mapped as read + execute (not writable)
  6. Data/BSS segment is mapped as read + write
  7. Stack is mapped as read + write with guard page below
  8. argv, envp, auxv are on the stack in Linux ABI format
  9. Unit test (Tier 1): mock address space, verify correct mapping calls

Agent Skills

  • ELF64 format, binary parsing, bitflags
  • ELF loading, Linux process startup (auxv), page flag mapping

Agent Prompt Context

  • Reference: ELF64 spec (Tool Interface Standard, Chapter 4)

T2.7: User-Mode Entry + User Pointer Helpers

Objective

Drop from kernel mode (ring 0) to user mode (ring 3) to execute a loaded binary. Provide safe helpers for copying data between kernel and user address spaces.

Context Files

  • kernel/src/arch/x86_64/gdt.rs (user code/data selectors)
  • kernel/src/sched/loader.rs (from T2.6)
  • DECISIONS.md D8 (user pointer validation via page table walk)

Output Files

  • kernel/src/arch/x86_64/usermode.rs
  • kernel/src/mm/user_access.rs

Requirements

User-Mode Entry:

  • Add user code segment and user data segment to GDT (if not already present)
  • enter_usermode(entry_point, user_stack_top) — never returns:
    • Push SS (user data selector | RPL 3)
    • Push RSP (user stack top)
    • Push RFLAGS (with IF set — interrupts enabled in user mode)
    • Push CS (user code selector | RPL 3)
    • Push RIP (entry point)
    • Execute iretq
  • This is a one-way transition — kernel re-enters via syscall or interrupt
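
The five values pushed before iretq can be modelled as a plain struct, which makes the frame testable without touching the CPU. This is a sketch: the IretqFrame type and user_iretq_frame() are hypothetical names, and the selectors are passed in because the GDT layout comes from gdt.rs.

```rust
/// RFLAGS bit 9: interrupts enabled after the transition.
pub const RFLAGS_IF: u64 = 1 << 9;
/// RFLAGS bit 1 is reserved and always reads as 1.
pub const RFLAGS_RESERVED_1: u64 = 1 << 1;
/// Requested privilege level 3 (user mode).
pub const RPL_USER: u64 = 3;

/// Frame as it lies in memory after the pushes, lowest address first.
/// The pushes happen in the reverse order: SS, RSP, RFLAGS, CS, RIP.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct IretqFrame {
    pub rip: u64,
    pub cs: u64,
    pub rflags: u64,
    pub rsp: u64,
    pub ss: u64,
}

/// Build the frame for a first entry into ring 3.
pub fn user_iretq_frame(entry: u64, stack_top: u64,
                        user_cs: u16, user_ss: u16) -> IretqFrame {
    IretqFrame {
        rip: entry,
        cs: user_cs as u64 | RPL_USER,
        rflags: RFLAGS_IF | RFLAGS_RESERVED_1,
        rsp: stack_top,
        ss: user_ss as u64 | RPL_USER,
    }
}
```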

User Pointer Helpers (DECISIONS.md D8):

  • copy_from_user(dst, user_src, len) — copy bytes from user space to kernel buffer
  • copy_to_user(user_dst, src) — copy bytes from kernel buffer to user space
  • copy_string_from_user(user_ptr, max_len) — copy a null-terminated string from user space
  • Validation: walk page tables to check that all pages in the range are mapped with USER flag
  • Return Err(EFAULT) if any page is unmapped or lacks USER flag
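
The validation step can be sketched independently of the real page tables by passing the per-page check in as a closure. This is a model under stated assumptions: is_user_page stands in for the page table walk, USER_SPACE_END is a hypothetical constant marking the end of the lower canonical half, and errors are returned as a bare errno value rather than KernelResult.

```rust
pub const PAGE_SIZE: u64 = 4096;
/// Hypothetical constant: first address past the lower canonical half.
pub const USER_SPACE_END: u64 = 0x0000_8000_0000_0000;
pub const EFAULT: i32 = 14;

/// Check that every page touched by [addr, addr + len) is a mapped user page.
/// `is_user_page` receives a page-aligned address and reports whether that
/// page is present with the USER flag set.
pub fn validate_user_range(
    addr: u64,
    len: u64,
    is_user_page: impl Fn(u64) -> bool,
) -> Result<(), i32> {
    if len == 0 {
        return Ok(());
    }
    // Reject ranges that wrap around or reach into kernel space.
    let end = addr.checked_add(len).ok_or(EFAULT)?;
    if end > USER_SPACE_END {
        return Err(EFAULT);
    }
    let mut page = addr & !(PAGE_SIZE - 1);
    while page < end {
        if !is_user_page(page) {
            return Err(EFAULT);
        }
        page += PAGE_SIZE;
    }
    Ok(())
}
```

The copy helpers would call this before touching user memory, so a range that straddles a page boundary is rejected as a whole if the second page is unmapped.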

Interface Contract

// arch/x86_64/usermode.rs

/// Transition to user mode. Never returns.
/// SAFETY: entry_point must be a valid user-space instruction address.
/// stack_top must be a valid user-space stack address.
pub unsafe fn enter_usermode(entry_point: VirtAddr, stack_top: VirtAddr) -> !;
// mm/user_access.rs

/// Copy `len` bytes from user-space `user_src` into kernel `dst`.
/// Returns EFAULT if any page in the range is not mapped with USER flag.
pub fn copy_from_user(dst: &mut [u8], user_src: usize, len: usize) -> KernelResult<()>;

/// Copy `src` bytes to user-space `user_dst`.
/// Returns EFAULT if any page in the range is not mapped with USER flag.
pub fn copy_to_user(user_dst: usize, src: &[u8]) -> KernelResult<()>;

/// Copy a null-terminated string from user-space, up to `max_len` bytes.
/// Returns EFAULT if any page is not mapped with USER flag.
/// Returns ENAMETOOLONG if no null terminator found within max_len.
pub fn copy_string_from_user(user_ptr: usize, max_len: usize) -> KernelResult<String>;

Acceptance Criteria

  1. After enter_usermode, code runs in ring 3 (verified by attempting a privileged instruction -> GPF)
  2. SYSCALL from user mode enters kernel (via T2.5 MSR setup)
  3. Interrupts work in user mode (timer still fires)
  4. Integration test (Tier 3): enter user mode, execute syscall for sys_write, verify output on serial
  5. copy_from_user succeeds for valid mapped user pages
  6. copy_from_user returns EFAULT for unmapped or kernel-only pages
  7. copy_to_user succeeds for valid writable user pages
  8. copy_string_from_user stops at null terminator, returns ENAMETOOLONG if none found
  9. Unit tests (Tier 1): mock page tables, verify validation logic

Agent Skills

  • Ring transitions, iretq, user/kernel mode, segment selectors
  • Page table walking, user pointer validation

T2.8: clone() + fork()

Objective

Implement sys_clone() as the primary process-creation syscall, with sys_fork() as a thin wrapper. Deep copy address space (no CoW).

Context Files

  • kernel/src/sched/process.rs (from T2.1)
  • kernel/src/mm/address_space.rs (from T1.9)
  • kernel/src/mm/page_table.rs (from T1.3)
  • DECISIONS.md D1 (deep copy, no CoW), D2 (clone is the primitive)

Output Files

  • kernel/src/syscall/process.rs

Requirements

  • sys_clone(flags, child_stack, ptid, ctid, tls) is the real primitive:
    • Allocate new PID
    • Create new PCB, copy parent's fields
    • Deep copy parent's address space — allocate new frames, copy page contents (DECISIONS.md D1)
    • If child_stack != 0, use it as child's user RSP; otherwise clone parent's RSP
    • Copy the file descriptor table: the child gets its own table, but each entry is an Arc clone, so parent and child share the underlying open files
    • Set child's return value to 0 (RAX = 0)
    • Set parent's return value to child PID
    • Add child to scheduler run queue
  • sys_fork() = sys_clone(SIGCHLD, 0, 0, 0, 0) — a simple wrapper
  • Child is Ready, parent continues Running
  • Child inherits parent's pgid and sid
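
The deep-copy behaviour required by D1 can be illustrated with a toy model in which each "frame" is a heap-allocated page. AddressSpaceModel and its methods are invented for this example and say nothing about the real AddressSpace API; the point is only that the child gets fresh frames with identical contents.

```rust
use std::collections::BTreeMap;

pub const PAGE_SIZE: usize = 4096;

/// Toy stand-in for an address space: page-aligned address -> owned frame.
pub struct AddressSpaceModel {
    pages: BTreeMap<u64, Box<[u8; PAGE_SIZE]>>,
}

impl AddressSpaceModel {
    pub fn new() -> Self {
        Self { pages: BTreeMap::new() }
    }

    /// Map a zero-filled page at a page-aligned address.
    pub fn map_zeroed(&mut self, page_addr: u64) {
        self.pages.insert(page_addr, Box::new([0u8; PAGE_SIZE]));
    }

    /// Write one byte; returns false if the page is not mapped.
    pub fn write_byte(&mut self, addr: u64, value: u8) -> bool {
        let page = addr & !(PAGE_SIZE as u64 - 1);
        match self.pages.get_mut(&page) {
            Some(frame) => { frame[(addr - page) as usize] = value; true }
            None => false,
        }
    }

    /// Read one byte; returns None if the page is not mapped.
    pub fn read_byte(&self, addr: u64) -> Option<u8> {
        let page = addr & !(PAGE_SIZE as u64 - 1);
        self.pages.get(&page).map(|frame| frame[(addr - page) as usize])
    }

    /// Allocate a new frame for every mapped page and copy its contents.
    pub fn deep_copy(&self) -> Self {
        let mut child = Self::new();
        for (addr, frame) in &self.pages {
            let mut new_frame = Box::new([0u8; PAGE_SIZE]);
            new_frame.copy_from_slice(&frame[..]);
            child.pages.insert(*addr, new_frame);
        }
        child
    }
}
```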

Interface Contract

// syscall/process.rs

/// Primary process creation. musl libc calls this, not fork.
pub fn sys_clone(
    flags: usize,        // SIGCHLD | CLONE_* flags
    child_stack: usize,  // 0 = clone parent stack pointer
    ptid: usize,         // parent TID pointer (unused for now)
    ctid: usize,         // child TID pointer (unused for now)
    tls: usize,          // TLS base (unused for now)
    _a6: usize,
) -> isize;

/// Convenience wrapper: sys_clone(SIGCHLD, 0, 0, 0, 0).
pub fn sys_fork(
    _a1: usize, _a2: usize, _a3: usize,
    _a4: usize, _a5: usize, _a6: usize,
) -> isize;

Acceptance Criteria

  1. After clone, child gets PID > parent PID
  2. Child and parent have separate address spaces (write in child doesn't affect parent)
  3. Child's RAX = 0, parent's RAX = child PID
  4. Both parent and child are scheduled
  5. sys_fork() produces identical behavior to sys_clone(SIGCHLD, 0, 0, 0, 0)
  6. sys_clone with non-zero child_stack sets child RSP correctly
  7. Integration test (Tier 3): clone, child writes to serial, parent waits

Agent Skills

  • Process creation, address space duplication, register manipulation for child return value
  • clone() flags, Linux clone vs fork semantics

T2.9: execve() — Load New Binary

Objective

Replace the current process's address space with a new ELF binary.

Context Files

  • kernel/src/sched/loader.rs (from T2.6)
  • kernel/src/syscall/process.rs (from T2.8)
  • kernel/src/mm/user_access.rs (from T2.7)

Output Files

  • Updated kernel/src/syscall/process.rs — add sys_execve

Requirements

  • sys_execve(path, argv, envp):
    • Copy path from user space using copy_string_from_user (from T2.7)
    • Copy argv/envp arrays from user space using copy_from_user
    • Read ELF data from the filesystem (or initramfs for now)
    • Destroy current address space (unmap all user pages, free frames)
    • Load new ELF into fresh address space (using T2.6 loader)
    • Set up new stack with argv, envp, auxv
    • Reset signal handlers to defaults
    • Close FDs with CLOEXEC flag
    • Set instruction pointer to new entry point
    • Does NOT return to caller — jumps to new program
  • Path resolution deferred to Epic 3 — for now, look up filename in a hardcoded initramfs table

Interface Contract

pub fn sys_execve(
    path_ptr: usize,   // user pointer to null-terminated path
    argv_ptr: usize,   // user pointer to null-terminated array of string pointers
    envp_ptr: usize,   // user pointer to null-terminated array of string pointers
    _a4: usize, _a5: usize, _a6: usize,
) -> isize;  // only returns on error (negative errno)

Acceptance Criteria

  1. After execve, new program runs (different code at new entry point)
  2. Old address space is fully freed (frame count increases)
  3. Stack has correct argv/envp layout
  4. Invalid path returns -ENOENT
  5. Invalid ELF returns -ENOEXEC
  6. User pointers are validated via copy_from_user/copy_string_from_user (no raw dereference)

Agent Skills

  • execve semantics, address space teardown/rebuild, argv marshaling

T2.10: exit() + wait4() + Process Groups

Objective

Implement process termination, parent notification via wait queues, and minimal process group/session management for job control.

Context Files

  • kernel/src/sched/process.rs (from T2.1)
  • kernel/src/sched/scheduler.rs (from T2.3)
  • kernel/src/sched/waitqueue.rs (from T2.1)
  • DECISIONS.md D13 (lock ordering)

Output Files

  • Updated kernel/src/syscall/process.rs — add sys_exit, sys_exit_group, sys_wait4
  • Updated kernel/src/syscall/process.rs — add setpgid, getpgid, setsid, getpid, getppid

Requirements

exit:

  • sys_exit(status):
    • Set process state to Zombie
    • Store exit status
    • Release address space (unmap all, free frames)
    • Close all file descriptors
    • Reparent children to PID 1 (init)
    • Wake parent via parent.wait_queue.wake_one() (using WaitQueue from T2.1)
    • Call scheduler to pick next process

wait4:

  • sys_wait4(pid, status_ptr, options):
    • If pid == -1: wait for any child
    • If pid > 0: wait for specific child
    • If child is Zombie: reap it (remove from process table), store status via copy_to_user, return child PID
    • If child is still running: self.wait_queue.sleep() until child exits
    • WNOHANG option: return 0 immediately if no zombie child
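
The value stored through status_ptr is not spelled out above. Assuming the kernel follows the Linux wait status encoding (a reasonable guess given the musl target, but still an assumption), a normal exit places the low 8 bits of the exit code in bits 8-15, and user space decodes it with WIFEXITED/WEXITSTATUS. The function names below are illustrative.

```rust
/// Encode a normal exit: low 8 bits of the exit code go in bits 8..16,
/// and the low 7 bits (terminating signal) are zero.
pub fn exit_status_word(exit_code: i32) -> i32 {
    (exit_code & 0xff) << 8
}

/// Equivalent of the WIFEXITED macro: true if no terminating signal is recorded.
pub fn wifexited(status: i32) -> bool {
    (status & 0x7f) == 0
}

/// Equivalent of the WEXITSTATUS macro: recover the 8-bit exit code.
pub fn wexitstatus(status: i32) -> i32 {
    (status >> 8) & 0xff
}
```

Under this encoding, the "reads 42" integration test would see the raw word 0x2A00 and obtain 42 only after applying WEXITSTATUS.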

Process Groups:

  • sys_setpgid(pid, pgid) — set process group ID
  • sys_getpgid(pid) — get process group ID
  • sys_setsid() — create new session, process becomes session leader + process group leader
  • sys_getpid(), sys_getppid() — return PID and parent PID
  • Default: child inherits parent's pgid and sid on fork/clone

Lock ordering (D13): In exit path, acquire scheduler lock first, then process table lock, then individual PCB lock. In wait4 path, the WaitQueue.sleep() call invokes schedule() — caller must not hold any PCB locks when calling sleep(). Acquire PCB lock only to check state, release it before sleeping.

Interface Contract

pub fn sys_exit(status: usize, _a2: usize, _a3: usize,
                _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_exit_group(status: usize, _a2: usize, _a3: usize,
                      _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_wait4(pid: usize, status_ptr: usize, options: usize,
                 _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_setpgid(pid: usize, pgid: usize,
                   _a3: usize, _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_getpgid(pid: usize, _a2: usize, _a3: usize,
                   _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_setsid(_a1: usize, _a2: usize, _a3: usize,
                  _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_getpid(_a1: usize, _a2: usize, _a3: usize,
                  _a4: usize, _a5: usize, _a6: usize) -> isize;

pub fn sys_getppid(_a1: usize, _a2: usize, _a3: usize,
                   _a4: usize, _a5: usize, _a6: usize) -> isize;

Acceptance Criteria

  1. Process exits and becomes zombie
  2. Parent wait4 reaps zombie and gets exit status
  3. wait4(-1, ...) returns first available zombie child
  4. WNOHANG returns 0 when no zombie children
  5. Orphaned children are reparented to PID 1
  6. wait4 uses WaitQueue to sleep (not busy-poll)
  7. getpid returns current PID
  8. getppid returns parent PID (or 0 for init)
  9. setsid creates new session (sid = pid = pgid)
  10. setpgid changes process group of self or child
  11. Integration test (Tier 3): fork, child exits with status 42, parent waits and reads 42
  12. Unit tests (Tier 1) for pgid/sid logic

Agent Skills

  • Process lifecycle, zombie reaping, wait queues, parent-child relationships
  • POSIX process groups, sessions, job control basics, lock ordering

Epic 2 Summary

| Task | Name | ~Lines | Deps | Parallelizable With |
|---|---|---|---|---|
| T2.1 | Process struct + PID + WaitQueue | 250 | Epic 1 | |
| T2.2 | Kernel stack + context switch | 180 | T2.1 | T2.5, T2.6 |
| T2.3 | Kernel threads + scheduler | 270 | T2.2 | T2.5, T2.6 |
| T2.4 | Timer preemption | 60 | T2.3 | T2.5, T2.6 |
| T2.5 | SYSCALL/SYSRET + dispatch | 240 | Epic 1, T0.7 | T2.2-T2.4 |
| T2.6 | ELF parser + loader | 380 | Epic 1 (alloc) | T2.2-T2.5 |
| T2.7 | User-mode entry + user pointers | 150 | T2.6, T2.5 | |
| T2.8 | clone() + fork() | 200 | T2.7, T2.3 | |
| T2.9 | execve() | 150 | T2.8, T2.6 | |
| T2.10 | exit + wait4 + process groups | 260 | T2.8 | |
| Total | | ~2,140 | | |