Lecture 5:

Kernel Extensions

CSC 469 / CSC 2208

Spring 2024
Discussion Questions (from last time)
• Why use a microkernel if your Linux applications run 5-7% slower than on a
native Linux kernel?

• What other benefits or advantages might arise from the small L4 code
size?

CSC469 2 Lecture 5
How far can we take minimality?
• Microkernels: minimal set of abstractions and mechanisms
• Exokernel: MIT Research project
• Claim: OS abstractions are bad because they:
• deny application-specific optimizations;
• discourage innovation;
• impose “mandatory costs”.
• Solution: Separate concept of protection from abstraction and management
• Follows the end-to-end principle: export as few H/W abstractions as possible
• Exokernel is basically a secure resource multiplexor
• Applications link directly with a library that provides OS functions (libOS)
• Drawbacks?

Kernel comparison
• Monolithic
+ performance
- difficult to debug and maintain
• Microkernel
+ more reliable and secure
- performance overhead
• Exokernels
+ minimal and simple
- more work for application developers

Going farther…
• Exokernel drops OS abstractions, multiplexes hardware
• Much like an older strategy… Virtual Machines
• Place thin layer of software directly above hardware
• virtual machine monitor (VMM, aka hypervisor)
• Exports raw hardware interface
• OS/application above sees “virtual” machine identical to underlying physical
machine
• VMM multiplexes virtual machines
• We will explore this concept next time

• How can we add or modify OS functionality without complete redesign?


OS Extensions
• Adding new functionality to the OS “on the fly”
• Why?
• Fixing mistakes
• Supporting new features or hardware
• Efficiency / Custom implementations
• How?
• Allow some OS function to run outside the kernel (μkernel)
• Give everyone their own virtual machine (VMs)
• Allow users to modify the OS (e.g., modules)

Loadable Kernel Modules
• Giving everyone a virtual machine doesn’t entirely solve the extension
problem
• You can run what you want on your VM, but do you really want to write a custom
OS?
• Often just want to modify/replace small part
• Solution: Allow parts of the kernel to be dynamically loaded / unloaded
• Requires dynamic relocation and linking
• Common strategy in monolithic kernels for device drivers (FreeBSD,
Windows, Linux)

Linux Loadable Kernel Modules
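• Module writer must define (at least) two functions
• init_module: code executed when the module loads
• cleanup_module: code executed when the module unloads
• Functions with other names can be registered instead via the module_init() and module_exit() macros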
• Module functions can refer to any exported kernel symbols
• Module is compiled into relocatable .ko file (since 2.6)
• Requires kernel source tree for kernel that module will be loaded into
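mymodule.c:
    #include <linux/module.h>
    /* runs when the module loads; return 0 on success */
    int init_module(void) { /* ... */ return 0; }
    /* runs when the module unloads */
    void cleanup_module(void) { /* ... */ }
Makefile:
    obj-m := mymodule.o
Build (uses the kernel source headers and build environment):
    make -C $KDIR M=$PWD
Result: mymodule.ko, containing sections .text, .init.text, .modinfo, __versions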

• insmod command loads module into running kernel
• Linux 2.4: insmod (at user level) resolves references to kernel symbols
• Linux 2.6 and later: insmod invokes a system call and the kernel does the linking
• rmmod command removes module from kernel
• lsmod command lists currently-installed modules
• modprobe is a higher-level command that checks module dependencies and
loads additional required modules
Load path (user → kernel): insmod mymodule.ko → sys_init_module
    • copy_module_from_user
    • check versions, check_modinfo
    • check module dependencies
    • call module_init
Unload path (user → kernel): rmmod mymodule.ko → sys_delete_module
    • check reference count
    • call module_cleanup
mymodule.ko sections: .text, .init.text, .modinfo, __versions
Tracking Modules
• Kernel has a linked list of module objects
• struct contained in the module memory itself

[Figure: four module objects chained through their list fields; each object shows state, list, name, …, ref, modules_which_use_me, …]

Selected fields of the module object:
    enum module_state state;
    struct list_head list;
    char name[MODULE_NAME_LEN];
    /* What modules depend on me? */
    struct list_head source_list;
    /* What modules do I depend on? */
    struct list_head target_list;
    atomic_t refcount;

rmmod
• Unlinks module from kernel
• Needs to ensure no one is using module first!
• Reference count incremented whenever module is used
• source_list identifies other modules that depend on this one
• Invokes module-provided exit / cleanup function
• Frees memory
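
The unload checks above can be sketched in user space. This is only a toy model, not kernel code: `struct toy_module`, `toy_add_dependency`, `toy_drop_dependency`, and `toy_unload` are hypothetical names, and a fixed array stands in for the kernel's `source_list`.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_USERS 4

/* Simplified stand-in for the kernel's module object (hypothetical layout). */
struct toy_module {
    const char *name;
    int refcount;                        /* active users of this module */
    struct toy_module *users[MAX_USERS]; /* modules that depend on me (cf. source_list) */
    size_t n_users;
    int loaded;
};

/* Record that `user` depends on `target`; returns 0 or -ENOSPC. */
int toy_add_dependency(struct toy_module *user, struct toy_module *target)
{
    if (target->n_users == MAX_USERS)
        return -ENOSPC;
    target->users[target->n_users++] = user;
    target->refcount++;
    return 0;
}

/* Drop the dependency again, e.g. after `user` has been unloaded. */
void toy_drop_dependency(struct toy_module *user, struct toy_module *target)
{
    for (size_t i = 0; i < target->n_users; i++) {
        if (target->users[i] == user) {
            target->users[i] = target->users[--target->n_users];
            target->refcount--;
            return;
        }
    }
}

/* Mimics the rmmod steps: refuse while the module is still in use,
 * otherwise run the exit function and mark the module as gone. */
int toy_unload(struct toy_module *mod, void (*exit_fn)(void))
{
    if (mod->refcount > 0)
        return -EBUSY;      /* someone still uses the module */
    if (exit_fn)
        exit_fn();          /* module-provided exit / cleanup function */
    mod->loaded = 0;        /* stands in for unlinking and freeing memory */
    return 0;
}
```

With a module `user` that depends on `base`, `toy_unload(&base, NULL)` fails with `-EBUSY` until `user` has been unloaded and its dependency dropped.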

Problems with module approach
• Requires stable interfaces
• Linux uses version numbers to check if module is compiled for correct version of
kernel, but it is easy to get this wrong
• Unsafe
• Module code can do anything because it runs privileged
• E.g. VMware Workstation driver
• “hijacks” machine by changing interrupt descriptor table (IDT) base register and then
jumps to code in the VM application!

Alternate kernel-level schemes
• Trusted compiler (or certification authority) + digital signatures
• Allows verification of source of code added to kernel
• You still have to decide if you trust that source
• Code can still do anything
• Proof-carrying code
• Code Consumer (OS) supplies a specification for what extensions are
allowed to do
• Code Producer (the extension) must supply a proof that it is safe to
execute according to specification
• OS validates proof
• Proof should be easy to check, but may be hard to generate (e.g. maze
example)

Checking a proof vs generating one
• G. Necula - Safe Kernel Extensions Without Run-Time Checking, OSDI’96
• A maze is “safe” if there’s a path through it.
• Easy to check a path, but hard to generate.
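
To make the asymmetry concrete, here is a minimal sketch of the "easy" side: a checker that validates a claimed path in time linear in the path length, with no search. The grid encoding (0 = open, 1 = wall), the fixed 4x4 size, and the names `struct cell` and `check_path` are assumptions for illustration, not taken from the paper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define ROWS 4
#define COLS 4

struct cell { int r, c; };

/* Returns true if `path` is a valid walk from `start` to `goal`:
 * every cell is in bounds and open (0), and consecutive cells are
 * horizontal or vertical neighbours. One pass over the path. */
bool check_path(const int maze[ROWS][COLS], const struct cell *path,
                size_t len, struct cell start, struct cell goal)
{
    if (len == 0)
        return false;
    if (path[0].r != start.r || path[0].c != start.c)
        return false;
    if (path[len - 1].r != goal.r || path[len - 1].c != goal.c)
        return false;

    for (size_t i = 0; i < len; i++) {
        struct cell p = path[i];
        if (p.r < 0 || p.r >= ROWS || p.c < 0 || p.c >= COLS)
            return false;               /* stepped outside the maze */
        if (maze[p.r][p.c] != 0)
            return false;               /* stepped into a wall */
        if (i > 0) {
            int dr = abs(p.r - path[i - 1].r);
            int dc = abs(p.c - path[i - 1].c);
            if (dr + dc != 1)
                return false;           /* not a single step */
        }
    }
    return true;
}
```

The producer of the path may have had to search the whole maze; the consumer only replays the steps, which mirrors the roles of code producer and code consumer in proof-carrying code.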

Alternates (2)
• Sandboxing (software fault isolation)
• Limit memory references to per-module segments
• Check for certain unsafe instructions
• Examples:
• SPIN (U. of Washington)
• Modula-3 + trusted compiler
• Safety properties provided by language
• Problems with dynamic behavior (e.g. “while(1)”)
• Vino (Harvard)
• Sandboxed C/C++ code called “grafts”
• Timeouts to guard against misbehaved grafts
• Resource limits + transactional “undo”
• Byte-Granularity Isolation (Microsoft) - BGI
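
One classic way to limit memory references to a per-module segment is address masking: before each store, the upper address bits are overwritten with the segment's base. The sketch below assumes a power-of-two sized, size-aligned segment; `struct sfi_segment`, `sfi_mask`, and `sfi_store` are hypothetical names, and this is not a description of how SPIN, Vino, or BGI are implemented.

```c
#include <stdint.h>

/* Hypothetical sandbox: a power-of-two sized, size-aligned segment. */
struct sfi_segment {
    uintptr_t base;   /* aligned to `size` */
    uintptr_t size;   /* power of two */
};

/* Force an address into the segment: keep the low offset bits and
 * replace the upper bits with the segment base. */
uintptr_t sfi_mask(const struct sfi_segment *seg, uintptr_t addr)
{
    return seg->base | (addr & (seg->size - 1));
}

/* Sandboxed store; `backing` is the memory that backs the segment. */
void sfi_store(const struct sfi_segment *seg, uint8_t *backing,
               uintptr_t addr, uint8_t value)
{
    uintptr_t safe = sfi_mask(seg, addr);
    backing[safe - seg->base] = value;
}
```

A wild address is not rejected, it is redirected into the module's own segment, so a buggy extension can only corrupt its own data.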
eBPF
• “extended Berkeley Packet Filter”
• Language-level VM within Linux kernel
• Register-based VM
• Custom 64-bit RISC instruction set
• Bytecode verifier
• Restrictions are placed on eBPF programs for safety
• Limited number of instructions
• Controlled memory referencing
• Originally, no loops allowed
• Bounded loops were introduced in Linux 5.3
History: BPF
• “The BSD Packet Filter: A New Architecture for User-level Packet
Capture,” by Steven McCanne and Van Jacobson, in Proceedings of the
1993 Winter USENIX Conference.
• Register-based language-level virtual machine to run user programs for packet
capture & filtering inside the BSD Unix kernel.
• 2 registers
• 22 instructions
• No backward branches (no loops)
• Safety / restrictions not mentioned in paper
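
Why does "no backward branches" matter? If branch offsets are unsigned and relative to the next instruction, the program counter can only move forward, so a load-time check only has to confirm that every branch target stays inside the program. The sketch below uses a toy instruction format loosely modelled on classic BPF; `struct toy_insn`, the `TOY_*` opcodes, and `toy_check` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes (hypothetical, not the real BPF encoding). */
enum toy_op { TOY_LOAD, TOY_JEQ, TOY_RET };

struct toy_insn {
    enum toy_op op;
    uint8_t jt;     /* forward skip if the comparison is true  */
    uint8_t jf;     /* forward skip if the comparison is false */
    uint32_t k;     /* immediate operand */
};

/* Accept a program only if it ends in a return and every branch
 * target lies inside the program. Because offsets are unsigned,
 * an accepted program executes at most `len` instructions. */
bool toy_check(const struct toy_insn *prog, size_t len)
{
    if (len == 0 || prog[len - 1].op != TOY_RET)
        return false;
    for (size_t pc = 0; pc < len; pc++) {
        if (prog[pc].op != TOY_JEQ)
            continue;
        if (pc + 1 + prog[pc].jt >= len || pc + 1 + prog[pc].jf >= len)
            return false;   /* branch would leave the program */
    }
    return true;
}
```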
History: eBPF
• BPF instruction set was too limited
• Linux introduced new “internal” BPF circa 2013
• User programs written in “classic” BPF were translated to internal BPF
• New virtual machine had ten (10) 64-bit registers (enough to pass function
arguments in regs), new BPF_CALL instruction to call kernel functions, ~100
instructions, and other features
• “internal” BPF was made available to users as “extended” BPF soon after
(patch mid-2014)
• Verifier checks user programs at load time
• Termination (no loops), no uninitialized reads, no out-of-bounds memory
access, etc.
• Added support for data “maps” (key-value structures) shared between kernel and
user-space.
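
As a rough illustration of one load-time check, the sketch below rejects straight-line programs that read a register before writing it. It is a toy, far simpler than the real verifier (no branches, no memory, no types); `struct toy_ebpf_insn`, the `TOY_*` opcodes, and `toy_verify` are hypothetical. It borrows two real eBPF conventions: r1 holds the context pointer on entry and r0 carries the return value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NREGS 10

/* Toy opcodes (hypothetical, not the real eBPF encoding). */
enum toy_ebpf_op { TOY_MOV_IMM, TOY_MOV_REG, TOY_ADD_REG, TOY_EXIT };

struct toy_ebpf_insn {
    enum toy_ebpf_op op;
    uint8_t dst, src;
    int32_t imm;
};

/* Track which registers have been written and reject any read of a
 * register that has not. r1 counts as initialised on entry; r0 must
 * be set before exit. */
bool toy_verify(const struct toy_ebpf_insn *prog, size_t len)
{
    unsigned init = 1u << 1;    /* bitmask of initialised registers */

    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_ebpf_insn *in = &prog[pc];

        if (in->dst >= TOY_NREGS || in->src >= TOY_NREGS)
            return false;       /* no such register */

        switch (in->op) {
        case TOY_MOV_IMM:       /* dst = imm */
            init |= 1u << in->dst;
            break;
        case TOY_MOV_REG:       /* dst = src */
            if (!(init & (1u << in->src)))
                return false;
            init |= 1u << in->dst;
            break;
        case TOY_ADD_REG:       /* dst += src, reads both */
            if (!(init & (1u << in->src)) || !(init & (1u << in->dst)))
                return false;
            break;
        case TOY_EXIT:
            return (init & 1u) != 0;    /* r0 must hold a value */
        }
    }
    return false;               /* fell off the end without an exit */
}
```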
Classic usage: optimize packet filtering
$ tcpdump host 127.0.0.1 and port 22 -d
• -d means print compiled bytecode and stop

(Brendan Gregg example, O’Reilly Velocity talk, 2017)


Running eBPF Programs
[Figure: eBPF workflow]
• User level: BPF program → 1. compile → BPF bytecode → 2. load into the kernel; event config is passed down; 4. output comes back as statistics and per-event data
• Kernel: eBPF bytecode → verifier → eBPF virtual machine; 3. attach to event sources; maps
• Event sources: tracepoints (static tracing); kprobes and uprobes (dynamic tracing); perf_events (sampling, PMCs)


Running eBPF Programs
• Must be “attached” to code points in kernel
• Event triggers execution of eBPF code
• Used for:
• Classic network filtering and monitoring
• Restricting system calls (seccomp)
• Debugging and performance analysis
How does it work?
• Userspace has one overloaded system call, bpf()
int bpf(int cmd, union bpf_attr *attr, unsigned int size);
• Meaning of attr depends on the command.
• For loading an eBPF program cmd = BPF_PROG_LOAD
• For load, attr includes a program type, and the list of instructions in the program
• Type determines what eBPF program is allowed to access in kernel
• In-kernel verifier checks safety of eBPF program
• Terminates, bounded loops, no unreachable code
• No out-of-bounds accesses, no uninitialized reads
• Access to kernel functions restricted by program type
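
A minimal sketch of a load request, assuming Linux with the kernel UAPI headers installed. glibc provides no bpf() wrapper, so the call goes through syscall(). The helper names `fill_load_attr` and `load_trivial_prog`, the two-instruction program, and the choice of BPF_PROG_TYPE_SOCKET_FILTER are illustrative assumptions; loading may also fail without sufficient privileges.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Smallest useful program: r0 = 0; exit */
static const struct bpf_insn trivial_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },
};

/* Buffer that receives the verifier's messages. */
static char verifier_log[4096];

/* Fill in the attr fields used by BPF_PROG_LOAD. */
void fill_load_attr(union bpf_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->prog_type = BPF_PROG_TYPE_SOCKET_FILTER;   /* program type */
    attr->insns     = (uint64_t)(uintptr_t)trivial_prog;
    attr->insn_cnt  = sizeof(trivial_prog) / sizeof(trivial_prog[0]);
    attr->license   = (uint64_t)(uintptr_t)"GPL";
    attr->log_buf   = (uint64_t)(uintptr_t)verifier_log;
    attr->log_size  = sizeof(verifier_log);
    attr->log_level = 1;
}

/* Returns a program file descriptor, or -1 with errno set. */
int load_trivial_prog(void)
{
    union bpf_attr attr;

    fill_load_attr(&attr);
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

If the verifier rejects the program, the call returns -1 and `verifier_log` holds its explanation.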
How it works (2)
• How is eBPF program “attached” to the kernel, so that it gets invoked at
the desired time?
• Depends on the kind of event
• For sockets, setsockopt()
• For perf events, ioctl()
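
The two attachment paths might look roughly like this, assuming `prog_fd` is a file descriptor returned by a successful BPF_PROG_LOAD. The wrapper names `attach_prog_to_socket` and `attach_prog_to_perf_event` are hypothetical; `SO_ATTACH_BPF` and `PERF_EVENT_IOC_SET_BPF` are the Linux constants for these operations.

```c
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50   /* fallback for old headers; value assumed from asm-generic */
#endif

/* Attach a loaded socket-filter program to a socket.
 * Returns 0 on success, -1 with errno set on failure. */
int attach_prog_to_socket(int sock_fd, int prog_fd)
{
    return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_BPF,
                      &prog_fd, sizeof(prog_fd));
}

/* Attach a loaded program to an open perf event and enable the event.
 * Returns 0 on success, -1 with errno set on failure. */
int attach_prog_to_perf_event(int perf_fd, int prog_fd)
{
    if (ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0)
        return -1;
    return ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0);
}
```

In both cases the program is named by its file descriptor, so the kernel can check that it was verified for a compatible program type before wiring it to the event.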

• Other bpf() commands attach programs and create and access maps
• To create a map, specify the map type, maximum number of elements, key size, and value size (in bytes)
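
A sketch of map creation and update through the same bpf() system call, assuming Linux UAPI headers. The helper names `fill_map_attr`, `create_counter_map`, and `map_update`, and the choice of a hash map with 4-byte keys and 8-byte values, are assumptions for illustration.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill in the attr fields used by BPF_MAP_CREATE. */
void fill_map_attr(union bpf_attr *attr, uint32_t max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type    = BPF_MAP_TYPE_HASH;      /* map type */
    attr->key_size    = sizeof(uint32_t);       /* key size in bytes */
    attr->value_size  = sizeof(uint64_t);       /* value size in bytes */
    attr->max_entries = max_entries;            /* max # of elements */
}

/* Returns a map file descriptor, or -1 with errno set. */
int create_counter_map(uint32_t max_entries)
{
    union bpf_attr attr;

    fill_map_attr(&attr, max_entries);
    return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Insert or overwrite one element from user space.
 * Returns 0 on success, -1 with errno set. */
int map_update(int map_fd, const uint32_t *key, const uint64_t *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key    = (uint64_t)(uintptr_t)key;
    attr.value  = (uint64_t)(uintptr_t)value;
    attr.flags  = BPF_ANY;      /* create the entry or update it */
    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

Because the kernel side and user space both refer to the map by file descriptor, an eBPF program can accumulate statistics that a user-level tool later reads out.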
