Lecture 3

This document outlines a lecture on system calls in the CIS 5050 Software Systems course at the University of Pennsylvania. It covers topics such as the distinction between user mode and kernel mode, the necessity of system calls for applications to access kernel services, and the steps involved in executing a system call. Additionally, it includes announcements for special sessions on C/C++ concepts and debugging tools.


University of Pennsylvania

CIS 5050
Software Systems

Linh Thi Xuan Phan

Department of Computer and Information Science


University of Pennsylvania

Lecture 3: System calls


January 25+30, 2024

©2016-2024 Linh Thi Xuan Phan


Announcements
• How are the VM and course infrastructure working
for you?
– If you encounter any problems, please let us know right away!

• Special Session: C/C++ Refresher


– Time: Tomorrow, Friday, January 26 at 2-3pm in Wu & Chen
– Topics: common C/C++ concepts such as pointers, memory
allocation, object-oriented programming, namespaces, types and
data structures, Makefile

• Special Session: Debugging tools


– Time: Tomorrow, Friday, January 26 at 1-2pm in Wu & Chen
– Topics: gdb, address sanitizers, valgrind, coding style

Plan for today
• System calls
– What are they, and why do we need them? NEXT

– Kernel entry and exit


– Blocking and context switching
– Some common system calls
– The kernel's perspective

User mode vs. kernel mode

[Figure: three processes running at "user level" (unprivileged code), above the kernel (privileged kernel code), which holds one PCB per process]

• Recall: Modern CPUs offer multiple privilege levels


– Usually at least two: User mode and kernel mode ("supervisor mode")
• Typically, application code runs in user mode
– Some privileged instructions can only be used in kernel mode
• Examples: Direct device I/O, page table manipulation, ... (why this restriction?)
– The kernel can restrict what memory is accessible in user mode
Why system calls?
• Modern kernels offer many services to applications
– Storage, networking, memory management, ...

• To use these services, applications need to 'call functions' inside the kernel
– Example: fork(), read(), write(), ...
– In some old kernels, these were literally function calls!

• How does this interact with privilege levels?


– User-level code may not even be able to access kernel code
– Even if it could, it wouldn't be able to run privileged instructions!

• Applications need a way to talk to the kernel!

Recall: Function calls in user level
Source code:

    int foo(int arg)
    {
        return arg+3;
    }

    int main(void)
    {
        int b = foo(5);
        return b;
    }

Simplified assembly:

    main:
        ...
        push 5
        sub esp, 4
        call foo
        pop edx
        add esp, 4
        ...

    foo:
        mov 8(esp), eax
        add eax, 3
        mov eax, 4(esp)
        ret

Stack frame in the address space (top of stack first): (return addr); slot for the return value ((empty), later 8); argument 5

• Regular function calls work (approximately) like this:


– Caller pushes arguments on the stack (if any)
• If there are return values, the caller leaves some room on the stack for these
– Caller jumps to the first instruction of the target function
• Using a special instruction that pushes the return address on the stack
– Callee runs, and then saves any return value in the stack
– Callee returns to the next instruction after the call site
• Caller pops the return value and then "cleans up" the stack
Example: x86-64 function call

Source code:

    int foo(int arg)
    {
        return arg+3;
    }

    int main(void)
    {
        int b = foo(5);
        return b;
    }

Disassembly:

    00000000004005b0 <_Z3fooi>:
      4005b0: 55                push %rbp
      4005b1: 48 89 e5          mov %rsp,%rbp
      4005b4: 89 7d fc          mov %edi,-0x4(%rbp)
      4005b7: 8b 45 fc          mov -0x4(%rbp),%eax
      4005ba: 83 c0 03          add $0x3,%eax
      4005bd: 5d                pop %rbp
      4005be: c3                retq

    00000000004005bf <main>:
      4005bf: 55                push %rbp
      4005c0: 48 89 e5          mov %rsp,%rbp
      4005c3: 48 83 ec 10       sub $0x10,%rsp
      4005c7: bf 05 00 00 00    mov $0x5,%edi
      4005cc: e8 df ff ff ff    callq 4005b0 <_Z3fooi>
      4005d1: 89 45 fc          mov %eax,-0x4(%rbp)
      4005d4: 8b 45 fc          mov -0x4(%rbp),%eax
      4005d7: c9                leaveq
      4005d8: c3                retq

• Example: x86-64 architecture
– You can try this out with 'objdump -d binaryfile'
– Main differences
• Argument & return value are passed in registers
• Uses the base pointer register (RBP) to point to the stack frame (why?)
System calls
• From the programmer's perspective, system calls
are a lot like function calls
– They take parameters, return values, etc.
– In fact, the functions you call directly (read(), fork(), ...) are normal
functions in a library that 'wrap' the actual system call

• Key difference: The system call executes code in the kernel (in supervisor mode)
– How do parameters and return values get passed between the
program and the kernel?



How many system calls are there?
restart_syscall pipe oldlstat sigprocmask poll setresgid32 epoll_ctl fchownat prlimit64
exit times readlink create_module nfsservctl getresgid32 epoll_wait futimesat name_to_handle_at
fork prof uselib init_module setresgid chown32 remap_file_pages fstatat64 open_by_handle_at
read brk swapon delete_module getresgid setuid32 set_tid_address unlinkat clock_adjtime
write setgid reboot get_kernel_syms prctl setgid32 timer_create renameat syncfs
open getgid readdir quotactl rt_sigreturn setfsuid32 timer_settime linkat sendmmsg
close signal mmap getpgid rt_sigaction setfsgid32 timer_gettime symlinkat setns
waitpid geteuid munmap fchdir rt_sigprocmask pivot_root timer_getoverrun readlinkat process_vm_readv
creat getegid truncate bdflush rt_sigpending mincore timer_delete fchmodat process_vm_writev
link acct ftruncate sysfs rt_sigtimedwait madvise clock_settime faccessat kcmp
unlink umount2 fchmod personality rt_sigqueueinfo getdents64 clock_gettime pselect6 finit_module
execve lock fchown afs_syscall rt_sigsuspend fcntl64 clock_getres ppoll sched_setattr
chdir ioctl getpriority setfsuid pread64 gettid clock_nanosleep unshare sched_getattr
time fcntl setpriority setfsgid pwrite64 readahead statfs64 set_robust_list renameat2
mknod mpx profil _llseek chown setxattr fstatfs64 get_robust_list seccomp
chmod setpgid statfs getdents getcwd lsetxattr tgkill splice getrandom
lchown ulimit fstatfs _newselect capget fsetxattr utimes sync_file_range memfd_create
break oldolduname ioperm flock capset getxattr fadvise64_64 tee bpf
oldstat umask socketcall msync sigaltstack lgetxattr vserver vmsplice execveat
lseek chroot syslog readv sendfile fgetxattr mbind move_pages
getpid ustat setitimer writev getpmsg listxattr get_mempolicy getcpu
mount dup2 getitimer getsid putpmsg llistxattr set_mempolicy epoll_pwait
umount getppid stat fdatasync vfork flistxattr mq_open utimensat
setuid getpgrp lstat _sysctl ugetrlimit removexattr mq_unlink signalfd
getuid setsid fstat mlock mmap2 lremovexattr mq_timedsend timerfd_create
stime sigaction olduname munlock truncate64 fremovexattr mq_timedreceive eventfd
ptrace sgetmask iopl mlockall ftruncate64 tkill mq_notify fallocate
alarm ssetmask vhangup munlockall stat64 sendfile64 mq_getsetattr timerfd_settime
oldfstat setreuid idle sched_setparam lstat64 futex kexec_load timerfd_gettime
pause setregid vm86old sched_getparam fstat64 sched_setaffinity waitid signalfd4
utime sigsuspend wait4 sched_setscheduler lchown32 sched_getaffinity add_key eventfd2
stty sigpending swapoff sched_getscheduler getuid32 set_thread_area request_key epoll_create1
gtty sethostname sysinfo sched_yield getgid32 get_thread_area keyctl dup3
access setrlimit ipc sched_get_priority_max geteuid32 io_setup ioprio_set pipe2
nice getrlimit fsync sched_get_priority_min getegid32 io_destroy ioprio_get inotify_init1
ftime getrusage sigreturn sched_rr_get_interval setreuid32 io_getevents inotify_init preadv
sync gettimeofday clone nanosleep setregid32 io_submit inotify_add_watch pwritev
kill settimeofday setdomainname mremap getgroups32 io_cancel inotify_rm_watch rt_tgsigqueueinfo
rename getgroups uname setresuid setgroups32 fadvise64 migrate_pages perf_event_open
mkdir setgroups modify_ldt getresuid fchown32 exit_group openat recvmmsg
rmdir select adjtimex vm86 setresuid32 lookup_dcookie mkdirat fanotify_init
dup symlink mprotect query_module getresuid32 epoll_create mknodat fanotify_mark

• Example: Linux 3.19.8 (355 calls!)


– See /usr/include/asm/unistd_32.h
Trap instructions
• There are special instructions for entering the kernel
– Examples: IA-32's sysenter, INT 0x80

• Effect: Switches to kernel mode & jumps to a predefined address
– Why does the kernel need a special, well-defined "entry point"?

• Before running this instruction, the user code...


– ... loads the parameters into registers
– ... loads a 'system call number' to tell the kernel what function to run

System call steps: Kernel entry
Process (PID 1), user-level code:

    void foo() {
        ...
        read(7, &buf, 200);
        ...
    }

    foo:
        ...
        push 7
        push 200
        push bufaddr
        call read
        add esp, 12
        ...

    read:
        mov 4(esp),ebx
        mov 8(esp),ecx
        mov 12(esp),edx
        mov eax, 3
        sysenter
        ret

Kernel:

    entry:
        push ...
        cmp eax, 3
        je sys_read
        cmp eax, 4
        ...

    sys_read:
        ...
        pop ...
        sysexit

Stack of PID 1 (top of stack first): (addr of 'call'), bufaddr, 200, 7
PCBs in the kernel: PID: 1, UID: 47, AS: 4711, ... and PID: 2, UID: 38, AS: 0815, ...

1. Process puts the arguments in place (if any)


– This is often done by a wrapper function, e.g., in glibc
– Small values can go into registers (e.g., close())
– Larger values can be in memory (e.g., the data for write())
• Remember: The kernel can always read from user memory!

2. Process loads system call number into a register


3. Process executes a trap instruction
System call steps: Kernel entry

4. Kernel saves the process's context in its PCB


– This includes things like register values, the stack pointer, ...
5. Kernel jumps to the relevant system call handler
6. System call handler runs
– It can decide to block the calling process; in this case, the system call
doesn't return to the caller right away. (More about this later.)
– But suppose it doesn't...

System call steps: Kernel exit

7. System call handler loads any return values


– Again, small values go in registers, large ones into memory
8. Kernel restores process context from the PCB
9. Kernel executes a return instruction (e.g., sysexit)
– Effect: CPU clears the kernel mode bit, loads PC with trap address
10. Process continues to run
Plan for today
• System calls
– What are they, and why do we need them?
– Kernel entry and exit
– Blocking and context switching NEXT
– Some common system calls
– The kernel's perspective

Blocking
• Sometimes the system call can't make progress
– ... at least not right away (e.g., read() called but no data available)
• In this case, the kernel runs another process
– Idea: Use the time effectively while the original process waits
– If there are multiple runnable processes, which one should it pick?
• The original process is said to be blocked

• A blocked process can be unblocked again


– If the condition occurs that it was waiting for (e.g., new data arrives)
– If the process receives a signal
– In some other situations (e.g., timeout)

Context switch
Process (PID 1): foo and read as in the kernel-entry diagram; the process is blocked inside sys_read

Process (PID 2), user-level code:

    bar:
        ...
        call wait
        ...

    wait:
        mov eax, 114
        sysenter
        ret

Kernel: entry, sys_read, sys_wait

Stack of PID 1 (top of stack first): (addr of 'call'), bufaddr, 200, 7
Stack of PID 2: (addr of 'call')
PCBs in the kernel: PID: 1, UID: 47, AS: 4711, ... and PID: 2, UID: 38, AS: 0815, ...

• Suppose the read call had blocked earlier


– Kernel could run the process with PID 2
– Maybe this process was doing a wait() and the child has now exited
• Kernel makes the other PCB the current one
• Then it returns from the system call as usual
– ... only now it is returning from a wait(), into the other process!
Scheduling
• What if there are multiple runnable processes?
– Kernel needs to choose one. How should it do that?

• This is a rich and complex topic in its own right!


– For instance, processes could have priorities, and the kernel could
pick the one with the highest priority
– Or they could have deadlines, and the kernel could pick the one with
the earliest deadline
– Or the kernel could give priority to interactive processes, since these
matter most for user experience
– Or the kernel could try to be 'fair' by switching to the process that
has received the least CPU time recently
– ...

Context switching and efficiency
• Switching between user and kernel is expensive
– Need to save and restore state, etc.
– Far more expensive than a function call!!

• Switching to a different process can be even more expensive
– Need to switch to a different address space, possibly flush caches...

• High-performance applications need to take this into account!
– Avoid needless context switches! Only invoke system calls for things
that cannot be done in user mode.
– Use blocking system calls rather than busy-waiting!



Why do we have to know all this?
• System calls are the main interface between
processes and the kernel
– They are like an extended “instruction set” for user programs

• Understanding the system call interface of a given kernel lets you write good programs under it
– Example: Performance implications of select() vs. epoll()



Recap: System calls
• System calls are a way for applications to call
functions in the kernel
– Invoked via special 'trap instructions' that enter kernel mode
– Parameter passing is similar to regular function calls
– Some modern kernels offer hundreds of different system calls

• System calls can block


– Kernel can context-switch to another process
– If there are multiple runnable processes, the scheduler decides
which one should run next

Plan for today
• System calls
– What are they, and why do we need them?
– Kernel entry and exit
– Blocking and context switching
– Some common system calls NEXT
– The kernel's perspective

Example: The File API
• UNIX I/O is (mostly) based on a streaming model
– Data is seen as a stream of bytes

• This abstraction covers several types of streams:


– Files
– Directories
– Sockets
– Pipes
– Special files (block and character devices)

• Typical operations: read(), write()


– ... plus several others for specific types of streams, e.g., open() for
files, connect() for sockets, pipe() for pipes, ...

File descriptors
• A process can have multiple I/O streams open at
any given time
• Processes use file descriptors to tell the kernel
which stream they are referring to
– System calls like open(), pipe(), ... return new file descriptors
– System calls like read(), write(), ... take them as arguments
– Internally, this is just a number; it can be thought of as an index into
the kernel's file descriptor table
• The shell gives processes some standard streams:
– STDIN: Standard input
– STDOUT: Standard output
– STDERR: Standard error
– Usually associated with a terminal (but can be redirected)

[Figure: file descriptor table with entries 0 stdin, 1 stdout, 2 stderr, 3, 4, 5, ...]
Opening and closing streams
• int open(path, flags, [mode])
– Opens a file for reading and/or writing
• It is possible to open the same file more than once! (How would that be useful?)
– Flags can say whether the file will be read, written to, or both; it can
request that the file be created if it doesn't exist yet, it can request
that read/write operations should not block, etc.
• When creating a new file, the third argument is required
– If all goes well, returns a new file descriptor; otherwise returns -1
• int close(int fd)
– Closes an open file descriptor
• int pipe(int fd[2])
– Creates a new pipe and returns two file descriptors: one for the read
end and one for the write end
• Different calls for other stream types
– Example: connect(). More about this later!
Reading from & writing to streams
• ssize_t read(int fd, void *buf, size_t size)
– Asks the kernel to read up to 'size' bytes and write them to buf
– If successful, returns the number of bytes actually read
• This can be less than 'size' (under what conditions?)
• It can even be zero! (when?)
– Only guaranteed to read 'size' bytes if the descriptor a) belongs to a
file, and b) has at least that many bytes left

• ssize_t write(int fd, const void *buf, size_t size)
– Asks the kernel to write up to 'size' bytes from 'buf' to the stream
– If successful, returns the number of bytes actually written
• Can be less than 'size', or zero!
• What does this mean for a program that really wants to write all 'size' bytes?

• More information: 'man 2 read', 'man 2 write'


Working with multiple streams
• What if a process has several streams, and data
might arrive from any of them?
– If read() is the only option, what would you need to do?

• int select(int n, fd_set *R, fd_set *W, fd_set *X, struct timeval *timeout)
– Takes three sets of descriptors (R, W, and X) and a timeout
– Blocks until
• any of the descriptors in R have data available for reading
• any of the descriptors in W can accept data (i.e., write() would not block)
• any of the descriptors in X has an exceptional condition (e.g., OOB data)
• the timeout expires (if one has been specified)
– When select() returns, the kernel removes any elements from the
sets that are not actually ready for reading or writing
• In other words, the process still has to call read()/write()/... on all the descriptors
that remain in the sets
For more information
READ(2) BSD System Calls Manual READ(2)
NAME
pread, read, readv -- read input
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
ssize_t
pread(int d, void *buf, size_t nbyte, off_t offset);
ssize_t
read(int fildes, void *buf, size_t nbyte);
ssize_t
readv(int d, const struct iovec *iov, int iovcnt);
DESCRIPTION
Read() attempts to read nbyte bytes of data from the object referenced
by the descriptor fildes into the buffer pointed to by buf. Readv()
performs the same action, but scatters the input data into the iovcnt
buffers specified by the members of the iov array: iov[0], iov[1],
..., iov[iovcnt-1]. Pread() performs the same function, but reads

• System calls are documented in Section 2 of the


Unix/Linux manual
– You can read the manual from the command line ('man 2 read')
Recap: File API
• A somewhat closer look at some common syscalls
– File descriptor table
– open(), close(), read(), write(), select()

• We also saw the Unix stream abstraction


– This is just one example of the many abstractions that a modern
kernel provides
– Much more convenient than working with raw disk commands etc.
– It also offers some degree of transparency
• It doesn't matter whether a process is writing to a terminal, to a file, or to a socket!
– Good abstractions are key!

Plan for today
• System calls
– What are they, and why do we need them?
– Kernel entry and exit
– Blocking and context switching
– Some common system calls
– The kernel's perspective NEXT

How to implement system calls?
• So far, we've mostly looked at system calls from
the user-level perspective
– How processes use the calls, what the calls do for the process, etc.

• Next, let's (briefly) look at the kernel's perspective


– What has to happen inside the kernel during a system call?
– What would you need to implement if you wrote the kernel?

• Disclaimer: This will be highly simplified!

How to implement: fork()
• What does the kernel do to implement fork()?
1. Allocate a new PCB for the child
2. Copy (most of) the values from the parent's PCB to the child's
• Including file descriptors, which are inherited from the parent
3. Create a new address space for the child
4. "Copy" the memory contents from parent to child
• In practice, this is usually implemented using copy-on-write (CoW)
5. Mark parent and child as ready to run
6. Set the return values
• Parent returns the child's PID; child returns zero
7. Return to user space
• To the parent, or to the child, or to whichever other process the dispatcher picks

How to implement: exec()
• What does the kernel do to implement exec()?
1. Open the binary that the caller specified
2. Clear the address space
3. Load (or map) contents from the binary into memory
• This might take a while (I/O required!), so the process might block
4. Reset the context
5. Mark the process as ready

How to implement: wait()
• What does the kernel do to implement wait()?
1. Check whether any child processes have terminated
2. If so:
1. Extract the exit code from the (zombie) PCB
2. Free the child's PCB
3. Set the parent's return value to the exit code
4. Mark the parent as ready, and return
3. If not, mark the parent as blocked

How to implement: exit()
• What does the kernel do to implement exit()?
1. Set the process's state to terminated
2. Release any resources (address space, etc.)
3. If the parent is blocked in wait(), change the parent's state to ready

How to implement: read()
• What does the kernel do to implement read()?
1. Check for errors (file descriptor is invalid, etc.)
• If so, mark process as ready and return an error code
• The user-level library (glibc?) will put the error code into 'errno' and return -1
from the wrapper function
2. Check whether data is available for reading
• Recall that read() can read from many types of streams, so the process for this
varies with the stream type
3. If data is already available in kernel memory:
1. Pick N:=min(buffer size, number of available bytes)
2. Copy N bytes to the specified user-level buffer in the process
3. Set return value to N and mark the process as ready
4. If data is not available but can be read (e.g., from disk):
1. Pick N as above, send read request to the device, mark process as blocked
5. If no data is available:
1. Check if the file descriptor has been set to nonblocking; if so, return an error (EAGAIN) instead of blocking
2. Otherwise, mark the process as blocked
