
Linux under the Hood

2024 edition
Intro
• This session is provided by Sander van Vugt
• It is based on material from my new Linux under the Hood (2024 edition)
video course
• It might have typos, bugs and more, but should be a lot of fun!
• Participants are expected to have decent Linux knowledge / experience
• To follow along, install a virtual machine running CentOS Stream (preferred) or Ubuntu Server
• Using other environments is not recommended
Poll Question 1
Rate your Linux knowledge
• none
• poor
• average
• good
• more than good
Poll Question 2
Which distribution are you using?
• Red Hat family
• Ubuntu and alike
• Something else (please specify in group chat)
Poll Question 3
• Where are you from?
• Middle East
• Africa
• India
• Asia (other)
• North/Central America
• South America
• Pacific region
• Europe
• Belgium
Agenda
• This course doesn't have a fixed agenda; it serves a "surprise menu"
• The slides are used as a reference for further studying after the course
• For more information, check the on demand course at
https://learning.oreilly.com/api/v1/continue/9780138293321/
Lesson 1: Core Linux Elements
1.1 System Space and User Space, and How they are Related
Lesson 1: Core Linux Elements
1.2 The Role of the Kernel

Understanding the Kernel
• The kernel is the part of the Linux operating system that has control over
everything.
• It provides access to hardware by using kernel modules (drivers).
• It provides basic services to all other parts of the operating system.
• The Linux kernel was first announced on 25 August 1991, and version 0.01 was released on 17 September 1991.
• The first Linux distributions, bundling kernel and tools into an installable
and ready-to-use operating system, were launched in 1992 and 1993.
• Big Linux distributions like Debian, SUSE, and Red Hat were launched in 1993 and 1994.
Lesson 1: Core Linux Elements
1.3 Why the Root User is Unrestricted

Understanding the Root User
• The user with UID 0 has all capabilities
• In /etc/passwd, UID 0 is assigned to the root user.
• Even after removing /etc/passwd, UID 0 still exists on the system, and there are no limitations for this UID.
• Removing UID 0 is not a good idea, as it is needed throughout the operating system
Lesson 1: Core Linux Elements
1.4 Drivers, Kernel Modules, and Device Files

Kernel Modules
• Hardware support is offered by drivers.
• In Linux, the drivers are presented as kernel modules and loaded on-
demand in most cases.
• Most devices are represented by device node files in /dev, which present a user space interface to the drivers.
Device Nodes
• Many devices only send bytes to a peripheral on the computer, or receive
bytes from the peripheral.
• Such devices work like pipes, and for that reason work well as character
devices.
• Other devices work like files: what you write to a specific location can later
be retrieved from the same location. These devices are represented by
block devices.
• Network devices are more complex, as they work with packets instead of streams of bytes. They are controlled through the ioctl() system call, which permits more advanced operations; for that reason they don't fit the block or character device model.
• Video adapters also don't have device nodes; the kernel writes directly to
the memory of the video adapter as this is faster.
Lesson 1: Core Linux Elements
1.5 Glibc

Understanding Glibc
• Glibc is the GNU C Library
• It implements the C standard library, which contains standard functions
that can be used by all programs written in C
• As such, it provides core Linux facilities, such as open, read, write, malloc,
printf and more
• You'll find it as a dependency for all program files on Linux
• Glibc is released under the GNU Lesser General Public License (LGPL)
• It works on all common hardware platforms
Lesson 1: Core Linux Elements
1.6 The Shell

Understanding the Shell
• To pass instructions to the kernel, a user interface is needed.
• The shell interprets commands that the user types and passes them to the
underlying operating system.
• As a result, the computer will do something.
• bash is the standard shell on Linux.
• It is based on the Bourne shell, which was introduced with UNIX in 1979.
• Bash stands for "Bourne Again SHell", indicating that it is compatible with
Bourne shell.
• Other shells like dash and zsh may also be used.
Lesson 1: Core Linux Elements
1.7 File Descriptors

Understanding File Descriptors
• In Linux, everything is a file.
• That means that Linux communication happens through files.
• Device access happens through device files.
• Pipes are implemented by the operating system as file-like objects and are accessed through file descriptors.
• File descriptors are numbers used by processes to keep a list of open files.
• All processes have at least 3 file descriptors:
• 0: STDIN, refers to the standard input device.
• 1: STDOUT, refers to the standard output device.
• 2: STDERR, refers to the standard error device.
• A complete list of file descriptors can be found in /proc/<PID>/fd.
• The lsof command gives an interpreted list.
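As a quick sketch (the file name /tmp/testfile is just an example), the file descriptors of the current shell can be inspected directly:
• ls -l /proc/$$/fd # 0, 1 and 2 appear as symbolic links to the terminal
• lsof -p $$ # interpreted list of open files for the current shell
• exec 3> /tmp/testfile # open file descriptor 3 for writing
• ls -l /proc/$$/fd # file descriptor 3 now points to /tmp/testfile
• exec 3>&- # close file descriptor 3 again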
Lesson 3: Looking Closer at the Kernel
3.1 Why Compiling Kernels isn't Necessary Anymore

Understanding Modular Kernels
• When Linux was first released in 1991, it used a monolithic kernel.
• In a monolithic kernel all functionality is compiled into the kernel.
• As a result, in order to add features to the kernel, it needed to be
recompiled each time.
• With the Linux 2.0 release in June 1996, the modular kernel was
introduced.
• To add functionality to the kernel, drivers could simply be added and
loaded, first with the insmod command, later with the modprobe
command.
• In modern kernels, drivers are loaded automatically.
Lesson 3: Looking Closer at the Kernel
3.2 Kernel Generic Interfaces

Understanding Kernel Generic Interfaces
• To access specific hardware devices, the kernel uses kernel modules as
drivers.
• To access these modules, some main generic interfaces are used.
• Virtual Memory for addressing memory
• Virtual File System (VFS) for addressing file systems
• Device Mapper for addressing several types of block devices
• TCP/IP for addressing network devices
• System calls provide access to these generic interfaces, which makes device
usage possible.
• The sysfs and procfs pseudo filesystems are used to tweak the generic
interfaces.
Drawing
Lesson 3: Looking Closer at the Kernel
3.3 Managing and Tuning Kernel Modules

Managing Kernel Modules
• Kernel modules are loaded in different ways
• Through the initramfs / initrd
• Through systemd-udevd
• Manually using modprobe
• Some kernel modules take parameters
• Use modinfo to list kernel module information, including parameters
Tuning Kernel Modules
• Some kernel modules have options
• To automatically set kernel module options, create a file in /etc/modprobe.d, reflecting the name of the module, such as /etc/modprobe.d/cdrom.conf
• In this file use options modulename name=value, for instance: options
cdrom debug=true
Tuning Kernel Modules
• To manually set kernel module parameters, add paramname=value to the modprobe command line while loading the module
• Some kernel modules keep their current parameter values in
/sys/module/modulename/parameters/*
• If this doesn't show any parameters, use modprobe -c | grep modulename
to print the module configuration
Demo: Using Kernel Module Parameters
• lspci -k # look for the Ethernet controller
• modinfo e1000
• modprobe -r e1000
• modprobe e1000 debug=4
• modprobe -c e1000
• echo options e1000 debug=4 > /etc/modprobe.d/e1000.conf
• Notice that this parameter doesn't show in
/sys/module/e1000/parameters/
Lesson 3: Looking Closer at the Kernel
3.4 The /proc Pseudo Filesystem

Using /proc
• /proc is a pseudo filesystem that provides an interface to kernel data
structures.
• Being a pseudo filesystem, /proc doesn't give access to a disk device, but
provides access to a kernel interface.
• The kernel uses other pseudo filesystems, like sysfs and debugfs.
• Within /proc, you'll find 3 main types of information
• /proc/nnn: this is where the kernel keeps information about every running
process. These directories are referred to as the PID directories.
• /proc/sys: here you'll find kernel tunables organized by the different kernel
interfaces.
• /proc/*: here you'll find many files that contain status information about a
running system.
Lesson 3: Looking Closer at the Kernel
3.5 Using /proc to Get Detailed System Information

Useful Files in /proc
• cpuinfo: information about CPU features
• meminfo: detailed information about system memory
• cmdline: contains the current kernel boot command as issued by GRUB
• vmstat: statistics about current memory usage
• modules: shows kernel modules and their dependencies
• filesystems: contains all currently loaded filesystem drivers
• devices: shows all block and character devices and their major numbers
• swaps: shows all swap devices currently in use
• mounts: shows mounted filesystems
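As a minimal sketch, these files can simply be read with cat or grep (output differs per system):
• grep "model name" /proc/cpuinfo | head -1
• grep MemTotal /proc/meminfo
• cat /proc/cmdline
• grep xfs /proc/filesystems
• cat /proc/swaps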
Lesson 3: Looking Closer at the Kernel
3.6 Reading Process Information in /proc

Understanding /proc/<PID>
• Every running process has a directory corresponding to its PID in the /proc
directory.
• This /proc/<PID> directory contains detailed information about the current
process state.
• Notice that much of this information can be displayed using common
utilities like ps and top.
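A short sketch, using the current shell ($$) as an example process:
• ls /proc/$$ # everything the kernel keeps about the current shell
• head -5 /proc/$$/status # name, state, PID and more
• tr '\0' ' ' < /proc/$$/cmdline; echo # the command line that started the process
• ls -l /proc/$$/fd # its open file descriptors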
Lesson 3: Looking Closer at the Kernel
3.7 Tuning the Kernel through /proc/sys

Tuning the Kernel through /proc/sys
• /proc/sys contains a hierarchy with different kernel interfaces and their
tunables.
• To change a current parameter, the new value can be echoed into the file:
echo 60 > /proc/sys/vm/swappiness
• To make a change persistent, put it in /etc/sysctl.conf or in a file in /etc/sysctl.d/
• vm.swappiness = 60
• Parameters in sysctl.conf are processed by the systemd-sysctl.service unit while booting.
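The same change can also be made with the sysctl tool; a minimal sketch (the file name swappiness.conf is just an example):
• sysctl vm.swappiness # show the current value
• sysctl -w vm.swappiness=60 # runtime change, equivalent to the echo above
• echo "vm.swappiness = 60" > /etc/sysctl.d/swappiness.conf # make it persistent
• sysctl -p /etc/sysctl.d/swappiness.conf # apply the file without rebooting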
Lesson 3: Looking Closer at the Kernel
3.8 Testing Critical Failures with sysrq

Understanding sysrq
• The /proc/sysrq-trigger file provides an interface to perform advanced operations on the kernel.
• To use it, first use echo h > /proc/sysrq-trigger and read the output using dmesg
• Using these options can be useful for testing how a system reacts when
problems occur
• echo c > /proc/sysrq-trigger crashes your system
• echo f > /proc/sysrq-trigger triggers the OOM killer
• echo i > /proc/sysrq-trigger kills all processes
• echo b > /proc/sysrq-trigger resets the system
Lesson 3: Looking Closer at the Kernel
3.9 Using Watchdogs

Understanding Watchdogs
• Watchdogs can be used to reset the system if a serious problem is
detected.
• At the lowest level, a kernel module must be loaded to implement the watchdog.
• This can be a hardware-related module that interfaces with IPMI, or the softdog software-based watchdog.
• Ensure the watchdog service is enabled and started: systemctl enable --
now watchdog
• This daemon writes to /dev/watchdog once a minute by default (configure
in /etc/watchdog.conf).
• If the daemon stops doing that, the system is considered to be failing, and
watchdog will reset the system.
Demo: Using Softdog
• dnf install watchdog
• systemctl enable --now watchdog
• modprobe -i softdog
• echo c > /proc/sysrq-trigger
Lesson 3: Looking Closer at the Kernel
3.10 eBPF

Understanding eBPF
• eBPF is technology that originates from the Linux kernel and makes it possible to change kernel behavior without having to change kernel code or add kernel modules.
• It was added in the 4.x kernels.
• eBPF functions like a virtual machine that runs inside the Linux kernel and which allows programmers to run code to address specific kernel routines.
• Currently, eBPF can be used at all layers of the operating system to extend
its functionality.
• See ebpf.io for more details.
Demo: Using bpftrace
• dnf install bpftrace
• git clone https://github.com/sandervanvugt/luth
• bpftrace luth/disk_io.bt
Lesson 5: Hardware Handling
5.1 Understanding Device Nodes

Understanding Hardware Access
• Drivers for hardware devices are loaded as kernel modules.
• To access these drivers, a representation in user space is needed.
• This representation is made by device nodes in /dev.
• The "everything is a file" statement applies here: by using device files, users can easily access block as well as character devices.
From Device Node to Kernel Module
• When a device driver is loaded as a module in the kernel, it registers itself
with a major number, used to identify devices managed by that module.
• Device nodes are configured with the major number needed to access
these specific devices.
• The minor number on the device node is interpreted by the device driver and identifies the specific device or partition that is accessed.
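A minimal sketch that shows this relation for a SCSI/SATA disk (numbers differ per system):
• ls -l /dev/sda /dev/sda1 # the major number (e.g. 8) identifies the driver, the minor number the specific device or partition
• grep -w sd /proc/devices # the sd driver registered with that major number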
Lesson 5: Hardware Handling
5.2 Initializing Devices Automatically or Manually

How Devices are Initialized
• Linux devices can be initialized in different ways
• Statically through the initrd / initramfs
• Through systemd-udevd
• Manually, using mknod
• Using mknod is not common and only needed in exceptional cases
• Initramfs initializes known devices
• Systemd-udevd is the plug-and-play manager and initializes devices when they are plugged in
Demo: Recovering Lost Devices
• umount /boot
• rm /dev/sda1
• mount /boot # fails: the device node is gone
• ls -l /dev/sda*
• mknod /dev/sda1 b 8 1
• ls -l /dev/sda*
• chgrp disk /dev/sda1
• chmod 660 /dev/sda1
• mount /boot
Lesson 5: Hardware Handling
5.3 Analyzing sysfs

Understanding sysfs
• sysfs is a pseudo-filesystem, used by different kernel routines to provide
runtime information about devices.
• It can be used to find out more details about hardware devices, or to
manage device properties.
• Apart from hardware management, it is also used by other Linux subsystems, such as cgroups and SELinux.
• After hardware initialization, it can be used to discover several device
properties.
Demo: Managing Devices Through sysfs
• lscpu
• cat /sys/bus/cpu/devices/cpu1/online
• echo 0 > /sys/bus/cpu/devices/cpu1/online
• cat /sys/block/sda/queue/scheduler
• time dd if=/dev/zero of=/bigfile bs=1M count=1024
• echo none > /sys/block/sda/queue/scheduler
• time dd if=/dev/zero of=/bigfile bs=1M count=1024
Lesson 5: Hardware Handling
5.4 systemd-udevd

Understanding systemd-udevd
• systemd-udevd can be considered the plug-and-play manager on Linux
• When hardware events are detected, it processes rules to initialize devices
• This procedure can be followed using udevadm monitor
How systemd-udevd Works
• The Linux kernel initiates device loading, and then sends out uevents to the systemd-udevd user space daemon
• systemd-udevd catches the event and decides how to handle it based on
attributes that it has received in the event
• It next reads its rules and acts based on these rules
• Default rules are in /usr/lib/udev/rules.d
• Custom rules are in /etc/udev/rules.d
• The results can be monitored through the /sys filesystem
Lesson 5: Hardware Handling
5.5 Creating Udev Rules

Understanding Udev Rules
• Rules can be defined to manage systemd-udevd device initialization
• This allows you to define what should happen upon device initialization
• Default rules are in /usr/lib/udev/rules.d/
• Custom rules can be created in /etc/udev/rules.d/
• usbguard is a specialized tool that makes allowing or denying access to USB
devices easier
Demo: Using Custom Rules
• start udevadm monitor and plug in a device
• cat /lib/udev/rules.d/60-persistent-storage.rules
• ls /dev/disk
• udevadm info --query=all --name=/dev/sdb
• udevadm info --attribute-walk --name=/dev/sdb
• create a custom udev rule /etc/udev/rules.d/50-custom-storage.rules
• ACTION=="add", SUBSYSTEM=="block", DRIVERS=="usb-storage",
SYMLINK+="usb/%k", RUN+="/usr/bin/logger CUSTOMUDEV Added %k"
• udevadm control --reload-rules
• Plug in a USB thumb drive and observe
• ls -l /dev/usb
• journalctl -f
Lesson 6: Storage Devices
6.1 Linux Storage Devices

Understanding Device Types
• Linux works with block devices and character devices.
• A block device allows the Linux kernel to address information on the device
in any order.
• A character device is addressed by sending and receiving single characters.
• Device types are accessed through device files in /dev, and the file type
indicates if it's a block device (b) or a character device (c).
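A quick sketch: the device type is visible in the first character of ls -l output:
• ls -l /dev/sda /dev/null # 'b' marks a block device, 'c' a character device
• ls -l /dev | grep ^b # list all block device nodes in /dev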
Understanding Linux Storage Devices
• Most Linux systems store files on hard disk.
• Before working with a hard disk, it needs to be structured.
• The most common way to structure hard disks is by using partitions.
• A partition is a static definition of an area on disk, in which typically a filesystem is created.
• Linux filesystems allow for allocation of blocks to store files.
• Because, in some cases, a more dynamic approach is needed, Linux can use
flexible volumes as well.
• The most common solution for flexible storage is Logical Volume Manager
(LVM).
Understanding Storage Device Addressing
• Original hard disks, which consist of rotating platters, used cylinder-head-
sector (CHS) addressing.
• CHS addressing doesn't make sense on SSD devices.
• Logical Block Addressing (LBA) was introduced to make addressing blocks
on disk easier and has become the standard.
• On modern hard disks, LBA is used to address sectors.
• The sector is the smallest physical storage unit on a disk, and it is normally
set to be 512 bytes in size.
• When creating partitions, boundaries are specified in sectors.
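A minimal sketch to inspect sector sizes and counts on an existing disk (assuming /dev/sda):
• cat /sys/block/sda/queue/logical_block_size # sector size used for addressing, usually 512
• cat /sys/block/sda/queue/physical_block_size # physical sector size, may be 4096
• blockdev --getsz /dev/sda # device size expressed in 512-byte sectors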
Lesson 6: Storage Devices
6.2 Partitions: MBR and GPT

Master Boot Record
• When the PC standard was introduced in 1981, the Master Boot Record
was defined as well.
• The MBR is the first 512 bytes on disk, of which 64 bytes is used as the
partition table.
• In this limited space, 4 partitions can be defined as primary partitions.
• If more partitions are needed, one partition needs to be created as an
extended partition.
• Within the extended partition, logical partitions can be created.
MBR Partition Limitations
• In MBR, 64 bytes are available for addressing 4 partitions, which means 16
bytes for each partition.
• Each partition entry uses 32-bit sector addresses, so a maximum disk size of 2TiB can be addressed (with 512-byte sectors).
• Another limitation is that only 4 partitions can be managed through MBR.
Demo: Exploring MBR Partition Tables
• ls /sys/firmware # no uefi subdir? Then you're on BIOS
• lsblk # shows block devices in use
• xxd -l 512 /dev/sda # shows the MBR and its partition table
• fdisk -l /dev/sda # shows it in a more readable way
GUID Partition Table (GPT)
• In the GUID Partition Table (GPT) more space is available for addressing
partitions.
• GPT is a part of the Unified Extensible Firmware Interface (UEFI) specification, but on Linux it can be used on BIOS systems as well.
• GPT uses 64 bits for logical block addresses, which means that on disks with sectors that have a size of 512 bytes, the maximum size is 8ZiB.
• Also, in GPT a maximum of 128 partitions can be created and there is no longer any need to differentiate between primary, extended, and logical partitions.
Where GPT is Stored
• As GPT is bigger, it cannot be stored in the MBR
• The GPT header is stored in the second sector on disk
• Right after that, the next sectors are used for storing GPT partition entries
• Each partition needs a 128-byte entry, so a maximum of 32 sectors is used to store GPT partition entries
• A backup GPT header is stored on the last sector on disk
Demo: Analyzing GPT
• xxd -l 512 /dev/sda # shows the protective MBR
• dd if=/dev/sda bs=512 skip=1 count=1 | xxd # shows the GPT header
• dd if=/dev/sda bs=512 skip=2 count=1 | xxd # shows the next sector
containing the first 4 partitions
• fdisk -l /dev/sda # look for the number of sectors on this disk
• dd if=/dev/sda bs=512 skip=$(( N-1 )) | xxd # replace N with number of
sectors in disk
Lesson 6: Storage Devices
6.3 Managing Partitions
Lesson 6: Storage Devices
6.4 Images and ISO Files

Mounting Images and ISO files
• To mount, you need a block device.
• ISO files can be mounted directly because they are of the iso9660
filesystem type.
• To mount a file, a loop device is created as a fake block device.
• You can perform this procedure manually as well.
Demo: Using losetup to create a storage device
• dd if=/dev/zero of=/disk.img bs=1M count=1024
• losetup /dev/loop0 /disk.img
• mkfs.ext4 /dev/loop0
• mount /dev/loop0 /mnt
• mount # shows the loop device mounted on /mnt
• cp /etc/hosts /mnt
Lesson 6: Storage Devices
6.5 Understanding Flexible Storage Solutions

Understanding Flexible Storage
• Partitions are relatively static.
• Logical Volume Manager (LVM) provides flexibility.
• Easy resizing of volumes
• Creating multi-device volumes
• Implementing RAID technology
• Live replacing of failing devices
• Virtual Data Optimizer (VDO) provides thin provisioning on top of LVM,
using the kvdo and uds kernel modules.
• Stratis is another solution that provides thin provisioning using the Stratisd
user space daemon.
Lesson 6: Storage Devices
6.6 Managing LVM Logical Volumes
Lesson 6: Storage Devices
6.7 Using LVM Features

Understanding LVM Snapshots
• An LVM snapshot takes a "picture" of the current state of the data.
• This is useful as it allows making a backup of files in a stable state, instead
of files that are open (which may fail in backups).
• Snapshots do not replace a backup, and should be removed after serving
their purpose.
• When no files have changed, files in the snapshot point to the original file blocks, meaning that the snapshot can be really small.
• After files change, the original file blocks will be stored in the snapshot, while the new file blocks are stored in the original volume.
• As a result, snapshots will grow over time.
• While creating a snapshot, ensure that it is big enough to store all predicted file changes.
Demo: Using LVM Snapshots
• This demo assumes the required LVM type partitions are already created
• vgcreate vgdata /dev/sda3
• lvcreate -n lvdata vgdata -l 90%FREE
• mkfs.ext4 /dev/vgdata/lvdata
• mount /dev/mapper/vgdata-lvdata /mnt
• cp /etc/b* /mnt
• lvcreate -s -n lvdatasnap /dev/vgdata/lvdata -l 5%FREE
• rm /mnt/b*
• mkdir /snap
• mount /dev/vgdata/lvdatasnap /snap
• ls /snap
Demo: Using LVM Snapshots
• umount /snap
• lvremove /dev/vgdata/lvdatasnap
Backing up VG Configuration
• The vgcfgbackup command can be used to backup volume group
configuration.
• Such a backup can be useful if a logical volume was accidentally removed.
• Use vgcfgrestore to revert to the original volume group configuration.
Demo: Using vgcfgbackup
• This demo requires some LVs in vgdata
• lvs vgdata
• vgcfgbackup -f /tmp/vgbackup-$(date +%d-%m-%y) vgdata
• cat /tmp/vgbackup[Tab]
• lvremove /dev/vgdata/lvdata
• lvs
• vgchange -a n vgdata
• vgcfgrestore -f /tmp/vgbackup[Tab] vgdata
• vgchange -a y vgdata
• lvs
Lesson 6: Storage Devices
6.8 Device Mapper

Understanding Device Mapper
• Several advanced storage features are required in the Linux operating
system.
• cache
• encryption
• raid
• snapshot
• thin provisioning
• mirroring
• These features are provided by the Device Mapper.
• Device Mapper maps physical block devices to virtual block devices, which
are used by upper-layer systems such as LVM.
Demo: Exploring DM devices
• lvs
• ls -l /dev/mapper/<vgname>-<lvname> /dev/<vgname>/<lvname>
• ls -l /dev/dm*
• dmsetup ls
• dmsetup info
Lesson 6: Storage Devices
6.9 Manually Creating Device Mapper Storage

Understanding dmsetup
• dmsetup can be used to list and manage Device Mapper storage.
• Use dmsetup ls to get an overview of currently existing Device Mapper storage; it will show devices created by LVM, LUKS, and more (if existing).
• Use dmsetup create to create devices without using an upper-layer device
manager.
Using dmsetup
• For common usage, using dmsetup to create devices is not recommended.
• The dmsetup create command can be used to create a new device on
sectors that are not yet used by any other device.
• To create a Device Mapper device with dmsetup, use the following
command:
• dmsetup create <devicename> --table '0 <block count> linear <source-device>
<start-block>'
• Notice that blocks are expressed as 512 byte units.
• Devices created this way are not persistent; include the command in the startup procedure or run it again manually to regain access to the devices.
• After creation, a block device file appears in /dev/mapper/.
• Use this block device file to create a filesystem and mount the device.
Demo: Manually Creating Device Mapper Storage
• Make sure a complete hard disk is available (assuming /dev/sdb)
• lsblk
• dmsetup create mydevice --table '0 419304 linear /dev/sdb 0'
• dmsetup ls
• mkfs.ext4 /dev/mapper/mydevice
Lesson 6: Storage Devices
6.10 LVM and VDO

Understanding VDO
• Virtual Data Optimizer (VDO) provides thin provisioning on top of LVM
logical volumes by using data deduplication and compression.
• LVM also provides thin provisioning, but VDO provides more efficient
algorithms.
• By using VDO on top of LVM, it is easy to increase the size of the underlying volumes when running out of physical storage.
• While creating VDO logical volumes, a minimal physical size of 5GiB is
required.
• While creating a filesystem on top of a VDO logical volume, the nodiscard
option must be used.
• mkfs.ext4 -E nodiscard ...
• mkfs.xfs -K ...
Demo: Creating VDO Type Logical Volumes
• this demo requires a 10GiB LVM type partition
• vgcreate vgstorage /dev/sda7
• lvcreate --type vdo -L 5G -V 100G vgstorage/vdopool0
• mkfs.ext4 -E nodiscard /dev/vgstorage/lvol0
• mount /dev/vgstorage/lvol0 /mnt
• df -h | grep lvol0
Lesson 6: Storage Devices
6.11 Stratis

Understanding Stratis
• Stratis volumes always use the XFS filesystem.
• Stratis volumes are thin provisioned by nature.
• Volume storage is allocated from the Stratis pool.
• Each Stratis volume needs a minimal size of 4GiB.
• Because of the thin provisioning, Stratis tools must be used to monitor
available storage space.
Managing Stratis
• To work with Stratis, you'll need the stratisd and stratis-cli packages.
• Use the stratis command to create pools and filesystems.
• This command has awesome tab completion.
• While working with Stratis, use stratis pool list to ensure the pool has
sufficient space remaining.
• To mount Stratis volumes:
• Use UUID.
• Include x-systemd.requires=stratisd.service as a mount option.
Demo: Managing Stratis Volumes
• dnf install stratis-cli stratisd
• systemctl enable --now stratisd
• stratis pool create mypool /dev/sdb
• stratis pool list
• stratis pool add-data mypool /dev/sdc
• stratis blockdev list
• stratis fs create mypool myfs
• mkdir /myfs
• lsblk --noheadings --output=UUID /dev/stratis/mypool/myfs >> /etc/fstab
• Edit /etc/fstab to include:
• UUID=d8ff... /myfs xfs defaults,x-systemd.requires=stratisd.service 0 0
Lesson 6: Storage Devices
6.12 Creating Encrypted Devices

Understanding LUKS
• Linux Unified Key Setup (LUKS) is the standard way for creating encrypted
devices.
• It encrypts a complete device, resulting in a new device mapper device.
• This device mapper device needs to be opened, after which a filesystem
can be created on that device.
• After encrypting the device, it's the device mapper device that should be
mounted, and not the original device.
• To access the encrypted device, a password must be entered manually, or automatically by configuring a key file in /etc/crypttab.
Demo: Creating a LUKS Encrypted Device
• cryptsetup luksFormat /dev/sdb3
• Provide a strong password
• cryptsetup luksOpen /dev/sdb3 secret
• mkfs.ext4 /dev/mapper/secret
• mount /dev/mapper/secret ...
Automating Passkey Entry
• To access a LUKS encrypted volume, a passkey needs to be entered
• Manually entering the passkey is disruptive for the boot procedure
• A tang server and clevis client can be used to enter passkeys automatically
• As a result, a LUKS encrypted volume will be opened automatically when it
is booted on the network where the tang server is available
Demo: Setting up Clevis and Tang
• On an external server:
• dnf install -y clevis tang
• systemctl enable --now tangd.socket
• firewall-cmd --add-service http --permanent
• firewall-cmd --reload
• On the server that uses LUKS encrypted volumes
• clevis luks bind -d /dev/sda7 tang '{"url":"192.168.29.138"}'
• Enter the current passphrase and press Y
• reboot and verify that clevis luks unlock -d /dev/sda7 -n secret works without entering a password
Lesson 6: Storage Devices
6.13 Booting from Encrypted Devices

Booting from Encrypted Devices
• Several Linux distributions offer an option to encrypt the entire disk while
installing.
• This adds important protection to portable devices.
• If this happens, a passphrase must be provided while booting.
• A specific initrd will be generated to provide all software to perform the
encrypted boot procedure.
Lesson 6: Storage Devices
Real-world Scenario: Creating a Hidden Storage Device

Creating a Hidden Storage Device
• On a new device, create a 900MiB partition at the start of the device.
• Create a 1GiB partition, starting at sector 2097152; a 100MiB empty space exists between the partitions.
• dmsetup create hidden --table '0 204800 linear /dev/sdc 1845248'
• mkfs.ext4 /dev/mapper/hidden
• mkdir /hidden
• Create a script to activate the device, using the next commands:
• dmsetup create hidden --table '0 204800 linear /dev/sdc 1845248'
• mkdir /hidden
• mount /dev/mapper/hidden /hidden
• Create /root/.bash_logout, in which the device is unmounted and /hidden
is deleted.
Lesson 7: Filesystems
7.1 Filesystems and the VFS

Understanding VFS
• The Virtual Filesystem (VFS) is a generic interface that is used by several
specific filesystems to connect to the kernel.
• It provides common system calls that are needed in all filesystems
• It is used as an abstraction that hides differences between specific
filesystems
• Initially, it was introduced by Sun Microsystems in 1985 to allow the local
UFS filesystem to work in the same way as the remote NFS filesystem.
• Filesystem-specific options that are not in VFS need to be set as filesystem options, or passed when mounting the filesystem
• mount -o noatime /dev/sda4 /mnt
Demo: Comparing System Calls on Ext4 and XFS
• This demo requires an ext4 filesystem and an XFS filesystem, mounted on /ext4 and /xfs
• mkfs.ext4 /dev/sda6
• tune2fs -l /dev/sda6 | grep ^Default
• mount -o noatime /dev/sda6 /ext4
• strace -e trace=file ls /ext4
• strace -e trace=file ls /xfs
• These show no differences!
Lesson 7: Filesystems
7.2 About POSIX and non-POSIX Filesystems

Understanding POSIX
• The Portable Operating System Interface (POSIX) was introduced in the
1980s to provide standardization in UNIX.
• The mission was to describe how system calls should behave.
• This was important to guarantee standardization and interoperability in a
landscape where many flavors of UNIX existed.
• Linux was developed as a POSIX-compliant operating system, which made
the migration from any UNIX flavor to Linux relatively easy.
• Fun fact: Microsoft Windows NT was POSIX-compliant as well!
• Linux is mostly POSIX-compliant; for performance reasons, some features are not implemented.
POSIX Filesystem Requirements
• A POSIX-compliant filesystem must implement a couple of features:
• It should implement strong consistency.
• It should implement operations like random reads and writes, fsync, and truncate.
• File access is controlled by permissions, which are based on file ownership and
permission mode.
• These features are implemented in system calls, and guarantee that an
application can always address the filesystem in a specific way.
POSIX Filesystems
• POSIX filesystems implement common features.
• Inodes contain the file administration (metadata).
• Special files like device nodes and named pipes can be used.
• Directories map filenames to inodes.
• Hard links and symbolic links can be used.
• Filesystem metadata can be stored in the superblock.
• To ensure optimal working, you should use POSIX filesystems on Linux.
• Non-POSIX filesystems can also be used, but lack typical POSIX features.
Lesson 7: Filesystems
7.3 Linux Filesystem Components

Linux Filesystem Components
• POSIX-compliant Linux filesystems have a couple of components.
• The superblock contains filesystem metadata.
• The directory table contains file names, which point to filesystem inodes.
• Each file has an inode, which contains a list of blocks or extents that are
used to store the file data.
Demo: Showing Superblock Content
• On XFS filesystems:
• xfs_db /dev/sdx1
• info
• quit
• On Ext filesystems
• tune2fs -l /dev/sdx1
• debugfs /dev/sdx1
• stats
• quit
Lesson 7: Filesystems
7.4 Inodes and Block Allocation

Understanding Inodes
• When a filesystem is created, an inode table is created as well.
• Each file has one inode.
• The inode keeps all administration of a file, but not the filename.
• The filename is stored in the directory.
• Use stat <filename> to see inode information printed in a readable format.
• Use ls -i to print the inode number.
• In addition to the metadata information about files, inodes also include a
list of all blocks used by the file.
Block Allocation
• To store files on disk, blocks are allocated.
• Block size is a property of the filesystem; it's often set to 4KiB, but may be
set to more.
• Use xfs_info to find block size on XFS
• Use tune2fs -l to find block size on Ext4
• When files are stored, this happens on block boundaries.
• This means that a 10-byte file on a filesystem with 4KiB blocks will occupy
one complete block.
• To avoid wasting too much space this way, some filesystems allow for block
suballocation.
• Use ls -lsh filename to show the actual filesize (-l) as well as the size on disk
(-s) in a human-readable way.
Demo: Exploring Ext4 Inodes
• This demo requires an Ext4 filesystem mounted on /mnt
• cp /etc/hosts /mnt
• ls -il /mnt
• debugfs /dev/sda3
• stat <13> # use the inode number shown by ls -il
• quit
• stat /mnt/hosts
Lesson 7: Filesystems
7.5 Sparse Files

Understanding Sparse Files
• Sparse files offer a solution to use disk space in an efficient way when the
file is partially empty.
• In a sparse file, blocks are only written to disk if the block contains real
data.
• Empty blocks are not committed to disk.
• This allows for efficient storage of disk images, snapshots, log files and
more.
• If sparse files are used, utilities must support them
• cp uses --sparse=auto as a default option to keep the sparse nature of files
• rsync -S will keep the sparse nature of files
Demo: Creating Sparse Files
• dd if=/dev/zero of=/sparsefile.img bs=1 count=0 seek=10G
• ls -lsh /sparsefile.img
• mkfs.xfs -b size=2048 /sparsefile.img
• mount -o loop /sparsefile.img /mnt
• cp /etc/hosts /mnt
• ls -lsh /mnt/hosts
• dd if=/dev/urandom of=/mnt/bigfile bs=1M count=10
• ls -lsh /mnt
• ls -lsh /sparsefile.img
Understanding File Block Allocation
• When creating files, filesystems that support the fallocate() system call can
allocate blocks without actually writing anything in these blocks.
• By marking blocks as occupied in this way, files can be written much faster,
in particular when a non-sparse file contains lots of non-used space – the
alternative would be to fill the allocated blocks with zeros.
• If allocated space only contains zeroes, the fallocate --dig-holes command can be used to release those blocks again and make the file sparse.
Demo: Using fallocate
• dd if=/dev/zero of=/bigfile.img bs=1M count=1024
• ls -lsh /bigfile.img
• fallocate --dig-holes /bigfile.img
• ls -lsh /bigfile.img
Lesson 7: Filesystems
7.6 FUSE Filesystems

Understanding FUSE
• FUSE is the Filesystem in USEr space.
• FUSE is useful as it allows for mounting filesystems in user space.
• This allows non-privileged users to mount filesystems.
• To work with FUSE, the fuse kernel module is used.
Demo: Using sshfs through FUSE
• This demo requires a remote server that is accessible through SSH
• dnf install sshfs
• sshfs -o allow_other,default_permissions
student@remotehost:/home/student /mnt/sshfs
• mount | grep ssh
• lsmod | grep fuse
Lesson 7: Filesystems
7.7 Next-generation Filesystems

The Need for a New Filesystem
• Ext4 was released in 2008.
• It is backwards compatible with Ext2, which was released in 1993.
• Because of the large increase in data use since then, a next-generation
filesystem is needed.
• Already in 2008, main Ext4 developer Ted Ts'o announced that Ext4 would be followed by a next-generation filesystem; there will be no Ext5.
Exploring Next-generation Filesystems
• XFS has been used as the default filesystem in RHEL since the release of RHEL 7 in 2014.
• In RHEL 8, Red Hat pre-released Stratis, which adds volume management options and thin provisioning to the XFS filesystem.
• ZFS was developed by Sun Microsystems and is promising, but for legal
reasons ZFS is not officially included in the Linux kernel.
• Btrfs is the B-Tree filesystem, and was announced in 2007.
• Although Btrfs has been around for a long time, not everyone considers it a
stable filesystem yet.
• It has been the default filesystem in SUSE Linux Enterprise Server since 2015, and is also the default in current Fedora.
• Currently, there is no convincing next-generation filesystem yet.
Lesson 7: Filesystems
7.8 Running ZFS on Linux

ZFS and Linux Support
• ZFS (Zettabyte Filesystem) is licensed under the Common Development and
Distribution License (CDDL).
• CDDL license conditions are incompatible with GPL license terms, and for
that reason, ZFS cannot be distributed with the Linux kernel.
• Ubuntu includes ZFS using native drivers (which violates GPL).
• It is recommended to consider using ZFS alternatives that are supported on
Linux.
• Btrfs was developed as the Linux answer to ZFS.
• Stratis adds some ZFS features to the XFS filesystem.
ZFS Features
• Storage Pools: A ZFS filesystem is created on top of a ZFS pool, which can
span multiple devices
• Copy on Write: new information is written to a new block and once the
write is complete, filesystem metadata is updated to point to the new info
• Snapshots: these allow you to freeze the current state of files while using minimal additional disk space
• Data verification and automatic repair: files have checksums and are
automatically repaired if file content doesn't match the checksum
• RAID-Z: a ZFS alternative to RAID 5
• Maximum 16EiB file size
• Virtually unlimited storage size
Demo: Setting up ZFS on Ubuntu Server
• Make sure two unused disks are available
• sudo apt install zfsutils-linux
• zpool --version
• sudo zpool create mypool /dev/sdb /dev/sdc
• sudo zfs list
• sudo zfs get compression mypool
• sudo zfs set compression=on mypool
• mount | grep mypool
Lesson 7: Filesystems
7.9 Running Btrfs

Understanding Btrfs
• Btrfs is the B-tree filesystem.
• It offers advanced features, such as volumes and Copy on Write (CoW).
• Because "Butter comes from a CoW", according to the main developer,
Btrfs should be pronounced as "ButterFS”.
• Btrfs is the default filesystem on Fedora Linux.
• ZFS, Stratis, and LVM are providing features that are in Btrfs.
• Btrfs provides subvolumes: directories that can be provided with properties
that normally are set on storage devices.
Using Btrfs Tools
• btrfs filesystem show shows current Btrfs file systems
• btrfs filesystem label shows Btrfs labels
• btrfs subvolume list shows current subvolumes
• btrfs subvolume create /somedir creates a new subvolume
• useradd linda --btrfs-subvolume-home creates a Btrfs-based user home
directory
Using Snapshots
• Btrfs comes with embedded snapshot functionality.
• Making a snapshot allows users to freeze the current state of files.
• This is useful before performing tests; if desired, it is easy to revert to the previous state.
• Snapshots are created on subvolumes, using btrfs subvolume snapshot
/files /files/snapshots
• btrfs subvolume list shows subvolume IDs.
• btrfs subvolume set-default 258 /files sets the snapshot with that ID as the default subvolume that gets mounted for the directory.
Demo: Using Btrfs
• mkfs.btrfs /dev/sdd
• mount -t btrfs /dev/sdd /mnt/butter
• btrfs subvolume show /mnt/butter
• btrfs subvolume create /mnt/butter/milk
• cp /etc/hosts /mnt/butter/milk
• btrfs subvolume snapshot /mnt/butter/milk /mnt/butter/milk/snapshot
• rm /mnt/butter/milk/hosts
• btrfs subvolume list /mnt/butter/milk
• btrfs subvolume set-default <snapshot-id> /mnt/butter/milk
• umount /dev/sdd
Demo: Using Btrfs
• mount /dev/sdd /mnt/butter
• tree /mnt/butter
Lesson 7: Filesystems
7.10 Using the Ext Filesystem Debugger

Using the Ext Filesystem Debugger
• debugfs (from the e2fsprogs package) can be used to perform advanced operations on Ext filesystems.
• Think of printing the filesystem administrative data in the superblock,
dumping inodes, listing blocks used by specific files and more.
• To get more information from the Ext filesystem superblock, consider using
tune2fs -l or dumpe2fs.
Demo: Using debugfs
• This demo assumes the Ext formatted sdb3 device is mounted on /mnt
• cp /etc/hosts /mnt
• ls -i /mnt/hosts # assuming the inode number is 13
• umount /mnt
• debugfs /dev/sdb3
• stats
• stat <13>
• dump <13> copiedfile
Recovering Deleted Files on Ext4
• In Ext3 filesystems, the lsdel command could be used to list recently
deleted inodes.
• Next, the dump <inode> <filename> command could be used to recover
recently deleted files.
• This doesn't work anymore in Ext4.
• Use extundelete in Ext4 to undelete files.
• Notice that success is NOT guaranteed!
Demo: Trying to Recover Files
• This demo assumes an Ext4 formatted device /dev/sdb5
• mount /dev/sdb5 /mnt
• cp /etc/d* /mnt/
• sync
• rm -f /mnt/d*; sync
• umount /mnt
• fsck /dev/sdb5
• extundelete /dev/sdb5 --restore-all
• mount /dev/sdb5 /mnt
• ls /mnt
Lesson 7: Filesystems
7.11 Managing XFS UUIDs

Managing XFS UUIDs
• Each XFS filesystem gets a UUID.
• This UUID is supposed to be unique.
• If a duplicate UUID is detected, mounts will fail.
• Duplicate UUIDs may occur after cloning devices, or while trying to mount
a snapshot of an XFS filesystem.
• To fix this problem, you may use the -o nouuid option with the mount
command.
• Otherwise, you can use xfs_admin -U to set a new UUID on a filesystem.
Demo: Fixing Duplicate XFS UUIDs
• This demo requires that you have created two partitions of the same size
• mkfs.xfs /dev/sdb1
• mount /dev/sdb1 /mnt
• cp /etc/a* /mnt 2>/dev/null
• dd if=/dev/sdb1 of=/dev/sdb2 bs=1M
• mkdir /dup
• mount /dev/sdb2 /dup # will fail
• dmesg # shows why
• blkid
Demo: Fixing Duplicate XFS UUIDs
• mount -o nouuid /dev/sdb2 /dup
• umount /dup
• xfs_admin -U $(uuidgen) /dev/sdb2
• mount /dev/sdb2 /dup
Lesson 9: Memory Management
9.1 Linux Memory Allocation: Virtual vs. Physical Memory

Understanding Terminology
• "Virtual Memory" in Linux can be understood in two ways:
• Virtual Memory as the total addressable memory that is provided by the CPU
architecture that is used
• Virtual Memory as the sum of RAM and Swap
• In this course, "Virtual Memory" is the total addressable memory that is
provided by the CPU architecture
• In this course, the sum of physical RAM and emulated RAM (swap) is
referred to as "physical memory"
Understanding the Virtual Address Space
• When a process loads, it creates a virtual address space.
• This virtual address space contains virtual memory offsets that are private
to the process.
• When the process requests physical memory access, the kernel maps the
physical address of a memory page to the virtual address in the virtual
memory address space used by that process.
• The size of the available virtual address space is defined by the CPU
architecture.
Virtual Address Space Size
• Theoretically, a 64-bit CPU can address 2^64 bytes of memory, which corresponds to 16EiB.
• The address sizes in /proc/cpuinfo define how many bits can be used for addressing virtual and physical memory on this specific CPU architecture.
• You may find a value of 46 bits physical and 48 bits virtual, meaning that this specific CPU can address 256TiB of virtual memory and 64TiB of physical memory.
• The VmallocTotal parameter in /proc/meminfo indicates the total amount
of virtual memory that can be used.
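A short sketch to check these values on a running system:
• grep "address sizes" /proc/cpuinfo | head -1 # bits available for physical and virtual addressing
• grep VmallocTotal /proc/meminfo # total virtual memory that can be used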
Lesson 9: Memory Management
9.2 Cache

Understanding Cache
• Cache is temporary storage for faster data access.
• Cache is used at different levels:
• Internet proxy cache: speeds up fetching data from the Internet
• Disk cache: speeds up fetching data from disk
• CPU L3 cache: slower cache to buffer data the CPU has recently used
• CPU L2 cache: used to buffer data the CPU has recently used
• CPU L1 cache: CPU cache used to buffer commonly used CPU instructions
• For Linux optimization, disk cache is important.
• Monitoring CPU cache is not very interesting as, by design, it will always be at least 90% full.
Understanding Linux Disk Cache
• When files are read from disk, they are stored in the buffer/cache area of
memory.
• Linux keeps these files in cache as long as possible.
• Eventually, the least recently used files in cache will expire.
• If memory is needed for more urgent needs, cached files will be discarded.
• The Linux kernel auto-tunes cache usage based on its knowledge of active
and inactive memory.
• Common utilities like top and free give an overview of current cache usage.
Page cache, Dentries, and Inodes
• In the cache area of memory, 3 types of data are stored:
• Page cache is a generic cache that maps to any type of block storage on disk
• Dentries represent a directory structure
• Inodes represent the files
• Page cache is used as a read cache, but also as a write cache.
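A minimal sketch to look at these caches (reading /proc/slabinfo requires root; exact cache names differ per filesystem):
• grep -E '^Cached|^Buffers|^Dirty' /proc/meminfo # page cache and write cache usage
• grep -E 'dentry|inode_cache' /proc/slabinfo # dentry and inode cache objects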
Understanding Write Cache
• Write cache consists of modified files that are stored in cache to make disk
writes more efficient.
• This write cache is referred to as buffer cache.
• Keeping files in buffer cache as long as possible makes file writes more
efficient.
• The sysctl Dirty Cache parameters can be used to optimize the write cache.
Lesson 9: Memory Management
9.3 Active and Inactive Memory

Understanding Active and Inactive Memory
• Linux keeps track of active and inactive memory.
• Active memory is memory that has been recently used; inactive memory
hasn't been used recently.
• Active/Inactive statistics are kept for cache (File) as well as application
(Anonymous) memory.
• Current usage statistics can be obtained from /proc/meminfo.
• When memory shortage occurs, the kernel reclaims inactive memory first.
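For example, the current statistics can be shown like this:
• grep -E '^(Active|Inactive)' /proc/meminfo # totals, split into (anon) and (file)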
Active/Inactive File Memory Use
• When memory shortage occurs, the kernel will drop inactive file memory.
• This is an automatic process.
• Degraded performance will occur if active file memory is dropped as well
• To manually drop caches, a value can be written to vm.drop_caches
• 1: will drop page cache only
• 2: drops dentries and inodes
• 3: drops page cache as well as dentries and inodes
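A minimal sketch of manually dropping caches (as root; normally only useful for testing):
• free -h # note the buff/cache column
• sync # flush dirty pages first
• echo 3 > /proc/sys/vm/drop_caches # drop page cache, dentries and inodes
• free -h # buff/cache has shrunk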
Active/Inactive Anonymous Memory Use
• Inactive anonymous memory doesn't need to be kept in physical RAM.
• To deal with inactive anonymous memory, the Linux kernel can use swap.
• On memory shortage, inactive anonymous memory will be moved to swap
first.
• On serious memory shortage, active anonymous memory will be moved to
swap as well.
• This is bad for performance, as the system will lose significant time moving
data to and from swap.
Lesson 9: Memory Management
9.4 The Need to Swap
Lesson 9: Memory Management
9.5 Configuring and Monitoring Swap Space

How Much Swap is Needed?
• There is no uniform answer to this question!
• On systems with less than 1GB of RAM, the recommendation is to allocate twice the amount of RAM as swap.
• On systems with more than 4GB, having 25% of RAM available in swap is
often enough.
• Some applications have their own recommendation.
• Some applications don't work well if swap is enabled.
Monitoring Swap Usage
• free and top give an overview of current swap usage.
• More important is to see how actively swap is being used.
• First, compare swap usage to inactive anonymous memory. If more swap is
used than inactive anonymous memory, you might have an issue.
• Also, monitor si and so in vmstat 2 20 output; this gives an impression of how actively swap is being used.
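A short sketch of that comparison:
• grep -E 'SwapTotal|SwapFree|Inactive\(anon\)' /proc/meminfo # compare swap in use with inactive anonymous memory
• vmstat 2 20 # the si/so columns show how actively pages are swapped in and out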
Demo: Adding Swap Space
• free -m
• df -h
• dd if=/dev/zero of=/swapfile bs=1M count=1024
• chmod 0600 /swapfile
• mkswap /swapfile
• swapon /swapfile
• echo 80 > /proc/sys/vm/swappiness
• free -m
Lesson 9: Memory Management
9.6 Managing Huge Pages

Understanding Huge Pages
• Default memory pages have a size of 4096 bytes.
• As a result, an application that needs 4GiB RAM needs to administer about 1,000,000 memory pages, which causes a large overhead.
• To allocate memory in a more efficient way, huge pages can be used.
• Huge pages can have different sizes, but 2MiB is common.
• Check /proc/meminfo for more details: grep -i huge /proc/meminfo.
Reserving Huge Pages
• To work with huge pages, they need to be pre-allocated.
• Use the vm.nr_hugepages sysctl tunable to specify how many huge pages
you want to reserve.
• Memory that is reserved for huge pages cannot be used for anything else
anymore, and will show as used memory in free output.
• To allocate huge pages, enough contiguous free memory must be available.
• If the kernel cannot find enough memory that qualifies, it will allocate as
many huge pages as it can.
Using Huge Pages
• To use huge pages, processes need to use one of the following system calls
to request memory.
• mmap()
• shmat()
• shmget()
• If a process uses mmap(), the hugetlbfs filesystem must be mounted.
• Ubuntu as well as Red Hat-family distributions make this mount
automatically.
• If necessary, mount hugetlbfs yourself:
• mkdir /huge
• mount -t hugetlbfs none /huge
Demo: Using Huge Pages
• echo 500 > /proc/sys/vm/nr_hugepages
• grep -e vmx -e svm /proc/cpuinfo
• sudo systemctl enable --now libvirtd
• wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-
cloudimg-amd64-disk-kvm.img
• virsh net-list --all
• virsh net-start default
• virt-install --disk /tmp/jammy-server-cloudimg-amd64-disk-kvm.img --
memory 512 --osinfo detect=on,require=off --name myvm --boot hd
• grep -i huge /proc/meminfo # no huge pages!!
Demo: Using Huge Pages
• virsh destroy myvm
• virsh edit myvm # insert the following:
<memoryBacking>
<hugepages/>
</memoryBacking>
• virsh start myvm
• grep -i huge /proc/meminfo
Lesson 9: Memory Management
9.7 Managing Dirty Cache

Understanding Dirty Cache
• Before committing a file write to disk, it is kept in the buffer cache for a
while.
• This allows multiple writes to be collected, which makes disk writes more efficient.
• A longer time to collect files to be written will result in more efficient
writes.
• If, however, the system fails while file writes have not been committed yet,
the writes might get lost.
• Storage hardware may provide solutions to prevent file loss on an
unexpected system outage.
• The Dirty Cache sysctl tunables can be tweaked to prevent losing files this
way.
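A hedged sketch of the main tunables involved; the values below are only illustrative, not recommendations:
• sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs
• echo 5 > /proc/sys/vm/dirty_background_ratio # start background flushing earlier
• echo 10 > /proc/sys/vm/dirty_ratio # force synchronous writes once 10% of memory is dirty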
Click to edit Master title style
Lesson 9: Memory Management
9.8 Out of Memory (OOM) and Dealing with it
Understanding Overcommitting
Click to edit Master title style
• To deal with memory efficiently, the Linux kernel allows memory
overcommitting.
• Memory overcommitting allows the sum of all virtual memory that is
claimed by applications to be bigger than the total amount of available
physical memory.
• As long as the mapped physical memory doesn't exceed the available
physical memory, this shouldn't create any issues.
• This does involve the risk that the kernel has overcommitted too much, and
has to deliver more memory than physically available.
• If that happens the OOM killer is invoked.
Managing Overcommitting
Click to edit Master title style
• By default, the vm.overcommit_memory sysctl tunable is set to value 0,
which uses a heuristic overcommitting algorithm.
• With this setting, a request that obviously exceeds possible memory
allocation will be denied, all other overcommitting is allowed.
• The Committed_AS parameter in /proc/meminfo indicates the total
amount of physical memory that is required given the current workload,
and is a good indicator to predict issues.
Managing Overcommitting
Click to edit Master title style
• As an alternative, the vm.overcommit_memory setting can be configured
differently:
• 1: the kernel will always grant overcommit requests, even if they're obviously
not reasonable.
• 2: the kernel limits per process overcommit requests to the amount of swap
space plus a percentage of RAM (50% by default).
• When set to 2, the vm.overcommit_ratio tunable can be set to define the
percentage of RAM.
• So when set to 2, with an overcommit_ratio of 50, 4GB of RAM and 8GB of
swap the maximum overcommit size is set to 10GB.
• CommitLimit in /proc/meminfo shows what is available
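A quick sketch of inspecting and changing these settings from the command line (values are examples, not recommendations):
• sysctl vm.overcommit_memory # show the current mode (0 by default)
• sysctl -w vm.overcommit_memory=2 # switch to strict overcommit accounting
• sysctl -w vm.overcommit_ratio=50 # count 50% of RAM on top of swap
• grep -E 'CommitLimit|Committed_AS' /proc/meminfo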
Demo: Investigating Overcommits
Click to edit Master title style
• Use top, sort output on the VIRT column and find the process with the
highest VIRT allocation (on a GUI system this will be gnome-shell).
• Add vm.overcommit_memory = 2 to /etc/sysctl.conf.
• Verify available RAM and swap space, and calculate the amount needed so
that (50% RAM + swap) is less than the highest VIRT allocation you found in
top.
• Restart your system, if necessary use the Grub kernel argument mem=nG
to present only n GiB of RAM.
• If you were running a GUI system, you'll notice that gnome-shell can't start.
• Reboot with normal settings. Check the total VIRT allocated in top.
Understanding OOM
Click to edit Master title style
• When no memory can be allocated for a process, the OOM killer is
triggered.
• It will kill a process to free resources for the memory allocating tasks.
• To do so, the process OOM score is involved.
• All processes have an OOM score, and the process with the highest OOM
score will be killed.
• The OOM score is set as 10 times the percentage of memory used: if a
process uses 50% memory, the OOM score is 500.
• Current process OOM scores can be read from /proc/<pid>/oom_score.
Changing the OOM Score
Click to edit Master title style
• Based on the current OOM score, you can predict which process will be
terminated in an OOM situation.
• To change the default calculation, you can adjust the current score by
writing a value to /proc/<pid>/oom_score_adj: echo -200 >
/proc/<pid>/oom_score_adj will lower the current OOM score of <pid>
by 200.
• top provides an easy way to monitor the current OOM score:
• Press f, select OOMs, press q to get back to the top main display.
Demo: Managing OOM Scores
Click to edit Master title style
• Start top, press f and toggle the OOMs field on. Press q to get back to the
top main screen.
• Check the current OOM scores, if you're running a GNOME shell, the
highest OOM score is probably set for the gnome-shell process.
• Use echo f > /proc/sysrq-trigger, which will trigger a full OOM kill.
• If you were running a GNOME environment, the current GNOME session is
probably killed and you'll have to log in again.
• Use dmesg to investigate the OOM event that has been triggered.
• Use echo -400 > /proc/$(pidof gnome-shell)/oom_score_adj
• Check current OOM scores.
• Repeat the echo f > /proc/sysrq-trigger and observe what happens.
Click to edit Master title style
Lesson 9: Memory Management
9.9 Analyzing Kernel Memory
Understanding Slab
Click to edit Master title style
• Slabs are small segments of memory that are allocated by the Linux kernel.
• Slab usage can be significant, check Slab in /proc/meminfo to see how
much memory is used for slabs on a system.
• The /proc/slabinfo file provides a list of slab caches and how much memory
is used by each of them.
• Use the slabtop utility to get detailed information about the various slabs,
which will be updated while slabtop is running.
• To optimize slab usage, different kernel tunables can be used.
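A short sketch of inspecting slab usage from the shell (run as root, since /proc/slabinfo is normally not world-readable):
• grep Slab /proc/meminfo
• head /proc/slabinfo
• slabtop -o | head -20 # -o prints one snapshot and exits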
Click to edit Master title style
Lesson 10: Processes
10.1 How a Process is Created
Forking Processes
Click to edit Master title style
• A new process is created as a child of the current process.
• To do so, the current process creates an exact copy of itself and a new unique
process ID (PID) is assigned. This is done by the fork() system call.
• Next, execve() is used to replace the copied code with the new program code
(see the sketch below).
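As an illustration, a minimal C sketch of this fork() and execve() sequence (compile with gcc forkexec.c -o forkexec; the file name and the choice of /bin/ls are just examples):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                /* create an exact copy of this process */
    if (pid == 0) {
        /* child: replace the copied code with /bin/ls */
        char *argv[] = { "ls", "-l", NULL };
        char *envp[] = { NULL };
        execve("/bin/ls", argv, envp);
        perror("execve");              /* only reached if execve() fails */
        exit(1);
    }
    /* parent: wait() reaps the child so it does not become a zombie */
    int status;
    waitpid(pid, &status, 0);
    printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}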
Using execve()
Click to edit Master title style
• In some cases, fork() cannot be used.
• As an alternative, execve() can be used without forking the parent process.
• The execve() system call replaces the current process with the new
process.
• By doing so, the new process gets the PID of the old process, and no
parent/child relationship is created.
Using exec
Click to edit Master title style
• The bash internal exec uses the execve() system call
• This can be useful for instance when a system has been booted with the
init=/bin/bash kernel argument.
• If that has happened, /bin/bash takes PID 1.
• From such an environment, you cannot just fork the systemd process, as it
would become a child of /bin/bash.
• The solution is to use exec /usr/lib/systemd/systemd, which replaces the
current bash process with the systemd process, which would get PID 1.
Click to edit Master title style
Lesson 10: Processes
10.2 Processes and Threads
Processes vs. Threads
Click to edit Master title style
• A process is an independent, self-contained execution environment that
has its own memory space.
• Each process has its own address space, file descriptors, and other
resources.
• Inter process communication needs to consider all of these resources, and
for that reason is relatively complex.
• Processes can create threads to run tasks that require concurrent
execution.
• Threads share the same memory space as the parent process.
• Threads are commonly used in web servers and database systems.
• The kernel scheduler is responsible for managing threads and processes.
• Threads and processes are scheduled in the same way.
Processes vs. Threads
Click to edit Master title style
• It's up to the application developer to decide whether threads or processes
are used.
• If applications require separate memory spaces for tasks, processes are
better, as each process gets its own memory space.
• If tasks need to share data, threads are a better choice as they share the
same memory space.
• If task isolation is important, processes should be used as no shared
resources are used.
• If tasks need to be synchronized, threads are better as shared resources are
used.
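As an illustration, a minimal C sketch of two threads sharing the same memory space and synchronizing access to it (compile with gcc -pthread threads.c -o threads; names are illustrative):

#include <pthread.h>
#include <stdio.h>

static int shared_counter = 0;     /* visible to all threads in the process */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* synchronize access to shared data */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);   /* 200000 */
    return 0;
}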
Demo: Showing Threads
Click to edit Master title style
• Start top, use f to show display options and select the nTH option to show
the number of threads for each process.
• Use ps -T -p <pid> to show thread information for any multi-threaded
process that you found. Notice the difference between PID and SPID,
where SPID is the thread ID.
• Use top -H to see a line for each thread.
• Threads also show in /proc/<pid>/task/
Demo: Showing Threads in htop
Click to edit Master title style
• On Red Hat related distributions
• dnf install epel-release
• dnf install htop
• Start htop
• Press F5. You'll see processes including child processes (black) and threads
(green) in a tree view.
Click to edit Master title style
Lesson 10: Processes
10.3 Killing a Zombie
Understanding How Processes are Stopped
Click to edit Master title style
• Processes can voluntarily stop, using the exit() system call.
• Processes involuntarily stop when they receive a signal.
• Signals can be sent by other users, other processes, or Linux itself.
• When a process terminates, it issues the exit() system call.
• Next, the Linux kernel notifies the parent of this process by sending the SIGCHLD
signal.
• The parent next executes the wait() system call to read the status of the
child process as well as its exit code.
• While using wait(), the parent also removes the entry of the child process
from the process table.
• Once that is done, the process is gone
Understanding Zombies
Click to edit Master title style
• A zombie process is a process that has completed its execution, but still has
a PID in the process table
• A child process gets the status of zombie when the communication
between child and parent on child exit() is disturbed.
• This may happen if the parent doesn't execute the wait() system call.
• It may also happen when the parent doesn't receive the SIGCHLD signal from
the child.
• In ps aux output, zombie processes will show as <defunct>.
Understanding Orphan Processes
Click to edit Master title style
• An orphan process is an active process whose parent process has finished.
• Upon termination of the parent process, active child processes are
normally adopted by the init process.
• This ensures that the child process is getting properly reaped when it exits
and doesn't become a zombie
Demo: Analyzing Orphan Processes
Click to edit Master title style
• sleep 3600 &
• sleep 7200 &
• ps faux # find the PPID of the sleep processes
• kill <PPID> # will fail
• kill -9 <PPID>
• ps faux # will show that sleep processes are adopted by systemd
Cleaning up Zombies
Click to edit Master title style
• A zombie cannot be killed because it is already dead.
• There are a few workarounds though:
• Use kill -s SIGCHLD <ppid> to send the SIGCHLD signal to the parent PID
• Kill the parent PID, which will cause the zombie process to be adopted by init so
that it can be reaped
• Reboot the system
• Or just wait, on occasion zombies go away by themselves
Demo: Cleaning up Zombies
Click to edit Master title style
• From the course Git repository https://github.com/sandervanvugt/luth, run
./zombie &
• Verify creation of the zombie using ps aux | grep <defunct>
• Try killing it - this won't work
• Find the PPID: ps -A -ostat,pid,ppid | grep Z; the PPID is in the third column
of output
• Use kill -s SIGCHLD <ppid> to send SIGCHLD to the PPID
• Verify the zombie is gone: ps aux | grep <defunct> and remember this
won't work in all zombie cases.
• If it isn't, kill the PPID: kill -9 <ppid>
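The zombie program from the course repository is not reproduced here, but a minimal equivalent sketch could look like this: the child exits immediately while the parent never calls wait() (compile with gcc zombie.c -o zombie):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        exit(0);                     /* child terminates right away */
    /* parent stays alive without ever calling wait(), so the child
       remains in the process table as a zombie */
    printf("child %d is now a zombie; check with: ps aux | grep defunct\n", (int)pid);
    sleep(300);
    return 0;
}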
Click to edit Master title style
Lesson 10: Processes
10.4 Priorities, Schedulers and Nice Values
Understanding the Process Scheduler
Click to edit Master title style
• Linux uses schedulers at different levels.
• The process scheduler runs processes in realtime or normally.
• The I/O scheduler defines how I/O requests are handled.
• Currently, Linux (kernel version 6) uses the Completely Fair Scheduler (CFS) as
the default process scheduler.
• Within this scheduler, different policies can be used to define how tasks are
handled.
• The chrt command can be used to display or set the scheduling policy of a
process.
Understanding Scheduler Policies
Click to edit Master title style
• SCHED_OTHER: the default Linux time-sharing scheduling policy which is
used by most processes.
• SCHED_BATCH: a non-realtime scheduling policy that is designed for CPU-
intensive tasks.
• SCHED_IDLE: a policy which is intended for very low prioritized tasks.
• SCHED_FIFO: a real-time scheduler that uses the First In-First Out
algorithm.
• SCHED_RR: a real-time scheduler that uses the Round Robin scheduling
algorithm.
Using chrt
Click to edit Master title style
• Use chrt -p <pid> to find the current scheduler policy a process is using.
• chrt command line options can be used to handle a process by a different
scheduler:
• -o: SCHED_OTHER
• -b: SCHED_BATCH
• -i: SCHED_IDLE
• -f: SCHED_FIFO
• -r: SCHED_RR
Demo: Using SCHED_IDLE
Click to edit Master title style
• chrt --max
• dd if=/dev/zero of=/dev/null &
• chrt -i -p 0 $(pidof dd)
• chrt -r 50 dd if=/dev/zero of=/dev/null &
• top
Understanding Priority and Niceness
Click to edit Master title style
• While scheduling on the RR or FIFO realtime schedulers, a priority between
1 and 99 can be set.
• While scheduling using the non-realtime schedulers, the process nice value
can be adjusted to tweak priority.
• Nice values range from -20 to 19.
• Negative nice values require root privileges and set a higher priority.
• Positive nice values set a lower priority.
Demo: Changing Nice Values
Click to edit Master title style
• dd if=/dev/zero of=/dev/null & (repeat 3 times)
• renice -5 $(pidof dd | awk '{ print $1 }')
• renice 10 $(pidof dd | awk '{ print $2 }')
• top # you may not see result
• echo 0 > /sys/bus/cpu/devices/cpu1/online # repeat for all CPUs except
cpu0
• top
• chrt -r 50 dd if=/dev/zero of=/dev/null &
• top
Click to edit Master title style
Lesson 10: Processes
10.5 Interprocess Communication, Sockets, Pipes, and More
Understanding Inter Process Communication
Click to edit Master title style
• Inter process communication (IPC) defines how processes can
communicate, using shared data, without intervention of the operating
system kernel.
• A main goal of IPC is to let processes exchange data without having to rely on
functionality that is provided by the kernel.
• IPC is focused on communication between processes on the same operating
system platform.
• Different solutions for IPC exist (see next slide).
• The different IPC solutions can be seen in the /proc/PID pseudo filesystem.
Approaches for IPC
Click to edit Master title style
• Shared file- / memory-based access; doesn't work well with large amounts
of data.
• A pipe opens a data channel that processes can use to communicate.
• Unnamed pipes are defined on the command line: ls | less
• Named pipes use a file as the pipe and can be bi-directional
• Message Queues: like pipes, but with re-ordering if data is received out of
order. Can be used by applications like rsyslogd.
• Sockets are using streams to enable IPC either using networking, or
without networking being involved.
Approaches for IPC
Click to edit Master title style
• Signals offer an OS interface to communicate with a process.
• Remote Procedure Calls allow processes to call other processes on the
same or remote systems.
• Semaphores are typically used for controlling access to resources by
multiple processes
• D-Bus is commonly used in desktop environments.
Demo: Using Named Pipes
Click to edit Master title style
• On terminal one:
• mkfifo myfifo
• cat myfifo
• On terminal two:
• cat > myfifo
• hello world
• Ctrl-C
• On terminal one:
• ls -l
• unlink myfifo
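The same pipe mechanism is available to programs through the pipe() system call; as an illustration, a minimal C sketch of a parent and child communicating over an unnamed pipe (compile with gcc pipedemo.c -o pipedemo; the file name is just an example):

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    pipe(fd);                      /* fd[0] = read end, fd[1] = write end */
    if (fork() == 0) {
        close(fd[0]);
        const char *msg = "hello from the child\n";
        write(fd[1], msg, strlen(msg));
        close(fd[1]);
        return 0;
    }
    close(fd[1]);
    char buf[64];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("parent received: %s", buf);
    }
    close(fd[0]);
    wait(NULL);                    /* reap the child */
    return 0;
}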
Remote Procedure Calls
Click to edit Master title style
• Remote Procedure Call (RPC) is used in distributed computing to execute
subroutines in different address spaces.
• RPC is developed for client-server communication.
• It is an early solution that was introduced in the 1970s and implemented in
the NFS protocol.
• With the rise of the Internet and HTTP-based communication, RPC became
less common.
• Note the difference between networked IPC solutions such as RPC and the local
IPC solutions discussed earlier.
Click to edit Master title style
Lesson 10: Processes
10.6 Communicating on the D-Bus Message Interface
Understanding D-Bus
Click to edit Master title style
• D-Bus is a method for IPC originating from the GNOME and KDE desktop
environments.
• D-Bus provides a software bus abstraction to which all processes connect.
• D-Bus provides one virtual channel, which no longer requires point-to-point
relations between processes.
• Typically, multiple buses are used.
Applications and D-Bus
Click to edit Master title style
• To work with D-Bus, the application needs to be made D-Bus aware.
• D-Bus aware applications expose uniform interfaces, provided by the object
paths.
• Code for a D-Bus aware application can be written in many languages,
including C/C++, Python, Java, Perl, and more .
• dbus-send can be used from Linux CLI:
• dbus-send --system --print-reply \ --dest=org.freedesktop.NetworkManager \
/org/freedesktop/NetworkManager \ org.freedesktop.DBus.Properties.Set \
string:"org.freedesktop.NetworkManager" \ string:"NetworkingEnabled" \
variant:boolean:false
Using D-Bus
Click to edit Master title style
• Applications communicate to other applications using a specific object path
like "/org/freedesktop/sample/object/name".
• D-Bus is implemented as a systemd dbus-daemon that allows for
communication with the D-Bus kernel event layer.
• The daemon is accessible by systemd paths and sockets.
• D-Bus offers support for different programming languages.
Managing D-Bus
Click to edit Master title style
• Typically D-Bus is managed by using API calls from your applications or
scripts.
• A few commands work from the shell and give good insight:
• busctl list lists bus names.
• dbus-send allows you to send commands to specific object paths.
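A few hedged examples of exploring the bus from the shell; org.freedesktop.hostname1 is a standard systemd service, so these calls should work on most systems:
• busctl list | head
• busctl introspect org.freedesktop.hostname1 /org/freedesktop/hostname1
• busctl get-property org.freedesktop.hostname1 /org/freedesktop/hostname1 org.freedesktop.hostname1 Hostname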
Click to edit Master title style
Lesson 10: Processes
10.7 Monitoring IPC Usage
Monitoring IPC Usage
Click to edit Master title style
• ipcs gives an overview of all System V IPCs (limited to System V mechanisms)
• ipcs -l shows current limits on your IPCs
• lsof -p <PID> shows open files and if it's an IPC, lists the IPC type
• strace -p <PID> shows all system calls, which includes IPC specific
information as well
• socket
• connect
• bind
• shmget
• shmat
• msgget
• msgrcv
Demo: Monitoring IPC Usage
Click to edit Master title style
• lsof -p <PID> will show different IPC types in use
• /proc/sysvipc contains interfaces for the system V IPCs
• /proc/<pid>/fd shows different types of IPC that are related to files
• strace -p PID can show specific IPC also
• socket, connect, bind: for sockets
• shmget, shmat: for shared memory access
• msgget, msgrcv, msgsnd: for message queues
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.1 Exploring What Happens When Commands are Executed
Understanding Command Execution
Click to edit Master title style
• When commands are executed, different things need to happen:
• The command needs access to specific system functions.
• The command needs to create an execution environment.
• It needs to load in memory.
• It requests access to system resources like files.
• It gets specific feature access through libraries.
• It needs to load library functions.
• All of these are provided by different Linux system components.
How Linux Runs Commands
Click to edit Master title style
• System calls provide access to key Linux functions.
• The fork() and execve() system calls are used to create the execution
environment.
• Virtual memory is allocated to run the command, and parts are loaded in
resident memory when needed.
• The command is executed in system space or user space.
• When executed in user space, permissions to access system calls are
checked.
• The command places pointers to required libraries in memory.
• From the program code, specific library calls are issued.
Understanding task_struct
Click to edit Master title style
• From the process point of view, it's the only thing running, and it has access
to all memory and CPU
• This is also referred to as the Linux "virtual machine model" (which has
nothing to do with hypervisor based virtualization)
• The kernel maintains this illusion in the task_struct of the Linux kernel
process table
Understanding task_struct
Click to edit Master title style
• task_struct is the backbone of the Linux kernel management system
• The task_struct includes various fields that hold process related
information, this shows in the different subdirectories in /proc/<PID>/
• When the scheduler switches the CPU from one process to another
(context switching), it saves the current state of the CPU in its task_struct
and loads the task_struct of the new process
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.2 System Space and User Space
How Kernel Feature Access Works
Click to edit Master title style
• User processes typically need access to specific functionality provided by
the Linux kernel.
• If a process runs in kernel space, this is easy, as it has unlimited access to all
kernel features.
• For processes running in user space, system calls are provided to expose
specific parts of kernel functionality.
• Capabilities are used like permissions to provide access to these system
calls.
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.3 Understanding System Calls
Understanding System Calls
Click to edit Master title style
• System calls are defined in the kernel source code to allow programs to
access functionality provided by the Linux kernel.
• They provide a controlled interface between kernel space and programs.
• Documentation about system calls is provided in section 2 of the man
pages
• When programs are executing system calls, they run specific parts of kernel
code from user space.
• The C library, which is provided by glibc, provides user space wrappers
around the kernel system calls, which makes it possible to use them from
application code.
NTS: Drawing (source: author provided)
Click to edit Master title style
user space        myprocess
                      |
                    glibc
                      |
----------------- capabilities -----------------
kernel space          |
                      v
                system calls  <->  kernel
Using strace to Analyze System Calls
Click to edit Master title style
• Use strace <any_command> for an overview of system calls generated by
any command.
• Notice that strace output is written to STDERR, not to STDOUT.
• To make reading it easy, use strace ls |& less.
• Or write output to a file, using strace ls 2> lsout.txt
• strace results can be overwhelming.
strace Useful Options
Click to edit Master title style
• Use the -c option for a short list of how often each system call is used ; this
will not only show names of the system calls, but also how much time is
spent executing these system calls, and which system calls generate any
errors.
• Use -e to filter on a specific expression, like strace -e trace=open will only
show the open system call
• Use -f to trace processes that are forked by the current processes
• Use -p to attach to an already running process: strace -p 2921
Demo: Analyzing System Calls
Click to edit Master title style
• ls * # shows files in the current directory
• echo * # shows files in the current directory
• strace -c ls *
• strace -c echo * # is a lot faster
• strace ls |& less shows all system calls in a pager
Understanding strace Output
Click to edit Master title style
• The results of strace show different types of output
• system calls: the actual interaction with the kernel
• return values: show success or failure of a system call. Negative return values
indicate an error
• blocking calls: system calls that wait for an event like read, write, select, poll
might be where a program is hanging
Demo: Analyzing System Calls
Click to edit Master title style
• dnf install -y netcat
• netcat -l 8080 # the command doesn't seem to be doing anything. What is
going on?
• strace netcat -l 8080 # will show that it starts listening for incoming
connections
• from another terminal: netcat -vz localhost 8080 will show that the port is
listening, after which the listening netcat will stop
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.4 How Processes get Access to System Calls
Getting Access to System Calls
Click to edit Master title style
• In early versions of Linux, processes would run as root, or as non-privileged
user.
• This model was at risk of security threats.
• To offer more fine-grained access, capabilities were introduced.
• Administrators can use setcap to set specific capabilities.
• Use getcap to find capabilities currently set on programs.
• Processes can also get access to specific system calls through inheritance.
• If a parent process has access to specific syscalls, and forks a child process, the
child process inherits its capabilities.
• Or, the process is programmed to access specific capabilities.
Demo: Exploring Capability Usage
Click to edit Master title style
• Note: this demo works on older distributions only
• getcap /bin/ping
• strace /bin/ping -c 1 127.0.0.1
• Look for prctl(), capget() and capset()
• setcap cap_net_raw=-ep /bin/ping
• as non-root user: ping 127.0.0.1
• as root: setcap cap_net_raw+ep /bin/ping
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.5 How Process Memory is Organized
Understanding Process Memory
Click to edit Master title style
• When a process starts, it allocates virtual memory.
• Within the allocated virtual address space, different areas are used.
• The code segment is where the program code is loaded. Its size is equal to the
size of the process file.
• The data segment is used to store data and hard-coded variables defined by the
programmer.
• The heap is used for dynamic memory allocation in addition to the data
segment, where variables are allocated and freed during process execution.
• Memory mapped regions are reserved for accessing libraries and shared files.
• The stack has a more static nature and is used for accessing function calls.
Understanding Stack Memory
Click to edit Master title style
• Stack memory is fast-accessible memory, used for small amounts of data.
• When processes start, a limited amount of stack memory is allocated.
• The size of the stack is limited, which is why it is used for small amounts of
data, like functions.
• It has a Last In, First Out (LIFO) access structure.
• Programs automatically allocate and deallocate stack memory.
• When stack memory is exhausted, you will see a stack overflow error.
Understanding Heap Memory
Click to edit Master title style
• Heap memory is used to store data that is not tied to a specific function or
procedure.
• It has no specific structure, and individual parts of the Heap can be
addressed individually.
• Heap memory is dynamically allocated and deallocated during application
runtime.
• Its size is dynamically adjusted.
• Access to heap memory is slower than access to stack memory.
• It is mainly used for larger structures.
• If processes add memory to the heap without ever deallocating it, a
memory leak occurs.
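As an illustration, a minimal C sketch showing the difference between stack and heap allocation (compile with gcc memdemo.c -o memdemo; the file name is just an example):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int on_stack[16];                      /* automatic: lives on the stack and
                                              is freed when the function returns */
    int *on_heap = malloc(1024 * sizeof(int));  /* dynamic: lives on the heap */
    if (on_heap == NULL)
        return 1;
    on_stack[0] = 1;
    on_heap[0] = 2;
    printf("stack: %p  heap: %p\n", (void *)on_stack, (void *)on_heap);
    free(on_heap);                         /* forgetting this causes a memory leak */
    return 0;
}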
Analyzing Memory Usage
Click to edit Master title style
• A generic overview of process memory usage is in /proc/<PID>/status
• More details can be found in /proc/<PID>/maps
• An interpreted overview is provided by the pmap command
• When running pmap repeatedly, you'll notice that memory locations change. This is
because of "address space layout randomization" (ASLR), a technique that protects
program memory against specific attacks like "stack smashing"
Demo: Monitoring Heap and Stack Memory
Click to edit Master title style
• Notice that heap memory will show as [anon]
• cat /proc/$(pidof sshd)/status
• pmap $(pidof sshd)
• cat /proc/$(pidof sshd)/maps
• pmap -X $(pidof sshd)
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.6 Creating Processes
How Processes are Created
Click to edit Master title style
• New processes are created by the parent process.
• To do so, the parent process normally uses the fork() system call to create
the child process.
• After being created, the child process uses execve() to load its own stack,
heap, and data.
• Alternatively, a program can be started using the bash internal exec
instead of fork().
• Doing so will trigger the execve() system call without using fork() first.
• When using exec, the old process is replaced with the new process.
Understanding fork()
Click to edit Master title style
• While creating a new process, the current process copies its execution
environment to create a child process.
• By using fork(), the child process gets its own PID.
• Initially, the child process components that are loaded in memory are
exactly the same as the parent process components.
• The first system call issued by the child process is execve().
• While using execve(), the copied parent program code and data are replaced
by those of the new program.
• While tracing process creation with strace, you'll see execve() as the first
system call that is issued.
Using fork() without execve()
Click to edit Master title style
• Normally, after using fork() to create a new process, execve() is used to
replace the process with the desired code.
• If parallel execution of programs is needed, it makes sense to use fork()
without a following execve().
• The result would be different processes, each having their own PID, but
running the same code.
Understanding exec
Click to edit Master title style
• exec is NOT a system call, it's a bash internal.
• It is also used to refer to a set of system calls, of which execve() is the most
important one.
• It can be used to replace the current process with new process code.
• While using exec, no new PID is generated.
• Using exec is mainly useful in troubleshooting scenarios, where the kernel
has booted with the init=/bin/bash argument and you want it to proceed
by loading systemd.
• systemd always needs PID 1, so using fork() to load systemd would not be
an option.
Demo: Investigating fork() and exec
Click to edit Master title style
• echo $$
• dnf install -y zsh
• zsh
• echo $$
• exit
• exec zsh
• echo $$
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.7 Allocating Memory
Understanding Memory Allocation
Click to edit Master title style
• To allocate memory, a couple of system calls are provided:
• mmap(): used to create a new mapping in the virtual address space of the
calling process .
• brk(): changes the location of the program break, which defines the end of the
process data segment. Increasing it results in allocating more memory.
• sbrk(): similar to brk().
• To make it easier for developers to allocate memory, the malloc() library
call is provided by glibc.
• Malloc dynamically determines which system call is most efficient to allocate
memory.
Analyzing Memory Allocation
Click to edit Master title style
• Use strace to analyze memory allocation.
• It will show the different system calls used, including their parameters.
• Depending on the command you're analyzing, you'll see mmap(), brk() and
sbrk().
• strace won't show malloc() as this is a library call and not a system call.
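A short sketch of narrowing strace down to memory-related system calls (the %memory class name assumes a reasonably recent strace version):
• strace -e trace=%memory ls 2>&1 | head
• strace -e trace=brk,mmap,munmap ls |& head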
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.8 Accessing Libraries
Understanding Libraries
Click to edit Master title style
• Linux programs need access to common functionality.
• This is functionality required by other programs as well.
• Instead of compiling this functionality directly into the program, shared
libraries are used.
• A large set of libraries is used to provide access to the most common Linux
functionality.
Providing Library Access
Click to edit Master title style
• Programs can be linked to libraries in a dynamic or in a static way.
• Static linking requires the library functions to be compiled into the program
file.
• Dynamic linking allows the program file to load functions from external
library files.
• Almost all programs are dynamically linked to their libraries.
• The ldconfig program is statically linked to its libraries.
• Use ldd to get information about program library access.
Understanding the Linux Loader
Click to edit Master title style
• When programs are started, libraries need to be loaded
• This is the responsibility of the Linux loader, typically ld-linux-*
• Use file /path/to/any/program to find which Linux loader is used
Demo: Investigating Library Usage
Click to edit Master title style
• ldd /usr/bin/* shows lots of files with dynamically linked libraries.
• ldd /usr/sbin/ldconfig shows that this binary is statically linked.
• ldd /usr/lib64/ld-linux-* shows this program is also statically linked
• This makes sense, as ldconfig and ld-linux-* are used to provide dynamic
library access.
About libc
Click to edit Master title style
• All Linux programs are using library functions provided by libc, often
referred to as glibc.
• This library contains core Linux functionality:
• Standard functions, like input and output control and string manipulation
• Wrappers to kernel provided libraries
• Dynamic library linking
• Error handling
• POSIX regular expressions
• and more; see man 7 libc for more information
• Notice that other Linux systems, like embedded Linux systems, may use an
alternative C library.
Click to edit Master title style
Lesson 11: Linux Commands and How They Work
11.9 Analyzing Library Usage
Accessing Dynamic Libraries
Click to edit Master title style
• When Linux starts, the ldconfig command creates a library cache
• The library cache is stored in the binary file /etc/ld.so.cache
• To print its current contents, use ldconfig -p
• Programs that require access to libraries should update the library cache
through the package installer
• In some cases it is required to manually run the ldconfig command to
update the library cache
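A minimal sketch of inspecting and extending the cache; /opt/myapp/lib is a hypothetical directory used only for illustration:
• ldconfig -p | grep libc.so # search the cache for a specific library
• echo /opt/myapp/lib > /etc/ld.so.conf.d/myapp.conf
• ldconfig # rebuild /etc/ld.so.cache
• ldconfig -p | grep myapp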
Analyzing Library Usage
Click to edit Master title style
• The ltrace command can be used to print library calls used by a process.
• Lots of output may be provided, consider adding the -l
libraryname.so.version option to print library functions from a specific
library only.
• ltrace -l libc.so.6 ls 2> lstrace.out
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.1 Running an Application on Linux
Running Typical Linux Applications
Click to edit Master title style
• To run an application, it needs access to its dependencies.
• While starting, the application claims system resources.
• Once started, the application has full access to all files on the server where
it started.
Disadvantages of Typical Linux Apps
Click to edit Master title style
• You cannot run applications that require access to a different version of the
same dependency.
• While installing the application, you need to install the application
dependencies also.
• Resource access is unlimited and an application may leave no resources for
others.
• Poorly secured applications may allow unauthorized access to system
resources.
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.2 Running Applications in a Chroot Jail
Understanding Chroot
Click to edit Master title style
• The chroot jail was one of the first solutions to restrict applications to their
own resources.
• The idea is to present the contents of one directory as if it is all that exists
on the system, and limit the application to that directory only.
• To do so, all application dependencies need to be copied into the chroot
environment.
• This is a lot of work and it's easy to forget one or more dependencies.
Demo: Running httpd in a Chroot Jail
Click to edit Master title style
• mkdir /var/chroot
• dnf --installroot=/var/chroot install -y httpd
• cp /etc/{passwd,group,resolv.conf} /var/chroot/etc/
• mount -o bind /dev /var/chroot/dev
• mount -o bind /proc /var/chroot/proc
• mount -o bind /sys /var/chroot/sys
• chroot /var/chroot
• httpd &
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.3 Managing Namespaces
From Chroot to Namespaces
Click to edit Master title style
• The idea to restrict application access to specific resources in chroot jail
worked well.
• The concept has further been developed into namespaces, which provide
strict isolation for specific areas:
• cgroup: system resources
• IPC: inter-process communication
• network: networking
• mount: directory access
• pid: running processes
• user: the only namespace that can be created without CAP_SYS_ADMIN
• UTS: hostname and NIS domain name
Monitoring Namespaces
Click to edit Master title style
• Use lsns to see namespaces currently in use.
• Alternatively, ls /proc/*/ns will show namespaces in use by processes.
• When starting processes through systemd, the RestrictNamespaces directive
can be used to limit which types of namespaces the unit is allowed to create or
switch to (see the sketch below).
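As an illustration, a minimal sketch of namespace-related directives in a systemd unit file; /usr/bin/myapp is a hypothetical binary and the values are example choices, not recommendations:

[Service]
ExecStart=/usr/bin/myapp
# give the service its own network namespace
PrivateNetwork=yes
# do not allow the service to create or switch to user namespaces
RestrictNamespaces=~user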
Viewing Namespaces Processes
Click to edit Master title style
• A process in a namespace can only see its own namespace.
• From the host OS view, this process has a "regular" PID.
• Use common tools like ps to see it.
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.4 Using unshare to run Namespaced Processes
Understanding unshare
Click to edit Master title style
• The Linux unshare command can be used to run processes in new
namespaces
• The unshare command will isolate (unshare) the new environment from
the current namespace
• Use unshare --fork --mount-proc -p -U zsh to run zsh in its own User and
process namespace
• The nsenter command allows you to run commands from the host OS in a
namespace, and is particularly useful for analyzing containers
Demo: Using unshare to Create Namespaces
Click to edit Master title style
• unshare --fork --pid --mount-proc zsh
• echo $$
• ps aux
• from another terminal: ps faux | grep -B 5 zsh
• lsns -p $(pidof zsh)
• nsenter -a -t $(pidof zsh) ps aux
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.5 Using nsenter to Investigate Namespaces
Understanding nsenter
Click to edit Master title style
• Containers can be used to run commands isolated in a specific namespace
• The nsenter command can be used to manually run commands in a specific
namespace
• This makes it a useful command for analyzing what is going on in Podman
containers
• While using nsenter, tools on the host OS can be used to figure out what is
going on inside container namespaces
• While using nsenter, use command line options to specify which
namespace to enter
Using nsenter to Analyze Podman Containers
Click to edit Master title style
• Podman containers use namespaces a lot
• Using nsenter can be useful to analyze podman containers, as it allows you
to use host tools to analyze what is going on in a container
Demo: Using nsenter to Investigate Containers
Click to edit Master title style
• Run this demo from a non-root shell
• podman run -d --name web01 nginx
• podman run -d --name web02 nginx
• podman ps
• podman inspect web01 --format '{{.State.Pid}}'
• podman inspect web02 --format '{{.State.Pid}}'
• sudo nsenter -u -t 3425 id
• sudo nsenter -u -t 5761 id
Demo: Using nsenter to Investigate Containers
Click to edit Master title style
• id
• sudo nsenter -n -t 3425 ip a
• sudo nsenter -n -t 5761 ip a
• sudo nsenter -n -t 3425 ss -pant
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.5 Running Linux Applications with Namespaces and Cgroups
Running Restricted Systemd Applications
Click to edit Master title style
• The unshare command can be used to start an application in a namespace.
• Applications can be added to cgroups manually as well.
• Cgroups are mounted on a pseudo filesystem, and you can echo the PID value into
the cgroup.procs file.
Using unshare
Click to edit Master title style
• unshare can be used to run processes in a namespace, unshared from the
parent.
• unshare has different options to specify which namespaces should be
created.
• unshare --fork --pid --mount-proc zsh will run zsh in a new PID namespace.
• ps aux will show only processes in that namespace.
• pidof zsh will show zsh has been started as PID 1.
Demo: Manually Running an App in a Cgroup
Click to edit Master title style
• mount | grep cgroup
• cd /sys/fs/cgroup
• mkdir mycgroup
• dd if=/dev/zero of=/dev/null & echo $! > mycgroup/cgroup.procs
• echo 50 > mycgroup/cpu.weight
• top
• Notice that results will show in comparison to other processes. You may
have to disable one or more CPU cores.
Making it Easier with Systemd
Click to edit Master title style
• Systemd offers easy integration for running processes in namespaces as
well as Cgroups.
• If using mount namespaces, you'll need to copy all required files into the
namespaced directory.
• Cgroup directives are available:
• CPUWeight
• MemoryMax
• and more
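As an illustration, a transient unit created with systemd-run applies the same Cgroup directives without writing a unit file first (the values are examples, not recommendations):
• systemd-run --scope -p CPUWeight=50 -p MemoryMax=512M dd if=/dev/zero of=/dev/null
• systemd-cgls # shows where the new scope ended up in the cgroup tree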
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.6 From Restricted Linux Applications to Containers
What is a Container
Click to edit Master title style
• A container is a standard unit of software that packages code and all its
dependencies so the application runs quickly and reliably from one
computing environment to another (source:
https://www.docker.com/resources/what-container/)
• Containers offer all that is needed to run applications in the
aforementioned way.
• Containers leverage Linux kernel namespaces and cgroups to run
containerized applications in a secure way.
Combining all in Containers
Click to edit Master title style
• Manually running applications the aforementioned way is cumbersome.
• All files needed in the mount namespace need to be copied over.
• Cgroup restrictions need to be set.
• Namespace restrictions need to be set.
• Containers make the process easier.
• Default images are provided.
• Standard images may be provided through container registries.
• The container runtime provides an easy interface to set resource limits and
namespaces.
• Docker made containers a success by combining cgroups and namespaces
and adding the Docker registry for efficient image delivery
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.7 Container Runtimes
Using Container Runtimes
Click to edit Master title style
• Containers can be started in many ways.
• Using a container runtime makes it easy to start containers from
standardized images.
• The runtime will also request all required kernel features to run the
container.
• Common runtimes are containerd, cri-o, and runc.
• These are integrated in container engines like Docker and Podman.
• Even if runtimes are convenient, it's important to realize that containers
can also be started without using a container runtime.
What a Container Runtime Adds
Click to edit Master title style
• Although running files from an extracted tarball in an environment that is
limited by cgroups and namespaces is essentially running a container, using
a container runtime adds features and functionality:
• Image management
• Layered filesystems
• Network management
• Standardization
• Lifecycle management
• Integration with orchestration tools
• API
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
13.8 Systemd Containers
Understanding systemd-nspawn
Click to edit Master title style
• Containers provide a way to run a process in multiple namespaces, with
cgroups applied.
• Docker and Podman are common ways to implement containers, but you
can also easily implement them on Linux using systemd-nspawn.
• systemd-nspawn is mainly about running applications in a perfectly isolated
environment, and not so much about providing a distribution model for
applications, as is the case with the Docker registry and others.
Creating a Container
Click to edit Master title style
• To create your own container, you need a chroot filesystem containing all
that is required.
• In the next demo, the debootstrap package is used to create an easy
Debian-based container foundation.
• The systemd-container package contains systemd-nspawn as well as
machinectl, the main tool to manage the containers.
• To create the container, the "image" is created in the /var/lib/machines
directory, where systemd-nspawn expects to find the container images.
Demo: Preparing the Container
Click to edit Master title style
• On Ubuntu:
• sudo apt install debootstrap systemd-container bridge-utils
• sudo mkdir -p /var/lib/machines/helloworld
• sudo debootstrap stable /var/lib/machines/helloworld
http://deb.debian.org/debian/ # creates Debian
• sudo chown root:root /var/lib/machines
• sudo chmod 700 /var/lib/machines
Understanding Tools
Click to edit Master title style
• systemd-nspawn is used to start the container.
• It normally runs a root shell in the container environment.
• Use -b to actually boot the container and provide a root shell after.
• Use -U to tell it to create a user namespace.
• Use -M name to give it a name as well.
• Once the container is running, you can provide additional setup:
• Install dbus to allow usage of machinectl
• Configure networking
• Set the hostname
Demo: Using the Container
Click to edit Master title style
• Open a root shell to set the password
• sudo systemd-nspawn -UM helloworld
• # passwd
• # exit
• Boot into the container
• sudo systemd-nspawn -UbM helloworld
• sudo apt install dbus
Demo: Using machinectl
Click to edit Master title style
• machinectl list-images
• machinectl start helloworld
• machinectl status helloworld
• machinectl login helloworld # doesn't work without dbus
• machinectl stop helloworld
• machinectl enable helloworld
• machinectl disable helloworld
Click to edit Master title style
Lesson 13: Containers are Linux, Linux is Containers
Real-world Scenario: Running Pure Linux Containers
Demo
Click to edit Master title style
• mkdir /mychroot
• mkdir /mychroot/{bin,lib,lib64,etc}
• cp /bin/bash /mychroot/bin/
• ldd /bin/bash
• mount -o bind /lib /mychroot/lib/
• mount -o bind /lib64 /mychroot/lib64
• mount -o bind /etc /mychroot/etc
• chroot /mychroot/ /bin/bash
Click to edit Master title style
Lesson 14: The Code Behind Linux
14.1 The C Programming Language
Understanding C Programs
Click to edit Master title style
• Linux is written in C.
• The source code of all Linux components is accessible as a readable file.
• Source files written in C need to be compiled into a binary program.
• Some programs that run on Linux are written in other languages: Python is
becoming increasingly popular.
Do You Need to Know C?
Click to edit Master title style
• Linux developers need knowledge of the C programming language.
• Linux administrators and (power) users don't, as most Linux components
are very well documented.
• If, however, you want to be able to do advanced troubleshooting or
performance optimization, it helps if you understand a bit of C.
• You should also know how to compile a source file in C to a working
program file.
Click to edit Master title style
Lesson 14: The Code Behind Linux
14.2 Working Together in Git
The Need to Use Git
Click to edit Master title style
• Linux is open source.
• In open source, people are working together in different projects.
• To facilitate working together, Linux founder Linus Torvalds introduced Git
in 2005.
• Git is a distributed version control system that tracks changes in files.
• Using Git makes it easy for developers to propose changes to code, after
which the changes are reviewed and may be merged into the core code.
Git Platforms
Click to edit Master title style
• Different Git platforms are available.
• Public platforms include github.com and gitlab.com.
• Private Git servers may be used as well.
Working with Git
Click to edit Master title style
• To work with Git, the developer clones the Git repository to the local computer.
• The cloned repository contains the complete project history and includes
tracking functionality as well.
• After making changes to the code, the changes can be pushed back to the Git
repository, where, after review and approval, they are merged into the main code
(see the example below).
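As an illustration, a minimal sketch of that workflow, using the course repository mentioned earlier (pushing changes requires write access or your own fork of the repository; the branch name is just an example):
• git clone https://github.com/sandervanvugt/luth
• cd luth
• git checkout -b my-change # work on a separate branch
• git add -A; git commit -m "describe the change"
• git push origin my-change # then open a pull request for review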
Demo: Using Git
Click to edit Master title style
• dnf install -y git
• git clone git://git.sv.gnu.org/coreutils
• ls coreutils/src
• less coreutils/src/ls.c
Click to edit Master title style
Lesson 14: The Code Behind Linux
14.3 From Git Project to Linux Distributions
Git and Linux
Click to edit Master title style
• Linux components find their origins in Git projects.
• Linux distributions often use the same Git repositories, which explains why
for the biggest part, there are no fundamental differences between the
Linux distributions.
• Sometimes, software can be obtained from different projects in Git.
• Often, the Linux distribution adds its own patches to the code from Git.
• In open source, these patches are typically contributed back to the upstream
project as well.
• The result is that Linux distributions may include newer and uncommon
features, which later will be found in other distributions as well.
Click to edit Master title style
Lesson 14: The Code Behind Linux
14.4 C Programs: from Source Code to Binary
Compiler and Makefiles
Click to edit Master title style
• C code needs to be compiled to get usable code.
• Compiling a simple source file is straightforward: gcc simple.c -o simple
• Compiling may become more complex if external libraries are used.
• To include libraries and other dependencies, compiler commands may
become extremely long.
• To make compiling easier, you can use Makefiles.
• A Makefile is a recipe that defines exactly what needs to be done in the
compilation process.
• Use the make command to process instructions in a Makefile.
Understanding Object Files
Click to edit Master title style
• Complex programs are composed of many components and the
compilation process consists of multiple steps.
• First, the source code is compiled into the object file.
• Next, the object files are linked into the binary file.
• In compiling, the linker is the part that connects everything.
• In the command gcc -o myapp main.o aux.o, the object files main.o and aux.o
are linked together, along with the required libraries, into the myapp binary.
Understanding Header Files
Click to edit Master title style
• Header files are additional C source files that contain type and library
function declarations.
• stdio.h is a common header that makes essential functions available.
• Use #include <stdio.h> in the source file to include the header.
• Before making the binary file, the C preprocessor (cpp) joins source code,
object files, and header files.
Analyzing a C Source File
Click to edit Master title style
• #include <stdio.h> is the preprocessor directive that tells the C compiler to
include the standard input-output header file. In this program, it is required
because it declares the printf function that is used later.
• int main() { here the main function is defined. Every C program has a main
function as its core code. The int part indicates that this function returns an
integer value.
• printf("Hello World!\n"); here the printf function is used to print a string,
which is followed by the newline character.
• return 0; will send the exit code 0 to the main operating system to indicate
that the program terminated successfully.
• } marks the end of the main function.
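Put together, the bullets above describe a file like the following; this is a reconstruction, so the actual luth/first.c used in the next demo may differ in details:

#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0;
}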
Demo: Installing a Source File
Click to edit Master title style
• dnf groups install "Development Tools"
• cat luth/first.c
• gcc first.c -o first
• ./first
Click to edit Master title style
Lesson 14: The Code Behind Linux
14.5 C and Libraries
C Programs and Libraries
Click to edit Master title style
• C Programs always use generic functions that are provided from other files.
• All C programs include the stdio.h for common I/O functions.
• Many programs include external libraries as well.
• These external libraries are dynamically linked when compiling the C
programs.
• You have learned earlier in this course how these can be managed using
commands like ldd and ldconfig.
Click to edit Master title style
Lesson 14: The Code Behind Linux
14.6 Compiling a C Program from a Makefile
Using Makefile
Click to edit Master title style
• More complex applications contain a Makefile.
• Specific instructions may be different, depending on the program.
• In general, start by reading the README file.
• Next, you'll often have to run a ./configure script to perform required
preparation.
• Next, use the make command to process the Makefile. This will create the
compiled files in the current directory.
• Use make install to install the compiled files in the appropriate locations.
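As an illustration, a minimal Makefile sketch for the two-object example mentioned earlier (gcc -o myapp main.o aux.o); the file names are illustrative and recipe lines must be indented with a tab character:

CC      = gcc
CFLAGS  = -Wall -O2

myapp: main.o aux.o
	$(CC) -o myapp main.o aux.o

%.o: %.c
	$(CC) $(CFLAGS) -c $<

clean:
	rm -f myapp *.o

Run make to build the program, and make clean to remove the generated files.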