The File System
Files in Linux
Stefano Quer and Stefano Scanzio
Dipartimento di Automatica e Informatica
Politecnico di Torino
skenz.it/os [email protected]
Operating Systems 2
File System
❖ The file system is one of the most visible aspects
of an OS
❖ It provides mechanisms to save data
(permanently)
❖ It includes management of
➢ Files
➢ Directories
➢ Disks and disk partitions
Files
❖ Information is stored for a long period of time
➢ independently from
▪ Termination of programs/processes, power supply,
etc.
❖ From the logical point of view, a file is
➢ A set of correlated information
▪ All information (i.e., numbers, characters, images,
etc.) are stored in a (electronic) device using a
coding system
➢ Contiguous address space
❖ How is this information encoded?
❖ What is the actual organization of this space?
ASCII encoding
❖ De-facto standard
➢ ASCII, American Standard Code for Information Interchange
▪ 128 total characters: 33 non-printable (control) and 95 printable
▪ Originally based on the English alphabet
▪ 128 characters are coded in 7-bit (binary numbers)
➢ Extended ASCII (or high ASCII)
▪ Extension of ASCII to 8 bits and 256 characters
▪ Several versions exist
● ISO 8859-1 (ISO Latin-1), ISO 8859-2 (Eastern
European languages), ISO 8859-5 for Cyrillic
languages, etc.
❖ The alphabet of the Klingon language is not supported by Extended ASCII
[Figure: Extended ASCII table]
Unicode encoding
❖ Industrial standard that includes the alphabets for
any existing writing system
➢ It contains more than 110,000 characters
➢ It includes more than 100 sets of symbols
❖ Several implementations exist
➢ UCS (Universal Character Set)
➢ UTF (Unicode Transformation Format)
▪ UTF-8, groups of 8 bits (1, 2, 3 or 4 groups per character)
● ASCII characters are coded in a single group of 8 bits
▪ UTF-16, groups of 16 bits (1 or 2 groups per character)
▪ UTF-32, groups of 32 bits (fixed length)
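The variable length of UTF-8 can be illustrated with a minimal sketch. The function name utf8_encode is hypothetical (it is not part of any standard library); it only shows how the number of 8-bit groups depends on the code point, and that ASCII characters keep their single-byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point into UTF-8 (hypothetical helper).
 * Returns the number of bytes stored in out (1..4), or 0 if cp is
 * not a valid code point. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                 /* ASCII: one byte, value unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* beyond the Unicode range */
}
```

For instance, 'c' (code 99) takes one byte, while the euro sign (U+20AC) takes three.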
Textual and binary files
❖ A file is basically a sequence of bytes written one
after the other
➢ Each byte includes 8 bits, with possible values 0 or 1
➢ As a consequence all files are binary
❖ Normally we can distinguish between
➢ Textual files (or ASCII)
▪ C sources, C++, Java, Perl, etc.
➢ Binary files
▪ Executables, Word, Excel, etc.
❖ Remark: the UNIX/Linux kernel does not distinguish between binary and textual files
Textual files (or ASCII)
❖ Files consisting of data encoded in ASCII
➢ Sequence of 0s and 1s which, in groups of 8 bits, encode ASCII symbols
❖ Textual files are usually “line-oriented”
➢ Newline: go to the next line
▪ UNIX/Linux and Mac OS X
● Newline = 1 character
● Line Feed (go to next line, LF, decimal 10)
▪ Windows
● Newline = 2 characters
● Carriage Return (go to beginning of the line, CR, decimal 13) followed by Line Feed (go to next line, LF, decimal 10)
Binary Files
❖ A sequence of 0s and 1s, not “byte-oriented”
❖ Conceptually, the smallest unit that can be read or written is the bit
➢ Managing single bits is not easy, because I/O functions transfer whole bytes
➢ They may include every possible sequence of 8 bits, which does not necessarily correspond to printable characters, new-lines, etc.
Why are binary files used?
❖ Advantages
➢ Compactness (smaller average size)
▪ Example: the number 100000 (decimal) occupies 6 characters (i.e., 6 bytes) in the Text/ASCII format, and 4 bytes if coded as a 32-bit integer (int)
➢ Ease of editing the file
▪ An integer always occupies the same space
➢ Ease of positioning on the file
▪ Fixed record structure
❖ Drawbacks
➢ Limited portability
➢ Impossibility to use a standard editor
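The compactness argument can be checked with a small sketch. The helper names text_form and binary_form are hypothetical; the code assumes nothing about the size of int beyond what sizeof reports (4 bytes on most current platforms).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Store value as decimal ASCII text in buf.
 * Returns the number of characters (bytes) used, '\0' excluded. */
size_t text_form(int value, char *buf, size_t size)
{
    int n;
    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%d", value);
    return (n < 0) ? 0 : (size_t)n;
}

/* Store value in its in-memory binary form.
 * Returns the number of bytes used, always sizeof(int). */
size_t binary_form(int value, unsigned char *buf, size_t size)
{
    assert(buf != NULL && size >= sizeof value);
    memcpy(buf, &value, sizeof value);
    return sizeof value;
}
```

With these helpers, 100000 needs 6 bytes as text and sizeof(int) bytes in binary, while 7 needs 1 byte as text and still sizeof(int) bytes in binary: the binary size is constant, the text size depends on the value.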
Example
❖ String “ciao” (textual or binary file)
➢ Characters: ‘c’ ‘i’ ‘a’ ‘o’
➢ Decimal: 99 105 97 111
➢ Binary: 01100011 01101001 01100001 01101111
❖ Integer number 231 in a textual file, stored as the string “231”
➢ Characters: ‘2’ ‘3’ ‘1’
➢ Decimal: 50 51 49
➢ Binary: 00110010 00110011 00110001
❖ Integer number 231 in a binary file
➢ Binary: 11100111
Serialization
❖ Process of translating a structure (e.g., C struct)
into a storable format
➢ Using serialization, a struct can be stored or
transmitted (on the network) as a single entity
➢ When the sequence of bits is read, it is done in
accordance with the serialization process, and the
struct is reconstructed in an identical manner
❖ Many languages support serialization using R/W
operations on a file
➢ Java, Python, Objective-C, Ruby, etc.
Example
struct mys {
  int id;
  long int rn;
  char n[L], c[L];
  int mark;
} s;
❖ Text: single fields, characters on 8 bits (ASCII)
➢ 1 100000 Romano Antonio 25
❖ Binary: serialization, characters on 8 bits (ASCII)
➢ † Romano ÌÌÌÌÌÌÌÌ Antonio ÌÌÌÌÌÌÌÌ
❖ Binary: serialization, characters on 16 bits (UNICODE)
➢ ??? †?R?o?m?a?n?o???ÌÌÌÌÌÌÌÌ A?n?t?o?n?i?o???ÌÌÌÌÌÌÌÌ
❖ N.B. File dimension
ISO C Standard Library
❖ I/O operations with ANSI C can be performed
through different categories of functions
➢ Character by character
➢ Line by line
➢ Formatted I/O
➢ Binary I/O
➢ Read examples
▪ https://www.skenz.it/cs/c_language/file_reading_1
➢ Write examples
▪ https://www.skenz.it/cs/c_language/file_writing_1
➢ Binary I/O examples
▪ https://www.skenz.it/cs/c_language/write_and_read_a_binary_file
❖ Standard I/O is “fully buffered”
➢ The I/O operation is performed only when the I/O
buffer is full
➢ The “flush” operation indicates the actual write of
the buffer to the I/O
#include <stdio.h>
void setbuf (FILE *fp, char *buf);
int fflush (FILE *fp);
❖ For concurrent processes, use
➢ setbuf (stdout, 0);
➢ fflush (stdout);
❖ Standard error is never buffered
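A minimal sketch of the two calls above. The helper names unbuffer_stdout and put_now are hypothetical; they only wrap setbuf and fflush to show where each call belongs.

```c
#include <assert.h>
#include <stdio.h>

/* Disable buffering on stdout. setbuf must be called before any
 * other operation is performed on the stream. */
void unbuffer_stdout(void)
{
    setbuf(stdout, NULL);
}

/* Write msg on fp and force the buffer out immediately.
 * Returns 0 on success, EOF on error. */
int put_now(FILE *fp, const char *msg)
{
    assert(fp != NULL && msg != NULL);
    if (fputs(msg, fp) == EOF)
        return EOF;
    return fflush(fp);   /* without this, msg may remain in the buffer */
}
```

When several processes write on the same terminal, either approach keeps the messages in the order in which they were produced; which one is preferable depends on how often the program writes.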
Open and close a file
#include <stdio.h>
FILE *fopen (const char *path, const char *type);
int fclose (FILE *fp);
❖ Access methods
➢ r, rb, w, wb, a, ab, r+, r+b, etc.
➢ The UNIX kernel does not make any difference
between textual files (ASCII) and binary files
▪ The “b” option has no effect, e.g. “r”==“rb”,
“w”==“wb”, etc.
I/O character by character
#include <stdio.h>
int getc (FILE *fp);
int fgetc (FILE *fp);
int putc (int c, FILE *fp);
int fputc (int c, FILE *fp);
❖ Returned values
➢ A character on success
➢ EOF on error, or when the end of the file is
reached
❖ The functions
➢ getchar is equivalent to getc (stdin)
➢ putchar is equivalent to putc (c, stdout)
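A minimal sketch of a character-by-character copy (the function name copy_chars is hypothetical). Note that the character is kept in an int, not in a char, so that EOF can be told apart from every valid character.

```c
#include <assert.h>
#include <stdio.h>

/* Copy stream in to stream out one character at a time.
 * Returns the number of characters copied, or -1 on error. */
long copy_chars(FILE *in, FILE *out)
{
    int c;        /* int, not char: it must be able to hold EOF */
    long n = 0;
    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;            /* write error */
        n++;
    }
    /* EOF is returned both at end of file and on error */
    return ferror(in) ? -1 : n;
}
```

Calling copy_chars (stdin, stdout) behaves like a simple cat command.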
I/O line by line
#include <stdio.h>
char *gets (char *buf);
char *fgets (char *buf, int n, FILE *fp);
int puts (const char *buf);
int fputs (const char *buf, FILE *fp);
❖ Returned values
➢ buf (gets/fgets), or a non-negative value (puts/fputs) in the case of success
➢ NULL (gets/fgets) on error or when the end of file is reached, EOF (puts/fputs) on error
❖ gets performs no bounds check on buf and was removed from the standard in C11; prefer fgets
❖ Lines must be delimited by "new-line"
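A minimal sketch of line-oriented reading with fgets (the function name count_lines and the 64-byte buffer are arbitrary choices made for the example). A line longer than the buffer is returned in several pieces, and only the piece ending with the new-line completes it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of stream fp. A last line without the final
 * new-line is counted as well. */
long count_lines(FILE *fp)
{
    char buf[64];
    long lines = 0;
    int partial = 0;   /* 1 while inside a line not yet terminated */
    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            lines++;          /* this piece closes a line */
            partial = 0;
        } else {
            partial = 1;      /* long line, or last line without '\n' */
        }
    }
    return lines + partial;
}
```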
Formatted I/O
#include <stdio.h>
int scanf (const char *format, ...);
int fscanf (FILE *fp, const char *format, ...);
int printf (const char *format, ...);
int fprintf (FILE *fp, const char *format, ...);
❖ High flexibility in data manipulation
➢ Formats (characters, integers, reals, etc.)
➢ Conversions
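A minimal sketch of formatted I/O on a text file. The struct student and the two helper functions are hypothetical names introduced only for the example; the one-word name and the 32-byte array are assumptions of the sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record used only in this example */
struct student {
    int id;
    char name[32];   /* a single word, no spaces */
    int mark;
};

/* Write one record as a line of text. Returns 0 on success, -1 on error. */
int write_student(FILE *fp, const struct student *s)
{
    assert(fp != NULL && s != NULL);
    return (fprintf(fp, "%d %s %d\n", s->id, s->name, s->mark) < 0) ? -1 : 0;
}

/* Read one record. Returns 0 on success, -1 on error, bad format or EOF. */
int read_student(FILE *fp, struct student *s)
{
    assert(fp != NULL && s != NULL);
    /* %31s limits the name to the array size; fscanf returns the
     * number of fields it has converted */
    return (fscanf(fp, "%d %31s %d", &s->id, s->name, &s->mark) == 3) ? 0 : -1;
}
```

The conversions work in both directions: fprintf turns the integers into digits, fscanf turns the digits back into integers.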
Binary I/O
#include <stdio.h>
size_t fread (void *ptr, size_t size,
size_t nObj, FILE *fp);
size_t fwrite (const void *ptr, size_t size,
size_t nObj, FILE *fp);
❖ Each single I/O operation works on an aggregate object of a specific size
➢ With getc/putc it would be necessary to iterate on all the fields of the struct
➢ With gets/puts it is not possible, because both would terminate on null bytes or new-lines
❖ Returned values
➢ Number of objects written/read
➢ If the returned value does not correspond to the parameter nObj, ferror and feof can be used to distinguish between the two cases
▪ An error has occurred
▪ The end of file has been reached
❖ Often used to manage binary files
➢ serialized R/W (single operation for the whole
struct)
➢ Potential problems in managing different
architectures
▪ Data format compatibility (e.g., integers, reals, etc.)
▪ Different offsets for the fields of the struct
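A minimal sketch of serialized R/W with fread and fwrite. The struct item mirrors the one used in the examples of these slides; the value of L and the helper names save_items and load_items are assumptions of the sketch. The file produced is meant to be read back on the same architecture and with the same compiler settings.

```c
#include <assert.h>
#include <stdio.h>

#define L 32   /* assumed array size */

struct item {
    char name[L];
    int n;
    float avg;
};

/* Write count structs with a single call.
 * Returns 0 on success, -1 on error. */
int save_items(FILE *fp, const struct item *v, size_t count)
{
    assert(fp != NULL && v != NULL);
    return (fwrite(v, sizeof *v, count, fp) == count) ? 0 : -1;
}

/* Read at most max structs. Returns the number of structs read,
 * or -1 on error. */
long load_items(FILE *fp, struct item *v, size_t max)
{
    size_t got;
    assert(fp != NULL && v != NULL);
    got = fread(v, sizeof *v, max, fp);
    /* a short count means either error or end of file:
     * ferror tells the two cases apart */
    if (got < max && ferror(fp))
        return -1;
    return (long)got;
}
```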
POSIX Standard Library
❖ I/O in UNIX can be entirely performed with only
5 functions
➢ open, read, write, lseek, close
❖ This type of access
➢ Is part of POSIX and of the Single UNIX
Specification, but not of ISO C
➢ It is normally defined with the term "unbuffered
I/O", in the sense that each read or write
operation corresponds to a system call
System call open()
❖ In the UNIX kernel a "file descriptor" is a non-
negative integer
❖ Conventionally (also for shells)
➢ Standard input
▪ 0 = STDIN_FILENO
➢ Standard output
▪ 1 = STDOUT_FILENO
➢ Standard error
▪ 2 = STDERR_FILENO
❖ These descriptors are defined in the header file unistd.h
System call open()
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open (const char *path, int flags);
int open (const char *path, int flags,
mode_t mode);
❖ It opens a file defining the permissions
❖ Returned values
➢ The descriptor of the file on success
➢ -1 on error
❖ It can have 2 or 3 parameters
➢ The mode parameter is optional (it is needed only when the file may be created, i.e., with O_CREAT)
❖ Path indicates the file to open
❖ Flags has multiple options
➢ Can be obtained with the bitwise OR of constants defined in the header file fcntl.h
➢ One of the following three constants is mandatory
▪ O_RDONLY open for read-only access
▪ O_WRONLY open for write-only access
▪ O_RDWR open for read-write access
➢ Optional constants
▪ O_CREAT creates the file if it does not exist
▪ O_EXCL error if O_CREAT is set and the file exists
▪ O_TRUNC removes the content of the file
▪ O_APPEND appends to the file
▪ O_SYNC each write waits until the physical write operation has finished before continuing
▪ ...
❖ Mode specifies access permissions
➢ S_I[RWX]USR rwx --- ---
➢ S_I[RWX]GRP --- rwx ---
➢ S_I[RWX]OTH --- --- rwx
❖ When a file is created, the actual permissions are obtained by clearing from mode the bits set in the umask of the process
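A minimal sketch of open with flags and mode combined through the bitwise OR (the function name create_new is hypothetical). O_EXCL makes the call fail when the file already exists, so an existing file is never overwritten by mistake.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create file path for writing; fail if it already exists.
 * Returns the file descriptor on success, -1 on error. */
int create_new(const char *path)
{
    assert(path != NULL);
    /* requested permissions: rw- r-- r--
     * (the bits set in the umask of the process are then cleared) */
    return open(path, O_WRONLY | O_CREAT | O_EXCL,
                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
}
```

The first call on a given path returns a descriptor; a second call on the same path returns -1, because the file now exists.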
System call read()
#include <unistd.h>
ssize_t read (int fd, void *buf, size_t nbytes);
❖ Read from file fd at most nbytes bytes, storing them in buf
❖ Returned values
➢ number of read bytes on success
➢ -1 on error
➢ 0 in the case of EOF
❖ The returned value can be lower than nbytes
➢ If the end of the file is reached before nbytes
bytes have been read
➢ If the pipe you are reading from does not contain
nbytes bytes
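Since read may return fewer bytes than requested, a caller that needs exactly nbytes bytes has to loop. A minimal sketch (the function name read_full is hypothetical; interruption by signals is deliberately not handled):

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep reading until nbytes bytes have been read or EOF is found.
 * Returns the number of bytes read (lower than nbytes only at EOF),
 * or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t nbytes)
{
    char *p = buf;
    size_t done = 0;
    assert(buf != NULL);
    while (done < nbytes) {
        ssize_t n = read(fd, p + done, nbytes - done);
        if (n < 0)
            return -1;        /* error */
        if (n == 0)
            break;            /* EOF: no more data will arrive */
        done += (size_t)n;    /* short read: ask for the rest */
    }
    return (ssize_t)done;
}
```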
System call write()
#include <unistd.h>
ssize_t write (int fd, const void *buf, size_t nbytes);
❖ Write nbytes bytes from buf in the file identified
by descriptor fd
❖ Returned values
➢ The number of written bytes in the case of
success, i.e., normally nbytes
➢ -1 on error
❖ Remark
➢ write writes on the system buffer, not on the disk
▪ fd = open (file, O_WRONLY | O_SYNC);
➢ O_SYNC forces the sync of the buffers, but only
for ext2 file systems
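A minimal sketch of a write that tolerates partial writes, followed by an explicit request to flush the system buffers. The function names write_all and write_durable are hypothetical; fsync is a POSIX call used here as an alternative to O_SYNC, and whether the data really reaches the physical device still depends on the file system and on the hardware.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all nbytes bytes of buf, repeating write if it accepts
 * only part of them. Returns nbytes on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t nbytes)
{
    const char *p = buf;
    size_t done = 0;
    assert(buf != NULL);
    while (done < nbytes) {
        ssize_t n = write(fd, p + done, nbytes - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Write the data, then ask the kernel to transfer its buffers
 * for fd to the storage device. Returns 0 on success, -1 on error. */
int write_durable(int fd, const void *buf, size_t nbytes)
{
    if (write_all(fd, buf, nbytes) < 0)
        return -1;
    return fsync(fd);
}
```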
Examples: File R/W
float data[10];
if ( write (fd, data, 10*sizeof(float)) == (-1) ) {
  fprintf (stderr, "Error: Write.\n");
}
❖ Writing of the vector data (of float)
struct {
  char name[L];
  int n;
  float avg;
} item;
if ( write (fd, &item, sizeof(item)) == (-1) ) {
  fprintf (stderr, "Error: Write.\n");
}
❖ Writing of the serialized struct item (with 3 fields)
System call lseek()
#include <unistd.h>
off_t lseek (int fd, off_t offset, int whence);
❖ A current position (the file offset) is associated with each open file
➢ The system call lseek assigns the value offset to
the file offset
➢ The offset value is expressed in bytes
❖ whence specifies the interpretation of offset
➢ If whence==SEEK_SET
▪ The offset is evaluated from the beginning of the file
➢ If whence==SEEK_CUR
▪ The offset is evaluated from the current position
➢ If whence==SEEK_END
▪ The offset is evaluated from the end of the file
❖ The value of offset can be positive or negative
❖ It is possible to leave "holes" in a file (filled with zeros)
❖ Returned values
➢ new offset on success
➢ -1 on error
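A minimal sketch of the three values of whence and of a "hole" (the function name make_holey_file and the file content are arbitrary choices for the example). Moving the offset past the end of the file and writing there leaves a gap that is read back as zeros.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create file path with content "AB", a hole of gap bytes, "CD".
 * Returns the final size of the file, or -1 on error. */
off_t make_holey_file(const char *path, off_t gap)
{
    int fd;
    off_t size;
    assert(path != NULL && gap >= 0);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    if (write(fd, "AB", 2) != 2 ||
        lseek(fd, gap, SEEK_CUR) == (off_t)-1 ||  /* jump beyond the end */
        write(fd, "CD", 2) != 2) {
        close(fd);
        return -1;
    }
    size = lseek(fd, 0, SEEK_END);  /* offset of the end = file size */
    close(fd);
    return size;
}
```

With gap equal to 100 the resulting file is 104 bytes long, and a byte read from inside the hole has value 0.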
System call close()
#include <unistd.h>
int close (int fd);
❖ Returned values
➢ 0 on success
➢ -1 on error
❖ All the open files are closed automatically when
the process terminates
Example: File R/W
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define BUFFSIZE 4096
int main (int argc, char *argv[]) {
  int nR, nW, fdR, fdW;
  char buf[BUFFSIZE];
  if ( argc != 3 ) {
    fprintf (stderr, "Usage: %s source destination\n", argv[0]);
    exit (1);
  }
  fdR = open (argv[1], O_RDONLY);
  fdW = open (argv[2], O_WRONLY | O_CREAT | O_TRUNC,
    S_IRUSR | S_IWUSR);
  if ( fdR==(-1) || fdW==(-1) ) {
    fprintf (stderr, "Error Opening a File.\n");
    exit (1);
  }
  while ( (nR = read (fdR, buf, BUFFSIZE)) > 0 ) {
    nW = write (fdW, buf, nR);
    if ( nR!=nW )
      fprintf (stderr,
        "Error: Read %d, Write %d.\n", nR, nW);
  }
  /* Error check on the last reading operation */
  if ( nR < 0 )
    fprintf (stderr, "Read Error.\n");
  close (fdR);
  close (fdW);
  exit (0);
}
This program works indifferently on text and
binary files