FILE SYSTEM
1
LONG TERM STORAGE
MANAGEMENT
▪ We often need to store
▪ large amounts of information
▪ permanently
▪ Usually we use hard disks and, more recently, solid-state
drives for such long-term storage
2
LONG TERM STORAGE
MANAGEMENT
▪ We can view these disks as a
▪ linear sequence of fixed-size
blocks
▪ that supports two operations:
▪ Read the Kth block
▪ Write the Kth block
▪ Now we have to answer
▪ How to find desired information?
▪ How to know which blocks are
free?
▪…
▪ Is it user friendly??
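The block view of a disk can be sketched in a few lines. This is a toy in-memory model, not a real driver; the class name `BlockDevice`, the 512-byte block size, and the zero padding are assumptions made for illustration.

```python
class BlockDevice:
    """Toy disk: a linear sequence of fixed-size blocks (sizes are illustrative)."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, k: int) -> bytes:
        """Read the Kth block."""
        return self.blocks[k]

    def write_block(self, k: int, data: bytes) -> None:
        """Write the Kth block; short data is zero-padded to the block size."""
        if len(data) > self.block_size:
            raise ValueError("data does not fit in one block")
        self.blocks[k] = data.ljust(self.block_size, b"\0")


disk = BlockDevice(num_blocks=8)
disk.write_block(3, b"hello")
print(disk.read_block(3)[:5])   # b'hello'
```

Nothing in this interface says which blocks hold which information or which blocks are free. Those are the questions the file system has to answer.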
3
FILE SYSTEM
▪ A file system consists of two distinct parts:
▪ A collection of files
▪ each storing related data
▪ A directory structure
▪ organizes and provides information about all the files in the system
4
DIRECTORIES/FOLDERS
▪ Naming is nice, but limited
▪ Humans like to group things together for convenience
▪ File systems allow this to be done with directories
(sometimes called folders)
▪ Grouping makes it easier to
▪ Find files in the first place: remember the enclosing directories
for the file
▪ Locate related files (or just determine which files are related)
5
SINGLE-LEVEL DIRECTORY
SYSTEMS
[Figure: the root directory holds four files, labeled with their owners: foo (A), bar (A), baz (B), blah (C)]
▪ One directory in the file system
▪ Example directory
▪ Contains 4 files (foo, bar, baz, blah)
▪ Owned by 3 different people: A, B, and C (the figure labels each file with its owner)
▪ Problem: what if user B wants to create a file called foo?
6
TWO-LEVEL DIRECTORY
SYSTEM
[Figure: the root directory holds one directory per user: A, B, and C. A's directory holds foo and bar; B's holds foo and baz; C's holds bar, foo, and blah.]
▪ Solves naming problem: each user has her own directory
▪ Multiple users can use the same file name
▪ By default, users access files in their own directories
▪ Extension: allow users to access files in others’ directories
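A minimal sketch of the two-level scheme, assuming the root directory is a dictionary that maps each user to a private directory. The function names `create` and `read` and the `owner` parameter are hypothetical.

```python
root = {"A": {}, "B": {}, "C": {}}   # root directory: user -> that user's directory


def create(user, name, data):
    """Create a file in the user's own directory."""
    directory = root[user]
    if name in directory:
        raise FileExistsError(f"{user} already has a file named {name}")
    directory[name] = data


def read(user, name, owner=None):
    """Look in the caller's own directory by default; `owner` names another user's."""
    return root[owner or user][name]


create("A", "foo", b"A's foo")
create("B", "foo", b"B's foo")       # no clash: the name lives in B's directory
print(read("B", "foo"))              # b"B's foo"
print(read("B", "foo", owner="A"))   # b"A's foo"
```

A name only has to be unique within one user's directory, so B can create foo even though A already has one.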
7
HIERARCHICAL DIRECTORY
SYSTEM
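[Figure: a directory tree. The root directory holds user directories A, B, and C. Second-level entries, with owners: Papers (A), foo (A), Photos (A), foo (B), Papers (B), bar (C), foo (C), blah (C). Third-level entries: os.tex (A), sunset (A), Family (A), foo.tex (B), foo.ps (B). Entries under Family: sunset (A), kids (A), Mom (A).]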
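Looking up a file in a hierarchical system means walking the tree one path component at a time. The sketch below models directories as nested dictionaries. The exact tree is illustrative: it reuses names from the figure, but which directory holds which entry is an assumption, and `resolve` is a hypothetical helper.

```python
# Directories are dicts; files are bytes. The parent/child layout is assumed.
tree = {
    "A": {
        "Papers": {"os.tex": b"os paper"},
        "foo": b"A's foo",
        "Photos": {
            "sunset": b"sunset photo",
            "Family": {"sunset": b"family sunset", "kids": b"kids photo", "Mom": b"mom photo"},
        },
    },
    "B": {"foo": b"B's foo", "Papers": {"foo.tex": b"tex", "foo.ps": b"ps"}},
    "C": {"bar": b"bar", "foo": b"C's foo", "blah": b"blah"},
}


def resolve(path):
    """Follow a /-separated path from the root and return the entry it names."""
    node = tree
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, dict):
            raise NotADirectoryError(part)
        node = node[part]
    return node


print(resolve("/A/Photos/Family/kids"))   # b'kids photo'
print(sorted(resolve("/A")))              # ['Papers', 'Photos', 'foo']
```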
8
FILE SYSTEM LAYOUT
▪ File systems are stored on disks
▪ A disk can be divided up into one or more
partitions, with independent file systems on each
partition
▪ Sector 0 of the disk is called the MBR (Master Boot
Record) and is used to boot the computer
▪ The end of the MBR contains the partition table
10
FILE SYSTEM LAYOUT
[Figure: the entire disk consists of the master boot record, whose end holds the partition table, followed by partitions 1 to 4. One partition is expanded to show its layout: boot block, superblock, free space management, index nodes, files & directories.]
11
FILE SYSTEM LAYOUT
▪ Partition table gives the starting and ending
addresses of each partition
▪ One of the partitions in the table is marked as active
▪ When the computer is booted, the BIOS reads in
and executes the MBR
▪ The first thing the MBR program does is locate the
active partition, read in its first block, called the
boot block, and execute it.
▪ The program in the boot block loads the operating
system contained in that partition.
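As a rough illustration of the "locate the active partition" step, the sketch below parses a partition table using the classic PC MBR layout: four 16-byte entries starting at byte 446, a boot flag of 0x80 for the active partition, and the signature bytes 0x55 0xAA at the end of the sector. That layout is an assumption added here, not something the slides specify. Each entry stores a starting sector and a sector count, from which the ending address follows. The sample sector is fabricated for the demo.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446                     # the partition table sits at the end of the MBR
# boot flag, 3 skipped bytes, partition type, 3 skipped bytes, start sector, sector count
ENTRY = struct.Struct("<B3xB3xII")
ACTIVE = 0x80


def parse_partition_table(mbr):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        flag, ptype, start, count = ENTRY.unpack_from(mbr, TABLE_OFFSET + i * ENTRY.size)
        if ptype == 0:                 # type 0 marks an unused entry
            continue
        partitions.append(
            {"number": i + 1, "active": flag == ACTIVE, "start": start, "end": start + count - 1}
        )
    return partitions


# Build a made-up MBR with two partitions, the second one marked active.
mbr = bytearray(SECTOR_SIZE)
mbr[510:512] = b"\x55\xaa"
ENTRY.pack_into(mbr, TABLE_OFFSET, 0x00, 0x83, 2048, 2048)
ENTRY.pack_into(mbr, TABLE_OFFSET + ENTRY.size, ACTIVE, 0x83, 4096, 8192)

active = next(p for p in parse_partition_table(mbr) if p["active"])
print(active)   # {'number': 2, 'active': True, 'start': 4096, 'end': 12287}
```

The boot code would next read sector `active["start"]`, the boot block of that partition, and jump to it.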
12
FILE SYSTEM LAYOUT
▪ For uniformity, every partition starts with a boot
block,
▪ even if it does not contain a bootable operating system.
▪ Besides, it might contain one in the future
▪ Other than starting with a boot block, the layout of
a disk partition varies a lot from file system to file
system
▪ Often the file system contains a superblock
▪ It contains all the key parameters about the file
system
▪ It is read into memory when the computer is booted or the file
system is first touched.
13
FILE SYSTEM LAYOUT
▪ Typical information in the superblock includes
✔ a magic number to identify the file-system type
✔ the number of blocks in the file system
✔ other key administrative information.
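A sketch of how a superblock might be written and checked. The field layout (4-byte magic number, 4-byte block size, 8-byte block count) and the magic value `TOYF` are invented for this example; real file systems each define their own format.

```python
import struct

SUPERBLOCK = struct.Struct("<4sIQ")   # magic, block size, number of blocks (hypothetical layout)
MAGIC = b"TOYF"                       # made-up magic number for a made-up file system


def write_superblock(block_size, num_blocks):
    """Serialize the key parameters of the file system."""
    return SUPERBLOCK.pack(MAGIC, block_size, num_blocks)


def read_superblock(raw):
    """Check the magic number first, then return the parameters."""
    magic, block_size, num_blocks = SUPERBLOCK.unpack_from(raw)
    if magic != MAGIC:
        raise ValueError("unrecognized file-system type")
    return {"block_size": block_size, "num_blocks": num_blocks}


raw = write_superblock(block_size=1024, num_blocks=1_000_000)
info = read_superblock(raw)
print(info)   # {'block_size': 1024, 'num_blocks': 1000000}
```

Checking the magic number first lets the system refuse to interpret a partition that holds some other file-system type.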
14
I-NODES
▪ The structure that describes
▪ where the file is on the disk and
▪ the attributes of the file
▪ Associated with each file
▪ I-nodes have to be stored on disks
15
DISK BLOCK ALLOCATION
▪ Keeping track of which disk blocks go with which
file
▪ The most important issue in implementing a file
system
▪ Several options
▪ Contiguous Allocation
▪ Linked list allocation
▪ Linked list allocation using a table in memory
▪ I-nodes
16
CONTIGUOUS ALLOCATION
▪ OS maintains an ordered list of free disk blocks
▪ OS allocates a contiguous chunk of free blocks
when it creates a file.
▪ Need to store only the start location and size in the
file descriptor
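A minimal sketch of contiguous allocation. For brevity it tracks free blocks with a list of booleans instead of an ordered free list and takes the first run that fits; both are simplifications, and the class and method names are hypothetical.

```python
class ContiguousDisk:
    def __init__(self, num_blocks):
        self.free = [True] * num_blocks
        self.files = {}                # name -> (start, size): the whole file descriptor

    def create(self, name, size):
        """Allocate the first run of `size` contiguous free blocks."""
        if size < 1:
            raise ValueError("size must be at least one block")
        run = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == size:
                start = i - size + 1
                self.free[start:i + 1] = [False] * size
                self.files[name] = (start, size)
                return start
        raise OSError(f"no contiguous run of {size} free blocks")

    def block_of(self, name, k):
        """Disk address of the k-th block of a file: one addition, no lookups."""
        start, size = self.files[name]
        if not 0 <= k < size:
            raise IndexError(k)
        return start + k


disk = ContiguousDisk(10)
disk.create("A", 4)
disk.create("B", 3)
print(disk.files)              # {'A': (0, 4), 'B': (4, 3)}
print(disk.block_of("B", 2))   # 6
```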
18
CONTIGUOUS ALLOCATION
ADVANTAGE
▪ Simple to implement because keeping track of where a file’s
blocks are is reduced to remembering two numbers: the disk
address of the first block and the number of blocks in the
file.
▪ The read performance is excellent because the entire file can
be read from the disk in a single operation.
▪ Only one seek is needed (to the first block).
▪ After that, no more seeks or rotational delays are needed, so
data come in at the full bandwidth of the disk.
▪ Thus contiguous allocation is simple to implement and has
high performance.
▪ Usage: CD-ROM, DVD-ROM
19
CONTIGUOUS ALLOCATION
DISADVANTAGE
▪ Each file begins at the start of a new block, so if
file A was really 3½ blocks, some space is
wasted at the end of the last block.
▪ Over the course of time, the disk becomes
fragmented.
▪ Removing files leaves holes of free blocks, and a hole is
unusable for any new file larger than it.
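Both kinds of waste can be shown with small numbers. The 1-KB block size, the 10-block disk, and the helper `find_run` are assumptions chosen for the example.

```python
BLOCK_SIZE = 1024

# Space wasted in the last block: a 3.5-block file still occupies 4 whole blocks.
file_bytes = 3 * BLOCK_SIZE + BLOCK_SIZE // 2
blocks_needed = -(-file_bytes // BLOCK_SIZE)          # ceiling division
wasted = blocks_needed * BLOCK_SIZE - file_bytes
print(blocks_needed, wasted)                           # 4 512


def find_run(free, size):
    """Return the start of the first run of `size` free blocks, or None."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == size:
            return i - size + 1
    return None


# 10-block disk: A uses blocks 0-2, B uses 3-5, C uses 6-8, block 9 is free.
free = [False] * 9 + [True]
free[3:6] = [True] * 3           # delete B: blocks 3-5 become a hole

print(sum(free))                 # 4 free blocks in total
print(find_run(free, 3))         # 3: a 3-block file fits in the hole
print(find_run(free, 4))         # None: a 4-block file fits nowhere
```

Four blocks are free, yet a 4-block file cannot be created because the free blocks are not adjacent.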
20
LINKED LIST ALLOCATION
▪ Keep a list of all the free blocks.
▪ In the file descriptor, keep a pointer to the first
block.
▪ In each block, keep a pointer to the next block
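A sketch of the linked layout, with the disk modeled as a dictionary from block number to a (next pointer, payload) pair. The tiny 16-byte payload, the `END` marker, and the read counter are assumptions for the demo. Reaching block k of a file costs k + 1 block reads, because every pointer on the way sits in a disk block.

```python
BLOCK_PAYLOAD = 16        # tiny payload so the example stays readable
END = -1                  # marks the last block of a file

disk = {}                 # block number -> (next block number, payload)
reads = 0                 # counts simulated disk reads


def read_block(n):
    global reads
    reads += 1
    return disk[n]


def write_file(data, free_blocks):
    """Store data in blocks taken from the free list; return the first block number."""
    chunks = [data[i:i + BLOCK_PAYLOAD] for i in range(0, len(data), BLOCK_PAYLOAD)]
    if not chunks or len(free_blocks) < len(chunks):
        raise OSError("need at least one byte and enough free blocks")
    blocks = free_blocks[:len(chunks)]
    for blk, nxt, chunk in zip(blocks, blocks[1:] + [END], chunks):
        disk[blk] = (nxt, chunk)
    return blocks[0]      # the only address the file descriptor has to keep


def read_kth_block(first, k):
    """Follow k next-pointers from the first block, reading each block on the way."""
    blk = first
    for _ in range(k):
        blk = read_block(blk)[0]
        if blk == END:
            raise IndexError("file has fewer blocks than that")
    return read_block(blk)[1]


first = write_file(bytes(range(64)), [7, 2, 9, 4])   # the blocks need not be adjacent
data = read_kth_block(first, 3)
print(first, reads)       # 7 4
```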
22
LINKED LIST ALLOCATION
ADVANTAGE
▪ Every disk block can be used in this method.
▪ No space is lost to disk fragmentation (except for
internal fragmentation in the last block).
▪ It is sufficient for the directory entry to merely store
the disk address of the first block. The rest can be
found starting there.
23
LINKED LIST ALLOCATION
DISADVANTAGE
▪ Although reading a file sequentially is straightforward,
random access is extremely slow.
▪ The amount of data storage in a block is no longer a
power of two because the pointer takes up a few bytes.
▪ While not fatal, having a peculiar size is less efficient
because many programs read and write in blocks whose
size is a power of two.
▪ With the first few bytes of each block occupied by a
pointer to the next block, reads of the full block size
require acquiring and concatenating information from
two disk blocks, which generates extra overhead due to
the copying.
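The odd payload size is easy to quantify. Assuming 1024-byte blocks and a 4-byte next pointer (both values are assumptions), each block carries 1020 bytes of file data, so a power-of-two read of 1024 bytes always straddles two blocks. The helper `blocks_touched` is hypothetical.

```python
BLOCK_SIZE = 1024
POINTER_SIZE = 4                        # assumed width of the next-block pointer
PAYLOAD = BLOCK_SIZE - POINTER_SIZE     # file data per block: no longer a power of two


def blocks_touched(offset, length):
    """Indices (within the file's chain) of the blocks a read of `length` bytes covers."""
    first = offset // PAYLOAD
    last = (offset + length - 1) // PAYLOAD
    return list(range(first, last + 1))


print(PAYLOAD)                      # 1020
print(blocks_touched(0, 1024))      # [0, 1]
print(blocks_touched(1024, 1024))   # [1, 2]
```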
24
LINKED LIST ALLOCATION
USING A TABLE IN MEMORY
▪ Taking the pointer out
of each disk block, and
putting it into a table in
memory
▪ Fast random access
(chain is in RAM)
25
LINKED LIST ALLOCATION
USING A TABLE IN MEMORY
ADVANTAGE
▪ Using this organization, the entire block is available for
data.
▪ Furthermore, random access is much easier.
▪ Although the chain must still be followed to find a given
offset within the file, the chain is entirely in memory,
so it can be followed without making any disk references.
▪ Like the previous method, it is sufficient for the directory
entry to keep a single integer (the starting block number)
and still be able to locate all the blocks, no matter how
large the file is.
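A sketch of the in-memory table (commonly called a file allocation table, or FAT). The 16-block disk and the two example chains are made up for illustration, and `block_for_offset` is a hypothetical helper. Following the chain touches only the list in memory, never the disk.

```python
END = -1
table = [None] * 16      # table[b] = block that follows block b in its file


def link(chain):
    """Record a file's blocks in the table, in order."""
    for cur, nxt in zip(chain, chain[1:] + [END]):
        table[cur] = nxt


link([4, 7, 2, 10, 12])  # file A starts at block 4
link([6, 3, 11, 14])     # file B starts at block 6


def block_for_offset(first_block, offset, block_size=1024):
    """Find the disk block holding byte `offset`, using only the in-memory table."""
    blk = first_block
    for _ in range(offset // block_size):
        blk = table[blk]
        if blk == END:
            raise ValueError("offset is past the end of the file")
    return blk


print(block_for_offset(4, 3000))        # 2: third block of file A
print(block_for_offset(6, 3 * 1024))    # 14: fourth block of file B
```

The directory entry for file A only has to store the number 4. Every other block is found through the table.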
26
LINKED LIST ALLOCATION
USING A TABLE IN MEMORY
DISADVANTAGE
▪ The primary disadvantage of this method is that the entire table
must be in memory all the time to make it work.
▪ With a 1-TB disk and a 1-KB block size, the table needs 1
billion entries, one for each of the 1 billion disk blocks.
▪ Each entry must hold a block number, which takes at least 30 bits
for 1 billion blocks. For speed in lookup, entries should be 4 bytes.
▪ Thus the table will take up about 3.75 GB or 4 GB of main memory
all the time, depending on whether the system is optimized for
space or time. Not very practical.
▪ Clearly this table, known as a FAT (File Allocation Table), does not
scale well to large disks. It was the original MS-DOS file system and
is still fully supported by all versions of Windows, though.
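A quick check of the table-size arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) so that the entry count comes out at exactly one billion. Numbering a billion blocks takes 30 bits, so a tightly packed table and a table of 4-byte entries end up close in size.

```python
DISK_BYTES = 10**12                              # 1 TB
BLOCK_BYTES = 10**3                              # 1 KB

entries = DISK_BYTES // BLOCK_BYTES              # one table entry per disk block
bits_needed = (entries - 1).bit_length()         # bits to number every block

packed_gb = entries * bits_needed / 8 / 10**9    # entries packed bit-tight (space)
word_gb = entries * 4 / 10**9                    # 4-byte entries (speed)

print(entries, bits_needed)    # 1000000000 30
print(packed_gb, word_gb)      # 3.75 4.0
```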
27
I-NODES
▪ A data structure associated with each
file
▪ Keeps track of which blocks belong to
the file
▪ Used in UNIX file systems
▪ One problem with i-nodes is that if
each one has room for a fixed
number of disk addresses, what
happens when a file grows beyond
this limit?
▪ One solution is to reserve the last
disk address not for a data block, but
instead for the address of a block
containing more disk-block
addresses
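The indirect-block idea can be sketched with deliberately tiny limits: 3 direct addresses per i-node and 4 addresses per indirect block. Those numbers, the dictionary standing in for the disk, and the fixed address 100 for the indirect block are all assumptions for the demo.

```python
NUM_DIRECT = 3            # direct addresses held in the i-node itself (tiny on purpose)
ADDRS_PER_BLOCK = 4       # addresses that fit in one indirect block (tiny on purpose)
INDIRECT_ADDR = 100       # where the demo places the indirect block

disk = {}                 # block address -> contents


class Inode:
    def __init__(self):
        self.direct = []          # addresses of the first few data blocks
        self.indirect = None      # address of a block holding more addresses


def append_block(inode, data_addr):
    """Record one more data block for the file, spilling into the indirect block."""
    if len(inode.direct) < NUM_DIRECT:
        inode.direct.append(data_addr)
        return
    if inode.indirect is None:
        inode.indirect = INDIRECT_ADDR
        disk[INDIRECT_ADDR] = []
    more = disk[inode.indirect]
    if len(more) >= ADDRS_PER_BLOCK:
        raise OSError("file too large for a single indirect block")
    more.append(data_addr)


def block_address(inode, k):
    """Disk address of the k-th block of the file."""
    if k < NUM_DIRECT:
        return inode.direct[k]
    return disk[inode.indirect][k - NUM_DIRECT]


inode = Inode()
for addr in [10, 11, 12, 13, 14]:
    append_block(inode, addr)

print(inode.direct, disk[inode.indirect])                # [10, 11, 12] [13, 14]
print(block_address(inode, 1), block_address(inode, 4))  # 11 14
```

Only the i-nodes of open files need to be in memory, so the cost grows with the number of open files rather than with the size of the disk.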
28
WHAT’S IN A DIRECTORY?
▪ Two types of information
▪ File names
▪ File metadata (size, timestamps, etc.)
▪ Basic choices for directory information
▪ Store all information in directory
▪ Fixed size entries, one per file
▪ Disk addresses and attributes in directory entry
▪ Store names & pointers to index nodes (i-nodes)
[Figure: left, a directory storing all information in its entries: games, mail, news, and research, each followed by its attributes. Right, a directory using pointers to index nodes: each of the same four names points to a separate i-node.]
29
FIXED-LENGTH FILE
NAMING
▪ The simplest approach is to set a limit on file-name length,
typically 255 characters, and reserve that much space for every name.
▪ Advantages
▪ Simple, easy to implement
▪ Disadvantages
▪ wastes a great deal of directory space, since few files
have such long names.
30
IN-LINE FILE NAMING
▪ Directory entries are no longer all
the same size.
▪ Each directory entry starts with a
fixed portion containing
✔ The length of the entry
✔ File attributes
▪ The fixed portion is followed by the
actual file name, however long it is
▪ Each file name is terminated by a
special character (usually 0)
▪ To allow each directory entry to
begin on a word boundary, each file
name is filled out to an integral
number of words
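One way to picture the in-line layout is to pack a few entries into bytes. The header format (2-byte entry length, 4-byte attribute field), the 4-byte word size, and the file names are assumptions chosen for the sketch.

```python
import struct

WORD = 4
HEADER = struct.Struct("<HI")    # entry length, attributes (hypothetical fixed portion)


def pack_entry(name, attributes):
    """Fixed header, then the name, a 0 terminator, and padding to a word boundary."""
    body = name.encode("ascii") + b"\0"
    length = HEADER.size + len(body)
    length += -length % WORD                      # round up to a whole number of words
    return HEADER.pack(length, attributes) + body.ljust(length - HEADER.size, b"\0")


def unpack_directory(raw):
    """Walk the directory entry by entry, using each entry's length to find the next."""
    entries, pos = [], 0
    while pos < len(raw):
        length, attributes = HEADER.unpack_from(raw, pos)
        name_field = raw[pos + HEADER.size:pos + length]
        name = name_field.split(b"\0", 1)[0].decode("ascii")
        entries.append((name, attributes, length))
        pos += length
    return entries


directory = pack_entry("project-budget", 1) + pack_entry("personnel", 2) + pack_entry("foo", 3)
print(unpack_directory(directory))
# [('project-budget', 1, 24), ('personnel', 2, 16), ('foo', 3, 12)]
```

The entry lengths differ (24, 16, and 12 bytes), which is exactly why removing one leaves a gap that the next entry may not fit into.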
31
IN-LINE FILE NAMING
▪ Disadvantages
▪ When a file is removed, a variable-sized gap is
introduced into the directory into which the next file
to be entered may not fit.
▪ A single directory entry may span multiple pages, so a
page fault may occur while reading a file name
32
IN-HEAP FILE NAMING
▪ Another way to handle
variable-length names is to make
the directory entries themselves
all fixed length
▪ keep the file names together in a
heap at the end of the directory
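A sketch of the heap layout: every directory entry is a fixed-size record that stores an offset into a shared heap of names. The class `HeapDirectory`, its method names, and the file names are hypothetical.

```python
class HeapDirectory:
    def __init__(self):
        self.entries = []          # fixed-length records: (offset of name in heap, attributes)
        self.heap = bytearray()    # all file names, 0-terminated, stored back to back

    def add(self, name, attributes):
        offset = len(self.heap)
        self.heap += name.encode("ascii") + b"\0"
        self.entries.append((offset, attributes))

    def name_at(self, offset):
        """Read the name that starts at `offset` and runs up to the 0 terminator."""
        end = self.heap.index(b"\0", offset)
        return self.heap[offset:end].decode("ascii")

    def listing(self):
        return [(self.name_at(off), attrs) for off, attrs in self.entries]


d = HeapDirectory()
d.add("project-budget", 1)
d.add("foo", 3)
print(d.entries)     # [(0, 1), (15, 3)]
print(d.listing())   # [('project-budget', 1), ('foo', 3)]
```

Because every record has the same size, a removed entry leaves a slot that any new entry fits into. The cost moves to the heap, which now needs its own management.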
33
IN-HEAP FILE NAMING
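▪ Advantages
▪ Solves the gap problem: when an entry is removed, the next file
entered always fits in the freed slot
▪ Disadvantages
▪ The heap must be managed
▪ Page faults may still occur while processing file names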
34
SHARED FILES
▪ When several users are working together on a project, they
often need to share files.
▪ As a result, it is often convenient for a shared file to appear
simultaneously in different directories belonging to
different users.
35
SHARED FILE PROBLEMS
▪ If directories really do contain disk addresses, then when
user B links to a file of user A, a copy of the disk addresses
has to be made in B’s directory.
▪ If either A or B subsequently appends to the file, the new
blocks will be listed only in the directory of the user doing
the append.
▪ The changes will not be visible to the other user,
thus defeating the purpose of sharing.
36
SHARED FILES PROBLEMS
SOLUTION 1
▪ Disk blocks are not listed in directories, but in an i-node.
▪ The directories then point just to the relevant i-nodes.
37
SHARED FILES PROBLEMS
SOLUTION 1
▪ Drawbacks
▪ When B links to the shared file, the i-node still records the file’s
owner as A.
▪ Creating a link does not change the ownership
▪ But it does increase the link count in the i-node, so the system
knows how many directory entries currently point to the file.
38
SHARED FILES PROBLEMS
SOLUTION 1
▪ Drawbacks
▪ If A later removes the file, the only thing to do is remove A’s
directory entry but leave the i-node intact, with the count set to 1.
▪ We now have a situation in which B is the only user having a
directory entry for a file owned by A.
▪ If the system does accounting or has quotas, A will continue
to be billed for the file until B decides to remove it, if ever, at
which time the count goes to 0 and the file is deleted.
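The link-count behavior can be traced with a small model in which directories map names to i-node numbers. The dictionaries, the function names, and i-node number 7 are assumptions for illustration.

```python
inodes = {}                      # i-node number -> {"owner", "links", "blocks"}
dirs = {"A": {}, "B": {}}        # each directory maps a name to an i-node number


def create(user, name, inum):
    inodes[inum] = {"owner": user, "links": 1, "blocks": []}
    dirs[user][name] = inum


def link(user, name, inum):
    """Add a directory entry for an existing file: the owner stays, the count goes up."""
    inodes[inum]["links"] += 1
    dirs[user][name] = inum


def unlink(user, name):
    """Remove one directory entry; the file goes away only when the count reaches 0."""
    inum = dirs[user].pop(name)
    inodes[inum]["links"] -= 1
    if inodes[inum]["links"] == 0:
        del inodes[inum]


create("A", "report", 7)
link("B", "shared-report", 7)
inodes[7]["blocks"].append(42)        # an append through either name lands in the one i-node

unlink("A", "report")
after_a_removes = dict(inodes[7])
print(after_a_removes)                # {'owner': 'A', 'links': 1, 'blocks': [42]}

unlink("B", "shared-report")
print(7 in inodes)                    # False
```

After A's entry is gone the i-node still names A as owner, so any accounting keeps charging A until B removes the last entry.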
39
SHARED FILES PROBLEMS
SOLUTION 2
▪ B links to one of A’s files by having the system create a
new file, of type LINK, and entering that file in B’s
directory.
▪ The new file contains just the path name of the file to which
it is linked.
▪ When B reads from the linked file, the operating system
sees that the file being read from is of type LINK, looks up
the name of the file, and reads that file.
▪ This approach is called symbolic linking, to contrast it with
traditional (hard) linking.
40
SHARED FILES PROBLEMS
SOLUTION 2
▪ Advantages
▪ With symbolic links, the problem described previously does not arise
because only the true owner has a pointer to the i-node.
▪ Users who have linked to the file just have path names, not
i-node pointers.
▪ When the owner removes the file, it is destroyed. Subsequent
attempts to use the file via a symbolic link will fail when the
system is unable to locate the file.
▪ Removing a symbolic link does not affect the file at all.
▪ Symbolic links can be used to link to files on machines anywhere in the
world, by simply providing the network address of the machine
where the file resides in addition to its path on that machine.
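Symbolic-link lookup can be modeled with two tables: regular files, and LINK files that hold only a path name. The paths, the `read` helper, and the hop limit that guards against link cycles are assumptions made for this sketch.

```python
files = {"/A/report": b"draft"}                 # regular files: path -> contents
links = {"/B/report-link": "/A/report"}         # LINK files: path -> target path name


def read(path, max_hops=8):
    """Read a file, following LINK files by looking up the path name they contain."""
    for _ in range(max_hops):
        if path in links:
            path = links[path]                  # continue with the path the link names
            continue
        if path in files:
            return files[path]
        raise FileNotFoundError(path)
    raise OSError("too many levels of symbolic links")


print(read("/B/report-link"))                   # b'draft'

del files["/A/report"]                          # the owner removes the file
try:
    read("/B/report-link")
    dangling = False
except FileNotFoundError:
    dangling = True
print(dangling)                                 # True: the link now points at nothing
```

Every access through the link pays for an extra path lookup, and the link itself is a small file of its own. That is the price of keeping ownership and deletion simple.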
41