OPERATING SYSTEM (OS)
A.SWAMY GOUD,
ASSISTANT PROFESSOR,
BVRIT, NSP
UNIT - V
FILE MANAGEMENT
OS Syllabus - Unit V
File Management
Concept of a File
Access Methods
Directory Structure
File System Structure
Allocation Methods (Contiguous, linked, indexed)
Free Space Management (bit vector, linked list,
grouping)
Directory Implementation (linear list, hash table)
Efficiency and Performance.
Concept of a File
For most users, the file system is the most visible aspect of an
Operating System.
It provides the mechanism for on-line storage of and access to
both data and programs of the Operating System and all the
users of the computer system.
The file system consists of two distinct parts: a collection of
Files, each storing related data, and a Directory structure,
which organizes and provides information about all the files.
Computers can store information on various storage media,
such as Magnetic Disks, Magnetic Tapes, and Optical Disks.
So that the computer system is convenient to use, the
operating system provides a uniform logical view of information
storage.
The operating system abstracts from the physical properties of
its storage devices to define a logical storage unit, the file.
Files are mapped onto physical devices, which are usually nonvolatile.
A file is a named collection of related information that is
recorded on secondary storage.
Smallest allotment of nameable storage
Contiguous logical address space
Commonly, files represent programs (both source and object
forms) and data
Data
Numeric
Character
Binary
Program
Files may be free form or rigidly formed
In general, a file is a sequence of bits, bytes, lines, or records,
the meaning of which is defined by the file's creator and user.
File Structure
A file is named, for the convenience of its human users, and is
referred to by its name.
A file has a certain defined structure which depends on its type.
No structure - sequence of words, bytes.
Simple record structure
Lines
Fixed length
Variable length
Complex Structures
Formatted document
Relocatable load file
Who decides :
Operating System
Program / Programmer
File Attributes
Name – only information kept in human-readable form
Identifier – unique tag (number) identifies file within file system
Type – needed for systems that support different types
Location – pointer to file location on device
Size – current file size
Protection – access-control information that determines who can
read, write, execute, etc.
Time, date, and user identification – data for protection,
security, and usage monitoring
Information about files is kept in the directory structure, which is
maintained on the disk
A directory entry typically holds the file’s name and its unique identifier
Recording the attributes of a single file may take more than 1 KB
In a system with many files, Directory structures may be > 1MB
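As a rough illustration of these attributes, the Python sketch below asks the operating system for a file's metadata through os.stat. The helper name describe and the temporary file are assumptions made for this example, and the fields that carry meaningful values vary by platform.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a few attributes the file system keeps for a file (hypothetical helper)."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),             # human-readable name
        "identifier": info.st_ino,                  # unique tag within the file system
        "size": info.st_size,                       # current size in bytes
        "protection": stat.filemode(info.st_mode),  # permission string such as '-rw-------'
        "modified": time.ctime(info.st_mtime),      # time and date of last modification
        "owner_id": info.st_uid,                    # user identification
    }


# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello")

attributes = describe(tmp.name)
print(attributes)
os.remove(tmp.name)
```

Note that the location attribute is not exposed here: the operating system hides the on-disk location of the data behind the identifier.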
File Operations
File is an Abstract data type
Operations include the following (and usually more)
Create – Two steps - find space, add entry to directory
Write – write data at current file position pointer location
and update pointer
Read – read file contents at pointer location, update pointer
Reposition within file (Seek) – change pointer location
Delete – free space and remove entry from directory
Truncate – delete data starting at pointer
Other common operations include appending new
information to the end of an existing file and renaming an
existing file.
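The operations above map closely onto the file API of most languages. The sketch below walks through them in Python; the file name demo.txt and its contents are assumptions chosen only for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Create: the OS finds space and adds an entry to the directory.
with open(path, "w+b") as f:
    f.write(b"operating systems")  # Write: data goes at the pointer, pointer advances
    f.seek(0)                      # Reposition (seek): move the pointer to the start
    first = f.read(9)              # Read: data at the pointer, pointer advances
    position = f.tell()            # the pointer now sits just after the bytes read
    f.truncate()                   # Truncate: drop everything from the pointer onward
    f.seek(0)
    remaining = f.read()

print(first, position, remaining)  # b'operating' 9 b'operating'

os.remove(path)                    # Delete: free the space, remove the directory entry
os.rmdir(os.path.dirname(path))    # clean up the temporary directory
```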
File Types
Most operating systems recognize file types
A common technique for implementing file types is to include the
type as part of the file name.
The name is split into two parts - a name and an extension,
usually separated by a period character
Ex: resume.doc, server.java, readerthread.c
Automatically open a type of file via a specific application (.doc)
Only execute files of a given extension (.exe, .com)
Run files of a given type via a scripting language (.bat)
Can get more advanced
If the source code was modified after the executable was compiled,
an attempt to execute triggers recompilation before execution
Mac OS encodes creating program’s name in file attributes
Double clicking on file passes the file name to appropriate
application
UNIX stores a magic number at the beginning of some files to indicate
the file type
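Both techniques can be sketched in a few lines of Python. The function names and the small table of magic numbers below are assumptions for illustration; the table lists only a handful of well-known signatures and is nowhere near complete.

```python
import os

# A few well-known magic numbers (leading bytes of the file contents).
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
    b"#!": "script with interpreter line",
}


def type_from_extension(filename):
    """Split a file name into (name, extension), as with 'resume.doc'."""
    name, extension = os.path.splitext(filename)
    return name, extension


def type_from_magic(leading_bytes):
    """Guess a file type from the first bytes of its contents."""
    for magic, description in MAGIC_NUMBERS.items():
        if leading_bytes.startswith(magic):
            return description
    return "unknown"


print(type_from_extension("resume.doc"))   # ('resume', '.doc')
print(type_from_magic(b"%PDF-1.7 ..."))    # PDF document
print(type_from_magic(b"plain text"))      # unknown
```

The extension is only a hint that the user can change by renaming the file, whereas the magic number travels with the contents.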
File Structure
File types also can be used to indicate the internal structure of the
file.
Certain files must conform to a required structure that is
understood by the operating system.
Some operating systems extend this idea into a set of system-supported
file structures, with sets of special operations for manipulating files with
those structures.
For instance, DEC's VMS operating system has a file system that
supports three defined file structures.
One problem with having the operating system support multiple file
structures: the resulting size of the operating system is cumbersome.
If the operating system defines five different file structures, it needs
to contain the code to support these file structures.
Some operating systems impose (and support) a minimal number
of file structures.
This approach has been adopted in UNIX, MS-DOS, Macintosh
and others.
Access Methods
Files store information. When it is used, this information
must be accessed and read into computer memory.
The information in the file can be accessed in several
ways.
Some systems provide only one access method for files.
Other systems, such as those of IBM, support many access
methods, and choosing the right one for a particular
application is a major design problem.
Basically files are accessed in two ways
Sequential Access
Direct Access
Sequential Access – based on tape model of a file.
The simplest access method is Sequential Access.
Information in the file is processed in order, one record
after the other.
Editors and Compilers usually access files in this fashion.
Reads and writes are the major operations on a file.
Read Next
Write Next
Reset
Direct Access – random access, relative access.
A file is made up of fixed-length logical records that allow programs
to read and write records rapidly in no particular order.
The direct-access method is based on a disk model of a
file, since disks allow random access to any file block.
Direct-access files are of great use for immediate access to
large amounts of information.
Databases are often of this type.
Read N
Write N
Position to N
Read Next
Write Next
Rewrite N
N = Relative Block Number.
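A minimal sketch of Read N and Write N, assuming fixed-length records stored back to back in a byte stream. The class name DirectAccessFile and the 16-byte record size are choices made for this example only.

```python
import io

BLOCK_SIZE = 16  # bytes per fixed-length record; an arbitrary value for the example


class DirectAccessFile:
    """Hypothetical wrapper exposing read N / write N on a byte stream."""

    def __init__(self, stream):
        self.stream = stream

    def write_block(self, n, data):
        # Pad or cut the data so that every record has the same length.
        record = data[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
        self.stream.seek(n * BLOCK_SIZE)   # position to N
        self.stream.write(record)          # write N

    def read_block(self, n):
        self.stream.seek(n * BLOCK_SIZE)   # position to N
        return self.stream.read(BLOCK_SIZE).rstrip(b"\x00")


f = DirectAccessFile(io.BytesIO())
f.write_block(2, b"record two")    # blocks can be written in any order
f.write_block(0, b"record zero")
print(f.read_block(2))             # b'record two'
print(f.read_block(0))             # b'record zero'
```

Because every record has the same length, the byte offset of relative block N is simply N times the record size, which is what makes access in arbitrary order cheap.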
Simulation of Sequential Access on a Direct-access File
Not all operating systems support both sequential and
direct access for files.
Some systems allow only sequential file access; others
allow only direct access.
We can easily simulate sequential access on a direct-access file
by keeping a variable cp that holds the current position.
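The simulation can be sketched as follows: a current-position variable, called cp here, turns Read Next into "read cp, then cp = cp + 1" and Reset into "cp = 0". The class name SequentialView and the list standing in for the direct-access file are assumptions for this example.

```python
class SequentialView:
    """Sequential access built on a direct-access store (a list indexed by block number)."""

    def __init__(self, blocks):
        self.blocks = blocks  # direct-access file: blocks[n] is relative block n
        self.cp = 0           # current position

    def reset(self):
        self.cp = 0

    def read_next(self):
        data = self.blocks[self.cp]      # read cp
        self.cp += 1                     # cp = cp + 1
        return data

    def write_next(self, data):
        if self.cp == len(self.blocks):
            self.blocks.append(data)     # writing at the end grows the file
        else:
            self.blocks[self.cp] = data  # write cp
        self.cp += 1                     # cp = cp + 1


view = SequentialView(["a", "b", "c"])
first, second = view.read_next(), view.read_next()
view.write_next("C")   # overwrites relative block 2
view.write_next("d")   # appends relative block 3
view.reset()
print(first, second, view.blocks, view.cp)   # a b ['a', 'b', 'C', 'd'] 0
```

The reverse direction, simulating direct access on a sequential-access file, is far less efficient because reaching block N requires reading past every block before it.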
Disk Structure
Computers typically store thousands, millions, and even
billions of files.
Files are stored on random-access storage devices,
including Hard Disks, Optical Disks, and Solid State
(memory-based) Disks.
A storage device can be used in its entirety for a file system.
It can also be subdivided into Partitions for finer-grained
control.
Disks or partitions can be RAID protected against failure
Disk or Partition can be used raw – without a file system, or
Formatted with a file system
Partitions also known as Minidisks, Slices
Entity containing file system known as a Volume
Each volume containing a file system also tracks that file
system’s information in a Device Directory or Volume Table of
Contents (or simply the Directory)
Records information - such as name, location, size, and
type - for all files on the volume.
Computer systems may have zero or more file systems, and
the file systems may be of varying types.
In addition to general-purpose file systems, there are many
special-purpose file systems, frequently all within the
same operating system or computer
Figure: A typical file-system organization
Directory Structure
The directory can be viewed as a symbol table that translates file
names into their directory entries
Can be organized in many ways
We want to be able to insert entries, to delete entries, to
search for a named entry, and to list all the entries in the
directory.
Organization needs to support operations including:
Search for a file or multiple files
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
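The symbol-table view of a directory can be sketched with a dictionary that maps file names to entries. The class and method names below are assumptions for the example, and each entry is reduced to a small dictionary of attributes.

```python
class Directory:
    """A directory as a symbol table: file name -> directory entry (a dict here)."""

    def __init__(self):
        self.entries = {}

    def create(self, name, **attributes):
        if name in self.entries:
            raise FileExistsError(name)   # names must be unique within a directory
        self.entries[name] = dict(attributes)

    def search(self, name):
        return self.entries.get(name)     # None when the name is absent

    def delete(self, name):
        del self.entries[name]

    def rename(self, old, new):
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)

    def list_entries(self):
        return sorted(self.entries)


d = Directory()
d.create("notes.txt", size=120)
d.create("a.out", size=4096)
d.rename("a.out", "prog")
d.delete("notes.txt")
print(d.list_entries())   # ['prog']
print(d.search("prog"))   # {'size': 4096}
```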
Directory Organization
Should have the features
Efficiency – locating a file quickly
Naming – convenient to users
Two users can have same name for different files
The same file can have several different names
Grouping – logical grouping of files by properties, (e.g., all
Java programs, all games, …) or arbitrarily
Following are the most common schemes for defining the
logical structure of a directory.
Single-level Directory
Two-Level Directory
Tree-Structured Directories
Single-Level Directory
The simplest directory structure is the single-level directory.
All files are contained in the same directory
Limitations :
Naming problem : when the number of files increases or when
the system has more than one user. If two users give same file
name, then the unique-name rule is violated.
Grouping problem : Even a single user on a single-level
directory may find it difficult to group the files and remember the
names of all of them.
Two-Level Directory
The standard solution for single level directory limitations is to create a
separate directory for each user.
In the two-level directory structure, each user has his own user file
directory (UFD)
When a user refers to a particular file, only his own UFD is searched.
Thus, different users may have files with the same name, as long as all
the file names within each UFD are unique.
Advantage : Efficient searching
Limitation : No grouping capability
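A two-level directory can be sketched as a dictionary of dictionaries. The outer table, often called the master file directory (a term not used on the slide), maps each user to a UFD; the user and file names below are hypothetical.

```python
# Master file directory: user name -> that user's UFD (file name -> contents).
mfd = {
    "alice": {"test": "alice's data"},
    "bob": {"test": "bob's data", "mail": "inbox"},
}


def lookup(user, filename):
    """Search only the UFD of the requesting user."""
    ufd = mfd[user]
    return ufd.get(filename)   # None when the user has no such file


print(lookup("alice", "test"))   # alice's data
print(lookup("bob", "test"))     # bob's data
print(lookup("alice", "mail"))   # None
```

Both users own a file named test without conflict, but neither can group files further because each UFD is flat.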
Added Directory Concepts
Many variations, but some components essential
Idea of current directory – default location for activities
Now need a path specification
If file is in current directory, just name it
If in another directory, must specify by more detailed name
Also need way to specify different filesystems
MS-DOS gives letter to each volume, “\” separates directory
name from file name – C:\userb\test
VMS uses letter for volume and “[]” for directory specification –
u:[sst.jdeck]login.com;1
Note the support for versions via the trailing number
Unix treats volume name as part of directory name -
/u/pbg/test
Many OS search a set of paths for command names
“ls” might search in current directory then in system directories
Tree-Structured Directories
Viewing a two-level directory as a two-level tree, we can extend the
directory structure to a tree of arbitrary height
This generalization allows users to create their own
subdirectories and to organize their files
A tree is the most common directory structure. The tree has
a root directory, and every file in the system has a unique
path name.
Allows users to create directories within their directory.
Directory can then contain files or other directories
Directory can be another file with defined formatting and
attribute indicating its type
Separate system calls to manage directory actions
If a file is needed that is not in the current directory, then the
user usually must either specify a path name or change the
current directory to be the directory holding that file.
Path names can be of two types:
Absolute Path
Relative Path
An Absolute Path begins at the root and follows a path down to
the specified file, giving the directory names on the path
An absolute path is the full specification of the file location –
Example /foo/bar/baz
The Relative Path defines a path from the current directory.
Example ../baz
Efficient searching : Search path
Grouping Capability
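The two kinds of path name can be resolved with Python's posixpath module, which implements UNIX-style path rules on any platform. The helper resolve and the current directory /foo/bar/qux are assumptions for this sketch.

```python
import posixpath


def resolve(path, current_directory):
    """Turn an absolute or relative path into a normalized absolute path."""
    if not posixpath.isabs(path):
        # A relative path starts from the current directory.
        path = posixpath.join(current_directory, path)
    return posixpath.normpath(path)   # collapse '.' and '..' components


cwd = "/foo/bar/qux"   # hypothetical current directory
print(resolve("/foo/bar/baz", cwd))   # /foo/bar/baz
print(resolve("../baz", cwd))         # /foo/bar/baz
print(resolve("notes.txt", cwd))      # /foo/bar/qux/notes.txt
```

With this current directory, the relative path ../baz and the absolute path /foo/bar/baz name the same file.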
File-System Structure
Disks provide the bulk of secondary storage on which a file
system is maintained.
Disks have two characteristics that make them a convenient
medium for storing multiple files:
1. A disk can be rewritten in place
2. A disk can access directly any block of information it contains.
I/O transfers performed in blocks of sectors (usually 512 bytes)
File structure
Logical storage unit
Collection of related information
File system resides on secondary storage (disks)
Provides a user interface to storage, mapping logical to physical
Provides efficient and convenient access to disk by allowing
data to be stored, located, and retrieved easily
A file system poses two quite different design problems.
The first problem is defining how the file system should look
to the user.
The second problem is creating algorithms and data
structures to map the logical file system onto the physical
secondary-storage devices.
File control block – storage structure consisting of
information about a file
Device driver controls the physical device
The file system itself is generally composed of many
different levels.
Layered File System
Each level in the design uses the
features of lower levels to create new
features for use by higher levels.
The lowest level, the I/O control,
consists of Device drivers
A device driver can be thought of as a
translator.
Device drivers manage I/O devices at
the I/O control layer
Given a command like “read drive1,
cylinder 72, track 2, sector 10, into
memory location 1060”, it outputs
low-level, hardware-specific commands
to the hardware controller
File System Layers
The basic file system needs only to issue generic
commands to the appropriate device driver to read and
write physical blocks on the disk.
Given a command like “retrieve block 123”, the basic file system
passes it on to the device driver
Also manages memory buffers (hold data in transit ) and
caches (hold frequently used data)
File organization module understands files, logical address,
and physical blocks
File-organization module can translate logical block
addresses to physical block addresses for the basic file
system to transfer.
The file-organization module also includes the free-space
manager, which tracks unallocated blocks
Logical file system manages metadata information
Metadata includes all of the file-system structure except the
actual data
Translates file name into file number, file handle, location by
maintaining file control blocks (inodes in Unix)
File-control blocks contain information about the file,
including ownership, permissions, and location of the file
contents.
When a layered structure is used for file-system
implementation, duplication of code is minimized.
Layering useful for reducing complexity and redundancy, but
adds overhead and can decrease performance
Logical layers can be implemented by any coding method
according to OS designer
Many file systems, sometimes many within an operating
system
Each with its own format
CD-ROMs are written in the ISO 9660 format
Unix has UFS (Unix File System), FFS (Fast File System)
Windows has FAT (File Allocation Table), FAT32, NTFS
(New Technology File System) as well as floppy, CD,
DVD, and Blu-ray file-system formats.
Linux supports more than 40 types; the standard Linux file
system is known as the extended file system, with versions
ext2 and ext3
Distributed file systems, etc.
New ones – ZFS (Z File System), GoogleFS, Oracle ASM
(Automatic Storage Management), FUSE (Filesystem in
Userspace)
Directory Implementation
The selection of directory-allocation and directory-management
algorithms significantly affects the efficiency, performance, and reliability
of the file system.
Following are the algorithms used to implement Directory.
Linear list - The simplest method of implementing a directory is to use a
linear list of file names with pointers to the data blocks.
Simple to program
Time-consuming to execute
Linear search time
Could keep ordered alphabetically via linked list or use B+ tree
Hash Table – linear list with hash data structure.
With this method, a linear list stores the directory entries, but a hash data
structure is also used.
The hash table takes a value computed from the file name and returns a pointer
to the file name in the linear list.
Decreases directory search time
Collisions – situations where two file names hash to the same location
Only good if entries are fixed size, or use chained-overflow method
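The trade-off between the two implementations can be sketched by counting comparisons per lookup. The class names, the bucket count of 8, and the use of Python's built-in hash are assumptions for this example; collisions are handled by chaining inside each bucket.

```python
class LinearDirectory:
    """Linear list of (name, first_block) entries; lookup scans the list."""

    def __init__(self):
        self.entries = []

    def add(self, name, first_block):
        self.entries.append((name, first_block))

    def find(self, name):
        comparisons = 0
        for entry_name, first_block in self.entries:
            comparisons += 1
            if entry_name == name:
                return first_block, comparisons
        return None, comparisons


class HashedDirectory(LinearDirectory):
    """Same linear list, plus a hash table whose buckets hold list indexes."""

    def __init__(self, buckets=8):
        super().__init__()
        self.table = [[] for _ in range(buckets)]   # chained overflow per bucket

    def add(self, name, first_block):
        super().add(name, first_block)
        self.table[hash(name) % len(self.table)].append(len(self.entries) - 1)

    def find(self, name):
        comparisons = 0
        for index in self.table[hash(name) % len(self.table)]:
            comparisons += 1
            entry_name, first_block = self.entries[index]
            if entry_name == name:
                return first_block, comparisons
        return None, comparisons


linear, hashed = LinearDirectory(), HashedDirectory()
for number in range(100):
    linear.add(f"file{number}", number * 10)
    hashed.add(f"file{number}", number * 10)

print(linear.find("file99"))   # (990, 100): the scan reaches the last entry
print(hashed.find("file99"))   # block 990, typically after far fewer comparisons
```

The hashed lookup only examines names that fall into the same bucket, so its cost depends on how evenly the hash function spreads the names.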
Allocation Methods - Contiguous
An allocation method refers to how disk blocks are allocated for
files.
Three major methods of allocating disk space are in wide use:
contiguous, linked, and indexed.
Contiguous allocation – requires that each file occupy a set of
contiguous blocks on the disk
Disk addresses define a linear ordering on the disk.
Contiguous allocation of a file is defined by the disk address of the
first block and the length (in block units).
Best performance in most cases
Simple – only starting location (block #) and length (number of
blocks) are required
Problems include finding space for file, knowing file size, external
fragmentation, need for compaction off-line or on-line
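A minimal sketch of contiguous allocation, assuming a first-fit search for a hole (one common strategy, not named on the slide) and a list of booleans as the free-space map. The function names are hypothetical.

```python
def first_fit(free_map, length):
    """Return the start of the first run of `length` free blocks, or None.

    free_map[i] is True when disk block i is free.
    """
    run_start, run_length = 0, 0
    for block, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = block
            run_length += 1
            if run_length == length:
                return run_start
        else:
            run_length = 0
    return None


def physical_block(start, length, logical_block):
    """Map logical block i of a file to a disk block: simply start + i."""
    if not 0 <= logical_block < length:
        raise IndexError("logical block outside the file")
    return start + logical_block


# Blocks 0-1 and 4 are in use; the free holes are 2-3 and 5-9.
free_map = [False, False, True, True, False, True, True, True, True, True]
start = first_fit(free_map, 3)        # the 2-block hole is too small
print(start)                          # 5
print(physical_block(start, 3, 2))    # 7
print(first_fit(free_map, 6))         # None
```

The last call shows external fragmentation: seven blocks are free in total, yet a 6-block file cannot be placed because no hole is large enough.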
Figure: Contiguous allocation of disk space
Allocation Methods - Linked
Linked allocation – each file is a linked list of disk blocks.
Each block contains a pointer to the next block.
File ends at NIL pointer
No external fragmentation with linked allocation, and any free
block on the free-space list can be used to satisfy a request.
Free space management system called when new block needed
Improve efficiency by clustering blocks into groups but increases
internal fragmentation
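The FAT (File Allocation Table) variation keeps the block pointers in a
table at the beginning of the volume, indexed by block number
Much like a linked list, but faster on disk and cacheable
New block allocation simple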
The major problem is that it can be used effectively only for
sequential-access files.
To find the ith block of a file, we must start at the beginning of that
file and follow the pointers until we get to the ith block.
Locating a block can take many I/Os and disk seeks
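The cost of reaching a block can be sketched by counting how many disk blocks must be read. The block numbers and the dictionary standing in for the pointers stored on disk are hypothetical.

```python
NIL = -1  # marks the end of a file's chain

# next_block[b] is the pointer stored in disk block b.
# One file occupies blocks 9 -> 16 -> 1 -> 10 -> 25, scattered over the disk.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: NIL}


def find_block(first_block, i):
    """Return (disk block holding logical block i, number of blocks read to get its data)."""
    block, reads = first_block, 1       # reading the first block costs one I/O
    for _ in range(i):
        block = next_block[block]       # follow the pointer stored in the block
        if block == NIL:
            raise IndexError("file has fewer blocks than requested")
        reads += 1
    return block, reads


print(find_block(9, 0))   # (9, 1)
print(find_block(9, 3))   # (10, 4)
```

Reading logical block i takes i + 1 block reads, which is why linked allocation suits sequential access but not direct access.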
Figure: Linked allocation
Allocation Methods - Indexed
Linked allocation solves the external-fragmentation and size-
declaration problems of contiguous allocation.
However, in the absence of a FAT, linked allocation cannot
support efficient direct access, since the pointers to the
blocks are scattered with the blocks themselves all over the
disk and must be retrieved in order.
Indexed allocation solves this problem by bringing all the
pointers together into one location: the Index Block
Each file has its own index block, which is an array of disk-
block addresses.
The ith entry in the index block points to the ith block of the
file.
The directory contains the address of the index block.
To find and read the ith block, we use the pointer in the ith
index-block entry.
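A minimal sketch of the lookup, assuming a dictionary that stands in for the disk. The file name jeep, the block numbers, and the helper read_block are hypothetical.

```python
# Hypothetical disk: block number -> contents. Block 19 is the file's index block.
disk = {
    19: [9, 16, 1, 10, 25],   # index block: entry i holds the address of file block i
    9: "block 0 data",
    16: "block 1 data",
    1: "block 2 data",
    10: "block 3 data",
    25: "block 4 data",
}
directory = {"jeep": 19}      # the directory stores the address of the index block


def read_block(filename, i):
    """Read logical block i with two disk reads: the index block, then the data."""
    index_block = disk[directory[filename]]
    return disk[index_block[i]]


print(read_block("jeep", 3))   # block 3 data
```

Whatever the value of i, the lookup costs the same two reads, in contrast to following a chain of pointers through i blocks.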
Figure: Indexed allocation
Efficiency and Performance
Disks tend to represent a major bottleneck in system
performance, since they are the slowest main computer
component.
Efficiency dependent on:
Disk allocation and directory algorithms in use.
Types of data kept in file’s directory entry
Pre-allocation or as-needed allocation of metadata
structures
Fixed-size or varying-size data structures
Even after the basic file-system algorithms have been selected,
we can still improve performance in several ways.
Performance:
Keeping data and metadata close together
Buffer cache – separate section of main memory for frequently
used blocks
Another issue that can affect the performance of I/O is whether
writes to the file system occur synchronously or
asynchronously.
Synchronous writes sometimes requested by apps or needed by
OS
No buffering / caching – writes must hit disk before
acknowledgement
Asynchronous write - The data are stored in the cache, and
control returns to the caller.
Asynchronous writes more common, buffer-able, faster
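The difference between the two kinds of write can be sketched in Python, where an ordinary write returns once the data is buffered and os.fsync asks the operating system to push it to the device. The function and file names are assumptions for the example, and how strictly the device honors the flush depends on the platform and hardware.

```python
import os
import tempfile


def write_async(path, data):
    """Buffered write: returns once the data has been handed to the OS cache."""
    with open(path, "wb") as f:
        f.write(data)


def write_sync(path, data):
    """Write, then force the data toward the storage device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's own buffer to the OS
        os.fsync(f.fileno())    # ask the OS to flush its cache to the device


directory = tempfile.mkdtemp()
fast = os.path.join(directory, "fast.log")
safe = os.path.join(directory, "safe.log")
write_async(fast, b"cached first, written to disk later\n")
write_sync(safe, b"on disk before the call returns\n")
```

The synchronous version is slower because the caller waits for the device, which is why it is usually reserved for data that must survive a crash, such as metadata updates or database logs.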