OS Unit V - File Management

OPERATING SYSTEM (OS)

A.SWAMY GOUD,
ASSISTANT PROFESSOR,
BVRIT, NSP
UNIT - V
FILE MANAGEMENT
OS Syllabus - Unit V
File Management
Concept of a File
Access Methods
Directory Structure
File System Structure
Allocation Methods (Contiguous, linked, indexed)
Free Space Management (bit vector, linked list,
grouping)
Directory Implementation (linear list, hash table)
Efficiency and Performance.
Concept of a File
For most users, the file system is the most visible aspect of an
Operating System.
It provides the mechanism for on-line storage of and access to
both data and programs of the Operating System and all the
users of the computer system.
The file system consists of two distinct parts: a collection of
Files, each storing related data, and a Directory structure,
which organizes and provides information about all the files.
Computers can store information on various storage media,
such as Magnetic Disks, Magnetic Tapes, and Optical Disks.
So that the computer system is convenient to use, the
operating system provides a uniform logical view of information
storage.
The operating system abstracts from the physical properties of
its storage devices to define a logical storage unit, the file.
Files are mapped by the operating system onto physical devices,
which are usually nonvolatile.
Concept of a File
A file is a named collection of related information that is
recorded on secondary storage.
 Smallest allotment of nameable storage
Contiguous logical address space
Commonly, files represent programs (both source and object
forms) and data
 Data
 Numeric
 Character
 Binary
 Program
 Files may be free form or rigidly formed
In general, a file is a sequence of bits, bytes, lines, or records,
the meaning of which is defined by the file's creator and user.
File Structure
 A file is named, for the convenience of its human users, and is
referred to by its name.
 A file has a certain defined structure which depends on its type.
 No structure - sequence of words, bytes.
 Simple record structure
 Lines
 Fixed length
 Variable length
 Complex Structures
 Formatted document
 Relocatable load file
 Who decides :
 Operating System
 Program / Programmer
File Attributes
 Name – only information kept in human-readable form
 Identifier – unique tag (number) identifies file within file system
 Type – needed for systems that support different types
 Location – pointer to file location on device
 Size – current file size
 Protection – access-control information that determines who can
read, write, execute, etc.
 Time, date, and user identification – data for protection,
security, and usage monitoring
 Information about files is kept in the directory structure, which is
maintained on the disk
 A directory entry typically holds the file’s name and identifier
 The attributes of a single file may take > 1 KB
 In a system with many files, directory structures may be > 1 MB
File Operations
 File is an Abstract data type
 Operations include the following (and usually more)
 Create – Two steps - find space, add entry to directory
 Write – write data at current file position pointer location
and update pointer
 Read – read file contents at pointer location, update pointer
 Reposition within file (Seek) – change pointer location
 Delete – free space and remove entry from directory
 Truncate – delete data starting at pointer
 Other common operations include appending new
information to the end of an existing file and renaming an
existing file.
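The operations listed above map closely onto the low-level calls most systems expose. The following is a minimal sketch using Python's `os` module; the scratch file name `demo.txt` and the sample contents are assumptions made only for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; the name "demo.txt" is an assumption.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Create: the OS finds space and adds an entry to the directory.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: data goes at the current file position, which then advances.
os.write(fd, b"hello, file system")

# Reposition (seek): move the pointer back to the start of the file.
os.lseek(fd, 0, os.SEEK_SET)

# Read: returns the bytes at the pointer and advances it.
first = os.read(fd, 5)

# Truncate: keep the first 5 bytes and drop the rest.
os.ftruncate(fd, 5)
size_after_truncate = os.fstat(fd).st_size

os.close(fd)

# Delete: free the space and remove the directory entry.
os.remove(path)
still_exists = os.path.exists(path)
```

Note that the file position pointer belongs to the open file descriptor, which is why the seek is needed between the write and the read.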
File Types
 Most operating systems recognize file types
 A common technique for implementing file types is to include the
type as part of the file name.
 The name is split into two parts - a name and an extension,
usually separated by a period character
 Ex: resume.doc, server.java, readerthread.c
 Automatically open a type of file via a specific application (.doc)
 Only execute files of a given extension (.exe, .com)
 Run files of a given type via a scripting language (.bat)
 Can get more advanced
 If the source code has been modified since the executable was
compiled, an attempt to execute the file triggers a recompile
first, and then the program runs
 Mac OS encodes creating program’s name in file attributes
 Double clicking on file passes the file name to appropriate
application
 UNIX stores a magic number at the beginning of some files to
roughly indicate the file type
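Both techniques, the name extension and the magic number, can be illustrated in a few lines. This is a sketch only: the `MAGIC_NUMBERS` table is a small assumed sample of well-known signatures, and `split_name` and `guess_type` are hypothetical helper names, not part of any operating system interface.

```python
import os

# Assumed, illustrative table: a few well-known leading byte signatures.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
    b"#!": "script with interpreter line",
}


def split_name(filename):
    """Split a name such as 'resume.doc' into its name and extension."""
    name, ext = os.path.splitext(filename)
    return name, ext.lstrip(".")


def guess_type(header):
    """Guess a type from the first bytes of a file's contents."""
    for magic, description in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return description
    return "unknown"
```

The extension is only a hint carried in the name, while the magic number is read from the file contents, so the two can disagree.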
File Structure
 File types also can be used to indicate the internal structure of the
file.
 Certain files must conform to a required structure that is
understood by the operating system.
 Some operating systems extend this idea into a set of
system-supported file structures, with sets of special operations for
manipulating files with those structures.
 For instance, DEC's VMS operating system has a file system that
supports three defined file structures.
 One problem with having the operating system support multiple file
structures: the resulting size of the operating system is cumbersome.
 If the operating system defines five different file structures, it needs
to contain the code to support these file structures.
 Some operating systems impose (and support) a minimal number
of file structures.
 This approach has been adopted in UNIX, MS-DOS, Macintosh
and others.
Access Methods
 Files store information. When it is used, this information
must be accessed and read into computer memory.
 The information in the file can be accessed in several
ways.
 Some systems provide only one access method for files.
 Other systems, such as those of IBM, support many access
methods, and choosing the right one for a particular
application is a major design problem.
 Basically files are accessed in two ways
Sequential Access
Direct Access
Access Methods
 Sequential Access – based on tape model of a file.
 The simplest access method is Sequential Access.
 Information in the file is processed in order, one record
after the other.
 Editors and Compilers usually access files in this fashion.
 Reads and writes are the major operations on a file.
Read Next
Write Next
Reset
Access Methods
 Direct Access – random access, relative access.
 A file is made up of fixed-length logical records that allow programs
to read and write records rapidly in no particular order.
 The direct-access method is based on a disk model of a
file, since disks allow random access to any file block.
 Direct-access files are of great use for immediate access to
large amounts of information.
 Databases are often of this type.
Read N
Write N
Position to N
Read Next
Write Next
Rewrite N
N = Relative Block Number.
Simulation of Sequential Access on Direct-access File
 Not all operating systems support both sequential and
direct access for files.
 Some systems allow only sequential file access; others
allow only direct access.
 We can easily simulate sequential access on a direct-
access file
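The simulation keeps a variable holding the current position and expresses each sequential operation through a direct-access one. The sketch below assumes a file modeled as an in-memory list of blocks; the class name `DirectAccessFile` and the variable `cp` are illustrative choices.

```python
class DirectAccessFile:
    """A file modeled as a numbered sequence of fixed-length blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.cp = 0  # current position, used only by the sequential operations

    # Direct-access operations: n is a relative block number.
    def read(self, n):
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data

    # Sequential operations simulated on top of direct access.
    def reset(self):
        self.cp = 0

    def read_next(self):
        data = self.read(self.cp)  # Read Next = Read cp, then cp = cp + 1
        self.cp += 1
        return data

    def write_next(self, data):
        self.write(self.cp, data)  # Write Next = Write cp, then cp = cp + 1
        self.cp += 1
```

The reverse direction is much less efficient: reaching block N on a purely sequential file means reading past every block before it.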
Disk Structure
 A computer system typically stores thousands, millions, or even
billions of files.
 Files are stored on random-access storage devices,
including Hard Disks, Optical Disks, and Solid State
(memory-based) Disks.
 A storage device can be used in its entirety for a file system.
 It can also be subdivided into Partitions for finer-grained
control.
 Disks or partitions can be RAID protected against failure
 Disk or Partition can be used raw – without a file system, or
Formatted with a file system
 Partitions also known as Minidisks, Slices
 Entity containing file system known as a Volume
Disk Structure
 Each volume containing a file system also tracks that file
system’s information in a Device Directory or Volume Table of
Contents (or simply the Directory)
 Records information - such as name, location, size, and
type - for all files on the volume.
 Computer systems may have zero or more file systems, and
the file systems may be of varying types.
 Similar to general-purpose file systems, there are many
special-purpose file systems, frequently all within the
same operating system or computer
A Typical File-System Organization
Directory Structure
 A directory is similar to a symbol table that translates file names
into their directory entries
 Can be organized in many ways
 We want to be able to insert entries, to delete entries, to
search for a named entry, and to list all the entries in the
directory.
 Organization needs to support operations including:
 Search for a file or multiple files
 Create a file
 Delete a file
 List a directory
 Rename a file
 Traverse the file system
Directory Organization
 Should have the features
 Efficiency – locating a file quickly
 Naming – convenient to users
 Two users can have same name for different files
 The same file can have several different names
 Grouping – logical grouping of files by properties, (e.g., all
Java programs, all games, …) or arbitrarily
 Following are the most common schemes for defining the
logical structure of a directory.
 Single-level Directory
 Two-Level Directory
 Tree-Structured Directories
Single-Level Directory
 The simplest directory structure is the single-level directory.
 All files are contained in the same directory

Limitations :
 Naming problem : when the number of files increases or when
the system has more than one user. If two users give same file
name, then the unique-name rule is violated.
 Grouping problem : Even a single user on a single-level
directory may find it difficult to group them and remember the
names of all the files.
Two-Level Directory
 The standard solution for single level directory limitations is to create a
separate directory for each user.
 In the two-level directory structure, each user has his own user file
directory (UFD)
 When a user refers to a particular file, only his own UFD is searched.
 Thus, different users may have files with the same name, as long as all
the file names within each UFD are unique.
 Advantage : Efficient searching
 Limitation : No grouping capability
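A two-level directory can be pictured as a table of users, each pointing to that user's own table of files. In the sketch below the top-level table is called `mfd` (master file directory, the usual textbook name for it); the user names and file identifiers are made up for illustration.

```python
# Master file directory (MFD): user name -> user file directory (UFD).
# Each UFD maps a file name to a hypothetical file identifier.
mfd = {
    "alice": {"test": 101, "notes": 102},
    "bob": {"test": 201},
}


def lookup(user, filename):
    """Search only the given user's UFD, as in a two-level directory."""
    return mfd[user].get(filename)
```

Both users own a file called `test` without conflict, but neither can group files further, because a UFD holds only files and no subdirectories.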
Added Directory Concepts
 Many variations, but some components essential
 Idea of current directory – default location for activities
 Now need a path specification
 If file is in current directory, just name it
 If in another directory, must specify by more detailed name
 Also need way to specify different filesystems
 MS-DOS gives letter to each volume, “\” separates directory
name from file name – C:\userb\test
 VMS uses letter for volume and “[]” for directory specification –
u:[sst.jdeck]login.com;1
 Note the support for versions via the trailing number
 Unix treats volume name as part of directory name -
/u/pbg/test

 Many OS search a set of paths for command names
 “ls” might search in current directory then in system directories
Tree-Structured Directories
 Viewing a two-level directory as a two-level tree, we can extend
the directory structure to a tree of arbitrary height
Tree-Structured Directories
 This generalization allows users to create their own
subdirectories and to organize their files
 A tree is the most common directory structure. The tree has
a root directory, and every file in the system has a unique
path name.
 Allows users to create directories within their directory.
 Directory can then contain files or other directories
 Directory can be another file with defined formatting and
attribute indicating its type
 Separate system calls to manage directory actions
 If a file is needed that is not in the current directory, then the
user usually must either specify a path name or change the
current directory to be the directory holding that file.
Tree-Structured Directories
 Path names can be of two types:
 Absolute Path
 Relative Path
 An Absolute Path begins at the root and follows a path down to
the specified file, giving the directory names on the path
 An absolute path is the full specification of the file location –
Example /foo/bar/baz
 The Relative Path defines a path from the current directory.
Example ../baz
 Efficient searching : Search path
 Grouping Capability
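The relationship between the two kinds of path can be shown by resolving a relative path against a current directory. This sketch uses Python's standard `posixpath` module; the function name `resolve` and the current directory `/foo/bar/qux` are assumptions for illustration.

```python
import posixpath


def resolve(current_dir, path):
    """Return the absolute path for `path`, resolved against `current_dir`."""
    if posixpath.isabs(path):
        # An absolute path already starts at the root.
        return posixpath.normpath(path)
    # A relative path is interpreted starting from the current directory.
    return posixpath.normpath(posixpath.join(current_dir, path))
```

With the current directory `/foo/bar/qux`, the relative path `../baz` names the same file as the absolute path `/foo/bar/baz`.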
File-System Structure
 Disks provide the bulk of secondary storage on which a file
system is maintained.
 Disks have two characteristics that make them a convenient
medium for storing multiple files:
1. A disk can be rewritten in place
2. A disk can access directly any block of information it contains.
 I/O transfers performed in blocks of sectors (usually 512 bytes)
 File structure
 Logical storage unit
 Collection of related information
 File system resides on secondary storage (disks)
 Provides the user interface to storage, mapping logical to physical
 Provides efficient and convenient access to the disk by allowing
data to be stored, located, and retrieved easily
File-System Structure
 File systems provide efficient and convenient access to the
disk by allowing data to be stored, located, and retrieved
easily.
 A file system poses two quite different design problems.
 The first problem is defining how the file system should look
to the user.
 The second problem is creating algorithms and data
structures to map the logical file system onto the physical
secondary-storage devices.
 File control block – storage structure consisting of
information about a file
 Device driver controls the physical device
 The file system itself is generally composed of many
different levels.
Layered File System
 Each level in the design uses the
features of lower levels to create new
features for use by higher levels.
 The lowest level, the I/O control,
consists of Device drivers
 A device driver can be thought of as a
translator.
 Device drivers manage I/O devices at
the I/O control layer
 Given a command like “read drive1,
cylinder 72, track 2, sector 10, into
memory location 1060”, a driver outputs
low-level, hardware-specific commands
to the hardware controller
File System Layers
 The basic file system needs only to issue generic
commands to the appropriate device driver to read and
write physical blocks on the disk.
 Given a command like “retrieve block 123”, the basic file system
translates it into a request for the device driver
 Also manages memory buffers (hold data in transit ) and
caches (hold frequently used data)
 File organization module understands files, logical address,
and physical blocks
 File-organization module can translate logical block
addresses to physical block addresses for the basic file
system to transfer.
 The file-organization module also includes the free-space
manager, which tracks unallocated blocks
File System Layers
 Logical file system manages metadata information
 Metadata includes all of the file-system structure except the
actual data
 Translates file name into file number, file handle, location by
maintaining file control blocks (inodes in Unix)
 A file-control block contains information about the file,
including ownership, permissions, and location of the file
contents.
 When a layered structure is used for file-system
implementation, duplication of code is minimized.
 Layering useful for reducing complexity and redundancy, but
adds overhead and can decrease performance
 Logical layers can be implemented by any coding method
according to OS designer
File System Layers
 Many file systems, sometimes many within an operating
system
 Each with its own format
 CD-ROMs are written in the ISO 9660 format
 Unix has UFS (Unix File System), FFS (Fast File System)
 Windows has FAT (File Allocation Table), FAT32, NTFS
(New Technology File System) as well as floppy, CD,
DVD, and Blu-ray file-system formats.
 Linux supports more than 40 file-system types; the standard
Linux file system is known as the extended file system, with
versions such as ext2 and ext3
 Distributed file systems, etc.
 New ones – ZFS (Z File System), GoogleFS, Oracle ASM
(Automatic Storage Management), FUSE (Filesystem in
Userspace)
Directory Implementation
 The selection of directory-allocation and directory-management
algorithms significantly affects the efficiency, performance, and reliability
of the file system.
 Following are the algorithms used to implement Directory.
 Linear list - The simplest method of implementing a directory is to use a
linear list of file names with pointers to the data blocks.
 Simple to program
 Time-consuming to execute
 Linear search time
 Could keep ordered alphabetically via linked list or use B+ tree
 Hash Table – linear list with hash data structure.
 With this method, a linear list stores the directory entries, but a hash data
structure is also used.
 The hash table takes a value computed from the file name and returns a pointer
to the file name in the linear list.
 Decreases directory search time
 Collisions – situations where two file names hash to the same location
 Only good if entries are fixed size, or use chained-overflow method
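The two directory implementations can be compared side by side. The sketch below is illustrative only: the class names are hypothetical, the hash function is a deliberately weak toy (the sum of the name's byte values) so that collisions are easy to produce, and collisions are handled with the chained-overflow method mentioned above.

```python
class LinearListDirectory:
    """Directory as a linear list of (name, first_block) entries."""

    def __init__(self):
        self.entries = []

    def add(self, name, first_block):
        self.entries.append((name, first_block))

    def search(self, name):
        # Linear search: cost grows with the number of entries.
        for entry_name, first_block in self.entries:
            if entry_name == name:
                return first_block
        return None


class HashedDirectory(LinearListDirectory):
    """The same linear list, plus a hash table of indexes into it."""

    def __init__(self, buckets=8):
        super().__init__()
        self.table = [[] for _ in range(buckets)]

    def _bucket(self, name):
        # Toy hash, chosen so that collisions are easy to see.
        return sum(name.encode()) % len(self.table)

    def add(self, name, first_block):
        super().add(name, first_block)
        # Chained overflow: colliding names share one bucket's chain.
        self.table[self._bucket(name)].append(len(self.entries) - 1)

    def search(self, name):
        # Only the entries in one bucket are examined.
        for index in self.table[self._bucket(name)]:
            entry_name, first_block = self.entries[index]
            if entry_name == name:
                return first_block
        return None
```

The names `ab` and `ba` hash to the same bucket here, yet both remain retrievable because each bucket holds a chain rather than a single slot.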
Allocation Methods - Contiguous
 An allocation method refers to how disk blocks are allocated for
files:
 Three major methods of allocating disk space are in wide use:
contiguous, linked, and indexed.
 Contiguous allocation – requires that each file occupy a set of
contiguous blocks on the disk
 Disk addresses define a linear ordering on the disk.
 Contiguous allocation of a file is defined by the disk address of
the first block and the length (in block units).
 Best performance in most cases
 Simple – only starting location (block #) and length (number of
blocks) are required
 Problems include finding space for file, knowing file size, external
fragmentation, need for compaction off-line or on-line
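Address translation under contiguous allocation is a single addition, while finding space means searching for a long enough run of free blocks. The sketch below assumes a free map given as a list of booleans and uses first fit, which is one common search strategy and is not prescribed by the text above; the function names are hypothetical.

```python
def contiguous_block(start, length, i):
    """Physical block holding logical block i of a file at (start, length)."""
    if not 0 <= i < length:
        raise IndexError("logical block outside the file")
    return start + i


def first_fit(free_map, length):
    """Return the start of the first run of `length` free blocks, or None.

    free_map is a list of booleans, True meaning the block is free.
    """
    run_start, run_len = None, 0
    for block, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = block
            run_len += 1
            if run_len == length:
                return run_start
        else:
            run_len = 0
    return None
```

With the free map `[True, False, True, True, False, True]`, four blocks are free in total, yet a request for three contiguous blocks fails. That is external fragmentation.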
Contiguous Allocation of Disk Space
Allocation Methods - Linked
 Linked allocation – each file is a linked list of disk blocks.
 Each block contains a pointer to the next block.
 File ends at NIL pointer
 No external fragmentation with linked allocation, and any free
block on the free-space list can be used to satisfy a request.
 Free space management system called when new block needed
 Improve efficiency by clustering blocks into groups but increases
internal fragmentation
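 The FAT (File Allocation Table) variation keeps the block pointers in
a table at the beginning of the volume, indexed by block number:
much like a linked list, but faster on disk and cacheable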
 New block allocation simple
 The major problem is that it can be used effectively only for
sequential-access files.
 To find the ith block of a file, we must start at the beginning of that
file and follow the pointers until we get to the ith block.
 Locating a block can take many I/Os and disk seeks
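The cost of direct access under linked allocation can be made concrete by counting block reads. In this sketch the disk is modeled as a dictionary from block number to a (data, next pointer) pair; the block numbers, contents, and the `NIL` value of -1 are all assumptions for illustration.

```python
NIL = -1  # marks the end of a file's chain

# Hypothetical disk: block number -> (data, pointer to next block).
disk = {
    9: ("A", 16),
    16: ("B", 1),
    1: ("C", 10),
    10: ("D", 25),
    25: ("E", NIL),
}


def read_ith_block(first_block, i):
    """Follow i pointers from the first block; returns (data, blocks_read)."""
    block = first_block
    blocks_read = 0
    for _ in range(i):
        # Each pointer lives inside a block, so following it costs a read.
        _, block = disk[block]
        blocks_read += 1
        if block == NIL:
            raise IndexError("file has fewer blocks than requested")
    data, _ = disk[block]
    return data, blocks_read + 1
```

Reading logical block 3 of the file that starts at block 9 touches four blocks, because the three earlier blocks must be read just to obtain their pointers.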
Linked Allocation
Allocation Methods - Indexed
 Linked allocation solves the external-fragmentation and size-
declaration problems of contiguous allocation.
 However, in the absence of a FAT, linked allocation cannot
support efficient direct access, since the pointers to the
blocks are scattered with the blocks themselves all over the
disk and must be retrieved in order.
 Indexed allocation solves this problem by bringing all the
pointers together into one location: the Index Block
 Each file has its own index block, which is an array of disk-
block addresses.
 The ith entry in the index block points to the ith block of the
file.
 The directory contains the address of the index block.
 To find and read the ith block, we use the pointer in the ith
index-block entry.
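With indexed allocation, locating the ith block needs one lookup in the index block followed by one data-block read. The sketch below reuses made-up block numbers; the file name `report` and the choice of block 19 for the index block are hypothetical.

```python
# Hypothetical disk contents: block number -> data.
disk_blocks = {9: "A", 16: "B", 1: "C", 10: "D", 25: "E"}

# The file's index block: entry i holds the disk address of logical block i.
index_block = [9, 16, 1, 10, 25]

# The directory entry records only the address of the index block
# (block 19 in this made-up layout).
directory = {"report": 19}


def read_ith_block(index, i):
    """One lookup in the index block, then one data-block read."""
    return disk_blocks[index[i]]
```

The price of this direct access is the space taken by the index block itself, which is spent even for a file with only one or two data blocks.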
Indexed Allocation
Efficiency and Performance
 Disks tend to represent a major bottleneck in system
performance, since they are the slowest main computer
component.
 Efficiency dependent on:
 Disk allocation and directory algorithms in use.
 Types of data kept in file’s directory entry
 Pre-allocation or as-needed allocation of metadata
structures
 Fixed-size or varying-size data structures
Efficiency and Performance
 Even after the basic file-system algorithms have been selected,
we can still improve performance in several ways.
Performance:
 Keeping data and metadata close together
 Buffer cache – separate section of main memory for frequently
used blocks
 Another issue that can affect the performance of I/O is whether
writes to the file system occur synchronously or
asynchronously.
 Synchronous writes sometimes requested by apps or needed by
OS
 No buffering / caching – writes must hit disk before
acknowledgement
 Asynchronous write - The data are stored in the cache, and
control returns to the caller.
 Asynchronous writes more common, buffer-able, faster
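The difference between the two kinds of write can be sketched from an application's point of view. The function names `write_async` and `write_sync` are hypothetical; `os.fsync` is the standard call for asking the operating system to push its cached data for a file to the device, and how strongly that is honored depends on the platform and hardware.

```python
import os


def write_async(path, data):
    """Return once the data is handed to the OS; it may still be in the cache."""
    with open(path, "wb") as f:
        f.write(data)


def write_sync(path, data):
    """Return only after the OS has been asked to push the data to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # empty Python's user-space buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write its cached blocks to disk
```

Both functions leave the same bytes in the file. They differ in when control returns to the caller and in what is guaranteed to survive a crash at that moment.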
