File-System Interface
File Concept
A file is a named collection of related information that is
recorded on secondary storage.
From a user's perspective, a file is the smallest allotment
of logical secondary storage; that is, data cannot be
written to secondary storage unless they are within a file.
Commonly, files represent programs (both source and
object forms) and data.
Data files may be numeric, alphabetic, alphanumeric, or
binary.
Files may be free form, such as text files, or may be
formatted rigidly.
In general, a file is a sequence of bits, bytes, lines, or
records, the meaning of which is defined by the file's
creator and user.
File Structure
A file has a certain defined structure, which depends on
its type.
A text file is a sequence of characters organized into
lines (and possibly pages).
A source file is a sequence of subroutines and functions,
each of which is further organized as declarations
followed by executable statements.
An object file is a sequence of bytes organized into
blocks understandable by the system's linker.
An executable file is a series of code sections that the
loader can bring into memory and execute.
File Attributes
Name – only information kept in human-readable form
Identifier – unique tag (number) that identifies the file
within the file system
Type – needed for systems that support different
types
Location – pointer to file location on device
Size – current file size
Protection – controls who can do reading, writing,
executing
Time, date, and user identification – data for
protection, security, and usage monitoring
Information about files is kept in the directory
structure, which is maintained on the disk
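As a rough illustration, most of the attributes listed above can be read back from a running system. The sketch below uses Python's os.stat; the helper name describe_file and the dictionary keys are assumptions made for this example, and the Location attribute is not shown because the operating system does not normally expose it to user programs.

```python
import os
import stat
import time


def describe_file(path):
    """Return a dict of attributes the file system keeps for `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),             # human-readable name
        "identifier": info.st_ino,                  # unique number within the file system
        "size": info.st_size,                       # current size in bytes
        "protection": stat.filemode(info.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": info.st_uid,                   # user identification
        "modified": time.ctime(info.st_mtime),      # time and date of last write
    }
```

Calling describe_file on any existing path returns one dictionary per file, roughly what a directory entry would record.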
File Operations
File is an abstract data type
Create
Write
Read
Reposition within file
Delete
Truncate
Open(Fi) – search the directory structure on disk for
entry Fi, and move the content of the entry to memory
Close(Fi) – move the content of entry Fi in memory to
the directory structure on disk
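The operations above map fairly directly onto ordinary file calls. The following is a minimal sketch, assuming a scratch directory is passed in; the function name demo_file_operations and the file name demo.bin are hypothetical.

```python
import os


def demo_file_operations(directory):
    """Run create, write, reposition, read, truncate, and delete on one file."""
    path = os.path.join(directory, "demo.bin")   # hypothetical file name

    # Create + open: "w+b" creates the file and allows reading it back.
    with open(path, "w+b") as f:
        f.write(b"hello world")   # write
        f.seek(6)                 # reposition within the file
        tail = f.read()           # read from the new position
        f.truncate(5)             # truncate: keep the file, cut it to 5 bytes
        f.seek(0)
        remaining = f.read()
    # Leaving the with-block closes the file.

    os.remove(path)               # delete
    return tail, remaining, os.path.exists(path)
```

The returned tuple makes it easy to check each step: the bytes read after repositioning, the contents left after truncation, and whether the file still exists after deletion.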
File Types – Name, Extension
Access Methods
Sequential Access
read next
write next
reset
no read after last write
(rewrite)
Direct Access
read n
write n
position to n
read next
write next
rewrite n
n = relative block number
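The difference between the two access methods can be sketched on any seekable file object. The block size of 16 bytes and the function names below are assumptions chosen for illustration, not part of any particular file system.

```python
import io

BLOCK_SIZE = 16  # assumed fixed block size in bytes


def read_block(f, n):
    """Direct access 'read n': position to relative block n, then read it."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f, n, data):
    """Direct access 'write n': position to relative block n, then overwrite it."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(n * BLOCK_SIZE)
    f.write(data)


def read_next(f):
    """Sequential 'read next': read one block at the current position."""
    return f.read(BLOCK_SIZE)


def reset(f):
    """Sequential 'reset': go back to the beginning of the file."""
    f.seek(0)
```

With an in-memory file such as io.BytesIO, read_block(f, 2) followed by read_next(f) returns blocks 2 and 3, which shows how sequential access can be simulated on top of direct access.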
Sequential-access File
Example of Index and Relative Files
Directory Structure
A collection of nodes containing information about all files
[Figure: a directory whose entries point to files F1, F2, F3, F4, and Fn]
Both the directory structure and the files reside
on disk
Backups of these two structures are kept on
tapes
A Typical File-system Organization
Operations Performed on Directory
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
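One way to picture these operations is to treat a directory as a table from names to entries. The class below is a simplified in-memory sketch; the class and method names are assumptions, and traversal is left out because a single flat directory has nothing to descend into.

```python
class Directory:
    """A single directory modeled as a table from file name to file contents."""

    def __init__(self):
        self._entries = {}

    def create(self, name, contents=b""):
        if name in self._entries:
            raise FileExistsError(name)
        self._entries[name] = contents

    def search(self, name):
        """Return the entry for `name`, or None if it is not present."""
        return self._entries.get(name)

    def delete(self, name):
        del self._entries[name]   # raises KeyError if the name is unknown

    def rename(self, old, new):
        if new in self._entries:
            raise FileExistsError(new)
        self._entries[new] = self._entries.pop(old)

    def list_names(self):
        return sorted(self._entries)
```

Because every name lives in one table, two files cannot share a name here, which is the naming problem of the single-level scheme.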
Organize the Directory (Logically) to
Obtain
Efficiency – locating a file quickly
Naming – convenient to users
Two users can have same name for
different files
The same file can have several different
names
Grouping – logical grouping of files by
properties (e.g., all Java programs, all games,
…)
Single-Level Directory
The simplest directory structure is the single-level directory. All files are
contained in the same directory, which is easy to support and understand
Naming problem
Grouping problem
A single-level directory has significant limitations, however,
when the number of files increases or when the system has
more than one user.
Since all files are in the same directory, they must have
unique names.
Even a single user on a single-level directory may find it
difficult to remember the names of all the files as the
number of files increases.
It is not uncommon for a user to have hundreds of files on
one computer system and an equal number of additional
files on another system.
Two-Level Directory
In the two-level directory structure, each user has his own
user file directory (UFD).
The UFDs have similar structures, but each lists only the
files of a single user.
When a user refers to a particular file, only his own UFD is
searched.
Thus, different users may have files with the same name, as
long as all the file names within each UFD are unique.
This structure effectively isolates one user from another.
Isolation is an advantage when the users are completely
independent but is a disadvantage when the users want to
cooperate on some task and to access one another's files.
A two-level directory can be thought of as a tree, or an
inverted tree, of height 2.
The root of the tree is the master file directory (MFD). Its
direct descendants are the UFDs. The descendants of the UFDs
are the files themselves. The files are the leaves of the tree.
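The two-level scheme can be sketched as a table of tables: the MFD maps each user to a UFD, and each UFD maps file names to files. The class name, method names, and user names below are hypothetical.

```python
class TwoLevelDirectory:
    """MFD at the root, one UFD per user, files as the leaves."""

    def __init__(self):
        self.mfd = {}   # master file directory: user name -> UFD

    def add_user(self, user):
        self.mfd.setdefault(user, {})

    def create(self, user, name, contents=b""):
        ufd = self.mfd[user]
        if name in ufd:              # names must be unique within one UFD
            raise FileExistsError(f"{user}/{name}")
        ufd[name] = contents

    def lookup(self, user, name):
        """Search only the requesting user's UFD."""
        return self.mfd[user].get(name)
```

Two users can each create a file called notes without conflict, but neither can reach the other's copy through lookup, which is the isolation described above.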
Tree-Structured Directories
We have seen how to view a two-level directory as a two-level
tree. The natural generalization is to extend the directory structure
to a tree of arbitrary height.
This generalization allows users to create their own subdirectories
and to organize their files accordingly.
A tree is the most common directory structure. The tree has a root
directory, and every file in the system has a unique path name.
A directory (or subdirectory) contains a set of files or
subdirectories.
In normal use, each process has a current directory.
The current directory should contain most of the files that are of
current interest to the process.
When reference is made to a file, the current directory is searched.
If a file is needed that is not in the current directory, then the user
usually must either specify a path name or change the current
directory to be the directory holding that file.
To change directories, a system call is provided that takes
a directory name as a parameter and uses it to redefine
the current directory.
Thus, the user can change his current directory whenever
he desires.
Path names can be of two types: absolute and relative.
An absolute path name begins at the root and follows a
path down to the specified file, giving the directory names
on the path.
A relative path name defines a path from the current
directory.
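How a relative path name is interpreted against the current directory can be sketched with Python's posixpath module. The function name resolve is an assumption made for this example.

```python
import posixpath


def resolve(current_dir, path):
    """Turn `path` into a normalized absolute path name.

    An absolute path name (starting at the root "/") is used as is;
    a relative path name is interpreted starting from `current_dir`.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(current_dir, path))
```

For instance, with current directory /mail, the relative name count and the absolute name /mail/count refer to the same file.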
Creating a new file is done in the current directory
Deleting a file
rm <file-name>
Creating a new subdirectory is done in the current
directory
mkdir <dir-name>
Example: if the current directory is /mail
mkdir count
[Figure: directory mail containing prog, copy, prt, exp, and count]
Deleting “mail” deletes the entire subtree rooted
at “mail”
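The mkdir example and the effect of deleting a whole subtree can be tried safely inside a scratch directory. This is a sketch under that assumption; the function name is hypothetical, and shutil.rmtree plays the role of a recursive delete.

```python
import os
import shutil


def build_and_delete_subtree(root):
    """Create mail with its subdirectories under `root`, then remove it all."""
    mail = os.path.join(root, "mail")
    for sub in ("prog", "copy", "prt", "exp"):
        os.makedirs(os.path.join(mail, sub))
    os.mkdir(os.path.join(mail, "count"))    # like: mkdir count (run in mail)
    before = sorted(os.listdir(mail))
    shutil.rmtree(mail)                       # removes mail and everything below it
    return before, os.path.exists(mail)
```

The first returned value lists the entries of mail just before deletion; the second confirms that nothing rooted at mail remains afterwards.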
Acyclic-Graph Directories
Have shared subdirectories and files
A tree structure prohibits the sharing of files or directories. An
acyclic graph (that is, a graph with no cycles) allows
directories to share subdirectories and files.
The same file or subdirectory may be in two different
directories.
The acyclic graph is a natural generalization of the tree-
structured directory scheme.
With a shared file, only one actual file exists, so any changes
made by one person are immediately visible to the other.
Sharing is particularly important for subdirectories; a new file
created by one person will automatically appear in all the
shared subdirectories.
When people are working as a team, all the files they want to
share can be put into one directory.
The UFD of each team member will contain this directory of
shared files as a subdirectory.
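On systems that support hard links, the idea that a shared file is one file with two names can be observed directly. The sketch below assumes a scratch directory on a file system where os.link is available; the file names are hypothetical.

```python
import os


def share_file(directory):
    """Give one file two names and show that a change is seen through both."""
    original = os.path.join(directory, "report.txt")        # hypothetical names
    alias = os.path.join(directory, "shared_report.txt")

    with open(original, "w") as f:
        f.write("draft 1")

    os.link(original, alias)             # second directory entry, same file
    with open(alias, "a") as f:          # a change made through one name...
        f.write(" + edits")

    with open(original) as f:            # ...is visible through the other
        contents = f.read()
    link_count = os.stat(original).st_nlink   # number of names for the file
    return contents, link_count
```

The link count returned here is the kind of reference count discussed for deletion: the file's space can be reclaimed only once the count drops to 0.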
General Graph Directory
A serious problem with using an acyclic-graph structure is
ensuring that there are no cycles.
If cycles are allowed to exist in the directory, we likewise want
to avoid searching any component twice, for reasons of
correctness as well as performance.
A poorly designed algorithm might result in an infinite loop
continually searching through the cycle and never terminating.
One solution is to limit arbitrarily the number of directories
that will be accessed during a search.
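A search that is safe in the presence of cycles can be sketched as follows. It combines two ideas: a visited set, so no directory is searched twice, and the arbitrary limit mentioned above (max_dirs, an assumed parameter name). Directories are modeled as Python dicts and files as any other value; this representation is an assumption for illustration only.

```python
def find(root, target, max_dirs=1000):
    """Search a directory graph for a file named `target`.

    Directories are dicts mapping names to entries; an entry that is a
    dict is a subdirectory, anything else is a file.
    """
    visited = set()                 # identities of directories already searched
    stack = [root]
    while stack and len(visited) < max_dirs:
        directory = stack.pop()
        if id(directory) in visited:
            continue                # already searched: do not follow the cycle
        visited.add(id(directory))
        for name, entry in directory.items():
            if isinstance(entry, dict):
                stack.append(entry)
            elif name == target:
                return entry
    return None                     # not found, or the limit was reached
```

Without the visited set, a link from a subdirectory back to the root would make a naive recursive search loop forever.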
A similar problem exists when we are trying to determine when
a file can be deleted. With acyclic-graph directory structures,
a value of 0 in the reference count means that there are no
more references to the file or directory, and the file can be
deleted.
However, when cycles exist, the reference count may not be 0
even when it is no longer possible to refer to a directory or file.
In this case, we generally need to use a garbage-
collection scheme to determine when the last reference
has been deleted and the disk space can be reallocated.
Garbage collection involves traversing the entire file
system, marking everything that can be accessed. Then, a
second pass collects everything that is not marked onto a
list of free space.
Garbage collection, however, is extremely time
consuming and is thus seldom attempted.
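The two passes of garbage collection can be sketched on a small in-memory graph. The Node class and the function names are assumptions for this example; a real file system would walk on-disk structures instead.

```python
class Node:
    """A file or directory; directories hold links to their children."""

    def __init__(self, name):
        self.name = name
        self.children = []   # outgoing links (empty for plain files)
        self.marked = False


def mark(root):
    """First pass: mark everything that can be reached from the root."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.marked:
            continue
        node.marked = True
        stack.extend(node.children)


def sweep(all_nodes):
    """Second pass: collect everything unmarked onto a free list."""
    free_list = [n for n in all_nodes if not n.marked]
    for n in all_nodes:
        n.marked = False     # reset marks for the next collection
    return free_list
```

If two directories link to each other but neither is reachable from the root, each still has a reference count of 1, yet the sweep places both (and their files) on the free list.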
File Sharing
Sharing of files on multi-user systems is desirable
Sharing may be done through a protection scheme
On distributed systems, files may be shared across a
network
Network File System (NFS) is a common distributed
file-sharing method
File Sharing – Multiple Users
User IDs identify users, allowing permissions
and protections to be per-user
Group IDs allow users to be in groups,
permitting group access rights
Protection
File owner/creator should be able to control:
what can be done
by whom
Types of access
Read
Write
Execute
Append
Delete
List
Access Lists and Groups
Mode of access: read, write, execute
Three classes of users
                     RWX
a) owner access   7  111
b) group access   6  110
c) public access  1  001
Ask manager to create a group (unique name), say G, and
add some users to the group.
For a particular file (say game) or subdirectory, define an
appropriate access.
owner group public
chmod 761 game
Attach a group to a file
chgrp G game
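The three octal digits given to chmod each encode one class of user as three bits (read = 4, write = 2, execute = 1). A small sketch of the decoding, with a hypothetical function name:

```python
def mode_to_string(mode):
    """Render a 9-bit permission mode such as 0o761 as 'rwxrw---x'."""
    out = []
    for shift in (6, 3, 0):                  # owner, group, public
        bits = (mode >> shift) & 0b111       # the three bits for this class
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)
```

For the mode 761 used above, the owner gets read, write, and execute; the group gets read and write; the public gets execute only.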