File Organization and Processing
Lecture 3
File System
Mohamed Mead
Introduction
A file is an organized collection of data.
The organization of the file depends on the use of the data
and is determined by the program, operating system or
user who created the file.
All data in the computer is stored and retrieved as
files. Thus, files may take many different forms.
Introduction
A data file consisting of alphanumeric Unicode
text that represents a program in source code
form and will serve as “data” input to a C++ compiler.
A data file configured in some special way to
represent an image, sound, or other object.
Introduction
The file system permits users to create data collections,
called files, with desirable properties, such as:
Long-term existence: Files are stored on disk or other secondary
storage and do not disappear when a user logs off.
Sharable between processes: Files have names and can have associated
access permissions that permit controlled sharing.
Structure: Depending on the file system, a file can have an internal
structure that is convenient for particular applications. In addition, files
can be organized into hierarchical or more complex structure to reflect
the relationships among files.
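The first two properties can be sketched in a few lines of Python. This is an illustration only, assuming a POSIX-style permission model; the helper names create_shared_file and permission_string are hypothetical and do not come from the lecture.

```python
import os
import stat
import tempfile

def create_shared_file(directory, name, text):
    """Create a named file that its owner may modify and everyone else may only read."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # 0o644 = rw-r--r--: owner reads and writes; group and others only read.
    os.chmod(path, 0o644)
    return path

def permission_string(path):
    """Return the file mode in ls-style form, for example '-rw-r--r--'."""
    return stat.filemode(os.stat(path).st_mode)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = create_shared_file(d, "notes.txt", "hello")
        # The data is reachable by name after the writing handle is closed.
        with open(p, encoding="utf-8") as f:
            print(f.read(), permission_string(p))
```

The file remains on secondary storage after the handle that wrote it is closed, and any process with read permission can open it by name. On non-POSIX systems the permission bits may be honored only partially.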
File Structure
Four terms are in common use when discussing files:
Field
Record
File
Database
Field
A field is the basic element of data.
An individual field contains a single value, such as an
employee’s last name, a date, or the value of a
sensor reading.
It is characterized by its length and data type (e.g.,
ASCII string, decimal). Depending on the file
design, fields may be fixed length or variable
length.
Record
A record is a collection of related fields that can be treated
as a unit by some application program.
For example, an employee record would contain such fields as
name, social security number, job classification, date of hire,
and so on.
Again, depending on design, records may be of fixed length or
variable length.
A record will be of variable length if some of its fields are of
variable length or if the number of fields may vary.
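The difference between fixed-length and variable-length records can be made concrete with a minimal Python sketch. The employee layout (20-byte name, 4-byte ID, 8-byte hire date) and the 2-byte length prefix are assumptions chosen for illustration, not a layout prescribed by the lecture.

```python
import struct

# Hypothetical fixed-length layout: 20-byte name, 4-byte unsigned ID,
# 8-byte hire date as "YYYYMMDD". "<" disables padding, so every
# record is exactly 20 + 4 + 8 = 32 bytes.
FIXED = struct.Struct("<20sI8s")

def pack_fixed(name, emp_id, hire_date):
    # The "s" format pads short values with NUL bytes and truncates long ones.
    return FIXED.pack(name.encode("ascii"), emp_id, hire_date.encode("ascii"))

def unpack_fixed(raw):
    name, emp_id, hire_date = FIXED.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii"), emp_id, hire_date.decode("ascii")

def pack_variable(fields):
    """Variable-length record: each field is preceded by a 2-byte length."""
    out = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        out += struct.pack("<H", len(data)) + data
    return bytes(out)

def unpack_variable(raw):
    fields, pos = [], 0
    while pos < len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        fields.append(raw[pos:pos + length].decode("utf-8"))
        pos += length
    return fields
```

With the fixed layout, record number n starts at byte offset n * 32, so it can be located without reading the records before it. With the length-prefixed layout, the record size depends on the field contents and on how many fields are stored.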
File
A file is a collection of similar records.
The file is treated as a single entity by users and applications.
Files have file names and may be created and deleted.
Access control restrictions usually apply at the file level. That is, in a
shared system, users and programs are granted or denied access to entire
files.
In some more sophisticated systems, such controls are enforced at the
record or even the field level.
Some file systems are structured only in terms of fields, not records. In
that case, a file is a collection of fields.
Database
A database is a collection of related data.
A database may contain all of the information related to an
organization or project, such as a business or a scientific
study.
The database itself consists of one or more types of files.
Usually, there is a separate database management system
that is independent of the operating system, although that
system may make use of some file management programs.
File Structure
Typical operations that must be supported include the
following:
Retrieve_All : Retrieve all the records of a file. This
will be required for an application that must process all
of the information in the file at one time.
For example, an application that produces a summary of
the information in the file would need to retrieve all
records.
File Structure
Retrieve_One : This requires the retrieval of just a
single record.
Transaction-oriented applications need this operation.
Retrieve_Next : This requires the retrieval of the
record that is “next” in some logical sequence to the
most recently retrieved record.
A program that is performing a search may also use
this operation.
File Structure
Retrieve_Previous : Similar to Retrieve_Next, but
in this case the record that is “previous” to the
currently accessed record is retrieved.
Insert_One : Insert a new record into the file. It
may be necessary that the new record fit into a
particular position to preserve a sequencing of the
file.
File Structure
Delete_One : Delete an existing record. Data structures may need
to be updated to preserve the sequencing of the file.
Update_One : Retrieve a record, update one or more of its fields,
and rewrite the updated record back into the file. Again, it may be
necessary to preserve sequencing with this operation.
If the length of the record has changed, the update operation is
generally more difficult than if the length is preserved.
Retrieve_Few : Retrieve a number of records. For example, an
application or user may wish to retrieve all records that satisfy a
certain set of criteria.
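The operations listed above can be sketched as methods on a toy in-memory "file" that keeps its records in key order. The class name SequencedFile, its method names, and the use of a sorted key list are assumptions made for illustration; a real file system or access method would work on disk blocks rather than Python dictionaries.

```python
import bisect

class SequencedFile:
    """Toy in-memory stand-in for a file whose records are kept in key order."""

    def __init__(self):
        self._keys = []        # keys in sorted order (the file's sequencing)
        self._records = {}     # key -> record, where a record is a dict of fields
        self._current = None   # key of the most recently retrieved record

    def insert_one(self, key, record):
        if key in self._records:
            raise KeyError(f"duplicate key: {key!r}")
        bisect.insort(self._keys, key)   # place the key so the order is preserved
        self._records[key] = record

    def delete_one(self, key):
        del self._records[key]           # raises KeyError if the record is absent
        self._keys.remove(key)           # update the sequencing structure as well
        if self._current == key:
            self._current = None

    def retrieve_all(self):
        return [self._records[k] for k in self._keys]

    def retrieve_one(self, key):
        record = self._records[key]
        self._current = key
        return record

    def retrieve_next(self):
        # Start from the first record if nothing has been retrieved yet.
        i = 0 if self._current is None else bisect.bisect_right(self._keys, self._current)
        if i >= len(self._keys):
            return None
        self._current = self._keys[i]
        return self._records[self._current]

    def retrieve_previous(self):
        if self._current is None:
            return None
        i = bisect.bisect_left(self._keys, self._current) - 1
        if i < 0:
            return None
        self._current = self._keys[i]
        return self._records[self._current]

    def update_one(self, key, **changes):
        self._records[key].update(changes)   # the key, and so the order, is unchanged
        return self._records[key]

    def retrieve_few(self, predicate):
        return [self._records[k] for k in self._keys if predicate(self._records[k])]
```

For example, after inserting records with keys 30, 10, and 20, retrieve_all returns them in the order 10, 20, 30, and retrieve_one(20) followed by retrieve_next yields the record with key 30. Because updates here never change the key, the sequencing is preserved automatically; an update that changed the key would need a delete followed by an insert.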
File Structure
1. Linux
Linux supports various file systems, but
common choices for the system disk on a block
device include XFS, JFS, and Btrfs.
For raw flash there are UBIFS, JFFS2, and
YAFFS, among others.
File Structure
2. macOS
macOS uses the Apple File System (APFS),
which recently replaced HFS Plus (HFS+), a file
system inherited from classic Mac OS.
File Structure
3. Microsoft Windows
Windows makes use of the FAT, NTFS, exFAT,
and ReFS file systems (the last of these is
supported and usable only in Windows Server and
Windows 8, 8.1, and 10).