1/8/23
Contents
1 Introduction to File Structures
2 History of File Structures Design
3 File Basics
4 File Management
Introduction to File Structures
File Structure
v A File Structure is a combination of
representations for data in files and
of operations for accessing the data.
v A File Structure allows applications
to read, write and modify data.
v It might also support finding the data that matches some
search criteria or reading through the data in some
particular order.
Data Processing
v Data processing from a computer science perspective
involves:
§ Storage of data
§ Organization of data
§ Access to data
v This will be built on your knowledge of:
Data Structures vs. File Structures
v Both involve:
§ Representation of Data
§ Operations for accessing data
v Difference:
§ Data Structures deal with data in main memory
§ File Structures deal with data on secondary storage devices (files)
[Figure: Data Structures work in Main Storage (Memory); File Structures work in Secondary Storage]
Computer Architecture
Main Memory vs. Secondary Storage
v Main Memory
§ Fast (since electronic)
§ Small (since expensive)
§ Volatile (information is lost when a power failure occurs)
v Secondary Storage
§ Slow (since electronic and mechanical)
§ Large (since cheap)
§ Stable, persistent (information is preserved longer)
How Fast …?
v Typical times for getting information
§ Main memory: ~120 nanoseconds = 120 × 10^-9 seconds
§ Magnetic disks: ~30 milliseconds = 30 × 10^-3 seconds
v An analogy that keeps the same time proportion as above:
§ Looking at the index of a book: 20 seconds
versus
§ Going to the library: 58 days
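The 58-day figure follows from the ratio of the two access times. Below is a quick check in Python. The times are the typical values quoted above, not measurements:

```python
# Typical access times quoted on the slide (not measured here)
MAIN_MEMORY_S = 120e-9   # ~120 nanoseconds
DISK_S = 30e-3           # ~30 milliseconds

ratio = DISK_S / MAIN_MEMORY_S           # how many times slower a disk access is
index_lookup_s = 20                      # "looking at the index of a book"
library_trip_s = index_lookup_s * ratio  # same proportion, scaled up
library_trip_days = library_trip_s / (24 * 60 * 60)

print(f"A disk access is about {ratio:,.0f} times slower than memory")
print(f"Library trip: about {library_trip_days:.1f} days")
```

The ratio is 250,000, and 20 seconds × 250,000 = 5,000,000 seconds. That is about 57.9 days, which the slide rounds to 58.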
Memory Hierarchy
[Figure: memory hierarchy with CPU, Cache, Main Memory, Magnetic Disks, and Tapes; arrows show a request for data and the data satisfying the request]
Main Goal of This Course
v Minimize the number of trips to the disk needed to get the desired
information (ideally, get what we need in one disk access, or with as
few disk accesses as possible).
v Grouping related information so that we are likely to get
everything we need with only one trip to the disk (e.g.
name, address, phone number, account balance).
Locality of Reference in Time and Space
In order to achieve these goals, we need good
file structure design.
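Here is a minimal sketch of grouping. It assumes a hypothetical fixed-length layout for the account fields named above. Python's `struct` module packs name, address, phone number, and balance into one contiguous record, so a single read returns all of them:

```python
import io
import struct

# Hypothetical fixed-length layout: 30-byte name, 50-byte address,
# 15-byte phone number, 8-byte float balance ("<" = little-endian, no padding).
RECORD = struct.Struct("<30s50s15sd")

def pack_record(name: str, address: str, phone: str, balance: float) -> bytes:
    """Pack all fields of one account into a single contiguous record."""
    return RECORD.pack(name.encode(), address.encode(), phone.encode(), balance)

def _strip(field: bytes) -> str:
    """Remove the null padding that struct adds to short strings."""
    return field.rstrip(b"\0").decode()

def unpack_record(raw: bytes):
    name, address, phone, balance = RECORD.unpack(raw)
    return _strip(name), _strip(address), _strip(phone), balance

# An in-memory buffer stands in for the disk file.
disk = io.BytesIO()
disk.write(pack_record("Ada", "12 Main St", "555-0100", 42.5))
disk.seek(0)
print(unpack_record(disk.read(RECORD.size)))  # one read returns every field
```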
Good File Structure Design
v Fast access to great capacity
v Reduce the number of disk accesses
v By collecting data into buffers, blocks or buckets
v Manage growth by splitting these collections
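The sketch below shows why blocking helps. It counts read calls against an in-memory file while scanning 1,000 hypothetical 64-byte records in two ways: one record at a time, and 4,096 bytes (64 records) at a time. Each read call stands in for one disk access. Real operating systems add their own buffering, so this is a simplified model:

```python
import io

RECORD_SIZE = 64      # hypothetical fixed record length in bytes
NUM_RECORDS = 1000

class CountingFile(io.BytesIO):
    """In-memory stand-in for a disk file that counts reads returning data."""
    def __init__(self, data: bytes):
        super().__init__(data)
        self.reads = 0

    def read(self, size=-1):
        chunk = super().read(size)
        if chunk:
            self.reads += 1
        return chunk

def scan(f: CountingFile, bytes_per_read: int) -> int:
    """Read the whole file sequentially and return the number of reads used."""
    f.seek(0)
    f.reads = 0
    while f.read(bytes_per_read):
        pass
    return f.reads

f = CountingFile(bytes(RECORD_SIZE * NUM_RECORDS))
print(scan(f, RECORD_SIZE))        # one record per read
print(scan(f, 64 * RECORD_SIZE))   # one 4,096-byte block (64 records) per read
```

The first scan needs 1,000 reads; the second needs only 16.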
History of File Structures Design
History of File Structures Design
1. In the beginning… it was the tape
§ Sequential access
§ Access cost proportional to size of file
[Analogy to sequential access to array data structure]
2. Disks became more common
§ Direct access
[Analogy to access to position in array]
§ Indexes were invented
• list of keys and pointers stored in a small file
• allows direct access to a large primary file
Great if the index fits into main memory.
As the file grows, the index grows too, and we have the same problem we had with a large primary
file.
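Here is a minimal sketch of the index idea. It assumes the whole index fits in main memory as a Python dict that maps each key to the byte offset of its record in the primary file. The `key|payload` record format and the sample keys are hypothetical:

```python
import io

def build_index(records):
    """Write records to a primary file and return (primary_file, index).

    The index maps each key to the byte offset of its record."""
    primary = io.BytesIO()          # stands in for the large primary file
    index = {}
    for key, payload in records:
        index[key] = primary.tell()  # remember where this record starts
        primary.write(f"{key}|{payload}\n".encode())
    return primary, index

def lookup(primary, index, key):
    """Direct access: one seek to the offset found in the index."""
    offset = index.get(key)
    if offset is None:
        return None
    primary.seek(offset)
    return primary.readline().decode().rstrip("\n")

primary, index = build_index([("ANDERSON", "rec 1"), ("BAKER", "rec 2"), ("CHAN", "rec 3")])
print(lookup(primary, index, "BAKER"))
```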
History of File Structures Design
3. Tree structures emerged for main memory (1960s)
§ Binary search trees (BSTs)
§ Balanced, self-adjusting BSTs: e.g. AVL trees (1963)
4. A tree structure suitable for files was invented:
§ B-trees (1972) and B+ trees
§ Good for accessing millions of records with 3 or 4 disk accesses.
5. What about getting info with a single request?
§ Hash tables (theory developed over the 1960s and 1970s, but still a
research topic)
Good when files do not change much over time.
§ Expandable, dynamic hashing (late 1970s and 1980s)
One or two disk accesses even if the file grows dramatically
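The access counts above can be estimated with a rough model. A B-tree whose nodes each hold about `fanout` keys needs one disk access per level. Hashing instead computes the bucket (block) number directly from the key. The fanout values and the use of `zlib.crc32` as the hash function are assumptions for illustration. Python's built-in `hash()` is avoided because it is randomized per process for strings:

```python
import zlib

def btree_levels(num_records: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_records.

    Rough model: one disk access per level, every node completely full."""
    levels, reach = 1, fanout
    while reach < num_records:
        reach *= fanout
        levels += 1
    return levels

def bucket_for(key: str, num_buckets: int) -> int:
    """Hashing: map a key straight to a bucket number (ideally one disk access)."""
    return zlib.crc32(key.encode()) % num_buckets

for n in (1_000_000, 10_000_000):
    print(n, "records, fanout 100:", btree_levels(n, 100), "levels")
print("bucket for 'BAKER':", bucket_for("BAKER", 1000))
```

With a fanout of 100, one million records fit in 3 levels and ten million in 4.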
File Basics
Computer File
v A computer file, or simply a file, is defined as a named
collection of data that exists on a storage medium, such
as a hard disk, CD, DVD, or USB flash drive.
v A file can contain a group of records, a document, a
photo, music, a video, an e-mail message, or a computer
program.
Rules for Naming Files
v Every file has a name and might also have a file extension.
v When you save a file, you must provide a valid file name
that adheres to specific rules, referred to as file-naming
conventions.
v Each operating system has a unique set of file-naming
conventions.
Rules for Naming Files
[Table: file-naming rules for Microsoft Windows and Mac OS]
Rules for Naming Files
v Some operating systems also contain a list of reserved
words that are used as commands or special identifiers.
You cannot use these words alone as a file name.
v You can also use spaces in file names, unlike e-mail
addresses, where spaces are not allowed.
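The check below is a simplified sketch of a few file-naming rules that Windows documents:
• the characters < > : " / \ | ? * are not allowed;
• a name may not end in a space or a period;
• reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) cannot be used as a file name, even with an extension.
It is not exhaustive. For example, it does not check path-length limits.

```python
# Characters Windows does not allow in file names
WINDOWS_FORBIDDEN_CHARS = set('<>:"/\\|?*')
# Reserved device names; also rejected when followed by an extension (e.g. "CON.txt")
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL",
                    *(f"COM{i}" for i in range(1, 10)),
                    *(f"LPT{i}" for i in range(1, 10))}

def is_valid_windows_name(name: str) -> bool:
    """Simplified check of a single file name (not a full path)."""
    if not name.strip():
        return False
    # Forbidden printable characters and control characters (codes 0-31)
    if any(ch in WINDOWS_FORBIDDEN_CHARS or ord(ch) < 32 for ch in name):
        return False
    if name.endswith((" ", ".")):
        return False
    base = name.split(".")[0].upper()   # "aux.txt" -> "AUX"
    return base not in WINDOWS_RESERVED

for candidate in ["Marley One Love.mp3", "report?.docx", "CON", "aux.txt"]:
    print(candidate, is_valid_windows_name(candidate))
```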
File Extension
v A file extension (sometimes referred to as a file name
extension) is an optional file identifier that is separated from
the main file name by a period, as in Paint.exe.
v File extensions provide clues to a file’s contents. For
example, .exe files (Windows) and .app files (Mac OS)
contain computer programs.
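In Python, for example, `os.path.splitext` separates the extension from the rest of a file name. It treats only the last period-separated part as the extension. A name with no period gets an empty extension:

```python
import os.path

for name in ["Paint.exe", "Marley One Love.mp3", "archive.tar.gz", "README"]:
    base, ext = os.path.splitext(name)   # ext keeps its leading period
    print(f"{name!r}: name={base!r}, extension={ext!r}")
```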
File’s Location
File’s Location
v To determine a file’s location, you must first specify the
device where the file is stored.
v You can store files on a hard drive, removable storage, a
network computer, or cloud-based storage.
v When working with Windows, each local storage device is
identified by a drive letter. The main hard disk drive is
referred to as drive C.
v Macs do not use drive letters. Every storage device has a
name. The main hard disk is called Macintosh HD, for
example.
File’s Location
v A disk partition is a section of a hard disk drive that is
treated as a separate storage unit.
v Every storage device has a directory containing a list of its
files.
v The main directory is referred to as the root directory. On
a PC, the root directory is identified by the drive letter
followed by a colon and a backslash (C:\).
v A root directory can be subdivided into smaller lists. Each
list is called a subdirectory.
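The root-and-subdirectory structure is a tree, so listing it is naturally recursive. Here is a small sketch using Python's `pathlib`. A temporary directory stands in for a root directory such as C:\, and the folder names are hypothetical:

```python
import tempfile
from pathlib import Path

def tree_lines(directory: Path, indent: str = "") -> list:
    """Recursively list a directory; each subdirectory's entries are indented."""
    lines = []
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            lines.append(f"{indent}{entry.name}/")
            lines.extend(tree_lines(entry, indent + "    "))
        else:
            lines.append(f"{indent}{entry.name}")
    return lines

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)                      # plays the role of the root directory
    (root / "Music" / "Reggae").mkdir(parents=True)
    (root / "Music" / "Reggae" / "Marley One Love.mp3").touch()
    (root / "Documents").mkdir()
    (root / "Documents" / "notes.txt").touch()
    print("\n".join(tree_lines(root)))
```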
File’s Location
v A computer file’s location is defined by a file path
(sometimes called a file specification), which on a PC
includes the drive letter, folder(s), file name, and extension.
v Suppose that you have stored an MP3 file called Marley
One Love in the Reggae folder on your hard disk.
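Python's `pathlib.PureWindowsPath` splits a PC file path into the parts listed above, on any operating system. The path below is hypothetical. The slide names only the file and the Reggae folder, so the Music folder above it is an assumption:

```python
from pathlib import PureWindowsPath

# Hypothetical path: the "Music" folder is assumed for illustration.
path = PureWindowsPath(r"C:\Music\Reggae\Marley One Love.mp3")

print(path.drive)    # drive letter
print(path.parent)   # drive plus folder(s)
print(path.stem)     # file name without extension
print(path.suffix)   # extension
```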
File Format
v The term file format refers to the organization and layout
of data that is stored in a file.
v The format of a file usually includes a header, data, and
possibly an end-of-file marker.
v A file header is a section of data at the beginning of a file
that contains information about a file, such as the date it was
created, the date it was last updated, its size, and its file type.
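Here is a minimal sketch of a header, data, and end-of-file marker layout. The format is entirely hypothetical: a 4-byte signature `DEMO`, created and updated timestamps, the data size, and a type code, followed by the data and a single 0x1A marker byte. It only shows how a program writes and then checks such a structure:

```python
import struct
import time

# Hypothetical header: signature, created, updated (Unix seconds),
# data size in bytes, file type code. "<" = little-endian, no padding.
HEADER = struct.Struct("<4sIIIH")
SIGNATURE = b"DEMO"
EOF_MARKER = b"\x1a"   # marker byte chosen for this sketch
TYPE_TEXT = 1          # hypothetical type code

def build_file(data: bytes, file_type: int = TYPE_TEXT) -> bytes:
    now = int(time.time())
    header = HEADER.pack(SIGNATURE, now, now, len(data), file_type)
    return header + data + EOF_MARKER

def read_file(raw: bytes) -> dict:
    signature, created, updated, size, file_type = HEADER.unpack_from(raw, 0)
    if signature != SIGNATURE:
        raise ValueError("not a DEMO file")
    start = HEADER.size
    data = raw[start:start + size]
    if raw[start + size:start + size + 1] != EOF_MARKER:
        raise ValueError("missing end-of-file marker")
    return {"created": created, "updated": updated, "type": file_type, "data": data}

print(read_file(build_file(b"hello")))
```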
File Format
v Music files are stored differently from text files or graphics
files, but even within a single category of data, there are
many file formats.
v For example, graphics data can be stored in file formats
such as BMP, GIF, JPEG, or PNG.
v Although a file extension is a good indicator of a file’s
format, it does not really define the format.
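Many formats can be recognized from their first bytes (a "signature" or "magic number") regardless of the extension. Here is a sketch covering the four graphics formats named above. The signatures are the standard ones: PNG starts with `89 50 4E 47 0D 0A 1A 0A`, GIF with `GIF87a` or `GIF89a`, JPEG with `FF D8 FF`, and BMP with `BM`. The two-byte BMP check is weak and can match ordinary text that starts with "BM":

```python
# Leading signature bytes of some common graphics formats
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
]

def sniff_format(header: bytes):
    """Return the format name whose signature starts `header`, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None

def sniff_file(path: str):
    """Identify a file by content, ignoring its extension."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

A PNG image renamed to photo.jpg would still be reported as PNG by `sniff_file`.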
File Format: Executable File Extensions
v A Windows software program consists of at least one
executable file with an .exe file extension. It might also
include a number of support programs with extensions
such as .dll, .vbx, and .ocx.
File Format: Data File Extensions
v The list of data file formats is long.
Why can’t I open some files?
v When a file doesn’t open, one of three things probably went
wrong:
§ The file might have been damaged by a transmission or disk error.
§ Someone might have accidentally changed the file extension.
§ Some file formats exist in several variations, and your software
might not have the capability to open a particular variation of the
format.
File Management
File Management
v File management encompasses any procedure that helps
you organize your computer-based files so that you can find
and use them more efficiently.
Application-based File Management
v Applications generally provide a way to open files and save
them in a specific folder on a designated storage device.
Some applications also allow you to delete and rename files.
Application-based File Management
v Creating a new folder while saving a file
Saving Files on Windows
Saving Files on Macs
File Management Metaphors
v The operating system has a file management utility, such
as the Windows File Explorer or the Mac OS X Finder, to
handle different file operations.
v File management utilities often use some sort of storage
metaphor to help you visualize and mentally organize the
files on your disks.
File Management Metaphors
Filing Cabinet
In this metaphor, each storage device corresponds to one of the drawers
in a filing cabinet. The drawers hold folders and the folders hold files.
Tree Structure
In this metaphor, a tree represents a storage device.
File Management Metaphors
Combined Filing Cabinet & Tree Structure
Microsoft programmers combined the filing
cabinet metaphor with a tree structure in the
Windows file management utility.
File Management Tips
v Use descriptive names.
v Maintain file extensions.
v Group similar files.
v Organize your folders from the top down.
v Consider using default folders.
v Use Public folders for files you want to share.
v Do not mix data files and programs.
v Don’t store files in the root directory.
v Access files from the hard disk.
v Follow copyright rules.
v Delete or archive files you no longer need.
v Back up!