Indexing

Uploaded by

f20211140


File Organization and Indexing

Data on External Storage

 Disks: Can retrieve random page at fixed cost

 But reading several consecutive pages is much cheaper than
reading them in random order
 Tapes: Can only read pages in sequence
 Cheaper than disks; used for archival storage
 File organization: Method of arranging a file of records on
external storage.
 Record id (rid) is sufficient to physically locate record
 Indexes are data structures that allow us to find the record
ids of records with given values in index search key fields
 Architecture: Buffer manager stages pages from external
storage to main memory buffer pool. File and index layers
make calls to the buffer manager.
Alternative File Organizations
Many alternatives exist, each ideal for some situations, and not
so good in others:
 Heap (random order) files: Suitable when typical access is
a file scan retrieving all records.
 Sorted Files: Best if records must be retrieved in some
order, or only a ‘range’ of records is needed.
 Indexes: Data structures to organize records via trees or
hashing.
 Like sorted files, they speed up searches for a subset
of records, based on values in certain (“search key”)
fields
 Updates are much faster than in sorted files.
Internal Schema Design
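[Figure: The DBMS requests a stored record from the File Manager, which requests a stored block from the Disk Manager, which performs a disk I/O operation on the stored database. Data read from disk is returned as a stored block to the File Manager, which returns the stored record to the DBMS.]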
Unordered Files
 Also called a heap or a pile file.
 New records are inserted at the end of the file.
 A linear search through the file records is necessary
to search for a record.
 This requires reading and searching half the file blocks on
the average, and is hence quite expensive.
 Record insertion is quite efficient.
 Reading the records in order of a particular field
requires sorting the file records.
Ordered Files
 Also called a sequential file.
 File records are kept sorted by the values of an ordering field.
 Insertion is expensive: records must be inserted in the correct
order.
 It is common to keep a separate unordered overflow (or
transaction) file for new records to improve insertion
efficiency; this is periodically merged with the main ordered
file.
 A binary search can be used to search for a record on its
ordering field value.
 This requires reading and searching about log2(b) of the b file
blocks on the average, an improvement over linear search.
 Reading the records in order of the ordering field is quite
efficient.
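
To make the comparison between linear and binary search concrete, the following is a minimal sketch that counts block accesses only. The function names are illustrative, and the model assumes a successful search and ignores buffering and overflow files.

```python
import math

def avg_blocks_linear(b: int) -> float:
    """Average blocks read by a successful linear search over b blocks."""
    return b / 2

def blocks_binary(b: int) -> int:
    """Approximate blocks read by a binary search over b sorted blocks."""
    return math.ceil(math.log2(b)) if b > 1 else 1

# Compare the two strategies for a few file sizes (in blocks).
for b in (100, 10_000, 1_000_000):
    print(b, avg_blocks_linear(b), blocks_binary(b))
```

The gap widens quickly as the file grows, which is why an ordered file pays off for searches on its ordering field despite the more expensive insertions.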
Ordered Files
Average Access Times
 The following table shows the average access time to
access a specific record for a given type of file
Sequential File Organization
 Suitable for applications that require sequential
processing of the entire file
 The records in the file are ordered by a search-key
Sequential File Organization (Cont.)
 Deletion – use pointer chains
 Insertion – locate the position where the record is to be
inserted
 if there is free space insert there
 if no free space, insert the record in an overflow block
 In either case, pointer chain must be updated
 Need to reorganize the file from time to time to restore
sequential order
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering
file organization

[Figure: the department relation, the instructor relation, and the multitable clustering of department and instructor.]
Multitable Clustering File Organization (cont.)

 good for queries involving department ⋈ instructor, and
for queries involving one single department and its
instructors
 bad for queries involving only department
 results in variable size records
 Can add pointer chains to link records of a particular
relation
Data Dictionary Storage
The Data dictionary (also called system catalog)
stores metadata; that is, data about data, such as
 Information about relations
 names of relations
 names, types and lengths of attributes of each relation
 names and definitions of views
 integrity constraints
 User and accounting information, including passwords
 Statistical and descriptive data
 number of tuples in each relation
 Physical file organization information
 How relation is stored (sequential/hash/…)
 Physical location of relation
 Information about indices

Index structures/files

 Dense, Sparse, Primary, Secondary,
 Clustered, Un-clustered files
 I/O Cost based Analysis model
Introduction
 Issue
 How to get required records efficiently
 Example
 SELECT * from R;
 SELECT * from R where A=10;
 Index is a data structure that lets us find
quickly records with given ‘search key’ value
without having to look at more than a fraction
of all records
 An index takes a value for search key and
finds records with the matching value
Indexing
 Can we do anything else to improve query performance other
than selecting a good file organization?
 Yes, the answer lies in indexing
 Index - a data structure that allows the DBMS to locate
particular records in a file more quickly
 Very similar to the index at the end of a book to locate various
topics covered in the book
 Types of Index
 Primary index – one primary index per file
 Clustering index – one clustering index per file – data file is ordered
on a non-key field and the index file is built on that non-key field
 Secondary index – many secondary indexes per file
 Sparse index – has only some of the search key values in the
file
 Dense index – has an index entry for every search key
value in the file
 An index file takes much less space than the
corresponding data file
 An index is especially advantageous if it can
fit in memory
 A record can be found with only one disk I/O
 An index itself can be too large to fit in the
memory
 Multi-level indexes
 Only part of index in memory
Purposes of Data Indexing

 What is Data Indexing?

 Why is it important?
How DBMS Accesses Data?
 The operations read, modify, update, and
delete are used to access data in the
database.
 The DBMS must first transfer the data temporarily
to a buffer in main memory.
 Data is transferred between disk and
main memory in units called blocks.
Time Factors

 Transferring blocks between disk and main memory is a
very slow operation.
 Access time is determined by the
physical storage device being used.
More Time Factors

 Querying data out of a database
requires more time.
 DBMS must search among the blocks of
the database file to look for matching
tuples.
Purpose of Data Indexing

 It is a data structure that is added to a
file to provide faster access to the data.
 It reduces the number of blocks that
the DBMS has to check.
Properties of Data Index

 It contains a search key and a pointer.
 Search key - an attribute or set of attributes
that is used to look up the records in a file.
 Pointer - contains the address of where the
data is stored on disk.
 It can be compared to the card catalog
system used in public libraries of the past.
Two Types of Indices

 Ordered index – index entries are stored sorted on the
search key value; used to access data in order of values.
 Hash index – search keys are distributed uniformly across a
range of buckets by a hash function.
 Separately, an index is primary (clustering) if the file is
stored in search key order, and secondary (non-clustering)
otherwise.
Index
 Mechanism for efficiently locating row(s)
without having to scan entire table
 Based on a search key: rows having a
particular value for the search key attributes
can be quickly located
 Don’t confuse candidate key with search key:
 Candidate key: set of attributes; guarantees
uniqueness
 Search key: sequence of attributes; does not
guarantee uniqueness – just used for search
Indexes
 Sometimes need to retrieve records by the values in
one or more fields, e.g.,
 Find all students in the “IS” department
 Find all students with a gpa > 3
 An index on a file is a:
 Disk-based data structure
 Speeds up selections on the search key fields for the index.
 Any subset of the fields of a relation can be index search key
 Search key is not the same as (candidate) key
 (e.g. doesn’t have to be unique).
 An index
 Contains a collection of index and data entries
 Supports efficient retrieval of all records with a given search
key value k.
Basic Concepts

 Indexing is used to speed up access to desired data.
 E.g. author catalog in library
 A search key is an attribute or set of attributes used to look up
records in a file. Unrelated to keys in the db schema.
 An index file consists of records called index entries.
 An index entry for key k may consist of
 An actual data record (with search key value k)
 A pair (k, rid) where rid is a pointer to the actual data record
 A pair (k, bid) where bid is a pointer to a bucket of record pointers
 Index files are typically much smaller than the original file if the
actual data records are in a separate file.
 If the index contains the data records, there is a single file with
a special organization.

Indexing and Hashing 27

Types of index structures
 Simple indexes on sorted files
 Usually, created on primary key
 Secondary indexes on unsorted files
 Clustered indexes
 B-trees, a commonly used structure
 Hash table
Types of Indices

 The records in a file may be unordered or ordered
sequentially by some search key.
 A file whose records are unordered is called a heap file.
 If an index contains the actual data records or the records
are sorted by search key in a separate file, the index is
called clustering (otherwise non-clustering).
 In an ordered index, index entries are sorted on the
search key value. Other index structures include trees and
hash tables.


Primary Indexes (On sorted files)
 The simplest structure
 The data file is a sequential file
 The data file is sorted on a key, usually
primary key
 The index file consists of <key,pointer> pairs
 Types of indexes
 Dense: every record has an entry in the index
 Sparse: only some of the data records have
entries in the index
Types of Single-Level Indexes
 Primary Index
 Defined on an ordered data file
 The data file is ordered on a key field
 Includes one index entry for each block in the data file;
the index entry has the key field value for the first
record in the block, which is called the block anchor
 A similar scheme can use the last record in a block.
 A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file and
the keys of its anchor record rather than for every
search value.
[Figure: Primary index on the ordering key field of the file.]
Index Structure
 Contains:
 Index entries
 Can contain the data tuple itself (index and table are
integrated in this case); or
 Search key value and a pointer to a row having that value;
table stored separately in this case – unintegrated index
 Location mechanism
 Algorithm + data structure for locating an index entry with a
given search key value
 Index entries are stored in accordance with the
search key value
 Entries with the same search key value are stored together
(hash, B-tree)
 Entries may be sorted on search key value (B-tree)
Index Structure
[Figure: Index structure. Given a search key value S, the location mechanism facilitates finding the index entry for S among the index entries. Once the index entry is found, the row (S, …) can be directly accessed.]
Dense indexes

 Every key from the data file is represented
 Entries are in the same order as that of the file
 Binary search can be used to find the required
<key, pointer>
 Number of blocks searched is about log2(n) instead of n/2 on
average, for an index of n blocks
 Example: 1,000,000 tuples, 10 tuples per 4096-byte
block, key field 30 bytes, pointer 8 bytes
 Data file takes 100,000 blocks, about 400 MB
 Index file takes 10,000 blocks at about 100 entries per block
 Binary search touches about log2(10,000) ≈ 13.3,
i.e. 13 to 14, index blocks
 Memory use can also be optimized by keeping only the
most frequently searched blocks in memory
 Hence a record can usually be retrieved with fewer than 14
disk I/Os
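
The arithmetic behind this example can be checked with a short sketch. The constant names are illustrative; the figure of 100 entries per block is the slide's rounding of the roughly 107 (key, pointer) pairs of 38 bytes that fit in a 4096-byte block.

```python
import math

BLOCK_BYTES = 4096
TUPLES = 1_000_000
TUPLES_PER_BLOCK = 10
KEY_BYTES, POINTER_BYTES = 30, 8

# Data file: one block per 10 tuples.
data_blocks = TUPLES // TUPLES_PER_BLOCK               # 100,000 blocks
data_mb = data_blocks * BLOCK_BYTES / 1_000_000        # about 400 MB

# Dense index: one (key, pointer) entry per tuple.
max_entries_per_block = BLOCK_BYTES // (KEY_BYTES + POINTER_BYTES)
ENTRIES_PER_BLOCK = 100                                # rounded down, as in the example
dense_index_blocks = math.ceil(TUPLES / ENTRIES_PER_BLOCK)

# Binary search over the index blocks.
binary_search_blocks = math.ceil(math.log2(dense_index_blocks))

print(data_blocks, data_mb, max_entries_per_block,
      dense_index_blocks, binary_search_blocks)
```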
Sparse indexes
 Useful if dense index is too large
 Uses less space at the cost of possibly more time
to search
 Generally a record, usually the first, per block is
represented
 Sparse index for previous example would take only
1000 blocks, 4MB
 But it cannot quickly answer the query ‘does
there exist a record with key value K?’
 It requires one disk I/O plus a search within the
block
 Search K: find the entry with the largest key ≤ K
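
The sparse lookup rule can be sketched as follows. This is a toy in-memory model, not a disk-based implementation: `data_blocks`, `sparse_index` and `lookup` are hypothetical names, and each inner list stands in for one disk block of a file sorted on the key.

```python
from bisect import bisect_right

# Hypothetical sorted data file: each inner list is one block of (key, record) pairs.
data_blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [block[0][0] for block in data_blocks]

def lookup(k):
    """Find the index entry with the largest key <= k, then scan that block."""
    pos = bisect_right(sparse_index, k) - 1
    if pos < 0:
        return None                      # k is smaller than every key in the file
    for key, record in data_blocks[pos]: # corresponds to one block read
        if key == k:
            return record
    return None                          # existence is only known after reading the block

print(lookup(40), lookup(35))
```

Note that `lookup(35)` still has to read a data block before it can report that the key is absent, which is the limitation described above.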
Sparse Vs Dense Index
 Dense index: index entry for each data
record
 Unclustered index must be dense
 Clustered index need not be dense
 Sparse index: index entry for each block
of data file
Sparse Vs. Dense Index
[Figure: A data file with fields Id, Name, Dept, sorted on Id. A sparse, clustered index sorted on Id and a dense, unclustered index sorted on Name both point into it.]
Clustered vs. Unclustered Index

 Clustered (main) index: index entries and rows
are ordered in the same way
 An integrated storage structure is always clustered
 There can be at most one clustered index on a table
 Unclustered (secondary) index: index entries and
rows are not sorted on the same search key
 An index file might be clustered or unclustered with
respect to the storage structure it references
 There can be many secondary indices on a table
Clustering and Non-clustering
 Non-clustering indices have to be dense.
 Indices offer substantial benefits when searching for
records.
 When a file is modified, every index on the file must
be updated. Updating indices imposes overhead on
database modification.
 Sequential scan using clustering index is efficient, but
a sequential scan using a non-clustering index is
expensive – each record access may fetch a new
block from disk.
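
A rough sketch of why the scan costs differ, counting data-block fetches only. The function names are illustrative, and the model assumes no buffering and ignores the cost of traversing the index itself.

```python
import math

def clustered_scan_blocks(matching_records: int, records_per_block: int) -> int:
    """Matching records are contiguous, so whole blocks are read in sequence."""
    return math.ceil(matching_records / records_per_block)

def unclustered_scan_blocks_worst(matching_records: int) -> int:
    """Worst case: every matching record sits in a block that must be fetched anew."""
    return matching_records

# 1,000 matching records at 10 records per block.
print(clustered_scan_blocks(1000, 10), unclustered_scan_blocks_worst(1000))
```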


Clustered Index
 Good for range searches
 Use location mechanism to locate index
entry at start of range
 This locates first data record.
 Subsequent data records are contiguous if
index is clustered (not so if unclustered)
 Minimizes page transfers and maximizes
likelihood of cache hits
Sparse Index Files

 A clustering index may be sparse.
 Index records for only some search-key values.
 To locate a record with search-key value k we:
 Find index record with largest search-key value ≤ k
 Search file sequentially starting at the record to which
the index record points
 Less space and less maintenance overhead for insertions
and deletions.
 Generally slower than dense index for locating records.
 Good tradeoff: sparse index with an index entry for every
block in file, corresponding to least search-key value in the
block.
Types of Single-Level Indexes
 Secondary Index
A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.
The secondary index may be on a field which is a
candidate key and has a unique value in every record, or
a nonkey with duplicate values.
The index is an ordered file with two fields.
 The first field is of the same data type as some
nonordering field of the data file that is an indexing
field.
 The second field is either a block pointer or a record
pointer. There can be many secondary indexes (and
hence, indexing fields) for the same file.
 Includes one entry for each record in the data file;
hence, it is a dense index
[Figure: A dense secondary index (with block pointers) on a nonordering key field of a file.]
Secondary indexes
 SELECT name, address
FROM MovieStar
WHERE birthdate = DATE '1952-01-01';
 CREATE INDEX BDIndex ON MovieStar(birthdate);
 Secondary indexes are always ‘dense’
 Second level index could be ‘sparse’
 Secondary indexes are usually with duplicates
Secondary Indices Example

Secondary index on balance field of account
 Index record points to a bucket that contains
pointers to all the actual records with that particular
search-key value.
Multi-level indexes
 Used when an index is so large that even binary
search takes too many disk I/Os
 Define second level index: index on index
 This can continue to multi-level index structure
 Second and higher level indexes must be sparse
 Second level index on the 1000-block sparse index of the
previous example would take only 10 blocks, 40 KB
 Search involves 2 disk I/Os (with the second level held in
memory) and searching in the block
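
The level sizes can be derived with a small sketch. `index_levels` is a hypothetical helper; it assumes every level above the first is sparse, with one entry per block of the level below, and reuses the 100 entries per block of the running example.

```python
import math

def index_levels(first_level_entries: int, entries_per_block: int,
                 fit_blocks: int = 1) -> list[int]:
    """Block count of each index level, bottom-up, until a level fits in fit_blocks."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / entries_per_block)
        levels.append(blocks)
        if blocks <= fit_blocks:
            return levels
        entries = blocks  # the next level holds one sparse entry per block below

# Sparse first level: one entry per data block (100,000 data blocks).
print(index_levels(100_000, 100))
# Dense first level: one entry per tuple (1,000,000 tuples).
print(index_levels(1_000_000, 100))
```

For the sparse case the second level is 10 blocks, matching the 40 KB figure above at 4096 bytes per block.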
Multilevel Index

 If an index does not fit in memory, access becomes
expensive.
 To reduce number of disk accesses to index records,
treat the index kept on disk as a sequential file and
construct a sparse index on it.
 outer index – a sparse index on main index
 inner index – the main index file
 If even outer index is too large to fit in main
memory, yet another level of index can be created,
and so on.
 Indices at all levels must be updated on insertion or
deletion from the file.
Multilevel Index (Cont.)
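[Figure: Two-level sparse index. Entries in the outer index point to blocks of the inner index (index block 0, index block 1, …), whose entries in turn point to data blocks (data block 0, data block 1, …).]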
CIS552 Indexing and Hashing 50
Secondary indexes
 SELECT name, address
FROM MovieStar
WHERE birthdate = DATE '1952-01-01';
 CREATE INDEX BDIndex ON MovieStar(birthdate);
 Secondary index does not determine the
location of the record
 Secondary indexes are always ‘dense’
 Second level index could be ‘sparse’
 Secondary indexes are usually with duplicates
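Secondary index

[Figure: Secondary index. Index entries in sorted key order (key values 10 to 60, with duplicates) point to records scattered across the blocks of an unsorted data file.]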
 Pointers in one index block may refer to
multiple data blocks
 Results in more disk I/Os
 Unavoidable problem
 Using a ‘bucket file’ between index file and data
file
 Single entry <k,p> for each value ‘k’, where p
points to the location in the bucket file containing the
pointers to all records with value ‘k’
 Avoids wasting space by storing the same value ‘k’
multiple times
Definition of Bucket

 Bucket - another form of a storage unit
that can store one or more records of
information.
 Buckets are used if the search key value
cannot form a candidate key, or if the
file is not stored in search key order.
[Figure: An index file with one entry per key value (10 to 60) points into a bucket file, whose pointers lead to the matching records in the data file.]

 Application of ‘bucket file’
 It can help answer queries efficiently using
intersection of pointer sets
 Example
 SELECT title
FROM Movie
WHERE StudioName = 'Disney' AND year = 1995;
 This reduces number of Disk I/Os
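[Figure: The studio index leads to the buckets for studio (Disney) and the year index to the buckets for year (1995); intersecting the two pointer sets identifies the matching Movie tuples.]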
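
The pointer-set intersection can be sketched in memory as follows. The `movies` table, its titles and the helper names are made up for illustration; record ids stand in for pointers into the data file, and Python sets stand in for buckets.

```python
from collections import defaultdict

# Hypothetical Movie file: record id -> (title, studio, year).
movies = {
    0: ("Movie A", "Disney", 1995),
    1: ("Movie B", "Disney", 1994),
    2: ("Movie C", "Fox", 1995),
    3: ("Movie D", "Disney", 1995),
}

def build_buckets(field: int) -> dict:
    """Secondary index with buckets: one entry per value, pointing to a set of rids."""
    buckets = defaultdict(set)
    for rid, row in movies.items():
        buckets[row[field]].add(rid)
    return buckets

studio_index = build_buckets(1)
year_index = build_buckets(2)

def titles(studio: str, year: int) -> list:
    """Intersect the two buckets first, then fetch only the surviving records."""
    rids = studio_index.get(studio, set()) & year_index.get(year, set())
    return sorted(movies[rid][0] for rid in rids)

print(titles("Disney", 1995))
```

Only records in the intersection are fetched from the data file, which is where the saving in disk I/Os comes from.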

Estimating Costs
 For simplicity we estimate the cost of an operation by
counting the number of blocks that are read or
written to disk.
 We ignore the possibility of blocked access which
could significantly lower the cost of I/O.
 We assume that each relation is stored in a separate
file with B blocks and R records per block.


Choosing Indexing Technique
 Five Factors involved when choosing the
indexing technique:
 access type
 access time
 insertion time
 deletion time
 space overhead
Indexing Definitions
 Access type - the kind of access supported efficiently, e.g. equality search or range search.
 Access time - time required to locate the
data.
 Insertion time - time required to insert the
new data.
 Deletion time - time required to delete the
data.
 Space overhead - the additional space
occupied by the added data structure.
Index Evaluation Metrics
 Access time for:
 Equality searches – records with a specified
value in an attribute
 Range searches – records with an attribute
value falling within a specified range.

 Insertion time
 Deletion time
 Space overhead

Primary and Secondary Indices

o Indices offer substantial benefits when searching for
records.
o BUT: Updating indices imposes overhead on database
modification: when a file is modified, every index on
the file must be updated.
o Sequential scan using primary index is efficient, but a
sequential scan using a secondary index is expensive
o Each record access may fetch a new block from
disk
o Block fetch requires about 5 to 10 milliseconds,
versus about 100 nanoseconds for memory access
