Database Design
Chapter 14
Indexing Structures for Files
CS 6360.501 (Fall 2009)
Instructor: Sunan Han
The University of Texas at Dallas

Chapter Outline
Types of Single-level Ordered Indexes
  Primary Indexes
  Clustering Indexes
  Secondary Indexes
Multilevel Indexes
Dynamic Multilevel Indexes Using B-Trees and B+-Trees

Indexes as Access Paths
Assume a file of records already exists with some primary organization described in Ch13 (ordered, unordered or hashed)
Indexes are additional auxiliary access structures that are used to speed up the retrieval of records
In a database system, index structures provide an efficient secondary access path without affecting the physical placement of records on disk
Indexes can also be characterized as dense or sparse
  A dense index has an index entry for every search key value (and hence every record) in the data file.
  A sparse (or nondense) index, on the other hand, has index entries for only some of the search values
The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller (so it can often be held in memory)
A binary search on the index yields a pointer to the file record
The index is usually specified on one field of the file (although it could be specified on several fields)
One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value
The index is called an access path on the field
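
The dense/sparse distinction and the binary search on <field value, pointer> entries can be sketched as follows. This is a minimal illustration, not part of the original slides: the data in BLOCKS and the function names are hypothetical, and "pointers" are modeled as list positions rather than disk addresses.

```python
from bisect import bisect_right

# Hypothetical data file: three blocks of records ordered on the key field.
BLOCKS = [
    [(5, "a"), (8, "b"), (12, "c")],
    [(15, "d"), (21, "e"), (24, "f")],
    [(30, "g"), (33, "h"), (41, "i")],
]

# Dense index: one <field value, (block, slot)> entry per record.
dense_index = [
    (key, (b, s))
    for b, block in enumerate(BLOCKS)
    for s, (key, _) in enumerate(block)
]

# Sparse index: one <field value, block> entry per block (first key in the block).
sparse_index = [(block[0][0], b) for b, block in enumerate(BLOCKS)]


def lookup_dense(key):
    """Binary search the dense index; the entry points straight at the record."""
    keys = [k for k, _ in dense_index]
    i = bisect_right(keys, key) - 1
    if i >= 0 and keys[i] == key:
        b, s = dense_index[i][1]
        return BLOCKS[b][s]
    return None


def lookup_sparse(key):
    """Binary search the sparse index for the last entry <= key, then scan that block."""
    keys = [k for k, _ in sparse_index]
    i = bisect_right(keys, key) - 1
    if i < 0:
        return None
    for record in BLOCKS[sparse_index[i][1]]:
        if record[0] == key:
            return record
    return None
```

With this data the dense index holds 9 entries and the sparse index holds 3, which is why a sparse index needs fewer blocks but requires the data file to be ordered on the indexing field.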

Single-Level Indexes: Primary Index
Defined on an ordered data file
The data file is ordered on a key field
Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor
A similar scheme can use the last record in a block.
A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value.

Figure: Primary index on the ordered key field

Example-1
Suppose that records are key-field ordered in the file and that:
fixed record size R = 100 bytes, block size B = 1024 bytes, number of records r = 30,000
The records are unspanned. Then, we get:
blocking factor Bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records/block
number of file blocks b = ⌈r/Bfr⌉ = ⌈30000/10⌉ = 3000 blocks
For an index on the key field, assume the field size V = 9 bytes, assume the record pointer size P = 6 bytes. Then:
index entry size Ri = (V + P) = (9 + 6) = 15 bytes
index blocking factor Bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries/block
number of index blocks bi = ⌈b/Bfri⌉ = ⌈3000/68⌉ = 45 blocks
(One index entry corresponds to one file block for sparse indexing)
binary search needs ⌈log_2(bi)⌉ = ⌈log_2(45)⌉ = 6 block accesses. It requires an additional block access to the data file => total block access is 7
Without index, the binary search cost on the file itself would be:
⌈log_2(b)⌉ = ⌈log_2(3000)⌉ = 12 block accesses

Single-Level Indexes: Clustering Index
Defined on an ordered data file
The data file is ordered on a non-key field, unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record.
Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
It is another example of a nondense index. Insertion and deletion are relatively straightforward with a clustering index.
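
The arithmetic of Example-1 can be checked with a short script. This is a sketch under the example's own assumptions (fixed-size unspanned records, one index entry per data block); the function name and the dictionary keys are illustrative only.

```python
import math


def sparse_primary_index_cost(R, B, r, V, P):
    """Block-access cost of a sparse primary index, following Example-1."""
    bfr = B // R                    # blocking factor of the data file
    b = math.ceil(r / bfr)          # number of data file blocks
    Ri = V + P                      # index entry size in bytes
    bfri = B // Ri                  # index blocking factor
    bi = math.ceil(b / bfri)        # one index entry per data block
    index_accesses = math.ceil(math.log2(bi))
    return {
        "file_blocks": b,
        "index_blocks": bi,
        "accesses_with_index": index_accesses + 1,   # +1 for the data block
        "accesses_without_index": math.ceil(math.log2(b)),
    }


cost = sparse_primary_index_cost(R=100, B=1024, r=30_000, V=9, P=6)
print(cost)
```

Changing B or R in the call shows how quickly the index shrinks relative to the data file as the blocking factor grows.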

Figure: A Clustering Index Example

Figure: Another Clustering Index Example: records with the same clustering field value are in separate blocks

Single-Level Indexes: Secondary Index
A secondary index provides a secondary means of accessing a file for which some primary access already exists.
The secondary index may be on a field which is a candidate key and has a unique value in every record, or on a non-key field with duplicate values. In both cases it is a non-ordering field.
The index is an ordered file with two fields.
  The first field is of the same data type as some non-ordering field of the data file that is an indexing field.
  The second field is either a block pointer or a record pointer.
There can be many secondary indexes (and hence, indexing fields) for the same file, on different fields of the records
Includes one entry for each record in the data file; hence, it is a dense index

Figure: Example of a Dense Secondary Index for a Key Field

Example-2
Suppose that:
fixed record size R = 100 bytes, block size B = 1024 bytes, number of records r = 30,000
The records are unspanned. Then, we get:
blocking factor Bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records/block
number of file blocks b = ⌈r/Bfr⌉ = ⌈30000/10⌉ = 3000 blocks
We construct a secondary index on a nonordering candidate key field, assume the field size V = 9 bytes, assume the record pointer size P = 6 bytes. Then:
index entry size Ri = (V + P) = (9 + 6) = 15 bytes
index blocking factor Bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries/block
number of index blocks bi = ⌈r/Bfri⌉ = ⌈30000/68⌉ = 442 blocks
(Each index entry corresponds to one file record for dense indexing)
binary search needs ⌈log_2(bi)⌉ = ⌈log_2(442)⌉ = 9 block accesses (10 is the total with the final data file block access)
This is compared to an average linear search cost (w/o index) of:
(b/2) = 3000/2 = 1500 block accesses

Figure: Example of a Secondary Index for a Nonkey Field
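
The dense case in Example-2 differs from the sparse case only in that the index holds one entry per record rather than one per block. A minimal sketch of that calculation, with an illustrative function name and result keys:

```python
import math


def dense_secondary_index_cost(R, B, r, V, P):
    """Block-access cost of a dense secondary index on a key field (Example-2)."""
    bfr = B // R                    # blocking factor of the data file
    b = math.ceil(r / bfr)          # number of data file blocks
    bfri = B // (V + P)             # index blocking factor
    bi = math.ceil(r / bfri)        # dense: one index entry per record
    index_accesses = math.ceil(math.log2(bi))
    return {
        "index_blocks": bi,
        "accesses_with_index": index_accesses + 1,   # +1 for the data block
        "avg_linear_search": b // 2,                 # no ordering to exploit
    }


cost = dense_secondary_index_cost(R=100, B=1024, r=30_000, V=9, P=6)
print(cost)
```

The comparison baseline is a linear scan because the file is not ordered on the secondary field, so binary search on the data file is not available.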

Summary of Single-Level Indexing

Multi-Level Indexes
Primary indexing reduces the search cost of the original file search on an ordering field
Because a single-level index is an ordered file, we can create a primary index to the index itself to further reduce the search cost
  In this case, the original index file is called the first-level index and the index to the index is called the second-level index.
We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block
A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block

Example-3
In example 1 (sparse primary index)
fixed record size R = 100 bytes, block size B = 1024 bytes, number of records r = 30,000
blocking factor Bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records/block
number of file blocks b = ⌈r/Bfr⌉ = ⌈30000/10⌉ = 3000 blocks
index entry size Ri = (V + P) = (9 + 6) = 15 bytes
index blocking factor Bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries/block
(This is called the fan-out factor of the multi-level index)
number of index blocks b1 = ⌈b/Bfri⌉ = ⌈3000/68⌉ = 45 blocks
binary search needs ⌈log_2(b1)⌉ = ⌈log_2(45)⌉ = 6 block accesses (7 is the total with the additional access to the data file block)
For the second-level index to the 45 first-level index file blocks:
number of index blocks b2 = ⌈b1/Bfri⌉ = ⌈45/68⌉ = 1 block
Total file block access is 1 (2nd-level) + 1 (1st-level) + 1 (data file) = 3

Figure: Two-level Primary Index
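
The "repeat until the top level fits in one block" rule can be sketched as a loop that divides by the fan-out at each level. The function name is illustrative; the inputs are the numbers from Example-3, plus the 30,000 first-level entries of the dense index in Example-2 as a second case.

```python
import math


def index_level_blocks(first_level_entries, fan_out):
    """Return the number of blocks at each index level, first level first.

    Each level is a sparse index over the blocks of the level below it,
    so a level with n blocks needs ceil(n / fan_out) blocks above it.
    """
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:              # top level fits in one disk block
            return levels
        entries = blocks             # next level has one entry per block


primary = index_level_blocks(first_level_entries=3000, fan_out=68)
print(primary, "block accesses:", len(primary) + 1)      # +1 for the data block

secondary = index_level_blocks(first_level_entries=30_000, fan_out=68)
print(secondary, "block accesses:", len(secondary) + 1)
```

One block is read per level, so the search cost is the number of levels plus one data block access, instead of a binary search over the first level.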

Multi-Level Indexes
Such a multi-level index is a form of search tree
However, inserting and deleting index entries is a severe problem because every level of the index is an ordered file
This leads to dynamic multi-level indexes
Dynamic multi-level indexing leaves some additional space in each block for inserting new entries

Search Trees
A search tree of order p is a tree such that each node contains at most p-1 search values and p pointers in the order <P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q>, where q ≤ p, each P_i is a pointer to a child node or null, and each K_i is a unique search value from some ordered set of values, and the following must hold:
1. Within each node, K_1 < K_2 < … < K_{q-1}
2. For all values X in the subtree pointed at by P_i, K_{i-1} < X < K_i for 1 < i < q, X < K_i if i = 1, and K_{i-1} < X if i = q
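
The two search tree conditions are what make a top-down search possible: at each node, the position of X among K_1 … K_{q-1} selects exactly one pointer P_i to follow. A minimal sketch, where the Node class and the sample tree of order p = 3 are hypothetical illustrations:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                                   # K_1 < K_2 < ... < K_{q-1}
    children: List[Optional["Node"]] = field(default_factory=list)  # P_1 ... P_q

    def __post_init__(self):
        if not self.children:                         # default: all pointers null
            self.children = [None] * (len(self.keys) + 1)


def search(node, x):
    """Return True if x is stored in the search tree rooted at node."""
    while node is not None:
        i = 0
        while i < len(node.keys) and x > node.keys[i]:
            i += 1                                    # skip keys smaller than x
        if i < len(node.keys) and node.keys[i] == x:
            return True
        node = node.children[i]                       # K_{i-1} < x < K_i: follow P_i
    return False


# Hypothetical tree of order p = 3 (at most 2 keys and 3 pointers per node).
root = Node(
    [5],
    [
        Node([3], [Node([1]), None]),
        Node([6, 9], [None, Node([7, 8]), Node([12])]),
    ],
)

print(search(root, 7), search(root, 4))
```

Note that nothing here keeps the tree balanced; that is the gap B-trees and B+-trees are designed to close.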

Figure: A Node in a Search Tree with Pointers to Subtrees below It

FIGURE 14.9 A search tree of order p = 3.

Dynamic Multilevel Indexes Using B-Trees and B+-Trees
In B-Tree and B+-Tree data structures, each node corresponds to a disk block
Most multi-level indexes use B-tree or B+-tree data structures because of the insertion and deletion problem
Space has to be reserved in each tree node to allow for new index entries
Each node is kept between half-full and completely full
An insertion into a node that is not full is quite efficient
  If a node is full the insertion causes a split into two nodes
  Splitting may propagate to other tree levels
A deletion is quite efficient if a node does not become less than half full
  If a deletion causes a node to become less than half full, it must be merged with neighboring nodes

Difference between B-tree and B+-tree
In a B-tree, pointers to data records exist at all levels of the tree
In a B+-tree, all pointers to data records exist at the leaf-level nodes
A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree

B-Trees
When used as an access structure on a key field in a data file, a B-Tree of order p can be defined as follows
1. Each internal node in the B-tree is of the form <P_1, <K_1, Pr_1>, P_2, <K_2, Pr_2>, …, P_{q-1}, <K_{q-1}, Pr_{q-1}>, P_q>, where q ≤ p, each P_i is a tree pointer and Pr_i is a data pointer to the record whose search key field value is K_i (or the block containing the record)
2. Within each node, K_1 < K_2 < … < K_{q-1}
3. For all values X in the subtree pointed at by P_i, K_{i-1} < X < K_i for 1 < i < q, X < K_i if i = 1, and K_{i-1} < X if i = q
4. Each internal node, except the root, has at least ⌈p/2⌉ tree pointers
5. A node with q tree pointers (q ≤ p) has q-1 search key field values (and q-1 data pointers)
6. All leaf nodes are at the same level, and their tree pointers are null
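
The capacity claim follows from the node layouts: a B-tree node must fit p tree pointers plus p-1 <K, Pr> pairs in one block, while a B+-tree internal node carries no data pointers. A sketch of the resulting orders, reusing B = 1024 and V = 9 from the earlier examples; taking both the tree pointer and the data pointer to be 6 bytes is an assumption made here for illustration, not a figure from the slides.

```python
def btree_order(B, V, P_tree, P_data):
    """Largest p with p*P_tree + (p-1)*(V + P_data) <= B."""
    return (B + V + P_data) // (P_tree + V + P_data)


def bplus_internal_order(B, V, P_tree):
    """Largest p with p*P_tree + (p-1)*V <= B (no data pointers in internal nodes)."""
    return (B + V) // (P_tree + V)


B, V = 1024, 9
P_TREE = P_DATA = 6          # assumed pointer sizes in bytes

p_btree = btree_order(B, V, P_TREE, P_DATA)
p_bplus = bplus_internal_order(B, V, P_TREE)
print("B-tree order:", p_btree)
print("B+-tree internal node order:", p_bplus)
```

A larger order means a larger fan-out per block, so the same number of search values can be reached in fewer levels.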

Figure: B-tree Structures

B-Trees Insertion
It starts with a single root node at level 0
When the root is full with p-1 key values and an insertion occurs, two nodes at level 1 are created and all values except the middle one are evenly distributed in the two new nodes
The root keeps the middle value and adds two tree pointers to the new split nodes
When a non-root node is full and an insertion occurs, it is split into two nodes at the same level, and the middle value moves up to the parent node along with pointers to the two split nodes
If the parent node is also full, the split propagates upwards
If it reaches the root, the root is split and a new root, and therefore a new tree level, is added
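
Insertion with splitting can be sketched as below. This is a simplified illustration rather than a full implementation: data pointers are omitted, nodes live in memory instead of disk blocks, and the class and function names are hypothetical. A node that overflows past p-1 keys is split around its middle key, and the middle key is passed up to the parent.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list means leaf

    @property
    def is_leaf(self):
        return not self.children


class BTree:
    def __init__(self, p):
        self.p = p                          # order: at most p pointers, p-1 keys
        self.root = BTreeNode()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:               # root was split: add a new level
            middle, right = split
            self.root = BTreeNode([middle], [self.root, right])

    def _insert(self, node, key):
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return None                     # search values are unique
        if node.is_leaf:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is not None:           # child was split: absorb its middle key
                middle, right = split
                node.keys.insert(i, middle)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.p - 1:
            return None                     # no overflow
        mid = len(node.keys) // 2           # split around the middle key
        middle = node.keys[mid]
        right = BTreeNode(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right


def inorder(node):
    """Return all keys in sorted order (used here to check the tree)."""
    if node.is_leaf:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(inorder(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = BTree(p=3)
for k in [8, 5, 1, 7, 3, 12, 9, 6]:
    tree.insert(k)
print(tree.root.keys, inorder(tree.root))
```

Because a split always produces two nodes at the same level and only a root split adds a level, all leaves stay at the same depth.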

B-Trees Deletion
When deletion of a record leaves a node less than half full, it is merged with a neighboring node
This merge may cause a reduction of a tree level

B+-Trees
The internal nodes are similar to the search tree defined earlier (<P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q>) except that K_{i-1} < X ≤ K_i for 1 < i < q, X ≤ K_i if i = 1, and K_{i-1} < X if i = q
  Each internal node has at least ⌈p/2⌉ tree pointers
The leaf nodes are defined as follows
  <<K_1, Pr_1>, <K_2, Pr_2>, …, <K_{q-1}, Pr_{q-1}>, P_next>, where q ≤ p, each Pr_i is a data pointer to the record whose search key field value is K_i, or to a file block containing the record. P_next points to the next leaf node
  K_1 < K_2 < … < K_{q-1}
  Each leaf node has at least ⌊p/2⌋ values, or a redistribution/deletion is needed
  All leaf nodes are at the same level
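
The B+-tree node layout supports both exact-match search and ordered range scans: internal nodes only route the search (with X ≤ K_i going to the left of K_i), and the P_next pointers chain the leaves together. A minimal sketch over a small hand-built tree; the class names, the sample keys, and the string "record pointers" are hypothetical.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, entries, next_leaf=None):
        self.entries = entries          # sorted list of <K, Pr> pairs
        self.next = next_leaf           # P_next: the next leaf node


class Internal:
    def __init__(self, keys, children):
        self.keys = keys                # K_1 < K_2 < ... < K_{q-1}
        self.children = children        # P_1 ... P_q


def find_leaf(node, key):
    """Descend to the leaf that would hold key (X <= K_i goes left of K_i)."""
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    return node


def search(root, key):
    for k, pr in find_leaf(root, key).entries:
        if k == key:
            return pr
    return None


def range_scan(root, low, high):
    """Collect all entries with low <= K <= high by walking the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k, pr in leaf.entries:
            if k > high:
                return result
            if k >= low:
                result.append((k, pr))
        leaf = leaf.next
    return result


# Hand-built tree: one internal node over three chained leaves.
leaf3 = Leaf([(8, "r8"), (9, "r9")])
leaf2 = Leaf([(5, "r5"), (7, "r7")], leaf3)
leaf1 = Leaf([(1, "r1"), (3, "r3")], leaf2)
root = Internal([3, 7], [leaf1, leaf2, leaf3])

print(search(root, 7), range_scan(root, 3, 8))
```

The range scan touches the internal levels only once, which is a practical reason all data pointers are kept at the leaf level.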

Figure: The Nodes of a B+-tree
Internal nodes form paths to the leaf nodes that point to the actual data

Figure: An Example of an Insertion in a B+-tree
2 levels => up to 9 leaves

Figure: An Example of a Deletion in a B+-tree
Causes a redistribution at the same level
Causes a redistribution at higher levels

Summary
Types of Single-level Ordered Indexes
  Primary Indexes
  Clustering Indexes
  Secondary Indexes
Multilevel Indexes
Dynamic Multilevel Indexes Using B-Trees and B+-Trees

Assignment #12
Page 545: 14.14 a, b, c, d, e
Due date 11/23/09