CPS216: Data-intensive
Computing Systems
Operators for Data Access
(contd.)
Shivnath Babu
Insertion in a B-Tree
n=2
49
15 36
49
Insert: 62
Insertion in a B-Tree
n=2
49
15 36
49
62
Insert: 62
Insertion in a B-Tree
n=2
49
15 36
49
62
Insert: 50
Insertion in a B-Tree
49
15 36
n=2
62
49
50
62
Insert: 50
Insertion in a B-Tree
49
15 36
n=2
62
49
50
62
Insert: 75
Insertion in a B-Tree
49
15 36
n=2
62
49
50
62 75
Insert: 75
Insertion
[Figure sequence not recovered]
Insertion: Primitives
Inserting into a leaf node
Splitting a leaf node
Splitting an internal node
Splitting root node
Inserting into a Leaf
Node
58
54 57 60 62
Inserting into a Leaf
Node
58
54 57
60 62
Inserting into a Leaf
Node
58
54 57 58 60 62
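The leaf insertion above can be sketched as a sorted insert followed by an overflow check. This is a minimal sketch, not the course's implementation: the function name `insert_into_leaf` and the leaf capacity of 5 keys are assumptions (the capacity is inferred from the five-key leaf shown on the slide).

```python
import bisect

def insert_into_leaf(keys, key, capacity):
    """Insert key into a sorted leaf in place; return True if the leaf overflows."""
    bisect.insort(keys, key)      # keeps the leaf's keys in sorted order
    return len(keys) > capacity   # an overflowing leaf has to be split

# Keys taken from the slide; capacity=5 is an assumption.
leaf = [54, 57, 60, 62]
overflow = insert_into_leaf(leaf, 58, capacity=5)
print(leaf, overflow)  # [54, 57, 58, 60, 62] False
```

When the function returns True, the leaf no longer fits in one node and the split primitive applies.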
Splitting a Leaf Node
61
54 66
54 57 58 60 62
Splitting a Leaf Node
61
54 66
54 57 58
60 61 62
Splitting a Leaf Node
59
61
54 66
54 57 58
60 61 62
Splitting a Leaf Node
61
54 59 66
54 57 58
60 61 62
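The leaf split above can be sketched as follows. The function name `split_leaf` is hypothetical. Note one deliberate difference from the slide: the slide places 59 in the parent, while this sketch copies up the smallest key of the right leaf (60), which is an assumption about the convention. Both values separate the two leaves correctly, since every key on the left is smaller and every key on the right is at least as large.

```python
import bisect

def split_leaf(keys, new_key):
    """Insert new_key into a full sorted leaf and split the result in two.

    Returns (left, right, separator). The separator is the key handed to
    the parent so that it can tell the two leaves apart.
    """
    merged = list(keys)
    bisect.insort(merged, new_key)
    mid = (len(merged) + 1) // 2          # left leaf gets the extra key if odd
    left, right = merged[:mid], merged[mid:]
    separator = right[0]                  # assumption: smallest key of right leaf
    return left, right, separator

# Keys taken from the slide: a full leaf receives 61.
left, right, sep = split_leaf([54, 57, 58, 60, 62], 61)
parent = [54, 66]
bisect.insort(parent, sep)                # the parent gains one separator key
print(left, right, parent)  # [54, 57, 58] [60, 61, 62] [54, 60, 66]
```

If the parent itself overflows after receiving the separator, the internal-node split applies one level up.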
Splitting an Internal Node
21 99
59
40 54 66 74 84
[54, 59) [ 59, 66) [66,74)
Splitting an Internal Node
66
21 99
[21,66)
40 54 59
[54, 59)
[66, 99)
74 84
[ 59, 66)
[66,74)
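The internal-node split above can be sketched as below. Unlike a leaf split, the middle key moves up to the parent rather than being copied. The function name `split_internal` and the child labels `c0` to `c6` are hypothetical placeholders for the subtrees.

```python
import bisect

def split_internal(keys, children):
    """Split an overflowing internal node around its middle key.

    Returns (left, up_key, right), where left and right are
    (keys, children) pairs and up_key moves to the parent.
    """
    mid = len(keys) // 2
    up_key = keys[mid]                                # moved up, not copied
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up_key, right

# Node from the slide after 59 has been added to 40 54 66 74 84.
keys = [40, 54, 59, 66, 74, 84]
children = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]  # hypothetical subtrees
left, up_key, right = split_internal(keys, children)

parent = [21, 99]
bisect.insort(parent, up_key)
print(left[0], up_key, right[0], parent)
# [40, 54, 59] 66 [74, 84] [21, 66, 99]
```

Each half keeps one more child pointer than it has keys, which is the invariant an internal node needs.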
Splitting the Root
59
40 54 66 74 84
[54, 59) [ 59, 66) [66,74)
Splitting the Root
66
40 54 59
[54, 59)
74 84
[ 59, 66)
[66,74)
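Splitting the root differs from other internal splits only in that there is no parent to receive the middle key, so a new root is created and the tree grows by one level. A minimal sketch, assuming a hypothetical `Node` class with `keys` and `children` fields:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for leaves

def split_root(root):
    """Split an overflowing root and return the new root one level higher."""
    mid = len(root.keys) // 2
    left = Node(root.keys[:mid], root.children[:mid + 1])
    right = Node(root.keys[mid + 1:], root.children[mid + 1:])
    return Node([root.keys[mid]], [left, right])

# Root from the slide after 59 has been added; subtrees are empty placeholders.
old_root = Node([40, 54, 59, 66, 74, 84], [Node([]) for _ in range(7)])
new_root = split_root(old_root)
print(new_root.keys)                          # [66]
print([c.keys for c in new_root.children])    # [[40, 54, 59], [74, 84]]
```

This is the only primitive that increases the height of the tree.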
Deletion
[Figure sequence not recovered; surviving label: redistribute]
Deletion - II
[Figure sequence not recovered; surviving labels: merge, Not needed]
Deletion: Primitives
Delete key from a leaf
Redistribute keys between sibling
leaves
Merge a leaf into its sibling
Redistribute keys between two
sibling internal nodes
Merge an internal node into its
sibling
Merge Leaf into Sibling
72
54 58 64
67 85
68 72 75
Merge Leaf into Sibling
72
54 58 64
67 85
68 75
Merge Leaf into Sibling
72
67 85
54 58 64 68 75
Merge Leaf into Sibling
72
85
54 58 64 68 75
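The leaf merge above can be sketched as follows: after 72 is deleted, the leaf is assumed to be underfull, so its remaining keys move into the left sibling and the separator between the two leaves is removed from the parent. The function name is hypothetical, and the third leaf (to the right of separator 85) is not shown on the slide, so its keys are invented for illustration.

```python
def merge_leaf_into_left_sibling(parent_keys, leaves, idx):
    """Merge leaves[idx] into leaves[idx - 1] and drop their separator."""
    leaves[idx - 1].extend(leaves[idx])   # left keys < right keys, so still sorted
    del leaves[idx]
    del parent_keys[idx - 1]              # separator between the two merged leaves

parent_keys = [67, 85]
leaves = [[54, 58, 64], [68, 72, 75], [90, 95, 99]]  # last leaf is hypothetical

leaves[1].remove(72)                      # delete key from a leaf
merge_leaf_into_left_sibling(parent_keys, leaves, 1)
print(parent_keys, leaves[0])  # [85] [54, 58, 64, 68, 75]
```

The parent loses a key in the process, so it may become underfull in turn and need redistribution or a merge of its own.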
Merge Internal Node into
Sibling
41 48 52
59
63 74
[52, 59)
[59,63)
Merge Internal Node into
Sibling
59
41 48 52 59 63
[52, 59)
[59,63)
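Merging two internal nodes differs from merging leaves in that the separator key is pulled down from the parent into the merged node. A minimal sketch, assuming the right node has already shrunk to the single key 63 as in the second slide; the function name and the child labels are hypothetical.

```python
def merge_internal(left_keys, left_children, separator, right_keys, right_children):
    """Merge two sibling internal nodes, pulling their separator down."""
    keys = left_keys + [separator] + right_keys
    children = left_children + right_children
    return keys, children

# Keys from the slide; child labels are placeholders for subtrees.
keys, children = merge_internal(
    [41, 48, 52], ["a", "b", "c", "d"],
    59,
    [63], ["e", "f"],
)
print(keys)           # [41, 48, 52, 59, 63]
print(len(children))  # 6
```

The merged node again has exactly one more child than it has keys, and the parent must delete the separator 59 from its own key list.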
B-Tree Roadmap
B-Tree
Recap
Insertion (recap)
Deletion
Construction
Efficiency
B-Tree variants
Hash-based Indexes
Question
How does insertion-based construction
perform?
B-Tree Construction
Sort
48 57 41 15 75 21 62 34 81 11 97 13
B-Tree Construction
11 13 15
21 34 41
48 57 62
75 81 97
11 13 15 21 34 41 48 57 62 75 81 97
Scan
B-Tree Construction
21 48 75
11 13 15
21 34 41
Scan
48 57 62
75 81 97
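The sort-then-scan construction above can be sketched in a few lines: sort the keys, cut the sorted run into full leaves, then scan the leaves once to build the level above. The function name `bulk_load` and the choice of the first key of each leaf as its separator are assumptions; the leaf size of 3 matches the slide.

```python
def bulk_load(keys, leaf_size):
    """Build the leaf level and the separator keys of the level above it."""
    ordered = sorted(keys)                                   # Sort
    leaves = [ordered[i:i + leaf_size]
              for i in range(0, len(ordered), leaf_size)]    # Scan: fill leaves
    separators = [leaf[0] for leaf in leaves[1:]]            # Scan: build parent
    return leaves, separators

keys = [48, 57, 41, 15, 75, 21, 62, 34, 81, 11, 97, 13]
leaves, separators = bulk_load(keys, leaf_size=3)
print(leaves)      # [[11, 13, 15], [21, 34, 41], [48, 57, 62], [75, 81, 97]]
print(separators)  # [21, 48, 75]
```

Each leaf is written once and in order, whereas repeated insertion would revisit and split nodes along the way.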
B-Tree Construction
Why is sort-based construction better than
insertion-based one?
Cost of B-Tree
Operations
Height of B-Tree: H
Assume no duplicates
Question: what is the random I/O
cost of:
Insertion:
Deletion:
Equality search:
Range Search:
Height of B-Tree
Number of keys: N
B-Tree parameter: n
Height: $\log_n N = \frac{\log N}{\log n}$
In practice: 2-3 levels
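To see why the height stays at 2-3 levels, the formula can be evaluated for some sample sizes. This is a rough sketch that treats n as the fan-out of every node and ignores partially filled nodes; the values of N and n below are hypothetical, not from the lecture. Integer arithmetic is used instead of floating-point logarithms to avoid rounding problems.

```python
def btree_height(num_keys, fanout):
    """Smallest H such that fanout**H >= num_keys, i.e. ceil(log_n N)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout    # one more level multiplies reachable keys by n
        height += 1
    return height

# Hypothetical sizes for illustration.
print(btree_height(10**6, 100))    # 3
print(btree_height(10**6, 1000))   # 2
print(btree_height(10**9, 1000))   # 3
```

Because n is typically in the hundreds for disk blocks, even very large N yields a small height.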
Question: How do you pick parameter n?
1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates
Roadmap
B-Tree
B-Tree variants
Sparse Index
Duplicate Keys
Hash-based Indexes
Roadmap
B-Tree
B-Tree variants
Hash-based Indexes
Static Hash Table
Extensible Hash Table
Linear Hash Table
Hash-Based Indexes
Adaptations of main memory hash
tables
Support equality searches
No range searches
Indexing Problem (recap)
Index Keys
record pointer
a1
a2
A = val
ai
an
Main Memory Hash Table
key → h(key) → buckets, with h(key) = key % 8
bucket 0: 32, 48
bucket 1: (null)
bucket 2: 10
bucket 3: 27, 75
bucket 4: (null)
bucket 5: 21
bucket 6: (null)
bucket 7: 55
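The main-memory hash table above can be sketched with the slide's hash function h(key) = key % 8 and the slide's keys. Python lists stand in for the linked lists of keys, and the function names `build` and `contains` are hypothetical.

```python
NUM_BUCKETS = 8

def h(key):
    """Hash function from the slide."""
    return key % NUM_BUCKETS

def build(keys):
    """Place each key in the chain of the bucket it hashes to."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key in keys:
        buckets[h(key)].append(key)
    return buckets

def contains(buckets, key):
    """Equality search: only the one bucket h(key) is examined."""
    return key in buckets[h(key)]

buckets = build([32, 48, 10, 27, 75, 21, 55])
print(buckets[0], buckets[3])   # [32, 48] [27, 75]
print(contains(buckets, 75))    # True
print(contains(buckets, 11))    # False
```

A range search gets no help from this layout, since keys that are close in value land in unrelated buckets.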
Adapting to disk
1 Hash Bucket = 1 Block
All keys that hash to bucket stored in
the block
Intuition: keys in a bucket usually
accessed together
No need for linked lists of keys
Adapting to Disk
How do we handle this?
Adapting to disk
1 Hash Bucket = 1 Block
All keys that hash to bucket stored in
the block
Intuition: keys in a bucket usually
accessed together
No need for linked lists of keys
but need linked list of blocks
(overflow blocks)
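The overflow-block idea above can be sketched by modelling each bucket as a chain of fixed-capacity blocks and counting how many blocks a search has to read. This is an illustrative in-memory model, not a disk implementation: the `Bucket` class, the block capacity of 2 keys, and the sample keys are all assumptions.

```python
BLOCK_CAPACITY = 2   # hypothetical: keys per block

class Bucket:
    def __init__(self):
        self.blocks = [[]]            # primary block, then overflow blocks

    def insert(self, key):
        if len(self.blocks[-1]) == BLOCK_CAPACITY:
            self.blocks.append([])    # last block is full: chain an overflow block
        self.blocks[-1].append(key)

    def search(self, key):
        """Return (found, blocks_read); each block read models one I/O."""
        for reads, block in enumerate(self.blocks, start=1):
            if key in block:
                return True, reads
        return False, len(self.blocks)

def h(key):
    return key % 8

table = [Bucket() for _ in range(8)]
for key in [32, 48, 8, 16, 27]:       # hypothetical keys; four hash to bucket 0
    table[h(key)].insert(key)

print(table[0].blocks)       # [[32, 48], [8, 16]]
print(table[0].search(32))   # (True, 1)
print(table[0].search(16))   # (True, 2)
print(table[0].search(40))   # (False, 2)
```

Once a bucket has overflow blocks, an equality search can cost more than one block read, and an unsuccessful search reads the whole chain.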
Adapting to Disk
Adapting to disk
Bucket Id → Disk Address mapping
Contiguous blocks
Store mapping in main memory
Too large?
Dynamic: Linear and Extensible hash tables
Beware of claims that assume 1 I/O
for hash tables and 3 I/Os for B-Tree!!