Digital Search Tree
Department of Computer Science and Information Engineering
National Taipei University
Professor 莊東穎
Digital Search Tree
• Assume every key has the same, fixed number of bits.
• Not empty =>
– Root contains one dictionary pair (any pair).
– All remaining pairs whose key begins with a
0 are in the left subtree.
– All remaining pairs whose key begins with a
1 are in the right subtree.
– Left and right subtrees are digital search trees
on the remaining bits.
Example
• Start with an empty digital search tree
and insert a pair whose key is 0110.
[Figure: a one-node tree holding 0110.]
• Now, insert a pair whose key is 0010.
[Figure: 0010 begins with 0, so it becomes the left child of the root 0110.]
Example
• Now, insert a pair whose key is 1001.
[Figure: before and after; 1001 begins with 1, so it becomes the right child of 0110.]
Example
• Now, insert a pair whose key is 1011.
[Figure: before and after; 1011 moves right at 0110 (bit 1 is 1), then left at 1001 (bit 2 is 0), becoming the left child of 1001.]
Example
• Now, insert a pair whose key is 0000.
[Figure: before and after; 0000 moves left at 0110 (bit 1 is 0), then left at 0010 (bit 2 is 0), becoming the left child of 0010.]
Search/Insert/Delete
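[Figure: the tree built above: root 0110 with children 0010 and 1001; 0000 is the left child of 0010 and 1011 is the left child of 1001.]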
• Complexity of each operation is O(#bits in a key).
• #key comparisons = O(height).
• Expensive when keys are very long, because each comparison examines a whole key.
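As a rough illustration of these costs, here is a minimal Python sketch of search and insert, assuming keys are equal-length strings of '0' and '1'; the class names `DSTNode` and `DigitalSearchTree` are made up for this example. Every node visited costs one full-key comparison, and the node at depth `level` branches on bit `level` (counting from 0) of the key.

```python
from typing import Optional


class DSTNode:
    """A digital search tree node stores one whole dictionary pair."""

    def __init__(self, key: str, value: object) -> None:
        self.key = key
        self.value = value
        self.left: Optional["DSTNode"] = None    # subtree for next bit 0
        self.right: Optional["DSTNode"] = None   # subtree for next bit 1


class DigitalSearchTree:
    """Keys are equal-length strings of '0'/'1', e.g. '0110'."""

    def __init__(self) -> None:
        self.root: Optional[DSTNode] = None

    def search(self, key: str) -> Optional[object]:
        node, level = self.root, 0
        while node is not None:
            if node.key == key:                  # one full-key comparison per node
                return node.value
            # A node at depth `level` branches on bit `level` (0-based) of the key.
            node = node.left if key[level] == "0" else node.right
            level += 1
        return None

    def insert(self, key: str, value: object) -> None:
        if self.root is None:
            self.root = DSTNode(key, value)
            return
        node, level = self.root, 0
        while True:
            if node.key == key:                  # already present: replace the value
                node.value = value
                return
            if key[level] == "0":
                if node.left is None:
                    node.left = DSTNode(key, value)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = DSTNode(key, value)
                    return
                node = node.right
            level += 1
```

Inserting 0110, 0010, 1001, 1011, and 0000 in that order yields the tree from the example above.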
Binary Trie
• The name comes from information retrieval.
• At most one key comparison per
operation.
• Fixed length keys.
– Branch nodes.
• Left and right child pointers.
• No data field(s).
– Element nodes.
• No child pointers.
• Data field to hold dictionary pair.
Example
[Figure: binary trie holding 0001, 0011, 1000, 1001, and 1100; at each branch node, bit 0 leads left and bit 1 leads right.]
At most one key comparison for a search.
Fixed Length Insert
[Figure: the trie above with 0111 added: following bits 0 and 1 leads to an empty subtrie, where the new element node 0111 is placed.]
Insert 0111. Zero compares.
Fixed Length Insert
[Figure: the trie after inserting 0111.]
Insert 1101.
Fixed Length Insert
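[Figure: following bits 1 and 1 reaches the element node 1100; since 1100 and 1101 also agree on bit 3, new branch nodes for bits 3 and 4 are created there.]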
Insert 1101.
Fixed Length Insert
[Figure: the trie after inserting 1101; 1100 and 1101 are the children of the new branch node for bit 4.]
Insert 1101. One compare.
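The fixed-length insert steps can be sketched in Python as follows. This is an illustrative sketch, assuming keys are equal-length strings of '0' and '1'; `Element`, `Branch`, and `BinaryTrie` are hypothetical names, and `child[0]`/`child[1]` stand for the left and right child pointers. Reaching an empty subtrie costs no key comparison; reaching an element node costs exactly one, after which branch nodes are added for the bits the two keys still share.

```python
from typing import List, Optional, Union


class Element:
    """Element node: no child pointers, one dictionary pair."""

    def __init__(self, key: str, value: object) -> None:
        self.key = key
        self.value = value


class Branch:
    """Branch node: left/right child pointers, no data fields."""

    def __init__(self) -> None:
        # child[0] is the left (bit 0) pointer, child[1] the right (bit 1) pointer.
        self.child: List[Optional[Union["Branch", Element]]] = [None, None]


class BinaryTrie:
    """Binary trie for fixed-length keys written as strings of '0'/'1'."""

    def __init__(self) -> None:
        self.root: Optional[Union[Branch, Element]] = None

    def search(self, key: str) -> Optional[object]:
        node, i = self.root, 0
        while isinstance(node, Branch):          # a branch node at depth i uses bit i
            node = node.child[int(key[i])]
            i += 1
        # The only key comparison happens at the element node reached, if any.
        if isinstance(node, Element) and node.key == key:
            return node.value
        return None

    def insert(self, key: str, value: object) -> None:
        if self.root is None:
            self.root = Element(key, value)
            return
        parent: Optional[Branch] = None
        side, node, i = 0, self.root, 0
        while isinstance(node, Branch):
            parent, side = node, int(key[i])
            node = node.child[side]
            i += 1
        if node is None:                         # empty subtrie: zero compares
            parent.child[side] = Element(key, value)
            return
        if node.key == key:                      # the one compare: key already present
            node.value = value
            return
        # One branch node per bit the two keys still share, then split them.
        old = node
        top = cur = Branch()
        while old.key[i] == key[i]:
            nxt = Branch()
            cur.child[int(key[i])] = nxt
            cur, i = nxt, i + 1
        cur.child[int(key[i])] = Element(key, value)
        cur.child[int(old.key[i])] = old
        if parent is None:
            self.root = top
        else:
            parent.child[side] = top
```

Loading 0001, 0011, 1000, 1001, and 1100 and then inserting 0111 and 1101 gives the tries shown on the preceding slides.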
Fixed Length Delete
[Figure: the trie holding 0001, 0011, 0111, 1000, 1001, 1100, and 1101.]
Delete 0111.
Fixed Length Delete
[Figure: the trie after deleting 0111; the pointer to its element node is now null.]
Delete 0111. One compare.
Fixed Length Delete
[Figure: the trie holding 0001, 0011, 1000, 1001, 1100, and 1101.]
Delete 1100.
Fixed Length Delete
[Figure: the element node 1100 is removed, leaving its parent branch node with the single element child 1101.]
Delete 1100.
Fixed Length Delete
[Figure: that branch node is discarded and 1101 moves up one level.]
Delete 1100.
Fixed Length Delete
[Figure: the next branch node up also has only the child 1101, so 1101 moves up again.]
Delete 1100.
Fixed Length Delete
[Figure: the trie after deleting 1100; 1101 is the right child of the branch node whose left subtrie holds 1000 and 1001.]
Delete 1100. One compare.
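A possible sketch of fixed-length deletion, assuming keys are equal-length strings of '0' and '1'; `Element`, `Branch`, and `delete` are hypothetical names, and `child[0]`/`child[1]` are the left and right pointers. After the one key comparison, the element node is dropped, and any branch node left with a single element child is replaced by that element, repeating upward as in the 1100 example.

```python
from typing import List, Optional, Tuple, Union


class Element:
    def __init__(self, key: str) -> None:
        self.key = key


class Branch:
    def __init__(self, left=None, right=None) -> None:
        self.child: List[Optional[Union["Branch", Element]]] = [left, right]


def delete(root, key: str):
    """Delete key from a fixed-length binary trie; return the (possibly new) root."""
    path: List[Tuple[Branch, int]] = []      # (branch node, side taken) from the root down
    node, i = root, 0
    while isinstance(node, Branch):
        side = int(key[i])
        path.append((node, side))
        node = node.child[side]
        i += 1
    if not isinstance(node, Element) or node.key != key:   # the one compare
        return root                           # key not present
    if not path:
        return None                           # the trie held only this element
    parent, side = path.pop()
    parent.child[side] = None                 # discard the element node
    # A branch node left with a single element child is replaced by that element,
    # moving the element up; stop at a branch node that still has two subtries.
    while True:
        kids = [c for c in parent.child if c is not None]
        if len(kids) != 1 or not isinstance(kids[0], Element):
            return root
        if not path:
            return kids[0]                    # the root itself collapses to an element
        up, up_side = path.pop()
        up.child[up_side] = kids[0]
        parent = up
```

Deleting 0111 stops immediately because its parent still has a branch child; deleting 1100 moves 1101 up two levels.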
Compressed Binary Tries
• No branch node whose degree is 1.
• Add a bit# field to each branch node.
• bit# tells you which bit of the key to use
to decide whether to move to the left or
right subtrie.
Binary Trie
[Figure: binary trie holding 0001, 0011, 1000, 1001, 1100, and 1101, with each branch node labeled by the bit# (1 to 4) that it tests.]
bit# field shown in black outside each branch node.
Compressed Binary Trie
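[Figure: compressed binary trie: the root tests bit 1; its left child tests bit 3 (0001, 0011); its right child tests bit 2, and that node's two children both test bit 4 (1000, 1001 and 1100, 1101).]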
bit# field shown in black outside each branch node.
Compressed Binary Trie
[Figure: the same compressed binary trie: 6 elements and 5 branch nodes.]
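#branch nodes = n – 1, where n is the number of elements: every branch node has exactly two children, so the trie is a full binary tree with n leaves.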
Insert
[Figure: the compressed binary trie holding 0001, 0011, 1000, 1001, 1100, and 1101.]
Insert 0010.
Insert
[Figure: 0010 first differs from 0011 at bit 4, so a new bit-4 branch node with children 0010 and 0011 replaces 0011.]
Insert 0100.
Insert
[Figure: 0100 first differs from 0001 at bit 2, so a new bit-2 branch node is placed above the bit-3 node; its left child is the bit-3 subtrie and its right child is 0100.]
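The compressed-trie insert can be sketched as below, assuming equal-length '0'/'1' keys with bit# counted from 1 at the left; the class names and the `count_branches` helper are hypothetical. The sketch first searches down to some element, finds the first bit# d at which that element's key and the new key differ, then walks down again from the root and places a new bit-d branch node above the first node whose bit# is at least d. This is how 0100 ends up above the bit-3 node.

```python
from typing import List, Optional, Union


class Element:
    """Element node holding one dictionary pair."""

    def __init__(self, key: str, value: object) -> None:
        self.key, self.value = key, value


class Branch:
    """Branch node with a bit# field; in a compressed trie both children are non-null."""

    def __init__(self, bit: int, left: "Node", right: "Node") -> None:
        self.bit = bit                      # which bit (from 1, at the left) decides the branch
        self.child: List["Node"] = [left, right]


Node = Union[Branch, Element]


def count_branches(node: Optional[Node]) -> int:
    if not isinstance(node, Branch):
        return 0
    return 1 + count_branches(node.child[0]) + count_branches(node.child[1])


class CompressedTrie:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def search(self, key: str) -> Optional[object]:
        node = self.root
        while isinstance(node, Branch):
            node = node.child[int(key[node.bit - 1])]   # test only the bit named by bit#
        return node.value if node is not None and node.key == key else None

    def insert(self, key: str, value: object) -> None:
        if self.root is None:
            self.root = Element(key, value)
            return
        # 1. Search down to some element e.
        e = self.root
        while isinstance(e, Branch):
            e = e.child[int(key[e.bit - 1])]
        if e.key == key:
            e.value = value
            return
        # 2. d = first bit# at which key and e.key differ.
        d = next(i for i in range(len(key)) if key[i] != e.key[i]) + 1
        # 3. Walk down again; stop at an element or at the first branch with bit# >= d.
        parent: Optional[Branch] = None
        side, node = 0, self.root
        while isinstance(node, Branch) and node.bit < d:
            parent, side = node, int(key[node.bit - 1])
            node = node.child[side]
        # 4. New bit-d branch node: new element on key's side, old subtrie on the other.
        new = Element(key, value)
        b = Branch(d, node, new) if key[d - 1] == "1" else Branch(d, new, node)
        if parent is None:
            self.root = b
        else:
            parent.child[side] = b
```

Inserting 0001, 0011, 1000, 1001, 1100, and 1101 produces the five branch nodes of the compressed trie above, in line with #branch nodes = n – 1.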
Delete
[Figure: the compressed binary trie after inserting 0010 and 0100.]
Delete 0010.
Delete
[Figure: after deleting 0010, its sibling 0011 replaces their parent bit-4 branch node.]
Delete 1001.
Delete
[Figure: after deleting 1001, its sibling 1000 replaces their parent bit-4 branch node.]
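Deletion from a compressed binary trie only needs the parent and grandparent of the element: the parent would be left as a degree-1 branch node, so the sibling subtrie takes its place. A minimal sketch, assuming equal-length '0'/'1' keys with bit# counted from 1 at the left; `Element`, `Branch`, and `delete` are hypothetical names.

```python
from typing import Optional, Union


class Element:
    def __init__(self, key: str) -> None:
        self.key = key


class Branch:
    def __init__(self, bit: int, left, right) -> None:
        self.bit = bit                  # bit# (from 1, at the left) used for branching
        self.child = [left, right]      # both non-null in a compressed trie


def delete(root: Optional[Union[Branch, Element]], key: str):
    """Delete key from a compressed binary trie; return the (possibly new) root."""
    grand: Optional[Branch] = None
    parent: Optional[Branch] = None
    gside = pside = 0
    node = root
    while isinstance(node, Branch):
        grand, gside = parent, pside
        parent, pside = node, int(key[node.bit - 1])
        node = node.child[pside]
    if node is None or node.key != key:
        return root                     # key not present
    if parent is None:
        return None                     # the trie held only this element
    # parent would have degree 1, so its other subtrie takes its place.
    sibling = parent.child[1 - pside]
    if grand is None:
        return sibling
    grand.child[gside] = sibling
    return root
```

Applied to the trie after inserting 0010 and 0100, deleting 0010 and then 1001 reproduces the two delete slides.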
Patricia
• Practical Algorithm To Retrieve
Information Coded In Alphanumeric.
• Compressed binary trie.
• All nodes are of the same data type (binary
tries use branch and element nodes).
– Pointers to only one kind of node.
– Simpler storage management.
Patricia
• Uses a header node.
• Remaining nodes define a trie structure that
is the left subtree of the header node.
• Trie structure is the same as that for the
compressed binary trie described earlier.
Node Structure
bit# LC Pair RC
• bit# = bit used for branching
• LC = left child pointer
• Pair = dictionary pair
• RC = right child pointer
Compressed Binary Trie To Patricia
[Figure: the compressed binary trie holding 0001, 0011, 1000, 1001, 1100, and 1101.]
Move each element into an ancestor or header node.
Compressed Binary Trie To Patricia
[Figure: the resulting Patricia; the header node (bit# 0) holds 0001, and each branch node now holds one of the other elements.]
Insert
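Insert 0000101.
[Figure: the header node (bit# 0) holds 0000101; its left pointer points to itself.]
Insert 0000000.
[Figure: 0000000 first differs from 0000101 at bit 5, so a node with bit# 5 holding 0000000 becomes the left child of the header.]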
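Insert
[Figure: header (bit# 0, 0000101) with left child bit# 5 (0000000).]
Insert 0000010.
[Figure: 0000010 first differs from 0000000 at bit 6, so a node with bit# 6 holding 0000010 becomes the left child of the bit# 5 node.]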
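Insert
[Figure: before: header (bit# 0, 0000101), bit# 5 (0000000), bit# 6 (0000010).]
[Figure: after: 0001000 first differs from 0000000 at bit 4, so a node with bit# 4 holding 0001000 is placed between the header and the bit# 5 node.]
Insert 0001000.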
Insert
[Figure: header (bit# 0, 0000101); bit# 4 (0001000); bit# 5 (0000000); bit# 6 (0000010).]
Insert 0000100.
Insert
[Figure: 0000100 first differs from 0000101 at bit 7, so a node with bit# 7 holding 0000100 becomes the right child of the bit# 5 node.]
Insert 0001010.
Insert
[Figure: 0001010 first differs from 0001000 at bit 6, so a node with bit# 6 holding 0001010 becomes the right child of the bit# 4 node.]
Insert 0001010.
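A minimal Python sketch of Patricia search and insert, assuming equal-length '0'/'1' keys and bits numbered from 1 at the left, with bit# 0 reserved for the header; `PNode`, `bit_of`, and `Patricia` are hypothetical names, and `child[0]`/`child[1]` play the roles of LC and RC. A pointer is treated as a back (or self) pointer when it leads to a node whose bit# is not larger than the current node's bit#.

```python
from typing import List, Optional


class PNode:
    """Patricia node: bit#, LC/RC pointers (child[0], child[1]), one dictionary pair."""

    def __init__(self, bit: int, key: str, value: object) -> None:
        self.bit, self.key, self.value = bit, key, value
        self.child: List["PNode"] = [self, self]    # new nodes start with self pointers


def bit_of(key: str, i: int) -> int:
    """Bit i of the key, numbering bits from 1 at the left."""
    return int(key[i - 1])


class Patricia:
    def __init__(self) -> None:
        self.header: Optional[PNode] = None   # bit# 0; only its left pointer is used

    def _reach(self, key: str) -> PNode:
        """Follow key bits from the header until a back (or self) pointer is taken."""
        p, y = self.header, self.header.child[0]
        while y.bit > p.bit:                  # downward pointer: bit# increases
            p, y = y, y.child[bit_of(key, y.bit)]
        return y

    def search(self, key: str) -> Optional[object]:
        if self.header is None:
            return None
        y = self._reach(key)
        return y.value if y.key == key else None   # the single key comparison

    def insert(self, key: str, value: object) -> None:
        if self.header is None:
            self.header = PNode(0, key, value)  # left pointer points to the header itself
            return
        y = self._reach(key)
        if y.key == key:
            y.value = value
            return
        # d = first bit# at which key and y.key differ.
        d = next(i for i in range(len(key)) if key[i] != y.key[i]) + 1
        # Walk down again; stop before the first node whose bit# is >= d,
        # or when the next pointer is a back pointer.
        parent, side = self.header, 0
        c = self.header.child[0]
        while c.bit > parent.bit and c.bit < d:
            parent, side = c, bit_of(key, c.bit)
            c = c.child[side]
        z = PNode(d, key, value)
        z.child[1 - bit_of(key, d)] = c     # other side: old subtrie or back pointer
        parent.child[side] = z              # z keeps a self pointer on its key's side
```

Inserting 0000101, 0000000, 0000010, 0001000, 0000100, and 0001010 in that order rebuilds the final Patricia from these slides: bit# 4 under the header, bit# 5 and bit# 6 below it, and bit# 6 and bit# 7 below the bit# 5 node.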
Delete
• Let p be the node that contains the
dictionary pair that is to be deleted.
• Case 1: p has one self pointer.
• Case 2: p has no self pointer.
p Has One Self Pointer
• p = header => trie is now empty.
– Set trie pointer to null.
• p != header => remove node p and set the pointer
to p in parent(p) to p's other (non-self) pointer.
[Figure: examples in which p holds 0001000 and 0000000.]
p Has No Self Pointer
• Let q be the node that has a back pointer to p.
• Node q was determined during the search for
the pair with the delete key k.
[Figure: p holds 0001000; node q, holding key y, has the back pointer to p. The blue pointer could be red or black.]
p Has No Self Pointer
[Figure: p (0001000), q (key y), and r (key z), where r has the back pointer to q.]
• Use the key y in node q to find the unique
node r that has a back pointer to node q.
p Has No Self Pointer
[Figure: the pair with key y is copied from q into p.]
• Copy the pair whose key is y to node p.
p Has No Self Pointer
[Figure: the back pointer in r that pointed to q now points to p.]
• Change the back pointer to q in node r to
point to node p.
p Has No Self Pointer
[Figure: parent(q) now points to q's remaining child.]
Node q has now been removed from the trie.
• Change the forward pointer to q in parent(q) so that it
points to the child of q (q's pointer other than the back pointer to p).
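The two delete cases can be sketched as follows. This is a self-contained, illustrative sketch (insert and search are included so it runs on its own), assuming equal-length '0'/'1' keys, bits numbered from 1 at the left, and bit# 0 for the header; `PNode`, `bit_of`, `Patricia`, and `_path` are hypothetical names, and `child[0]`/`child[1]` stand for LC and RC. `_path` records each node visited and the pointer taken from it, so its last entry is the node q whose back pointer reaches p, and the entry before any node is that node's parent.

```python
from typing import List, Optional, Tuple


class PNode:
    """Patricia node: bit#, child[0] = LC, child[1] = RC, and one dictionary pair."""

    def __init__(self, bit: int, key: str, value: object) -> None:
        self.bit, self.key, self.value = bit, key, value
        self.child: List["PNode"] = [self, self]


def bit_of(key: str, i: int) -> int:
    return int(key[i - 1])                  # bits are numbered from 1 at the left


class Patricia:
    def __init__(self) -> None:
        self.header: Optional[PNode] = None

    def _path(self, key: str) -> Tuple[List[Tuple[PNode, int]], PNode]:
        """(node, pointer index taken) for each node visited, and the node reached."""
        path = [(self.header, 0)]           # the header only uses its left pointer
        y = self.header.child[0]
        while y.bit > path[-1][0].bit:      # stop after following a back/self pointer
            side = bit_of(key, y.bit)
            path.append((y, side))
            y = y.child[side]
        return path, y

    def search(self, key: str) -> Optional[object]:
        if self.header is None:
            return None
        _, y = self._path(key)
        return y.value if y.key == key else None

    def insert(self, key: str, value: object) -> None:
        if self.header is None:
            self.header = PNode(0, key, value)
            return
        _, y = self._path(key)
        if y.key == key:
            y.value = value
            return
        d = next(i for i in range(len(key)) if key[i] != y.key[i]) + 1
        parent, side = self.header, 0
        c = self.header.child[0]
        while c.bit > parent.bit and c.bit < d:
            parent, side = c, bit_of(key, c.bit)
            c = c.child[side]
        z = PNode(d, key, value)
        z.child[1 - bit_of(key, d)] = c
        parent.child[side] = z

    def delete(self, key: str) -> bool:
        if self.header is None:
            return False
        path, p = self._path(key)
        if p.key != key:
            return False
        q, q_side = path[-1]                # q.child[q_side] is the back pointer to p
        if q is p:                          # Case 1: p has one self pointer
            if p is self.header:
                self.header = None          # the trie is now empty
            else:
                par, par_side = path[-2]    # parent(p) takes p's other pointer
                par.child[par_side] = p.child[1 - q_side]
            return True
        # Case 2: p has no self pointer.
        rpath, _ = self._path(q.key)        # the search for y = q.key ends at q
        r, r_side = rpath[-1]               # r holds the unique back pointer to q
        i = next(j for j, (node, _) in enumerate(rpath) if node is q)
        par, par_side = rpath[i - 1]        # parent(q)
        p.key, p.value = q.key, q.value     # copy the pair whose key is y into p
        r.child[r_side] = p                 # back pointer to q now points to p
        par.child[par_side] = q.child[1 - q_side]   # bypass q
        return True
```

When r turns out to be q itself (q's other pointer is its self pointer), the same steps still apply: redirecting that self pointer to p and then bypassing q leaves parent(q) with a back pointer to p, which now holds y.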
Tries
Definition
• A trie is an index structure that is
particularly useful when key values are of
varying size.
• A trie is a tree of degree m >= 2 in which the
branching at any level is determined not by
the entire key value, but by only a portion of
it.
• When a subtrie contains only one key value,
it is replaced by a node of type element.
• This trie contains two types of nodes: element
nodes and branch nodes.
– An element node has only a data member.
– A branch node contains pointers to subtries.
• Since we assume that each character is one of the
26 letters of the alphabet, a branch node has 27
pointer data members; the extra pointer is used
for the blank character, which is used to
terminate all keys.
Sampling Strategies
• The number of levels in the trie will depend on
the strategy or key sampling technique used to
determine the branching at each level.
• The goal is to make a trie with the fewest
levels.
Trie constructed for the data of the preceding trie,
sampling one character at a time from right to left.
An optimal trie for the data of the first trie; sampling on the
first level is done using the fourth character of the
key values.
• The key value may be interpreted as consisting of
digits using any radix we desire.
– Using a radix of 27^2 would result in two-character sampling.
• The maximum number of levels in a trie can be
kept low by adopting a different strategy for
element nodes.
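To make radix-based sampling concrete, the sketch below treats each character as a radix-27 digit (blank = 0, 'a' to 'z' = 1 to 26, an encoding assumed here) and combines `chars` consecutive characters into one digit of radix 27^chars; with `chars=2` this is the two-character sampling mentioned above. The names `char_digit` and `sample` are hypothetical, and keys are assumed to be lowercase words padded with blanks.

```python
def char_digit(ch: str) -> int:
    """Radix-27 digit of one character: blank -> 0, 'a'..'z' -> 1..26 (assumed encoding)."""
    return 0 if ch == " " else ord(ch) - ord("a") + 1


def sample(key: str, level: int, chars: int = 1) -> int:
    """Branching digit at `level` (0-based), sampling `chars` characters at a time
    from left to right. The radix is 27 ** chars; short keys are padded with blanks."""
    chunk = key[level * chars:(level + 1) * chars].ljust(chars)
    digit = 0
    for ch in chunk:
        digit = digit * 27 + char_digit(ch)
    return digit
```

For example, `sample("bluebird", 0, 2)` combines 'b' (2) and 'l' (12) into 2 * 27 + 12 = 66, so a branch node sampled this way would need 27^2 = 729 pointers.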
Number of levels is limited to 3; keys have been
sampled from left to right, one character at a time.
Insertion into a Trie
• Consider the first trie and insert into it
the keys bobwhite and bluejay.
– First, we search for bobwhite and find
that σ.link[‘o’] = 0. So bobwhite is not in the
trie, and we can insert it there.
– Next, we search for bluejay and find an
element node that contains bluebird.
– The keys bluebird and bluejay are sampled
until the sampling yields two different
values. This happens at the fifth letter.
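The insertion steps can be sketched as a 27-way trie that samples one character at a time from left to right. This is a minimal sketch, assuming keys are lowercase words without trailing blanks; `digit`, `Element`, `Branch`, and `Trie` are hypothetical names, and `None` plays the role of a 0 link.

```python
from typing import List, Optional, Union

RADIX = 27  # blank plus the 26 lowercase letters


def digit(key: str, i: int) -> int:
    """Character i of the key as a branch index, sampling left to right.
    Positions past the end read as blank (0), matching blank-terminated keys."""
    if i >= len(key) or key[i] == " ":
        return 0
    return ord(key[i]) - ord("a") + 1


class Element:
    def __init__(self, key: str) -> None:
        self.key = key


class Branch:
    def __init__(self) -> None:
        self.link: List[Optional[Union["Branch", Element]]] = [None] * RADIX


class Trie:
    def __init__(self) -> None:
        self.root = Branch()

    def search(self, key: str) -> bool:
        node: Optional[Union[Branch, Element]] = self.root
        i = 0
        while isinstance(node, Branch):
            node = node.link[digit(key, i)]
            i += 1
        return node is not None and node.key == key

    def insert(self, key: str) -> bool:
        """Insert key; return False if it is already present."""
        node, i = self.root, 0
        while True:
            d = digit(key, i)
            nxt = node.link[d]
            if nxt is None:                  # e.g. sigma.link['o'] == 0 for bobwhite
                node.link[d] = Element(key)
                return True
            if isinstance(nxt, Branch):
                node, i = nxt, i + 1
                continue
            if nxt.key == key:
                return False
            # An element node with a different key (bluebird when inserting bluejay):
            # add branch nodes while both keys sample the same character.
            old, i = nxt, i + 1
            cur = Branch()
            node.link[d] = cur
            while digit(key, i) == digit(old.key, i):
                deeper = Branch()
                cur.link[digit(key, i)] = deeper
                cur, i = deeper, i + 1
            cur.link[digit(key, i)] = Element(key)
            cur.link[digit(old.key, i)] = old
            return True
```

Inserting bluebird, bunting, bobwhite, and bluejay in that order places bluebird and bluejay under a branch node reached by b, l, u, e, since the two keys first differ at the fifth letter.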
The result of the insertions.
Deletion from a Trie
• From the trie just obtained, we now delete the two keys we inserted.
– First we delete bobwhite. To do this, we set σ.link[‘o’]
= 0. No other changes need to be made.
– Next, let us delete bluejay. This deletion leaves
only one key value in the subtrie δ3,
so node δ3 may be deleted and ρ can be
moved up one level.
– The same can be done for nodes δ2 and δ1.
– Finally, node σ is reached, and ρ can't be moved up any
further. So we set σ.link[‘l’] = ρ.
• To facilitate deletion from tries, it is useful to
add a count data member in each branch node.
(This data member contains the number of
children the node has.)
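A sketch of deletion that relies on such a count, assuming a 27-way trie over lowercase keys (no trailing blanks) that samples one character at a time from left to right; `digit`, `Element`, `Branch.set_link`, and `Trie` are hypothetical names. `set_link` keeps `count` equal to the number of non-null links, so the upward pass can test `count == 1` directly and only looks for the lone child when that test succeeds. The root branch node is never collapsed.

```python
from typing import List, Optional, Tuple, Union

RADIX = 27  # blank plus the 26 lowercase letters


def digit(key: str, i: int) -> int:
    if i >= len(key) or key[i] == " ":
        return 0                             # past the end reads as blank
    return ord(key[i]) - ord("a") + 1


class Element:
    def __init__(self, key: str) -> None:
        self.key = key


class Branch:
    def __init__(self) -> None:
        self.link: List[Optional[Union["Branch", Element]]] = [None] * RADIX
        self.count = 0                       # number of non-null links (children)

    def set_link(self, d: int, child) -> None:
        """Assign link d and keep count in step."""
        self.count += (child is not None) - (self.link[d] is not None)
        self.link[d] = child


class Trie:
    def __init__(self) -> None:
        self.root = Branch()

    def search(self, key: str) -> bool:
        node, i = self.root, 0
        while isinstance(node, Branch):
            node, i = node.link[digit(key, i)], i + 1
        return node is not None and node.key == key

    def insert(self, key: str) -> None:
        node, i = self.root, 0
        while isinstance(node.link[digit(key, i)], Branch):
            node, i = node.link[digit(key, i)], i + 1
        d, old = digit(key, i), node.link[digit(key, i)]
        if old is None:
            node.set_link(d, Element(key))
            return
        if old.key == key:
            return
        cur = Branch()
        node.set_link(d, cur)                # replaces old: count unchanged
        i += 1
        while digit(key, i) == digit(old.key, i):
            nxt = Branch()
            cur.set_link(digit(key, i), nxt)
            cur, i = nxt, i + 1
        cur.set_link(digit(key, i), Element(key))
        cur.set_link(digit(old.key, i), old)

    def delete(self, key: str) -> bool:
        path: List[Tuple[Branch, int]] = []  # (branch node, link index) from the root
        node, i = self.root, 0
        while isinstance(node, Branch):
            d = digit(key, i)
            path.append((node, d))
            node, i = node.link[d], i + 1
        if node is None or node.key != key:
            return False
        parent, d = path.pop()
        parent.set_link(d, None)             # deleting bobwhite stops here
        # While a non-root branch node is left with one child that is an element,
        # discard the branch node and move the element up one level.
        while path and parent.count == 1:
            only = next(c for c in parent.link if c is not None)
            if not isinstance(only, Element):
                break
            up, up_d = path.pop()
            up.set_link(up_d, only)          # branch replaced by element: count unchanged
            parent = up
        return True
```

With bluebird, bunting, bobwhite, and bluejay inserted, deleting bobwhite only clears a link, while deleting bluejay moves bluebird up until it hangs directly from the branch node reached by b, mirroring σ.link[‘l’] = ρ above.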