The tree data structure
Trees
COL 106
Acknowledgement: Many slides are courtesy of
Douglas Harder, UWaterloo
Trees
A rooted tree data structure stores information in nodes
– Similar to linked lists:
• There is a first node, or root
• Each node has a variable number of references to successors
(children)
• Each node, other than the root, has exactly one node as its
predecessor (or parent)
What are trees suitable for?
To store hierarchy of people
To store organization of departments
To capture the evolution of languages
To organize file-systems
Unix file system
Markup elements in a webpage
To store phylogenetic data
This will be our running example: we will illustrate tree
concepts using actual phylogenetic data.
Terminology
All nodes will have zero or more child nodes or children
– I has three children: J, K and L
For all nodes other than the root node, there is one
parent node
– H is the parent of I
Terminology
The degree of a node is defined as the number of its
children: deg(I) = 3
Nodes with the same parent are siblings
– J, K, and L are siblings
Terminology
Phylogenetic trees have nodes with degree 2 or 0:
Terminology
Nodes with degree zero are also called leaf nodes
All other nodes are said to be internal nodes, that is, they
are internal to the tree
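The terms degree, sibling, leaf and internal node can be sketched in a few lines of Java. The `Node` class and the helper names below are hypothetical, chosen only for illustration; the example builds just the part of the tree stated in the slides (H is the parent of I, and I has children J, K and L).

```java
import java.util.ArrayList;
import java.util.List;

class TerminologySketch {
    // Hypothetical node type: a label, a parent link, and child links.
    static class Node {
        final String label;
        Node parent;                                   // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        // Attach a child and return this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    static int degree(Node v)         { return v.children.size(); }
    static boolean isLeaf(Node v)     { return degree(v) == 0; }
    static boolean isInternal(Node v) { return !isLeaf(v); }

    // Siblings share a parent; a node is not counted as its own sibling here.
    static boolean areSiblings(Node a, Node b) {
        return a != b && a.parent != null && a.parent == b.parent;
    }

    // Builds H -> I -> {J, K, L} and returns I.
    static Node buildExample() {
        Node i = new Node("I").add(new Node("J")).add(new Node("K")).add(new Node("L"));
        new Node("H").add(i);
        return i;
    }

    public static void main(String[] args) {
        Node i = buildExample();
        System.out.println("deg(I) = " + degree(i));
        System.out.println("J, K siblings: " + areSiblings(i.children.get(0), i.children.get(1)));
        System.out.println("J is a leaf: " + isLeaf(i.children.get(0)));
    }
}
```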
Terminology
Leaf nodes:
Terminology
Internal nodes:
Terminology
These trees are equal if the order of the children is
ignored (unordered trees)
They are different if the order is relevant (ordered
trees)
– We will usually examine ordered trees (linear orders)
– In a hierarchical ordering, order is not relevant
Terminology
The shape of a rooted tree gives a natural
flow from the root node, or just root
Terminology
A path is a sequence of nodes
$(a_0, a_1, \ldots, a_n)$
where $a_{k+1}$ is a child of $a_k$
The length of this path is $n$
E.g., the path (B, E, G)
has length 2
Terminology
Paths of length 10 (11 nodes) and 4 (5 nodes)
Start of these paths
End of these paths
Terminology
For each node in a tree, there exists a unique path from
the root node to that node
The length of this path is the depth of the node, e.g.,
– E has depth 2
– L has depth 3
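Since the path from the root to a node is unique, the depth can be found by counting parent links. The sketch below assumes each node stores a reference to its parent; the `Node` class and the `chain` helper are hypothetical and build only the root-to-node paths mentioned in the slides.

```java
class DepthSketch {
    // Hypothetical node type: only the parent link matters for depth.
    static class Node {
        final String label;
        final Node parent;   // null for the root

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
        }
    }

    // Depth = length of the root-to-node path = number of parent links followed.
    static int depth(Node v) {
        int d = 0;
        while (v.parent != null) {
            v = v.parent;
            d++;
        }
        return d;
    }

    // Builds a single path root -> ... -> last label and returns the last node.
    static Node chain(String... labels) {
        Node v = null;
        for (String s : labels) {
            v = new Node(s, v);
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println("depth(E) = " + depth(chain("A", "B", "E")));
        System.out.println("depth(L) = " + depth(chain("A", "H", "I", "L")));
    }
}
```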
Terminology
Nodes of depth up to 17 (the figure marks depths 0, 14, and 17)
Terminology
The height of a tree is defined as the
maximum depth of any node within the
tree
The height of a tree with one node is 0
– Just the root node
For convenience, we define the height of
the empty tree to be –1
Terminology
The height of this tree is 17
Terminology
If a path exists from node a to node b:
– a is an ancestor of b
– b is a descendant of a
Every node has a path of length 0 to itself, so a
node is both an ancestor and a descendant of itself
– We can add the adjective strict to exclude
equality: a is a strict descendant of b if a is a
descendant of b but a ≠ b
The root node is an ancestor of all nodes
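One way to test the ancestor relation, assuming nodes keep a parent reference, is to walk upward from b until a is met or the root is passed. The class and method names below are hypothetical.

```java
class AncestorSketch {
    // Hypothetical node type with a parent link (null for the root).
    static class Node {
        final String label;
        final Node parent;

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
        }
    }

    // True if a path exists from a down to b; every node is its own ancestor.
    static boolean isAncestor(Node a, Node b) {
        for (Node v = b; v != null; v = v.parent) {
            if (v == a) {
                return true;
            }
        }
        return false;
    }

    // The strict variant excludes equality.
    static boolean isStrictAncestor(Node a, Node b) {
        return a != b && isAncestor(a, b);
    }

    public static void main(String[] args) {
        Node a = new Node("A", null);
        Node h = new Node("H", a);
        Node i = new Node("I", h);
        System.out.println(isAncestor(a, i));        // A is an ancestor of I
        System.out.println(isStrictAncestor(i, i));  // I is not a strict ancestor of itself
    }
}
```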
Terminology
The descendants of node B are B, C, D, E, F, and G:
The ancestors of node I are I, H, and A:
Terminology
All descendants (including itself) of the indicated node
Terminology
All ancestors (including itself) of the indicated node
Terminology
Another approach to a tree is to define the tree
recursively:
– A degree-0 node is a tree
– A node with degree n is a tree if it has n children and all
of its children are disjoint trees (i.e., with no intersecting
nodes)
Given any node a within a tree
with root r, the collection of a and
all of its descendants is said to
be a subtree of the tree with
root a
Example: XHTML
Consider the following XHTML document
<html>
<head>
<title>Hello World!</title>
</head>
<body>
<h1>This is a <u>Heading</u></h1>
<p>This is a paragraph with some
<u>underlined</u> text.</p>
</body>
</html>
Example: XHTML
The markup in this document identifies the title, a heading, the body
of the page, a paragraph, and underlined text.
Example: XHTML
The nested tags define a tree rooted at the HTML tag
Example: XHTML
Web browsers render this tree as a web page
Tree ADT
• Data: Nodes in the tree
• Generic methods:
  • integer size()
  • boolean isEmpty()
• Accessor methods:
  • node root()
  • node parent(p)
  • list<node> children(p)
• Query methods:
  • boolean isInternal(p)
  • boolean isLeaf(p)
  • boolean isRoot(p)
• Update methods:
  update methods may be defined by data structures
  implementing the Tree ADT (focus on access methods
  for now)
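The Tree ADT could be written as a Java interface roughly as follows. This is a sketch under assumptions: the type parameter `P`, the use of default methods for the query operations, and the `ParentArrayTree` class (which stores only a parent index per node) are illustrative choices, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Method names mirror the ADT; P is the node (position) type.
interface TreeADT<P> {
    int size();
    P root();
    P parent(P p);
    List<P> children(P p);

    // Query methods expressed through the accessors.
    default boolean isEmpty()       { return size() == 0; }
    default boolean isLeaf(P p)     { return children(p).isEmpty(); }
    default boolean isInternal(P p) { return !isLeaf(p); }
    default boolean isRoot(P p)     { return p.equals(root()); }
}

// Hypothetical implementation: node k is the integer k and
// parentOf[k] is the index of its parent (-1 for the root).
class ParentArrayTree implements TreeADT<Integer> {
    private final int[] parentOf;

    ParentArrayTree(int... parentOf) { this.parentOf = parentOf; }

    public int size() { return parentOf.length; }

    public Integer root() {
        for (int k = 0; k < parentOf.length; k++) {
            if (parentOf[k] == -1) return k;
        }
        return null;   // empty tree
    }

    public Integer parent(Integer p) {
        return parentOf[p] == -1 ? null : parentOf[p];
    }

    // Scans all nodes, so this takes O(n) time per call.
    public List<Integer> children(Integer p) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < parentOf.length; k++) {
            if (parentOf[k] == p) result.add(k);
        }
        return result;
    }
}
```

For instance, `new ParentArrayTree(-1, 0, 0, 1)` describes a root 0 with children 1 and 2, where node 3 is a child of node 1.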
A Linked Structure for General Trees
• A node is represented by an object storing
  • Element
  • Parent node
  • Sequence of children nodes
(Figure: linked structure for a small tree with nodes A, B, C, D, E, F.)
A Linked Structure for General Trees
import java.util.List;

class Node {
    Object element;       // data stored at this node
    Node parent;          // null for the root
    List<Node> children;  // or an array of Nodes
}
Tree using Array
• Each node contains a field for data and an array
of pointers to the children for that node
– Missing child will have null pointer
• Tree is represented by pointer to root
• Allows access to ith child in O(1) time
• Very wasteful in space when only a few nodes in the
tree have many children (most pointers are null)
Tree using Linked Lists
• Each node contains a field for data and pointer to
a list containing children for that node
– Missing child will have null pointer
• Tree is represented by pointer to root
• Allows access to the ith child in O(i) time
• Efficient in terms of space, but reaching a child takes
time proportional to its position in the list
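The two representations can be compared side by side. The sketch below is one possible realization: `ArrayNode` keeps a fixed-size array of child pointers, while `ListNode` keeps its children in a singly linked list through hypothetical `firstChild` and `nextSibling` fields. All names are assumptions made for illustration.

```java
class ChildAccessSketch {
    // Array representation: fixed capacity, null marks a missing child.
    static class ArrayNode {
        final String data;
        final ArrayNode[] children;

        ArrayNode(String data, int maxChildren) {
            this.data = data;
            this.children = new ArrayNode[maxChildren];
        }

        // O(1): direct index into the array.
        ArrayNode child(int i) { return children[i]; }
    }

    // Linked-list representation: children are chained through nextSibling.
    static class ListNode {
        final String data;
        ListNode firstChild;    // null if the node has no children
        ListNode nextSibling;   // null if this is the last child

        ListNode(String data) { this.data = data; }

        // Append a child at the end of the child list.
        void addChild(ListNode c) {
            if (firstChild == null) {
                firstChild = c;
                return;
            }
            ListNode last = firstChild;
            while (last.nextSibling != null) last = last.nextSibling;
            last.nextSibling = c;
        }

        // O(i): follow i sibling links; returns null if there is no ith child.
        ListNode child(int i) {
            ListNode c = firstChild;
            while (c != null && i-- > 0) c = c.nextSibling;
            return c;
        }
    }
}
```

The array version pays for every unused slot, whereas the list version stores exactly one link per child.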
Tree Traversals
• A traversal visits the nodes of a tree in a
systematic manner
• We will see three types of traversals
• Pre-order
• Post-order
• In-order
Flavors of (Depth First) Traversal
• In a preorder traversal, a node is visited before
its descendants
• In a postorder traversal, a node is visited after
its descendants
• In an inorder traversal a node is visited after
its left subtree and before its right subtree
Preorder Traversal
1. Process the root
2. Process the nodes in all subtrees in their order
Algorithm preOrder(v)
    visit(v)
    for each child w of v
        preOrder(w)
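The preOrder pseudocode translates almost line for line into Java. In this sketch, "visiting" a node means appending its label to a list; the `Node` class and the small example tree (A with children B, C, D, where B has children E, F) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PreorderSketch {
    // Hypothetical general-tree node with an ordered list of children.
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }
    }

    static void preOrder(Node v, List<String> out) {
        out.add(v.label);                 // visit(v) before any descendant
        for (Node w : v.children) {
            preOrder(w, out);
        }
    }

    static List<String> preOrder(Node root) {
        List<String> out = new ArrayList<>();
        if (root != null) preOrder(root, out);
        return out;
    }

    static Node example() {
        return new Node("A",
                new Node("B", new Node("E"), new Node("F")),
                new Node("C"),
                new Node("D"));
    }

    public static void main(String[] args) {
        System.out.println(preOrder(example()));   // prints [A, B, E, F, C, D]
    }
}
```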
Preorder Traversal
(Figure: example tree with nodes labeled P, M, L, S, E, R, A, A, T, E.)
Preorder traversal: node is visited before its descendants
Postorder traversal
1. Process the nodes in all subtrees in their order
2. Process the root
Algorithm postOrder(v)
    for each child w of v
        postOrder(w)
    visit(v)
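A Java rendering of postOrder, under the same assumptions as before: a hypothetical `Node` class with an ordered child list, and "visit" implemented as appending the label to a list.

```java
import java.util.ArrayList;
import java.util.List;

class PostorderSketch {
    // Hypothetical general-tree node with an ordered list of children.
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }
    }

    static void postOrder(Node v, List<String> out) {
        for (Node w : v.children) {
            postOrder(w, out);
        }
        out.add(v.label);                 // visit(v) after all descendants
    }

    static List<String> postOrder(Node root) {
        List<String> out = new ArrayList<>();
        if (root != null) postOrder(root, out);
        return out;
    }

    // A with children B, C, D; B has children E, F.
    static Node example() {
        return new Node("A",
                new Node("B", new Node("E"), new Node("F")),
                new Node("C"),
                new Node("D"));
    }

    public static void main(String[] args) {
        System.out.println(postOrder(example()));   // prints [E, F, B, C, D, A]
    }
}
```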
Postorder Traversal
(Figure: example tree with nodes labeled P, M, L, S, E, R, A, A, T, E.)
Postorder traversal: node is visited after its descendants
Inorder traversal
1. Process the nodes in the left subtree
2. Process the root
3. Process the nodes in the right subtree
Algorithm InOrder(v)
    if v is null then return
    InOrder(v->left)
    visit(v)
    InOrder(v->right)
For simplicity, we consider trees whose nodes have at most 2
children, though the idea can be generalized.
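For a binary tree, InOrder can be sketched in Java as follows. The `Node` class with `left` and `right` fields and the five-node example tree are hypothetical; a null reference stands for an empty subtree.

```java
import java.util.ArrayList;
import java.util.List;

class InorderSketch {
    // Hypothetical binary-tree node; null children mark empty subtrees.
    static class Node {
        final String label;
        final Node left, right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    static void inOrder(Node v, List<String> out) {
        if (v == null) return;      // empty subtree: nothing to visit
        inOrder(v.left, out);
        out.add(v.label);           // visit(v) between the two subtrees
        inOrder(v.right, out);
    }

    static List<String> inOrder(Node root) {
        List<String> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    // Root D; left subtree B with children A and C; right child E.
    static Node example() {
        Node b = new Node("B", new Node("A", null, null), new Node("C", null, null));
        return new Node("D", b, new Node("E", null, null));
    }

    public static void main(String[] args) {
        System.out.println(inOrder(example()));   // prints [A, B, C, D, E]
    }
}
```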
Inorder Traversal
(Figure: example tree with nodes labeled P, M, L, S, E, R, A, A, T, E.)
Inorder traversal: node is visited after its left subtree
and before its right subtree
Computing Height of Tree
Can be computed using the following idea:
1. The height of a leaf node is 0
2. The height of a node other than a leaf is 1 plus the
maximum of the heights of its left and right subtrees,
where an empty subtree has height -1
height(v) = max(height(v->left), height(v->right)) + 1
Details left as exercise.
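One possible solution to the exercise is sketched here in Java. It assumes a hypothetical binary `Node` class and uses the convention that the empty tree (a null reference) has height -1, so the leaf case needs no separate branch.

```java
class BinaryHeightSketch {
    // Hypothetical binary-tree node; null children mark empty subtrees.
    static class Node {
        final Node left, right;

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    // height(v) = max(height(left), height(right)) + 1, with height(empty) = -1.
    static int height(Node v) {
        if (v == null) return -1;
        return Math.max(height(v.left), height(v.right)) + 1;
    }

    public static void main(String[] args) {
        Node leaf = new Node(null, null);
        Node mid = new Node(leaf, null);
        Node root = new Node(mid, new Node(null, null));
        System.out.println(height(leaf));   // a single node has height 0
        System.out.println(height(root));   // longest root-to-leaf path has 2 edges
    }
}
```

Each node is examined once, so the computation takes time proportional to the number of nodes.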
More examples
Which traversal would you use if you:
1. Want to evaluate the depth of every node?
2. Are given a tree representing an arithmetic expression
and want to print it in postfix notation?
3. Are given the directory structure of files and want the
total memory usage at each node?
4. Are given the directory structure of files and want to
print the complete file name of each file?
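As an illustration of question 3, the total usage of a directory depends on the totals of its children, which suggests a postorder traversal. The sketch below is one way to do it; the `Entry` class, its `ownSize` field, the sample sizes, and the assumption that entry names are unique are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DiskUsageSketch {
    // Hypothetical file-system entry: a file (no children) or a directory.
    static class Entry {
        final String name;
        final long ownSize;                          // space used by the entry itself
        final List<Entry> children = new ArrayList<>();

        Entry(String name, long ownSize, Entry... kids) {
            this.name = name;
            this.ownSize = ownSize;
            for (Entry k : kids) children.add(k);
        }
    }

    // Postorder: the totals of all children are known before the parent's total.
    static long usage(Entry v, Map<String, Long> totals) {
        long total = v.ownSize;
        for (Entry w : v.children) {
            total += usage(w, totals);
        }
        totals.put(v.name, total);                   // "visit" after the subtrees
        return total;
    }

    // Returns the total per entry, in the order the entries were visited.
    static Map<String, Long> usage(Entry root) {
        Map<String, Long> totals = new LinkedHashMap<>();
        usage(root, totals);
        return totals;
    }

    static Entry example() {
        return new Entry("/", 1,
                new Entry("a", 10),
                new Entry("d", 2, new Entry("b", 20)));
    }

    public static void main(String[] args) {
        System.out.println(usage(example()));   // prints {a=10, b=20, d=22, /=33}
    }
}
```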
Binary Trees
Every node has degree up to 2.
Proper binary tree: each internal node has
degree exactly 2.
Binary Tree
• A binary tree is a tree with the following properties:
  • Each internal node has two children
  • The children of a node are an ordered pair
• We call the children of an internal node left child and right child
• Alternative recursive definition: a binary tree is either
  • a tree consisting of a single node, or
  • a tree whose root has an ordered pair of children, each of which
    is a disjoint binary tree
• Applications:
  • arithmetic expressions
  • decision processes
  • searching
(Figure: binary tree with nodes A, B, C, D, E, F, G, H, I.)
Arithmetic Expression Tree
• Binary tree associated with an arithmetic expression
• internal nodes: operators
• leaves: operands
• Example: arithmetic expression tree for the expression
(2 * (a - 1) + (3 * b))
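(Figure: the root is +; its children are the two * nodes. The left *
has children 2 and -, where - has children a and 1. The right * has
children 3 and b.)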
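The expression tree for (2 * (a - 1) + (3 * b)) can be built and processed with two short recursive methods. This is a sketch: the `Node` class, the choice of `double` arithmetic, and the variable bindings passed to `evaluateExample` are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ExpressionTreeSketch {
    // Internal nodes hold an operator; leaves hold a number or a variable name.
    static class Node {
        final String token;
        final Node left, right;

        Node(String token) { this(token, null, null); }

        Node(String token, Node left, Node right) {
            this.token = token;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Postorder traversal yields postfix notation.
    static String postfix(Node v) {
        if (v.isLeaf()) return v.token;
        return postfix(v.left) + " " + postfix(v.right) + " " + v.token;
    }

    // Evaluate both subtrees, then apply the operator stored at the node.
    static double evaluate(Node v, Map<String, Double> vars) {
        if (v.isLeaf()) {
            Double bound = vars.get(v.token);
            return bound != null ? bound : Double.parseDouble(v.token);
        }
        double l = evaluate(v.left, vars);
        double r = evaluate(v.right, vars);
        switch (v.token) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;
            default: throw new IllegalArgumentException("unknown operator " + v.token);
        }
    }

    // Tree for (2 * (a - 1) + (3 * b)).
    static Node example() {
        return new Node("+",
                new Node("*", new Node("2"), new Node("-", new Node("a"), new Node("1"))),
                new Node("*", new Node("3"), new Node("b")));
    }

    static double evaluateExample(double a, double b) {
        Map<String, Double> vars = new HashMap<>();
        vars.put("a", a);
        vars.put("b", b);
        return evaluate(example(), vars);
    }

    public static void main(String[] args) {
        System.out.println(postfix(example()));      // prints 2 a 1 - * 3 b * +
        System.out.println(evaluateExample(5, 2));   // 2 * (5 - 1) + 3 * 2 = 14.0
    }
}
```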
How many leaves L does a complete binary tree of
height h have?
(Here a complete binary tree has every level full.)
The number of nodes at depth $d$ is $2^d$.
If the height of the tree is $h$, all leaves are at depth $h$,
so it has $2^h$ leaves.
$L = 2^h$
What is the height h of a complete binary tree with
L leaves?
leaves = 1 height = 0
leaves = 2 height = 1
leaves = 4 height = 2
leaves = L height = $\log_2 L$
Since $L = 2^h$
$\log_2 L = \log_2 2^h$
$h = \log_2 L$
The number of internal nodes of a complete binary
tree of height h is ?
Internal nodes = 0 height = 0
Internal nodes = 1 height = 1
Internal nodes = 1 + 2 height = 2
Internal nodes = 1 + 2 + 4 height = 3
$1 + 2 + 2^2 + \cdots + 2^{h-1} = 2^h - 1$ (geometric series)
Thus, a complete binary tree of height $h$ has $2^h - 1$ internal
nodes.
The number of nodes n of a complete
binary tree of height h is ?
nodes = 1 height = 0
nodes = 3 height = 1
nodes = 7 height = 2
nodes = $2^{h+1} - 1$ height = h
Since $L = 2^h$
and since the number of internal nodes = $2^h - 1$, the
total number of nodes $n = 2^h + (2^h - 1) = 2 \cdot 2^h - 1 = 2^{h+1} - 1$.
If the number of nodes is n then what is the
height?
nodes = 1 height = 0
nodes = 3 height = 1
nodes = 7 height = 2
nodes = n height = $\log_2(n+1) - 1$
Since $n = 2^{h+1} - 1$
$n + 1 = 2^{h+1}$
$\log_2(n+1) = \log_2 2^{h+1}$
$\log_2(n+1) = h + 1$
$h = \log_2(n+1) - 1$
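The counting formulas for a binary tree with every level full can be checked numerically. The sketch below builds such a tree for small heights and counts its nodes directly; the class and method names are hypothetical, and `heightFromNodeCount` assumes that n + 1 is a power of two.

```java
class PerfectTreeCounts {
    // Hypothetical binary-tree node; null children mark empty subtrees.
    static class Node {
        Node left, right;
    }

    // Builds a binary tree of height h (h >= 0) in which every level is full.
    static Node build(int h) {
        Node v = new Node();
        if (h > 0) {
            v.left = build(h - 1);
            v.right = build(h - 1);
        }
        return v;
    }

    static int leaves(Node v) {
        if (v == null) return 0;
        if (v.left == null && v.right == null) return 1;
        return leaves(v.left) + leaves(v.right);
    }

    static int nodes(Node v) {
        return v == null ? 0 : 1 + nodes(v.left) + nodes(v.right);
    }

    // h = log2(n + 1) - 1; for a power of two, the exponent equals
    // the number of trailing zero bits.
    static int heightFromNodeCount(int n) {
        return Integer.numberOfTrailingZeros(n + 1) - 1;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 4; h++) {
            Node t = build(h);
            int n = nodes(t);
            int l = leaves(t);
            System.out.println("h=" + h + " leaves=" + l
                    + " internal=" + (n - l) + " nodes=" + n
                    + " height from n=" + heightFromNodeCount(n));
        }
    }
}
```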
What if the tree is not complete (but proper)?
For n nodes, the height lies between $\log_2(n+1) - 1$ and $(n-1)/2$,
i.e. roughly in the range $[\log_2 n, n/2]$
Number of leaves = Number of internal nodes + 1
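The tall extreme can be illustrated with a proper binary tree in which every internal node has one leaf child and one child that continues downward. The sketch below builds such a tree (the `spine` helper and all other names are hypothetical) and checks that leaves = internal nodes + 1 and that the height equals (n - 1)/2.

```java
class ProperTreeSketch {
    // Hypothetical binary-tree node; null children mark empty subtrees.
    static class Node {
        Node left, right;
    }

    // A proper binary tree of height h with as few nodes as possible:
    // each internal node has a leaf on the left and the rest of the spine on the right.
    static Node spine(int h) {
        Node v = new Node();
        if (h > 0) {
            v.left = new Node();
            v.right = spine(h - 1);
        }
        return v;
    }

    static int leaves(Node v) {
        if (v == null) return 0;
        if (v.left == null && v.right == null) return 1;
        return leaves(v.left) + leaves(v.right);
    }

    static int internal(Node v) {
        if (v == null || (v.left == null && v.right == null)) return 0;
        return 1 + internal(v.left) + internal(v.right);
    }

    static int height(Node v) {
        return v == null ? -1 : 1 + Math.max(height(v.left), height(v.right));
    }

    public static void main(String[] args) {
        Node t = spine(4);
        int n = leaves(t) + internal(t);
        System.out.println("n=" + n + " leaves=" + leaves(t)
                + " internal=" + internal(t) + " height=" + height(t));
    }
}
```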
BinaryTree ADT
• The BinaryTree ADT extends the Tree ADT, i.e., it inherits all the
methods of the Tree ADT
• Additional methods:
  • node leftChild(p)
  • node rightChild(p)
  • node sibling(p)
• Update methods may be defined by data structures implementing the
BinaryTree ADT
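The three additional methods are straightforward once each node stores its parent and its two children. The sketch below is one possible realization; the `Node` class, the setter names, and the choice to return null when no sibling exists are assumptions.

```java
class BinaryNodeSketch {
    // Hypothetical binary-tree node with parent, left and right links.
    static class Node {
        final String label;
        Node parent, left, right;

        Node(String label) { this.label = label; }

        void setLeft(Node c)  { left = c;  c.parent = this; }
        void setRight(Node c) { right = c; c.parent = this; }
    }

    static Node leftChild(Node p)  { return p.left; }
    static Node rightChild(Node p) { return p.right; }

    // The other child of p's parent; null for the root or if there is no other child.
    static Node sibling(Node p) {
        if (p.parent == null) return null;
        return p.parent.left == p ? p.parent.right : p.parent.left;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        Node c = new Node("C");
        a.setLeft(b);
        a.setRight(c);
        System.out.println(sibling(b).label);   // prints C
    }
}
```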