Greedy
Huffman codes
R. Inkulu
http://www.iitg.ac.in/rinkulu/
(Huffman codes)
1 / 15
Encoding symbols using bits
Given a set of symbols $S$, a code for $S$ is a one-to-one function
$\gamma : S \to N$, where each element of $N$ is a binary string. The codeword
of a symbol $x \in S$ is $\gamma(x)$.
A fixed-length code does not take the frequency of occurrence of
individual symbols into account; hence, it is not space-efficient.
A variable-length code improves space efficiency: assign longer
codewords to less frequently used symbols and shorter codewords to more frequent ones.
Prefix code
Difficulty in decoding text with an arbitrary variable-length code:
e.g., with $\gamma(a) = 0$, $\gamma(b) = 1$, $\gamma(c) = 01$, the string 01 decodes either as ab or as c.
A variable-length code in which no codeword is a prefix of another is
termed a prefix code.
e.g., with the code $\gamma_1(a) = 11$, $\gamma_1(b) = 01$, $\gamma_1(c) = 001$, $\gamma_1(d) = 10$, $\gamma_1(e) = 000$,
decoding 0010000011101 yields cecab.
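The decoding example above can be reproduced with a short sketch. The function names `is_prefix_code` and `decode` and the dictionary representation of a code are illustrative assumptions, not part of the slides.

```python
def is_prefix_code(code):
    """True if no codeword is a prefix of another (or equal to another)."""
    words = sorted(code.values())
    # After sorting, a codeword that prefixes any other also prefixes its successor.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def decode(bits, code):
    """Decode a bit string under a prefix code given as {symbol: codeword}."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # In a prefix code, the first codeword matched is the only candidate.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)


gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
ambiguous = {"a": "0", "b": "1", "c": "01"}

print(is_prefix_code(gamma1))            # True
print(is_prefix_code(ambiguous))         # False
print(decode("0010000011101", gamma1))   # cecab
```

Reading left to right and emitting a symbol as soon as a codeword matches is safe only because the code is a prefix code; with the ambiguous code the same loop would commit to ab and never consider c.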
Optimal prefix codes
Given a set of symbols $S$ with their frequencies of occurrence, $f_x$ for every
$x \in S$, determine a space-efficient prefix code that assigns a unique
codeword to each symbol $x$ in $S$.
The average number of bits required per letter (ABL) of a code $\gamma$ is
$\mathrm{ABL}(\gamma) = \sum_{x \in S} f_x \cdot |\gamma(x)|$.
Hence, the objective is to choose a prefix code that minimizes ABL.
Optimal prefix code example
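For $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$,
with a fixed-length code, ABL is 3
with $\gamma_1(a) = 11$, $\gamma_1(b) = 01$, $\gamma_1(c) = 001$, $\gamma_1(d) = 10$, $\gamma_1(e) = 000$,
$\mathrm{ABL}(\gamma_1) = 2.25$
with $\gamma_2(a) = 11$, $\gamma_2(b) = 10$, $\gamma_2(c) = 01$, $\gamma_2(d) = 001$, $\gamma_2(e) = 000$,
$\mathrm{ABL}(\gamma_2) = 2.23$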
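As a check on the figures above, the following sketch evaluates the ABL sum directly. It assumes $f_e = 0.05$, the value for which the frequencies sum to 1 and the stated ABLs of 2.25 and 2.23 are reproduced; the function name `abl` is illustrative.

```python
def abl(freq, code):
    """Average bits per letter: sum over symbols of f_x * |codeword of x|."""
    return sum(f * len(code[x]) for x, f in freq.items())


# Assumption: f_e = 0.05, so that the frequencies sum to 1.
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100"}
gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

print(round(abl(freq, fixed), 2))   # 3.0
print(round(abl(freq, gamma1), 2))  # 2.25
print(round(abl(freq, gamma2), 2))  # 2.23
```

The only difference between the two variable-length codes is that $\gamma_2$ gives the 2-bit codeword to c (frequency 0.20) rather than d (frequency 0.18), which accounts for the 0.02 improvement.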
Representing prefix codes using binary trees
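[Figure: three binary trees with leaves labeled a to e and edges labeled 0 (left) or 1 (right), representing the prefix codes
$\gamma(a) = 11$, $\gamma(b) = 10$, $\gamma(c) = 01$, $\gamma(d) = 001$, $\gamma(e) = 000$;
$\gamma(a) = 1$, $\gamma(b) = 011$, $\gamma(c) = 010$, $\gamma(d) = 001$, $\gamma(e) = 000$;
$\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$.]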
Consider a binary tree $T$ in which each leaf is labeled with a distinct
letter of $S$. For each symbol $x \in S$, follow the path from the root to the leaf
labeled $x$; each time the path goes from a node to its left (resp. right)
child, write down a 0 (resp. 1). The resulting bit string is the encoding of $x$.
The encoding of S constructed from T is a prefix code.
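The root-to-leaf reading of codewords can be sketched as follows. The nested-tuple representation (an internal node is a pair `(left, right)`, a leaf is a symbol string) and the name `codes_from_tree` are assumptions made for illustration, and the sketch assumes every internal node has two children.

```python
def codes_from_tree(tree, prefix=""):
    """Read codewords off a tree: left edge appends 0, right edge appends 1."""
    if isinstance(tree, str):          # leaf: the path so far is the codeword
        return {tree: prefix}
    left, right = tree                 # internal node, assumed to have two children
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes


# Tree whose leaves, left to right, are e, d, c, b, a.
tree = ((("e", "d"), "c"), ("b", "a"))
print(codes_from_tree(tree))
# {'e': '000', 'd': '001', 'c': '01', 'b': '10', 'a': '11'}
```

Because symbols sit only at leaves, no root-to-leaf path passes through another symbol's leaf, which is why the codewords produced this way cannot be prefixes of one another.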
Representing prefix codes using binary trees (cont)
Given a prefix code, we can build a binary tree recursively.
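One way to carry out this recursive construction is sketched below: split the codewords by their first bit, strip that bit, and recurse on each half. The name `tree_from_code` and the nested-tuple output (with `None` for a missing child) are illustrative assumptions.

```python
def tree_from_code(code):
    """Recursively build a nested-tuple tree from a prefix code {symbol: codeword}."""
    if "" in code.values():
        # An exhausted codeword marks a leaf; it must be alone in its subtree.
        if len(code) > 1:
            raise ValueError("not a prefix code")
        return next(iter(code))
    # Split on the first bit and strip it before recursing.
    left = {s: w[1:] for s, w in code.items() if w.startswith("0")}
    right = {s: w[1:] for s, w in code.items() if w.startswith("1")}
    return (tree_from_code(left) if left else None,
            tree_from_code(right) if right else None)


gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
print(tree_from_code(gamma1))
# ((('e', 'c'), 'b'), ('d', 'a'))
```

The `ValueError` branch is exactly the prefix condition: a codeword that runs out while other codewords continue below it would be a prefix of them.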
Objective in terms of binary trees
Constructing an optimal prefix code involves
- searching for a binary tree $T$
- labeling the leaves of $T$
so that together they minimize
$\mathrm{ABL}(T) = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.
[Figure: the three prefix-code trees for $S = \{a, b, c, d, e\}$ shown earlier.]
For $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$, the rightmost tree gives an optimal prefix code, with ABL = 2.23.
Optimal binary tree is full
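The binary tree corresponding to an optimal prefix code is full, i.e., every internal node has exactly two children.
Otherwise, an internal node with a single child could be replaced by that child, shortening every codeword below it without lengthening any other.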
Objective in terms of full binary trees
Constructing an optimal prefix code involves
- searching for a full binary tree $T$
- labeling the leaves of $T$
so that together they minimize
$\mathrm{ABL}(T) = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.
Labeling leaves of a given optimal full binary tree
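For any two leaves $u$ and $v$ with $\mathrm{depth}(u) < \mathrm{depth}(v)$ in an optimal full
binary tree $T$, the symbol associated with $u$ must be at least as frequent as
the symbol associated with $v$.
- proof using an exchange argument: if $u$ held a strictly less frequent symbol than $v$, swapping the two labels would change the ABL by $(\mathrm{depth}(u) - \mathrm{depth}(v)) \cdot (f_v - f_u) < 0$, where $f_u$ and $f_v$ denote the frequencies of the symbols at $u$ and $v$, contradicting optimality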
With the above in place, the assignment of symbols among leaves
of the same depth does not affect the ABL.
Algorithm to label the leaves of a given optimal full binary tree
- take the leaves of least depth and label them with the highest-frequency
symbols in any order
- take the leaves of the next least depth and label them with the next-highest-frequency
symbols in any order
- continue in this way until all leaves are labeled
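The labeling procedure above amounts to pairing leaves sorted by increasing depth with symbols sorted by decreasing frequency. The sketch below assumes the tree is summarized by the list of its leaf depths; the name `label_leaves` and the example frequencies (with $f_e = 0.05$) are illustrative.

```python
def label_leaves(leaf_depths, freq):
    """Assign each symbol a leaf depth: shallower leaves get more frequent symbols."""
    if len(leaf_depths) != len(freq):
        raise ValueError("need exactly one leaf per symbol")
    depths = sorted(leaf_depths)                             # shallowest first
    symbols = sorted(freq, key=freq.get, reverse=True)       # most frequent first
    return dict(zip(symbols, depths))


# Assumption: f_e = 0.05, so that the frequencies sum to 1.
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
depth_of = label_leaves([3, 3, 2, 2, 2], freq)
print(depth_of)  # {'a': 2, 'b': 2, 'c': 2, 'd': 3, 'e': 3}
print(round(sum(freq[x] * depth_of[x] for x in freq), 2))  # 2.23
```

Ties among equal depths or equal frequencies can be broken arbitrarily, since swapping labels between leaves of the same depth leaves the ABL unchanged.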
Observations to construct an optimal full binary tree
There is an optimal prefix code, with corresponding tree $T$, in which the
two lowest-frequency letters, say $y$ and $z$, are assigned to leaves that are
siblings in $T$.
Let $y$ and $z$ be the two lowest-frequency letters. Let $T'$ be a full binary
tree corresponding to an optimal prefix code for $S' = (S \setminus \{y, z\}) \cup \{w\}$ with
$f_w = f_y + f_z$. Also, let $T$ be the tree obtained by attaching leaves $y$ and $z$
as children of node $w$ of $T'$.
Then, $\mathrm{ABL}(T) = \mathrm{ABL}(T') + f_w$.
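The identity can be checked term by term. In the derivation below, $T'$ denotes the tree on the reduced alphabet in which $w$ is a leaf, and $T$ the tree obtained by attaching $y$ and $z$ as children of $w$, so that $\mathrm{depth}_T(y) = \mathrm{depth}_T(z) = 1 + \mathrm{depth}_{T'}(w)$ and every other symbol keeps its depth.

```latex
\begin{align*}
\mathrm{ABL}(T)
  &= \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x) \\
  &= f_y \cdot \mathrm{depth}_T(y) + f_z \cdot \mathrm{depth}_T(z)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_T(x) \\
  &= (f_y + f_z)\,\bigl(1 + \mathrm{depth}_{T'}(w)\bigr)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + f_w \cdot \mathrm{depth}_{T'}(w)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + \mathrm{ABL}(T').
\end{align*}
```

Since $f_w$ does not depend on the shape of the tree, minimizing the ABL over trees in which $y$ and $z$ are siblings is the same as minimizing it over trees for the reduced alphabet.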
Huffman algorithm
Recursively find the two symbols $y$, $z$ with the lowest frequencies and make them
sibling leaves of the binary tree $T$ under construction, before setting $S$ to
$(S \setminus \{y, z\}) \cup \{w\}$, where $w$ is a new symbol with $f_w = f_y + f_z$; stop when a single symbol remains.
The resulting set of codewords for all the symbols is known as the Huffman code.
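[Figure: Huffman tree for the frequencies a: 1/21, b: 2/21, c: 3/21, d: 6/21, e: 4/21, f: 5/21. Internal nodes carry the merged frequencies 3/21 (a, b), 6/21 (a, b, c), 12/21 (a, b, c, d), 9/21 (e, f) and 21/21 (root); left edges are labeled 0 and right edges 1.]
$\gamma(a) = 0000$, $\gamma(b) = 0001$, $\gamma(c) = 001$, $\gamma(d) = 01$, $\gamma(e) = 10$, $\gamma(f) = 11$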
Using a priority queue, the algorithm takes $O(|S| \lg |S|)$ time.
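A minimal sketch of the priority-queue version follows, using Python's `heapq` as the queue. It is written iteratively rather than recursively, assumes a non-empty alphabet whose symbols are strings, and uses integer counts out of 21 for the example; the name `huffman_code` and the tuple-based tree are illustrative choices.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Return {symbol: codeword} for a non-empty dict of symbol frequencies."""
    tiebreak = count()  # unique counter so tied frequencies never compare trees
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # single-symbol alphabet: give it a one-bit codeword
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two lowest-frequency nodes y, z into w with f_w = f_y + f_z.
        fy, _, y = heapq.heappop(heap)
        fz, _, z = heapq.heappop(heap)
        heapq.heappush(heap, (fy + fz, next(tiebreak), (y, z)))

    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            code[node] = prefix

    walk(heap[0][2], "")
    return code


counts = {"a": 1, "b": 2, "c": 3, "d": 6, "e": 4, "f": 5}  # out of 21
code = huffman_code(counts)
print({s: len(w) for s, w in sorted(code.items())})
# {'a': 4, 'b': 4, 'c': 3, 'd': 2, 'e': 2, 'f': 2}
```

Each of the $|S| - 1$ merges costs two pops and one push, each $O(\lg |S|)$, which gives the stated bound. The individual codewords may differ from those in the figure depending on which child is placed on the left, but the codeword lengths, and hence the ABL, are the same.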
Correctness
feasibility: every symbol gets a codeword, and since symbols label only the leaves of $T$, the result is a prefix code
optimality: by induction on the size of the alphabet, using the two observations on constructing an optimal full binary tree