
Huffman Codes

The document discusses Huffman codes, which are a type of optimal prefix variable-length code used for lossless data compression. It describes how Huffman codes are constructed by building a binary tree from the frequency of symbols, with more frequent symbols placed higher in the tree and assigned shorter codewords. The algorithm works by recursively combining the two least frequent symbols into a new node, until a full binary tree is constructed, allowing generation of an optimal prefix code that minimizes the average codeword length.

Uploaded by

Nikhil Yadala

Greedy

Huffman codes

R. Inkulu
http://www.iitg.ac.in/rinkulu/

(Huffman codes)

1 / 15

Encoding symbols using bits

Given a set of symbols S, a code for S is a one-to-one function $\gamma : S \to N$, where each element of N is a binary string. The codeword of a symbol $x \in S$ is $\gamma(x)$.

A fixed-length code does not take the frequency of occurrence of individual symbols into account; hence, it is not space-efficient.

A variable-length code helps in improving the space-efficiency: assign longer codewords to less frequently used symbols and shorter codewords to more frequently used ones.


Prefix code

Decoding text that was encoded with an arbitrary variable-length code can be ambiguous:
ex. how to decode 01 when $\gamma(a) = 0$, $\gamma(b) = 1$, $\gamma(c) = 01$? It could be read as either ab or c.

A variable-length code in which no codeword is a prefix of another is termed a prefix code.
ex. with the code $\gamma_1(a) = 11$, $\gamma_1(b) = 01$, $\gamma_1(c) = 001$, $\gamma_1(d) = 10$, $\gamma_1(e) = 000$, decoding 0010000011101 yields cecab
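
The decoding example can be reproduced with a short sketch. The function name `decode_prefix` and the dictionary representation of the code are assumptions made for illustration; they are not part of the slides.

```python
def decode_prefix(bits, code):
    """Decode a bit string using a prefix code given as {symbol: codeword}."""
    by_codeword = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        # Prefix property: the first codeword that matches is the only one that can.
        if buf in by_codeword:
            out.append(by_codeword[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
print(decode_prefix("0010000011101", gamma1))  # cecab
```

Because no codeword is a prefix of another, the decoder never needs to look ahead or backtrack.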


Optimal prefix codes

Given a set of symbols S with their frequencies of occurrence, $f_x$ for every $x \in S$, determine a space-efficient prefix code that assigns a unique codeword to each symbol x in S.

The average number of bits required per letter (ABL) is $\sum_{x \in S} f_x \cdot |\gamma(x)|$.

Hence, the objective is to choose a prefix code that minimizes the ABL.


Optimal prefix code example

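For $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$:

with a fixed-length code, ABL is 3
with $\gamma_1(a) = 11$, $\gamma_1(b) = 01$, $\gamma_1(c) = 001$, $\gamma_1(d) = 10$, $\gamma_1(e) = 000$, $\mathrm{ABL}(\gamma_1) = 2.25$
with $\gamma_2(a) = 11$, $\gamma_2(b) = 10$, $\gamma_2(c) = 01$, $\gamma_2(d) = 001$, $\gamma_2(e) = 000$, $\mathrm{ABL}(\gamma_2) = 2.23$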
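
The two ABL values can be checked numerically. The sketch below assumes $f_e = 0.05$, which is the value for which the frequencies sum to 1 and for which the stated results 2.25 and 2.23 come out; the helper name `abl` is hypothetical.

```python
def abl(freq, code):
    """Average bits per letter: sum of f_x * |gamma(x)| over all symbols x."""
    return sum(f * len(code[x]) for x, f in freq.items())


# Assumption: f_e = 0.05, so that the frequencies sum to 1.
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

print(round(abl(freq, gamma1), 2))  # 2.25
print(round(abl(freq, gamma2), 2))  # 2.23
```

The second code is better because it gives the 3-bit codewords to the two least frequent symbols, d and e.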


Representing prefix codes using binary trees


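[Figure: three binary trees with leaves labeled a to e; each edge to a left child is labeled 0 and each edge to a right child is labeled 1. The corresponding prefix codes are:]

$\gamma(a) = 11$, $\gamma(b) = 10$, $\gamma(c) = 01$, $\gamma(d) = 001$, $\gamma(e) = 000$
$\gamma(a) = 1$, $\gamma(b) = 011$, $\gamma(c) = 010$, $\gamma(d) = 001$, $\gamma(e) = 000$
$\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$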

Consider a binary tree T in which each leaf is labeled with a distinct letter of S. For each symbol $x \in S$, follow the path from the root to the leaf labeled x; each time the path goes from a node to its left (resp. right) child, write down a 0 (resp. 1). The resulting bit string is the encoding of x.

The encoding of S constructed from T is a prefix code: since symbols appear only at leaves, no leaf lies on the path from the root to another leaf.
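
A minimal sketch of reading a code off a tree follows. It assumes, purely for illustration, that a tree is written as nested `(left, right)` tuples with symbols (strings) at the leaves; the function name `codes_from_tree` is hypothetical.

```python
def codes_from_tree(tree, prefix=""):
    """Return {symbol: codeword}; left edges contribute '0', right edges '1'."""
    if isinstance(tree, str):          # a leaf: the path so far is its codeword
        return {tree: prefix}
    left, right = tree
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes


# Tree for the code a=11, b=10, c=01, d=001, e=000.
tree = ((("e", "d"), "c"), ("b", "a"))
print(codes_from_tree(tree))
# {'e': '000', 'd': '001', 'c': '01', 'b': '10', 'a': '11'}
```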


Representing prefix codes using binary trees (cont)

[Figure: the same three binary trees and prefix codes as on the previous slide.]

Given a prefix code, we can build a binary tree recursively.
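
One way to make the recursion concrete: split the codewords by their first bit, strip that bit, and recurse on each half. The sketch below assumes the input is a valid prefix code given as a dictionary, and it returns nested `(left, right)` tuples; both the representation and the name `tree_from_code` are assumptions for illustration.

```python
def tree_from_code(code):
    """Build nested (left, right) tuples from a prefix code {symbol: codeword}."""
    if "" in code.values():
        if len(code) > 1:
            raise ValueError("not a prefix code: one codeword is a prefix of another")
        return next(iter(code))        # a single symbol with empty remainder: a leaf
    # Codewords starting with 0 go to the left subtree, those starting with 1 to the right.
    left = {s: w[1:] for s, w in code.items() if w.startswith("0")}
    right = {s: w[1:] for s, w in code.items() if w.startswith("1")}
    return (tree_from_code(left) if left else None,
            tree_from_code(right) if right else None)


gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}
print(tree_from_code(gamma2))
# ((('e', 'd'), 'c'), ('b', 'a'))
```

A `None` child marks a missing subtree, which can only occur when the tree is not full.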


Objective in terms of binary trees


Constructing an optimal prefix code involves
- searching for a binary tree T
- labeling the leaves of T

so that together they minimize $\mathrm{ABL} = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.

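[Figure: three binary trees with leaves labeled a to e, corresponding to the prefix codes:]

$\gamma(a) = 11$, $\gamma(b) = 10$, $\gamma(c) = 01$, $\gamma(d) = 001$, $\gamma(e) = 000$
$\gamma(a) = 1$, $\gamma(b) = 011$, $\gamma(c) = 010$, $\gamma(d) = 001$, $\gamma(e) = 000$
$\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$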

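For $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$, the tree for the code $\gamma(a) = 11$, $\gamma(b) = 10$, $\gamma(c) = 01$, $\gamma(d) = 001$, $\gamma(e) = 000$ gives an optimal prefix code (ABL = 2.23).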


Optimal binary tree is full

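The binary tree corresponding to an optimal prefix code is full, i.e., every internal node has exactly two children. Otherwise, an internal node with a single child could be replaced by that child, which shortens at least one codeword without lengthening any other.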


Objective in terms of full binary trees

Constructing an optimal prefix code involves
- searching for a full binary tree T
- labeling the leaves of T

so that together they minimize $\mathrm{ABL} = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.


Labeling leaves of a given optimal full binary tree

For any two leaves u and v with depth(u) < depth(v) in an optimal full binary tree T, the symbol associated with u must be at least as frequent as the symbol associated with v.
- proof using an exchange argument: if the symbol at u were strictly less frequent than the symbol at v, swapping the two labels would strictly decrease the ABL

With the above in place, the choice of assignment of symbols among leaves of the same depth does not affect the ABL.


Algorithm to label the leaves of a given optimal full binary tree

take the leaves of least depth and label them with the highest-frequency symbols, in any order

take the leaves of the next least depth and label them with the highest-frequency symbols among those remaining, in any order

and so on, until every leaf is labeled.
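
The labeling procedure amounts to pairing leaves sorted by increasing depth with symbols sorted by decreasing frequency. In the sketch below, leaves are identified by their root-to-leaf bit paths and frequencies are integer counts out of 21; these choices and the name `label_leaves` are assumptions for illustration.

```python
def label_leaves(leaf_depths, freq):
    """Assign symbols to leaves: shallowest leaves get the most frequent symbols.

    leaf_depths: {leaf_id: depth}, freq: {symbol: frequency}.
    Returns {leaf_id: symbol}.
    """
    if len(leaf_depths) != len(freq):
        raise ValueError("need exactly one leaf per symbol")
    leaves = sorted(leaf_depths, key=leaf_depths.get)        # increasing depth
    symbols = sorted(freq, key=freq.get, reverse=True)       # decreasing frequency
    return dict(zip(leaves, symbols))


# Hypothetical tree shape: each leaf is named by its path, so its depth is the path length.
paths = ["0000", "0001", "001", "01", "10", "11"]
depths = {p: len(p) for p in paths}
freq = {"a": 1, "b": 2, "c": 3, "d": 6, "e": 4, "f": 5}      # counts out of 21

labels = label_leaves(depths, freq)
total = sum(freq[sym] * depths[leaf] for leaf, sym in labels.items())
print(total)  # 51, i.e. ABL = 51/21
```

Ties within a depth level are broken arbitrarily here, which is harmless because swapping symbols between leaves of equal depth leaves the ABL unchanged.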


Observations to construct an optimal full binary tree

There is an optimal prefix code, with corresponding tree T, in which the two lowest-frequency letters, say y and z, are assigned to leaves that are siblings in T.

Let y and z be the two lowest-frequency letters. Let $T'$ be a full binary tree corresponding to an optimal prefix code for $(S \setminus \{y, z\}) \cup \{w\}$ with $f_w = f_y + f_z$. Also, let T be the tree obtained by attaching leaves y and z as children of node w of $T'$.
Then, $\mathrm{ABL}(T) = \mathrm{ABL}(T') + f_w$.
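
The identity can be verified in a few lines. The derivation below is a sketch that writes $T'$ for the tree on the reduced alphabet (with leaf w) and $T$ for the tree obtained by hanging y and z below w; it uses only $f_w = f_y + f_z$ and $\mathrm{depth}_T(y) = \mathrm{depth}_T(z) = \mathrm{depth}_{T'}(w) + 1$.

```latex
\begin{align*}
\mathrm{ABL}(T)
  &= \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x) \\
  &= f_y \cdot \mathrm{depth}_T(y) + f_z \cdot \mathrm{depth}_T(z)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= (f_y + f_z)\,\bigl(\mathrm{depth}_{T'}(w) + 1\bigr)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + f_w \cdot \mathrm{depth}_{T'}(w)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + \mathrm{ABL}(T').
\end{align*}
```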


Huffman algorithm
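Recursively find the two symbols y, z with lowest frequency and make them siblings in the binary tree T to be constructed, before setting S to $(S \setminus \{y, z\}) \cup \{w\}$ with $f_w = f_y + f_z$. The resulting set of codewords for all the symbols is known as the Huffman code.

[Figure: Huffman tree for the frequencies a: 1/21, b: 2/21, c: 3/21, d: 6/21, e: 4/21, f: 5/21. Internal nodes carry the weights 3/21 (a and b), 6/21 (that node and c), 12/21 (that node and d), 9/21 (e and f), and 21/21 (root); edges are labeled 0 and 1.]

$\gamma(a) = 0000$, $\gamma(b) = 0001$, $\gamma(c) = 001$, $\gamma(d) = 01$, $\gamma(e) = 10$, $\gamma(f) = 11$

Using a priority queue, the algorithm takes $O(|S| \lg |S|)$ time.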
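
A minimal sketch of the priority-queue version, using Python's `heapq`, is given below. It is written iteratively rather than recursively, assumes that symbols are strings, and represents the tree as nested tuples; the name `huffman_code` and the tie-breaking counter are assumptions for illustration. Ties may be broken differently than in the figure, so individual bits can differ while the codeword lengths agree.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Return a Huffman code {symbol: codeword} for {symbol: frequency}."""
    if len(freq) == 1:                      # degenerate alphabet: one 1-bit codeword
        return {next(iter(freq)): "0"}
    tie = count()                           # unique tie-breaker so nodes are never compared
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)      # the two lowest-frequency nodes ...
        fz, _, z = heapq.heappop(heap)
        # ... become siblings under a new node w with f_w = f_y + f_z.
        heapq.heappush(heap, (fy + fz, next(tie), (y, z)))
    root = heap[0][2]

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):         # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                               # leaf: a symbol
            codes[node] = prefix

    assign(root, "")
    return codes


freq = {"a": 1, "b": 2, "c": 3, "d": 6, "e": 4, "f": 5}     # counts out of 21
codes = huffman_code(freq)
print({s: len(w) for s, w in sorted(codes.items())})
# {'a': 4, 'b': 4, 'c': 3, 'd': 2, 'e': 2, 'f': 2}
```

Building the heap is linear, and each of the $|S| - 1$ merges performs a constant number of heap operations costing $O(\lg |S|)$ each, which gives the stated bound.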



Correctness

feasibility: every symbol gets a codeword, and since the symbols sit at the leaves of a binary tree, the code is a prefix code

optimality: by induction on the size of the alphabet
