Huffman Coding
Applications
❑ Compression technique (image, audio, text)
❑ Reduce size of data
❑ Fax Machines
Encoding Messages
❑ Codes used by computer systems
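▪ ASCII
✔ stored as 8 bits per character (standard ASCII defines 128 codes in 7 bits)
✔ 8 bits can encode 256 characters
▪ Unicode
✔ 16 bits per character in its original fixed-width form (UCS-2)
✔ 16 bits can encode 65536 characters
❑ ASCII and 16-bit Unicode are fixed-length codes
▪ all characters are represented by the same number of bits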
Problems
❑ Suppose that we want to encode a message constructed
from the symbols A, B, C, D, and E using a fixed-length
code.
▪ How many bits are required to encode each symbol?
✔ at least 3 bits are required
✔ 2 bits are not enough (can only encode four
symbols)
▪ How many bits are required to encode the message
DEAACAAAAABA?
✔ there are twelve symbols, each requires 3 bits
✔ 12*3 = 36 bits are required
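The arithmetic above can be checked with a short Python sketch. The helper name `fixed_length_bits` is an illustrative choice, not something from the slides, and it assumes every symbol receives the same number of bits.

```python
def fixed_length_bits(num_symbols: int) -> int:
    """Smallest bit width that gives each symbol a distinct fixed-length code."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without floating point.
    return max(1, (num_symbols - 1).bit_length())


alphabet = "ABCDE"
message = "DEAACAAAAABA"

bits_per_symbol = fixed_length_bits(len(alphabet))
total_bits = bits_per_symbol * len(message)
print(bits_per_symbol, total_bits)  # prints: 3 36
```

With four symbols the same helper returns 2, which is why a fifth symbol forces the jump to 3 bits.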
Drawbacks of fixed-length codes
❑ Wasted space
▪ 16-bit Unicode uses twice as much space as ASCII, which is
inefficient for plain-text messages containing only
ASCII characters
❑ Same number of bits used to represent all characters
▪ ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’
❑ Potential solution: use variable-length codes
▪ variable number of bits to represent characters when
frequency of occurrence is known
▪ short codes for characters that occur frequently
Purpose of Huffman Coding
❑ Proposed by Dr. David A. Huffman in 1952
– “A Method for the Construction of Minimum-Redundancy Codes”
❑ Applicable to many forms of data transmission
– Our example: text files
The Basic Algorithm
❑ Code word lengths are no longer fixed like ASCII.
❑ Code word lengths vary and will be shorter for the
more frequently used characters.
Building a Tree
Scan the original text
❑ Consider the following short text:
Eerie eyes seen near lake.
❑ Count up the occurrences of all characters in the text
❑ What characters are present?
▪ E e r i space y s n a l k .
❑ What is the frequency of each character in the text?
Char    Freq.    Char    Freq.    Char    Freq.
E       1        y       1        k       1
e       8        s       2        .       1
r       2        n       2
i       1        a       2
space   4        l       1
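The counting step can be sketched in Python with `collections.Counter` from the standard library. Treating upper- and lower-case letters as different symbols is taken from the table above (E and e are counted separately); the variable names are illustrative.

```python
from collections import Counter

text = "Eerie eyes seen near lake."

# Counter maps each distinct character to its number of occurrences.
freq = Counter(text)

# List characters from least to most frequent, as the priority queue will need.
for char, count in sorted(freq.items(), key=lambda item: (item[1], item[0])):
    print(repr(char), count)
```

The counts add up to the length of the text, which is a quick sanity check on any frequency table.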
Building a Tree
Prioritize characters
❑ Create binary tree nodes with character and
frequency of each character
❑ Place nodes in a priority queue
– The lower the occurrence, the higher the priority
in the queue
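One way to get a queue in which the lowest frequency has the highest priority is a min-heap. The sketch below uses Python's `heapq`; storing each node as a `(frequency, insertion_number, symbol)` tuple is an assumption made here so that equal frequencies are served in insertion order, not something the slides prescribe.

```python
import heapq
from collections import Counter
from itertools import count

text = "Eerie eyes seen near lake."

tie_breaker = count()  # increasing counter: orders nodes that share a frequency
queue = [(freq, next(tie_breaker), char) for char, freq in Counter(text).items()]
heapq.heapify(queue)  # smallest (frequency, insertion_number) pair comes out first

first = heapq.heappop(queue)
second = heapq.heappop(queue)
print(first[2], second[2])  # prints: E i
```

Because the insertion number is unique, two tuples never compare beyond their second field, so the stored symbol (or, later, a subtree) never needs to be comparable.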
Building a Tree
∙ The queue after inserting all nodes
E i y l k . r s n a sp e
1 1 1 1 1 1 2 2 2 2 4 8
Building a Tree
∙ Dequeue the two lowest-frequency nodes, E (1) and i (1)
∙ Join them under a new node of weight 1 + 1 = 2 and enqueue it
∙ Queue (subtrees in parentheses): y:1 l:1 k:1 .:1 r:2 s:2 n:2 a:2 (E i):2 sp:4 e:8
Building a Tree
∙ Dequeue y (1) and l (1); enqueue their parent with weight 2
∙ Queue (subtrees in parentheses): k:1 .:1 r:2 s:2 n:2 a:2 (E i):2 (y l):2 sp:4 e:8
Building a Tree
∙ Dequeue k (1) and . (1); enqueue their parent with weight 2
∙ Queue (subtrees in parentheses): r:2 s:2 n:2 a:2 (E i):2 (y l):2 (k .):2 sp:4 e:8
Building a Tree
∙ Dequeue r (2) and s (2); enqueue their parent with weight 4
∙ Queue (subtrees in parentheses): n:2 a:2 (E i):2 (y l):2 (k .):2 sp:4 (r s):4 e:8
Building a Tree
∙ Dequeue n (2) and a (2); enqueue their parent with weight 4
∙ Queue (subtrees in parentheses): (E i):2 (y l):2 (k .):2 sp:4 (r s):4 (n a):4 e:8
Building a Tree
∙ Dequeue the subtrees (E i):2 and (y l):2; enqueue their parent with weight 4
∙ Queue (subtrees in parentheses): (k .):2 sp:4 (r s):4 (n a):4 ((E i)(y l)):4 e:8
Building a Tree
∙ Dequeue the subtree (k .):2 and sp (4); enqueue their parent with weight 6
∙ Queue (subtrees in parentheses): (r s):4 (n a):4 ((E i)(y l)):4 ((k .) sp):6 e:8
Building a Tree
∙ Dequeue the subtrees (r s):4 and (n a):4; enqueue their parent with weight 8
∙ Queue (subtrees in parentheses): ((E i)(y l)):4 ((k .) sp):6 e:8 ((r s)(n a)):8
Building a Tree
∙ Dequeue the subtrees ((E i)(y l)):4 and ((k .) sp):6; enqueue their parent with weight 10
∙ Queue (subtrees in parentheses): e:8 ((r s)(n a)):8 (((E i)(y l))((k .) sp)):10
Building a Tree
∙ Dequeue e (8) and the subtree ((r s)(n a)):8; enqueue their parent with weight 16
∙ Queue (subtrees in parentheses): (((E i)(y l))((k .) sp)):10 (e ((r s)(n a))):16
Building a Tree
∙ Dequeue the last two nodes, 10 and 16; their parent has weight 10 + 16 = 26
∙ Only one node remains in the queue: it is the root of the Huffman tree
Building a Tree
•This tree contains the new code 26
words for each character.
16
•Frequency of root node should 10
equal number of characters in
4 e 8
text. 6 8
2 2 2 sp 4 4
4
E i y l k .
1 1 1 1 1 1 r s n a
2 2 2 2
Eerie eyes seen near lake. 26
characters
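The merge loop shown on the preceding slides can be sketched as follows. This is a minimal Python version under two assumptions that are not stated in the slides: a leaf is represented by its character and an internal node by a `(left, right)` tuple, and ties between equal weights are broken by insertion order. The function name `build_huffman_tree` is illustrative.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Return (root_weight, tree); a tree is a character or a (left, right) pair."""
    tie_breaker = count()  # newer nodes queue behind older nodes of equal weight
    heap = [(freq, next(tie_breaker), char) for char, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two lowest-weight nodes and join them under a new parent.
        weight1, _, left = heapq.heappop(heap)
        weight2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (weight1 + weight2, next(tie_breaker), (left, right)))
    root_weight, _, tree = heap[0]
    return root_weight, tree


root_weight, tree = build_huffman_tree("Eerie eyes seen near lake.")
print(root_weight)  # prints: 26
print(tree)
```

With this tie-breaking rule the sketch performs the merges in the same order as the slides. A different rule for ties can produce a differently shaped tree, but the total encoded length stays the same.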
Encoding the File
Traverse Tree for Codes
∙ Perform a traversal of the tree to obtain new code words.
∙ Going left is a 0; going right is a 1.
∙ A code word is only completed when a leaf node is reached.
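The traversal can be sketched as a short recursive function. The nested-tuple layout below is an assumed representation (a leaf is a character, an internal node is a `(left, right)` pair) that mirrors the tree built for "Eerie eyes seen near lake."; `assign_codes` is an illustrative name.

```python
def assign_codes(tree, prefix=""):
    """Collect code words: going left appends '0', going right appends '1'."""
    if isinstance(tree, str):
        # A leaf was reached, so the code word for this character is complete.
        return {tree: prefix}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


# The finished tree: left subtree has weight 10, right subtree has weight 16.
tree = (
    ((("E", "i"), ("y", "l")), (("k", "."), " ")),
    ("e", (("r", "s"), ("n", "a"))),
)

codes = assign_codes(tree)
for char, code in codes.items():
    print(repr(char), code)
```

Since code words end only at leaves, no code word is a prefix of another, which is what lets the bit stream be read back without separators.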
Encoding the File
Traverse Tree for Codes
Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111
Encoding the File
∙ Rescan text and encode file using new code words
Eerie eyes seen near lake.
∙ Encoded bits, one line per word (each of the first four lines ends with the 011 of the space that follows the word):
0000101100000110011
100010101101011
110110101110011
11101011111100011
001111110100100101
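Encoding is a table lookup per character followed by concatenation. The Python sketch below hard-codes the code table for this text as a dictionary; the variable names are illustrative.

```python
codes = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}

text = "Eerie eyes seen near lake."

# Replace every character by its code word and join the pieces.
encoded = "".join(codes[char] for char in text)
print(encoded)
print(len(encoded))  # prints: 84
```

The length follows from the frequencies: 6 characters occurring once at 4 bits, 4 spaces at 3 bits, 8 occurrences of e at 2 bits, and 8 occurrences of r, s, n, a at 4 bits.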
Encoding the File
Results
∙ Have we made things any better?
∙ 84 bits to encode the text (6*4 + 4*3 + 8*2 + 8*4 = 24 + 12 + 16 + 32)
∙ ASCII would take 8 * 26 = 208 bits
∙ A fixed-length code tailored to the 12 distinct characters needs
4 bits per character: 4 * 26 = 104 bits
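The three totals can be recomputed from the frequencies and the code lengths. In this Python sketch the `code_length` dictionary is read off the code table (e uses 2 bits, space uses 3, every other character 4); it counts message bits only and assumes the code table itself is stored or agreed on separately.

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Code lengths taken from the Huffman code table for this text.
code_length = {**dict.fromkeys("Eiylk.rsna", 4), " ": 3, "e": 2}

huffman_bits = sum(freq[char] * code_length[char] for char in freq)
ascii_bits = 8 * len(text)
# Fixed-length code sized for the 12 distinct characters: ceil(log2(12)) = 4 bits.
fixed_bits = max(1, (len(freq) - 1).bit_length()) * len(text)

print(huffman_bits, fixed_bits, ascii_bits)  # prints: 84 104 208
```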
Example
Build the Huffman coding tree for the message
This is his message
Character frequencies (case ignored; _ stands for a space)
A G M T E H _ I S
1 1 1 1 2 2 3 3 5
Step 1
∙ Join A (1) and G (1) into a node of weight 2
Step 2
∙ Join M (1) and T (1) into a node of weight 2
Step 3
∙ Join E (2) and H (2) into a node of weight 4
Step 4
∙ Join the (A G) and (M T) nodes into a node of weight 2 + 2 = 4
Step 5
∙ Join _ (3) and I (3) into a node of weight 6
Step 6
∙ Join the two weight-4 nodes into a node of weight 8
Step 7
∙ Join the (_ I) node and S (5) into a node of weight 6 + 5 = 11
Step 8
∙ Join the 8 and 11 nodes into the root of weight 19, the number of characters in the message
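Label edges
∙ Each left edge is labeled 0 and each right edge 1 (children indented under their parent, edge label before each child)
19
  0: 8
    0: 4
      0: 2
        0: A 1
        1: G 1
      1: 2
        0: M 1
        1: T 1
    1: 4
      0: E 2
      1: H 2
  1: 11
    0: 6
      0: _ 3
      1: I 3
    1: S 5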
Huffman code & encoded message
Char    Code
S       11
E       010
H       011
_       100
I       101
A       0000
G       0001
M       0010
T       0011
This is his message
00110111011110010111100011101111000010010111100000001010
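As a check on this example, the Python sketch below encodes the message with the code table and then reads the bits back. Upper-casing the text and writing spaces as `_` matches the frequency table; the `decode` helper is an addition for illustration, not part of the slides, and relies on no code word being a prefix of another.

```python
codes = {
    "S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
    "A": "0000", "G": "0001", "M": "0010", "T": "0011",
}

message = "This is his message"
symbols = message.upper().replace(" ", "_")  # case ignored, space written as _

encoded = "".join(codes[symbol] for symbol in symbols)
print(encoded, len(encoded))


def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a code word completes."""
    inverse = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return "".join(decoded)


print(decode(encoded, codes))
```

The encoded length is 56 bits, against 4 * 19 = 76 bits for a fixed-length code over the nine symbols.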
Summary
∙ Huffman coding is a technique used
to compress files for transmission
∙ Uses statistical coding
– more frequently used symbols have
shorter code words
∙ Works well for text and fax
transmissions
∙ An application that uses several
data structures