

L8 - Huffman Algorithm

Huffman Coding is a compression technique used to reduce data size for various applications, including image, audio, and text. It utilizes variable-length codes based on character frequency, allowing more frequent characters to have shorter codes, thus minimizing wasted space compared to fixed-length codes like ASCII and Unicode. The document outlines the algorithm for building a Huffman tree, encoding messages, and demonstrates its efficiency through an example.

Uploaded by Shibly Sarkar
Copyright © All Rights Reserved

Huffman Coding

Applications

❑ Compression technique (image, audio, text)
❑ Reduces the size of data
❑ Fax machines
Encoding Messages
❑ Codes used by computer systems
▪ ASCII
✔ uses 8 bits per character (extended ASCII; the original ASCII standard defines 7-bit codes)
✔ can encode 256 characters
▪ Unicode
✔ uses 16 bits per character (in a 16-bit encoding such as UCS-2)
✔ can encode 65,536 characters
❑ ASCII and Unicode are fixed-length codes
▪ all characters are represented by the same number of bits
Problems
❑ Suppose that we want to encode a message constructed from the symbols A, B, C, D, and E using a fixed-length code.
▪ How many bits are required to encode each symbol?
✔ at least 3 bits are required
✔ 2 bits are not enough (they can only encode four symbols)
▪ How many bits are required to encode the message DEAACAAAAABA?
✔ there are twelve symbols, each requiring 3 bits
✔ 12 * 3 = 36 bits are required
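The fixed-length arithmetic can be checked mechanically. This Python sketch uses a hypothetical helper, `fixed_length_bits`, which returns the smallest width w such that 2^w is at least the number of symbols; it is an illustration only, not part of the original notes.

```python
def fixed_length_bits(alphabet_size: int) -> int:
    """Smallest bit width w such that 2**w >= alphabet_size (at least 1 bit)."""
    # (n - 1).bit_length() is the number of bits needed to write n - 1,
    # the largest index when the symbols are numbered 0 .. n-1.
    return max(1, (alphabet_size - 1).bit_length())

alphabet = "ABCDE"
message = "DEAACAAAAABA"

width = fixed_length_bits(len(alphabet))  # 2 bits give only 4 codes, so 3 are needed
total = width * len(message)              # 12 symbols * 3 bits each
print(width, total)  # 3 36
```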


Drawbacks of fixed-length codes

❑ Wasted space
▪ Unicode uses twice as much space as ASCII, which is inefficient for plain-text messages containing only ASCII characters
❑ Same number of bits used to represent all characters
▪ 'a' and 'e' occur more frequently than 'q' and 'z', yet all four get codes of the same length
❑ Potential solution: use variable-length codes
▪ variable number of bits to represent characters when the frequency of occurrence is known
▪ short codes for characters that occur frequently
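To make the space comparison concrete, this sketch encodes a plain-ASCII sentence with Python's built-in `ascii` and `utf-16-le` codecs. It assumes a 16-bit Unicode encoding as the point of comparison; the little-endian variant is chosen only so that no byte-order mark is prepended.

```python
text = "Eerie eyes seen near lake."

ascii_bytes = text.encode("ascii")      # 1 byte (8 bits) per character
utf16_bytes = text.encode("utf-16-le")  # 2 bytes (16 bits) per character, no byte-order mark

print(len(ascii_bytes) * 8, len(utf16_bytes) * 8)  # 208 416
```

Every character here is plain ASCII, so the 16-bit form doubles the size without representing anything extra.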
Purpose of Huffman Coding

❑ Proposed by Dr. David A. Huffman in 1952
– "A Method for the Construction of Minimum Redundancy Codes"
❑ Applicable to many forms of data transmission
– Our example: text files
The Basic Algorithm

❑ Code word lengths are no longer fixed as they are in ASCII.
❑ Code word lengths vary and are shorter for the more frequently used characters.
Building a Tree
Scan the original text

❑ Consider the following short text:

Eerie eyes seen near lake.

❑ Count up the occurrences of all characters in the text

❑ What characters are present?

E e r i space y s n a l k .

❑ What is the frequency of each character in the text?

Char    Freq.   Char    Freq.   Char    Freq.
E       1       y       1       k       1
e       8       s       2       .       1
r       2       n       2
i       1       a       2
space   4       l       1
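Counting the occurrences can be sketched with Python's `collections.Counter`; the printed counts should match the frequency table. Printing "space" for the blank character is only a display choice.

```python
from collections import Counter

text = "Eerie eyes seen near lake."

# Counter tallies each character; iteration follows first occurrence in the text.
freq = Counter(text)

for ch, n in freq.items():
    label = "space" if ch == " " else ch
    print(f"{label:>5} {n}")

print(sum(freq.values()))  # 26
```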
Prioritize characters

❑ Create binary tree nodes with the character and frequency of each character
❑ Place nodes in a priority queue
– The lower the occurrence, the higher the priority in the queue
∙ The queue after inserting all nodes (front of the queue on the left; sp is the space character):

E  i  y  l  k  .  r  s  n  a  sp  e
1  1  1  1  1  1  2  2  2  2  4   8
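A minimal sketch of this priority queue using Python's `heapq` module. It assumes heap entries of the form (frequency, tie_breaker, node), where the tie-breaker is a running counter so that nodes of equal frequency leave the queue in insertion order. Inserting the leaves in first-occurrence order after a stable sort by frequency reproduces the queue shown above; this tie-breaking rule is an assumption chosen to match the slides, not something the notes prescribe.

```python
import heapq
from collections import Counter
from itertools import count

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Heap entries: (frequency, tie_breaker, node). Lower frequency = higher priority;
# the tie-breaker makes equal-frequency nodes leave in insertion order.
tie = count()
queue = []
for ch, n in sorted(freq.items(), key=lambda item: item[1]):  # stable: keeps first-occurrence order
    heapq.heappush(queue, (n, next(tie), ch))

# Drain the queue to show its priority order.
order = [heapq.heappop(queue) for _ in range(len(queue))]
print(" ".join("sp" if ch == " " else ch for _, _, ch in order))
# E i y l k . r s n a sp e
```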
∙ Repeatedly remove the two nodes with the lowest frequencies from the front of the queue, make them the left and right children of a new node whose frequency is their sum, and insert the new node back into the queue. In these slides, a new node is placed behind any queued nodes of equal frequency.
∙ Merge steps (a parenthesized group is an internal node, followed by its frequency; sp is the space):

Step 1: E 1 + i 1 → (E i) 2
  Queue: y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, (E i) 2, sp 4, e 8
Step 2: y 1 + l 1 → (y l) 2
  Queue: k 1, . 1, r 2, s 2, n 2, a 2, (E i) 2, (y l) 2, sp 4, e 8
Step 3: k 1 + . 1 → (k .) 2
  Queue: r 2, s 2, n 2, a 2, (E i) 2, (y l) 2, (k .) 2, sp 4, e 8
Step 4: r 2 + s 2 → (r s) 4
  Queue: n 2, a 2, (E i) 2, (y l) 2, (k .) 2, sp 4, (r s) 4, e 8
Step 5: n 2 + a 2 → (n a) 4
  Queue: (E i) 2, (y l) 2, (k .) 2, sp 4, (r s) 4, (n a) 4, e 8
Step 6: (E i) 2 + (y l) 2 → ((E i) (y l)) 4
  Queue: (k .) 2, sp 4, (r s) 4, (n a) 4, ((E i) (y l)) 4, e 8
Step 7: (k .) 2 + sp 4 → ((k .) sp) 6
  Queue: (r s) 4, (n a) 4, ((E i) (y l)) 4, ((k .) sp) 6, e 8
Step 8: (r s) 4 + (n a) 4 → ((r s) (n a)) 8
  Queue: ((E i) (y l)) 4, ((k .) sp) 6, e 8, ((r s) (n a)) 8
Step 9: ((E i) (y l)) 4 + ((k .) sp) 6 → 10
  Queue: e 8, ((r s) (n a)) 8, 10
Step 10: e 8 + ((r s) (n a)) 8 → 16
  Queue: 10, 16
Step 11: 10 + 16 → 26, the root of the tree
∙ This tree contains the new code words for each character.
∙ The frequency of the root node should equal the number of characters in the text: "Eerie eyes seen near lake." has 26 characters, and the root has frequency 26.

Final tree (children listed left first; frequency after each node):

26
  10
    4
      2: E 1, i 1
      2: y 1, l 1
    6
      2: k 1, . 1
      sp 4
  16
    e 8
    8
      4: r 2, s 2
      4: n 2, a 2
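The merge loop can be sketched in Python as follows. The `Node` class and `build_huffman_tree` function are hypothetical names. The sketch assumes the same tie-breaking as the slides (the first node removed becomes the left child, and a new node queues behind nodes of equal frequency), and the final line checks that the root frequency equals the number of characters.

```python
import heapq
from collections import Counter
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class Node:
    freq: int
    char: Optional[str] = None       # set only for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_huffman_tree(text: str) -> Node:
    tie = count()
    queue = []
    # Leaves enter in first-occurrence order, stably sorted by frequency.
    for ch, n in sorted(Counter(text).items(), key=lambda item: item[1]):
        heapq.heappush(queue, (n, next(tie), Node(n, ch)))
    while len(queue) > 1:
        f1, _, first = heapq.heappop(queue)   # lowest frequency -> left child
        f2, _, second = heapq.heappop(queue)  # next lowest -> right child
        # A later tie-breaker puts the new node behind equal-frequency nodes.
        heapq.heappush(queue, (f1 + f2, next(tie), Node(f1 + f2, None, first, second)))
    return queue[0][2]

root = build_huffman_tree("Eerie eyes seen near lake.")
print(root.freq, root.left.freq, root.right.freq)  # 26 10 16
```

The tie-breaker in each heap entry also guarantees that two `Node` objects are never compared directly.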
Encoding the File
Traverse Tree for Codes

∙ Perform a traversal of the tree to obtain the new code words.
∙ Going left is a 0; going right is a 1.
∙ A code word is only completed when a leaf node is reached.
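A sketch of the traversal, assuming a compact tree representation (a leaf is a one-character string, an internal node is a `(left, right)` pair) and the same tie-breaking as the slides. `build_tree` and `assign_codes` are hypothetical helper names. Appending 0 on each left edge and 1 on each right edge should reproduce the code table for this text.

```python
import heapq
from collections import Counter
from itertools import count

def build_tree(text):
    """Huffman tree as nested pairs: a leaf is a character, an internal node is (left, right)."""
    tie = count()
    # Leaves in first-occurrence order, stably sorted by frequency; the counter breaks ties FIFO.
    queue = [(n, next(tie), ch) for ch, n in sorted(Counter(text).items(), key=lambda kv: kv[1])]
    heapq.heapify(queue)
    while len(queue) > 1:
        f1, _, left = heapq.heappop(queue)    # lowest frequency becomes the left child
        f2, _, right = heapq.heappop(queue)   # next lowest becomes the right child
        heapq.heappush(queue, (f1 + f2, next(tie), (left, right)))
    return queue[0][2]

def assign_codes(node, prefix="", table=None):
    """Traverse the tree: 0 for a left edge, 1 for a right edge; a leaf completes a code word."""
    if table is None:
        table = {}
    if isinstance(node, str):
        table[node] = prefix or "0"   # a one-symbol text would otherwise get an empty code
        return table
    left, right = node
    assign_codes(left, prefix + "0", table)
    assign_codes(right, prefix + "1", table)
    return table

codes = assign_codes(build_tree("Eerie eyes seen near lake."))
for ch, code in sorted(codes.items(), key=lambda kv: kv[1]):
    label = "space" if ch == " " else ch
    print(f"{label:<6}{code}")
```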
Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111
∙ Rescan the text and encode the file using the new code words:

Eerie eyes seen near lake.

0000101100000110011
100010101101011
110110101110011
11101011111100011
001111110100100101

(one line per word: "Eerie ", "eyes ", "seen ", "near ", "lake.")

Results
∙ Have we made things any better?
∙ 84 bits to encode the text
∙ ASCII would take 8 * 26 = 208 bits
∙ A fixed-length code for the 12 distinct characters would need 4 bits per character, for a total of 4 * 26 = 104 bits
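The encoding and the bit counts can be checked with this Python sketch, which copies the code table into a dictionary. It computes the Huffman total two ways: by concatenating the code words, and as the sum of frequency times code length over the distinct characters.

```python
from collections import Counter

# Code words read off the tree ("Traverse Tree for Codes" table).
CODES = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}

text = "Eerie eyes seen near lake."

# Rescan the text and replace every character by its code word.
encoded = "".join(CODES[ch] for ch in text)

# Same total as sum(frequency * code length) over the distinct characters.
freq = Counter(text)
weighted = sum(n * len(CODES[ch]) for ch, n in freq.items())

ascii_bits = 8 * len(text)                          # 8 bits per character
fixed_width = max(1, (len(freq) - 1).bit_length())  # 12 distinct symbols -> 4 bits
fixed_bits = fixed_width * len(text)

print(encoded)
print(len(encoded), weighted, ascii_bits, fixed_bits)  # 84 84 208 104
```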
Example
Build the Huffman coding tree for the message "This is his message".
Character frequencies (letters counted case-insensitively; _ stands for the space):

Char   A  G  M  T  E  H  _  I  S
Freq   1  1  1  1  2  2  3  3  5
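Step 1: Start with one node per character in the queue: A 1, G 1, M 1, T 1, E 2, H 2, _ 3, I 3, S 5.
Step 2: Merge A and G into a node of frequency 2, and M and T into another node of frequency 2.
Steps 3 and 4: Merge the four nodes of frequency 2 in pairs: (A G) with (M T), and E with H, giving two nodes of frequency 4.
Step 5: Merge _ and I into a node of frequency 6.
Step 6: The queue now holds (A G M T) 4, (E H) 4, S 5, and (_ I) 6.
Step 7: Merge the two nodes of frequency 4 into a node of frequency 8, and S 5 with (_ I) 6 into a node of frequency 11.
Step 8: Merge 8 and 11 into the root, whose frequency 19 equals the number of characters in the message.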
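The same merge procedure can be sketched for this message. The helper `huffman_code_lengths` is a hypothetical name; it tracks only how deep each symbol ends up in the tree. Its tie-breaking is an assumption and may not match the slides' drawing, so equal-frequency symbols could in principle receive different codes, but every Huffman tree for these frequencies has the same total encoded length.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(freq):
    """Return {symbol: code length} for a Huffman tree built from a frequency map."""
    tie = count()
    # Each heap entry carries the symbols in its subtree; merging two subtrees
    # pushes every symbol in them one level deeper (one more code bit).
    queue = [(n, next(tie), [sym]) for sym, n in freq.items()]
    heapq.heapify(queue)
    depth = {sym: 0 for sym in freq}
    while len(queue) > 1:
        f1, _, syms1 = heapq.heappop(queue)
        f2, _, syms2 = heapq.heappop(queue)
        for sym in syms1 + syms2:
            depth[sym] += 1
        heapq.heappush(queue, (f1 + f2, next(tie), syms1 + syms2))
    return depth

message = "This is his message".upper().replace(" ", "_")  # case-insensitive, space as "_"
freq = Counter(message)
lengths = huffman_code_lengths(freq)
total = sum(freq[s] * lengths[s] for s in freq)
print(total)  # 56
```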
Label edges
∙ Each left edge is labeled 0 and each right edge is labeled 1:

19
  0: 8
    0: 4
      0: 2 (0: A 1, 1: G 1)
      1: 2 (0: M 1, 1: T 1)
    1: 4 (0: E 2, 1: H 2)
  1: 11
    0: 6 (0: _ 3, 1: I 3)
    1: S 5
Huffman code & encoded message

Char   Code
S      11
E      010
H      011
_      100
I      101
A      0000
G      0001
M      0010
T      0011

"This is his message" is encoded as:

00110111011110010111100011101111000010010111100000001010
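A short Python check of the code table and the encoded message, assuming (as the slides do) that letters are treated case-insensitively and `_` stands for the space. The fixed-length figure printed at the end assumes 4 bits per symbol, the minimum for 9 distinct symbols.

```python
# Code table from the slide; "_" is the space character.
CODES = {
    "S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
    "A": "0000", "G": "0001", "M": "0010", "T": "0011",
}

message = "This is his message"
symbols = message.upper().replace(" ", "_")   # case-insensitive, space written as "_"
encoded = "".join(CODES[s] for s in symbols)

print(encoded)
print(len(encoded), 4 * len(symbols))  # 56 76: Huffman bits vs. a 4-bit fixed-length code
```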
Summary

∙ Huffman coding is a technique used to compress files for transmission
∙ Uses statistical coding
– more frequently used symbols have shorter code words
∙ Works well for text and fax transmissions
∙ An application that uses several data structures
