
Communication Theory II

Lecture 7: Source Coding Theorem, Huffman Coding


Source Coding Theorem
• An important problem in communication is the efficient
representation of data generated by a discrete source.
• The device that performs the representation is called a
source encoder.
• A binary code encodes each character as a
binary string or codeword.
• We would like to find a binary code that encodes
the file using as few bits as possible, i.e.,
compresses it as much as possible.
• In a fixed-length code each codeword has the
same length.
• In a variable-length code codewords may have
different lengths.
Example

• Suppose that we have a 100,000-character data file that we wish to store. The file contains only 6 characters, appearing with the following frequencies (in thousands): 45, 13, 12, 16, 9, and 5.
Examples of fixed- and variable-length codes

The fixed-length code requires 300,000 bits to store the file.

The variable-length code uses only (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1000 = 224,000 bits, saving a lot of space!
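These totals are easy to verify; a minimal Python sketch, with the frequencies and codeword lengths taken from the computation above:

```python
# Frequencies (in thousands of characters) and codeword lengths (bits),
# taken from the computation above.
freqs_thousands = [45, 13, 12, 16, 9, 5]
variable_lengths = [1, 3, 3, 3, 4, 4]

fixed_bits = sum(freqs_thousands) * 1000 * 3     # 3 bits for every character
variable_bits = sum(f * 1000 * l for f, l in zip(freqs_thousands, variable_lengths))

print(fixed_bits, variable_bits)   # 300000 224000
```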

Can we do better?
Code

• A code is a set of codewords, e.g., {000, 001, 010, 011, 100, 101} and {0, 101, 100, 111, 1101, 1100}.
Example
Morse code is a method of encoding text characters as
sequences of two different signal durations, such as short
and long signals, dots and dashes, or "dits" and "dahs." It
was developed in the 1830s and 1840s by Samuel Morse
and Alfred Vail for use with their electric telegraph system.
In Morse code, each letter of the alphabet, as well as
numbers and some punctuation marks, is represented by a
unique combination of short and long signals. The duration
of a short signal is typically referred to as a "dot" or a
"dit," while the duration of a long signal is called a "dash"
or a "dah." The pauses between signals within a letter are
short, and the pauses between letters and words are
longer.

More probable letters have shorter codes: compare letter E with letter Q, or letter A with letter J.

Morse code - Wikipedia


Encoding

• Encoding replaces each character of the message by its codeword.

Example: Γ = {a, b, c, d}

If the code is C1 = {a = 00, b = 01, c = 10, d = 11}, then "bad" is encoded into 010011.

If the code is C3 = {a = 1, b = 110, c = 10, d = 111}, then "bad" is encoded into 1101111.
Decoding
• Given an encoded message, decoding is the process of turning it back into the original message.

C1 = {a = 00, b = 01, c = 10, d = 11}

For example, relative to C1, 010011 is uniquely decodable to "bad".

C3 = {a = 1, b = 110, c = 10, d = 111}

But, relative to C3, 1101111 is not uniquely decipherable, since it could have encoded either "bad" or "acad".
Codeword length and entropy

Average codeword length: $\bar{L} = \sum_{k} p_k l_k$, where $p_k$ is the probability of symbol $s_k$ and $l_k$ is its encoded length (in bits).

Coding efficiency: $\eta = L_{\min} / \bar{L}$, where $L_{\min}$ is the minimum possible value of $\bar{L}$.
Codeword length and entropy

Shannon's source coding theorem: for a discrete memoryless source of entropy $H(S)$, the average codeword length of any uniquely decodable code satisfies $\bar{L} \geq H(S)$; hence $L_{\min} = H(S)$ and the coding efficiency is $\eta = H(S)/\bar{L}$.

This means that the average number of bits per source symbol must be greater than or equal to the entropy of the source. It implies that to achieve efficient compression, the average number of bits per symbol should be close to the entropy value.

Entropy therefore provides a theoretical lower bound on the average number of bits per symbol in an optimal coding scheme, and it serves as a fundamental measure of the information content of the source.
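A quick numerical check of the bound, using the six-symbol file example from earlier (the probabilities are the frequencies divided by 100,000, and the codeword lengths are those of the variable-length code):

```python
import math

# Probabilities from the 100,000-character file example (frequencies
# 45, 13, 12, 16, 9 and 5 thousand) and the variable-length code's bit lengths.
probs = [0.45, 0.13, 0.12, 0.16, 0.09, 0.05]
lengths = [1, 3, 3, 3, 4, 4]

H = -sum(p * math.log2(p) for p in probs)            # entropy, bits/symbol
L_bar = sum(p * l for p, l in zip(probs, lengths))   # average codeword length

print(round(H, 2), round(L_bar, 2))   # 2.22 2.24  (L_bar >= H, as the theorem requires)
print(round(H / L_bar, 2))            # 0.99 -- coding efficiency close to 1
```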
Data compression
➢ Data compression reduces data size while maintaining
essential information.
➢ Entropy measures the average information or uncertainty in
a source.
➢ Entropy indicates compressibility and redundancy within the
data.
➢ The average number of bits per symbol should be close to
entropy for efficient compression.
➢ Coding schemes assign shorter codes to more frequent
symbols and longer codes to less frequent ones.
➢ Shannon's source coding theorem establishes a lower
bound: average bits per symbol ≥ entropy.
➢ Practical compression algorithms like Huffman coding aim to
achieve close-to-entropy compression.
Prefix Code
• A code is called a prefix code if no codeword is a prefix of any other codeword.

• A prefix code has the important property that it is always uniquely decodable.

In a prefix code, each symbol is represented by a unique binary codeword. The key characteristic of a prefix code is that no codeword is a prefix (initial segment) of another codeword. This property guarantees unambiguous decoding because when we encounter a sequence of bits during decoding, we can determine the corresponding symbol without the need for lookahead or further examination.
Prefix Code- example

Symbol A: 0    Symbol B: 10    Symbol C: 110    Symbol D: 111

If we encounter the bit sequence "10" during decoding, we can be certain that it corresponds to symbol B, because "10" is a complete codeword and no other codeword begins with it. We don't need to look ahead or check for any additional bits.

Symbol A: 0    Symbol B: 10    Symbol C: 100

In this case, if we encounter the bit sequence "10", we can't determine the symbol yet, because "10" could either represent symbol B or be the prefix of another codeword (here, "100" for symbol C). The code lacks the prefix property, leading to ambiguity during decoding.
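The "no lookahead" property translates directly into a decoder that emits a symbol as soon as the buffered bits form a codeword; a minimal sketch for the prefix code A/B/C/D above:

```python
prefix_code = {"A": "0", "B": "10", "C": "110", "D": "111"}
inverse = {cw: sym for sym, cw in prefix_code.items()}

def decode(bits: str) -> str:
    """Decode left to right, emitting a symbol as soon as the buffered
    bits form a complete codeword -- safe only because the code is prefix-free."""
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:        # complete codeword, no lookahead needed
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

print(decode("10"))           # B
print(decode("0100110111"))   # ABACD
```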

Prefix codes, such as Huffman codes, are widely used in various data compression applications because they provide efficient and reliable compression while ensuring clear and unambiguous decoding.
Optimum Source Coding Problem
The objective is to find a binary prefix code that minimizes the average number of bits required to encode symbols from a given alphabet, based on their frequency distribution.

• The problem: given an alphabet A = {a1, . . . , an} with frequency distribution f(ai), find a binary prefix code C for A that minimizes the average number of bits needed to encode the symbols.

• Start with the given alphabet A and its corresponding frequency distribution f(ai).

• Sort the symbols in A based on their frequencies in non-decreasing order. The symbol with the lowest frequency will have the longest codeword, and the symbol with the highest frequency will have the shortest codeword.
Huffman Code

• Huffman developed a nice greedy algorithm for solving this problem and producing a minimum-cost (optimum) prefix code. The code that it produces is called a Huffman code.
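A minimal sketch of the greedy procedure in Python, building codewords bottom-up with a binary heap (the probabilities in the usage line are those of the six-symbol file example; the exact bit patterns depend on tie-breaking, but the codeword lengths are optimal):

```python
import heapq
from itertools import count

def huffman_code(probabilities: dict) -> dict:
    """Build a binary prefix code by repeatedly merging the two least
    probable entries (Huffman's greedy algorithm)."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)

    if len(heap) == 1:                        # degenerate single-symbol source
        return {sym: "0" for sym in heap[0][2]}

    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # least probable subtree
        p2, _, codes2 = heapq.heappop(heap)   # second least probable subtree
        # Prepend '0' to every codeword of one subtree and '1' to the other,
        # then push the merged subtree back with the summed probability.
        merged = {sym: "0" + cw for sym, cw in codes1.items()}
        merged.update({sym: "1" + cw for sym, cw in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))

    return heap[0][2]

# Usage with the probabilities of the earlier six-symbol file example:
print(huffman_code({"a": 0.45, "b": 0.13, "c": 0.12,
                    "d": 0.16, "e": 0.09, "f": 0.05}))
```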
Example of Huffman Coding

Codewords are obtained by assigning '0' to the left branch and '1' to the right branch at each node of the tree.
Example of Huffman Coding – Continued

(Intermediate tree-construction steps: the two least probable symbols are merged repeatedly until a single tree remains.)
Example of Huffman Coding
Calculate the average code length $\bar{L} = \sum_k p_k l_k$ and the entropy $H(S) = -\sum_k p_k \log_2 p_k$, where $p_k$ is the probability of symbol $s_k$ and $l_k$ is its encoded (codeword) length.
Huffman Coding
➢ Is coding unique?

• The Huffman encoding process is not unique due to two variations in the process.

The first variation is in the assignment of '0' and '1' to the last two source symbols during the splitting stage. However, the resulting differences are trivial.

The second variation occurs when the probability of a combined symbol equals another probability in the list. Different placements can result in code words of different lengths, but the average code-word length remains the same.
Huffman Coding- variance of the average code-word length
Variance of the average code-word length: $\sigma^2 = \sum_k p_k (l_k - \bar{L})^2$, where $\bar{L}$ is the average codeword length, $l_k$ is the length of the code word for symbol $s_k$, and $p_k$ is its probability.
• When a combined symbol is moved as high as possible during the
Huffman coding process, the resulting Huffman code tends to have a
significantly smaller variance compared to when it is moved as low as
possible.
• Based on this observation, it is reasonable to choose the former
Huffman code (combined symbol moved as high as possible) over the
latter (combined symbol moved as low as possible) to reduce the
variability in code-word lengths.
Huffman Coding
Weaknesses

• Data with uniform probabilities (equal frequencies): the overhead of storing the Huffman tree or codebook can outweigh the benefits of compression. In such cases, the compression achieved by Huffman coding might not be significant.

• Sensitivity to input distribution: the effectiveness of Huffman coding heavily depends on the frequency distribution of the input data. If the distribution changes significantly, the entire Huffman tree must be recomputed, which might not be practical in real-time scenarios or streaming data.

• Encoding and decoding complexity: constructing the Huffman tree and encoding data can be computationally intensive, especially for large alphabets or data streams. While decoding is generally efficient, the encoding process requires traversing the tree for each character to obtain its corresponding code.
Huffman Coding

Suppose we have the following string of characters that we want to encode using a Huffman code:

"ABRACADABRA"

What is the bit-length ratio of the Huffman code to the following fixed-length (3 bits per character) coding?
A -> 000
B -> 001
C -> 010
D -> 011
R-> 100
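One way to check the ratio; a sketch that tracks only codeword lengths, since every Huffman tree for these frequencies yields the same total bit count:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_lengths(freqs: dict) -> dict:
    """Codeword length per symbol, from Huffman's merging procedure."""
    tie = count()
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in d1.items()}   # both subtrees go one level deeper
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

text = "ABRACADABRA"
freqs = Counter(text)                        # A:5, B:2, R:2, C:1, D:1
lengths = huffman_lengths(freqs)

huffman_bits = sum(freqs[s] * lengths[s] for s in freqs)   # 23 bits
fixed_bits = 3 * len(text)                                 # 33 bits
print(huffman_bits, fixed_bits, round(huffman_bits / fixed_bits, 2))  # 23 33 0.7
```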
Example

Letter:       S0     S1     S2     S3     S4
Probability:  0.55   0.15   0.15   0.10   0.05

Compute two different Huffman codes for this alphabet. In one case, move a combined symbol in the coding procedure as high as possible, and in the second case, move it as low as possible. Hence, for each of the two codes, find the average code-word length and the variance of the average code-word length over the ensemble of letters.

Steps:
1. Generate the Huffman code.
2. Calculate the average code length.
3. Calculate the variance of the average code-word length.
Example

1. Compute the Huffman code for this source, moving a “combined” symbol as low as possible.
2. Compute the Huffman code for this source, moving a “combined” symbol as high as possible.
Comparison

Combined symbol moved as high as possible: average code length = 1.9, variance of the average code-word length = 0.99.
Combined symbol moved as low as possible: average code length = 1.9, variance of the average code-word length = 1.29.

Both codes have the same average code length but different variance values. Based on this observation, it is reasonable to choose the former Huffman code (combined symbol moved as high as possible) over the latter (combined symbol moved as low as possible) to reduce the variability in code-word lengths.
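The figures can be reproduced from the codeword lengths of the two trees. Assuming the high-placement tree assigns lengths (1, 3, 3, 3, 3) and the low-placement tree (1, 2, 3, 4, 4) to S0 through S4 (an assumption about the trees, chosen because it reproduces the numbers quoted above):

```python
probs = [0.55, 0.15, 0.15, 0.10, 0.05]
lengths_high = [1, 3, 3, 3, 3]   # combined symbol moved as high as possible (assumed tree)
lengths_low  = [1, 2, 3, 4, 4]   # combined symbol moved as low as possible (assumed tree)

def stats(probs, lengths):
    L_bar = sum(p * l for p, l in zip(probs, lengths))               # average length
    var = sum(p * (l - L_bar) ** 2 for p, l in zip(probs, lengths))  # variance
    return round(L_bar, 2), round(var, 2)

print(stats(probs, lengths_high))  # (1.9, 0.99)
print(stats(probs, lengths_low))   # (1.9, 1.29)
```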
In Huffman coding, the tree construction process involves iteratively combining symbols with the lowest
probabilities until all symbols are combined into a single tree. During this process, there is flexibility in the
order of combining the symbols, which can result in different code-word lengths.

For the former Huffman code (combined symbol moved as high as possible):
When the combined symbol (formed by merging two least probable symbols) is placed higher up in the tree, it
will have a shorter code length than if it were placed lower down. This is because, in a binary tree, the depth
of the nodes determines the code length. Placing the combined symbol higher up means it will be closer to the
root, resulting in a shorter code length.

For the latter Huffman code (combined symbol moved as low as possible):
When the combined symbol is placed lower down in the tree, it will have a longer code length compared to if it
were placed higher up, as it will be farther from the root in the binary tree structure.

Since the former Huffman code places combined symbols higher up in the tree, it tends to result in shorter
code lengths for the most probable symbols. This leads to a Huffman code with smaller variance in code-word
lengths because the difference between the longest and shortest code lengths is minimized. Consequently, the
average code-word length remains closer to the optimal value, resulting in a more efficient compression
scheme.

In summary, choosing the former Huffman code with combined symbols moved as high as possible reduces
the variability in code-word lengths, resulting in a more balanced and efficient encoding, compared to the
latter Huffman code where combined symbols are moved as low as possible.
Example
A discrete memoryless source has an alphabet of seven symbols whose
probabilities of occurrence are as described here:

Compute the Huffman code for this source, moving a “combined” symbol as high as possible. Explain why the computed source code has an efficiency of 100 percent.
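A compact way to see when 100 percent efficiency is possible, assuming each symbol probability in this source is a negative integer power of two, so that the Huffman procedure can assign each symbol $s_k$ a codeword of exactly $l_k = \log_2(1/p_k)$ bits:

$\bar{L} = \sum_k p_k l_k = \sum_k p_k \log_2 \frac{1}{p_k} = H(S) \quad\Rightarrow\quad \eta = \frac{H(S)}{\bar{L}} = 1 = 100\%$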
