SOURCE CODING
1. Definition of Source Coding Terms
(a) Code Length
(b) Code efficiency
(c) Code redundancy
2. Source Coding Theory
3. Classification of Codes
4. Entropy Encoding
(a) Shannon-Fano
(b) Huffman
ECE 416 – Digital Communication
Friday, 23 March 2018
SYLLABUS
DIGITAL COMMUNICATION
[Block diagram of a digital communication system: Signal Source & Transducer (microphone, TV camera, flow-sensor, etc.) → Source Encoder (sampling, companding, encrypting) → Channel Encoder (error control) → Modulator → Channel (free space, co-axial cable, water, fibre) → Demodulator → Channel Decoder → Message Recovery / Signal Output. The baseband signal appears at the source-encoder input and again at the receiver output.]
SOURCE CODING DEFINITION & OBJECTIVE
1. Source Coding is the conversion of the output of a Discrete Memory-less Source (DMS) into a binary sequence.
2. The objective of Source Coding is to minimize the average bit rate required to represent the signal by reducing the redundancy of the information source.
CLASSIFICATION OF INFORMATION SOURCES
Information Sources fall into two categories:
a) Memory sources, where the current symbol depends on the previous symbols;
b) Memory-less sources, where the current symbol is independent of previous symbols.
CODE LENGTH DEFINITION
• Assume X is a DMS with finite entropy H(X), alphabet {x1, x2, ..., xm} and corresponding probabilities P(xi) for i = 1, 2, ..., m.
• If the binary codeword assigned to symbol xi by the source encoder has length ni, then the codeword length is the number of binary digits, ni, in that codeword.
OTHER DEFINITIONS
1. The average Codeword Length is given by: L = Σ P(xi) ni, where the sum runs over i = 1, 2, ..., m.
2. Code Efficiency is defined as: η = Lmin / L, where Lmin is the minimum possible value of the average code length, L.
3. The code is said to be efficient when the code efficiency η tends to 1.
4. Code Redundancy is defined as: γ = 1 - η
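These quantities are easy to compute numerically. Below is a minimal Python sketch (an illustration, not part of the lecture material) that takes the symbol probabilities P(xi) and the codeword lengths ni and returns H(X), L, the efficiency and the redundancy, taking Lmin as H(X) in line with the source coding theorem on the next slide.

    from math import log2

    def code_statistics(probs, lengths):
        """Return (H, L, efficiency, redundancy) for a DMS and a given binary code.

        probs   : probabilities P(xi) of the source symbols
        lengths : codeword lengths ni assigned by the source encoder
        """
        H = -sum(p * log2(p) for p in probs if p > 0)    # entropy H(X) in bits/symbol
        L = sum(p * n for p, n in zip(probs, lengths))   # average codeword length
        efficiency = H / L                               # eta = Lmin / L, with Lmin = H(X)
        redundancy = 1 - efficiency                      # gamma = 1 - eta
        return H, L, efficiency, redundancy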
SOURCE CODING THEOREM
• The Source Coding Theorem states that for a DMS X with entropy H(X), the average codeword length L per symbol is bounded by L ≥ H(X).
• Further, L can be made as close to H(X) as desired by some suitable code.
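As a quick numerical illustration of the bound (my own example, not from the slides): the code_statistics() sketch defined above gives L = H(X) for a uniform two-symbol source with one-bit codewords, and any longer code only increases L.

    # Equality case of the source coding theorem: uniform binary source, one-bit code
    H, L, eta, gamma = code_statistics([0.5, 0.5], [1, 1])
    print(H, L, eta)   # 1.0 1.0 1.0 -> L = H(X), so the bound L >= H(X) holds with equality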
CLASSIFICATION OF CODES
1. Fixed Length Code:
A code whose code length is fixed.
CLASSIFICATION OF CODES
2. Variable Length Code
A code whose length varies for different
symbols.
CLASSIFICATION OF CODES
3. Distinct Code is a code in which each code
word is distinguishable from other code words
CLASSIFICATION OF CODES
4. Prefix-free Codes
Codes in which no codeword can be formed
by adding code symbols to another codeword
CLASSIFICATION OF CODES
5. Uniquely Decodable Codes
A code in which the original sequence can be
reconstructed perfectly from the encoded
binary sequence.
CLASSIFICATION OF CODES
6. Instantaneous Codes
1. A code in which the end of any codeword is recognizable without examining subsequent code symbols.
2. Instantaneous Codes have the property that no
codeword is a prefix of another codeword.
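The prefix condition is easy to test mechanically. A short sketch (the function name is mine, not from the slides):

    def is_prefix_free(codewords):
        """True if no codeword is a prefix of another, i.e. the code is instantaneous."""
        for a in codewords:
            for b in codewords:
                if a != b and b.startswith(a):
                    return False
        return True

    print(is_prefix_free(["0", "10", "110", "111"]))   # True  -> instantaneous
    print(is_prefix_free(["0", "01", "011", "111"]))   # False -> '0' is a prefix of '01'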
CLASSIFICATION OF CODES
7. Optimal Codes:
A code is said to be optimal if it is instantaneous and has the minimum possible average length, Lmin.
WORKED EXAMPLE - 1
1. A Discrete Memory-less Source X has alphabet
{x1, x2} and associated probabilities, P(x1)=0.9,
P(x2)=0.1 where the symbols are encoded as:
Find the efficiency and redundancy of the code
SOLUTION
Entropy is: H(X) = -0.9 log2 0.9 - 0.1 log2 0.1 ≈ 0.469 bits/symbol
Code efficiency is: η = H(X) / L
Code redundancy is: γ = 1 - η
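The codeword table itself is not reproduced above. Assuming the natural one-bit code x1 -> 0, x2 -> 1 (an assumption, since the original assignment is not shown), the code_statistics() sketch from the definitions slide gives the numbers:

    # Assumed code: x1 -> '0', x2 -> '1', so n1 = n2 = 1 and L = 1 bit/symbol
    H, L, eta, gamma = code_statistics([0.9, 0.1], [1, 1])
    print(round(H, 3), round(eta, 3), round(gamma, 3))   # 0.469 0.469 0.531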
EXAMPLE 2
2. A Discrete Memory-Less Source X has
alphabet {x1,x2,x3,x4} and a source coding as
shown below.
xi    P(xi)   Code
x1    0.81    0
x2    0.09    10
x3    0.09    110
x4    0.01    111
Determine the efficiency and redundancy of
the code.
SOLUTION
• The average codeword length is: L = Σ P(xi) ni = 0.81(1) + 0.09(2) + 0.09(3) + 0.01(3) = 1.29
• The entropy is: H(X) = -Σ P(xi) log2 P(xi) = 0.938 bits/symbol
• Code efficiency is therefore: η = H(X) / L = 0.938 / 1.29 = 0.727
• Code redundancy is therefore: γ = 1 - η = 0.273, i.e. about 27%
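The same sketch reproduces these figures directly from the table on the previous slide:

    # Symbols x1..x4 with P = (0.81, 0.09, 0.09, 0.01) and codeword lengths (1, 2, 3, 3)
    H, L, eta, gamma = code_statistics([0.81, 0.09, 0.09, 0.01], [1, 2, 3, 3])
    print(round(H, 3), round(L, 2), round(eta, 3), round(gamma, 3))   # 0.938 1.29 0.727 0.273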
ENTROPY CODING
1. Entropy coding refers to the design of a variable-length code such that its average codeword length approaches the entropy. There are two main types of entropy coding, i.e.
(a) Shannon-Fano Coding
(b) Huffman Coding
SHANNON-FANO CODING
• Named after Claude Shannon and Robert Fano, it is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).
• Shannon–Fano coding is suboptimal in the sense
that it does not achieve the lowest possible
expected code word length like Huffman coding.
• The technique was first proposed in Shannon's "A
Mathematical Theory of Communication", his
1948 article introducing the field of information
theory.
SHANNON-FANO CODING
The Shannon-Fano Code is generated by using
the following procedure:
1. List the source symbols in the order of
decreasing probability.
2. Partition the set into two sets with as nearly
equal probabilities as possible and assign 0 to the
upper set and 1 to the lower set.
3. Continue with the process each time partitioning
the sets with as nearly equal probabilities as
possible until further partitioning of sets is not
possible.
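A minimal recursive Python sketch of this procedure (my own illustration, not the lecturer's code): sort the symbols, split the list where the cumulative probability is as close as possible to half of the total, append 0 to the upper set and 1 to the lower set, and recurse on each set.

    def shannon_fano(symbols):
        """symbols: list of (name, probability) pairs; returns {name: codeword}."""
        symbols = sorted(symbols, key=lambda s: s[1], reverse=True)
        codes = {name: "" for name, _ in symbols}

        def split(group):
            if len(group) <= 1:
                return
            total = sum(p for _, p in group)
            running, best_i, best_diff = 0.0, 1, float("inf")
            # choose the split point that makes the two sets as nearly equiprobable as possible
            for i in range(1, len(group)):
                running += group[i - 1][1]
                diff = abs(total - 2 * running)          # |P(lower set) - P(upper set)|
                if diff < best_diff:
                    best_diff, best_i = diff, i
            upper, lower = group[:best_i], group[best_i:]
            for name, _ in upper:
                codes[name] += "0"                       # 0 to the upper set
            for name, _ in lower:
                codes[name] += "1"                       # 1 to the lower set
            split(upper)
            split(lower)

        split(symbols)
        return codes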
SHANNON-FANO CODING
1. Original symbol list and probabilities:
x(i)  P(x(i))
x1    0.05
x2    0.30
x3    0.08
x4    0.25
x5    0.20
x6    0.12

2. Sort in order of decreasing probability:
x(i)  P(x(i))
x2    0.30
x4    0.25
x5    0.20
x6    0.12
x3    0.08
x1    0.05

3. Partition into two sets with total probability above and below 0.5 (approx.); assign 0 to the upper set and 1 to the lower set:
x(i)  P(x(i))  Step 1
x2    0.30     0
x4    0.25     0
x5    0.20     1
x6    0.12     1
x3    0.08     1
x1    0.05     1
SHANNON-FANO CODING
4. Partition each set again about its middle point and assign the second digit:
x(i)  P(x(i))  Step 1  Step 2
x2    0.30     0       0
x4    0.25     0       1
x5    0.20     1       0
x6    0.12     1       1   (remaining)
x3    0.08     1       1   (remaining)
x1    0.05     1       1   (remaining)

5. Partition the remaining set into two sets about its middle point:
x(i)  P(x(i))  Step 1  Step 2  Step 3
x2    0.30     0       0
x4    0.25     0       1
x5    0.20     1       0
x6    0.12     1       1       0
x3    0.08     1       1       1   (remaining)
x1    0.05     1       1       1   (remaining)

6. Partition the remaining set once more; further partitioning is not possible and the codewords can be read off:
x(i)  P(x(i))  Step 1  Step 2  Step 3  Step 4  Code
x2    0.30     0       0                       00
x4    0.25     0       1                       01
x5    0.20     1       0                       10
x6    0.12     1       1       0               110
x3    0.08     1       1       1       0       1110
x1    0.05     1       1       1       1       1111
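Running the shannon_fano() sketch on these six probabilities reproduces the table above, and the code_statistics() helper shows that the resulting code is already close to the entropy bound:

    probs = {"x2": 0.30, "x4": 0.25, "x5": 0.20, "x6": 0.12, "x3": 0.08, "x1": 0.05}
    codes = shannon_fano(list(probs.items()))
    print(codes)   # {'x2': '00', 'x4': '01', 'x5': '10', 'x6': '110', 'x3': '1110', 'x1': '1111'}
    H, L, eta, _ = code_statistics(list(probs.values()), [len(codes[s]) for s in probs])
    print(round(H, 2), round(L, 2), round(eta, 3))   # 2.36 2.38 0.992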
SHANNON-FANO CODING EXAMPLE 1
A DMS has four symbols, i.e. x1, x2, x3 and x4, with probabilities P(x1) = 1/2, P(x2) = 1/4, P(x3) = P(x4) = 1/8. Construct the Shannon-Fano code and determine the code efficiency.
SHANNON-FANO CODE-EXAMPLE 1 - SOLUTION
1. Shannon-Fano Code
x(i)  P(x(i))  Step 1  Step 2  Step 3  Code
x1    0.500    0                       0
x2    0.250    1       0               10
x3    0.125    1       1       0       110
x4    0.125    1       1       1       111

I(x1) = log2 2 = 1 = n1
I(x2) = log2 4 = 2 = n2
I(x3) = log2 8 = 3 = n3
I(x4) = log2 8 = 3 = n4

H(X) = Σ P(xi) I(xi) = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits/symbol

L = Σ P(xi) ni = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75

2. Efficiency
η = H(X) / L = 1, or 100%
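The earlier sketches confirm this result:

    # Cross-check of Example 1 with the shannon_fano() and code_statistics() sketches above
    codes = shannon_fano([("x1", 0.5), ("x2", 0.25), ("x3", 0.125), ("x4", 0.125)])
    print(codes)                                                     # {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}
    print(code_statistics([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))  # (1.75, 1.75, 1.0, 0.0)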
HUFFMAN CODE
1. Huffman coding is a lossless data compression
algorithm using variable length codes.
2. Lengths of the assigned codes are based on the
frequencies of corresponding characters.
a) Most frequent character gets the shortest code.
b) Least frequent character gets the longest code.
3. The variable-length codes assigned to input characters are Prefix Codes, i.e.
– the codes are assigned in such a manner that the code assigned to one character is not a prefix of the code assigned to any other character.
HUFFMAN CODE
1. The Huffman Code results in a code that is optimal
and is therefore a code with the highest efficiency.
2. The Huffman procedure is based on the following
observations regarding optimum prefix codes.
a) Symbols that occur more frequently (have a higher
probability of occurrence) will have the shortest code
words.
b) The two symbols that occur least frequently will have codewords of the same length.
c) Codewords corresponding to the two lowest-probability symbols differ only in the last bit.
STEPS IN HUFFMAN CODING
There are two major steps in Huffman Coding, i.e.
1. Build a Huffman Tree from the input characters.
2. Traverse the Huffman Tree and assign codes to the characters.

Example code assignment:
Character  Code
a          0
b          111
c          1011
d          100
r          110
!          1010
ASSIGN CODES
1. Start encoding from the last reduction on the tree.
2. Assign 0 as the first digit of the codewords for all symbols associated with the first probability, and 1 to those associated with the second probability.
3. Assign 0 and 1 to the second digit of the two probabilities that were combined in the previous reduction step, while retaining all assignments made in the previous stage.
4. Repeat the process until the first column is reached.
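A compact Python sketch of the two Huffman steps, building the tree and then assigning the codes (an illustration, not the lecturer's code): a priority queue repeatedly merges the two least probable nodes, and a tree walk then appends 0 and 1. Ties between equal probabilities can be broken either way, so an equally valid Huffman code may differ from the tables here in its individual bits, although the average codeword length is unaffected.

    import heapq
    from itertools import count

    def huffman(freqs):
        """freqs: dict {symbol: probability or frequency}; returns {symbol: codeword}."""
        tiebreak = count()                       # keeps heap entries comparable when weights tie
        heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w1, _, left = heapq.heappop(heap)    # pop the two least probable nodes ...
            w2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))  # ... and merge them
        codes = {}

        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node: branch with 0 / 1
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                                # leaf: record this symbol's codeword
                codes[node] = prefix or "0"

        walk(heap[0][2], "")
        return codes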
USING THE HUFFMAN CODE IN PRACTICE
• Assume that you have a character file that you would like to compress. By parsing through the file, a computer establishes that there are 100,000 characters with frequencies of occurrence as shown below.
Character Frequency
A 45,000
B 13,000
C 12,000
D 16,000
E 9,000
F 5,000
Total 100,000
• Determine a code that encodes the file using as few bits as possible.
SOLUTION 1: USING A HAMMER
A fixed-length code scheme would require 3 bits per character, since 2^v ≥ 6 requires v = 3.
Therefore, using this code we would store 3 x 100,000 = 300,000 bits (300 kbits).
If, on the other hand, we used a byte to store each character, we would need a file of size 8 x 100,000 = 800,000 bits (800 kbits).
HUFFMAN CODE FOR THE EXAMPLE
1. Average Code Length, L = 2.24
2. Bits required = L x 100,000 = 224,000 bits
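Feeding the frequency table into the huffman() sketch above reproduces these figures; the individual bit patterns depend on tie-breaking, but the 2.24-bit average does not.

    freqs = {"A": 45_000, "B": 13_000, "C": 12_000, "D": 16_000, "E": 9_000, "F": 5_000}
    codes = huffman(freqs)
    total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
    print(total_bits)                         # 224000 bits for the whole 100,000-character file
    print(total_bits / sum(freqs.values()))   # 2.24 bits per character on average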
HOMEWORK
• Determine the Huffman code for the following characters and their corresponding probabilities.

Character  Probability
A          0.05
B          0.15
C          0.2
D          0.05
E          0.15
F          0.3
G          0.1
FIRST, CREATE THE TREE
Character  Probabilities after each source reduction
F          0.30   0.30   0.30   0.30   0.40   0.60
C          0.20   0.20   0.20   0.30   0.30
B          0.15   0.15   0.20   0.20   0.30
E          0.15   0.15   0.15   0.20
G          0.10   0.10   0.15
A          0.05   0.10
D          0.05
(Each column shows the symbol probabilities after one more source reduction, sorted in decreasing order.)
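The same sketches can be used to cross-check the homework, for example:

    hw = {"A": 0.05, "B": 0.15, "C": 0.2, "D": 0.05, "E": 0.15, "F": 0.3, "G": 0.1}
    print(huffman(hw))   # codeword lengths should match the tree built from the reduction table above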
USE ONLINE CALCULATOR TO CROSS-CHECK