Source Coding
Reference: En. Mohd Nazri Ahmud
Digital Communication Blocks
Source coding deals with the task of forming efficient descriptions of information sources.
Efficient descriptions permit a reduction in the memory or bandwidth resources required to
store or to transport sample realizations of the source data.
Source Coding
Source encoding is the efficient representation of data generated by a
source.
Consider a discrete source whose output, one of K different symbols $s_k$, is converted by the
source encoder into a block of 0s and 1s denoted by $b_k$.
Examples: (1) the output of a 12-bit analog-to-digital converter, which outputs one of 4096
discrete levels; (2) the 8-bit ASCII characters emitted by a computer keyboard.
A discrete source is said to be memoryless if the symbols emitted by the source are
statistically independent.
For efficient source encoding, knowledge of the statistics of the source is required.
If some source symbols are more probable than others, we can assign short code
words to frequent symbols and long code words to rare source symbols.
Assume that the kth symbol $s_k$ occurs with probability $p_k$, $k = 0, 1, \ldots, K-1$, and let
the binary code word assigned to symbol $s_k$ have length $l_k$ (in bits). The average
code-word length of the source encoder is then
$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$$
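As a quick illustration, here is a minimal Python sketch of this formula; the four-symbol source and its code-word lengths are hypothetical:

```python
def average_length(probs, lengths):
    # Lbar = sum over k of p_k * l_k
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical source: probabilities 0.5, 0.25, 0.125, 0.125 with
# code words of lengths 1, 2, 3, 3 bits
print(average_length([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))  # 1.75 bits per symbol
```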
Source Coding
Let $L_{\min}$ denote the minimum possible value of the average code-word length. The coding
efficiency of the source encoder is given by
$$\eta = \frac{L_{\min}}{\bar{L}}$$
By the source-coding theorem, $L_{\min} = H(X)$, the source entropy:
$$H(X) = E\left\{ \log_2 \frac{1}{p_k} \right\} = \sum_{k=0}^{K-1} p_k \log_2 \frac{1}{p_k} \ \text{ bits per symbol}$$
where $E\{X\}$ denotes the expected value of $X$.
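A sketch of the efficiency computation under the identification $L_{\min} = H(X)$ (the function names are my own):

```python
import math

def entropy(probs):
    # H(X) = sum of p_k * log2(1/p_k), in bits per symbol
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def coding_efficiency(probs, lengths):
    avg = sum(p * l for p, l in zip(probs, lengths))
    return entropy(probs) / avg   # eta = Lmin / Lbar, with Lmin = H(X)

# The hypothetical source above: H(X) = 1.75 = Lbar, so eta = 1
print(coding_efficiency([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))  # 1.0
```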
Data Compaction
A waveform source is a random process indexed by some variable, classically time, so that the
waveform of interest is a time-varying waveform. Important examples of time-varying waveforms
are the outputs of transducers used in process control (such as temperature, pressure,
velocity, and flow rates), speech, and music.
Data compaction is important because the signals generated contain a significant
amount of redundant information and so waste communication resources during
transmission.
For efficient transmission, the redundant information should be removed prior to
transmission.
Data compaction is achieved by assigning short descriptions to the most
frequent outcomes of the source output and longer descriptions to the less
frequent ones.
Some source-coding schemes for data compaction:
• Prefix coding
• Huffman coding
• Lempel-Ziv coding
Prefix Coding
A prefix code is a code in which no code word is the prefix of any
other code word
Example: Consider the three source codes described below
Source symbol   Probability of occurrence   Code I   Code II   Code III
s0              0.5                         0        0         0
s1              0.25                        1        10        01
s2              0.125                       00       110       011
s3              0.125                       11       111       0111
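A direct test of this definition in Python (the function name is my own):

```python
def is_prefix_free(codewords):
    # A code is a prefix code iff no code word is a prefix of another.
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True  (Code II)
print(is_prefix_free(["0", "01", "011", "0111"]))  # False (Code III: 0 prefixes 01)
```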
Prefix Coding
Is Code I a prefix code? No: the bit 0, the code word for s0, is a prefix of 00, the code
word for s2; and the bit 1, the code word for s1, is a prefix of 11, the code word for s3.
Is Code II a prefix code? Yes.
Is Code III a prefix code? No: the bit 0, the code word for s0, is a prefix of 01, the code
word for s1.
A prefix code has the important property that it is always uniquely decodable.
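Unique decodability of a prefix code can be seen operationally: the decoder emits a symbol as soon as the bits read so far match a code word, since no code word extends another. A sketch, using Code II:

```python
def prefix_decode(bits, code):
    inverse = {w: s for s, w in code.items()}  # code word -> symbol
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:       # a match is final for a prefix code
            out.append(inverse[buf])
            buf = ""
    return out

code_ii = {"s0": "0", "s1": "10", "s2": "110", "s3": "111"}
print(prefix_decode("0101100", code_ii))  # ['s0', 's1', 's2', 's0']
```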
Prefix Coding - Example
Source symbol   Code I   Code II   Code III   Code IV
s0              0        0         0          00
s1              10       01        01         01
s2              110      001       011        10
s3              1110     0010      110        110
s4              1111     0011      111        111
Prefix code?    ✓        ✗         ✗          ✓
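Running the is_prefix_free sketch from above on these four codes reproduces the row of marks in the table:

```python
codes = {"I": ["0", "10", "110", "1110", "1111"],
         "II": ["0", "01", "001", "0010", "0011"],
         "III": ["0", "01", "011", "110", "111"],
         "IV": ["00", "01", "10", "110", "111"]}
for name, words in codes.items():
    print(name, is_prefix_free(words))   # I True, II False, III False, IV True
```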
Huffman Coding
The Huffman code is a prefix-free, variable-length code that can achieve the
shortest average code length for a given input alphabet.
Basic idea : Assign to each symbol a sequence of bits roughly equal in length
to the amount of information conveyed by the symbol.
Huffman encoding algorithm:
Step 1: The source symbols are listed in order of decreasing probability.
The two source symbols of lowest probability are assigned a 0 and 1.
Step 2: These two source symbols are regarded as being combined into
a new source symbol with probability equal to the sum of the two original
probabilities. The probability of the new symbol is placed in the list in
accordance with its value.
The procedure is repeated until we are left with a final list of only two symbols,
to which a 0 and a 1 are assigned.
The code for each source symbol is found by working backward and
tracing the sequence of 0s and 1s assigned to that symbol as well as its
successors.
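A compact sketch of this algorithm using a binary heap (the function name and input format are my own; tie-breaking may differ from a hand construction, but the average code-word length is the same):

```python
import heapq
from itertools import count

def huffman(probs):
    """Huffman coding: repeatedly merge the two least-probable entries,
    assigning a 0 and a 1; codes are read off by prepending the assigned
    bit to every symbol inside each merged group. probs: dict symbol -> p."""
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # lowest probability  -> bit 0
        p1, _, c1 = heapq.heappop(heap)   # next lowest         -> bit 1
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]                     # dict symbol -> code word
```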
Huffman Coding – Average Code Length
$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2 \ \text{bits per symbol}$$
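The calculation above corresponds to a five-symbol source with probabilities 0.4, 0.2, 0.2, 0.1, 0.1 and code-word lengths 2, 2, 2, 3, 3. The huffman sketch above reproduces it:

```python
probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
codes = huffman(probs)
print(sum(p * len(codes[s]) for s, p in probs.items()))  # 2.2 (up to float rounding)
```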
Huffman Coding – Exercise
Symbol S0 S1 S2
Probability 0.7 0.15 0.15
Compute the Huffman code.
What is the average code-word length?
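A worked answer: combining S1 and S2 (0.15 + 0.15 = 0.3) leaves only two symbols, so one valid code is S0 = 0, S1 = 10, S2 = 11 (the bit labels may be flipped), giving $\bar{L} = 0.7(1) + 0.15(2) + 0.15(2) = 1.3$ bits per symbol. Checked with the sketch above:

```python
probs = {"S0": 0.7, "S1": 0.15, "S2": 0.15}
codes = huffman(probs)   # bit labels may differ; the lengths are 1, 2, 2
print(sum(p * len(codes[s]) for s, p in probs.items()))  # 1.3 bits per symbol
```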
Huffman Coding – variations
When the probability of the combined symbol is found to equal another probability
in the list, we may proceed by placing the probability of the new symbol as high as
possible or as low as possible.
Huffman Coding – Two variations
Which one to choose? Both placements yield the same average code-word length; the usual rule
is to choose the variant whose code-word lengths have the smaller variance, which is obtained
by moving the probability of the combined symbol as high as possible.
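As an illustration, for the five-symbol source used in the earlier average-length calculation, placing the combined symbol as high as possible yields code-word lengths {2, 2, 2, 3, 3}, while placing it as low as possible yields {1, 2, 3, 4, 4}; both average 2.2 bits, but the variances $\sigma^2 = \sum_k p_k (l_k - \bar{L})^2$ differ, as this sketch shows:

```python
def length_variance(probs, lengths):
    # sigma^2 = sum of p_k * (l_k - Lbar)^2
    avg = sum(p * l for p, l in zip(probs, lengths))
    return sum(p * (l - avg) ** 2 for p, l in zip(probs, lengths))

p = [0.4, 0.2, 0.2, 0.1, 0.1]
print(length_variance(p, [2, 2, 2, 3, 3]))  # 0.16 (combined symbol moved as high as possible)
print(length_variance(p, [1, 2, 3, 4, 4]))  # 1.36 (combined symbol moved as low as possible)
```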
Huffman Coding – Exercise
Symbol S0 S1 S2 S3 S4 S5 S6
Probability 0.25 0.25 0.125 0.125 0.125 0.0625 0.0625
Compute the Huffman code by placing the probability of the combined symbol
as high as possible.
What is the average code-word length?
Huffman Coding – Exercise Answer
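Since all the probabilities here are negative powers of 2 (a dyadic source), every valid Huffman code has lengths $l_k = \log_2(1/p_k)$, i.e. {2, 2, 3, 3, 3, 4, 4}, and the average code-word length equals the entropy, 2.625 bits per symbol:

```python
probs = {"S0": 0.25, "S1": 0.25, "S2": 0.125, "S3": 0.125,
         "S4": 0.125, "S5": 0.0625, "S6": 0.0625}
codes = huffman(probs)
print(sorted(len(w) for w in codes.values()))            # [2, 2, 3, 3, 3, 4, 4]
print(sum(p * len(codes[s]) for s, p in probs.items()))  # 2.625 = H(X)
```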
Ternary Huffman Coding
The same algorithm applies with the code alphabet {0, 1, 2}: at each step the three
least-probable symbols are combined, with dummy zero-probability symbols added if necessary
so that the list reduces to exactly three symbols at the final stage.
Huffman Encoding Efficiency
The entropy $H(X)$ is the best possible average number of bits per symbol. With $n_k$ the
number of bits assigned to symbol $s_k$, the average number of bits per letter is
$$\bar{L} = \sum_k p_k n_k$$
So the efficiency is
$$\eta = \frac{H(X)}{\bar{L}}$$
Redundancy = 1 - Efficiency.
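Worked example, for the five-symbol Huffman code computed earlier ($\bar{L} = 2.2$), reusing the entropy helper defined above:

```python
p = [0.4, 0.2, 0.2, 0.1, 0.1]
H = entropy(p)        # about 2.122 bits per symbol
eta = H / 2.2         # Huffman average code-word length from the earlier example
print(eta, 1 - eta)   # efficiency ~0.965, redundancy ~0.035
```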
Take-home message: Huffman Coding
Lempel-Ziv Coding
• A major difficulty in using the Huffman code is that the symbol probabilities must
be known or estimated, and both the encoder and decoder must know the coding
tree.
• Lempel-Ziv code is an adaptive coding technique that does not require prior
knowledge of symbol probabilities
• Lempel-Ziv coding is the basis of well-known ZIP for data compression (Lossless
coding).
• It performs coding on groups of characters of varying lengths.
• The code assumes that a dictionary exists containing already-coded segments of a
sequence of alphabet symbols. Data is encoded by looking through the existing
dictionary for a match to the next short segment in the sequence being coded, as
sketched below.
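A minimal sketch of this parsing in the style of the example that follows: positions are numbered from 1, the single bits 0 and 1 are pre-stored, and each encoded block is a fixed-length binary pointer to the prefix subsequence followed by the one-bit innovation symbol. (The function name and the two-pass structure are my own.)

```python
import math

def lz_encode(bits):
    # Pass 1: parse into phrases; the dictionary pre-stores '0' and '1'.
    dictionary = {"0": 1, "1": 2}      # subsequence -> position
    phrases, phrase = [], ""
    for bit in bits:
        phrase += bit
        if phrase not in dictionary:   # a new subsequence: store it
            dictionary[phrase] = len(dictionary) + 1
            phrases.append(phrase)
            phrase = ""                # a trailing phrase already stored is left over
    # Pass 2: fixed-length blocks = pointer to the prefix + innovation bit.
    width = math.ceil(math.log2(len(dictionary)))
    return [format(dictionary[p[:-1]], f"0{width}b") + p[-1] for p in phrases]
```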
• LZ Coding Example: the input sequence 000101110010100101… is parsed into the subsequences
0, 1, 00, 01, 011, 10, 010, 100, 101, stored in positions 1 through 9 (0 and 1 are pre-stored).
Note: each encoded block is the binary code of the position of the prefix subsequence
followed by the value of the last bit (i.e., the innovation symbol taken from the
subsequence).
In this example, the binary encoded block in position 9 is 1101. The last bit, 1, is
the innovation symbol. The remaining bits, 110, point to the root subsequence 10
in position 6. Hence, the block 1101 is decoded into 101, which is correct.
The Lempel–Ziv algorithm uses fixed-length codes to represent a variable number of
source symbols; this feature makes the Lempel–Ziv code suitable for
synchronous transmission.
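The matching decoder rebuilds the dictionary on the fly, exactly as in the worked decoding above (a sketch paired with lz_encode):

```python
def lz_decode(blocks):
    dictionary = {1: "0", 2: "1"}          # position -> subsequence
    out = []
    for block in blocks:
        pointer, innovation = int(block[:-1], 2), block[-1]
        phrase = dictionary[pointer] + innovation   # prefix + innovation bit
        dictionary[len(dictionary) + 1] = phrase
        out.append(phrase)
    return "".join(out)
```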
Lempel-Ziv Coding – Exercise
Encode the following sequence using Lempel-Ziv algorithm assuming that 0
and 1 are already stored
11101001100010110100….
Lempel-Ziv Coding – Exercise Answer
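For reference, the lz_encode sketch parses this sequence into the subsequences 11, 10, 100, 110, 00, 101, 1010 (positions 3 through 9 after the pre-stored 0 and 1); decoding recovers the parsed portion of the input:

```python
blocks = lz_encode("11101001100010110100")
print(blocks)
# ['00101', '00100', '01000', '00110', '00010', '01001', '10000']
print(lz_decode(blocks))  # 1110100110001011010 (the final 0 is still being parsed)
```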
Shannon–Fano Coding Technique
Algorithm:
Step 1: Arrange all messages in descending order of probability.
Step 2: Divide the sequence into two groups in such a way that the sums of the
probabilities in each group are nearly equal.
Step 3: Assign 0 to the upper group and 1 to the lower group.
Step 4: Repeat Steps 2 and 3 within each group, and so on, until each group
contains a single message, as sketched below.
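A recursive sketch of Steps 1-4 (the function name and input format are my own; the cut point is chosen so that the two groups' probability sums are as nearly equal as possible):

```python
def shannon_fano(probabilities):
    """Shannon-Fano coding. probabilities: dict symbol -> p."""
    codes = {}

    def split(items, prefix):
        if len(items) == 1:
            codes[items[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in items)
        running, cut, best = 0.0, 1, float("inf")
        for i in range(1, len(items)):
            running += items[i - 1][1]
            diff = abs(2 * running - total)   # |upper-group sum - lower-group sum|
            if diff < best:
                cut, best = i, diff
        split(items[:cut], prefix + "0")      # upper group gets 0
        split(items[cut:], prefix + "1")      # lower group gets 1

    split(sorted(probabilities.items(), key=lambda kv: -kv[1]), "")
    return codes
```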
SF Coding Example-1
• Shannon–Fano does not always produce optimal prefix codes. For this
reason, Shannon–Fano is almost never used.
• Huffman coding is almost as computationally simple and produces prefix
codes that always achieve the lowest expected code word length.
• Shannon–Fano coding is used in the IMPLODE compression method, which
is part of the ZIP file format.
SF Example-2
Message   Pi     Coding procedure   No. of bits   Code
M1        1/2    0                  1             0
M2        1/8    1 0 0              3             100
M3        1/8    1 0 1              3             101
M4        1/16   1 1 0 0            4             1100
M5        1/16   1 1 0 1            4             1101
M6        1/16   1 1 1 0            4             1110
M7        1/32   1 1 1 1 0          5             11110
M8        1/32   1 1 1 1 1          5             11111
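Running the shannon_fano sketch on this source reproduces the code words in the table; since the probabilities are dyadic, the average code-word length equals the entropy:

```python
probs = {"M1": 1/2, "M2": 1/8, "M3": 1/8, "M4": 1/16,
         "M5": 1/16, "M6": 1/16, "M7": 1/32, "M8": 1/32}
codes = shannon_fano(probs)   # {'M1': '0', 'M2': '100', ..., 'M8': '11111'}
print(sum(p * len(codes[m]) for m, p in probs.items()))  # 2.3125 bits per symbol = H(X)
```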
Proof of Source Coding Theorem
The source-coding theorem states that for a discrete memoryless source of entropy $H(X)$,
the average code-word length of any uniquely decodable code satisfies $\bar{L} \ge H(X)$.
Recall the average code-word length of the source encoder:
$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$$
Courtesy: Archana C
Cont'd…
Consider the difference
$$H(X) - \bar{L} = \sum_{k=0}^{K-1} p_k \log_2 \frac{1}{p_k} - \sum_{k=0}^{K-1} p_k l_k = \sum_{k=0}^{K-1} p_k \log_2 \frac{2^{-l_k}}{p_k}$$
Use the natural logarithm: for every $x > 0$ we have $\ln x \le x - 1$. Since
$\log_2 x = \ln x \cdot \log_2 e$, multiplying both sides by $\log_2 e$ gives
$\log_2 x \le (x - 1)\log_2 e$. Substituting $x = 2^{-l_k}/p_k$,
$$H(X) - \bar{L} \le \log_2 e \sum_{k=0}^{K-1} p_k \left( \frac{2^{-l_k}}{p_k} - 1 \right) = \log_2 e \left( \sum_{k=0}^{K-1} 2^{-l_k} - 1 \right)$$
Using the Kraft–McMillan inequality, $\sum_{k=0}^{K-1} 2^{-l_k} \le 1$ for any uniquely
decodable code, so the right-hand side is at most zero. Hence
$$\bar{L} \ge H(X)$$
with equality if and only if $p_k = 2^{-l_k}$ for all $k$.
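As a numerical sanity check of the theorem (and of the companion bound $\bar{L} < H(X) + 1$ that an optimal code achieves), one can compare the Huffman average length against the entropy of a randomly generated source, reusing the huffman and entropy sketches from earlier:

```python
import random

raw = [random.random() for _ in range(8)]
probs = {f"s{i}": x / sum(raw) for i, x in enumerate(raw)}  # random 8-symbol source
codes = huffman(probs)
avg = sum(p * len(codes[s]) for s, p in probs.items())
H = entropy(probs.values())
print(H <= avg + 1e-9 < H + 1)  # True: H(X) <= Lbar < H(X) + 1
```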
Thanks