Source Coding

Reference: En. Mohd Nazri ahmud
Digital Communication Blocks

Source coding deals with the task of forming efficient descriptions of information sources.
Efficient descriptions permit a reduction in the memory or bandwidth resources required to
store or to transport sample realizations of the source data.
Source Coding
Source encoding is the efficient representation of data generated by a source.

Consider a discrete source whose output, one of K different symbols s_k, is converted by the
source encoder into a block of 0s and 1s denoted by b_k.

Examples: (1) the output of a 12-bit analog-to-digital converter, which produces one of
4096 discrete levels; (2) the 8-bit ASCII characters emitted by a computer keyboard.

A discrete source is said to be memoryless if the symbols emitted by the
source are statistically independent.

For efficient source encoding, knowledge of the statistics of the source is required.

If some source symbols are more probable than others, we can assign short code
words to frequent symbols and long code words to rare source symbols.

Assume that the kth symbol s_k occurs with probability p_k, k = 0, 1, ..., K-1.

Let the binary code word assigned to symbol s_k have length l_k (in bits).
The average code-word length of the source encoder is then

\bar{L} = \sum_{k=0}^{K-1} p_k l_k
Source Coding

Let L_min denote the minimum possible value of the average code-word length \bar{L}.

The coding efficiency of the source encoder is

\eta = \frac{L_{\min}}{\bar{L}}

By the source-coding theorem, L_min equals the source entropy H(S), so the efficiency may
also be written as \eta = H(S)/\bar{L}, with both H(S) and \bar{L} measured in bits per symbol.
Equivalently, \bar{L} = E{l_k}, where E{X} is the expected value of X.
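As a small illustration of these definitions, the sketch below (Python; the probabilities and code-word lengths are made-up examples, not from the notes) computes the average code-word length, the source entropy, and the resulting coding efficiency.

```python
import math

def average_length(probs, lengths):
    """Average code-word length: sum over k of p_k * l_k."""
    return sum(p * l for p, l in zip(probs, lengths))

def entropy(probs):
    """Source entropy H(S) = -sum over k of p_k * log2(p_k), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example: a four-symbol source with dyadic probabilities and code-word lengths 1, 2, 3, 3.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

L_bar = average_length(probs, lengths)
H = entropy(probs)
print(f"L = {L_bar} bits/symbol, H(S) = {H} bits/symbol, efficiency = {H / L_bar:.3f}")
# Here L = 1.75 = H(S), so the efficiency is 1 for this particular code.
```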
Data Compaction
A waveform source is a random process indexed by some variable, classically taken to be time,
so that the waveform of interest is a time-varying waveform. Important examples of time-varying
waveforms are the outputs of transducers used in process control, such as temperature, pressure,
velocity and flow rates, as well as speech and music.

• Data compaction is important because the signals generated contain a significant
amount of redundant information and waste communication resources during
transmission.

• For efficient transmission, the redundant information should be removed prior to
transmission.

• Data compaction is achieved by assigning short descriptions to the most
frequent outcomes of the source output and longer descriptions to the less
frequent ones.

Some source-coding schemes for data compaction:
• Prefix coding
• Huffman coding
• Lempel-Ziv coding
Prefix Coding

A prefix code is a code in which no code word is the prefix of any


other code word

Example: Consider the three source codes described below

Source Symbol | Probability of Occurrence | Code I | Code II | Code III
s0            | 0.5                       | 0      | 0       | 0
s1            | 0.25                      | 1      | 10      | 01
s2            | 0.125                     | 00     | 110     | 011
s3            | 0.125                     | 11     | 111     | 0111


Is Code I a prefix code?
No. The bit 0, the code word for s0, is a prefix of 00, the code word for s2, and the bit 1,
the code word for s1, is a prefix of 11, the code word for s3.

Is Code II a prefix code?
Yes.

Is Code III a prefix code?
No.

A prefix code has the important property that it is always uniquely decodable.
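The prefix condition is easy to check programmatically. The small Python sketch below (illustrative, not from the notes) simply tests every pair of code words; the code words used are those of Codes I-III above.

```python
def is_prefix_free(codewords):
    """Return True if no code word is a prefix of any other code word."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "1", "00", "11"]))      # Code I   -> False
print(is_prefix_free(["0", "10", "110", "111"]))   # Code II  -> True
print(is_prefix_free(["0", "01", "011", "0111"]))  # Code III -> False
```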

Prefix Coding - Example
Source Symbol | Code I | Code II | Code III | Code IV
s0            | 0      | 0       | 0        | 00
s1            | 10     | 01      | 01       | 01
s2            | 110    | 001     | 011      | 10
s3            | 1110   | 0010    | 110      | 110
s4            | 1111   | 0011    | 111      | 111
Prefix code?  | ✓      | ✗       | ✗        | ✓
Huffman Coding
The Huffman code is a prefix-free, variable-length code that achieves the shortest possible
average code length for a given set of symbol probabilities, among codes that encode one
source symbol at a time.

Basic idea : Assign to each symbol a sequence of bits roughly equal in length
to the amount of information conveyed by the symbol.
Huffman encoding algorithm:
Step 1: The source symbols are listed in order of decreasing probability.
The two source symbols of lowest probability are assigned a 0 and 1.

Step 2: These two source symbols are regarded as being combined into
a new source symbol with probability equal to the sum of the two original
probabilities. The probability of the new symbol is placed in the list in
accordance with its value.

The procedure is repeated until we are left with a final list of symbols of
only two for which a 0 and 1 are assigned.

The code for each source symbol is found by working backward and
tracing the sequence of 0s and 1s assigned to that symbol as well as its
successors.
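As a concrete illustration of these steps, here is a minimal Huffman-coding sketch in Python (using the standard heapq module; the symbol names and probabilities are just examples, and the exact code words depend on how ties are broken). It repeatedly merges the two least probable entries and then reads the code words back from the resulting tree.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Return a dict mapping each symbol to its binary code word."""
    tiebreak = count()  # unique counter so the heap never has to compare trees directly
    heap = [(p, next(tiebreak), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, t0 = heapq.heappop(heap)   # two entries of lowest probability
        p1, _, t1 = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (t0, t1)))  # combined symbol
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: branch with 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record the accumulated code word
            codes[node] = prefix or "0"   # the "or" handles a one-symbol alphabet
    walk(tree, "")
    return codes

# Example with the five probabilities used in the average-length calculation that follows:
print(huffman_code({"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}))
```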
Huffman Coding – Average Code Length

For the five-symbol source with probabilities 0.4, 0.2, 0.2, 0.1, 0.1 (code-word lengths 2, 2, 2, 3, 3):

\bar{L} = \sum_{k=0}^{K-1} p_k l_k
        = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3)
        = 2.2 bits per symbol

Huffman Coding – Exercise
Symbol      | S0  | S1   | S2
Probability | 0.7 | 0.15 | 0.15

Compute the Huffman code.

What is the average code-word length?
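For reference, one valid assignment (the exact bit labels depend on tie-breaking, e.g. as produced by the huffman_code sketch above): combining S1 and S2 first gives code-word lengths 1, 2 and 2, for example S0 = 0, S1 = 10, S2 = 11. The average code-word length is then

\bar{L} = 0.7(1) + 0.15(2) + 0.15(2) = 1.3 bits per symbol,

compared with the source entropy H(S) ≈ 1.18 bits per symbol, i.e. an efficiency of about 91%.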

Huffman Coding – variations
When the probability of the combined symbol is found to equal another probability
in the list, we may proceed by placing the probability of the new symbol as high as
possible or as low as possible.

Which one to choose? Both placements produce an optimum code with the same average
code-word length, but placing the probability of the combined symbol as high as possible
tends to give code-word lengths with a smaller variance about the mean, and is therefore
usually preferred.
Huffman Coding – Exercise

Symbol      | S0   | S1   | S2    | S3    | S4    | S5     | S6
Probability | 0.25 | 0.25 | 0.125 | 0.125 | 0.125 | 0.0625 | 0.0625

Compute the Huffman code by placing the probability of the combined symbol
as high as possible.

What is the average code-word length?

Huffman Coding – Exercise Answer

Symbol      | S0   | S1   | S2    | S3    | S4    | S5     | S6
Probability | 0.25 | 0.25 | 0.125 | 0.125 | 0.125 | 0.0625 | 0.0625
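Because every probability here is a negative power of two, the Huffman code-word lengths equal log2(1/p_k): one possible assignment (exact bits depend on tie-breaking) gives 2 bits for S0 and S1, 3 bits for S2, S3 and S4, and 4 bits for S5 and S6. The average code-word length is

\bar{L} = 2(0.25)(2) + 3(0.125)(3) + 2(0.0625)(4) = 2.625 bits per symbol,

which equals the source entropy H(S) = 2.625 bits per symbol, so the code is 100% efficient for this source.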

Ternary Huffman Coding

The procedure extends directly to a ternary code alphabet: at each step the three symbols of
lowest probability are combined and assigned the code symbols 0, 1 and 2; if necessary, dummy
symbols of zero probability are appended to the list so that the final reduction is left with
exactly three symbols.

Huffman Coding – Exercise
Huffman Encoding Efficiency – Self-Information or Entropy

• The entropy H(X) = -\sum_k p_k \log_2 p_k is the best possible average number of bits per letter.

• If the code assigns n_k bits to symbol k, the average number of bits per letter is \bar{L} = \sum_k p_k n_k.

So the efficiency is

\eta = \frac{H(X)}{\bar{L}}

and the Redundancy = 1 - Efficiency.
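For instance, for the five-symbol example above (probabilities 0.4, 0.2, 0.2, 0.1, 0.1 and \bar{L} = 2.2), the entropy works out to H(X) ≈ 2.12 bits per symbol, giving an efficiency of about 2.12/2.2 ≈ 0.96 and a redundancy of about 3.5%. (The average_length and entropy helpers sketched earlier compute the same numbers.)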
Take-home message: Huffman Coding

Lempel-Ziv Coding
• A major difficulty in using the Huffman code is that the symbol probabilities must
be known or estimated, and both the encoder and decoder must know the coding
tree.

• Lempel-Ziv code is an adaptive coding technique that does not require prior
knowledge of symbol probabilities

• Lempel-Ziv coding is the basis of the well-known ZIP format for data compression (lossless
coding).

• Coding is performed on groups of characters of varying lengths.

• The code assumes that a dictionary exists containing already-coded segments of a
sequence of alphabet symbols. Data are encoded by looking through the existing
dictionary for a match to the next segment in the sequence being coded.
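A minimal Python sketch of such a dictionary-based encoder is shown below. It is an illustration rather than the exact algorithm from the notes: the dictionary is seeded with the single symbols 0 and 1, the input is parsed into the shortest subsequences not seen before, and each new subsequence is encoded as a fixed-length binary pointer to its root subsequence followed by the innovation (last) bit. The example sequence is an assumption, chosen so that the resulting dictionary matches the decoding discussion that follows.

```python
import math

def lz_parse(bits, seed=("0", "1")):
    """Parse a binary string into the shortest subsequences not seen before."""
    phrases = list(seed)
    current = ""
    for b in bits:
        current += b
        if current not in phrases:      # shortest subsequence not seen before
            phrases.append(current)
            current = ""
    return phrases                      # dictionary positions are 1, 2, 3, ...

def lz_encode(phrases, num_seed=2):
    """Encode each new phrase as a pointer to its root subsequence plus the innovation bit."""
    roots = [phrases.index(p[:-1]) + 1 for p in phrases[num_seed:]]   # 1-based root positions
    ptr_bits = max(1, math.ceil(math.log2(max(roots) + 1)))           # fixed pointer width
    blocks = []
    for phrase, root_pos in zip(phrases[num_seed:], roots):
        blocks.append(format(root_pos, f"0{ptr_bits}b") + phrase[-1])
    return blocks

phrases = lz_parse("000101110010100101")
print(phrases)             # ['0', '1', '00', '01', '011', '10', '010', '100', '101']
print(lz_encode(phrases))  # last block is '1101': pointer 110 (position 6, '10') + innovation bit 1
```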

• LZ Coding Example:
Note: each encoded block is the binary code for the location (dictionary position) of the root
subsequence (the prefix), followed by the value of the last bit (the innovation symbol).

For example, suppose the binary encoded block in position 9 is 1101. The last bit, 1, is
the innovation symbol. The remaining bits, 110, point to the root subsequence 10 in
position 6. Hence, the block 1101 is decoded into 101, which is correct.

The Lempel–Ziv algorithm uses fixed-length codes to represent a variable number of
source symbols; this feature makes the Lempel–Ziv code suitable for
synchronous transmission.
Lempel-Ziv Coding – Exercise

Encode the following sequence using the Lempel-Ziv algorithm, assuming that 0
and 1 are already stored in the dictionary:

11101001100010110100….

Lempel-Ziv Coding – Exercise Answer

Encode the following sequence using the Lempel-Ziv algorithm, assuming that 0
and 1 are already stored in the dictionary:

11101001100010110100….
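One consistent worked answer (the original answer tables were figures; this follows the lz_parse/lz_encode sketch above): with 0 and 1 stored in positions 1 and 2, the sequence parses into the subsequences

0, 1, 11, 10, 100, 110, 00, 101, 1010

occupying positions 1 through 9, with the final 0 left over as an incomplete subsequence. Each new subsequence from position 3 onward is then encoded as a fixed-length binary pointer to its root subsequence followed by the innovation bit; for example, 1010 in position 9 has root 101 in position 8, so with 4-bit pointers its encoded block is 1000 followed by the innovation bit 0.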

Shannon–Fano Coding Technique
Algorithm:
Step 1: Arrange all messages in descending order of probability.

Step 2: Divide the sequence into two groups in such a way that the sums of the
probabilities in the two groups are as nearly equal as possible.

Step 3: Assign 0 to the upper group and 1 to the lower group.

Step 4: Repeat Steps 2 and 3 within each group, and so on, until every group
contains a single message.
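A minimal recursive sketch of this procedure in Python is shown below (the function and variable names are illustrative, not from the notes). It splits the probability-sorted list where the two group sums are closest to equal, appends 0 to the code words of the upper group and 1 to those of the lower group, and recurses.

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) in descending order of probability.
    Returns a dict mapping name -> code word."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i in range(1, len(group)):            # find the most balanced split point
            running += group[i - 1][1]
            diff = abs(total - 2 * running)       # |upper-group sum - lower-group sum|
            if diff < best_diff:
                best_diff, best_i = diff, i
        upper, lower = group[:best_i], group[best_i:]
        for name, _ in upper:
            codes[name] += "0"                    # 0 to the upper group
        for name, _ in lower:
            codes[name] += "1"                    # 1 to the lower group
        split(upper)
        split(lower)

    split(symbols)
    return codes

# The messages of SF Example-2 below:
msgs = [("M1", 1/2), ("M2", 1/8), ("M3", 1/8), ("M4", 1/16),
        ("M5", 1/16), ("M6", 1/16), ("M7", 1/32), ("M8", 1/32)]
print(shannon_fano(msgs))
```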
SF Coding Example-1

• Shannon–Fano does not always produce optimal prefix codes. For this
reason, Shannon–Fano is almost never used.
• Huffman coding is almost as computationally simple and produces prefix
codes that always achieve the lowest expected code word length.
• Shannon–Fano coding is used in the IMPLODE compression method, which
is part of the ZIP file format.
SF Example-2

Message Mi | Pi   | Coding Procedure | No. of Bits | Code
M1         | 1/2  | 0                | 1           | 0
M2         | 1/8  | 1 0 0            | 3           | 100
M3         | 1/8  | 1 0 1            | 3           | 101
M4         | 1/16 | 1 1 0 0          | 4           | 1100
M5         | 1/16 | 1 1 0 1          | 4           | 1101
M6         | 1/16 | 1 1 1 0          | 4           | 1110
M7         | 1/32 | 1 1 1 1 0        | 5           | 11110
M8         | 1/32 | 1 1 1 1 1        | 5           | 11111
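As a check on this example, the average code-word length is

\bar{L} = (1/2)(1) + 2(1/8)(3) + 3(1/16)(4) + 2(1/32)(5) = 2.3125 bits per message,

which equals the source entropy (every probability is a negative power of two), so the Shannon–Fano code is 100% efficient here.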
Proof of Source Coding Theorem
The average code-word length of the source encoder is

\bar{L} = \sum_{k=0}^{K-1} p_k l_k
Courtesy: Archana C
We want to show that \bar{L} \ge H(S), i.e. that the coding efficiency cannot exceed 1, where
H(S) = \sum_{k=0}^{K-1} p_k \log_2(1/p_k) is the source entropy.

Consider the difference

H(S) - \bar{L} = \sum_{k=0}^{K-1} p_k \log_2\frac{1}{p_k} - \sum_{k=0}^{K-1} p_k l_k
               = \sum_{k=0}^{K-1} p_k \log_2\frac{2^{-l_k}}{p_k}

Use the natural logarithm: \log_2 x = \frac{\ln x}{\ln 2}, and we have the inequality
\ln x \le x - 1 (with equality only at x = 1).

Substitute x = 2^{-l_k}/p_k, multiply both sides by p_k, and sum over k:

H(S) - \bar{L} \le \frac{1}{\ln 2}\sum_{k=0}^{K-1} p_k\left(\frac{2^{-l_k}}{p_k} - 1\right)
                = \frac{1}{\ln 2}\left(\sum_{k=0}^{K-1} 2^{-l_k} - 1\right)

Using the Kraft-McMillan inequality, \sum_{k=0}^{K-1} 2^{-l_k} \le 1 for any uniquely decodable
code, so the right-hand side is at most zero. Hence \bar{L} \ge H(S): the entropy is the minimum
possible average code-word length, which establishes the bound in the source-coding theorem.
Thanks
