SKET 3583
Digital Communication Systems
Source Coding
Huffman Code
• Prefix-free, variable-length code that achieves the shortest possible average code length for a given set of symbol probabilities
• The most frequent symbols get the shortest codewords
• Procedure (a code sketch follows this list)
– List all symbols and their probabilities in descending order
– Merge the two branches with the lowest probabilities and add their probabilities
– Repeat until only one branch is left
– Read each codeword from the 0/1 labels assigned to the two branches at every merge
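Below is a minimal Python sketch of this procedure (not from the slides): each heap entry carries the partial codewords of the symbols it covers, and every merge prefixes a 0 to one branch and a 1 to the other. The probabilities are taken from the example that follows; the exact bit assignment can differ from the slide, but any run yields an optimal code with average length 2.4 bit/symbol.

```python
# Minimal Huffman construction sketch: repeatedly merge the two least-probable
# branches, prefixing 0 to one branch and 1 to the other at every merge.
import heapq
from itertools import count

def huffman(probs):
    """probs: dict of symbol -> probability; returns dict of symbol -> codeword."""
    order = count()                               # tie-breaker so dicts are never compared
    heap = [(p, next(order), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, branch0 = heapq.heappop(heap)      # two branches with lowest probabilities
        p1, _, branch1 = heapq.heappop(heap)
        merged = {sym: "0" + cw for sym, cw in branch0.items()}
        merged.update({sym: "1" + cw for sym, cw in branch1.items()})
        heapq.heappush(heap, (p0 + p1, next(order), merged))
    return heap[0][2]

print(huffman({"a": 0.4, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}))
```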
Huffman Code Example
[Figure: Huffman coding tree merging the probabilities 0.4, 0.2, 0.1, 0.1, 0.1, 0.1 pairwise, with 0/1 labels at each merge]

The Code:
Symbol   Prob.   Codeword
a        0.4     11
b        0.2     00
c        0.1     101
d        0.1     100
e        0.1     011
f        0.1     010

Average code length: 2.4 bit/symbol
Compression ratio: 3.0/2.4 = 1.25 (3-bit fixed-length code vs. 2.4-bit average)
Entropy: 2.32 bit/symbol
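As a quick sanity check, the quoted figures can be reproduced with a few lines of Python, using the probabilities and codewords listed above:

```python
# Verify the average code length, entropy, and compression ratio quoted above.
import math

p    = {"a": 0.4,  "b": 0.2,  "c": 0.1,   "d": 0.1,   "e": 0.1,   "f": 0.1}
code = {"a": "11", "b": "00", "c": "101", "d": "100", "e": "011", "f": "010"}

avg_len = sum(p[s] * len(code[s]) for s in p)             # 2.4 bit/symbol
entropy = -sum(pi * math.log2(pi) for pi in p.values())   # ~2.32 bit/symbol
print(round(avg_len, 2), round(entropy, 2), round(3.0 / avg_len, 2))   # 2.4 2.32 1.25
```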
• Second example, Step 1: Arrange the messages in order of decreasing probability

Message   Prob.
s5        0.5
s3        0.25
s2        0.125
s1        0.0625
s4        0.0625
Average code length:

\bar{n} = \sum_{i=1}^{5} n_i p_i = 4(0.0625) + 3(0.125) + 2(0.25) + 4(0.0625) + 1(0.5) = 1.875 \text{ bit/message}

Entropy:

H(X) = -\sum_{i=1}^{5} p_i \log_2 p_i = -[\,2(0.0625)\log_2(0.0625) + 0.125\log_2(0.125) + 0.25\log_2(0.25) + 0.5\log_2(0.5)\,] = 1.875 \text{ bit/message}
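Every probability here is a power of 1/2, which is exactly the case where a Huffman code meets the entropy bound with equality. A short check, using the code lengths 4, 3, 2, 4, 1 that appear in the sum above:

```python
# Confirm that the Huffman average length equals the entropy for this dyadic source.
import math

p = {"s1": 0.0625, "s2": 0.125, "s3": 0.25, "s4": 0.0625, "s5": 0.5}
n = {"s1": 4,      "s2": 3,     "s3": 2,    "s4": 4,      "s5": 1}   # code lengths

avg_len = sum(n[s] * p[s] for s in p)
entropy = -sum(pi * math.log2(pi) for pi in p.values())
print(avg_len, entropy)   # both print 1.875
```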
Lempel-Ziv Algorithm
• Look up each new segment of the input in a dictionary of already coded segments (an encoder sketch follows this list)
– If the segment matches a dictionary entry, send <dictionary address, next character> and store the segment + new character in the dictionary
– If there is no match, store the symbol in the dictionary and send <0, symbol>
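A minimal sketch of this dictionary-building (LZ78-style) encoder; the function name and the (address, character) packet representation are illustrative choices, not taken from the slides:

```python
# LZ78-style encoder sketch: grow a dictionary of previously seen segments and
# emit <address of longest known prefix, next character> packets.
def lz_encode(text):
    dictionary = {}          # segment -> address (address 0 means "no match")
    packets = []
    segment = ""
    for ch in text:
        if segment + ch in dictionary:
            segment += ch                                   # keep extending the match
        else:
            packets.append((dictionary.get(segment, 0), ch))
            dictionary[segment + ch] = len(dictionary) + 1  # store segment + new character
            segment = ""
    if segment:                                             # input ended inside a known segment
        packets.append((dictionary[segment], "-"))
    return packets
```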
Example LZ Coding
• Encode [a b a a b a b b b b b b b a b b b b b a]
Code Dictionary
Address   Contents   Encoded Packets
1         a          <0,a>
2         b          <0,b>
3         aa         <1,a>
4         ba         <2,a>
5         bb         <2,b>
6         bbb        <5,b>
7         bba        <5,a>
8         bbbb       <6,b>
(end of input)       <4,->

Note: 9 code words; 3-bit dictionary address; 1 bit for the new character.
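Running the encoder sketch above on this input reproduces exactly these packets, and a matching decoder only needs to rebuild the same dictionary on the fly, so no dictionary has to be transmitted. The "-" marker for "no new character" follows the <4,-> convention used in the table:

```python
# Decode <address, character> packets by rebuilding the dictionary on the fly.
def lz_decode(packets):
    dictionary = {0: ""}                    # address -> stored segment
    out = []
    for address, ch in packets:
        segment = dictionary[address] + ("" if ch == "-" else ch)
        out.append(segment)
        dictionary[len(dictionary)] = segment
    return "".join(out)

packets = [(0, "a"), (0, "b"), (1, "a"), (2, "a"), (2, "b"),
           (5, "b"), (5, "a"), (6, "b"), (4, "-")]
print(lz_decode(packets))                   # abaababbbbbbbabbbbba
```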
Notes on Lempel-Ziv Codes
• Compression gains appear only for long sequences, once the dictionary has filled with useful segments
• Other variations encode the data with run-length-style packets (a sketch follows below):
<# characters back, length, next character>
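A rough sketch of that <characters back, match length, next character> variant (LZ77-style). The window size and the greedy longest-match search are illustrative choices, not details given on the slide:

```python
# LZ77-style encoder sketch: find the longest match within a sliding window
# and emit <distance back, match length, next character> packets.
def lz77_encode(text, window=16):
    packets, i = [], 0
    while i < len(text):
        best_len, best_back = 0, 0
        for j in range(max(0, i - window), i):   # candidate match start positions
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1                      # matches may run into the lookahead (run lengths)
            if length > best_len:
                best_len, best_back = length, i - j
        nxt = text[i + best_len] if i + best_len < len(text) else "-"
        packets.append((best_back, best_len, nxt))
        i += best_len + 1
    return packets

print(lz77_encode("abaababbbbbbbabbbbba"))
```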
Lempel-Ziv (ZIP) Codes
• Huffman codes have some shortcomings
– Symbol probability information must be known a priori
– The coding tree must be known at both coder and decoder
• The Lempel-Ziv algorithm uses the text itself to iteratively construct a sequence of variable-length code words
• Used in gzip, UNIX compress, and the LZW algorithm (see the example below)
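For instance, the DEFLATE format used by gzip and ZIP combines LZ77-style matching with Huffman coding of the resulting symbols; Python's standard zlib module exposes it, so the effect on repetitive data is easy to see (the sample string is just the earlier example repeated):

```python
# Compress a repetitive byte string with DEFLATE (LZ77 matching + Huffman coding).
import zlib

data = b"abaababbbbbbbabbbbba" * 100          # highly repetitive input
compressed = zlib.compress(data, 9)           # 9 = maximum compression level
print(len(data), "->", len(compressed), "bytes")
assert zlib.decompress(compressed) == data    # lossless: decompression restores the input
```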