ELE5211: Advanced Topics in CE
Module Five
(Multimedia – Compression Algorithms)
Tutor: Dr. Hassan A. Bashir
Information Theory and Compression Algorithms
Information Theory (Shannon Theory)
Entropy and Code Length
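As a quick illustration of the link between entropy and code length, here is a minimal Python sketch; the distribution and the codeword lengths are assumptions chosen for the example, not taken from the module.

import math

def entropy(probs):
    # Shannon entropy H = -sum(p * log2(p)), in bits per symbol;
    # it lower-bounds the average length of any lossless code.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_code_length(probs, lengths):
    # Expected bits per symbol for a given set of codeword lengths.
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]      # assumed example distribution
lengths = [1, 2, 3, 3]                 # e.g. codewords 0, 10, 110, 111

print(entropy(probs))                        # 1.75 bits/symbol
print(average_code_length(probs, lengths))   # 1.75: this code meets the bound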
Run-Length Coding (RLC)
Run-length coding (RLC) is a technique that is not so widely used these days, but it is a great way to get a feel for some of the issues around using compression.

Imagine we have the following simple black-and-white image.

One very simple way a computer can store this image in binary is by using a format where '1' means white and '0' means black. This is a "bitmap", because we've mapped the pixels onto the values of bits. Using this method, the image would be represented in the following way:
Bitmap row         RLC code
100111101111001    1, 2, 4, 1, 4, 2, 1
011111000111110    0, 1, 5, 3, 5, 1
111110000011111    5, 5, 5
111100000001111    4, 7, 4
111000000000111    3, 9, 3
...
Can we represent the same image using fewer bits, but still be able to reconstruct the original image? Yes, we can. One of the many methods is called run-length encoding: replace each row with numbers that say how many consecutive pixels are the same colour, always starting with the number of white pixels. For example, the first row in the image above contains one white, two black, four white, one black, four white, two black, and one white pixel, which gives the code 1, 2, 4, 1, 4, 2, 1.
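As a concrete illustration of this rule, here is a minimal Python sketch (the function and variable names are assumptions, not from the slides):

def rle_encode(row):
    # Encode one bitmap row ('1' = white, '0' = black) as run lengths,
    # always starting with the number of white pixels (possibly 0).
    runs = []
    current, count = "1", 0        # start by counting white pixels
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

print(rle_encode("100111101111001"))   # [1, 2, 4, 1, 4, 2, 1]
print(rle_encode("011111000111110"))   # [0, 1, 5, 3, 5, 1]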
Decompression of RLC

Exercise 1: Can you decompress the following code?

4, 11, 3
4, 9, 2, 1, 2
4, 9, 2, 1, 2
4, 11, 3
4, 9, 5
4, 9, 5
5, 7, 6
0, 17, 1
1, 15, 2

How many pixels were there in the original image? How many numbers were used to represent those pixels? How much space have we saved using this alternate representation, and how can we measure it?
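To check an answer, the inverse operation is straightforward. A minimal Python sketch (names assumed):

def rle_decode(runs):
    # Rebuild a bitmap row from run lengths; runs alternate
    # white ('1') then black ('0'), starting with white.
    row, colour = "", "1"
    for length in runs:
        row += colour * length
        colour = "0" if colour == "1" else "1"
    return row

print(rle_decode([4, 11, 3]))   # '111100000000000111' (18 pixels)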
RLC Usage

The main place that black-and-white scanned images are used now is on fax machines, which use this approach to compression. One reason that it works so well with scanned pages is that the number of consecutive white pixels is huge; in fact, there will be entire scanned lines that are nothing but white pixels. A typical fax page is 200 pixels across or more, so replacing 200 bits with one number is a big saving.
Variable Length Coding (VLC)
Huffman Coding
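As a companion to the tree construction, a minimal Python sketch of Huffman's algorithm using a priority queue; the names, tie-breaking, and output format are choices made here, not prescribed by the slides.

import heapq

def huffman_codes(freqs):
    # Build a Huffman code from a dict {symbol: frequency};
    # returns {symbol: bitstring}.
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5})
print(codes)
# e.g. {'A': '0', 'E': '100', 'C': '101', 'D': '110', 'B': '111'}
# (exact codewords depend on tie-breaking);
# total cost: 15*1 + (7+6+6+5)*3 = 87 bits for the 39-symbol source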
Properties of Huffman Coding

Fixed vs. Variable Length Coding
Exercise: Shannon vs. Huffman Coding

Shannon-Fano: 89 bits
Huffman: 87 bits
Fixed-length coding: 117 bits
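The source string is not shown here, but these totals are consistent with the classic five-symbol example with frequency counts 15, 7, 6, 6, 5 (39 symbols in total): fixed-length coding needs 39 x 3 = 117 bits; Shannon-Fano assigns 2-bit codes to the three most frequent symbols and 3-bit codes to the other two, giving (15 + 7 + 6) x 2 + (6 + 5) x 3 = 89 bits; Huffman assigns a 1-bit code to the most frequent symbol and 3-bit codes to the rest, giving 15 x 1 + (7 + 6 + 6 + 5) x 3 = 87 bits.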
Extended Huffman Coding
Adaptive Huffman Coding

Adaptive Huffman Coding (Tree Updating)
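The tree-updating procedure (FGK/Vitter) maintains a valid Huffman tree incrementally as the symbol counts change. As a conceptual sketch only, not the slides' algorithm, the version below naively rebuilds the code table from the running counts after each symbol, reusing the huffman_codes function sketched earlier; it shows why adaptation needs no transmitted table, but not the efficient incremental update.

from collections import defaultdict

def adaptive_encode(message, build_codes):
    # Conceptual only: rebuild the code table from the running counts
    # after every symbol. Encoder and decoder stay synchronized because
    # both update the same statistics, so no table is transmitted.
    counts = defaultdict(int)
    out = []
    for sym in message:
        codes = build_codes(dict(counts)) if counts else {}
        code = codes.get(sym, "")
        # A brand-new (or lone) symbol has no usable code yet; a real
        # coder emits an escape code plus the raw symbol.
        out.append(code if code else "raw:" + sym)
        counts[sym] += 1           # update the model after coding
    return out

print(adaptive_encode("aabbb", huffman_codes))
# ['raw:a', 'raw:a', 'raw:b', '0', '1']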
Dictionary-based Coding
• Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.
• LZW compression is the compression of a file into a smaller file using a table-based lookup algorithm.
• LZW compression works by:
  • reading a sequence of symbols,
  • grouping the symbols into strings, and
  • converting the strings into codes.
• Because the codes take up less space than the strings they replace, we get compression.
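A minimal Python sketch of the LZW compressor; the 8-bit initial dictionary is an assumption here, and real implementations also bound the code width.

def lzw_compress(data):
    # LZW: emit a code for the longest dictionary match, then add
    # that match extended by one symbol to the dictionary.
    table = {chr(i): i for i in range(256)}   # assume 8-bit symbols
    next_code = 256
    w, out = "", []
    for c in data:
        if w + c in table:
            w = w + c                 # keep growing the match
        else:
            out.append(table[w])      # emit code for longest match
            table[w + c] = next_code  # learn the new string
            next_code += 1
            w = c
    if w:
        out.append(table[w])
    return out

print(lzw_compress("ABABABA"))  # [65, 66, 256, 258]; 256='AB', 258='ABA'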
• Two commonly used file formats in which LZW compression is used are the GIF image format served from Web sites and the TIFF image format.
• Both ZIP and LZW are lossless compression methods: no data is lost in the compression, unlike a lossy format such as JPG.
• You can open and save a TIFF file as many times as you like without degrading the image. If you try that with JPG, the image quality deteriorates more each time.
Dictionary-based Coding (LZW - Remarks)
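A remark usually made at this point is that the decoder can rebuild the dictionary on its own, one step behind the encoder; the only subtlety is a code that is not yet in the table (the "KwKwK" case). A minimal Python sketch:

def lzw_decompress(codes):
    # Rebuild the string; the dictionary is reconstructed on the fly.
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                         # KwKwK special case: this code
            entry = prev + prev[0]    # was only just created by encoder
        out.append(entry)
        table[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(out)

print(lzw_decompress([65, 66, 256, 258]))  # 'ABABABA'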
Arithmetic Coding

• Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression.
• Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code.
• Comparison of AC with Huffman:
  • Arithmetic coding is superior to Huffman coding in the sense that it can assign a fractional number of bits to the codeword of a symbol, whereas
  • in Huffman coding an integral number of bits has to be assigned to each codeword.
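For instance, a symbol with probability 0.9 ideally needs only -log2(0.9), roughly 0.15 bits, but Huffman must still spend a whole bit on it; arithmetic coding approaches the fractional ideal by encoding the entire message as a single number in [0, 1).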
Arithmetic Code for: CAEE$
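Assuming the usual textbook probability table for this example (A: 0.2, B: 0.1, C: 0.2, D: 0.05, E: 0.3, F: 0.05, $: 0.1, with $ as the terminator), a minimal Python sketch of the interval-narrowing encoder reproduces the low and high values used below:

# Assumed probability table; '$' is the end-of-message terminator.
probs = {"A": 0.2, "B": 0.1, "C": 0.2, "D": 0.05,
         "E": 0.3, "F": 0.05, "$": 0.1}

# Each symbol owns the sub-interval [cum, cum + p) of [0, 1).
ranges, cum = {}, 0.0
for sym, p in probs.items():
    ranges[sym] = (cum, cum + p)
    cum += p

def arith_encode(message):
    # Narrow [low, high) by each successive symbol's sub-interval.
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

print(arith_encode("CAEE$"))   # (0.33184, 0.3322), up to float rounding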
For the above example, low = 0.33184, high = 0.3322.

If we assign 1 to the 1st binary fraction bit, the code would be 0.1 in binary, and its decimal value is value(0.1) = 0.5 > high. Hence, we assign 0 to the first bit. Since value(0.0) = 0 < low, the while loop continues.

Assigning 1 to the 2nd bit makes the binary code 0.01, and value(0.01) = 0.25, which is less than high, so it is accepted. Since it is still true that value(0.01) < low, the iteration continues.

Eventually, the binary codeword generated is 0.01010101, which is 2^-2 + 2^-4 + 2^-6 + 2^-8 = 0.33203125.
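The bit-by-bit assignment described above is easy to code directly. A minimal Python sketch (the function name is an assumption):

def binary_fraction_code(low, high):
    # Find bits b1 b2 ... so that 0.b1b2... lands in [low, high):
    # tentatively set each bit to 1, revert it to 0 if that would
    # overshoot high, and stop as soon as the value reaches low.
    code, value, weight = "", 0.0, 0.5
    while value < low:
        if value + weight <= high:
            value += weight          # keep this bit as 1
            code += "1"
        else:
            code += "0"              # a 1 here would exceed high
        weight /= 2
    return code

print(binary_fraction_code(0.33184, 0.3322))  # '01010101' = 0.33203125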
Lossless Image Compression
Lossless JPEG
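Lossless JPEG predicts each pixel from its already-decoded neighbours A (left), B (above), and C (above-left), and entropy-codes only the prediction error. A minimal sketch of the seven standard predictors (integer division stands in for the spec's rounding):

def predict(mode, A, B, C):
    # The seven Lossless JPEG predictors; the residual
    # (actual - predicted) is what gets entropy-coded.
    return {1: A,
            2: B,
            3: C,
            4: A + B - C,
            5: A + (B - C) // 2,
            6: B + (A - C) // 2,
            7: (A + B) // 2}[mode]

# Example neighbours A=100, B=104, C=102; actual pixel 103:
residual = 103 - predict(4, 100, 104, 102)   # 103 - 102 = 1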
Homework

Use arithmetic coding to encode your surname.