MULTIMEDIA SYSTEMS
ITU-07319
Introduction to Compression
OUTLINE
Introduction
What is Compression
Why is Compression Important in Multimedia Systems
Coding
Basic Types of Data compression
Classification of Data Compression Types
Performance Metrics
Lossless compression
INTRODUCTION
The number of bytes of data stored, processed, and transmitted keeps soaring, and in the process keeps transforming our world. Examples: the ever-present, ever-growing Internet; the explosive development of mobile communications; and the ever-increasing importance of video communication.
(Unit prefixes for data volume: Mega 10^6, Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21, Yotta 10^24.)
INTRODUCTION
It would not be practical to put images, let alone audio and
video, on websites if it were not for data compression
algorithms.
Cellular phones would not be able to provide
communication with increasing clarity without compression.
The advent of digital TV would not be possible without
compression.
INTRODUCTION TO DATA COMPRESSION
Data compression, which for a long time was the domain of
a relatively small group of engineers and scientists, is now
ubiquitous.
Make a call on your cell phone, and you are using
compression.
Surf on the Internet, and you are using (or wasting) your
time with assistance from compression.
Listen to music or watch a movie, and you are being
entertained courtesy of compression.
INTRODUCTION TO DATA COMPRESSION
Diagram: data volume before compression vs. data volume after compression; data speed before compression vs. data speed after compression.
WHAT IS DATA COMPRESSION
Data compression is the art or science of representing
information in a compact form, thereby reducing the
number of bits needed to represent data.
We create these compact representations by identifying and
using structures that exist in the data.
Data can be characters in a text file, numbers that are samples
of speech or image waveforms, or sequences of numbers that
are generated by other processes.
WHY DATA COMPRESSION
The reason we need data compression is that:
The number of bytes required to represent multimedia
data can be huge (consider images, video and animation).
Compressing data can save storage capacity, speed up
file transfer, and decrease costs for storage hardware
and network bandwidth
ADVANTAGES OF DATA
COMPRESSION
Efficient utilization of storage space
Data storage and transmission cost money and this
cost increases with the amount of data available
Efficient Utilization of bandwidth for
transmission
bandwidth requirements are usually much greater
than availability and are costly
Therefore, compression is a viable technique that
can reduce cost, since the final compressed data
takes less memory (storage space) and less
bandwidth (transmission time).
DISADVANTAGES OF DATA
COMPRESSION
Despite the importance and need for compression, there
are some disadvantages:
Compressed data must be decompressed to be viewed
(or heard), thus extra processing is required.
Therefore, the design of data compression schemes
involves trade-offs between various factors, including
the degree of compression, the amount of distortion
introduced (if using a lossy compression scheme), and
the computational resources required to compress and
uncompress the data.
CODING AS A COMPRESSION TECHNIQUE
There are two kinds of coding of information
1. Source Coding - The process of source coding usually results in
the removal of redundancy in the signal
Coding for efficient representation of information (also called compression):
coding/transforming information using fewer bits than the original representation.
2. Channel Coding - This is done to ensure error free transmission of
information through a noisy medium.
It involves adding some extra bits to the data before transmission and removing
them at the receiving end (a toy sketch of the idea follows below).
Therefore, compression is source coding.
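To make the distinction concrete, here is a minimal sketch of the channel-coding idea using a single even-parity bit per block. This scheme and the function names are illustrative assumptions, not part of the original notes; real channel codes (Hamming, Reed-Solomon, convolutional codes) add redundancy in more structured ways.

def add_parity(bits):
    # Sender: append one bit so the total number of 1s becomes even.
    return bits + [sum(bits) % 2]

def check_and_strip_parity(received):
    # Receiver: verify the parity, then remove the extra bit.
    ok = sum(received) % 2 == 0
    return received[:-1], ok

codeword = add_parity([1, 0, 1, 1, 0, 0, 1])   # channel coding adds redundancy
corrupted = codeword.copy()
corrupted[2] ^= 1                               # the noisy channel flips one bit
print(check_and_strip_parity(codeword))         # (..., True): accepted
print(check_and_strip_parity(corrupted))        # (..., False): error detected

Source coding works in the opposite direction: it removes redundancy from the data, which is what the compression techniques in the rest of these notes do.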
SOURCE CODING VS CHANNEL CODING
Diagram: send → source coding → channel coding → receive.
SOURCE CODING
COMPRESSION TECHNIQUES
• There are two kinds of compression: lossless and lossy.
• Compression techniques take advantage of redundancy in digital images.
• Types of redundancies
• Spatial redundancy: due to the correlation between neighbouring pixel
values.
• Spectral redundancy: due to the correlation between different color
planes or spectral bands.
• Lossy techniques, in addition, take advantage of HVS (Human Visual
System) properties.
CLASSIFICATION OF IMAGE
COMPRESSION
Lossless Coding Techniques (Entropy Coding)
• Repetitive sequence encoding: run-length encoding (RLE)
• Statistical encoding: Huffman, Arithmetic, LZW
• Bitplane encoding
• Lossless predictive coding: differential pulse-code modulation (DPCM)
LOSSLESS
COMPRESSION
• In lossless compression, data are reconstructed after
compression without errors, i.e., no information is lost.
Typical application domains where you do not want to lose
information are the compression of text, files, and fax.
• It is mostly used for image data in medical imaging or the compression
of maps in the context of land registry, where no information loss
can be tolerated.
• For all lossless compression techniques there is a well known
trade-off:
Compression Ratio – Coder Complexity – Coder Delay.
LOSSLESS COMPRESSION
Lossless compression is important in areas where very small differences or
deviations from the original data may result in different interpretations and
meanings that may harm humans or the environment. The following are
examples where such differences cannot be tolerated.
1. TEXT INFORMATION
Before compression: Do not send money
After compression: Do now send money
Before compression: TZS 1,000,000,000
After compression: TZS 100,000,000
LOSSLESS COMPRESSION
2. MEDICAL RECORDS
For example, suppose we compressed a radiological image in a lossy
fashion, and the difference between the reconstruction Y and the original X
was visually undetectable. If this image were later enhanced, the previously
undetectable differences might cause the appearance of artifacts that could
seriously mislead the radiologist and result in a wrong diagnosis and wrong
treatment.
Before compression: a significant lump on the patient's brain.
After compression: the brain of the patient looks normal.
3. GEOGRAPHICAL DATA
Before compression: presence of land degradation.
After compression: land looks normal.
LOSSLESS COMPRESSION
3. GEOGRAPHICAL DATA
Data obtained from satellites often are processed later to
obtain different numerical indicators of vegetation,
deforestation, and so on. If the reconstructed data are not
identical to the original data, processing may result in
“enhancement” of the differences. It may not be possible to
go back and obtain the same data over again. Therefore, it
is not advisable to allow for any differences to appear in the
compression process.
LOSSY
COMPRESSION
• Many situations require compression
where we want the reconstruction (Y) to
be identical to the original (X).
• In a number of situations it is possible to
relax this requirement in order to get
more compression.
• In these situations, we look to lossy
compression techniques
LOSSY
COMPRESSION
• In addition to the tradeoff between coding efficiency –
coder complexity – coding delay, the additional aspect
of compression quality arises with the use of lossy
methods.
• Quality is hard to assess since for many applications
human perception is the only relevant criterion.
• However, there are some scenarios where other factors
need to be considered, e.g. when compressed data is
used in matching procedures like in biometrics.
CLASSIFICATION OF IMAGE
COMPRESSION
Lossy Coding Techniques (Source Coding)
• Lossy predictive coding: DPCM, ADPCM, Delta modulation
• Transform coding: DFT, DCT, Haar, Hadamard
• Subband coding: subbands, wavelets
• Block truncation coding
• Fractal coding
• Vector quantization
PERFORMANCE METRICS FOR
LOSSY TECHNIQUES
• Compression Ratio (CR):
  $CR = \frac{\text{size of original data}}{\text{size of compressed data}}$
• Peak Signal to Noise Ratio (PSNR):
  $PSNR = 20 \log_{10} \frac{\text{peak data value}}{\text{RMSE}}$
• RMSE is the Root Mean Square Error between the
original and reconstructed data
• Speed (Complexity) of encoding and
decoding
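A minimal sketch of these two metrics in Python (the function names and the small sample values are illustrative assumptions; for real images the samples would come from the pixel arrays):

import math

def compression_ratio(original_size, compressed_size):
    # CR = size_of_original_data / size_of_compressed_data (same units, e.g. bytes)
    return original_size / compressed_size

def psnr(original, reconstructed, peak=255.0):
    # RMSE: root mean square error between the original and reconstructed samples
    rmse = math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original))
    # PSNR = 20 * log10(peak_data_value / RMSE); infinite for a perfect reconstruction
    return math.inf if rmse == 0 else 20 * math.log10(peak / rmse)

print(compression_ratio(65536, 16384))               # e.g. a 64 KB image stored in 16 KB -> CR = 4.0
print(psnr([200, 10, 50, 90], [198, 12, 50, 91]))    # higher PSNR means less distortion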
EXAMPLES OF LOSSLESS CODING-
THE HUFFMAN ALGORITHM
The Huffman algorithm is now briefly summarised:
A bottom-up approach
1. Initialization: Put all nodes in an OPEN list, keep it
sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
a) From OPEN pick two nodes having the lowest
frequencies/probabilities, create a parent node of
them.
HUFFMAN CODING contd..
b) Assign the sum of the children's frequencies/ probabilities to
the parent node and insert it into OPEN.
c) Assign code 0, 1 to the two branches of the tree, and delete
the children from OPEN.
Assume an information source represented by the following
scheme:
Symbol A B C D E
Count 15 7 6 6 5
HUFFMAN CODING contd..
Figure 2: The Huffman Coding
HUFFMAN CODING contd..
A=
B=
C=
D=
E=
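The codes A–E above are read off the tree in Figure 2. As a cross-check, the following sketch builds a Huffman tree for the counts in the table (A=15, B=7, C=6, D=6, E=5) using Python's heapq. The exact bit patterns depend on how ties between equal counts are broken, but the code lengths (1 bit for A, 3 bits for each of B, C, D, E) do not.

import heapq
from itertools import count

def huffman_codes(freqs):
    # Each heap entry: (frequency, tie-breaker, tree); a tree is a symbol or a (left, right) pair.
    tick = count()
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)                          # two lowest-frequency nodes
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))    # parent gets the summed frequency
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):              # internal node: label branches 0 and 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"          # leaf: record the accumulated code
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))

With these code lengths the 39 symbols cost 15·1 + (7+6+6+5)·3 = 87 bits, instead of 39·3 = 117 bits with a fixed 3-bit code.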
HUFFMAN CODING contd..
The following points are worth noting
about the Huffman Coding algorithm:
Decoding for these algorithms is trivial
as long as the coding table (the
statistics) is sent before the data.
There is a bit overhead for sending this,
negligible if the data file is big.
HUFFMAN CODING contd..
Unique Prefix Property: no code is a prefix
to any other code (all symbols are at the leaf
nodes) -> great for decoder, unambiguous.
If prior statistics are available and accurate,
then Huffman coding is very good.
HUFFMAN CODING OF
IMAGES
With Huffman coding, in order to encode images,
the following is done:
Divide an image up into 8x8 blocks
Each block is a symbol to be coded
Compute Huffman codes for set of blocks
Encode blocks accordingly
ARITHMETIC CODING
Huffman coding and the like use an integer number (k) of bits for
each symbol, hence k is never less than 1. Sometimes, e.g., when
sending a 1-bit image, compression becomes impossible.
Map all possible length 2, 3 … messages to intervals in the range
[0..1] (in general, we need −log2 p bits to represent an interval of size p).
To encode message, just send enough bits of a binary fraction that
uniquely specifies the interval.
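A toy sketch of this interval-narrowing idea is given below. It is not a full arithmetic coder (no renormalisation and no actual bit output), and the three-symbol model P(a)=0.5, P(b)=0.3, P(c)=0.2 is an assumed example, not taken from these notes.

import math

# Cumulative sub-interval of [0, 1) assigned to each symbol under the assumed model.
RANGES = {"a": (0.0, 0.5), "b": (0.5, 0.8), "c": (0.8, 1.0)}

def encode_interval(message):
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = RANGES[sym]
        width = high - low
        low, high = low + width * lo, low + width * hi   # keep only the sub-interval for sym
    return low, high

low, high = encode_interval("aab")
print(low, high, math.ceil(-math.log2(high - low)))      # interval and roughly how many bits pin it down

Any number inside the final interval identifies the message, and an interval of size p can be pinned down with roughly −log2 p bits, which is exactly the claim above.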
ARITHMETIC CODING
Problem: how to determine probabilities?
Simple idea is to use adaptive model: Start with
guess of symbol frequencies. Update frequency
with each new symbol.
Another idea is to take account of intersymbol
probabilities, e.g., Prediction by Partial Matching.
LEMPEL-ZIV-WELCH (LZW) ALGORITHM
The LZW algorithm is a very common compression
technique.
Suppose we want to encode the Oxford Concise English
dictionary which contains about 159,000 entries.
Why not just transmit each word as an 18 bit number?
LEMPEL-ZIV-WELCH (LZW) ALGORITHM
Problems:
Too many bits,
everyone needs a dictionary,
only works for English text.
Solution:
Find a way to build the dictionary
adaptively.
LEMPEL-ZIV-WELCH (LZW) ALGORITHM
Original methods due to Ziv and Lempel in
1977 and 1978. Terry Welch improved the
scheme in 1984 (called LZW compression).
It is used in UNIX compress (a 1D token stream,
similar to the listing below).
It is used in GIF compression (a 2D window of tokens;
the image is treated as a stream of symbols, as with
Huffman coding of images).
THE LZW COMPRESSION ALGORITHM CAN
BE SUMMARISED AS FOLLOWS:
w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;  /* flush the final match once the input ends */
THE LZW DECOMPRESSION ALGORITHM IS AS
FOLLOWS:
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
entry = dictionary entry for k;
output entry;
add w + entry[0] to dictionary;
w = entry;
}
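A runnable sketch of both listings, written in Python rather than the slides' pseudocode. The dictionary is seeded with the 256 single-byte strings, and the decoder also handles the one special case the simplified pseudocode above glosses over: a code may arrive before its entry is complete in the decoder's dictionary, in which case the entry must be w plus the first symbol of w.

def lzw_encode(data: bytes):
    dictionary = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        k = bytes([byte])
        wk = w + k
        if wk in dictionary:
            w = wk                                # extend the current match
        else:
            out.append(dictionary[w])             # output the code for w
            dictionary[wk] = len(dictionary)      # add wk to the dictionary
            w = k
    if w:
        out.append(dictionary[w])                 # flush the final match
    return out

def lzw_decode(codes):
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                                     # special case: code not yet in the table
            entry = w + w[:1]
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]   # add w + first symbol of entry
        w = entry
    return b"".join(out)

data = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decode(lzw_encode(data)) == data

The assert at the end simply checks that decoding the encoder's output reproduces the original bytes.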
ENTROPY ENCODING SUMMARY
Huffman maps fixed length symbols to variable length codes.
Optimal only when symbol probabilities are integer powers of 1/2 (e.g., 1/2, 1/4, 1/8).
Arithmetic maps entire message to real number range based
on statistics. Theoretically optimal for long messages, but
optimality depends on data model. Also can be CPU/memory
intensive.
ENTROPY ENCODING SUMMARY
Lempel-Ziv-Welch is a dictionary-based compression
method. It maps a variable number of symbols to a fixed
length code.
Adaptive algorithms do not need a priori estimation of
probabilities; they are more useful in real-time applications.
BASICS OF INFORMATION
THEORY
The entropy η of an information source with alphabet S = {s1, s2, . . . , sn} is:

$\eta = H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log_2 p_i$

pi – probability that symbol si will occur in S.
$\log_2 \frac{1}{p_i}$ – indicates the amount of information (self-information as defined
by Shannon) contained in si, which corresponds to the number
of bits needed to encode si.
BASICS OF INFORMATION
THEORY
What is entropy? Entropy is a measure of the number of specific
ways in which a system may be arranged, commonly
understood as a measure of the disorder of a system.
As an example, if the information source S is a gray-
level digital image, each si is a gray-level intensity
ranging from 0 to (2^k − 1), where k is the number of bits
used to represent each pixel in an uncompressed image.
We need to find the entropy of this image, which gives the
minimum average number of bits per pixel needed to represent
the image after compression.
DISTRIBUTION OF GRAY-LEVEL
INTENSITIES
Figure 3: Histograms for Two Gray-level Images.
Figure 3(a) shows the histogram of an image with a
uniform distribution of gray-level intensities, i.e.,
pi = 1/256 for all i. Hence, the entropy of this image is:
log2 256 = 8
DISTRIBUTION OF GRAY-LEVEL
INTENSITIES
Figure 3: Histograms for Two Gray-level Images.
Figure 3 (b) shows the histogram of an image with
two possible values (binary image). Its entropy is
0.92.
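A quick numerical check of both cases (the 1/3 and 2/3 split for the binary image is an assumed example that yields roughly 0.92, since the slide does not state the two probabilities):

import math

def entropy(probs):
    # H(S) = sum over i of p_i * log2(1 / p_i), ignoring zero-probability symbols
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1 / 256] * 256))   # uniform 8-bit image: 8.0 bits per pixel
print(entropy([1 / 3, 2 / 3]))    # binary image with an assumed 1:2 split: about 0.92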
DISTRIBUTION OF GRAY-LEVEL
INTENSITIES
It is interesting to observe that in the above
uniform-distribution example (Figure 3(a)) we
found that η = 8, so the minimum average
number of bits to represent each gray-level
intensity is at least 8. No compression is
possible for this image.
In the context of imaging, this corresponds
to the “worst case,” where neighboring pixel
values have no similarity.
RUN-LENGTH CODING
RLC is one of the simplest forms of data
compression.
The basic idea is that if the information
source has the property that symbols tend to
form continuous groups, then such a symbol
and the length of the group can be coded.
Consider a screen containing plain black text
on a solid white background.
RUN-LENGTH CODING
There will be many long runs of white pixels in the blank
space, and many short runs of black pixels within the text.
Let us take a hypothetical single scan line, with B
representing a black pixel and W representing white:
WWWWWBWWWWBBBWWWWWWBWWW
If we apply the run-length encoding (RLE) data compression
algorithm to the above hypothetical scan line, we get the
following: 5W1B4W3B6W1B3W
The run-length code represents the original 23 characters in
only 14.
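A minimal sketch of this run-length idea on the scan line above, using count-then-symbol pairs to match the 5W1B... notation (real RLE formats differ in how counts and literals are packed):

from itertools import groupby

def rle_encode(line: str) -> str:
    # Emit "<run length><symbol>" for each maximal run of identical symbols.
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(line))

print(rle_encode("WWWWWBWWWWBBBWWWWWWBWWW"))   # -> 5W1B4W3B6W1B3W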
VARIABLE-LENGTH CODING
Variable-length coding (VLC) is one of the
best-known entropy coding methods.
It includes the Shannon–Fano
algorithm, Huffman coding, and adaptive
Huffman coding.
SHANNON–FANO ALGORITHM
To illustrate the algorithm, let us suppose the
symbols to be coded are the characters in the
word HELLO.
The frequency count of the symbols is
Symbol H E L O
Count 1 1 2 1
SHANNON–FANO ALGORITHM
The encoding steps of the Shannon–Fano
algorithm can be presented in the following top-
down manner:
1. Sort the symbols according to the frequency
count of their occurrences.
2. Recursively divide the symbols into two parts,
each with approximately the same number of
counts, until all parts contain only one symbol.
SHANNON–FANO ALGORITHM
A natural way of implementing the above
procedure is to build a binary tree.
As a convention, let us assign bit 0 to its left
branches and 1 to the right branches.
Initially, the symbols are sorted as LHEO.
SHANNON–FANO ALGORITHM
As Figure 4 shows, the first division yields two
parts: L with a count of 2, denoted as L:(2); and
H, E and O with a total count of 3, denoted as H,
E, O:(3).
The second division yields H:(1) and E, O:(2).
The last division is E:(1) and O:(1).
SHANNON–FANO ALGORITHM
Figure 4: Coding Tree for HELLO by Shannon-Fano
Table 1: Result of Performing Shannon-Fano on HELLO

Symbol   Count   log2(1/pi)   Code   # of bits used
L        2       1.32         0      2
H        1       2.32         10     2
E        1       2.32         110    3
O        1       2.32         111    3
TOTAL # of bits: 10
Figure 5: Another coding tree for HELLO by Shannon-Fano
Table 2: Another Result of Performing Shannon-Fano on HELLO (see Figure 5)

Symbol   Count   log2(1/pi)   Code   # of bits used
L        2       1.32         00     4
H        1       2.32         01     2
E        1       2.32         10     2
O        1       2.32         11     2
TOTAL # of bits: 10
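A recursive sketch of the top-down procedure described above. The split point is chosen so that the two halves have total counts as close to equal as possible; this is the step on which implementations differ, so the resulting codes may correspond to either Figure 4/Table 1 or Figure 5/Table 2 (with the tie-breaking used here, the sketch reproduces Table 1).

def shannon_fano(symbols, prefix=""):
    # symbols: list of (symbol, count) pairs, sorted by descending count
    if len(symbols) == 1:
        return {symbols[0][0]: prefix or "0"}
    total = sum(c for _, c in symbols)
    running, split, best_diff = 0, 1, float("inf")
    # choose the split point where the running total is closest to half of the counts
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    codes.update(shannon_fano(symbols[:split], prefix + "0"))   # left part gets 0
    codes.update(shannon_fano(symbols[split:], prefix + "1"))   # right part gets 1
    return codes

print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))   # symbols of HELLO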
SHANNON–FANO ALGORITHM
1) The Shannon–Fano algorithm delivers satisfactory coding results for data
compression, but it was soon outperformed and overtaken by the Huffman
coding method.
2) The Huffman algorithm requires prior statistical knowledge about the
information source, and such information is often not available.
3) This is particularly true in multimedia applications, where future data is
unknown before its arrival, as for example in live (or streaming) audio
and video.
SHANNON–FANO
ALGORITHM
Even when the statistics are available, the
transmission of the symbol table could
represent heavy overhead.
The solution is to use adaptive Huffman
coding compression algorithms, in which
statistics are gathered and updated
dynamically as the data stream arrives.
DICTIONARY-BASED CODING
The Lempel-Ziv-Welch (LZW) algorithm employs an adaptive,
dictionary-based compression technique.
Unlike variable-length coding, in which the lengths of the codewords
are different, LZW uses fixed-length codewords to represent variable-
length strings of symbols/characters that commonly occur together,
such as words in English text.
DICTIONARY-BASED CODING
As in the other adaptive compression techniques, the LZW encoder
and decoder build up the same dictionary dynamically while receiving
the data; the encoder and the decoder both develop the same dictionary.
LZW proceeds by placing longer and longer repeated entries into a
dictionary, then emitting (sending) the code for an element rather
than the string itself, if the element has already been placed in the
dictionary.
DICTIONARY-BASED CODING
Remember, LZW is an adaptive
algorithm, in which the encoder and
decoder independently build their own
string tables. Hence, there is no overhead
involved in transmitting the string table.
LZW is used in many applications, such
as UNIX compress, GIF for images,
WinZip, and others.