SUBMITTED TO:-
PROF. SHILPA
SUBMITTED BY:-
BINTI LAMBA
17103018
ABSTRACT
Data compression is the process of reducing the number of bits (that is, the storage space) needed to represent
data.
Why compress data? There are many reasons, for example:
• Many people like to collect data and dislike deleting any of it, so sooner or later their storage
fills up or overflows.
• Large amounts of data take a long time to transfer, which users find frustrating.
There are many well-known methods for data compression. They are based on different ideas
and are suitable for different types of data, but they all share the same principle: they compress
data by eliminating the redundancy in the original data.
CONTENTS
INTRODUCTION
TYPES OF DATA COMPRESSION
LOSSY AND LOSSLESS COMPRESSION
RUN LENGTH ENCODING
HUFFMAN CODING
JPEG
CONCLUSION
REFERENCES
INTRODUCTION
Data compression is the process of converting raw data into a new data stream of
smaller size; in other words, it is the technique of reducing the number of bits (storage space) needed to
represent data.
Why compress data? There are many reasons, for example:
• Many people like to collect data and dislike deleting any of it, so sooner or later their storage
fills up or overflows.
• Large amounts of data take a long time to transfer, which users find frustrating.
Overall, data compression helps to:
• Make optimal use of limited storage space
• Save time and make better use of resources:
  • When sending data over a communication line: less time to transmit and less
  storage needed at the host
  • Reduce the memory required for storage
  • Improve the data access rate from a storage device
TYPES OF DATA COMPRESSION
There are two types of data compression.
1. Lossy compression
2. Lossless compression
LOSSY COMPRESSION
In lossy compression some information is lost during processing: the data is sorted into
relevant and irrelevant data, and the irrelevant (unimportant) data is removed by the system.
Lossy compression achieves a much higher compression ratio, but some information is lost compared with
the original source file. The main advantage is that this loss can be imperceptible to the viewer. Visually lossless
compression is centred on an understanding of colour images and human perception.
LOSSLESS COMPRESSION
• In lossless methods, original data and the data after compression and decompression are
precisely the same.
• Redundant data is removed during compression and restored during decompression.
• Lossless methods are used when we cannot afford to lose any data.
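To illustrate the lossless property, here is a minimal Python sketch. It uses the standard-library zlib module, which implements the DEFLATE method (not otherwise covered in this report), chosen only because it is readily available; the input string is made up for the example.

```python
import zlib

# Made-up input: repetitive data compresses well because it is redundant.
original = b"ABABABABABABABABABABABABABABABAB" * 8

compressed = zlib.compress(original)      # DEFLATE, a lossless method
restored = zlib.decompress(compressed)    # undo the compression

# Lossless: the round trip gives back exactly the same bytes.
assert restored == original
print(len(original), "->", len(compressed), "bytes")
```

The assertion is the point of the example: after decompression the data is identical to the original, bit for bit.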
LOSSLESS COMPRESSION
RUN LENGTH ENCODING
The basic idea behind this method of data compression is: if a data item d occurs
n consecutive times in the input data, replace the n occurrences with the single pair <n d>. The n
consecutive occurrences of a data item d are called a run of length n, and this method is called
run-length encoding, or RLE. For example, the string AAAABBB becomes <4 A><3 B>.
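The idea can be sketched in a few lines of Python. This is a minimal illustration assuming the input is a string; the function names rle_encode and rle_decode are hypothetical, and each pair is stored as a tuple (n, d) in the same order as <n d>.

```python
from itertools import groupby

def rle_encode(data):
    """Replace each run of n equal items d with the pair (n, d)."""
    # groupby yields one group per run of consecutive equal items.
    return [(len(list(group)), item) for item, group in groupby(data)]

def rle_decode(pairs):
    """Expand each (n, d) pair back into n copies of d (string input assumed)."""
    return "".join(item * n for n, item in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
print(rle_decode(encoded))  # AAAABBBCCD
```

Note that an item occurring only once still becomes a pair, such as (1, 'D'), so RLE only saves space when the input contains long runs.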
HUFFMAN CODING
A frequently used method for data compression is Huffman coding. The method starts by
building a list of all the alphabet symbols in descending order of their probabilities. It then builds a
tree, with a symbol at every leaf, from the bottom up.
In this way, fewer bits are allocated to symbols that occur more frequently and more bits to
symbols that occur less often, which reduces the average number of bits per symbol and so compresses the data.
⚫ Algorithm:
1. Make a leaf node for each code symbol, and attach the probability of occurrence of that symbol
to the leaf node.
2. Take the two nodes with the smallest probabilities that have not yet been joined, and connect
them into a new node:
   • Label one of the two branches 0 and the other 1.
   • The probability of the new node is the sum of the probabilities of the
   two joined nodes.
3. If only one node is left, the code tree is finished, and the code of each symbol is the sequence
of 0s and 1s on the path from the root to its leaf. If not, go back to step 2.
HUFFMAN CODING EXAMPLE
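As a worked example, the algorithm can be sketched in Python with a priority queue (heapq) that always yields the two least probable nodes. This is a minimal sketch under a few assumptions: symbols are strings, the weights are integer counts rather than probabilities (counts avoid floating-point rounding; dividing by the total gives the probabilities), the function name huffman_codes is hypothetical, and the symbol counts are made up for illustration.

```python
import heapq
from itertools import count

def huffman_codes(weights):
    """Build a Huffman code for {symbol: weight}; return {symbol: bitstring}.

    Assumes symbols are strings, so a tuple always means an internal node.
    """
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    # Step 1: one leaf node per symbol, carrying that symbol's weight.
    heap = [(w, next(tie), symbol) for symbol, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code
        return {heap[0][2]: "0"}
    # Steps 2-3: join the two lightest nodes until only one node is left.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    # Read each code from root to leaf: left branch = 0, right branch = 1.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 40, "B": 30, "C": 20, "D": 10})
print(codes)                              # {'A': '0', 'B': '10', 'D': '110', 'C': '111'}
print("".join(codes[s] for s in "ABAD"))  # 0100110
```

The most frequent symbol A gets a 1-bit code and the rarest symbols get 3-bit codes, so the average length is 0.4×1 + 0.3×2 + 0.2×3 + 0.1×3 = 1.9 bits per symbol, compared with 2 bits for a fixed-length code for four symbols. When two nodes have equal weight the choice between them is arbitrary, so other Huffman codes with the same average length are equally valid.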
LOSSY COMPRESSION
JPEG
(JOINT PHOTOGRAPHIC EXPERTS GROUP)
JPEG compression is used to compress pictures and graphics. JPEG is a widely used method of
lossy compression for digital images. The amount of compression can be adjusted, allowing a
selectable trade-off between storage size and image quality.
Basic idea:
• Transform the picture into a linear (vector) set of numbers that reveals its
redundancies.
• The redundancies are then removed by one of the lossless compression
methods.
Discrete Cosine Transform
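The Discrete Cosine Transform (DCT) is a procedure for representing waveform data as a weighted sum
of cosines. The DCT helps separate the image into parts of differing importance (with respect to the
image's visual quality). In practice the two-dimensional DCT is used: the pixels of an image are correlated
in two dimensions, not just in one, which is why image compression methods use the two-dimensional DCT.
For an 8×8 block of pixels f(x, y), as used in JPEG, it is given by

$$F(u,v) = \frac{1}{4} C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}$$

where u, v = 0, 1, ..., 7 are the frequency indices.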
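For readers who want to see the transform computed, here is a direct Python translation of the 8×8 two-dimensional DCT (the DCT-II form used in JPEG), written with four nested loops for clarity rather than speed; a practical encoder would use a faster algorithm. The function name dct_2d and the flat test block are illustrative assumptions, and the sketch skips JPEG's step of subtracting 128 from each 8-bit pixel before the transform.

```python
import math

N = 8  # JPEG works on 8x8 blocks of pixels

def c(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_2d(block):
    """Direct 2-D DCT-II of an N x N block (O(N^4), for illustration only).

    F(u, v) = (2/N) C(u) C(v) sum_x sum_y f(x, y)
              * cos((2x + 1) u pi / 2N) * cos((2y + 1) v pi / 2N)
    With N = 8, 2/N = 1/4 and 2N = 16.
    """
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2 / N) * c(u) * c(v) * total
    return out

# A flat block (every pixel = 10) has no variation, so all of its energy
# goes into the top-left (DC) coefficient: F(0,0) = 1/4 * 1/2 * 64 * 10 = 80.
flat = [[10] * N for _ in range(N)]
F = dct_2d(flat)
print(round(F[0][0], 6))
```

Because the block is flat, every coefficient except F(0,0) comes out as (numerically) zero: there is no variation for the higher-frequency cosines to describe. Blocks with more detail spread their energy into more coefficients.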
Quantization
The next step is quantization, which is the main source of loss in JPEG compression. The
values in the quantization table are chosen to preserve low-frequency information and discard high-frequency
detail, because humans are less sensitive to the loss of information in this range.
Each DCT coefficient is divided by the corresponding entry in the quantization table and then rounded to
the nearest integer, as shown below. In each table the low-frequency terms are in the top left-hand
corner and the high-frequency terms are in the bottom right-hand corner.
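The quantization step can be sketched in Python as follows. This is a minimal illustration: quantize and dequantize are hypothetical names, the quantization table Q is a made-up one whose step sizes simply grow towards the bottom right (it is not the table from the JPEG standard), and the DCT coefficients are invented values.

```python
N = 8

def quantize(coeffs, table):
    """Divide each DCT coefficient by its table entry and round to the nearest integer.

    The rounding is the irreversible (lossy) part of JPEG.
    """
    return [[round(coef / step) for coef, step in zip(coef_row, step_row)]
            for coef_row, step_row in zip(coeffs, table)]

def dequantize(levels, table):
    """What a decoder can recover: each rounded level multiplied by its step size."""
    return [[level * step for level, step in zip(level_row, step_row)]
            for level_row, step_row in zip(levels, table)]

# Hypothetical table: small steps for low frequencies (top left),
# large steps for high frequencies (bottom right).
Q = [[1 + 2 * (u + v) for v in range(N)] for u in range(N)]

# Invented DCT coefficients: a few large low-frequency terms, one small high-frequency term.
coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[7][7] = 80.0, -30.0, 14.0, 5.0

levels = quantize(coeffs, Q)
print(levels[0][0], levels[0][1], levels[1][0], levels[7][7])  # 80 -10 5 0

restored = dequantize(levels, Q)
print(restored[1][0], restored[7][7])  # 15 0
```

The small high-frequency coefficient (5) has become 0 and 14 comes back as 15; this information cannot be recovered, which is exactly why quantization is the lossy step.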
Compression
The last step is compression:
• The quantized values are read from the table and the redundant 0s are eliminated.
• To group the 0s together, the table is read diagonally in a zigzag pattern. The reason is that, unless the
block contains fine detail, the bottom right-hand (high-frequency) corner of the quantized table is all 0s, so the
zigzag order collects them into one long run.
• JPEG usually applies lossless run-length encoding in this compression phase, followed by Huffman coding.
Example:-
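A small example of this last step, sketched in Python: the quantized 8×8 block below is invented, zigzag_order and run_length_pairs are hypothetical helper names, and the zigzag sequence is encoded with the simple <n d> run-length pairs from the RLE section. This is a simplification; JPEG's real bitstream differs in detail (for example, it Huffman-codes the result).

```python
from itertools import groupby

def zigzag_order(n=8):
    """(row, col) positions of an n x n table in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_length_pairs(values):
    """Replace each run of n equal values d with the pair (n, d)."""
    return [(len(list(group)), value) for value, group in groupby(values)]

# Invented quantized block: only a few low-frequency values survive quantization.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 5, -2, 1, 1

scan = [block[r][c] for r, c in zigzag_order()]
pairs = run_length_pairs(scan)
print(pairs)  # [(1, 5), (1, -2), (1, 1), (1, 0), (1, 1), (59, 0)]
```

The 59 trailing zeros from the bottom right of the block collapse into the single pair (59, 0), which is where most of the saving comes from.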
CONCLUSION
Data compression is a subject of great significance with many applications. Methods of
data compression have been studied for nearly four decades, yet it is still a developing field. It has become
so popular that research is ongoing to improve both the compression ratio and the
speed. Ultimately, the goal of the field is to develop better compression
techniques.
REFERENCES
https://www.ics.uci.edu/~dan/pubs/DataCompression.pdf
Khalid Sayood, Introduction to Data Compression
https://en.wikipedia.org/wiki/Data_compression