Text Compression Using Huffman Coding
INTRODUCTION TO TEXT COMPRESSION USING HUFFMAN CODING
Huffman Coding is a widely used algorithm for lossless data compression. Named after David A. Huffman,
who published it in 1952, the algorithm constructs an optimal prefix code for a given set of symbol
frequencies. The core idea is to assign shorter codes to more frequent symbols and longer
codes to less frequent symbols.
OBJECTIVES
- To understand the need for data compression.
- To learn how Huffman Coding works.
- To implement Huffman Coding step-by-step.
- To visualize the process using Huffman Trees.
- To analyze the advantages, disadvantages, and real-world applications.
NEED FOR COMPRESSION
In digital systems, storage and transmission of data are major concerns. Reducing file size without losing
information is essential in many applications such as web development, software engineering, and
networking. Huffman coding is one of the simplest lossless compression techniques, and among codes that
encode one symbol at a time it is provably optimal.
STEPS IN THE HUFFMAN CODING ALGORITHM
1. Count the frequency of each character in the input text.
2. Create a priority queue (min-heap) and insert a leaf node for each character, keyed by its frequency.
3. While there is more than one node in the queue:
a. Extract the two nodes with the lowest frequencies.
b. Create a new internal node with these two nodes as children and with frequency equal to the sum of their
frequencies.
c. Insert the new node back into the queue.
4. The remaining node is the root of the Huffman Tree.
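5. Assign a binary code to each character by traversing the tree from the root, appending 0 for each left
branch and 1 for each right branch; a character's code is the bit sequence on the path to its leaf.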
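The five steps can be sketched in Python with the standard `heapq` module. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `huffman_codes` is hypothetical, a leaf is represented by its character and an internal node by a `(left, right)` tuple, and a running counter breaks ties between equal frequencies so the heap never has to compare nodes. The first node popped in each merge becomes the left child (bit 0).

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Return a {character: bit string} table built with Huffman's algorithm."""
    if not text:
        return {}
    # Step 1: count the frequency of each character.
    freq = Counter(text)
    # Step 2: min-heap of (frequency, tie_breaker, node). A leaf is the
    # character itself; an internal node is a (left, right) tuple.
    tie = count()
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    # Step 3: repeatedly merge the two lowest-frequency nodes.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency -> left child
        f2, _, right = heapq.heappop(heap)  # next lowest -> right child
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    # Step 4: the single remaining node is the root.
    root = heap[0][2]
    # Step 5: walk the tree, appending "0" for a left edge and "1" for a right edge.
    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # a one-symbol input still needs 1 bit

    walk(root, "")
    return codes


print(huffman_codes("BCAADDDCCACACAC"))
# → {'C': '0', 'B': '100', 'D': '101', 'A': '11'}
```

Other tie-breaking rules can produce a different code table for the same input; every such table gives the same total encoded length, so all of them are equally optimal.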
EXAMPLE
Input text: "BCAADDDCCACACAC"
Step 1: Frequency Count
A: 5, B: 1, C: 6, D: 3
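Step 1 needs only one pass over the text. A minimal Python sketch using the standard `collections.Counter`:

```python
from collections import Counter

text = "BCAADDDCCACACAC"
freq = Counter(text)  # maps each character to its number of occurrences

for ch in sorted(freq):
    print(f"{ch}: {freq[ch]}")
# → A: 5
#   B: 1
#   C: 6
#   D: 3
```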
Step 2: Create Priority Queue
[ B(1), D(3), A(5), C(6) ]
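The queue in Step 2 can be illustrated with Python's `heapq`, assuming each entry is a `(frequency, character)` tuple; this layout is chosen for illustration and, when two frequencies are equal, falls back to comparing the characters. Popping every entry shows the order in which the queue releases nodes.

```python
import heapq
from collections import Counter

freq = Counter("BCAADDDCCACACAC")

# Each heap entry is (frequency, character); heapq keeps the smallest on top.
heap = [(f, ch) for ch, f in freq.items()]
heapq.heapify(heap)

# Pop all four entries to see the lowest-frequency-first order.
order = [heapq.heappop(heap) for _ in range(len(heap))]
print(order)  # → [(1, 'B'), (3, 'D'), (5, 'A'), (6, 'C')]
```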
Steps 3-4: Build Tree
1. Combine B(1) and D(3) -> Node1(4)
2. Combine Node1(4) and A(5) -> Node2(9)
3. Combine Node2(9) and C(6) -> Root(15)
Huffman Tree:
          (15)
         /    \
     C(6)     (9)
             /   \
          (4)    A(5)
         /   \
      B(1)   D(3)
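The three merges and the finished tree can be reproduced with a short trace. As a sketch, it assumes leaves are characters and internal nodes are `(left, right)` tuples, with a counter as a tie-breaker; the helper `label` is hypothetical and only formats nodes for printing.

```python
import heapq
from collections import Counter
from itertools import count

freq = Counter("BCAADDDCCACACAC")
tie = count()  # unique tie-breaker so nodes themselves are never compared
heap = [(f, next(tie), ch) for ch, f in freq.items()]
heapq.heapify(heap)


def label(node):
    """Render a leaf as its character and an internal node as (left right)."""
    if isinstance(node, str):
        return node
    return f"({label(node[0])} {label(node[1])})"


while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)   # lower frequency -> left child (bit 0)
    f2, _, right = heapq.heappop(heap)  # next lowest -> right child (bit 1)
    print(f"combine {label(left)}({f1}) + {label(right)}({f2}) -> {f1 + f2}")
    heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

root = heap[0][2]
print(label(root))
```

Running it prints one line per merge and finishes with `(C ((B D) A))`, the nested form of the tree drawn above: C is the left child of the root, and the subtree holding B, D, and A is the right child.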
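Step 5: Assign Codes (left branch = 0, right branch = 1)
A: 11, B: 100, C: 0, D: 101
Encoded Text: 1000111110110110100110110110 (28 bits, compared with 15 × 8 = 120 bits in 8-bit ASCII)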
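Encoding is one table lookup per character, with the code words concatenated. The sketch below builds the result as a Python string of `0`/`1` characters for readability, an assumption made for illustration; a real compressor would pack the bits into bytes.

```python
# Code table from Step 5 (left branch = 0, right branch = 1).
codes = {"A": "11", "B": "100", "C": "0", "D": "101"}
text = "BCAADDDCCACACAC"

# Look up each character's code word and concatenate them in order.
encoded = "".join(codes[ch] for ch in text)

print(encoded)        # → 1000111110110110100110110110
print(len(encoded))   # → 28
print(len(text) * 8)  # → 120 (the same text stored as 8-bit ASCII)
```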
ADVANTAGES
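- Produces an optimal prefix code: among codes that encode one symbol at a time, no prefix code has a
shorter average code length for the same symbol frequencies.
- Easy to implement and understand.
- Works well with data that has a skewed frequency distribution.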
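Two properties behind the first advantage can be checked numerically for the example: the code is prefix-free, and its average length lies between the Shannon entropy H of the symbol frequencies and H + 1, a bound every Huffman code satisfies. This sketch only checks those bounds; it does not prove optimality.

```python
import math
from collections import Counter

codes = {"A": "11", "B": "100", "C": "0", "D": "101"}  # table from Step 5
freq = Counter("BCAADDDCCACACAC")
total = sum(freq.values())

# Prefix property: no code word is the beginning of a different code word.
is_prefix_free = not any(
    a != b and b.startswith(a) for a in codes.values() for b in codes.values()
)

# Average bits per symbol under this code, and the entropy lower bound.
avg_len = sum(freq[ch] * len(codes[ch]) for ch in freq) / total
entropy = -sum((f / total) * math.log2(f / total) for f in freq.values())

print(is_prefix_free)                          # → True
print(f"{avg_len:.3f} bits/symbol (Huffman)")  # → 1.867 bits/symbol (Huffman)
print(f"{entropy:.3f} bits/symbol (entropy)")  # → 1.782 bits/symbol (entropy)
```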
DISADVANTAGES
- The decoder needs the frequency table or the tree itself, which must be stored or transmitted along with
the compressed data.
- Gives little benefit when all characters have similar frequencies, because the codes then approach
fixed-length codes.
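The first disadvantage is visible in a decoder: without the code table (or the tree it comes from), the bit string cannot be split back into characters. A minimal sketch, assuming the table is available as a Python dict; the function name `decode` is hypothetical. Because no code word is a prefix of another, the decoder can emit a character as soon as the buffered bits match a code word.

```python
codes = {"A": "11", "B": "100", "C": "0", "D": "101"}  # table from Step 5
decode_table = {bits: ch for ch, bits in codes.items()}  # bit string -> char


def decode(bits: str) -> str:
    """Read bits left to right and emit a character on each code-word match."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:  # prefix-free, so the first match is correct
            out.append(decode_table[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits do not form a code word: {buf!r}")
    return "".join(out)


print(decode("1000111110110110100110110110"))  # → BCAADDDCCACACAC
```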
APPLICATIONS
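- File compression formats such as ZIP and GZIP, where Huffman coding is the entropy-coding stage of the
DEFLATE method.
- Image compression (e.g., baseline JPEG).
- Multimedia codecs (e.g., MP3 audio).
- Network protocols (e.g., HPACK header compression in HTTP/2).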
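The DEFLATE method used by GZIP and most ZIP archives (specified in RFC 1951) combines LZ77 string matching with Huffman coding. Python's standard `zlib` module exposes the `Z_HUFFMAN_ONLY` strategy, which disables string matching so that only the Huffman stage remains. The sketch below compares the two on a hypothetical repetitive input; exact byte counts depend on the zlib version, so none are quoted here.

```python
import zlib

# Hypothetical sample input: the example string repeated to give zlib more data.
text = ("BCAADDDCCACACAC" * 50).encode("ascii")

# Z_HUFFMAN_ONLY turns off LZ77 matching, leaving only Huffman coding of bytes.
huff = zlib.compressobj(level=9, strategy=zlib.Z_HUFFMAN_ONLY)
huffman_only = huff.compress(text) + huff.flush()

# The default strategy combines LZ77 matching with Huffman coding.
default = zlib.compress(text, 9)

# Both streams decompress back to the original bytes.
assert zlib.decompress(huffman_only) == text
assert zlib.decompress(default) == text

print("original:    ", len(text), "bytes")
print("Huffman only:", len(huffman_only), "bytes")
print("LZ77+Huffman:", len(default), "bytes")
```

On highly repetitive text the LZ77 stage usually does most of the work; the Huffman-only figure shows what symbol-frequency coding achieves by itself.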
CONCLUSION
Huffman Coding is a foundational algorithm in data compression. Its simplicity and efficiency make it a common choice in
many applications, especially where lossless compression is critical. Understanding its steps and tree structure
through discrete mathematics concepts such as binary trees and prefix codes helps in implementing efficient systems.