Dr. AMBEDKAR INSTITUTE OF TECHNOLOGY
(An Autonomous Institute, Affiliated to Visvesvaraya Technological University, Belagavi,
Accredited by NAAC, with ‘A’ Grade)
Near Jnana Bharathi Campus, Bengaluru – 560056
Department of Computer Science and Engineering
REPORT ON
Text Compression Using Prefix Codes
Submitted in partial fulfillment of the requirements for the award of the Degree of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
SUBMITTED
by
NAME USN
Venkatesh Naik S 1DA24CS416
Sandesh M 1DA23CS151
Tejas L 1DA23CS180
Rohit R P 1DA23CS148
Submitted to:
Vinutha M S
Text Compression Using Prefix Codes
Unit Reference: Unit 4 - Introduction to Graph Theory
1. Problem Statement:
In modern communication systems, especially SMS and IoT-based platforms, reducing data
size without losing information is crucial due to limited bandwidth. The goal is to compress
textual data efficiently using prefix codes.
2. Real-Time Application:
Prefix codes (like Huffman coding) are used in:
• SMS compression to save transmission costs
• Chat applications to reduce message payloads
• Embedded systems/IoT devices with minimal memory
3. Theoretical Background:
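A prefix code is a set of binary codewords in which no codeword is a prefix of another, so
an encoded bit stream can be decoded unambiguously without separators between characters.
Every such code corresponds to a binary tree whose leaves are the characters and whose
root-to-leaf paths spell the codewords. Huffman coding is a greedy algorithm that builds
this tree from character frequencies and assigns variable-length codes, giving frequently
occurring characters shorter codes.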
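The prefix property can be checked mechanically. The following is a minimal sketch, not part of the report's implementation; the helper name is_prefix_free and the two sample codeword sets are assumptions chosen for illustration.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes)
    # In sorted order, a codeword that is a prefix of any other codeword
    # is also a prefix of its immediate successor, so checking
    # neighbouring pairs is sufficient.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


# Hypothetical codeword sets used only for illustration.
valid = is_prefix_free(["0", "10", "110", "111"])
invalid = is_prefix_free(["0", "01", "11"])  # "0" is a prefix of "01"
print(valid, invalid)
```

The first set can be decoded bit by bit without ambiguity; in the second, a leading 0 could be a complete codeword or the start of 01.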
4. Solution Methodology:
• Count frequency of each character in the input text
• Build a binary Huffman tree using a min-heap, repeatedly merging the two least frequent nodes
• Assign binary codes by traversing the tree (0 for a left branch, 1 for a right branch)
• Encode the text using these codes
• Decode by traversing the tree based on binary input
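The greedy merge order in the tree-building step can be traced on a very small input. This is a sketch under stated assumptions: the function name merge_trace, the input "aaabbc", and the use of (frequency, symbols) tuples to break ties alphabetically are all illustrative choices and are not taken from the implementation in this report.

```python
import heapq
from collections import Counter


def merge_trace(text):
    """Return the (left, right, combined_frequency) merges in greedy order."""
    # (frequency, symbols) tuples: ties on frequency fall back to comparing
    # the symbol strings, an assumed rule that keeps the trace deterministic.
    heap = [(freq, ch) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        f1, s1 = heapq.heappop(heap)  # least frequent subtree
        f2, s2 = heapq.heappop(heap)  # second least frequent subtree
        steps.append((s1, s2, f1 + f2))
        heapq.heappush(heap, (f1 + f2, s1 + s2))
    return steps


for left, right, total in merge_trace("aaabbc"):
    print(f"merge {left!r} + {right!r} -> {total}")
```

For "aaabbc" the two rarest characters, c and b, are merged first, and the resulting subtree is then merged with a. The most frequent character therefore ends up closest to the root and receives the shortest code.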
5. Python Implementation:
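import heapq
from collections import Counter


class Node:
    """A Huffman tree node; leaves carry a character, internal nodes carry None."""

    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        # heapq orders nodes by frequency only
        return self.freq < other.freq


def build_huffman_tree(text):
    """Build the Huffman tree for a non-empty text and return its root."""
    freq = Counter(text)
    heap = [Node(char, fr) for char, fr in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent nodes under a new internal node.
        n1 = heapq.heappop(heap)
        n2 = heapq.heappop(heap)
        merged = Node(None, n1.freq + n2.freq)
        merged.left = n1
        merged.right = n2
        heapq.heappush(heap, merged)
    return heap[0]


def generate_codes(node, prefix="", code_map=None):
    """Map each character to its codeword (0 = left branch, 1 = right branch)."""
    if code_map is None:
        # A fresh dict per call; a mutable default would be shared across calls.
        code_map = {}
    if node is None:
        return code_map
    if node.char is not None:
        # Leaf. A one-symbol text has a leaf root, which still needs one bit.
        code_map[node.char] = prefix or "0"
        return code_map
    generate_codes(node.left, prefix + "0", code_map)
    generate_codes(node.right, prefix + "1", code_map)
    return code_map


def encode(text, code_map):
    return "".join(code_map[char] for char in text)


def decode(encoded_text, root):
    if root.char is not None:
        # One-symbol tree: every bit stands for the same character.
        return root.char * len(encoded_text)
    decoded = []
    node = root
    for bit in encoded_text:
        node = node.left if bit == "0" else node.right
        if node.char is not None:
            decoded.append(node.char)
            node = root
    return "".join(decoded)


text = "hello hello sms compression"
root = build_huffman_tree(text)
code_map = generate_codes(root)
encoded = encode(text, code_map)
decoded = decode(encoded, root)
print("Prefix Codes:", code_map)
print("Encoded Binary:", encoded)
print("Decoded Text:", decoded)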
6. Output Sample:
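Prefix Codes: {'r': '0000', 'p': '0001', 'e': '001', 'c': '0100', 'm': '0101', ' ': '011', 'o': '100',
's': '101', 'l': '110', 'h': '1110', 'n': '11110', 'i': '11111'}
Encoded Binary: 1110001110...
Decoded Text: hello hello sms compression
The codes are listed in tree-traversal order. A different tie-breaking choice between equal
frequencies would give different codewords with the same total encoded length.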
7. Conclusion:
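Prefix codes such as Huffman codes shorten textual data by giving frequent characters
shorter codewords. The sample text above needs 92 bits instead of the 216 bits of an
8-bit-per-character encoding, not counting the cost of sending the code table. This makes
the technique well suited to bandwidth-sensitive applications such as SMS and IoT.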
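To put a number on the size reduction, the sketch below computes the total encoded length of the sample text without building the tree explicitly. It relies on the fact that each merge adds one bit to every character beneath it, so the sum of all merged frequencies equals the encoded length. The function name huffman_bits and the 8-bits-per-character baseline are assumptions made for this comparison, and the cost of transmitting the code table is not included.

```python
import heapq
from collections import Counter


def huffman_bits(text):
    """Total encoded length in bits under an optimal Huffman code."""
    weights = list(Counter(text).values())
    if len(weights) == 1:
        return len(text)  # assumption: a lone symbol gets a 1-bit code
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged  # each merge adds one bit to every symbol beneath it
        heapq.heappush(weights, merged)
    return total


text = "hello hello sms compression"
compressed = huffman_bits(text)
original = 8 * len(text)  # assumption: 8 bits per character
print(compressed, original, f"{1 - compressed / original:.1%}")
```

For the sample sentence this gives 92 bits against 216 bits, a saving of about 57.4%. On very short messages the code table can outweigh this gain, which is worth measuring before applying the scheme to SMS-sized payloads.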
8. Future Scope:
• Integrate with real-time messaging apps
• Compare with arithmetic coding or LZW for performance
• Extend to multimedia compression