
Data Compression

4.8 Huffman Codes

Q. Given a text that uses 32 symbols (26 different letters, space, and
some punctuation characters), how can we encode this text in bits?

Q. Some symbols (e, t, a, o, i, n) are used far more often than others.
How can we use this to reduce our encoding?

Q. How do we know when the next symbol begins?

Ex. c(a) = 01
    c(b) = 010
    c(e) = 1
    What is 0101?

These lecture slides are supplied by Mathijs de Weerd.

Data Compression

Q. Given a text that uses 32 symbols (26 different letters, space, and
some punctuation characters), how can we encode this text in bits?
A. We can encode 2^5 = 32 different symbols using a fixed length of 5 bits
per symbol. This is called fixed-length encoding.

Q. Some symbols (e, t, a, o, i, n) are used far more often than others.
How can we use this to reduce our encoding?
A. Encode these characters with fewer bits, and the others with more bits.

Q. How do we know when the next symbol begins?
A. Use a separation symbol (like the pause in Morse code), or make sure
that there is no ambiguity by ensuring that no code is a prefix of
another one.

Ex. c(a) = 01
    c(b) = 010
    c(e) = 1
    What is 0101? It is ambiguous: it parses both as "aa" (01·01) and as
    "be" (010·1).

Prefix Codes

Definition. A prefix code for a set S is a function c that maps each
x ∈ S to a string of 0s and 1s in such a way that for x, y ∈ S with
x ≠ y, c(x) is not a prefix of c(y).

Ex. c(a) = 11
    c(e) = 01
    c(k) = 001
    c(l) = 10
    c(u) = 000

Q. What is the meaning of 1001000001 ?

Suppose frequencies are known in a text of 1G:
fa = 0.4, fe = 0.2, fk = 0.2, fl = 0.1, fu = 0.1
Q. What is the size of the encoded text?
Prefix Codes

Q. What is the meaning of 1001000001 ?
A. "leuk" (10·01·000·001 = l·e·u·k)

Q. What is the size of the encoded text, given frequencies
fa = 0.4, fe = 0.2, fk = 0.2, fl = 0.1, fu = 0.1 in a text of 1G?
A. 2·fa + 2·fe + 3·fk + 2·fl + 3·fu = 2.3G bits.

Optimal Prefix Codes

Definition. The average bits per letter of a prefix code c is the sum
over all symbols of its frequency times the number of bits of its
encoding:

    ABL(c) = Σ_{x ∈ S} f_x · |c(x)|

We would like to find a prefix code that has the lowest possible
average bits per letter.

Suppose we model a code in a binary tree…
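The decoding question and the size computation above can be checked with a short Python sketch; the helper names `decode` and `abl` are mine, not from the slides.

```python
# Code table and frequencies from the example above.
code = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
freq = {"a": 0.4, "e": 0.2, "k": 0.2, "l": 0.1, "u": 0.1}

def decode(bits: str, code: dict[str, str]) -> str:
    """Greedy left-to-right decoding; unambiguous because no code
    word is a prefix of another."""
    inverse = {w: x for x, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:           # a complete code word was read
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "input ended in the middle of a code word"
    return "".join(out)

def abl(code: dict[str, str], freq: dict[str, float]) -> float:
    """Average bits per letter: sum of f_x * |c(x)| over all symbols."""
    return sum(freq[x] * len(w) for x, w in code.items())

print(decode("1001000001", code))    # -> leuk
print(round(abl(code, freq), 2))     # -> 2.3, i.e. 2.3G bits for a 1G text
```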

Representing Prefix Codes using Binary Trees

Ex. c(a) = 11
    c(e) = 01
    c(k) = 001
    c(l) = 10
    c(u) = 000

              (·)
           0 /   \ 1
          (·)     (·)
        0 /  \ 1 0 / \ 1
        (·)   e  l    a
      0 /  \ 1
      u     k

Q. How does the tree of a prefix code look?
A. Only the leaves have a label.
Pf. An encoding of x is a prefix of an encoding of y if and only if the
path of x is a prefix of the path of y.
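The claim above — labels appear only on leaves exactly when the code is prefix-free — can be illustrated with a small binary-trie sketch; the helper names are mine, not from the slides.

```python
# Insert each code word into a binary trie; a labelled node that still
# has children corresponds to a code word that is a prefix of another.
def build_trie(code: dict[str, str]):
    root = {}                         # node = dict with keys "0", "1", "label"
    for letter, word in code.items():
        node = root
        for bit in word:
            node = node.setdefault(bit, {})
        node["label"] = letter        # label the node where the word ends
    return root

def labels_only_on_leaves(node) -> bool:
    has_children = "0" in node or "1" in node
    if "label" in node and has_children:
        return False                  # labelled internal node: prefix clash
    return all(labels_only_on_leaves(node[b]) for b in ("0", "1") if b in node)

code = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
print(labels_only_on_leaves(build_trie(code)))   # True: a prefix code

bad = {"a": "01", "b": "010", "e": "1"}          # c(a) is a prefix of c(b)
print(labels_only_on_leaves(build_trie(bad)))    # False
```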
Representing Prefix Codes using Binary Trees

Q. What is the meaning of 111010001111101000 ?
A. "simpel" (1110·10·001·1111·01·000 = s·i·m·p·e·l)

    ABL(T) = Σ_{x ∈ S} f_x · depth_T(x)

              (·)
           0 /   \ 1
          (·)     (·)
        0 /  \ 1 0 / \ 1
        (·)   e  i   (·)
      0 /  \ 1         \ 1
      l     m          (·)
                     0 /  \ 1
                     s     p

Q. How can this prefix code be made more efficient?

Representing Prefix Codes using Binary Trees

Q. How can this prefix code be made more efficient?
A. Change the encodings of p and s to shorter ones: removing the unary
node gives s = 110 and p = 111.

Representing Prefix Codes using Binary Trees

Definition. A tree is full if every node that is not a leaf has two
children.

Claim. The binary tree corresponding to the optimal prefix code is full.
Pf.

      w            w
      |            |
      u     →      v
      |
      v

This tree is now full.
Representing Prefix Codes using Binary Trees

Claim. The binary tree corresponding to the optimal prefix code is full.
Pf. (by contradiction)
Suppose T is the binary tree of an optimal prefix code and is not full.
This means there is a node u with only one child v.
 Case 1: u is the root; delete u and use v as the root.
 Case 2: u is not the root:
  – let w be the parent of u
  – delete u and make v a child of w in place of u
 In both cases the number of bits needed to encode any leaf in the
  subtree of v is decreased. The rest of the tree is not affected.
 Clearly this new tree T' has a smaller ABL than T. Contradiction.

Optimal Prefix Codes: False Start

Q. Where in the tree of an optimal prefix code should letters with a
high frequency be placed?

Optimal Prefix Codes: False Start

Q. Where in the tree of an optimal prefix code should letters with a
high frequency be placed?
A. Near the top.

Greedy template. [Shannon-Fano, 1949] Create the tree top-down: split S
into two sets S1 and S2 with (almost) equal frequencies, then
recursively build trees for S1 and S2.

Ex. fa = 0.32, fe = 0.25, fk = 0.20, fl = 0.18, fu = 0.05
[Figure: two candidate trees obtained from different (almost-)equal
top-down splits of these frequencies.]

Optimal Prefix Codes: Huffman Encoding

Observation. Lowest-frequency items should be at the lowest level in
the tree of an optimal prefix code.

Observation. For n > 1, the lowest level always contains at least two
leaves.

Observation. The order in which items appear in a level does not
matter.

Claim. There is an optimal prefix code with tree T* where the two
lowest-frequency letters are assigned to leaves that are siblings
in T*.

Greedy template. [Huffman, 1952] Create the tree bottom-up: make two
leaves for the two lowest-frequency letters y and z, and recursively
build the tree for the rest, using a meta-letter for yz.
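A quick numeric sketch (not from the slides) of why the top-down split is a false start on these frequencies. The particular equal-weight split below, {a, l} against {e, k, u}, is one possible Shannon-Fano choice; the depth table derived from it is my own illustration.

```python
import heapq

freq = {"a": 0.32, "e": 0.25, "k": 0.20, "l": 0.18, "u": 0.05}

# Top-down split {a, l} (weight 0.5) vs {e, k, u} (weight 0.5),
# then split each half again, giving these code-word lengths:
depth = {"a": 2, "l": 2, "e": 2, "k": 3, "u": 3}
abl_topdown = sum(freq[x] * depth[x] for x in freq)

# Huffman's ABL, via the fact that the ABL equals the sum of the
# weights created by all merges (each merge adds one bit of depth
# to every symbol below it).
heap = list(freq.values())
heapq.heapify(heap)
abl_huffman = 0.0
while len(heap) > 1:
    y, z = heapq.heappop(heap), heapq.heappop(heap)
    abl_huffman += y + z
    heapq.heappush(heap, y + z)

print(round(abl_topdown, 2))   # -> 2.25
print(round(abl_huffman, 2))   # -> 2.23: the greedy split is not optimal
```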
Optimal Prefix Codes: Huffman Encoding

Huffman(S) {
  if |S| = 2 {
    return tree with root and 2 leaves
  } else {
    let y and z be the lowest-frequency letters in S
    S' = S
    remove y and z from S'
    insert new letter ω in S' with fω = fy + fz
    T' = Huffman(S')
    T = add two children y and z to leaf ω of T'
    return T
  }
}

Q. What is the time complexity?
A. T(n) = T(n-1) + O(n), so O(n²).

Q. How to implement finding the lowest-frequency letters efficiently?
A. Use a priority queue for S: T(n) = T(n-1) + O(log n), so O(n log n).
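The Huffman(S) procedure above can be sketched in Python with the priority queue the answer suggests; the helper names are mine, and the meta-letter ω is represented as a nested tuple. The frequencies reuse the earlier prefix-code example.

```python
import heapq
from itertools import count

def huffman(freq: dict[str, float]) -> dict[str, str]:
    """Return a prefix code {letter: bit string}, built bottom-up."""
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), x) for x, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)   # lowest-frequency letter y
        fz, _, z = heapq.heappop(heap)   # second-lowest letter z
        # meta-letter ω = (y, z) with frequency fω = fy + fz
        heapq.heappush(heap, (fy + fz, next(tie), (y, z)))
    _, _, root = heap[0]

    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the code word
            code[node] = prefix
    walk(root, "")
    return code

freq = {"a": 0.4, "e": 0.2, "k": 0.2, "l": 0.1, "u": 0.1}
code = huffman(freq)
abl = sum(freq[x] * len(w) for x, w in code.items())
print(code)
print(round(abl, 2))   # -> 2.2, beating the 2.3 of the hand-made code
```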

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix
code.
Pf. By induction, based on the optimality of T' (y and z removed, ω
added); see below.

Claim. ABL(T') = ABL(T) - f_ω
Pf. Using depth_T(y) = depth_T(z) = 1 + depth_T'(ω), and
depth_T(x) = depth_T'(x) for every x ≠ y, z:

  ABL(T) = Σ_{x ∈ S} f_x · depth_T(x)
         = f_y · depth_T(y) + f_z · depth_T(z) + Σ_{x ∈ S, x ≠ y,z} f_x · depth_T(x)
         = (f_y + f_z) · (1 + depth_T'(ω)) + Σ_{x ∈ S, x ≠ y,z} f_x · depth_T(x)
         = f_ω · (1 + depth_T'(ω)) + Σ_{x ∈ S, x ≠ y,z} f_x · depth_T'(x)
         = f_ω + Σ_{x ∈ S'} f_x · depth_T'(x)
         = f_ω + ABL(T')
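The lemma can be spot-checked numerically. This sketch (not from the slides) uses the fact that the ABL of a Huffman tree equals the sum of the weights created by its merges, since each merge adds one bit of depth to every symbol below it.

```python
import heapq

def huffman_abl(freqs: list[float]) -> float:
    """ABL of the Huffman tree = total weight of all merge steps."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        y = heapq.heappop(heap)
        z = heapq.heappop(heap)
        total += y + z               # one extra bit for every leaf below
        heapq.heappush(heap, y + z)
    return total

S = [0.4, 0.2, 0.2, 0.1, 0.1]            # fa, fe, fk, fl, fu
f_omega = 0.1 + 0.1                       # merge the lowest two (fl, fu) into ω
S_prime = [0.4, 0.2, 0.2, f_omega]
print(huffman_abl(S))                     # ABL(T)
print(huffman_abl(S_prime) + f_omega)     # ABL(T') + f_ω: equal up to rounding
```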
Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix
code.
Pf. (by induction over n = |S|)
Base: For n = 2 there is no shorter code than a root with two leaves.
Hypothesis: Suppose the Huffman tree T' for S' of size n-1, with ω
instead of y and z, is optimal. (IH)
Step: (by contradiction)

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix
code.
Pf. (by induction)
Base: For n = 2 there is no shorter code than a root with two leaves.
Hypothesis: Suppose the Huffman tree T' for S', with ω instead of y and
z, is optimal. (IH)
Step: (by contradiction)
 Idea of proof:
  – Suppose another tree Z of size n is better.
  – Delete the lowest-frequency items y and z from Z, creating Z'.
  – Z' cannot be better than T' by the IH.
 Suppose the Huffman tree T for S is not optimal.
 So there is some tree Z such that ABL(Z) < ABL(T).
 Then there is also a tree Z for which leaves y and z exist that are
  siblings and have the lowest frequency (see the observation above).
 Let Z' be Z with y and z deleted, and their former parent labelled ω.
 Similarly, T' is derived from S' in our algorithm.
 We know that ABL(Z') = ABL(Z) - f_ω, as well as ABL(T') = ABL(T) - f_ω.
 But also ABL(Z) < ABL(T), so ABL(Z') < ABL(T').
 Contradiction with the IH.
