NPTEL MOOC, JAN-FEB 2015
Week 6, Module 5
DESIGN AND ANALYSIS
OF ALGORITHMS
Greedy algorithms: Huffman codes
MADHAVAN MUKUND, CHENNAI MATHEMATICAL INSTITUTE
http://www.cmi.ac.in/~madhavan
Communication and
compression
Messages in English/Hindi/Tamil/… are
transmitted between computers in binary
Encode letters {a,b,…,z} as strings over {0,1}
  26 letters, 2^5 = 32, use strings of length 5?
Can we optimize the amount of data to transfer?
  Use shorter strings for more frequent letters?
Variable length encoding
Morse code
 Encode letters using dots (0) and dashes (1)
 Encoding of e is 0, t is 1, a is 01
 Decode 0101 — etet, aa, eta, aet?
 Use pauses between letters to distinguish
   Like an extra symbol in encoding
Variable length encoding
Prefix code
  Encoding E( ): E(x) is not a prefix of E(y) for any x ≠ y
    In Morse code, E(e) = 0 is a prefix of E(a) = 01
  Example: {a,b,c,d,e}
     x      a     b     c     d     e
    E(x)    11    01    001   10    000
  Decode 0010000011101
    c e c a b
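Decoding a prefix code needs only a single left-to-right scan, since at each point exactly one codeword can match. A minimal Python sketch of this idea (the table E and the function decode are illustrative, not part of the lecture):

    # Prefix code from the example: no codeword is a prefix of another
    E = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}

    def decode(bits, code):
        """Decode a bit string by repeatedly matching the unique codeword
        that starts at the current position."""
        inverse = {w: x for x, w in code.items()}   # codeword -> letter
        out, current = [], ""
        for b in bits:
            current += b
            if current in inverse:      # a complete codeword has been read
                out.append(inverse[current])
                current = ""
        if current:
            raise ValueError("input is not a valid encoding")
        return "".join(out)

    print(decode("0010000011101", E))   # prints "cecab"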
Optimal prefix codes
 Measure frequency f(x) of each letter x
   Fraction of occurrences of x over large body of
   text
    A = {x1,x2,…,xk}, f(x1) + f(x2) + … + f(xk) = 1
   f(x) is the “probability” that next letter is x
Optimal prefix codes …
 Message M consists of n symbols
   For each letter x, n ∙ f(x) occurrences of x in M
 Each x is encoded by E(x) with length |E(x)|
 Total length of encoded message:
   Sum over all x, n ∙ f(x) ∙ |E(x)|
 Average number of bits per letter 
   Sum over all x, f(x) ∙ |E(x)|
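The two quantities above, written as formulas (my notation; n is the length of M and A the alphabet):

    \text{total encoded length} = \sum_{x \in A} n \, f(x) \, |E(x)|,
    \qquad
    \mathrm{ABL} = \sum_{x \in A} f(x) \, |E(x)|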
Optimal prefix codes …
 Suppose we have these frequencies for our example
     x      a     b     c     d     e
    E(x)    11    01    001   10    000
    f(x)    0.32  0.25  0.20  0.18  0.05
 Average number of bits per letter is
    0.32 ∙ 2 + 0.25 ∙ 2 + 0.20 ∙ 3 + 0.18 ∙ 2 + 0.05 ∙ 3 = 2.25
 Fixed length encoding uses 3 bits per letter
    25% saving using variable length code
Optimal prefix codes …
 A better encoding
     x      a     b     c     d     e
    E(x)    11    10    01    001   000
    f(x)    0.32  0.25  0.20  0.18  0.05
 Average number of bits per letter is
    0.32 ∙ 2 + 0.25 ∙ 2 + 0.20 ∙ 2 + 0.18 ∙ 3 + 0.05 ∙ 3 = 2.23
 Given a set of letters A with frequencies, produce a prefix code that is as efficient as possible
    Minimize ABL(A), the Average Bits per Letter
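Both averages are easy to check mechanically; a small Python sketch, with names of my choosing:

    f = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
    code1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
    code2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

    def abl(freq, code):
        """Average bits per letter: sum of f(x) * |E(x)| over all letters."""
        return sum(freq[x] * len(code[x]) for x in freq)

    print(abl(f, code1))   # ≈ 2.25
    print(abl(f, code2))   # ≈ 2.23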
Codes as trees
Encoding can be viewed as a binary tree
     x      a     b     c     d     e
    E(x)    11    01    001   10    000
Path to a node is a binary string: left is 0, right is 1
Label each node by the letter it encodes
Prefix code: only leaves encode letters
[Tree for this code: b, d, a are leaves at depth 2; e, c are leaves at depth 3]
Codes as trees
Encoding can be viewed as a binary tree
     x      a     b     c     d     e
    E(x)    11    10    01    001   000
Path to a node is a binary string: left is 0, right is 1
Label each node by the letter it encodes
Prefix code: only leaves encode letters
[Tree for this code: c, b, a are leaves at depth 2; e, d are leaves at depth 3]
Codes as trees …
    Full tree: Every node has 0 or 2 children
Claim 1: Any optimal prefix code generates a full tree
    If any internal node has only one child, we can promote that
    child, shortening the codes of all leaves below it
Codes as trees …
Claim 2: In an optimal tree, if a leaf labelled x is at a
smaller depth than a leaf labelled y, then f(y) ≤ f(x)
  If f(y) > f(x), exchange labels to get a better tree
Codes as trees …
Claim 3: In an optimal tree, if a leaf at maximum depth is
labelled x, then its sibling is also a leaf.
  If not, the sibling of this leaf has children
  Then some leaf below the sibling lies at a greater depth than x
  But the leaf labelled x was already at maximum depth, a
  contradiction
A recursive solution
 From Claim 3, leaves at maximum depth occur in
 pairs
 From Claim 2, these must have lowest frequencies
 Pick the two letters x and y with the lowest frequencies f(x)
 and f(y)
 We will assign these to a pair of leaves at
 maximum depth (left/right does not matter)
A recursive solution …
 “Combine” x and y into a new letter “xy” with f(xy) =
 f(x) + f(y)
 New alphabet A’ is original A - {x,y} + {xy}
 Recursively find an optimal encoding of A’
   Base case, |A’| = 2, assign the two letters codes 0, 1
 Replace the leaf labelled “xy” by a node with two
 children labelled x and y
 This procedure is Huffman’s algorithm; the code it produces is a Huffman code
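A direct Python rendering of this recursion (a minimal sketch; the function name huffman, the dictionary representation of codes, and the tie-breaking between equal frequencies are my choices):

    def huffman(freq):
        """Return a prefix code (letter -> bit string) for the given frequency
        dictionary, following the recursion described above."""
        letters = list(freq)
        if len(letters) == 2:                       # base case: codes 0 and 1
            return {letters[0]: "0", letters[1]: "1"}
        # pick the two letters x, y with the lowest frequencies
        x, y = sorted(letters, key=lambda l: freq[l])[:2]
        # combine them into a new letter "xy" with frequency f(x) + f(y)
        smaller = dict(freq)
        del smaller[x], smaller[y]
        xy = x + y      # naming by concatenation; fine for single-letter alphabets
        smaller[xy] = freq[x] + freq[y]
        # recursively encode the smaller alphabet A', then split xy back into x, y
        code = huffman(smaller)
        prefix = code.pop(xy)
        code[x] = prefix + "0"
        code[y] = prefix + "1"
        return code

    # the lecture's example frequencies
    print(huffman({"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}))
    # a, b, c get 2-bit codes; d, e get 3-bit codes (exact bit patterns depend
    # on how 0/1 is assigned at each step)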
Huffman’s algorithm
     x      a     b     c     d     e
    f(x)    0.32  0.25  0.20  0.18  0.05
 Combine d, e as “de”:
     x      a     b     c     de
    f(x)    0.32  0.25  0.20  0.23
 Combine c, de as “cde”:
     x      a     b     cde
    f(x)    0.32  0.25  0.43
 Combine a, b as “ab”:
     x      ab    cde
    f(x)    0.57  0.43
 Two letters: base case, assign one the code 0 and the other the code 1
 Unwind the recursion, splitting each composite letter back into its parts:
    Split “ab” as a, b; split “cde” as c, de; split “de” as d, e
 Resulting tree: a, b, c are leaves at depth 2; d, e are leaves at depth 3
Optimality
 By induction on the size of the alphabet A
 For |A| = 2, base case, clearly the code that uses
 {0,1} for the two letters is optimal
 Assuming our algorithm is optimal for |A| = k-1, we
 have to show it is also optimal for |A| = k
Optimality
  Combine the two lowest frequency letters x, y into xy
  Recursively construct a tree T’ for the smaller alphabet A’
  ABL(T’) is optimal by the induction hypothesis
  Expand the leaf xy into children x, y to get T from T’
Claim: ABL(T) - ABL(T’) = f(xy)
Optimality
Claim: ABL(T) - ABL(T’) = f(xy)
  From T’ to T, only xy, x, y change contribution to
  ABL
  Subtract depth(xy)f(xy), add (1+depth(xy))(f(x) + f(y))
  f(xy) = f(x)+f(y), so
  depth(xy)f(xy) = depth(xy)(f(x) + f(y))
  Hence ABL(T) is bigger than ABL(T’) by
  f(x)+f(y) = f(xy)
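Writing d for the depth of the leaf xy in T’, this is the one-line calculation (my notation):

    \mathrm{ABL}(T) = \mathrm{ABL}(T') - d \, f(xy) + (d+1)\,(f(x)+f(y))
                    = \mathrm{ABL}(T') + f(xy)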
Optimality
 Suppose there is another tree S with ABL(S) < ABL(T)
 Can shuffle labels of max depth leaves in S, so that
 lowest frequency pair x and y label siblings
 Merge x and y into xy and contract S to S’
 S’ is over same alphabet as T’, T’ is optimal by
 induction, so ABL(T’) ≤ ABL(S’)
 ABL(S) - ABL(S’) = ABL(T) - ABL(T’) = f(xy),
 so ABL(T) ≤ ABL(S) as well, contradiction!
Implementation, complexity
 At each recursive step, extract letters with
 minimum frequency and replace by composite
 letter with combined frequency
 Store frequencies in an array
   Linear scan to find minimum values
   |A| = k, number of recursive calls is k - 1
    Complexity is O(k^2)
Implementation, complexity
 At each recursive step, extract letters with
 minimum frequency and replace by composite
 letter with combined frequency
 Instead, maintain frequencies in a heap
   O(log k) to find minimum values and insert new
   combined letter
   Complexity drops to O(k log k)
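A sketch of the heap-based version in Python using the standard heapq module (illustrative; trees are represented as nested tuples and the codes are read off at the end):

    import heapq

    def huffman_heap(freq):
        """Heap-based Huffman coding: O(k log k) for an alphabet of size k.
        Leaves are the original letters; internal nodes are (left, right) pairs."""
        # heap entries are (frequency, counter, tree); the counter breaks ties
        # so heapq never has to compare two trees directly
        heap = [(f, i, x) for i, (x, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)         # two lowest frequencies
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
            counter += 1
        _, _, tree = heap[0]

        codes = {}
        def walk(node, path):
            # left edge is 0, right edge is 1
            if isinstance(node, tuple):
                walk(node[0], path + "0")
                walk(node[1], path + "1")
            else:
                codes[node] = path or "0"           # one-letter alphabet edge case
        walk(tree, "")
        return codes

    print(huffman_heap({"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}))
    # a, b, c get 2-bit codes and d, e get 3-bit codes, matching the example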
Why is Huffman coding
greedy?
 We recursively combine the two letters with the lowest
 frequencies
 This is a locally optimal choice
 We never go back and consider other ways of
 pairing up letters
Historical note
 Shannon and Fano tried a divide and conquer
 approach
   Partition A as A1, A2
   Sum of frequencies in A1, A2 roughly equal
   Solve each partition recursively
   Shannon-Fano codes are not optimal
 Huffman heard about this problem in a class by Fano
 and later found an optimal solution