5.2 Huffman Algorithm

Huffman Codes

• Widely used technique for data compression

• Assume the data to be a sequence of characters

• We are looking for an efficient way of storing the data

• Binary character code

– Uniquely represents a character by a binary string

Fixed-Length Codes

E.g.: Data file containing 100,000 characters


Character               a    b    c    d    e    f
Frequency (thousands)   45   13   12   16   9    5

• 3 bits per character are needed (⌈log2 6⌉ = 3 to distinguish 6 characters)

• a = 000, b = 001, c = 010, d = 011, e = 100, f = 101

• Requires: 100,000 × 3 = 300,000 bits
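A quick sketch of this calculation (Python, added here for illustration; only the frequency table comes from the slide, the rest is assumed):

```python
import math

# Frequencies from the slide, in thousands of characters.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# A fixed-length code needs ceil(log2(alphabet size)) bits per character.
bits_per_char = math.ceil(math.log2(len(freq)))   # 3 bits for 6 characters

total_chars = sum(freq.values()) * 1000           # 100,000 characters
print(total_chars * bits_per_char)                # 300000 bits
```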

Huffman Codes

• Idea:
– Use the frequencies of occurrence of characters to
build an optimal way of representing each character

Character               a    b    c    d    e    f
Frequency (thousands)   45   13   12   16   9    5

Variable-Length Codes
E.g.: Data file containing 100,000 characters

Character               a    b    c    d    e    f
Frequency (thousands)   45   13   12   16   9    5

• Assign short codewords to frequent characters and long codewords to infrequent characters
• a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
• (45 × 1 + 13 × 3 + 12 × 3 + 16 × 3 + 9 × 4 + 5 × 4) × 1,000
= 224,000 bits
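The same cost can be recomputed directly from the codeword table; a small Python sketch (not part of the original slides):

```python
# Frequencies in thousands and the variable-length codewords from the slide.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

# Total bits = sum of frequency * codeword length, scaled back to characters.
bits = sum(freq[ch] * len(code[ch]) for ch in freq) * 1000
print(bits)  # 224000
```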
Prefix Codes

• Prefix codes:
– Codes for which no codeword is also a prefix of some
other codeword
– A better name would be “prefix-free codes”

• We can achieve optimal data compression using prefix codes
– We will restrict our attention to prefix codes

Encoding with Binary Character Codes

• Encoding
– Concatenate the codewords representing each
character in the file

• E.g.:
– a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100

– abc = 0 · 101 · 100 = 0101100
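A minimal encoding sketch in Python (the encode helper is hypothetical, not from the slides; the codeword table is the one above):

```python
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def encode(text, code):
    """Encode a string by concatenating the codeword of each character."""
    return "".join(code[ch] for ch in text)

print(encode("abc", code))  # 0101100
```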

Decoding with Binary Character Codes
• Prefix codes simplify decoding
– No codeword is a prefix of another ⇒ the codeword
that begins an encoded file is unambiguous
• Approach
– Identify the initial codeword
– Translate it back to the original character
– Repeat the process on the remainder of the file
• E.g.:
– a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
– 001011101 = 0 · 0 · 101 · 1101 = aabe
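A matching decoding sketch (again hypothetical Python helper code, assuming the codeword table above); it works precisely because the code is prefix-free, so the first complete codeword read is always the correct one:

```python
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def decode(bits, code):
    """Decode a bit string produced by a prefix code."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)

print(decode("001011101", code))  # aabe
```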

Prefix Code Representation
• Binary tree whose leaves are the given characters
• Binary codeword
– the path from the root to the character, where 0 means “go to the
left child” and 1 means “go to the right child”
• Length of the codeword
– Length of the path from root to the character leaf (depth of node)
[Figure: two code trees for the example alphabet. Left: the fixed-length code, a tree with every character (a: 45, b: 13, c: 12, d: 16, e: 9, f: 5) at depth 3. Right: the optimal variable-length code, with a: 45 at depth 1; c: 12, b: 13, d: 16 at depth 3; and f: 5, e: 9 at depth 4.]
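As a sketch of how the codewords fall out of such a tree, here is a small Python fragment (the nested-tuple representation is my own assumption, not the slides'): a leaf is a character, an internal node is a (left, right) pair, 0 means left and 1 means right:

```python
# The optimal tree from the figure: a at depth 1, f and e at depth 4.
tree = ("a", (("c", "b"), (("f", "e"), "d")))

def codewords(node, prefix=""):
    """Read the codewords off a code tree: 0 = left child, 1 = right child."""
    if isinstance(node, str):          # leaf: the path so far is the codeword
        return {node: prefix}
    left, right = node
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table

print(codewords(tree))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```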
Optimal Codes
• An optimal code is always represented by a full
binary tree
– Every non-leaf has two children
– The fixed-length code above is not optimal (its tree is not full); the variable-length code is
• How many bits are required to encode a file?
– Let C be the alphabet of characters
– Let f(c) be the frequency of character c
– Let dT(c) be the depth of c’s leaf in the tree T
corresponding to a prefix code
B(T) = \sum_{c \in C} f(c) \, d_T(c)        (the cost of tree T)
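For the running example, the cost can be evaluated directly from the frequencies and leaf depths; a short sketch (Python, with illustrative values taken from the earlier slides):

```python
# B(T) = sum over c in C of f(c) * d_T(c).
freq  = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}   # f(c), in thousands
depth = {"a": 1,  "b": 3,  "c": 3,  "d": 3,  "e": 4, "f": 4}   # d_T(c) in the optimal tree

cost = sum(freq[c] * depth[c] for c in freq)
print(cost)  # 224 (thousand bits), matching the 224,000-bit total
```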
Constructing a Huffman Code
• A greedy algorithm that constructs an optimal prefix code
called a Huffman code
• Assume that:
– C is a set of n characters
– Each character has a frequency f(c)
– The tree T is built in a bottom up manner
• Idea (initial queue for the running example: f: 5, e: 9, c: 12, b: 13, d: 16, a: 45):
– Start with a set of |C| leaves
– At each step, merge the two least frequent objects: the frequency of
the new node = sum of two frequencies
– Use a min-priority queue Q, keyed on f to identify the two least
frequent objects
Example
Starting from the leaves f: 5, e: 9, c: 12, b: 13, d: 16, a: 45, repeatedly merge the two lowest-frequency nodes:
f: 5 + e: 9 → 14, then c: 12 + b: 13 → 25, then 14 + d: 16 → 30, then 25 + 30 → 55, and finally a: 45 + 55 → 100, the root.
[Figure: the queue contents and partial trees after each of the five merges.]
Building a Huffman Code
Alg.: HUFFMAN(C)
1. n ← |C|
2. Q ← C                             ▹ build the min-priority queue: O(n)
3. for i ← 1 to n − 1
4.    do allocate a new node z
5.       left[z] ← x ← EXTRACT-MIN(Q)
6.       right[z] ← y ← EXTRACT-MIN(Q)
7.       f[z] ← f[x] + f[y]
8.       INSERT(Q, z)
9. return EXTRACT-MIN(Q)             ▹ the root of the code tree

Running time: the loop runs n − 1 times and each EXTRACT-MIN / INSERT costs O(lg n) on a binary min-heap, so the total is O(n lg n).
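A runnable Python translation of this pseudocode (my own sketch: it uses heapq as the min-priority queue and nested tuples for tree nodes, neither of which is specified in the slides):

```python
import heapq
import itertools

def huffman(freq):
    """Build a Huffman code tree from {character: frequency}.

    A leaf is a character; an internal node is a (left, right) tuple.
    Performs n - 1 merges of the two least-frequent nodes, as in the
    pseudocode above.
    """
    counter = itertools.count()            # tie-breaker so the heap never compares trees
    heap = [(f, next(counter), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                    # Q <- C
    for _ in range(len(freq) - 1):         # for i <- 1 to n - 1
        fx, _, x = heapq.heappop(heap)     # x <- EXTRACT-MIN(Q)
        fy, _, y = heapq.heappop(heap)     # y <- EXTRACT-MIN(Q)
        heapq.heappush(heap, (fx + fy, next(counter), (x, y)))  # f[z] = f[x] + f[y]; INSERT(Q, z)
    return heap[0][2]                      # the remaining node is the root

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
print(huffman(freq))
# ('a', (('c', 'b'), (('f', 'e'), 'd')))  -- the optimal tree from the earlier slides
# (left/right orientation may differ in general; any such tree is optimal)
```

Reading the codewords off this tree, as in the earlier sketch, reproduces a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100 and the 224,000-bit total.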
