Huffman Encoding

Huffman Coding is a lossless data compression technique that assigns variable-length codes to characters based on their frequency, ensuring that no code is a prefix of another to avoid ambiguity in decoding. It is widely used in various compression formats such as GZIP and JPEG, and is favored over newer coding schemes due to patent issues. The technique effectively reduces the size of data while preserving all details.

Uploaded by

Snehasis

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

21 views16 pages

Huffman Encoding

Uploaded by

Snehasis

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

Huffman Coding

• Huffman Coding is a technique of compressing data to reduce its size

without losing any of the details. It was first developed by David
Huffman. It is a lossless variable code encoding.

• Huffman Coding is generally useful to compress the data in which

there are frequently occurring characters. The codes generated are
variable. The more frequently occurring characters are given smaller
codes and the less frequently occurring characters are given larger
codes. The most frequent character gets smallest code.
• The variable-length codes assigned to input characters are Prefix
Codes, means the codes (bit sequences) are assigned in such a way
that the code assigned to one character is not the prefix of code
assigned to any other character. This is how Huffman Coding makes
sure that there is no ambiguity when decoding the generated
bitstream.
• Let us understand prefix codes with a counter example. Let there be
four characters a, b, c and d, and their corresponding variable length
codes be 00, 01, 0 and 1. This coding leads to ambiguity because code
assigned to c is the prefix of codes assigned to a and b. If the
compressed bit stream is 0001, the de-compressed output may be
“cccd” or “ccb” or “acd” or “ab”.
Each ASCII character like A,B, etc takes 8 bits for coding.
• Huffman Coding prevents any ambiguity in the decoding process
using the concept of prefix code ie. a code associated with a character
should not be present in the prefix of any other code. The tree
created above helps in maintaining the property.

• Huffman coding is done with the help of the following steps.

Code A : 11, B:100, C:0, D: 101
In original string the total frequency of all characters=15. Each character takes 8 bits as they are encoded
in ASCII. So, original length of bits = 15*8=120. Using Huffman coding, the size of transmitted bits is less.
Note here that the most frequent character C has the smallest code 0.
Huffman encoding is used in:
• Huffman is widely used in all the mainstream compression formats
that you might encounter - from GZIP, PKZIP (winzip etc) and BZIP2, to
image formats such as JPEG and PNG