Huffman Coding
• Huffman Coding is a technique of compressing data to reduce its size
without losing any of the details. It was first developed by David
Huffman. It is a lossless variable code encoding.
• Huffman Coding is generally useful to compress the data in which
there are frequently occurring characters. The codes generated are
variable. The more frequently occurring characters are given smaller
codes and the less frequently occurring characters are given larger
codes. The most frequent character gets smallest code.
• The variable-length codes assigned to input characters are Prefix
Codes, means the codes (bit sequences) are assigned in such a way
that the code assigned to one character is not the prefix of code
assigned to any other character. This is how Huffman Coding makes
sure that there is no ambiguity when decoding the generated
bitstream.
• Let us understand prefix codes with a counter example. Let there be
four characters a, b, c and d, and their corresponding variable length
codes be 00, 01, 0 and 1. This coding leads to ambiguity because code
assigned to c is the prefix of codes assigned to a and b. If the
compressed bit stream is 0001, the de-compressed output may be
“cccd” or “ccb” or “acd” or “ab”.
Each ASCII character like A,B, etc takes 8 bits for coding.
• Huffman Coding prevents any ambiguity in the decoding process
using the concept of prefix code ie. a code associated with a character
should not be present in the prefix of any other code. The tree
created above helps in maintaining the property.
• Huffman coding is done with the help of the following steps.
Code A : 11, B:100, C:0, D: 101
In original string the total frequency of all characters=15. Each character takes 8 bits as they are encoded
in ASCII. So, original length of bits = 15*8=120. Using Huffman coding, the size of transmitted bits is less.
Note here that the most frequent character C has the smallest code 0.
Huffman encoding is used in:
• Huffman is widely used in all the mainstream compression formats
that you might encounter - from GZIP, PKZIP (winzip etc) and BZIP2, to
image formats such as JPEG and PNG
• Huffman encoding still dominates the compression industry since
newer arithmetic and range coding schemes are avoided due to their
patent issues.
• https://www.youtube.com/watch?v=co4_ahEDCho
• https://www.programiz.com/dsa/huffman-coding
• ARIHMETIC ENCODING:
• https://neptune.ai/blog/lossless-data-compression-using-arithmetic-
encoding-in-python-and-its-applications-in-deep-learning