Chapter Three

This document discusses different data compression techniques including lossless and lossy compression, entropy coding, Huffman coding, adaptive coding, and dictionary-based coding (LZW). It provides details on each technique such as how Huffman coding builds a tree based on symbol frequencies to assign shorter codes to more common symbols. It also explains how adaptive coding dynamically updates codes as the probability distribution changes and how LZW replaces repeated strings with codes from an evolving dictionary to compress text.

Uploaded by

mekuria

Chapter 3

Multimedia Data Compression

3.1 Lossless and Lossy compression
3.2 Entropy coding
3.3 Huffman coding
3.4 Adaptive coding
3.5 Dictionary-based coding (LZW)
3.1 Lossless and Lossy compression
• Compression: the process of coding that will effectively reduce the
total number of bits needed to represent certain information.

Fig 3.1 A general data compression scheme

• We call the output of the encoder codes or codewords.
• The intermediate medium could either be data storage or a
communication/computer network.
• If the compression and decompression processes induce no
information loss, the compression scheme is lossless; otherwise, it is
lossy.
$$\text{compression ratio} = \frac{B_0}{B_1}$$
$B_0$ – number of bits before compression
$B_1$ – number of bits after compression

• In general, we would desire any codec (encoder/decoder scheme) to
have a compression ratio much larger than 1.0.
• The higher the compression ratio, the better the lossless compression
scheme, as long as it is computationally feasible.
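As a quick illustration of the ratio defined above, the sketch below computes it for two sizes given in bits. The function name `compression_ratio` and the sample sizes are assumptions chosen for illustration, not measurements from a real codec.

```python
def compression_ratio(bits_before: int, bits_after: int) -> float:
    """Return B0 / B1, with both sizes measured in bits."""
    if bits_before < 0 or bits_after <= 0:
        raise ValueError("bits_before must be >= 0 and bits_after must be > 0")
    return bits_before / bits_after


# Hypothetical sizes: 1000 bits of input compressed to 400 bits.
print(compression_ratio(1000, 400))  # 2.5
```

A ratio above 1.0 means the output is smaller than the input; a ratio below 1.0 means the "compressed" data actually grew.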
3.2 Entropy coding
• The entropy η of an information source with alphabet S = {s1, s2, . . . , sn} is:
$$\eta = H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log_2 p_i$$

• $p_i$ – probability that symbol $s_i$ will occur in S.
• $\log_2 \frac{1}{p_i}$ – indicates the amount of information contained in $s_i$,
which corresponds to the number of bits needed to encode $s_i$.
• The definition of entropy is aimed at identifying often-occurring
symbols in the datastream as good candidates for short codewords in
the compressed bitstream.
• We use a variable-length coding scheme for entropy coding:
frequently occurring symbols are given short codes that are quickly
transmitted, while infrequently occurring ones are given longer codes.
• For example, E occurs frequently in English, so we should give it a
shorter code than Q, say.
• If we use $\bar{l}$ to denote the average length (measured in bits) of the
codewords produced by the encoder, the Shannon Coding Theorem
states that the entropy is the best we can do (under certain
conditions):
$$\eta \le \bar{l}$$

• Coding schemes aim to get as close as possible to this theoretical
lower bound.
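To make the entropy formula and the lower bound concrete, here is a minimal sketch that estimates the entropy of a short message from its relative symbol frequencies and compares it with the average codeword length of a given code table. The message `AABC`, the code table, and the function names are hypothetical choices for illustration; a non-empty message is assumed.

```python
import math
from collections import Counter


def entropy(text: str) -> float:
    """Entropy in bits per symbol, using relative frequencies as p_i."""
    counts = Counter(text)
    total = len(text)
    # p_i * log2(1 / p_i), with p_i = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def average_length(text: str, codes: dict) -> float:
    """Average codeword length in bits per symbol for a given code table."""
    return sum(len(codes[s]) for s in text) / len(text)


sample = "AABC"                            # hypothetical message
codes = {"A": "0", "B": "10", "C": "11"}   # hypothetical prefix code

print(entropy(sample))                # 1.5
print(average_length(sample, codes))  # 1.5
```

Here the average length equals the entropy because every probability is a power of 1/2; for other distributions a symbol-by-symbol code generally ends up somewhat above the bound.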
3.3 Huffman coding
• Huffman coding is an efficient method of compressing data without
losing information.
• Huffman coding provides an efficient, unambiguous code by analyzing
the frequencies with which symbols appear in a message.
• Symbols that appear more often are encoded as shorter bit
strings, while symbols that appear less often are encoded as
longer bit strings.
• Huffman coding has two main parts:
1) Build a Huffman Tree from input characters.
2) Traverse the Huffman Tree and assign codes to characters.
Algorithm
1. Initialization: put all symbols on the list sorted according to their
frequency counts.
2. Repeat until the list has only one symbol left.
a) From the list, pick two symbols with the lowest frequency counts. Form a
Huffman subtree that has these two symbols as child nodes and create a
parent node for them.
b) Assign the sum of the children’s frequency counts to the parent and insert
it into the list, such that the order is maintained.
c) Delete the children from the list.
3. Assign a codeword for each leaf based on the path from the root.
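The algorithm above can be sketched with a binary heap standing in for the sorted list. This is a minimal illustration, not a reference implementation: the function name `huffman_codes`, the tie-breaking rule, and the frequency counts (a 15-character message over A, B, C, D) are assumptions made for the example, and a non-empty table of string symbols is assumed.

```python
import heapq


def huffman_codes(freqs: dict) -> dict:
    """Build a Huffman code table from {symbol: frequency count}."""
    # Heap entries are (count, tie_breaker, node). A node is either a symbol
    # (leaf) or a (left, right) pair (internal node). The unique tie_breaker
    # keeps comparisons from ever reaching the node itself.
    heap = [(n, i, s) for i, (s, n) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # lowest count
        n2, _, right = heapq.heappop(heap)   # second-lowest count
        # The parent carries the sum of its children's counts.
        heapq.heappush(heap, (n1 + n2, tie, (left, right)))
        tie += 1

    codes = {}

    def assign(node, prefix):
        # Walk from the root: 0 for the left edge, 1 for the right edge.
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"      # single-symbol alphabet

    assign(heap[0][2], "")
    return codes


freqs = {"A": 5, "B": 1, "C": 6, "D": 3}     # assumed counts, 15 symbols
codes = huffman_codes(freqs)
total_bits = sum(n * len(codes[s]) for s, n in freqs.items())
print(total_bits)  # 28
```

Different tie-breaking rules can produce different codewords, but every Huffman code for the same counts has the same total encoded length.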
Properties of Huffman coding
1. Unique Prefix Property: No Huffman code is a prefix of any other
Huffman code - precludes any ambiguity in decoding.
2. Optimality: minimum redundancy code - proved optimal for a given
data model (i.e., a given, accurate, probability distribution):
a) The two least frequent symbols will have the same length for their Huffman
codes, differing only at the last bit.
b) Symbols that occur more frequently will have shorter Huffman codes than
symbols that occur less frequently.
c) The average code length $\bar{l}$ for an information source S is strictly
less than η + 1:
$$\bar{l} < \eta + 1$$
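The prefix property and the bound on the average code length can be checked numerically for any given code table. The sketch below does so for an assumed table and assumed frequency counts; all names and values are hypothetical and serve only to illustrate the two properties.

```python
import math


def is_prefix_free(codes: dict) -> bool:
    """True if no codeword is a prefix of another codeword."""
    words = list(codes.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def entropy(freqs: dict) -> float:
    """Entropy in bits per symbol from {symbol: count}."""
    total = sum(freqs.values())
    return sum((n / total) * math.log2(total / n) for n in freqs.values())


def average_length(freqs: dict, codes: dict) -> float:
    """Average codeword length in bits per symbol, weighted by count."""
    total = sum(freqs.values())
    return sum(n * len(codes[s]) for s, n in freqs.items()) / total


# Assumed counts and an assumed Huffman code table for them.
freqs = {"A": 5, "B": 1, "C": 6, "D": 3}
codes = {"A": "11", "B": "100", "C": "0", "D": "101"}

eta = entropy(freqs)
avg = average_length(freqs, codes)
print(is_prefix_free(codes))   # True
print(eta <= avg < eta + 1)    # True
```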
Example:
• Suppose the string below is to be sent over a network.

• Each character occupies 8 bits. There are a total of 15 characters in
the above string. Thus, a total of 8*15 = 120 bits are required to send
this string.
• Using the Huffman Coding technique, we can compress the string to a
smaller size.
• Huffman coding first creates a tree using the frequencies of the
characters and then generates a code for each character.
• Once the data is encoded, it has to be decoded. Decoding is done
using the same tree.
Huffman coding is done with the help of the following steps.
1. Calculate the frequency of each character in the string.
2. Sort the characters in increasing order of frequency and store them
in a priority queue Q.
3. Make each unique character a leaf node.
4. Create an empty node z. Assign the minimum frequency to the left
child of z and the second minimum frequency to the right
child of z. Set the value of z to the sum of these two
minimum frequencies.
5. Remove these two minimum frequencies from Q and add their sum
to the list of frequencies (* denotes the internal nodes in the
figure).
6. Insert node z into the tree.
7. Repeat steps 4 to 6 until only one node, the root, remains in Q.
8. For each non-leaf node, assign 0 to the left edge and 1 to the right
edge.
• To send the above string over a network, we have to send the
tree as well as the compressed code. The total size is given by
the table below.

• Without encoding, the total size of the string was 120 bits. After
encoding the size is reduced to 32+15+28 = 75 bits.
Decoding the code
• For decoding the code, we can take the code and traverse through
the tree to find the character.
• Suppose 101 is to be decoded. We traverse from the root, following
one edge per bit, as in the figure below.
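The traversal described above can be sketched as a walk over a nested-tuple tree: each bit selects the left (0) or right (1) child, and reaching a leaf emits a character and restarts at the root. The tree used here is an assumption made for illustration, since the figure is not reproduced; the name `huffman_decode` is likewise hypothetical.

```python
def huffman_decode(bits: str, tree) -> str:
    """Decode a bit string by walking the tree from the root for each symbol."""
    out = []
    node = tree
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"invalid bit: {bit!r}")
        node = node[0] if bit == "0" else node[1]   # 0 = left, 1 = right
        if not isinstance(node, tuple):             # reached a leaf
            out.append(node)
            node = tree                             # restart at the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


# Assumed tree: an internal node is a (left, right) pair, a leaf is a character.
TREE = ("C", (("B", "D"), "A"))

print(huffman_decode("101", TREE))        # D
print(huffman_decode("011100101", TREE))  # CABD
```

Because no codeword is a prefix of another, the decoder never needs to look ahead or backtrack.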
3.4 Adaptive coding
• The Huffman algorithm requires prior statistical knowledge about the
information source, and such information is often not available.
• This is particularly true in multimedia applications, where future data
is unknown before its arrival, as for example in live (or streaming)
audio and video.
• Even when the statistics are available, the transmission of the symbol
table could represent heavy overhead.
• The solution is to use adaptive compression algorithms, in which
statistics are gathered and updated dynamically as the datastream
arrives. The probabilities are no longer based on prior knowledge but
on the actual data received so far.
• The new coding methods are “adaptive” because, as the probability
distribution of the received symbols changes, symbols will be given
new (longer or shorter) codes.
• This is especially desirable for multimedia data, when the content
(the music or the color of the scene) and hence the statistics can
change rapidly.
Adaptive Huffman coding
Procedures:
• Initial_code assigns symbols with some initially agreed-upon
codes, without any prior knowledge of the frequency counts for
them. For example, some conventional codes such as ASCII may be
used for coding character symbols.
• update_tree is a procedure for constructing an adaptive Huffman
tree. It basically does two things: it increments the frequency counts
for the symbols (including any new ones), and updates the
configuration of the tree.
Example
Adaptive Huffman Coding for Symbol String AADCCDD

Initial code assignment for AADCCDD using adaptive Huffman coding

• Let us assume that the initial code assignment for both the encoder
and decoder simply follows the ASCII order for the 26 symbols in an
alphabet, A through Z, as the table above shows.
• To improve the implementation of the algorithm, we adopt an
additional rule: if any character/symbol is to be sent the first time, it
must be preceded by a special symbol, NEW. The initial code for NEW
is 0. The count for NEW is always kept as 0 (the count is never
increased); hence it is always denoted as NEW:(0)
• It is important to emphasize that the code for a particular symbol
often changes during the adaptive Huffman coding process. The more
frequent the symbol up to the moment, the shorter the code.
• For example, after AADCCDD, when the character D overtakes A as
the most frequent symbol, its code changes from 101 to 0. This is of
course fundamental for the adaptive algorithm—codes are reassigned
dynamically according to the new probability distribution of the
symbols.
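The key idea, that encoder and decoder stay synchronized by updating the same statistics after every symbol, can be illustrated with a simplified stand-in. The sketch below does not implement the incremental update_tree procedure: it rebuilds a Huffman code from the running counts before each symbol, so its intermediate codewords will generally differ from those in the example above. The 5-bit initial codes (A = 00001, B = 00010, ...) and all function names are assumptions made for illustration.

```python
import heapq
from collections import Counter

NEW = "NEW"  # escape symbol sent before a first-time symbol; count stays 0


def initial_code(symbol: str) -> str:
    """Assumed fixed 5-bit initial code for A..Z (A = 00001)."""
    return format(ord(symbol) - ord("A") + 1, "05b")


def build_codes(counts) -> dict:
    """Rebuild a prefix code from the running counts (NEW always counts 0)."""
    heap = [(0, 0, NEW)]
    heap += [(n, i, s) for i, (s, n) in enumerate(sorted(counts.items()), start=1)]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, tie, (left, right)))
        tie += 1
    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # only NEW present: its code is 0

    assign(heap[0][2], "")
    return codes


def encode(text: str) -> str:
    counts, bits = Counter(), []
    for s in text:
        codes = build_codes(counts)       # code table before seeing s
        if s in counts:
            bits.append(codes[s])
        else:
            bits.append(codes[NEW] + initial_code(s))
        counts[s] += 1                    # update statistics after coding
    return "".join(bits)


def decode(bits: str) -> str:
    counts, out, pos = Counter(), [], 0
    while pos < len(bits):
        inverse = {c: s for s, c in build_codes(counts).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:
            if end > len(bits):
                raise ValueError("truncated bitstream")
            end += 1
        s, pos = inverse[bits[pos:end]], end
        if s == NEW:                      # next 5 bits are an initial code
            s = chr(int(bits[pos:pos + 5], 2) + ord("A") - 1)
            pos += 5
        out.append(s)
        counts[s] += 1                    # same update as the encoder
    return "".join(out)


print(decode(encode("AADCCDD")) == "AADCCDD")   # True
print(build_codes(Counter("AADCCDD"))["D"])     # 0
```

No symbol table is transmitted: the decoder reconstructs each code table from the symbols it has already decoded. A practical adaptive Huffman coder updates the tree incrementally rather than rebuilding it for every symbol.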
3.5 Dictionary-based coding (LZW)
• LZW (Lempel-Ziv-Welch) employs an adaptive, dictionary-based
compression technique. Unlike variable-length coding, in which the
lengths of the codewords differ, LZW uses fixed-length codewords
to represent variable-length strings of symbols/characters that
commonly occur together, such as words in English text.
• The LZW encoder and decoder build up the same dictionary
dynamically while receiving the data.
• LZW places longer and longer repeated entries into a dictionary, and
then emits the code for an element, rather than the string itself, if the
element has already been placed in the dictionary.
Algorithm:
Example
LZW Compression for String ABABBABCABABBA
• Let us start with a very simple dictionary (also referred to as a string
table), initially containing only three characters, with codes as follows:

code   string
1      A
2      B
3      C

• Now if the input string is ABABBABCABABBA, the LZW compression
algorithm works as follows:
• The output codes are 1 2 4 5 2 3 4 6 1. Instead of 14 characters, only 9
codes need to be sent. If we assume each character or code is
transmitted as a byte, that is quite a saving (the compression ratio
would be 14/9 = 1.56).
• LZW is an adaptive algorithm, in which the encoder and decoder
independently build their own string tables. Hence, there is no
overhead involving transmitting the string table.
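The example above can be reproduced with a short sketch of an LZW encoder and decoder. The initial codes A = 1, B = 2, C = 3 are inferred from the output codes quoted above; the function names and the unbounded dictionary (no maximum code width) are assumptions made to keep the illustration small, and non-empty input is assumed.

```python
def lzw_encode(text: str, alphabet: str) -> list:
    """Emit one code per longest dictionary match, growing the table as we go."""
    table = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    codes = []
    s = text[0]
    for c in text[1:]:
        if s + c in table:
            s += c                        # keep extending the current match
        else:
            codes.append(table[s])        # emit code for the longest match
            table[s + c] = next_code      # add the new, longer string
            next_code += 1
            s = c
    codes.append(table[s])
    return codes


def lzw_decode(codes: list, alphabet: str) -> str:
    """Rebuild the same table from the code stream alone."""
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:           # code defined by the previous step
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid code: {code}")
        out.append(entry)
        table[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(out)


codes = lzw_encode("ABABBABCABABBA", "ABC")
print(codes)                     # [1, 2, 4, 5, 2, 3, 4, 6, 1]
print(lzw_decode(codes, "ABC"))  # ABABBABCABABBA
```

The decoder adds each dictionary entry one step after the encoder does, which is why it must handle a code that refers to the entry it is about to create.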
