Compression (Compatibility Mode)

The document provides an overview of data compression, defining it as the representation of information using fewer bits while discussing its necessity for reducing storage and transmission costs. It distinguishes between lossless and lossy compression techniques, detailing various methods such as Run-length Encoding and Static Huffman Coding, along with their advantages and limitations. Additionally, it covers compression utilities, formats, and the importance of the prefix property in ensuring unique decodability of Huffman codes.

Uploaded by

subhrand66

20-May-19

Data Compression

Introduction to Data Compression (1)

• What is Data Compression?
• Why Data Compression?
• How is Data Compression possible?
• Lossless and Lossy Data Compression
• Static, Adaptive, and Hybrid Compression
• Compression Utilities and Formats
• Run-length Encoding
• Static Huffman Coding
• The Prefix property


What is Data Compression?

• Definition: Data compression (or source coding) is the representation of an information source (e.g. a data file, a speech signal, an image, or a video signal) as accurately as possible using the fewest bits.

• Note: Compressed data can only be understood if the decoding method is known by the receiver.

Why Data Compression?

• Data storage and transmission cost money. This cost increases with the amount of data.

• This cost can be reduced by processing the data so that it takes less memory and less transmission time.

• Data transmission can be made faster by using better transmission media or by compressing the data.


Why Data Compression? (Contd)

• Disadvantage of data compression: Compressed data must be decompressed to be viewed (or heard), so extra processing is required.

• The design of data compression schemes therefore involves trade-offs between various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.

How is Data Compression possible?

• Compression is possible because information usually contains redundancies, or information that is often repeated.

• Examples include recurring letters, numbers, or pixels.

• File compression programs remove this redundancy.


Lossless and Lossy Compression Techniques

• Data compression techniques are broadly classified into lossless and lossy.

• Lossless techniques enable exact reconstruction of the original document from the compressed information.
• Exploit redundancy in data
• Applied to general data
• Examples: Run-length, Huffman, LZ77, LZ78, and LZW
• Compression ratio typically no better than 4:1 for lossless compression on many kinds of files
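
As a quick illustration of exact reconstruction, the sketch below round-trips some data through Python's standard zlib module (a DEFLATE implementation, which combines LZ77 with Huffman coding). The sample data and the helper name round_trip are assumptions made for this example and do not come from the slides.

```python
import zlib


def round_trip(data: bytes) -> tuple:
    """Compress data losslessly, then decompress it again."""
    compressed = zlib.compress(data)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical sample: highly redundant data compresses well.
original = b"ABABABAB" * 100
compressed, restored = round_trip(original)

print(len(original), len(compressed))   # original size vs. compressed size
print(restored == original)             # lossless: exact reconstruction
```

The achieved ratio depends heavily on how redundant the input is, so the sizes printed here say nothing about typical files.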

Lossless and Lossy Compression Techniques (Contd)

• Lossy compression reduces a file by permanently eliminating certain information, typically detail that is hard for people to perceive.
• Exploits redundancy and human perception
• Applied to audio, image, and video
• Examples: JPEG and MPEG
• Compression ratios of about 10:1 are typical, and higher ratios are possible at the cost of more distortion
• Lossy techniques usually achieve higher compression ratios than lossless ones, but the latter reconstruct the data exactly.


Classification of Lossless Compression Techniques

• Lossless techniques are classified into static, adaptive (or dynamic), and hybrid.
• In a static method, the mapping from the set of messages to the set of codewords is fixed before transmission begins, so that a given message is represented by the same codeword every time it appears in the message being encoded.
• Static coding requires two passes: one pass to compute probabilities (or frequencies) and determine the mapping, and a second pass to encode.
• Example: Static Huffman Coding

Classification of Lossless Compression Techniques (contd)

• In an adaptive method, the mapping from the set of messages to the set of codewords changes over time.

• All of the adaptive methods are one-pass methods; only one scan of the message is required.
• Examples: LZ77, LZ78, LZW, and Adaptive Huffman Coding

• An algorithm may also be a hybrid, neither completely static nor completely dynamic.


Compression Utilities and Formats

• Compression tool examples: WinZip, PKZIP, compress, gzip

• General compression formats: .zip, .gz

• Common image compression formats: JPEG, JPEG 2000, BMP, GIF, PCX, PNG, TGA, TIFF, WMP

• Common audio (sound) compression formats: MPEG-1 Layer III (known as MP3), RealAudio (RA, RAM, RP), AU, Vorbis, WMA, AIFF, WAVE, G.729a

• Common video (sound and image) compression formats: MPEG-1, MPEG-2, MPEG-4, DivX, QuickTime (MOV), RealVideo (RM), Windows Media Video (WMV), Video for Windows (AVI), Flash video (FLV)

Run-length encoding

The following string:

    BBBBHHDDXXXXKKKKWWZZZZ

can be encoded more compactly by replacing each repeated string of characters by a single instance of the repeated character and a number that represents the number of times it is repeated:

    4B2H2D4X4K2W4Z

Here "4B" means four B's, "2H" means two H's, and so on. Compressing a string in this way is called run-length encoding.
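
The string example can be sketched in a few lines of Python. The function names rle_encode and rle_decode are chosen for this illustration, and the decoder assumes the original text contains no digits (otherwise counts and data could not be told apart).

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode; assumes the original text contains no digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch              # accumulate multi-digit run lengths
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


encoded = rle_encode("BBBBHHDDXXXXKKKKWWZZZZ")
print(encoded)                       # 4B2H2D4X4K2W4Z
print(rle_decode(encoded))           # BBBBHHDDXXXXKKKKWWZZZZ
```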

As another example, consider the storage of a rectangular image. As a single-color bitmapped image, it can be stored as:

[Figure: bitmap of the rectangle, 8 rows of 40 bits each]

The rectangular image can be compressed with run-length encoding by counting identical bits as follows:

    0,40
    0,40
    0,10 1,20 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,20 0,10
    0,40

The first line says that the first line of the bitmap consists of 40 0's. The third line says that the third line of the bitmap consists of 10 0's followed by 20 1's followed by 10 more 0's, and so on for the other lines.

Run-length encoding does not work well for all files: data with few repeated runs gains nothing from it and can even grow.
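
The bitmap example, and the case where run-length encoding does not help, can be sketched as follows. The bitmap is rebuilt here from the run lengths listed above; the name rle_row and the alternating test row are assumptions made for this illustration.

```python
from itertools import groupby


def rle_row(bits: str) -> list:
    """Encode one bitmap row as (bit, run_length) pairs."""
    return [(int(bit), len(list(run))) for bit, run in groupby(bits)]


# The 8-row, 40-bit-wide rectangle described by the run lengths above.
solid = "0" * 40
edge = "0" * 10 + "1" * 20 + "0" * 10
side = "0" * 10 + "1" + "0" * 18 + "1" + "0" * 10
bitmap = [solid, solid, edge, side, side, side, edge, solid]

for row in bitmap:
    print(rle_row(row))

# Unfavourable input: no two neighbouring bits are equal, so every run has
# length 1 and the encoding needs 40 pairs to describe 40 bits.
alternating = "01" * 20
print(len(rle_row(alternating)))     # 40
```

The rectangle rows need at most five pairs each, while the alternating row needs one pair per bit, which is larger than the raw data.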


Static Huffman Coding

• Static Huffman coding assigns variable-length codes to symbols based on their frequency of occurrence in the given message. Low-frequency symbols are encoded using many bits, and high-frequency symbols are encoded using fewer bits.

• The message to be transmitted is first analyzed to find the relative frequencies of its constituent characters.

• The coding process generates a binary tree, the Huffman code tree, with branches labeled with bits (0 and 1).

• The Huffman tree (or the character-codeword pairs) must be sent with the compressed information so that the receiver can decode the message.

Static Huffman Coding Algorithm

• Find the frequency of each character in the file to be compressed;

• For each distinct character, create a one-node binary tree containing the character and its frequency as its priority;

• Insert the one-node binary trees in a priority queue in increasing order of frequency;

• while (there is more than one tree in the priority queue) {
▪ dequeue two trees t1 and t2;
▪ create a tree t that contains t1 as its left subtree and t2 as its right subtree; // 1
▪ priority(t) = priority(t1) + priority(t2);
▪ insert t in its proper location in the priority queue; // 2
}

• Assign 0 and 1 weights to the edges of the resulting tree, such that the left and right edges of each node do not have the same weight; // 3

Note: The Huffman code tree for a particular set of characters is not unique. (Steps 1, 2, and 3 may be done differently.)

The complexity of the algorithm on a set of n characters is O(n log n).
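
The algorithm above can be sketched in Python with the standard heapq module as the priority queue. This is one possible realization, not the slides' own code: the function name huffman_codes, the tie-breaking counter, and the choice of 0 for left edges and 1 for right edges are assumptions, and symbols are assumed to be non-tuple values from a non-empty frequency table. The frequencies are those of the worked example in these slides.

```python
import heapq
from itertools import count


def huffman_codes(freqs: dict) -> dict:
    """Return a {symbol: codeword} mapping built with a priority queue."""
    tiebreak = count()  # keeps heap entries comparable when priorities tie
    # Each heap entry is (priority, tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # two lowest-priority trees
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))

    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 left, 1 right
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case

    assign(heap[0][2], "")
    return codes


freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print(total_bits)                         # 696
```

Building the heap takes O(n) and each of the n - 1 loop iterations performs O(log n) heap operations, which gives the O(n log n) bound. A different tie-breaking rule can produce different codewords, but the total encoded length stays the same.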


Static Huffman Coding example

Example: Information to be transmitted over the internet contains the following characters with their associated frequencies:

Character   a    e    l    n    o    s    t
Frequency   45   65   13   45   18   22   53

Use the Huffman technique to answer the following questions:

• Build the Huffman code tree for the message.

• Use the Huffman tree to find the codeword for each character.

• If the data consists of only these characters, what is the total number of bits to be transmitted? What is the compression ratio?

• Verify that your computed Huffman codewords satisfy the prefix property.

Static Huffman Coding example (cont’d)

The sequence of zeros and ones on the arcs of the path from the root to each leaf node is the desired codeword for that character:

Character          a     e    l      n     o      s     t
Huffman codeword   110   10   0110   111   0111   010   00


Static Huffman Coding example (cont’d)

If we assume the message consists of only the characters a, e, l, n, o, s, t, then the number of bits for the compressed message will be 696:

(45*3) + (65*2) + (13*4) + (45*3) + (18*4) + (22*3) + (53*2) = 696

If the message is sent uncompressed with 8-bit ASCII representation for the characters, we have 261*8 = 2088 bits.

Static Huffman Coding example (cont’d)

Assume that the number of character-codeword pairs, followed by the pairs themselves, is included at the beginning of the binary file containing the compressed message, in the following format (characters are stored as 8-bit ASCII codes):

7 in binary (significant bits only)
a110
e10
l0110
n111
o0111
s010
t00
sequence of zeros and ones for the compressed message

Number of bits for the transmitted file = bits(7) + bits(characters) + bits(codewords) + bits(compressed message)
= 3 + (7*8) + 21 + 696 = 776

Compression ratio = bits for ASCII representation / number of bits transmitted
= 2088 / 776 = 2.69

Thus, the size of the transmitted file is 100 / 2.69 = 37% of the original ASCII file (i.e., a space saving of about 63%).
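
The arithmetic above can be checked with a short script. It takes the frequencies and codewords from the example and follows the header layout assumed on this slide; the variable names are chosen for this illustration.

```python
freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
codes = {"a": "110", "e": "10", "l": "0110", "n": "111",
         "o": "0111", "s": "010", "t": "00"}

message_bits = sum(freqs[c] * len(codes[c]) for c in freqs)   # 696
ascii_bits = 8 * sum(freqs.values())                          # 261 * 8 = 2088

count_bits = len(codes).bit_length()      # 7 is 111 in binary: 3 bits
char_bits = 8 * len(codes)                # 7 characters in 8-bit ASCII
codeword_bits = sum(len(cw) for cw in codes.values())         # 21

transmitted = count_bits + char_bits + codeword_bits + message_bits
ratio = ascii_bits / transmitted

print(transmitted)                              # 776
print(round(ratio, 2))                          # 2.69
print(round(100 * transmitted / ascii_bits))    # 37 (% of original size)
```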


The Prefix Property

• Data encoded using Huffman coding is uniquely decodable. This is because Huffman codes satisfy an important property called the prefix property:

In a given set of Huffman codewords, no codeword is a prefix of another Huffman codeword.

• For example, in a given set of Huffman codewords, 10 and 101 cannot simultaneously be valid Huffman codewords because the first is a prefix of the second.

• We can see by inspection that the codewords we generated in the previous example are valid Huffman codewords.

The Prefix Property (cont’d)

To see why the prefix property is essential, consider the codewords given below, in which “e” is encoded with 110, which is a prefix of the codeword for “f”:

Character   a   b     c     d     e     f
Codeword    0   101   100   111   110   1100

The decoding of 11000100110 is ambiguous:

11000100110 => face

11000100110 => eaace
