
Multimedia Systems

Chapter 7: Data Compression (1)


References
• Chapter 7 from our Textbook: “Multimedia
Fundamentals: Media Coding and Content
Processing”
• Slides from the reference book:
Fundamentals of Multimedia
Outline
• Introduction
• Motivation for compression
• Coding requirements
• Compression types
• General Data Compression Scheme
• Compression Techniques
• Entropy Encoding
– Run-Length Encoding
– Huffman Coding

Introduction
• Video and audio have much higher storage
requirements than text
• Data transmission rates (bandwidth requirements) for
sending continuous media are considerably higher
than for text
• Efficient compression of audio and video
data, including some compression standards,
will be considered in this chapter
Motivation for Compression
• Terminology
– 1 kbit = 1000 bit
– 1 Kbit = 1024 bit (= 2^10)
– 1 Mbit = 1024 x 1024 bit (= 2^10 x 2^10 = 2^20)
• Discrete Data: Considering a small window of 640 x
480 pixels on a display
– Text
– Vector Image
– Bitmap Image
• Continuous Data: Required storage space per
second
– Uncompressed speech of telephone quality
– Uncompressed stereo audio signal of CD quality
– Video sequence
Motivation for Compression:
Discrete Data
• Text
– Assuming 2 bytes are used for every 8 x 8 pixel character,
• Characters per screen page = ...
• Storage required per screen page = ...

• Vector Image
– Assuming that a typical image consists of 500 lines, each of which
is defined by its coordinates in the x direction and the y direction,
and an 8-bit attribute field
• Coordinates in the x direction require ...
• Coordinates in the y direction require ...
• Bits per line = ...
• Storage required per screen page
Motivation for Compression: Discrete Data
• Bitmap Image
– Assuming using 256 colors requiring a single byte per pixel
• Storage required per screen page = ...
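A possible worked computation of the blanks above, sketched in Python. The window size, character size, number of lines, and color depth come from the slides; treating each vector line as two (x, y) endpoints plus an 8-bit attribute is an assumption, since the slide leaves the exact line format open.

import math

WIDTH, HEIGHT = 640, 480

# Text: 8 x 8 pixel characters, 2 bytes per character
chars_per_page = (WIDTH // 8) * (HEIGHT // 8)   # 80 * 60 = 4800 characters
text_bytes = 2 * chars_per_page                 # 9600 bytes per screen page

# Vector image: 500 lines, each assumed two endpoints + 8-bit attribute
x_bits = math.ceil(math.log2(WIDTH))            # 10 bits to address 0..639
y_bits = math.ceil(math.log2(HEIGHT))           # 9 bits to address 0..479
bits_per_line = 2 * (x_bits + y_bits) + 8       # 46 bits per line
vector_bytes = 500 * bits_per_line / 8          # 2875 bytes per screen page

# Bitmap image: 256 colors -> 1 byte per pixel
bitmap_bytes = WIDTH * HEIGHT                   # 307,200 bytes = 300 KByte

print(text_bytes, vector_bytes, bitmap_bytes)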

Motivation for Compression: Continuous Data
• Uncompressed speech of telephone
quality
– Assuming sampling at 8 kHz, quantization using 8 bits per
sample, yielding a data stream of 64 Kbit/s
• Storage space required per second = ...
Motivation for Compression: Continuous Data
• Uncompressed stereo audio signal of
CD quality
– Assuming sampling at 44.1 kHz, quantization using 16 bits
per sample
• Data rate = ...
• Storage space required per second = ...
Motivation for Compression: Continuous Data
• Video sequence
– Assuming 25 full frames per second, luminance
and chrominance of each pixel are coded using
3 bytes, luminance sampled at 13.5 MHz while
chrominance (R-Y and B-Y) is sampled at 6.75
MHz, each, and samples are uniformly coded
using 8 bits.
• Bandwidth = ...
• Data Rate = ...
• Storage space required per second = ...
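A sketch of the continuous-media numbers, using only the sampling rates and sample sizes stated above (1 Kbit = 1024 bit and 1 KByte = 1024 byte, per the terminology slide):

# Telephone-quality speech: 8 kHz sampling, 8 bits per sample
speech_bits_per_s = 8000 * 8                    # 64,000 bit/s
speech_bytes_per_s = speech_bits_per_s // 8     # 8000 bytes/s ~ 7.8 KByte/s

# CD-quality stereo audio: 44.1 kHz sampling, 16 bits, 2 channels
cd_bits_per_s = 44_100 * 16 * 2                 # 1,411,200 bit/s ~ 1.35 Mbit/s
cd_bytes_per_s = cd_bits_per_s // 8             # 176,400 bytes/s ~ 172 KByte/s

# Video: luminance at 13.5 MHz, two chrominance components at 6.75 MHz each,
# every sample uniformly coded with 8 bits
video_bits_per_s = (13_500_000 + 2 * 6_750_000) * 8   # 216,000,000 bit/s
video_bytes_per_s = video_bits_per_s // 8             # 27,000,000 bytes/s

print(speech_bytes_per_s / 1024)        # KByte per second of speech
print(cd_bits_per_s / (1024 * 1024))    # Mbit per second of CD audio
print(video_bits_per_s / 1e6)           # 216.0 Mbit per second of video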
Motivation for Compression:
Continuous Data
• Processing uncompressed video data streams
requires
– Storage space in the gigabyte range
– Buffer space in the megabyte range
– Data transfer rates of 140 Mbit/s [per
unidirectional connection]
• These requirements can be considerably
lowered by employing compression
Can Multimedia Data be
Significantly Compressed?
• Redundancy can be exploited to do
compression
• Spatial redundancy
– correlation between neighboring pixels in
image/video
• Spectral redundancy
– correlation among colors
• Psycho-visual redundancy
– Perceptual properties of human visual system
What Makes “Good” Compression
• Quality of compressed and decompressed data
should be as good as possible
• Compression/decompression process should be as
simple as possible
• Decompression time must not exceed certain
thresholds
• Compression/decompression requirements can be divided into
– Dialogue mode (video conferencing)
– Retrieval mode (digital libraries)
– Both
Coding Requirements: Dialogue
Mode
• End-to-end delay should not exceed 150 ms
for compression and decompression alone.
– Ideally, compression and decompression should
not exceed 50 ms in order to ensure a natural
dialogue.
• In addition, the total delay includes
– delay in the network,
– communications protocol processing in the end system,
– data transfer to and from the respective input and output
devices.

Coding Requirements: Retrieval Mode
• Fast forward and fast rewind with simultaneous
display (or playback) of the data should be
possible
• Random access to single images or audio passages
in a data stream should be possible in less than 0.5 s.
– Maintains interaction aspects in retrieval systems
• Decompression of images, video or audio
passages should be possible without interpreting
all preceding data.
– Allows random access and editing
Coding Requirements: Both Modes
• Support display of the same data in different systems
– Formats have to be independent of frame size and video frame rate
• Audio and video compression should support different
data rates at different qualities
• Precisely synchronize audio and video
• Support for economical solution
– Software
– Few VLSI chips
• Enable cooperation of different systems
– Data generated on a multimedia system can be reproduced on
another system (e.g. course materials).
Compression Types
• Physical versus logical Compression
– Physical
• Performed on data regardless of what information it contains
• Translates a series of bits to another series of bits
– Logical
• Knowledge-based
– e.g. United Kingdom to UK

• Spatial Compression – 2D or single image


• Temporal Compression – 3D or video
• Codec – Compression / Decompression
• Color / intensity: the same techniques apply to color and to gray-level (intensity) data
Compression Types
• Symmetric
– Compression and decompression roughly use the same
techniques and take just as long
– Data transmission which requires compression and
decompression on-the-fly will require these types of
algorithms
• Asymmetric
– Most common is where compression takes a lot more
time than decompression
• In an image database, each image will be compressed once and
decompressed many times
– Less common is where decompression takes a lot more
time than compression
• Creating many backup files which will hardly ever be read

Compression Types
• Non-adaptive
– Contain a static dictionary of predefined
substrings to encode which are known to occur
with high frequency
• Adaptive
– Dictionary is built from scratch
• Lossless
– decompress(compress(data)) = data
– Used for computer data, medical images, etc.
Compression Types
• Lossy
– decompress(compress(data)) ≠ data
– Some distortion
– A small change in pixel values may be invisible
– Suited for audio and video
General Data Compression Scheme

Input Data → Encoder (compression) → Codes / Codewords → Storage or Networks →
Codes / Codewords → Decoder (decompression) → Output Data

B0 = # bits required before compression
B1 = # bits required after compression
Compression Ratio = B0 / B1
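As a quick illustration of the ratio (the byte counts below are made-up example values):

def compression_ratio(b0: int, b1: int) -> float:
    # B0 = bits required before compression, B1 = bits after compression
    return b0 / b1

# e.g. a 307,200-byte bitmap compressed down to 30,720 bytes
print(compression_ratio(307_200 * 8, 30_720 * 8))   # 10.0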

Compression Techniques
Coding Type      | Basis               | Technique
-----------------|---------------------|--------------------------
Entropy Encoding |                     | Run-Length Coding
                 |                     | Huffman Coding
                 |                     | Arithmetic Coding
Source Coding    | Prediction          | DPCM
                 |                     | DM
                 | Transformation      | FFT
                 |                     | DCT
                 | Layered Coding      | Bit Position
                 |                     | Subsampling
                 |                     | Sub-band Coding
                 | Vector Quantization |
Hybrid Coding    |                     | JPEG
                 |                     | MPEG
                 |                     | H.263
                 |                     | Many Proprietary Systems

Compression Techniques
• Entropy Coding
– Semantics of the information to be encoded are ignored
– Lossless compression technique
– Can be used for different media regardless of their
characteristics
• Source Coding
– Takes into account the semantics of the information to be
encoded.
– Often lossy compression technique
– Characteristics of medium are exploited
• Hybrid Coding
– Most multimedia compression algorithms are hybrid techniques

Entropy Encoding
• Information theory is a discipline in applied mathematics involving
the quantification of data with the goal of enabling as much data as
possible to be reliably stored on a medium and/or communicated
over a channel.
• According to Claude E. Shannon, the entropy η (eta) of an
information source with alphabet S = {s1, s2, ..., sn} is defined as

  η = Σ_{i=1}^{n} p_i log2(1/p_i) = − Σ_{i=1}^{n} p_i log2(p_i)

  where p_i is the probability that symbol s_i in S will occur.

Entropy Encoding
• In science, entropy is a measure of the disorder of a
system.
– More entropy means more disorder
– Negative entropy is added to a system when more order is
given to the system.
• The measure of data, known as information entropy, is
usually expressed by the average number of bits needed
for storage or communication.
– The Shannon Coding Theorem states that the entropy is the best
we can do (under certain conditions). i.e., for the average length
of the codewords produced by the encoder, l’,
η ≤ l’
Entropy Encoding
• Example 1: What is the entropy of an image with
uniform distributions of gray-level intensities (i.e.
pi = 1/256 for all i)?
• Example 2: What is the entropy of an image
whose histogram shows that one third of the
pixels are dark and two thirds are bright?
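A small sketch that evaluates both questions with the entropy formula above:

import math

def entropy(probs):
    # Shannon entropy in bits: sum over i of p_i * log2(1 / p_i)
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Example 1: uniform distribution over 256 gray levels
print(entropy([1 / 256] * 256))    # 8.0 bits per pixel

# Example 2: one third dark pixels, two thirds bright pixels
print(entropy([1 / 3, 2 / 3]))     # ~0.918 bits per pixel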
Entropy Encoding: Run-Length
• Data often contains sequences of identical bytes.
Replacing these repeated byte sequences with the
number of occurrences reduces considerably the
overall data size.
• Many variations of RLE
– One form of RLE is to use a special marker byte (M-byte)
indicating that the byte following it is the number of
occurrences of the preceding character
• “c”!# (character, M-byte, count)
– How many bytes are used above? When do you think the M-byte
should be used?

Entropy Encoding: Run-Length
• ABCCCCCCCCDEFGGG is encoded as ABC!8DEFGGG
– Note: This encoding is DIFFERENT from what is mentioned in your book

– What if the string contains the “!” character?


– What is the compression ratio for this example?
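A minimal sketch of this marker-based RLE, assuming "!" as the M-byte and single-digit run lengths; escaping a literal "!" is deliberately left open, which is exactly what the first question above asks about:

def rle_encode(s: str, marker: str = "!", min_run: int = 4) -> str:
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                     # extend the run of identical bytes
        run = j - i
        if min_run <= run <= 9:        # worth paying the 3-byte "c!n" cost
            out.append(f"{s[i]}{marker}{run}")
        else:                          # short runs (and runs > 9) stay literal
            out.append(s[i] * run)
        i = j
    return "".join(out)

encoded = rle_encode("ABCCCCCCCCDEFGGG")
print(encoded)               # ABC!8DEFGGG
print(16 / len(encoded))     # compression ratio 16/11 ~ 1.45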

• Many variations of RLE :


– Zero-suppression: one character that is repeated very
often is the only character encoded with RLE. Only the
M-byte and the number of additional occurrences are
stored.
• When do you think the M-byte should be used, as opposed to
using the regular representation without any encoding?
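A possible sketch of zero-suppression, assuming blanks are the frequently repeated character and that "!n" stands for a run of n + 1 blanks (one occurrence plus n additional ones, matching the wording above):

def zero_suppress(s: str, ch: str = " ", marker: str = "!") -> str:
    out, i = [], 0
    while i < len(s):
        if s[i] != ch:
            out.append(s[i])
            i += 1
            continue
        j = i
        while j < len(s) and s[j] == ch:
            j += 1
        run = j - i
        if run > 2:                          # beats the 2-byte "!n" pair
            out.append(f"{marker}{run - 1}")  # number of ADDITIONAL blanks
        else:
            out.append(ch * run)
        i = j
    return "".join(out)

print(zero_suppress("A     B"))   # "A!4B": five blanks -> marker + 4 more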
Entropy Encoding: Run-Length
• Many variations of RLE:
– If we are encoding black and white images (e.g.
Faxes), one such version is as follows:
(row#, col# run_1 begin, col# run_1 end, col# run_2 begin, col# run_2 end, ..., col# run_k begin, col# run_k end)
(row#, col# run_1 begin, col# run_1 end, col# run_2 begin, col# run_2 end, ..., col# run_r begin, col# run_r end)
...
(row#, col# run_1 begin, col# run_1 end, col# run_2 begin, col# run_2 end, ..., col# run_s begin, col# run_s end)
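A sketch of this row-wise encoding for a small binary image (the helper name encode_rows is ours, not from the book):

def encode_rows(image):
    # image: rows of 0/1 ints -> list of (row#, begin, end, begin, end, ...)
    encoded = []
    for r, row in enumerate(image):
        entry, c = [r], 0
        while c < len(row):
            if row[c] == 1:                # start of a run of black pixels
                start = c
                while c < len(row) and row[c] == 1:
                    c += 1
                entry.extend([start, c - 1])   # begin and end columns
            else:
                c += 1
        if len(entry) > 1:                 # all-white rows are omitted
            encoded.append(tuple(entry))
    return encoded

image = [[0, 1, 1, 0, 0, 1],
         [1, 1, 0, 0, 0, 0]]
print(encode_rows(image))   # [(0, 1, 2, 5, 5), (1, 0, 1)]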
Entropy Encoding: Huffman Coding
• One form of variable length coding
• Greedy algorithm
• Has been used in fax machines, JPEG and
MPEG
Algorithm huffman
Input: A set C = {c1, c2, ..., cn} of n characters and
their frequencies {f(c1), f(c2), ..., f(cn)}
Output: A Huffman tree (V, T) for C.
1. Insert all characters into a min-heap H according to their
frequencies.
2. V = C; T = {}
3. for j = 1 to n – 1
4.   c = deletemin(H)
5.   c' = deletemin(H)
6.   f(v) = f(c) + f(c')   // v is a new node
7.   Insert v into the min-heap H
8.   Add (v, c) and (v, c') to tree T, making c and c' children of v in T
9. end for
• Example
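A possible Python rendering of the greedy algorithm above, using heapq as the min-heap; the symbols and frequencies are made up. It also prints the average code length l' next to the entropy η, to illustrate the η ≤ l' < η + 1 bound discussed below:

import heapq
import math
from itertools import count

def huffman_codes(freqs):
    # freqs: dict symbol -> frequency; returns dict symbol -> bit string
    tiebreak = count()   # keeps heapq from ever comparing tree nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                                # leaf: assign the symbol's code
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
print(codes)
total = sum(freqs.values())
avg_len = sum(freqs[s] / total * len(codes[s]) for s in freqs)
eta = -sum((f / total) * math.log2(f / total) for f in freqs.values())
print(f"eta = {eta:.3f} bits <= l' = {avg_len:.3f} bits < eta + 1")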
Entropy Encoding: Huffman Coding
• Most important properties of Huffman Coding
– Unique Prefix Property: No Huffman code is a
prefix of any other Huffman code
• For example, 101 and 1010 cannot both be Huffman codes. Why?
– Optimality: The Huffman code is a minimum-redundancy
code (given an accurate data model)
• The two least frequent symbols will have the same length for
their Huffman code, whereas symbols occurring more frequently
will have shorter Huffman codes
• It has been shown that the average code length of an information
source S is strictly less than η + 1, i.e.
  η ≤ l' < η + 1
Entropy Encoding: Adaptive
Huffman Coding
• The Huffman method assumes that the
frequencies of occurrence of all the
symbols of the alphabet are known
a priori.
– This is rarely the case in practice
– Semi-adaptive Huffman coding has been
employed where data is read twice, the first
pass being to determine the frequencies
Entropy Encoding: Adaptive Huffman Coding
• Disadvantage: Too slow for real-time applications
– Another solution is Adaptive Huffman Coding
• Employed by Unix’s “compact” program.
• Decoder “mirrors” the operations of the encoder, as
they both may occur at different times
• Main idea of the algorithm is as follows
– Encoder and Decoder both start with an empty Huffman
Coding Tree
• No symbol is assigned codes yet.
Entropy Encoding: Adaptive Huffman Coding
– First symbol read is written on the output stream in its
uncompressed form
• In fact, each character being read for the first time is written out
this way
– That is why we need an escape character to determine when we read an
uncompressed character for the first time.
– This escape character is denoted by NEW and given frequency 0 all the
time
– This symbol is then added to the tree and a code is
assigned to it.
Entropy Encoding: Adaptive
Huffman Coding
– The next time this symbol is encountered, its code is
written to the output stream and its frequency is
increased by 1.
– Since this modifies the tree, it is checked whether it is
still a Huffman tree
• If not, it will be rearranged, through swaps, and new
codes will be assigned
• Sibling Property
– Must be preserved during swaps
Entropy Encoding: Adaptive Huffman Coding
– All nodes are arranged in the order of increasing counts, left to right
and bottom to top.
– During a swap, the farthest node with count N is swapped with the
node whose count has just been increased to N + 1.

• Example
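A deliberately simplified sketch of this scheme: instead of maintaining the tree with sibling-property swaps as the real algorithm does, it naively rebuilds the tree from the current counts before every symbol, which is equivalent in spirit but far slower. NEW is the zero-frequency escape symbol described above.

import heapq
from itertools import count

NEW = "<NEW>"   # escape symbol announcing a first-time character

def build_codes(freqs):
    # same greedy Huffman construction as before, from the current counts
    tiebreak = count()
    heap = [(f, next(tiebreak), s) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    codes = {}
    def walk(n, c):
        if isinstance(n, tuple):
            walk(n[0], c + "0")
            walk(n[1], c + "1")
        else:
            codes[n] = c or "0"
    walk(heap[0][2], "")
    return codes

def adaptive_encode(text):
    counts = {NEW: 0}                  # NEW keeps frequency 0 all the time
    out = []
    for ch in text:
        codes = build_codes(counts)    # the tree as the decoder also sees it
        if ch in counts:
            out.append(codes[ch])      # known symbol: emit its current code
        else:
            out.append(codes[NEW])     # escape first ...
            out.append(repr(ch))       # ... then the symbol, uncompressed
            counts[ch] = 0
        counts[ch] += 1                # this changes the tree for next time
    return " ".join(out)

print(adaptive_encode("ABABBA"))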
