
Tanta University

Faculty of Computer and Informatics

Information Theory and Data Compression
2022/2023
Level 3

Prepared by
Dr. Omnia Elbarbary
Assistant Professor, Department of Information Systems
Faculty of Computer and Informatics – Tanta University

Course Syllabus
IT351: Information Theory and Data Compression
Prerequisite: Math (3) - Object Oriented Programming
Instructor: Dr. Omnia Elbarbary
Textbooks:
1. Information Theory, Inference, and Learning Algorithms (English, free), David J. C. MacKay,
Cambridge University Press, ISBN: 9780521642989. Free, downloadable from the course Gdrive.
2. A Student's Guide to Coding and Information Theory, Stefan M. Moser and Po-Ning Chen.
3. Introduction to Information Theory Based on Examples (in Japanese), Shinichi Oishi (1993),
Kodansha, ISBN 978-4-06-153803-0 (not required).

Course description:
This course, "Information Theory and Data Compression", provides the following knowledge and basic
skills. Transmitting information efficiently and accurately is one of the important technical challenges in
modern digital society. Information theory is rooted in mathematical formulation and provides a
theoretical solution to this problem. The ideas of information theory make it possible to construct
efficient codes for information communication and error correction by utilizing probability and
statistical theorems. Information theory plays an important role in fields such as image data compression,
cryptology, network communication, and information quantity evaluation.

Course objectives:
The main contents of this course are the mathematical formulation of source coding and channel coding,
efficient methods for constructing codes, and entropy as a measure of information ambiguity and
information volume.
Lecture Contents:
• Lecture 1: Course introduction, Intro. to Information Theory
• Lecture 2: Lossless Compression
• Lecture 3 – 4: Variable length Coding and Shannon-Fano Coding

• Lecture 5 – 6: Data compression & Huffman Code (Prefix-free, Trees & Codes, Kraft Inequality,
Trees with Prob., Huffman Code)
• Lecture 7 – 8: Dictionary Compression
• Lecture 9: Entropy & Source Coding
• Lecture 10: Which lossless algorithm is most suited for a given application
• Lecture 11: Lossy Compression
• Lecture 12: Image Compression
• Lecture 13: Video Compression

Laboratory Exercises:
• Lab 0: tutorial using programming language (2 sessions)
• Lab 1 (2 sessions): Channel coding
• Lab 2 (3 sessions): Source coding - Data compression using Huffman Coding
• Lab 3 (4 sessions): Image transmission over AWGN channel
o Part 1: without source and channel coding
o Part 2: with/without compression
o Part 3: with/without ECC (Hamming (7,4))

Chapter 1
Basic Compression Concepts

Introduction

In the last decade we have been witnessing a transformation—some call it a


revolution—in the way we communicate, and the process is still under way.
This transformation includes the ever-present, ever-growing Internet; the
explosive development of mobile communications; and the ever-increasing
importance of video communication. Data compression is one of the enabling


technologies for each of these aspects of the multimedia revolution. It would
not be practical to put images, let alone audio and video, on websites if it were
not for data compression algorithms. Cellular phones would not be able to
provide communication with increasing clarity were it not for compression.
The advent of digital TV would not be possible without compression. Data
compression, which for a long time was the domain of a relatively small
group of engineers and scientists, is now ubiquitous. Make a long-distance
call and you are using compression. Use your modem, or your fax machine,
and you will benefit from compression. Listen to music on your mp3 player or
watch a DVD and you are being entertained courtesy of compression. So,
what is data compression, and why do we need it? Most of you have heard of
JPEG and MPEG, which are standards for representing images, video, and
audio. Data compression algorithms are used in these standards to reduce the
number of bits required to represent an image or a video sequence or music.
In brief, data compression is the art or science of representing information in a
compact form. We create these compact representations by identifying and


using structures that exist in the data. Data can be characters in a text file,
numbers that are samples of speech or image waveforms, or sequences of
numbers that are generated by other processes. The reason we need data
compression is that more and more of the information that we generate and
use is in digital form—in the form of numbers represented by bytes of data.
And the number of bytes required to represent multimedia data can be huge.
For example, in order to digitally represent 1 second of video without
compression (using the CCIR 601 format), we need more than 20 megabytes,
or 160 megabits. If we consider the number of seconds in a movie, we can


easily see why we would need compression. To represent 2 minutes of
uncompressed CD-quality music (44,100 samples per second, 16 bits per
sample) requires more than 84 million bits. Downloading music from a
website at these rates would take a long time.
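These figures are easy to check. The short Python sketch below (the CCIR 601 frame size, frame rate, and bits per pixel used here are rough assumptions for illustration only) reproduces the quoted orders of magnitude:

video_bits_per_second = 720 * 485 * 16 * 30   # uncompressed CCIR 601-style video: pixels/frame * bits/pixel * frames/s
audio_bits = 44_100 * 16 * (2 * 60)           # 2 minutes of CD-quality samples (mono, as in the text)

print(video_bits_per_second / 8 / 1e6, "MB per second of video")   # about 21 MB, i.e. roughly 168 Mbit
print(audio_bits / 1e6, "million bits for 2 minutes of audio")     # about 84.7 million bits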

1.1 Compression Techniques



When we speak of a compression technique or compression algorithm, we are


actually referring to two algorithms. There is the compression algorithm that
takes an input x and generates a representation xc that requires fewer bits, and
there is a reconstruction algorithm that operates on the compressed
representation xc to generate the reconstruction y. These operations are
shown schematically in Figure 1.1. We will follow convention and refer to
both the compression and reconstruction algorithms together to mean the
compression algorithm.

Based on the requirements of reconstruction, data compression schemes can


be divided into two broad classes: lossless compression schemes, in which y
is identical to x, and lossy compression schemes, which generally provide
much higher compression than lossless compression but allow y to be
different from x.

1.1.1 Lossless Compression

Lossless compression techniques, as their name implies, involve no loss of


information. If data have been losslessly compressed, the original data can be
recovered exactly from the compressed data. Lossless compression is
generally used for applications that cannot tolerate any difference between the
original and reconstructed data.


Text compression is an important area for lossless compression. It is very
important that the reconstruction is identical to the text original, as very small
differences can result in statements with very different meanings. Consider
the sentences “Do not send money” and “Do now send money.” A similar
argument holds for computer files and for certain types of data such as bank
records.
If data of any kind are to be processed or “enhanced” later to yield more
information, it is important that the integrity be preserved. For example,
suppose we compressed a radiological image in a lossy fashion, and the


difference between the reconstruction 𝑦 and the original 𝑥 was visually
undetectable. If this image was later enhanced, the previously undetectable
differences may cause the appearance of artifacts that could seriously mislead
the radiologist. Because the price for this kind of mishap may be a human life,
it makes sense to be very careful about using a compression scheme that
generates a reconstruction that is different from the original.
Data obtained from satellites often are processed later to obtain different
numerical indicators of vegetation, deforestation, and so on. If the
reconstructed data are not identical to the original data, processing may result
in “enhancement” of the differences. It may not be possible to go back and
obtain the same data over again. Therefore, it is not advisable to allow for any
differences to appear in the compression process. There are many situations
that require compression where we want the reconstruction to be identical to
the original. There are also a number of situations in which it is possible to
relax this requirement in order to get more compression. In these situations,
we look to lossy compression techniques.

1.1.2 Lossy Compression

Lossy compression techniques involve some loss of information, and data that
have been compressed using lossy techniques generally cannot be recovered
or reconstructed exactly.
In return for accepting this distortion in the reconstruction, we can generally
obtain much higher compression ratios than is possible with lossless
compression.
In many applications, this lack of exact reconstruction is not a problem. For
example, when storing or transmitting speech, the exact value of each sample
of speech is not necessary. Depending on the quality required of the
reconstructed speech, varying amounts of loss of information about the value
of each sample can be tolerated. If the quality of the reconstructed speech is to
be similar to that heard on the telephone, a significant loss of information can
be tolerated. However, if the reconstructed speech needs to be of the quality
heard on a compact disc, the amount of information loss that can be tolerated
is much lower.
Similarly, when viewing a reconstruction of a video sequence, the fact that the
reconstruction is different from the original is generally not important as long
as the differences do not result in annoying artifacts. Thus, video is generally
compressed using lossy compression.
Once we have developed a data compression scheme, we need to be able to
measure its performance. Because of the number of different areas of
application, different terms have been developed to describe and measure the
performance.

1.1.3 Measures of Performance

A compression algorithm can be evaluated in a number of different ways. We


could measure the relative complexity of the algorithm, the memory required
to implement the algorithm, how fast the algorithm performs on a given
machine, the amount of compression, and how closely the reconstruction
resembles the original. In this book we will mainly be concerned with the last
two criteria. Let us take each one in turn.


A very logical way of measuring how well a compression algorithm
compresses a given set of data is to look at the ratio of the number of bits
required to represent the data before compression to the number of bits
required to represent the data after compression. This ratio is called the
compression ratio. Suppose storing an image made up of a square array of
256×256 pixels requires 65,536 bytes. The image is compressed and the
compressed version requires 16,384 bytes. We would say that the
compression ratio is 4:1. We can also represent the compression ratio by
expressing the reduction in the amount of data required as a
percentage of the size of the original data. In this particular example the
compression ratio calculated in this manner would be 75%.
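For a quick sanity check, here is a minimal Python sketch that reproduces these numbers, together with the rate in bits per pixel discussed next:

original_bytes = 256 * 256        # 65,536 bytes, one byte per pixel
compressed_bytes = 16_384

ratio = original_bytes / compressed_bytes                    # 4.0, i.e. a compression ratio of 4:1
reduction = 100 * (1 - compressed_bytes / original_bytes)    # 75 % reduction
rate = compressed_bytes * 8 / (256 * 256)                    # 2 bits per pixel
print(ratio, reduction, rate)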
Another way of reporting compression performance is to provide the average
number of bits required to represent a single sample. This is generally referred
to as the rate. For example, in the case of the compressed image described
above, if we assume 8 bits per byte (or pixel), the average number of bits per
pixel in the compressed representation is 2. Thus, we would say that the rate
is 2 bits per pixel. In lossy compression, the reconstruction differs from the
original data. Therefore, in order to determine the efficiency of a compression
algorithm, we have to have some way of quantifying the difference. The
difference between the original and the reconstruction is often called the
distortion. Lossy techniques are generally used for the compression of data
that originate as analog signals, such as speech and video. In compression of
speech and video, the final arbiter of quality is human. Because human
responses are difficult to model mathematically, many approximate measures


of distortion are used to determine the quality of the reconstructed waveforms.
Other terms that are also used when talking about differences between the
reconstruction and the original are fidelity and quality. When we say that the
fidelity or quality of a reconstruction is high, we mean that the difference
between the reconstruction and the original is small. Whether this difference
is a mathematical difference or a perceptual difference should be evident from
the context.

1.2 Modeling and Coding

While reconstruction requirements may force the decision of whether a


compression scheme is to be lossy or lossless, the exact compression scheme
we use will depend on a number of different factors. Some of the most
important factors are the characteristics of the data that need to be
compressed. A compression technique that will work well for the compression
of text may not work well for compressing images. Each application presents
a different set of challenges.

There is a saying attributed to Bobby Knight, the basketball coach at Texas
Tech University: “If the only tool you have is a hammer, you approach every
problem as if it were a nail.” Our intention in this book is to provide you with
a large number of tools that you can use to solve the particular data
compression problem. It should be remembered that data compression, if it is
a science at all, is an experimental science. The approach that works best for a
particular application will depend to a large extent on the redundancies


inherent in the data.
The development of data compression algorithms for a variety of data can be
divided into two phases. The first phase is usually referred to as modeling. In
this phase we try to extract information about any redundancy that exists in
the data and describe the redundancy in the form of a model. The second
phase is called coding. A description of the model and a “description” of how
the data differ from the model are encoded, generally using a binary alphabet.
The difference between the data and the model is often referred to as the
residual. In the following three examples we will look at three different ways
that data can be modeled. We will then use the model to obtain compression.


If we were to transmit or store the binary representations of these numbers,


we would need to use 5 bits per sample. However, by exploiting the structure

in the data, we can represent the sequence using fewer bits. If we plot these
data as shown in Figure 1.2, we see that the data seem to fall on a straight
line. A model for the data could therefore be a straight line given by the
equation
Thus, the structure in the data can be characterized by an equation. To make


use of this structure, let’s examine the difference between the data and the
model. The difference (or residual) is given by the sequence

The residual sequence consists of only three numbers {−1, 0, 1}. If we assign
a code of 00 to −1, a code of 01 to 0, and a code of 10 to 1, we need to use 2
bits to represent each element of the residual sequence. Therefore, we can
obtain compression by transmitting or storing the parameters of the model and
the residual sequence. The encoding can be exact if the required compression
is to be lossless, or approximate if the compression can be lossy.
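As a concrete illustration, here is a small Python sketch of this modeling idea. The data sequence and the straight-line model below are hypothetical stand-ins for the ones in Figure 1.2, which is not reproduced here:

# Hypothetical data that happen to lie close to the line model[n] = n + 9.
data  = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]
model = [n + 9 for n in range(len(data))]

# The residual only takes the values -1, 0, +1, so 2 bits per sample suffice
# instead of the 5 bits needed for the raw values.
residual = [x - m for x, m in zip(data, model)]

# The decoder reverses the process: model parameters + residual -> original data.
reconstructed = [m + r for m, r in zip(model, residual)]
assert reconstructed == data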

The type of structure or redundancy that existed in these data follows a simple
law. Once we recognize this law, we can make use of the structure to predict
the value of each element in the sequence and then encode the residual.
Structure of this type is only one of many types of structure. Consider the
following example.

The sequence does not seem to follow a simple law as in the previous case.
However, each value is close to the previous value. Suppose we send the first
value, then in place of subsequent values we send the difference between it
and the previous value. The sequence of transmitted values would be

Like the previous example, the number of distinct values has been reduced.
Fewer bits are required to represent each number and compression is
achieved. The decoder adds each received value to the previous decoded
value to obtain the reconstruction corresponding to the received value.
Techniques that use the past values of a sequence to predict the current value
and then encode the error in prediction, or residual, are called predictive
coding schemes. Assuming both encoder and decoder know the model being
used, we would still have to send the value of the first element of the sequence.
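A minimal Python sketch of this predictive (differential) scheme, using a hypothetical input sequence, might look as follows:

def diff_encode(samples):
    # Send the first value as-is, then only the difference to the previous value.
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def diff_decode(encoded):
    # The decoder adds each received value to the previously decoded value.
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

seq = [27, 28, 29, 28, 26, 27, 29, 28, 30, 32]    # hypothetical slowly varying data
enc = diff_encode(seq)                            # [27, 1, 1, -1, -2, 1, 2, -1, 2, 2]
assert diff_decode(enc) == seq                    # lossless reconstruction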

A very different type of redundancy is statistical in nature. Often we will


encounter sources that generate some symbols more often than others. In
these situations, it will be advantageous to assign binary codes of different
lengths to different symbols.
Example 1.2.3:
Suppose we have the following sequence:
a/barayaran/barray/bran/bfar/bfaar/bfaaar/baway
which is typical of all sequences generated by a source. Notice that the
sequence is made up of eight different symbols. In order to represent eight
symbols, we need to use 3 bits per symbol. Suppose instead we used the code
shown in Table 1.1. Notice that we have assigned a codeword with only a
single bit to the symbol that occurs most often, and correspondingly longer
codewords to symbols that occur less often. If we substitute the codes for each
symbol, we will use 106 bits to encode the entire sequence. As there are 41
symbols in the sequence, this works out to approximately 2.58 bits per
symbol. This means we have obtained a compression ratio of 1.16:1. We will
study how to use statistical redundancy of this sort in the following chapters.
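Since Table 1.1 is not reproduced here, the small sketch below uses a hypothetical code table simply to show how the total bit count and the average number of bits per symbol are obtained:

# Hypothetical symbol counts and codeword lengths (not the ones from Table 1.1).
count  = {'a': 20, 'b': 10, 'c': 7, 'd': 4}
length = {'a': 1,  'b': 2,  'c': 3, 'd': 3}

total_bits = sum(count[s] * length[s] for s in count)    # 73 bits for 41 symbols
average = total_bits / sum(count.values())               # about 1.78 bits/symbol
print(total_bits, average, 2 / average)                  # vs. 2 bits/symbol fixed length: ratio about 1.12:1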

When dealing with text, along with statistical redundancy, we also see
redundancy in the form of words that repeat often. We can take advantage of
this form of redundancy by constructing a list of these words and then
representing them by their position in the list. This type of compression scheme
is called a dictionary compression scheme. We will study these schemes in the
following Chapters.
1.3 What is Compression Technology?

Compression denotes compact representation of data. In this lecture we


exclusively cover compression of digital data. Examples of the kinds of data
you typically want to compress are:
• text

• source-code

• arbitrary files
• images
• video
• audio data
• speech
Obviously, these data are different in terms of data volume, data structure,
intended usage etc. For that reason, it is plausible to develop specific
compression technologies for different data and application types,
respectively. The idea of a general-purpose compression technique for all
thinkable application scenarios has to be entirely abandoned. Instead, we find
a large number of very different techniques with respect to target data types
and target application environments (e.g. data transmission in networks like
streaming, storage and long term archiving applications, interactive
multimedia applications like gaming, etc.). Given
this vast number of different techniques, there are different ways to
classify compression techniques:
• with respect to the type of data to be compressed
• with respect to the target application area
• with respect to the fundamental building blocks of the algorithms used
Terminology: when talking about compression, we often mean “lossy
compression”, while “lossless compression” is often termed coding. However, not all
coding algorithms actually lead to lossless compression, e.g. error correction codes. Like in every other field
in computer science or engineering, the dominating language in compression
technologies is English, of course. There are hardly any comprehensive and
up-to-date German books available, and there do not exist any German
journals in the field. A codec denotes a complete system capable of encoding
and decoding data, consisting of an Encoder and a Decoder; transcoding is
a conversion from one encoded digital representation into another one. A
fundamental term
in the area is compression rate (or compression ratio) which denotes the
relation between the size of the original data before compression and the size
of the compressed data. Compression ratio therefore rates the effectiveness of a
compression system in terms of data reduction capability. This must not be
confused with other measures of compressed data size like bits per pixel (bpp)
or bits per sample (bps).

1.4 Why do we still need compression?

Compression Technology is employed to efficiently use storage space, to save


on transmission capacity and transmission time, respectively. Basically, it is all
about saving resources and money. Despite the overwhelming advances in
the areas of storage media and transmission networks, it may seem surprising that
compression technology is still required. One important reason is that the
resolution and amount of digital data have increased (e.g. HD-TV resolution,
ever-increasing sensor sizes in consumer cameras), and that there are still
application areas where resources are limited, e.g. wireless networks. Apart

from the aim of simply reducing the amount of data, standards like MPEG-4,
MPEG-7, and MPEG-21 offer additional functionalities.
During the last years, three important trends have contributed to the fact that
nowadays compression technology is as important as never before – this development
has already changed the way we work with multimedia data like text, speech,
audio, images, and video, and will lead to new products and applications:
• The availability of highly effective methods for compressing various types
of data.
• The availability of fast and cheap hardware components to conduct
compression on single-chip systems, microprocessors, DSPs and VLSI
systems.
• Convergence of the computer, communication, consumer electronics,
publishing, and entertainment industries.

1.5 Why is it possible to compress data?

Compression is enabled by statistical and other properties of most data types,


however, data types exist which cannot be compressed, e.g. various kinds of
noise or encrypted data. Compression-enabling properties are:
• Statistical redundancy: in non-compressed data, all symbols are represented
with the same number of bits independent of their relative frequency (fixed-length
representation).



• Correlation: adjacent data samples tend to be equal or similar (e.g. think of
images or video data).
There are different types of correlation:
– Spatial correlation
– Spectral correlation
– Temporal correlation
In addition, in many data types there is a significant amount of irrelevancy
since the human brain is not able to process and/or perceive the entire amount
of data. As a consequence, such data can be omitted without degrading
perception. Furthermore, some data contain more abstract properties which


are independent of time, location, and resolution and can be described very
efficiently (e.g. fractal properties).
1.6 History of compression technologies

• 1st century B.C.: stenography
• 19th century: Morse and Braille alphabets
• 1950s: compression technologies exploiting statistical
redundancy are developed – bit patterns of varying length are used to
represent individual symbols according to their relative frequency.
• 1970s: dictionary algorithms are developed – symbol sequences are mapped


to shorter indices using dictionaries.
• 1970s: with the ongoing digitization of telephone lines, telecommunication
companies became interested in procedures for getting more channels onto a single
wire.
• early 1980s: fax transmission over analog telephone lines.
• 1980s: first applications involving digital images appear on the market; the
“digital revolution” starts with compressing audio data.
• 1990s: video broadcasting, video on demand, etc.

1.7 Selection criteria for choosing a compression scheme

If it is evident that in an application compression technology is required, it has


to be decided which type of technology should be employed. Even if it is not
evident at first sight, compression may lead to certain surprising benefits and
can offer additional functionalities due to saved resources. When selecting a
specific system, quite often we are not entirely free to choose due to standards
or de-facto standards – due to the increasing development of open systems
and the eventual declining importance of standards (example: MPEG-4
standardization!), these criteria might gain even more importance in the
future. Important selection
criteria are for example:
• data dimensionality: 1-D, 2-D, 3-D, ...
• lossy or lossless compression: dependent on data type, required quality,
compression rate
• quality: which target quality is required for the target application?


• algorithm complexity, speed, delay: on- or off-line application, real-time
requirements
• hardware or software solution: speed vs. price (video encoder boards are still
costly)
• encoder / decoder symmetry: repeated encoding (e.g. video conferencing)
vs. encoding only once but decoding often (image databases, DVD, VoD, ...)
• error robustness and error resilience: do we expect storage or transmission
errors?
• scalability: is it possible to decode in different qualities / resolutions without
hassle?
• progressive transmission: the more data we transmit, the better the quality gets.
• standard required: do we build a closed system which is not intended to
interact with other systems (which can be desired due to market position, e.g.,
medical imaging) or do we want to exchange data across many systems?

• multiple encoding / decoding: repeated application of lossy compression,


e.g. in video editing

• adaptivity: do we expect data with highly varying properties or can we pre-
adapt to specific properties (fingerprint compression)
• synchronization issues – audio and video data should both be frame-based
preferably
• transcoding: can the data be converted into other data types easily (e.g.,
MJPEG -> MPEG)?

Chapter 2
Introduction to Information Theory and Entropy Calculation

2.1 Overview
The treatment of data compression in this book is not very mathematical.
However, we do need some mathematical preliminaries to appreciate the
compression techniques we will discuss. Compression schemes can be
divided into two classes, lossy and lossless. Lossy compression schemes
involve the loss of some information, and data that have been compressed
using a lossy scheme generally cannot be recovered exactly. Lossless schemes
compress the data without loss of information, and the original data can be
recovered exactly from the compressed data.
In this chapter, some of the ideas in information theory that provide the
framework for the development of lossless data compression schemes are
briefly reviewed. We will also look at some ways to model the data that lead
to efficient coding schemes. We have assumed some knowledge of probability
concepts.

2.2 A Brief Introduction to Information Theory
Although the idea of a quantitative measure of information has been around
for a while, the person who pulled everything together into what is now called
information theory was Claude Elwood Shannon [7], an electrical engineer at
Bell Labs. Shannon defined a quantity called self-information. Suppose we
have an event A, which is a set of outcomes of some random experiment. If
P(A) is the probability that the event A will occur, then the self-information
associated with A is given by

i(A) = log_b(1/P(A)) = −log_b P(A)        (2.1)
Note that we have not specified the base of the log function. We will discuss
this in more detail later in the chapter. The use of the logarithm to obtain a
measure of information was not an arbitrary choice as we shall see later in this
chapter. But first let’s see if the use of a logarithm in this context makes sense

from an intuitive point of view. Recall that log(1) = 0, and −log(x) increases
as x decreases from one to zero. Therefore, if the
probability of an event is low, the amount of self-information associated with
it is high; if the probability of an event is high, the information associated
with it is low. Even if we ignore the mathematical definition of information
and simply use the definition we use in everyday language, this makes some
intuitive sense. The barking of a dog during a burglary is a high-probability
event and, therefore, does not contain too much information. However, if the

dog did not bark during a burglary, this is a low-probability event and
contains a lot of information. (Obviously, Sherlock Holmes understood
information theory!) Although this equivalence of the mathematical and
semantic definitions of information holds true most of the time, it does not
hold all the time. For example, a totally random string of letters will contain
more information (in the mathematical sense) than a well-thought-out treatise
on information theory.
Another property of this mathematical definition of information that makes
intuitive sense is that the information obtained from the occurrence of two
independent events is the sum of the information obtained from the
occurrence of the individual events. Suppose A and B are two independent
events. The self-information associated with the occurrence of both event A
and event B is, by Equation (2.1),

i(AB) = −log_b[P(A)P(B)] = −log_b P(A) − log_b P(B) = i(A) + i(B)

The unit of information depends on the base of the log. If we use log base 2,
the unit is bits; if we use log base e, the unit is nats; and if we use log base 10,
the unit is hartleys. Note that to calculate the information in bits, we need to
take the logarithm base 2 of the probabilities. Because this probably does not
appear on your calculator, let’s review logarithms briefly. Recall that

log_2 x = (log_10 x) / (log_10 2) = (ln x) / (ln 2)

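For instance, a short Python check of the change-of-base rule and the self-information formula (a minimal sketch):

import math

def self_information_bits(p):
    # i(A) = -log2 P(A), computed here via the change of base ln(p)/ln(2).
    return -math.log(p) / math.log(2)

print(self_information_bits(0.5))    # 1.0 bit  (a fair coin flip)
print(self_information_bits(0.125))  # 3.0 bits (a 1-in-8 event)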

2.3 Lossless Compression



In lossless compression (as the name suggests) data are reconstructed


after compression without errors, i.e. no information is lost. Typical
application domains where you do not want to lose information are the
compression of text, files, and faxes. In the case of image data, for medical imaging or
the compression of maps in the
context of land registry no information loss can be tolerated. A further reason
to stick to lossless coding schemes instead of lossy ones is their lower
computational demand. For all lossless compression techniques there is a
well-known trade-off: Compression Ratio – Coder Complexity – Coder
Delay.

Lossless Compression typically is a process with three stages:

• The model: the data to be compressed is analyzed with respect to its


structure and the relative frequency of the occurring symbols.
• The encoder: produces a compressed bitstream / file using the information
provided by the model.
• The adaptor: uses information extracted from the data (usually during
encoding) to adapt the model (more or less) continuously to the data.

The most fundamental idea in lossless compression is to employ codewords


which are shorter (in terms of their binary representation) than their
corresponding symbols in case the symbols do occur frequently.
On the other hand, codewords are longer than the corresponding symbols in
case the latter do not occur frequently (there’s no free lunch!).
Each process which generates information can be thought of as a source emitting
a sequence of symbols chosen from a finite alphabet (in the case of computer-
based processing this boils down to the binary alphabet {0,1}). We denote this
“source of information” by S with the corresponding alphabet {s1, s2, . . . ,
sn} and their occurrence probabilities {p(s1), p(s2), . . . , p(sn)}. To make
things simpler (and more realistic as well), we consider these probabilities as
being relative frequencies. We want to determine the average information a
source emits. Following intuition, the occurrence of a less frequent symbol
should provide more information compared to the occurrence of highly
frequent symbols. We assume symbols to be independent and consider the

information provided by such symbols as being the sum of information as
given by the single independent symbols. We define

I(si) = −log p(si) = log(1/p(si))

as being the information delivered by the occurrence of the symbol si related


to its probability p(si). The basis of the logarithm determines the measure of
the amount of information, e.g., in the case of base 2 we talk about bits.


By averaging over all symbols emitted by the source we obtain the average
information per symbol, the entropy:

H(S) = Σ_{i=1..n} p(si) I(si) = −Σ_{i=1..n} p(si) log2 p(si)
The most significant interpretation of entropy in the context of lossless


compression is that entropy measures how many bits are required on average
per symbol to represent the source, i.e., to conduct lossless compression.


Entropy therefore gives a lower bound on the number of bits required to
represent the source information. In the next section we will discuss several
techniques approaching closely this theoretical bound.
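As a quick illustration (a small sketch, not part of the original notes), the entropy of a source can be computed directly from its symbol probabilities:

import math

def entropy(probs):
    # H(S) = -sum p(s) * log2 p(s): the average information in bits per symbol.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The four-symbol source used in the variable-length coding example below:
print(entropy([0.6, 0.3, 0.05, 0.05]))   # about 1.4 bits/symbol
# A fixed-length code would need 2 bits/symbol for four symbols,
# so the entropy already tells us that some compression is possible.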

2.3.1 Influence of the Sampling Rate on Compression

If we compare data originating from an identical analog source where yi is


sampled ten times as densely as xi, we notice that neighboring values of yi
tend to be much more similar as compared to neighboring values in xi. If we


assume that the analog signal exhibits continuity to some degree, the samples
of yi will be more continuous (i.e., more similar) as compared to the samples
of xi, since these are taken from locations in the signal farther apart than the
samples of yi. The higher degree of similarity in yi has a positive effect on
entropy and therefore on the achievable compression ratio.

2.3.2 Predictive Coding - Differential Coding

This technique is not really a compression scheme but a procedure to
preprocess data to make them more suited for subsequent compression.


Differential coding changes the statistics of the signal – based on the fact that
neighboring samples in digital data are often similar or highly correlated,
differential coding does not compress the sequence of original data points
{x1, x2, . . . , xN} but the sequence of differences yi = xi − xi−1. While xi
usually follow a uniform or normal distribution, yi exhibit significant peaks
around 0 (always assuming that neighboring samples tend to be similar, as is
the case in audio, image, and video data). To give an example, we consider an
image with 999 pixels and 10 different symbols. In case a) we have p(si) =
1/10, i = 1, . . . , 10, in case b) we have p(s1) = 91/100 and p(si) = 1/100 for i
= 2, . . . , 10.
Case a) corresponds to the “conventional” image with uniformly distributed
symbols, while case b) is the differential image consisting of yi, where s1 is
the zero symbol. In case a) we result in H(S) = 3.32, while in case b) entropy
is only H(S) = 0.72. This small example immediately makes clear that
differential coding is sensible. It is the basis of many compression schemes
employing prediction.
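The two entropy values quoted above can be verified with a few lines of Python (a self-contained sketch):

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

case_a = [1/10] * 10                     # uniformly distributed symbols
case_b = [91/100] + [1/100] * 9          # differential image: mostly the zero symbol

print(entropy(case_a))   # about 3.32 bits/symbol
print(entropy(case_b))   # about 0.72 bits/symbol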


2.3.3 Run length Coding


The basic principle is to replace sequences of successive identical symbols by
three elements: a single symbol, a counter, and an indicator which gives the
interpretation of the other two elements.
As an example we discuss the tape drive of the IBM AS/400 (where obviously
lots of blanks are encountered): strings of consecutive blanks (2-63 byte) are
compressed into a codeword which contains the information “blank” and a
counter. Strings of consecutive identical characters unequal blank (3-63 byte)
are compressed into 2 byte: Control byte (“consecutive char” and counter) and
character byte. Strings of non-consecutive identical characters are expanded
by a control byte (“non-consecutive identical”).


Example: b = blank, * = control byte; in the string bbbbbbABCDEFbb33GHJKbMN333333 the
runs of blanks and the run of six “3” characters are each replaced by a control byte and
counter, while the remaining characters are prefixed by a “non-consecutive identical” control byte.
ATTENTION: if the data is not suited for run length encoding, we result in
data expansion (caused by expansion of non-consecutive identical chars). Run
length coding is extremely fast on the other hand and can compress quite well
in case of suited data.
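A generic run-length encoder/decoder (a simplified sketch that replaces every run by a (count, symbol) pair rather than using the AS/400 control-byte format) could look like this:

from itertools import groupby

def rle_encode(s):
    # Replace each run of identical symbols by a (count, symbol) pair.
    return [(len(list(group)), symbol) for symbol, group in groupby(s)]

def rle_decode(pairs):
    return "".join(symbol * count for count, symbol in pairs)

data = "bbbbbbABCDEFbb33GHJKbMN333333"
encoded = rle_encode(data)               # [(6, 'b'), (1, 'A'), ..., (6, '3')]
assert rle_decode(encoded) == data
# For data without long runs this representation is larger than the input,
# which is exactly the expansion problem mentioned above.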

2.3.4 Variable length Coding and Shannon-Fano Coding

We denote the length of the codeword for the symbol si as l(si) (in bits). The
average code length is defined as L(S) = Σ_{i=1..n} p(si) l(si). To minimize this
expression (which obviously is the aim of compression) we need to assign
short codewords to symbols with high probability of occurrence. It is
interesting to notice the close relation to entropy: H(S) = Σ_{i=1..n} p(si) I(si).
Consequently it is evident that l(si) = I(si) = −log2 p(si) needs to be fulfilled
in order to attain the actual entropy value. However, this is possible only in
case −log2 p(si) is a natural number. Usually, this is not the case of course, so
the value is rounded up to the next integer (which is a hint at the sub-optimality
of the entire procedure!). Shannon-Fano Coding (see below) provides exactly
the corresponding solution using the idea described before. The term code
efficiency is defined as H(S)/L(S) – obviously values close to 1 are desirable!
Example: S = {s1, s2, s3, s4} with p(s1) = 0.6, p(s2) = 0.3, p(s3) = 0.05 and
p(s4) = 0.05. H(S) = 1.4 bits/symbol. A fixed-length encoding,
e.g. 00, 01, 10, 11, has an average code length of 2
bits/symbol. When computing l(si) = ⌈−log2 p(si)⌉ we obtain the values 1, 2, 5 and 5, and result
in the following variable length code: 0, 10, 110 and 111. s3 and s4 would
require a 5-bit codeword but 3 bits are sufficient to generate a unique code.
The corresponding average code length is 1.5 bits/symbol.
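A few lines of Python reproduce these numbers (a small sketch):

import math

p = [0.6, 0.3, 0.05, 0.05]
H = -sum(pi * math.log2(pi) for pi in p)                      # about 1.4 bits/symbol
lengths = [math.ceil(-math.log2(pi)) for pi in p]             # [1, 2, 5, 5]
L_rounded = sum(pi * li for pi, li in zip(p, lengths))        # 1.7 bits/symbol with rounded-up lengths
L_code    = sum(pi * li for pi, li in zip(p, [1, 2, 3, 3]))   # 1.5 bits/symbol for the code 0, 10, 110, 111
print(H, lengths, L_rounded, L_code)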
A prefix code (or prefix-free code) is a code system, typically a variable-
length code, with the “prefix property”: there is no valid code word in the
system that is a prefix (start) of any other valid code word in the set. E.g., a
code with code words 9, 59, 55 has the prefix property; a code consisting of 9,
5, 59, 55 does not, because 5 is a prefix of both 59 and 55. With a prefix code,
a receiver can identify each word without requiring a special marker between
words.
A way to generate a prefix code is to use a full binary tree to represent the
code. The leaves represent the symbols to be coded, while the path traversed in the
tree from the root to a leaf determines the bit sequence of the resulting
codeword. To actually determine a bit sequence, we need to define how to
encode a branching:
for example, a branching to the left can be encoded by 0 and a branching to
the right by 1. The tree given in the figure leads to the following codewords:
A – 00, B – 01, C – 100, D – 101, E – 11. The tree (and code) generation of the
Shannon-Fano scheme works as follows (in the example, we assume the
following frequency counts: A – 15, B – 7, C – 6, D – 6, E – 5):

1. For a given list of symbols, develop a corresponding list of probabilities or


frequency counts so that each symbol’s relative frequency of occurrence is
known.
2. Sort the list of symbols according to frequency, with the most frequently
occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency counts of the left half
being as close to the total of the right as possible.
4. The left half of the list is assigned the binary digit 0, and the right half is
assigned the digit 1. This means that the codes for the symbols in the first half
will all start with 0, and the codes in the second half will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing
groups and adding bits to the codes until each symbol has become a
corresponding code leaf on the tree. A procedure often employed is to replace


the alphabet consisting of the si by an extension using tuples si sj with their
corresponding relative frequencies. Using this strategy, a code length of 1.43
bits/symbol can be achieved, however, at a higher cost in terms of computation
and memory requirements.
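A compact recursive implementation of this procedure is sketched below (the split point is chosen so that the two halves are as balanced as possible); with the frequency counts from the example it produces the codes 00, 01, 10, 110, 111 for A, B, C, D, E:

def shannon_fano(freqs):
    # freqs: dict symbol -> count. Returns dict symbol -> codeword (string of '0'/'1').
    symbols = sorted(freqs, key=freqs.get, reverse=True)      # step 2: most frequent first
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total, running, best_i, best_diff = sum(freqs[s] for s in group), 0, 1, float("inf")
        for i, s in enumerate(group[:-1], start=1):           # step 3: find the most balanced split
            running += freqs[s]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_diff, best_i = diff, i
        left, right = group[:best_i], group[best_i:]
        for s in left:
            codes[s] += "0"                                    # step 4
        for s in right:
            codes[s] += "1"
        split(left)                                            # step 5: recurse on both halves
        split(right)

    split(symbols)
    return codes

print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))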

2.3.5 Huffman Coding


Huffman Coding is a popular technique using the idea of variable-length
coding combined with a constructive algorithm for how to build the
corresponding unique codewords. Suppose we are given a source S with an


alphabet of size n. The two least frequent symbols are combined into a new
virtual symbol which is assigned the sum of occurrence probability of the two
original symbols. Subsequently, the new alphabet of size n−1 is treated the
same way, which goes on until only two symbols are left. If we know the
codewords of the new symbols, the codewords for the two original symbols
are obtained by adding a 0 and 1 bit to the right side of the new symbol. This
procedure is applied recursively, i.e. starting with the two most frequent
symbols which are assigned codewords 0 and 1, we successively add
corresponding bits to the right until codewords for all original symbols have
been generated. The example has entropy H(S) = 2.15 and the generated Huffman
code has an average code length of 2.2 bits/symbol; an ad-hoc generated code
as shown before, e.g. {0, 10, 110, 1110, 1111}, has an average code length
of 2.25 bits/symbol.


Modification: in case many symbols have small probabilities of occurrence,
many long codewords are the result, which is not efficient. Therefore, all such
symbols are assigned to a dedicated class which is termed “ELSE” in the
Huffman code and which is encoded separately. This idea is called the modified
Huffman code.
Problems:
1. In case a source has a symbol with p(s) close to 1 and many others with
small probability of occurrence, we end up with an average code length close to 1
bit/symbol, since the smallest length for a codeword is of course 1 bit. The
entropy is of course much smaller (recall the coding of differences).
2. In case of changing statistics of the source one can easily obtain a data
expansion.
3. Since a fixed codebook is of poor quality in many cases, we have a two-
stage algorithm: first build the statistics (the model), then generate the code.
4. Adaptivity is difficult to implement since changes in the statistics affect the


entire tree and not just a local part. We can either store corresponding
Huffman tables (trees) in the data [which is inefficient in terms of
compression] or compute them on the fly from decoded data [which is
inefficient in terms of compression speed].
5. Huffman encoding is used in most older standards like JPEG, MPEG-1 to
MPEG-4, however, in a non-adaptive manner, which is possible due to the
nature of the DCT transform.
The variable-length codes assigned to input characters are prefix codes, which
means the codes (bit sequences) are assigned in such a way that the code
assigned to one character is not the prefix of code assigned to any other
character. This is how Huffman Coding makes sure that there is no ambiguity
when decoding the generated bitstream.
Let us understand prefix codes with a counter example. Let there be four
characters a, b, c and d, and their corresponding variable length codes be 00,
01, 0 and 1. This coding leads to30210061601189
30210061601189 ambiguity because code assigned to c is the
30210061601189

prefix of codes assigned to a and b. If the compressed bit stream is 0001, the
decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
There are mainly two major parts in Huffman Coding:
1. Build a Huffman Tree from input characters.
2. Traverse the Huffman Tree and assign codes to characters.
Steps to build Huffman Tree
Input is an array of unique characters along with their frequencies of
occurrence, and output is the Huffman Tree.


1. Create a leaf node for each unique character and build a min heap of
all leaf nodes (the min heap is used as a priority queue; the value of the
frequency field is used to compare two nodes in the min heap. Initially,
the least frequent character is at the root).
2. Extract two nodes with the minimum frequency from the min heap.

3. Create a new internal node with a frequency equal to the sum of the
two nodes’ frequencies. Make the first extracted node its left child
and the other extracted node as its right child. Add this node to the
min heap.
4. Repeat steps #2 and #3 until the heap contains only one node. The
remaining node is the root node and the tree is complete.
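The tree construction and code assignment can be implemented directly with Python's heapq module; the sketch below reproduces the codes derived in the example that follows:

import heapq

def huffman_codes(freqs):
    # freqs: dict symbol -> frequency. Returns dict symbol -> codeword.
    # Each heap entry is (frequency, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(f, i, s) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # extract the two minimum-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))   # new internal node
        count += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):            # internal node: 0 for the left child, 1 for the right
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))
# {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}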

Let us understand the algorithm with an example:


character Frequency

a 5

b 9

c 12

d 13

e 16

f 45

Step 1: Build a min heap that contains 6 nodes, where each node represents the
root of a tree with a single node.


Step 2: Extract the two minimum frequency nodes from the min heap. Add a new
internal node with frequency 5 + 9 = 14.

Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single
element each, and one heap node is the root of a tree with 3 elements.

character Frequency

c 12

d 13

Internal Node 14

e 16

f 45
Step 3: Extract two minimum frequency nodes from heap. Add a new internal
node with frequency 12 + 13 = 25.

Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single
element each, and two heap nodes are roots of trees with more than one node.

character Frequency

Internal Node 14

e 16

Internal Node 25

f 45

Step 4: Extract two minimum frequency nodes. Add a new internal node with
frequency 14 + 16 = 30

Now min heap contains 3 nodes.

character Frequency

Internal Node 25

Internal Node 30

f 45

Step 5: Extract two minimum frequency nodes. Add a new internal node with
frequency 25 + 30 = 55


Now min heap contains 2 nodes.

character Frequency

f 45

Internal Node 55

Step 6: Extract two minimum frequency nodes. Add a new internal node with
frequency 45 + 55 = 100

Now min heap contains only one node.

character Frequency

Internal Node 100

Since the heap contains only one node, the algorithm stops here.

Steps to print codes from Huffman Tree:


Traverse the tree formed starting from the root. Maintain an auxiliary array.
While moving to the left child, write 0 to the array. While moving to the right
child, write 1 to the array. Print the array when a leaf node is encountered.


The codes are as follows:

character code-word

f 0

c 100

d 101

a 1100

b 1101

e 111

2.3.6 Arithmetic Coding


In Huffman coding we have a correspondence between a single symbol and
its codeword – which is the main reason for its suboptimality. Arithmetic
coding uses a single codeword for an entire sequence of source symbols of
length m. In this manner the restriction to integer-valued bits per symbol
values can be avoided in an elegant way. A drawback however is that similar
sequences of source symbols can lead to entirely different codewords.

In data compression, lossy algorithms compress data while losing some


details. They reduce the number of bits used to represent the message, even if
that reduces the quality of reconstructed data.
Lossless algorithms reconstruct original data without any loss. Because of
this, they use a higher number of bits compared to lossy algorithms.
Arithmetic encoding (AE) is a lossless algorithm that uses a low number of
bits to compress data.
It’s an entropy-based algorithm, first proposed in a paper from 1987 (Witten,
Ian H., Radford M. Neal, and John G. Cleary. “Arithmetic coding for data
compression.” Communications of the ACM 30.6 (1987): 520-540).
One reason why AE algorithms reduce the number of bits is that AE encodes
the entire message using a single number between 0.0 and 1.0. Each symbol
in the message takes a sub-interval in the 0-1 interval, corresponding to its
probability.
To calculate the probability of each symbol, a frequency table should be given
as an input to the algorithm. This table maps each character to its frequency.
The more frequent the symbol, the fewer bits it is assigned. As a
result, the total number of bits representing the entire message is reduced,
compared to using a fixed number of bits for all symbols in the message; unlike
Huffman coding, the number of bits per symbol is not restricted to integer values.
2022/2023 2022/2023 2022/2023
Frequency table
To encode (= compress) a message using AE, you need these inputs:
The message you want to encode.
The frequency table of all possible symbols in the message.
The frequency table is usually calculated by analyzing different messages and
counting the number of times each symbol is repeated.
For example: assume that the messages to be encoded consist of just 3
symbols a, b, and c. By analyzing some previous messages (e.g. babb, cbab,
and bb), we get the frequencies of the 3 symbols:

a=2
b=7
c=1
Based on the frequency table, how do you encode a message like abc using
AE? Before you start encoding, you need to calculate a probability table out
of the frequency table.
Probability table
Using the frequency table, we can calculate the probability of occurrence of
each symbol. The probability is calculated as follows:
Frequency of a symbol / Sum of frequencies of all symbols


Based on the frequency table, here are the probabilities of our 3 symbols:
p(a)=2/10=0.2
p(b)=7/10=0.7
p(c)=1/10=0.1
Given the message and the probability table, AE can start the encoding
process.
Encoding
Encoding in AE works by representing the cumulative probabilities of all
symbols on a line that ranges from 0.0 to 1.0. On that line, each symbol uses a
sub-range.
Given any symbol C, it starts from the value S and ends at the value calculated
using the following equation:
S+(P(C))*R
Where:
S: The cumulative sum of all previous probabilities.

P(C): The probability of symbol C.


R: The range of line which is calculated by subtracting the start from the end
of the line.
In the beginning, the line starts from 0.0 and ends at 1.0, thus R=1.0.
Let’s calculate the start and end values of the 3 symbols a, b, and c.
For the first symbol, a, it starts from 0.0 to 0.0+(0.2)*1.0=0.2. The start value
is 0.0 because it’s the first symbol. The range is (0.0:0.2).
For the second symbol, b, it starts from 0.2 to 0.2+(0.7)*1.0=0.9. The range is
(0.2:0.9).
The third symbol, c, starts from 0.9 to 0.9+(0.1)*1.0=1.0. The range is


(0.9:1.0).
Given the ranges of all symbols, it’s good to represent them graphically, as in
this figure:

Here are some notes:



The first symbol starts from the same start point of the line (0.0) and the last
symbol ends at the endpoint of the line (1.0).
A symbol C covers a percentage of the range that corresponds to its
probability. For example, the symbol b covers 70% of the line because its
probability is 0.7.
Restricting the interval
AE works by restricting the line interval, which starts from 0.0 to 1.0, through some stages equal to the number of symbols in the message. In this example, there are just 3 symbols in the message, so there are just 3 stages.
In each stage, the line's interval is restricted according to the sub-interval of the current symbol.
After all symbols are processed, AE returns a single double value encoding the entire message.
Now, it's time to look at the message to be encoded, abc, and work on the first symbol a to restrict the line interval. Based on the previous figure, the symbol a covers the interval from 0.0 to 0.2. This interval becomes the line's interval. In other words, the line interval changes from 0.0:1.0 to 0.0:0.2 as in the next figure. Note that symbol a starts from 0.0 and the last symbol, c, ends at 0.2.
To calculate the interval of each symbol, we'll use the already mentioned formula:
S+(P(C))*R
Because the line's range has changed, R changes to 0.2-0.0=0.2. Let's calculate the start and end values for each symbol:
For the first symbol, a, it starts from 0.0 to 0.0+(0.2)*0.2=0.04. The start value is 0.0 because it's the first symbol. The range is (0.0:0.04).
For the second symbol, b, it starts from 0.04 to 0.04+(0.7)*0.2=0.18. The range is (0.04:0.18).
The third symbol, c, starts from 0.18 to 0.18+(0.1)*0.2=0.2. The range is (0.18:0.2).
At the end of the first stage, here is the figure that shows the start and end values of each symbol's interval.
Note that each symbol takes a portion of the line equal to its probability. For example, the probability of the symbol b is 0.7 and thus it takes 70% of the line's interval.
The next figure summarizes the progress of AE at this point. Let's start the second stage, which just repeats what we did in the first stage.
The next symbol in the message is b. According to the latest line interval (0.0 to 0.2), the symbol b starts from 0.04 to 0.18. In this stage, the interval of the line will be further restricted to b's interval, as shown in this figure:
Because the line's interval changed to (0.04:0.18), R also changes. Its new value is 0.18-0.04=0.14. Let's calculate the start and end values for each symbol:
For the first symbol, a, it starts from 0.04 to 0.04+(0.2)*0.14=0.068. The range is (0.04:0.068).
For the second symbol, b, it starts from 0.068 to 0.068+(0.7)*0.14=0.166. The range is (0.068:0.166).
The third symbol, c, starts from 0.166 to 0.166+(0.1)*0.14=0.18. The range is (0.166:0.18).
The next figure summarizes the intervals of the 3 symbols. This marks the end of the second stage.
To summarize the progress made up to this point, the next figure connects the 2 completed stages. Let's move to the third stage.
The third and last symbol in the message is c. According to the latest stage, this symbol falls within the range starting from 0.166 and ending at 0.18. c's current interval will be the next interval of the line according to the next figure. R for this new line is 0.18-0.166=0.014.
On that line, similarly to the previous 2 stages, here are the intervals of the 3 symbols:
Symbol a starts from 0.166 to 0.166+(0.2)*0.014=0.1688. The range is (0.166:0.1688).
Symbol b starts from 0.1688 to 0.1688+(0.7)*0.014=0.1786. The range is (0.1688:0.1786).
Symbol c starts from 0.1786 to 0.1786+(0.1)*0.014=0.18. The range is (0.1786:0.18).
The intervals are reflected in the next figure.
There are no more symbols in the message. The next figure summarizes the calculations from the initial interval (0.0:1.0) to the last interval (0.166:0.18).
Given that AE completed the encoding process, the next subsection calculates the single number that encodes the entire message.
Single value encoding the message
The latest interval reached by AE starts from 0.166 to 0.18. Within this range, any value can be used to encode the entire message. For example, the value could be 0.17.
Throughout this tutorial, the value used is the average of the interval, which is:
(0.166+0.18)/2=0.173
So, based on the frequency table used, the message abc is encoded as the
value 0.173. When the frequency table changes, then the value that encodes
the message changes too.
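The whole encoding walk-through can be condensed into a short Python sketch. The function below is illustrative (the name arithmetic_encode and the plain-float arithmetic are my own simplifications; practical coders use integer arithmetic to avoid precision problems): it restricts the current interval once per symbol and returns the midpoint of the final interval.

def arithmetic_encode(message, prob):
    """Restrict the [low, high) interval once per symbol; return the interval midpoint."""
    # Cumulative start of each symbol's sub-interval on the 0-1 line.
    cum, start = {}, 0.0
    for symbol, p in prob.items():
        cum[symbol] = start
        start += p
    low, high = 0.0, 1.0
    for symbol in message:
        rng = high - low                               # R, the current interval length
        high = low + (cum[symbol] + prob[symbol]) * rng
        low = low + cum[symbol] * rng
    return (low + high) / 2                            # any value inside the interval works

prob = {"a": 0.2, "b": 0.7, "c": 0.1}
print(arithmetic_encode("abc", prob))                  # ~0.173, as in the walk-through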
This marks the end of the encoding process. Now let’s discuss the decoding
process.
Decoding
The inputs for decoding are:
The single value that encodes the message.
The frequency table. It should be identical to the one used for encoding.
The number of symbols in the original message.
In our example, the value that encodes the message is 0.173. The frequency
table is [a=2, b=7, c=1]. The number of symbols in the message is 3.
The first step of decoding is to calculate the probabilities of the symbols out
of their frequency, similarly to what we did before.
The probability table is [p(a)=0.2, p(b)=0.7, p(c)=0.1]. Based on the
probability table, the decoding process works similarly to encoding by
constructing the same intervals.
First, a line starting from 0.0 and ending at 1.0 is created according to the next
figure. The intervals of the 3 symbols are calculated using the same equation
used for encoding the message.
S+(P(C))*R

The value 0.173 falls within the first interval (0.0:0.2). Thus, the first symbol
in the encoded message is a. As a result, the line’s interval is restricted to the
interval that starts from 0.0 to 0.2.
The value 0.173 falls within the second interval (0.04:0.18). Thus, the second
symbol in the encoded message is b. In the next figure, the interval will be
restricted to the interval 0.04 to 0.18.
The value 0.173 falls within the third interval (0.166:0.18). This is why the
next symbol in the message is c.
Remember that the number of symbols in the original message is 3. After
decoding 3 symbols, the decoding process is completed. The decoded
message is abc.
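A matching decoding sketch (again a simplified illustration with plain floats; the third argument is the number of symbols in the original message):

def arithmetic_decode(value, prob, length):
    """Rebuild the message by locating `value` in successive sub-intervals."""
    cum, start = {}, 0.0
    for symbol, p in prob.items():
        cum[symbol] = start
        start += p
    low, high = 0.0, 1.0
    message = ""
    for _ in range(length):
        rng = high - low
        for symbol, p in prob.items():
            s = low + cum[symbol] * rng                # start of this symbol's sub-interval
            e = s + p * rng                            # end of this symbol's sub-interval
            if s <= value < e:
                message += symbol
                low, high = s, e                       # restrict the interval and continue
                break
    return message

prob = {"a": 0.2, "b": 0.7, "c": 0.1}
print(arithmetic_decode(0.173, prob, 3))               # abc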
Let’s look at another example.
Example #2
In this example, I won’t detail the steps like before.
There are 4 symbols available to build messages – a, b, c, and d. The
frequencies of these symbols are:
a=2
b=3
c=1
d=4
Based on the frequency table, here are the probabilities of the symbols:
p(a)=2/10=0.2
p(b)=3/10=0.3
p(c)=1/10=0.1
p(d)=4/10=0.4
The next figure shows the cumulative probabilities of the symbols where the
interval of each symbol is equal to its probability.

Now, it’s time for the message to be encoded, which is bdab.


In the first stage, the first symbol in this message, b, is processed. The interval
will be restricted to b‘s interval that starts from 0.2 to 0.5.
In the second stage, the next symbol in the message is d. As in the next figure,
the line’s interval will be restricted to the interval of d which starts from 0.38
to 0.5.
Because the next symbol in the message is a, the interval of the third stage is
restricted to start from 0.38 and end at 0.404.
In the fourth and last stage, the next symbol is b, so the interval is restricted to be from 0.3848 to 0.392.
At the end of the last stage, the interval is from 0.3848 to 0.392. To encode the entire message, any value within the interval would be used.
The average value is used in this example, which is 0.3884. This marks the
end of the encoding process.
(0.3848+0.392)/2=0.3884
The decoding process starts given the following 3 inputs:

1. The value that encodes the message, 0.3884.


2. The frequency table.
3. The length of the message, 4.
The decoder starts by preparing the probability table similar to what the
encoder did. Note that the decoder constructs the same graph in the
previous figure.
Here is how the decoding process works:

1. By checking the initial interval that starts from 0.0 to 1.0, the
value 0.3884 falls within the interval of symbol b. Thus, the first
symbol in the message is b.
2. In the next interval that starts from 0.2 to 0.5, the value 0.3884 falls
within d‘s interval that starts from 0.38 to 0.5. Thus, the second
symbol is d.
3. The next interval is restricted to d‘s interval where the
value 0.3884 falls within the interval of symbol a. Thus, the third
symbol is a.
4. In the next interval, the value 0.3884 falls within the interval of
symbol b and thus the fourth and last symbol in the message is b.

After passing through some stages equal to the length of the message, which is 4, the decoded message is bdab.
Example: Encoding the sequence wuvw leads to the interval [0.5822, 0.5850], the length of which is 0.0028 = p(w)p(u)p(v)p(w). In binary representation the interval borders are 0.1001010111000₂ and 0.1001010100001₂, respectively. The shortest binary number in this interval is 0.100101011₂, which is the codeword for this sequence.
When computing the information content of the sequence, we get −log₂ 0.0028 = 8.48 bit, which is close to the 9 bits of the codeword.
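The "shortest binary number inside the interval" can be found mechanically: try 1, 2, 3, ... fractional bits and stop as soon as some multiple of 2^-bits falls inside the interval. A small helper of my own, for illustration:

import math

def shortest_binary_codeword(low, high):
    """Binary fraction with the fewest bits that lies inside [low, high)."""
    bits = 1
    while True:
        m = math.ceil(low * 2 ** bits)         # smallest multiple of 2^-bits that is >= low
        if m / 2 ** bits < high:
            return "0." + format(m, "0{}b".format(bits))
        bits += 1

print(shortest_binary_codeword(0.5822, 0.5850))   # 0.100101011 (9 bits, close to 8.48)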
Advantages:
1. Adaptivity can be easily realized – encoder and decoder need to carry the statistical information in an identical (i.e. synchronized) manner and can adapt the interval boundaries correspondingly, either for each symbol, for a fixed number of symbols, or when a certain extent of statistical change occurs. In any case there is no need to store the statistical information.
2. Decoding can start as soon as a small amount of encoded data is available (i.e. the most significant bits of the position); there is no need to wait for the receipt of the entire data. With the first symbols we can determine in which of the initial intervals the final interval will be located, and upon receipt of more information more symbols can be decoded.
3. The restriction to an integer number of bits per symbol is overcome. Arithmetic coding is used in JBIG-1 as the only coding engine and in the more recent lossy standards like JPEG2000 and H.264 as the lossless compression stage.
2.3.7 Dictionary Compression

Basic idea: the redundancy of repeated phrases in a text is exploited (recall: in run-length encoding, repeated identical symbols can be efficiently coded). Here we encode repeatedly occurring symbol strings in arbitrary files in an efficient manner. A frontend selects strings, and the information needed for compression and decompression is stored in a dictionary. The encoder and the decoder need to have access to a copy of this dictionary, of course. In this manner, entire strings are replaced by tokens (i.e. codewords). The basic algorithmic elements are lookup tables, and the corresponding operations are very fast.
Variations of LZW Compression

Method (Author, Year): Description

LZ77 (Ziv and Lempel, 1977): Send pairs with pointer and character. Pointer is fix-size and indicates a substring in the previous N characters.
LZR (Rodeh et al., 1981): Send pairs with pointer and character. Pointer is variable-size and indicates a substring anywhere in the previous characters.
LZSS (Bell, 1986): A flag bit distinguishes sending of pointer or character. Pointer is fix-size and indicates a substring in the previous N characters.
LZB (Bell, 1987): Same as LZSS, except that the pointer is variable-size.
LZH (Bell, 1987): Same as LZSS, except that Huffman coding is used for pointers on a second pass.
LZ78 (Ziv and Lempel, 1978): Send pairs of pointer and character. Pointer indicates a previously parsed substring which is stored in a dictionary.
LZW (Welch, 1984): Include all alphabet symbols in the dictionary initially. Therefore only outputs fix-size pointers.
LZC (Thomas et al., 1985): As suggested by Welch, a variable-size pointer scheme is implemented.
LZT (Tischer, 1987): Same as LZW, except that the parsed strings in the dictionary are stored as a Least Recently Used list.
LZJ (Jakobsson, 1985): Same as LZW, except that pointers can reach anywhere in the previous characters.
LZFG (Fiala and Greene, 1989): By breaking up strings in the sliding window, pointers are formed from a tree data structure.

Table 1: Summary of principal LZ variations. (N is the size of the window used in the dictionary.)

LZ77 Dictionary Compression
The name originates from Lempel and Ziv (1977), the algorithm described
here is a variant, LZSS (Storer/Szymanski) as of 1982. A typical feature is the
so-called “sliding window” consisting of two parts:
the history buffer (the dictionary) contains a block of data recently encoded
(some 1000 symbols) – and the look-ahead buffer which contains strings to be
encoded (10-20 symbols).
An important issue is the choice of the buffer sizes:
• History buffer small → a match is unlikely.
• History buffer large → high computational requirements and a poor compression rate caused by the large address space for the pointers into the buffer.
• Look-ahead buffer small → no match for long strings is possible (waste of potential redundancy).
• Look-ahead buffer large → time is wasted on the unlikely case of finding matches for long strings.
Example:
Text out ← ——DATA COMPRESSION—— COMPRESSES DATA—— ← Text in.
“DATA COMPRESSION” is the content of the history buffer, “COMPRESSES DATA” is the look-ahead buffer. The actual coding used subsequently is the QIC-122 standard for tape drives, which uses a history buffer of 2048 bytes.
To start with, we encode each symbol in “DATA COMPRESSION” as 0<ASCII char. value>. “COMPRESS” is a string of length 9 which has been found in the history buffer 12 symbols backwards: 1<offset=12><length=9>. Subsequently we encode “E” and “S” again as 0<ASCII char. value> and “DATA” as a string of length 4 which has been found in the history buffer 28 positions backwards: 1<offset=28><length=4>.

We notice that
• the algorithm requires a start-up phase where the data in the dictionary has
to be encoded explicitly, which means that it is hardly suited for
compressing lots of small files.
• The algorithm is asymmetric – searching the history buffer in an efficient
manner is crucial. Intelligent data structures are required.
• Since data leaves the history buffer after some time, long term redundancy
cannot be exploited.
The Algorithm

To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer. The distance of the pointer from the look-ahead buffer is called the offset. The encoder then examines the symbols following the symbol at the pointer location to see if they match consecutive symbols in the look-ahead buffer. The number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol, is called the length of the match. The encoder searches the search buffer for the longest match. Once the longest match has been found, the encoder encodes it with a triple <o, l, c>, where o is the offset, l is the length of the match, and c is the codeword corresponding to the symbol in the look-ahead buffer that follows the match. For example, in the diagram above, the longest match is the first 'a' of the search buffer; the offset o in this case is 7, the length of the match l is 4, and the symbol in the look-ahead buffer following the match is 'r'. The reason for sending the third element in the triple is to take care of the situation where no match for the symbol in the look-ahead buffer can be found in the search buffer. In this case, the offset and match-length values are set to 0, and the third element of the triple is the code for the symbol itself.

The algorithm looks like the following:

while (lookAheadBuffer not empty)
{
    get a pointer (position, match) to the longest match
        in the window for the lookAheadBuffer;
    output a (position, length, char()) triple;
    shift the window length+1 characters along;
}
For the decoding process, it is basically a table look-up procedure and can be done by reversing the encoding procedure. We employ the same buffer of n characters, and use its first (N − n) spaces to hold the previously decoded characters, where N is the size of the window (the sum of the sizes of the look-ahead buffer and the search buffer) used in the encoding process. If we break each triple that we encounter back into its components (position offset o, match length l, and the next symbol c from the incoming stream), we extract the match string from the buffer according to o and l, and thus obtain the original content. What has been shifted out of the buffer is sent to the output. The process continues until all code words have been decoded, and the decoding is then complete.
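A minimal Python sketch of this triple scheme (a brute-force encoder and the copy-based decoder; the buffer sizes and function names are illustrative, not taken from any particular implementation):

def lz77_encode(data, search_size=4096, lookahead_size=18):
    """Emit <offset, length, next symbol> triples using a brute-force search."""
    i, triples = 0, []
    while i < len(data):
        start = max(0, i - search_size)
        best_off, best_len = 0, 0
        for off in range(1, i - start + 1):            # search the history buffer
            length = 0
            while (length < lookahead_size
                   and i + length < len(data) - 1      # keep one symbol for the triple
                   and data[i - off + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples

def lz77_decode(triples):
    """Copy `length` symbols from `offset` positions back, then append the literal."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])
        out.append(ch)
    return "".join(out)

msg = "abracadabrarray"
assert lz77_decode(lz77_encode(msg)) == msg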
LZ78 Algorithm
2022/2023 2022/2023 2022/2023

The Concept

To avoid the problems that occurred with LZ77, Ziv and Lempel developed a different form of dictionary-based compression. LZ78 abandons the concept of a text window. In LZ77, the dictionary of phrases was defined by a fixed-length window of previously seen text. Under LZ78, the dictionary is a potentially unlimited collection of previously seen phrases. LZ78-based schemes work by entering phrases into a ‘dictionary’ and then, when a repeat occurrence of that particular phrase is found, outputting a token that consists of the dictionary index instead of the phrase, as well as a single character that follows that phrase. Unlike LZ77, there is no need to pass the phrase length as a parameter because the decoder already has this information.

Several compression algorithms are based on this principle, differing mainly in the manner in which they manage the dictionary. The most well-known scheme (in fact the most well-known of all the Lempel-Ziv compressors) is Terry Welch's LZW scheme, developed in 1984.
Just as files have certain symbols that occur with more regularity than others, many files (especially, but not restricted to, text files) also have certain strings that are repeated frequently. For example, take the string " the " (including the spaces). Including the instances of the spaces, the string takes 5 × 8 = 40 bits to encode. However, if we were to append this entire string to the list of characters, at position 256, then at every subsequent occurrence of " the " we could send the code 256 instead of the index sequence 32, 116, 104, 101, 32. Since 256 cannot be represented by a standard 8-bit byte, this would require 9 bits – nevertheless, it is still a great improvement on the standard 40-bit ASCII encoding.

Unlike LZ77, LZ78 does not have a ready-made window full of text (the
search windows) to use as a dictionary. On the contrary, it has to create a
dictionary ‘on the fly’: it creates a new phrase each time a token is output, and
it adds that phrase to the dictionary.
30210061601189 After the phrase is 30210061601189
30210061601189 appended, it will
available to the encoder at any time in the future – not just for the next few
thousand characters as with LZ77.
The Algorithm

This is precisely the idea behind the LZW compression algorithm. It starts with a 4K "dictionary" of all the single characters, with indexes from 0 to 255 (the standard ASCII set). It then starts to expand the dictionary as it processes the text, using the entries 256 to 4095 to refer to substrings. As a new string is parsed, a new code is generated. Strings for parsing are formed by appending the current character K to an existing string w. The algorithm for LZW compression is shown below:
set w = NIL
loop
    read a character K
    if wK exists in the dictionary
        w = wK
    else
        output the code for w
        add wK to the string table
        w = K
endloop

As the dictionary grows, redundant strings will be coded as a single 2-byte number, resulting in a compressed file.
What is the Lempel–Ziv–Welch (LZW) Algorithm?
The LZW algorithm is a very common compression technique. This algorithm is typically used in GIF and optionally in PDF and TIFF, and in Unix's 'compress' command, among other uses. It is lossless, meaning no data is lost when compressing. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations. It is the algorithm of the widely used Unix file compression utility compress and is used in the GIF image format.
The idea relies on recurring patterns to save data space. LZW is the
foremost technique for general-purpose data compression due to its simplicity
and versatility. It is the basis of many PC utilities that claim to “double the
capacity of your hard drive”.

How does it work?


LZW compression works by reading a sequence of symbols, grouping the
symbols into strings, and converting the strings into codes. Because the codes
take up less space than the strings they replace, we get compression.
Characteristic features of LZW include:
• LZW compression uses a code table, with 4096 as a common choice for the number of table entries. Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
• When encoding begins, the code table contains only the first 256 entries, with the remainder of the table being blanks. Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.
• As the encoding continues, LZW identifies repeated sequences in the data and adds them to the code table.
• Decoding is achieved by taking each code from the compressed file and translating it through the code table to find what character or characters it represents.
Example: ASCII code. Typically, every character is stored with 8 binary
bits, allowing up to 256 unique symbols for the data. This algorithm tries to
extend the library to 9 to 12 bits per character. The new unique symbols are
made up of combinations of symbols that occurred previously in the string.
It does not always compress well, especially with short, diverse strings, but it is good for compressing redundant data, and it does not have to save the new dictionary with the data: this method can both compress and uncompress data.

Implementation
The idea of the compression algorithm is the following: as the input data is being processed, a dictionary keeps a correspondence between the longest encountered words and a list of code values. The words are replaced by their corresponding codes and so the input file is compressed. Therefore, the efficiency of the algorithm increases as the number of long, repetitive words in the input data increases.
LZW ENCODING PSEUDOCODE

1  Initialize table with single character strings
2  P = first input character
3  WHILE not end of input stream
4      C = next input character
5      IF P + C is in the string table
6          P = P + C
7      ELSE
8          output the code for P
9          add P + C to the string table
10         P = C
11 END WHILE
12 output code for P
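The pseudocode translates almost line by line into Python; the sketch below (the function name is mine) reproduces the codes used in Example 2 further down:

def lzw_encode(data):
    """LZW encoder following the pseudocode above."""
    table = {chr(i): i for i in range(256)}     # codes 0-255: single characters
    next_code = 256
    p, output = "", []
    for c in data:
        if p + c in table:
            p = p + c                           # keep extending the current phrase
        else:
            output.append(table[p])             # emit code for the longest known phrase
            table[p + c] = next_code            # add the new phrase to the table
            next_code += 1
            p = c
    if p:
        output.append(table[p])                 # emit the final phrase
    return output

print(lzw_encode("BABAABAAA"))                  # [66, 65, 256, 257, 65, 260]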
Compression using LZW


Example 1: Use the LZW algorithm to compress the string: BABAABAAA
The steps involved are systematically shown in the diagram below.
LZW Decompression
The LZW decompressor creates the same string table during
decompression. It starts with the first 256 table entries initialized to single
characters. The string table is updated for each character in the input
stream, except the first one. Decoding is achieved by reading codes and
translating them through the code table being built.

LZW Decompression Algorithm

PSEUDOCODE

1  Initialize table with single character strings
2  OLD = first input code
3  output translation of OLD
4  WHILE not end of input stream
5      NEW = next input code
6      IF NEW is not in the string table
7          S = translation of OLD
8          S = S + C
9      ELSE
10         S = translation of NEW
11     output S
12     C = first character of S
13     add OLD + C to the string table
14     OLD = NEW
15 END WHILE
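A matching Python sketch of the decoder, including the special case in line 6 where the incoming code is not yet in the table:

def lzw_decode(codes):
    """LZW decoder following the pseudocode above."""
    table = {i: chr(i) for i in range(256)}     # codes 0-255: single characters
    next_code = 256
    old = codes[0]
    result = table[old]
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:                                   # code not in the table yet (special case)
            s = table[old] + table[old][0]
        result += s
        table[next_code] = table[old] + s[0]    # add OLD + first character of S
        next_code += 1
        old = new
    return result

print(lzw_decode([66, 65, 256, 257, 65, 260]))  # BABAABAAA, as in Example 2 below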

Example 2: LZW Decompression: Use LZW to decompress the output sequence <66><65><256><257><65><260>.
The steps involved are systematically shown in the diagram below.
In this example, 72 bits are represented with 72 bits of data. After a


reasonable string table is built, compression improves dramatically.
LZW Summary: This algorithm compresses repetitive sequences of data
very well. Since the codewords are 12 bits, any single encoded character
will expand the data size rather than reduce it.
The algorithm described here is the LZW(elch) coding variant as of 1984. This algorithm solves the problems arising with the explicit definition of the two buffers. Strings in the dictionary are not automatically erased and the lookahead buffer does not have a fixed size. The algorithm does not employ pointers to earlier complete strings, but a procedure called “phrasing”. A text is divided into phrases, where a phrase is the longest string found so far plus an additional character.
All symbols are replaced by tokens (i.e., codewords) which point to entries in the dictionary. The first 256 dictionary entries are the common 8-bit ASCII code symbols. The dictionary is successively extended by new phrases generated from the text (i.e. data) to be coded.

Critical issues:
• How to efficiently encode a variable number of strings with variable length?
• How long can we grow the dictionary? What are the strategies when the
dictionary gets too large (start from scratch, keep the most common entries)?
2.3.8 Which lossless algorithm is most suited for a given application?
Testing with exemplary data is important, apart from what we know theoretically with respect to speed, asymmetry, compression capability, etc.!
LZ77 vs. LZSS

In their original LZ77 algorithm, Lempel and Ziv proposed that all strings be
encoded as a length and offset, even strings with no match. Storer and
Szymanski observed that individual unmatched symbols or matched strings of
one or two symbols take up more space to encode than they do to leave uncoded.

For example, encoding a string from a dictionary of 4096 symbols, and allowing for matches of up to 15 symbols, will require 16 bits to store an offset and a length. A single character is only 8 bits. Using this method, encoding a single character doubles the required storage.
Storer and Szymanski proposed preceding each symbol with a coded/uncoded flag. Using the above example, it now takes 9 bits for each uncoded symbol and 17 bits for each encoded string.

Storer and Szymanski also observed that if you're not encoding strings of length
0 to M, then M may be used as an offset to the length of the match. Applying
their observation, instead of using 4 bits to represent match lengths of 0 to 15,
the same 4 bits may be used to represent match lengths of M to (15 + M).

Putting it All Together

Based on the discussion above, encoding input requires the following steps:
Step 1. Initialize the dictionary to a known value.
Step 2. Read an uncoded string that is the length of the maximum allowable
match.
Step 3. Search for the longest matching string in the dictionary.


Step 4. If a match is found greater than or equal to the minimum allowable
match length:

A. Write the encoded flag, then the offset and length to the encoded output.
B. Otherwise, write the uncoded flag and the first uncoded symbol to the
encoded output.

Step 5. Shift a copy of the symbols written to the encoded output from the
unencoded string to the dictionary.
Step 6. Read a number of symbols from the uncoded input equal to the number
of symbols written in Step 4.
Step 7. Repeat from Step 3, until the entire input has been encoded.
Decoding Strings

The LZSS decoding process is less resource intensive than the LZSS encoding process. The encoding process requires that the dictionary be searched for matches to the string being encoded. Decoding an offset and length combination only requires going to a dictionary offset and copying the specified number of symbols. No searching is required.
Decoding input requires the following steps:


Step 1. Initialize the dictionary to a known value.
Step 2. Read the encoded/not encoded flag.
Step 3. If the flag indicates an encoded string:

A. Read the encoded length and offset, then copy the specified number of symbols from the dictionary to the decoded output.

B. Otherwise, read the next character and write it to the decoded output.

Step 4. Shift a copy of the symbols written to the decoded output into the
dictionary.
Step 5. Repeat from Step 2, until the entire input has been decoded.

Implementation Issues
When I set out to implement LZSS, I had the following goals in mind:

• The code should be easy to follow, easy to debug, and easy to modify.
• Encoded strings should not be longer than two bytes (16 bits). 16-bit values are easy to deal with and keep encoded strings small.
• Avoid reading/writing individual bits.

My version 0.2 and later code uses a bitfile library, so reading and writing
individual bits isn't a big deal. All of my versions use codes that are integral
bytes, and each of my newer versions is derived from my earlier versions, so there is a lot of common design.

What follows is a discussion about how these goals were accomplished.

The Dictionary

The dictionary size directly affects:
• The time to search the entire dictionary
• The size of the encoded offset into the dictionary
• The likelihood of a string matching one that already exists in the dictionary
I chose to implement my dictionary as a 4096 character cyclic buffer (sliding


window). I chose 4096 characters, because others have achieved good results
with dictionaries of this size, and I can encode offsets of 0 to 4095 in 12 bits. If I
encode the offset and length of a string in a 16 bit word, that leaves me with 4
bits for the length.
Minimum and Maximum Encoded Length
Keeping the goal of a 16 bit encoded string in mind, and the above requirement
for a 12 bit offset, I'm left with 4 bits to encode the string length. With 4 bits, I
can encode lengths of 0 through 15. However, as Storer and Szymanski
observed, it only takes 1 byte to write out characters that match dictionary strings
0 or 1 character long, and 2 bytes to write out characters that match dictionary
strings 2 characters long. Savings of one or more bytes per string doesn't occur
unless the string is 3 characters or longer.

By not encoding strings that offer less than one byte of savings, the minimum
encoded length is three characters. Since the minimum encoded length is three, I
am able to add three to the 4 bit value stored in the length field, resulting in
encoded string lengths of 3 to 18. So 18 is the maximum string length encoded
by this implementation.
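The 12-bit offset / 4-bit length packing described here can be written down directly (the helper names are mine; length − 3 is stored so that lengths 3 to 18 fit into 4 bits):

def pack_token(offset, length):
    """Pack a 12-bit offset (0-4095) and a length of 3-18 into one 16-bit word."""
    assert 0 <= offset <= 4095 and 3 <= length <= 18
    return (offset << 4) | (length - 3)          # length - 3 goes into the low 4 bits

def unpack_token(word):
    return word >> 4, (word & 0x0F) + 3          # recover offset and true length

word = pack_token(1000, 18)
print(hex(word), unpack_token(word))             # 0x3e8f (1000, 18)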

Encode/Not Encoded Flag



I've been calling my implementation a modified LZSS implementation. It's


modified because of the way my version 0.1 code handled the encoded/not
encoded flag. If I followed the traditional LZSS approach for handling the
encoded/not encoded flag, writing an unencoded character would require a 9 bit
write, 1 for the flag and 8 for the character. Similarly writing an encoded string
would require 17 bits, 1 bit for the flag and 16 bits for the code.

While I was studying the algorithm, I came across some implementations that stored encoded flags in groups of eight followed by the characters or encoded strings that they represent. I don't know who to credit with the discovery of this
technique, but it allowed my version 0.1 implementation to use standard one byte
reads and writes instead of one bit reads and writes.

This technique also makes the EOF clear. By processing only bytes, there are no
spare bits, so the EOF of the encoded data is actually the EOF of the encoded
file. Don't worry about it if I lost you on the EOF discussion, just relax, because
there's no need to handle it specially.
After writing my version 0.1 implementation, I wrote a bitfile library that


handles single and multiple bit writes. My version 0.2 and later code uses the
bitfile library and handles the encoded/not encoded flag according to the
traditional LZSS algorithm.

Finding Matches in the Dictionary
As stated above, one of my goals was to provide an LZSS implementation that is
easy to read and easy to debug. So, to keep things simple, my first
implementation used a brute force sequential search to match the string to be
encoded to strings in the dictionary. Through experimentation and reading, I've
discovered that the methods used for string matching significantly impact
encoding time.

Others have successfully improved the time required to find matching strings
using the following techniques:
• Knuth-Morris-Pratt linear search optimization
• Boyer-Moore linear search optimization
• Linked Lists
• Hash Tables
• Multi-Layer Hash Tables
• Binary Search Trees
• Splays
• Suffix Trees or Suffix Arrays
I have already experimented with some of these techniques and plan to experiment with others as time allows. The rest of this section documents some of what I have tried so far.

2.4 Lossy Compression


In addition to the tradeoff between coding efficiency, coder complexity, and coding delay, the additional aspect of compression quality arises with the use of lossy methods. Quality is hard to assess since for many applications human perception is the only relevant criterion. However, there are some scenarios where other factors need to be considered, e.g., when compressed data is used in matching procedures like in biometrics. Although several quality measures have been proposed over the last years, PSNR is still the most used measure to determine image and video quality after compression has been applied. The mean squared error between the original signal f(i) and its reconstruction is

MSE = (1/N) Σ (f(i) − f0(i))²

where f0(i) is the reconstructed signal after decompression. Peak-signal-to-noise ratio (measured in decibel dB) is defined as follows (where MAX is the maximal possible value in the original signal f(i)):

PSNR = 10 · log₁₀ (MAX² / MSE)
The higher the value in terms of PSNR, the better is the quality of the reconstructed signal. With increasing compression ratio, PSNR decreases. While PSNR is not well correlated with moderate quality changes, it serves its purpose as a rough quality guideline. Many other numerical measures have been developed with better prediction of human perception (e.g. the structural similarity index SSIM), but these have not been widely adopted. One reason is that image compression can be easily optimized with respect to RMSE (rate / distortion optimization), but not with respect to the “better” measures. To really take human perception into account, subjective testing involving a significant number of human subjects is necessary – the resulting subjective measure is called MOS (“mean opinion score”). When developing quality measures, a high correlation with MOS computed over large databases is desired. PSNR does not correlate well with MOS.
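PSNR is straightforward to compute; here is a small sketch (the 2x2 sample arrays are made up purely for illustration):

import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB between an original signal f and its reconstruction f0."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                    # identical signals
    return 10.0 * np.log10(max_value ** 2 / mse)

f  = np.array([[52, 55], [61, 59]], dtype=np.uint8)
f0 = np.array([[50, 56], [60, 58]], dtype=np.uint8)
print(round(psnr(f, f0), 1))                   # roughly 45.7 dB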

2.4.1 Quantization
When changing the description of a signal from using a large alphabet to using a smaller alphabet of symbols we apply quantization. Obviously, when reversing the process, a perfect reconstruction is no longer possible since two or more symbols of the larger alphabet are mapped to a single symbol of the small alphabet. Often, quantization errors are the only actual errors caused by lossy compression (apart from rounding errors in transforms).

Example:
suppose we have stored a grayscale image with 8bit/pixel (bpp)
precision, which means that for each pixel, we may choose among 256
‫ﻣﻴﺮﻧﺎ ﻣﺤﻤﺪ ﻋﺒﺪﺍﻟﻤﻨﻌﻢ ﻋﻄﺎﺍﻟﻠﻪ‬

different grayscales. If we apply a quantization to 4 bpp, only 16 grayscale


can be used for a pixel. Of course, it is no longer possible to re-transform the
4 bpp into the original bit-depth without errors. The information is lost.
If single signal values are subject to quantization one-by-one (a single value is
quantized into its less accurate version), the procedure is denoted “scalar
quantization”. If we subject n-tuples of signal values together to quantization
(a vector is replaced by a less accurate version), we call the scheme “vector
quantization”.
In any case, quantization may be interpreted as a specific kind of
approximation to the data. If we divide the range of the original signal into n
equally sized and disjoint intervals and map all symbols contained in an
interval to one of the n symbols of the new alphabet, the quantization is called
“uniform”.
Especially in the case of uniformly distributed probabilities of symbol occurrence, uniform quantization makes sense. Quantization is most often applied to transform coefficients (e.g. after DCT or wavelet transform) – in this context, coefficients quantized to zero are desired to appear frequently. Therefore, “uniform quantization with a dead zone around 0” is introduced, where uniform quantization is applied except for the quantization bin around zero, which has twice the size of the other bins (symmetrically around zero).
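A minimal sketch of scalar quantization: a plain uniform quantizer reconstructing at the bin centre, and a dead-zone variant in which truncation towards zero makes the zero bin twice as wide (function names and step sizes are my own choices):

def uniform_quantize(x, step):
    """Uniform scalar quantizer: bin index plus reconstruction at the bin centre."""
    index = int(x // step)
    return index, (index + 0.5) * step

def deadzone_quantize(c, step):
    """Uniform quantizer with a dead zone: all values with |c| < step map to 0."""
    index = int(c / step)                       # truncation towards zero widens the 0 bin
    return index, index * step

print(uniform_quantize(200, 16))                # (12, 200.0): an 8 bpp value on a 4 bpp grid
print(deadzone_quantize(0.7, 1.0))              # (0, 0.0)
print(deadzone_quantize(-3.4, 1.0))             # (-3, -3.0)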
In case the occurrence probabilities of the symbols in the original signal
are not uniformly distributed and the distribution is known, it makes sense to
apply finer quantization (i.e. smaller bins) in the areas of the histogram where
more data values are accumulated. In this case we talk about “non-uniform quantization”.
Note that following this strategy we minimize PSNR (the error is minimized
in areas where many values are found) but we do not care at all about
perceptual issues.
If the distribution of the occurrence probabilities of the symbols is not known,
the signal can be analyzed and based on the results, the quantization bins can
be set adaptively – termed “adaptive quantization”.
Since especially in the case of transform coding in many cases the
probabilities of symbol occurrence are quite stable for typical signals,
quantization can be modeled in advance and does not have to be adapted to
signal characteristics.

2.4.2 DPCM
Differential pulse code modulation (DPCM) is the general principle
behind differential / predictive coding.
Instead of considering the simple30210061601189
30210061601189
difference between adjacent signal values,
30210061601189

also the difference between a signal value to be predicted and a linear


combination of surrounding values (i.e. the prediction) can be considered.
Example:
88
y_i = x_i − x'_i, where x'_i = α·x_{i−1} + β·x_{i−2} + γ·x_{i−3}. The values α, β, γ are denoted “predictor coefficients”, and x'_i is the prediction for x_i. The example is a linear predictor of order 3. Instead of storing or processing the sequence x_i, the sequence of differences y_i is considered, which has a significantly lower entropy (and can therefore be better compressed).
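A small Python sketch of this idea with a linear predictor of order 3 (the coefficient values are illustrative; the default reduces it to a plain previous-value predictor):

def dpcm_encode(x, coeffs=(1, 0, 0)):
    """y_i = x_i - (a*x_{i-1} + b*x_{i-2} + c*x_{i-3}), missing samples taken as 0."""
    a, b, c = coeffs
    y = []
    for i, xi in enumerate(x):
        pred = (a * (x[i - 1] if i >= 1 else 0)
                + b * (x[i - 2] if i >= 2 else 0)
                + c * (x[i - 3] if i >= 3 else 0))
        y.append(xi - pred)
    return y

def dpcm_decode(y, coeffs=(1, 0, 0)):
    """Invert the predictor by adding the prediction back to each residual."""
    a, b, c = coeffs
    x = []
    for i, yi in enumerate(y):
        pred = (a * (x[i - 1] if i >= 1 else 0)
                + b * (x[i - 2] if i >= 2 else 0)
                + c * (x[i - 3] if i >= 3 else 0))
        x.append(yi + pred)
    return x

signal = [100, 102, 104, 103, 101, 100]
residuals = dpcm_encode(signal)                 # [100, 2, 2, -1, -2, -1]: small, low-entropy values
assert dpcm_decode(residuals) == signal         # lossless as long as the residuals are not quantized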
The predictor coefficients can be fixed for classes of known signals (e.g. exploiting specific patterns like ridges in fingerprint images) or can be adapted for each single signal (in the latter case, they need to be stored). The optimal values for α, β, γ, e.g. those minimizing the entropy of y_i, can of course be determined.
Furthermore, for each distinct part of a signal dedicated coefficients can be
used and optimized (“local adaptive prediction”). By analogy, these
coefficients need to be stored for each signal part impacting on compression
ratio (obviously, there is a trade-off !).
Whether DPCM is lossy or lossless is determined by how y_i is compressed. If, e.g., Huffman coding is applied to y_i, the result is an overall lossless procedure. If y_i is subject to quantization, the overall scheme is a lossy one.

A variant of DPCM frequently employed is to use a fixed set of predictor and quantizer combinations which are applied to the data depending on local signal statistics. Lossless JPEG is such a scheme (without applying quantization, of course).
2.4.3 Transform Coding
Most standardized lossy media compression schemes are based on transform
coding techniques. Typically, transform based coding consists of three stages.
1. Transformation
2. Quantization: a variety of techniques (as discussed before) can be applied to
map the transform coefficients to a small alphabet, the size of the latter
being selected corresponding to the target bitrate. Most rate-control
procedures are applied in this stage.
3. Lossless encoding of coefficients: a variety of techniques (as discussed
before) is applied to encode the quantized transform coefficients close to
the entropy bound. Typically, a combination of run length (zero-runs!) and
Huffman coding (for the older standards like JPEG, MPEG-1,2,4) or
arithmetic coding (for the more recent standards like JPEG2000 H.264) is
used.
The Transform
The aim of transformation is to change the data of a signal in a way that it is
better suited for subsequent compression. In most cases, so-called “integral-
transforms” are used, which are based (in their continuous form) on applying
integral operators. When applying transforms, the concept of projecting the
data onto orthogonal basis functions is used.
Motivation: Vectors in two-dimensional space can be represented by a set of
orthogonal basis-vectors (“orthogonal basis”, orthogonal requires the inner
product between vectors to be 0).
{(1, 0), (0, 1)} are orthogonal basis-vectors, α and β are called “coefficients”
which indicate a weight for each basis-vector to be able to represent the vector
(x, y). The property of orthogonality guarantees that a minimal number of
basis-vectors suffices (obviously, this will be important for compression).
This concept can be transferred by analogy to represent functions or signals
by means of basic elements.
This is applied to pairs of signal values (x1, x2). The spread of each component xj around its mean x̄j is measured by its variance σ²(xj) = (1/M) Σ (xj − x̄j)², where M is the number of pairs and x̄j is the mean of xj computed over all pairs. A simple technique to apply compression is to replace one of the two components in each pair by the corresponding mean x̄j. The MSE of this compression approach is exactly σ²(xj), the variance. No matter which coordinate we replace by the mean, the variance in both cases is large and therefore the error of this compression approach is large.
Let us consider the transformation of X to Y = (y1, y2) by applying a rotation
of the coordinate axes by 45 degrees (Y = AX where A is a rotation matrix).
A fundamental property of these types of transforms is that they preserve the
variance, i.e. σ²(x1) + σ²(x2) = σ²(y1) + σ²(y2).
While the variances in the original coordinate system were almost equal for x1 and x2, in the new coordinate system the variance is distributed quite unevenly, i.e. σ²(y1) >> σ²(y2). Therefore, in this representation it is obvious to apply our compression scheme to y2 (by replacing its components by the mean ȳ2), since the variance and consequently the error will be small.
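The variance-compaction effect of the 45-degree rotation can be reproduced numerically; the sketch below generates a strongly correlated pair, rotates it, and compares the variances (the data is synthetic and only for illustration):

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 1000)
x2 = x1 + rng.normal(0.0, 0.1, 1000)            # x2 is almost equal to x1
X = np.vstack([x1, x2])

theta = np.pi / 4                                # rotate the coordinate axes by 45 degrees
A = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
Y = A @ X

print(np.var(X[0]), np.var(X[1]))                # roughly equal variances
print(np.var(Y[0]), np.var(Y[1]))                # almost all variance ends up in y1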
Examples of popular transforms (desired properties are: decorrelation of data, energy compaction, basis functions not dependent on data, fast algorithms):
1. Karhunen-Loeve transform: the optimal transform since the basis functions are computed depending on the data. A disadvantage is the high complexity O(N³) and the fact that the basis functions need to be stored and cannot be used universally. This transform is used to decorrelate “multiple-spectral band imagery” with a larger number of bands. Closely related to PCA!
2. Discrete Fourier transform (DFT): “the” classical transform; however, due to its complex output and periodicity it is not well suited for compression. Mostly employed in audio coding for signal analysis.
3. Discrete cosine transform (DCT): applied in image, video, and audio coding, where it is applied to chunks of data and not to the entire signal.
4. Wavelet and subband transform: uses basis functions with 2 parameters instead of the single frequency parameter of the DFT and DCT: a sort of scale and time / position.

More details to come in the chapter on lossy image compression.

Quantization

A fundamental issue for quantization of transform coefficients is the strategy for how coefficients are selected to be processed further:

1. According to pre-defined zones in the transform domain: e.g., high-frequency coefficients are severely quantized and low-frequency components are protected (HVS modelling).
2. According to magnitude: coefficients smaller than some threshold are ignored (disadvantage: their locations need to be stored somehow), or only the L largest coefficients are kept in quantized form, etc. In fact, in most schemes both ideas are combined (e.g. JPEG).

Entropy Coding
Most schemes use either a combination of Huffman and run-length coding (mostly older standards like JPEG, MPEG-1,2,4) or arithmetic coding (more recent schemes like JPEG2000, H.264 / CABAC).

2.4.4 Codebook-based Compression

Like the lossless case, we employ a dictionary to represent parts of the signal by entries in the dictionary. I.e., select a part of the signal of size n and replace it by the entry in the dictionary which has the same size and is most like it. Of course, here too the question arises of how to determine the measure of similarity or quality. Instead of the signal part we only store the index of the entry in the dictionary. Encoder and decoder need to be synchronized with respect to the dictionary entries.
Important techniques of this class are vector quantization (applied directly in the signal domain and not to transform coefficients) and fractal compression.
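A tiny vector-quantization sketch along these lines (the two-dimensional codebook is made up purely for illustration; building a good codebook, e.g. with the LBG algorithm, is a separate problem):

import numpy as np

def vq_encode(blocks, codebook):
    """Replace each signal block by the index of its nearest codebook entry."""
    # blocks: (n, d) array of signal vectors; codebook: (k, d) array of entries.
    dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)                  # one index per block

def vq_decode(indices, codebook):
    """Look the stored indices up in the shared codebook."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
blocks = np.array([[0.2, -0.1], [3.8, 4.3], [1.1, 0.9]])
idx = vq_encode(blocks, codebook)
print(idx)                                       # [0 2 1]
print(vq_decode(idx, codebook))                  # the reconstructed (approximate) blocks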
1. Toy Language A random variable X is distributed according to the point
probabilities in the following table:

x:      a    b    c    d    e
PX(x):  1/3  1/4  1/6  1/6  1/12

(a) Compute H(X).


(b) Construct a Huffman code for the variable, and compute its expected
code length.
(c) Decode the message 00101100001 according to your code.
2. Dice Throws Let X be a random variable distributed uniformly on the
set {1, 2, 3, 4, 5, 6}.

(a) Construct a Huffman code for the variable.


(b) What is the average codeword length for your code? How does that
compare with the entropy?
(c) If you interpret a codeword of length k as a probability of 2^(−k), what is then the implicit distribution expressed by your code?
3. Maximum Equivocation Suppose X and Y are random variables ranging over n different values.

(a) Prove that if H(X | Y ) = log n, then X and Y are independent.


(b) Provide an example showing that X and Y need not be independent
even though H(X) = log n.
4. Typical Sequences Suppose we have a bent coin with PX (1) = 1/3. We
flip this coin five times, producing a random sequence S = (X1, . . . , X5).

(a) Compute the entropy H(X) of each individual coin flip, and the
entropy H(S) of the sequence.

(b) Draw up a table of the surprisal values − log PS(s) and their probabilities.
5. High-Probability Sets (CT, Ex. 3.5) Suppose that X1, X2, . . . , Xn are
independent and identically distributed random variables with a common
entropy H(X). Let PS be the probability distribution of the sequence
S = (X1, . . . , Xn), and define the set C(τ ) as

C(τ) := {s : PS(s) ≥ 2^(−nτ)}.

(a) For a fixed τ , what’s the largest number of tuples such a set can
contain?
(b) Sketch a graph of the probability

P (S ∈ C(τ )),
as a function of τ , both for large n, and for extremely large n.