Module 5 - Info Theory and Compression Algo

This document discusses various compression algorithms, including:
1. Run-length encoding, which replaces sequences of repeating pixels with a single value and count.
2. Huffman coding, which assigns variable-length binary codes to symbols based on their frequency, with more common symbols getting shorter codes.
3. Arithmetic coding, which encodes a string of symbols into a single fractional value between 0 and 1.
It provides examples of how these algorithms work and their applications in compressing images, text, and other data.

Uploaded by

muwaheedmustapha

ELE5211: Advanced Topics in CE

Module Five
(Multimedia – Compression Algorithms)

Tutor: Dr. Hassan A. Bashir


Information Theory and
Compression Algorithms

Information Theory (Shannon Theory)

Entropy and Code Length

Run-Length Coding (RLC)

Run-length coding (RLC) is a technique that is not so widely used these days,
but it is a great way to get a feel for some of the issues around using
compression.

Imagine we have the following simple black and white image.
One very simple way a computer can store this image in binary is by using a
format where '1' means white and '0' means black.

This is a "bitmap", because we have mapped the pixels onto the values of bits.

Using this method, the above image would be represented in the following way:
100111101111001 → 1, 2, 4, 1, 4, 2, 1
011111000111110 → 0, 1, 5, 3, 5, 1
111110000011111 → 5, 5, 5
111100000001111 → …
111000000000111
…
Can we represent the same image using fewer bits,
but still be able to reconstruct the original image?
Yes, we can. One of the many methods is called run-length encoding:
• Replace each row with numbers that say how many consecutive pixels
are the same colour,
• always starting with the number of white pixels.
For example, the first row in the image above contains one white, two black,
four white, one black, four white, two black, and one white pixel.
A row that begins with a black pixel therefore starts with 0, as the second row shows.
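The two rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the slides; the function name `rle_encode_row` and the choice to pass each row as a string of '1' and '0' characters are assumptions made for the example.

```python
def rle_encode_row(bits: str) -> list[int]:
    """Return run lengths for a row of '1' (white) / '0' (black) pixels.

    The first number always counts white pixels, so a row that starts
    with a black pixel begins with a run of 0.
    """
    runs = []
    current = "1"  # convention from the text: start by counting white
    count = 0
    for pixel in bits:
        if pixel == current:
            count += 1
        else:
            runs.append(count)  # close the finished run
            current = pixel
            count = 1
    runs.append(count)  # close the last run
    return runs


for row in ["100111101111001", "011111000111110", "111110000011111"]:
    print(row, "->", rle_encode_row(row))
```

Run on the first three rows of the bitmap, this gives the same run lengths as listed next to those rows.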
Decompression of RLC
Exercise 1:
• Can you decompress the following code?
• How many pixels were there in the original image?
• How many numbers were used to represent those pixels?
• How much space have we saved using this alternate
representation, and how can we measure it?

4, 11, 3
4, 9, 2, 1, 2
4, 9, 2, 1, 2
4, 11, 3
4, 9, 5
4, 9, 5
5, 7, 6
0, 17, 1
1, 15, 2
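Decompression simply reverses the process: alternate between white and black, starting with white, and write out as many pixels as each number says. The sketch below is a hypothetical helper (`rle_decode_row` is not a name from the slides), shown on the three rows encoded earlier so that the exercise itself is left to the reader.

```python
def rle_decode_row(runs: list[int]) -> str:
    """Rebuild a row of '1' (white) / '0' (black) pixels from run lengths."""
    pixels = []
    colour = "1"  # the first run always counts white pixels
    for length in runs:
        pixels.append(colour * length)
        colour = "0" if colour == "1" else "1"  # runs alternate in colour
    return "".join(pixels)


rows = [[1, 2, 4, 1, 4, 2, 1], [0, 1, 5, 3, 5, 1], [5, 5, 5]]
decoded = [rle_decode_row(r) for r in rows]
pixel_count = sum(len(row) for row in decoded)
number_count = sum(len(r) for r in rows)
print(decoded)
print(pixel_count, "pixels stored as", number_count, "numbers")
```

Counting pixels and numbers this way is one starting point for measuring the saving. The actual saving depends on how many bits are used to store each number, which the slides do not specify.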
RLC Usage
The main place that black and white scanned images are used now is on fax
machines, which use this approach to compression. One reason that it works so
well with scanned pages is that the number of consecutive white pixels is huge.
In fact, there will be entire scanned lines that are nothing but white pixels. A
typical fax page is 200 pixels across or more, so replacing 200 bits with one
number is a big saving.
Variable Length Coding (VLC)

Huffman Coding

Properties of Huffman Coding

Fixed vs. Variable Length Coding

Exercise: Shannon vs. Huffman Coding

Shannon-Fano: 89 bits
Huffman: 87 bits
Fixed-length coding: 117 bits
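The symbol table for this exercise did not survive in the extracted slides. As a rough check, the sketch below assumes a five-symbol source with counts A:15, B:7, C:6, D:6, E:5. These counts are an assumption, chosen because they reproduce the Huffman and fixed-length totals quoted above. The function `huffman_code_lengths` is a hypothetical helper that builds the Huffman tree with a priority queue and returns only the codeword lengths.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Return the Huffman codeword length (in bits) for each symbol."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(weight, next(tie), {sym: 0}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


# Assumed symbol counts (not taken from the slides).
counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
lengths = huffman_code_lengths(counts)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = sum(counts.values()) * 3  # 3 bits are enough for 5 symbols
print(lengths, huffman_bits, fixed_bits)
```

Under these assumed counts the most frequent symbol gets a 1-bit codeword and the other four get 3 bits each, for 87 bits in total, against 117 bits for a 3-bit fixed-length code.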

Extended Huffman Coding

Adaptive Huffman Coding

Adaptive Huffman Coding
(Tree Updating)

Dictionary-based Coding
• Lempel–Ziv–Welch (LZW) is a universal lossless data compression
algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.

• LZW compression is the compression of a file into a smaller file
using a table-based lookup algorithm.

• LZW compression works by:
• reading a sequence of symbols,
• grouping the symbols into strings, and
• converting the strings into codes.

• Because the codes take up less space than the strings they
replace, we get compression.
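The three steps above (read symbols, group them into strings, convert strings into codes) can be sketched as a small encoder. This is a minimal illustration, not the slides' own pseudocode. The function name `lzw_compress`, the explicit `alphabet` argument, and the convention that single symbols get codes 1, 2, 3, ... are all assumptions, and every symbol of the input is assumed to appear in `alphabet`.

```python
def lzw_compress(text: str, alphabet: str) -> list[int]:
    """Encode text as a list of dictionary codes (compression side only)."""
    # Assumption: single symbols get codes 1, 2, 3, ... in alphabet order.
    dictionary = {symbol: code for code, symbol in enumerate(alphabet, start=1)}
    next_code = len(dictionary) + 1
    codes = []
    current = ""
    for symbol in text:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate  # keep growing the current string
        else:
            codes.append(dictionary[current])  # emit code for the longest match
            dictionary[candidate] = next_code  # add the new string to the table
            next_code += 1
            current = symbol
    if current:
        codes.append(dictionary[current])  # flush the final string
    return codes


print(lzw_compress("ABABBABCABABBA", "ABC"))  # -> [1, 2, 4, 5, 2, 3, 4, 6, 1]
```

Here 14 input symbols become 9 codes. Whether that is a saving in bits depends on how wide each code is, which is why LZW pays off mainly on longer inputs with many repeated strings.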
• Two commonly used file formats in which LZW compression is
used are
• the GIF image format served from Web sites and
• the TIFF image format.

• Both ZIP and LZW are lossless compression methods.
• That means that no data is lost in the compression,
unlike a lossy format like JPG.

• You can open and save a TIFF file as many times as you like without
degrading the image.
• If you try that with JPG, the image quality will deteriorate more
each time.
Dictionary-based Coding
(LZW - Remarks)

Arithmetic Coding
• Arithmetic coding (AC) is a form of entropy encoding used in
lossless data compression.

• Normally, a string of characters such as the words "hello there" is
represented using a fixed number of bits per character, as in the
ASCII code.

• Comparison of AC with Huffman:
• Arithmetic coding is superior to Huffman coding in the sense
that it can effectively assign a fractional number of bits to
each symbol, whereas
• in Huffman coding an integral number of bits has to be
assigned to the codeword of each symbol.
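To see why fractional bits matter, compare the ideal code length of a symbol, -log2(p) bits for a symbol of probability p, with the minimum of 1 bit that any Huffman codeword must use. The short sketch below is illustrative only; the helper name `ideal_bits` and the sample probabilities are assumptions.

```python
import math


def ideal_bits(probability: float) -> float:
    """Information content of a symbol in bits: -log2(p)."""
    return -math.log2(probability)


for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: ideal {ideal_bits(p):.3f} bits, Huffman uses at least 1 bit")
```

For very likely symbols the ideal length is far below 1 bit, and that gap is what arithmetic coding can recover by encoding the whole message as a single number.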
Arithmetic Code for: CAEE$
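The probability table for this example did not survive in the extracted slides. The sketch below assumes the symbol ranges A [0, 0.2), B [0.2, 0.3), C [0.3, 0.5), D [0.5, 0.55), E [0.55, 0.85), F [0.85, 0.9), $ [0.9, 1.0). This table is an assumption, not taken from the slides. The function `arithmetic_encode` is a hypothetical helper that narrows the interval [low, high) once per symbol, using exact fractions to avoid rounding error.

```python
from fractions import Fraction

# Assumed [low, high) range for each symbol (not taken from the slides).
RANGES = {
    "A": ("0", "0.2"),
    "B": ("0.2", "0.3"),
    "C": ("0.3", "0.5"),
    "D": ("0.5", "0.55"),
    "E": ("0.55", "0.85"),
    "F": ("0.85", "0.9"),
    "$": ("0.9", "1.0"),
}


def arithmetic_encode(message, ranges):
    """Narrow [low, high) once per symbol and return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = (Fraction(x) for x in ranges[symbol])
        width = high - low
        # Both new bounds are computed from the old low and the old width.
        high = low + width * sym_high
        low = low + width * sym_low
    return low, high


low, high = arithmetic_encode("CAEE$", RANGES)
print(float(low), float(high))
```

With this assumed table the final interval for CAEE$ runs from 0.33184 to 0.3322, and any number inside it identifies the message.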

For the above example, low = 0.33184, high = 0.3322.

If we assign 1 to the 1st binary fraction bit, it would be 0.1 in binary,
and its decimal value is:
value(code) = value(0.1) = 0.5 > high
Hence, we assign 0 to the first bit. Since value(0.0) = 0 < low, the
while loop continues.

Assigning 1 to the 2nd bit makes a binary code 0.01 and value(0.01) =
0.25, which is less than high, so it is accepted. Since it is still true that
value(0.01) < low, the iteration continues.

Eventually, the binary codeword generated is 0.01010101,
which is $2^{-2} + 2^{-4} + 2^{-6} + 2^{-8} = 0.33203125$, a value that lies between low and high.
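The bit-by-bit procedure just described can be sketched as follows. This is a reconstruction from the worked steps above, not the slides' own pseudocode. The function name `generate_codeword` and the `max_bits` safety limit are assumptions added for the example.

```python
def generate_codeword(low: float, high: float, max_bits: int = 64) -> tuple[str, float]:
    """Pick binary fraction bits until the value reaches or exceeds low."""
    bits = []
    value = 0.0
    for k in range(1, max_bits + 1):
        if value >= low:
            break  # the codeword now lies between low and high
        step = 2.0 ** -k  # weight of the k-th binary fraction bit
        if value + step > high:
            bits.append("0")  # setting this bit would overshoot high
        else:
            bits.append("1")
            value += step
    return "".join(bits), value


print(generate_codeword(0.33184, 0.3322))  # -> ('01010101', 0.33203125)
```

Each accepted bit moves the value closer to low without ever exceeding high, so the loop stops as soon as the value falls inside the interval.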
Lossless Image Compression

Lossless JPEG

Homework

Use Arithmetic Coding to encode your Surname
