The document discusses source coding techniques, focusing on fixed-length and variable-length codes, including Huffman coding. It explains the importance of coding for data compression, error protection, and secrecy, and differentiates between lossless and lossy compression methods. Additionally, it provides examples and exercises related to Huffman coding, illustrating how to assign codewords based on symbol frequency.


ELEC1010 Tutorial 8

Source Coding
 Fixed Length Code
 Huffman Code

Raymundo TANG TANG, Electronic and Computer Eng. Dept, HKUST


Coding
 Converts information into a different form for storage, transmission,
or other types of processing.
 Why coding?

 Compression

 Error Protection

 Secrecy

 Source coding

 Ensures bits are used efficiently

 Reduces the number of bits

 Channel coding

 Adds redundant bits for protection.

 Helps the receiver detect and possibly correct bit errors.



Example - Video storage in a Blu-ray Disc (BD)
A full HD frame is 1080 × 1920 pixels:
There are 1080 × 1920 = 2,073,600 pixels per frame

With color: 3 color channels represent one pixel, one byte per color channel.

Total number of bytes in each frame: 2,073,600×3 = 6,220,800 bytes/frame

For a 2-hour movie at 60 frames/second, there would be:

6,220,800 bytes/frame × 60 frames/s × 3600 s/hr × 2 hrs ≈ 2.44 TB/movie (~100 single-layer Blu-ray discs)

There is a huge number of bytes in a 2-hour movie. We need to reduce the data size to save storage space.
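The arithmetic above can be checked with a short script (a Python sketch using the slide's figures; variable names are illustrative):

```python
# Storage needed for an uncompressed 2-hour full-HD movie (figures from the slide).
pixels_per_frame = 1080 * 1920              # 2,073,600 pixels
bytes_per_frame = pixels_per_frame * 3      # 3 colour channels, 1 byte each
frames_per_second = 60
seconds = 2 * 3600                          # 2-hour movie
total_bytes = bytes_per_frame * frames_per_second * seconds
print(total_bytes)                     # 2687385600000 bytes
print(round(total_bytes / 2**40, 2))   # 2.44 (TB, binary units)
```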
Source Coding (Data Compression)

 Source Coding: the process of encoding information using a smaller number of bits (compression).
 Decompression: the process of reconstructing the original
signal, or something close to it, from the compressed version.
 Compression:
 Lossless compression

 Lossy compression



Source Coding - Lossless Compression

 Nothing about the original set of numbers or symbols is lost.


 Reversible

 The reverse process is decompression.

 The exact set of original numbers can be recovered by the decompression process.
 Used for signals and information that require accuracy.
 e.g. financial record, scientific data

WinZip & WinRAR are lossless compression software



Source Coding - Lossy Compression
 Selectively throw away information about the sets of
numbers or symbols being compressed.
 Loses features that would not be noticed or are not important in the application.
 Not reversible.
 Cannot get back the exact original set of numbers from the compressed version.
 Reconstructs something close to the original signal from the compressed version.
 Offers a bigger compression ratio.
 e.g. MP3 music, JPEG pictures, DVD movies



Source Coding Basics : Symbol-to-bit assignment

 The information is a sequence of symbols.

 The set of symbols is the set {xi} in Shannon's entropy equation.
 e.g. the 52 letters in a book {a, b, c, …, z; A, B, C, …, Z}, or the black and white dots in a fax page {B, W}
 Assign a sequence of bits (a codeword) to represent each symbol.



Source Coding -Fixed-Length Code (Uniform Code)
 Every codeword has the same code length (the number of bits).
 For N symbols, we need a minimum of M bits per codeword, such that the total number of combinations satisfies 2^M ≥ N
Examples:
# of Symbols Symbol Set Possible Codewords
2: {B, W} 0, 1
4: {a, e, q, z} 00, 01, 10, 11
Any problem in Uniform code?
What will happen if there are 5 symbols?
Examples:
# of Symbols Symbol Set Possible Codewords
5: {a, b, c, d, e} 000, 001, 010, 011, 100
Requires 3 bits for each symbol. 2^3 = 8 > 5, so three of the eight bit patterns go unused: not efficient.
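The minimum codeword length for a uniform code can be computed directly (a small Python sketch; `fixed_length_bits` is an illustrative name, not from the slides):

```python
import math

def fixed_length_bits(n_symbols):
    """Minimum codeword length M (in bits) such that 2**M >= n_symbols."""
    return math.ceil(math.log2(n_symbols))

print(fixed_length_bits(2))  # 1 bit for {B, W}
print(fixed_length_bits(4))  # 2 bits for {a, e, q, z}
print(fixed_length_bits(5))  # 3 bits for {a, b, c, d, e}: 2**3 = 8 >= 5
```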
Source Coding - Variable-Length Code
 Assigns variable length codeword to different symbols.
Examples:
# of Symbols Symbol Set Possible Codewords
5: {a, e, r, o, z} 1, 0, 10, 11, 01
 On average, requires fewer than 2 bits to represent each symbol. More efficient than a uniform code.
 Is there any problem?
Given the sequence 110110, it can be decoded as: aaeaae, aazr, araae, ozr, ozae, oeoe, …
Without separators, the decoder cannot tell which decoding was intended: the code is not uniquely decodable.
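The ambiguity can be verified by brute force. A sketch that enumerates every possible decoding of the bit string under this code:

```python
# Every possible decoding of 110110 under a = 1, e = 0, r = 10, o = 11, z = 01.
code = {'a': '1', 'e': '0', 'r': '10', 'o': '11', 'z': '01'}

def decodings(bits):
    """Return every symbol string whose encoding is exactly `bits`."""
    if not bits:
        return ['']
    results = []
    for sym, cw in code.items():
        if bits.startswith(cw):  # try this codeword, then decode the rest
            results += [sym + rest for rest in decodings(bits[len(cw):])]
    return results

print(len(decodings('110110')))  # 13 distinct decodings, including all listed above
```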



Source Coding – Prefix Code
 Typically a variable-length code.
 No codeword is a prefix of any other codeword.
 Uniquely decodable.
e.g. a=000, b=001, c=010, d=011, e=1 is a prefix code:
0011011000010 can only be bedac
 Note: a=0, b=1, c=10, d=11 is not a prefix code (1 is a prefix of both 10 and 11)

 How to assign a code to each symbol?
Code Tree
• Branch to the left for a "0" bit and to the right for a "1" bit.
• All codewords must be at the leaves. Prefix Code -> Uniquely Decodable
• Decode from the top of the tree.
[Code-tree figure: leaves 000 = a, 001 = b, 010 = c, 011 = d, 1 = e]
 Is the code efficient? How can we use the bits more efficiently?
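Decoding a prefix code needs no separators: read bits into a buffer until the buffer matches a codeword. A minimal sketch for the code above:

```python
# Decoding the prefix code a=000, b=001, c=010, d=011, e=1.
decode_table = {'000': 'a', '001': 'b', '010': 'c', '011': 'd', '1': 'e'}

def decode(bits):
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in decode_table:  # prefix property: the first match is the only match
            out.append(decode_table[buf])
            buf = ''
    return ''.join(out)

print(decode('0011011000010'))  # bedac
```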



Source Coding – Huffman Code

 Developed by David Albert Huffman.


 Variable-length, prefix source coding technique.

 Uniquely decodable.

 Used for lossless data compression.


 Objective: the least frequent symbols use the most bits, and the most frequent symbols use the fewest.



Source Coding – Huffman Code
“Relative frequency” : the frequency of a symbol compared to other
symbols. This relative frequency is the probability p(x) in Shannon’s
entropy formula! Here are the steps:
Step 1: find the relative frequency p(x) of each symbol
Step 2: combine the two least frequent symbols into one new symbol
Step 3: assign 0 to the left (upper) branch and 1 to the right (lower) branch (the assignment can be arbitrary; e.g. we can also follow the convention that the group/symbol with the higher probability is assigned a '0' and the one with the lower probability a '1', or vice versa)
Step 4: find the relative frequency of the new symbol
Step 5: repeat steps 2 to 4 until all symbols are merged into one tree
Step 6: convert the code tree into a prefix code
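The steps above can be sketched as a small program (a dictionary-based implementation; because the 0/1 assignment in Step 3 is arbitrary, the exact bits may differ from the slides, but the codeword lengths will match):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Huffman coding sketch: repeatedly merge the two least frequent
    groups (Steps 2-5), prefixing a '0' or '1' at each merge (Step 3)."""
    order = count()  # tie-breaker so the heap never compares the dicts
    heap = [(p, next(order), {sym: ''}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # least frequent group
        p2, _, c2 = heapq.heappop(heap)  # second least frequent group
        merged = {s: '0' + cw for s, cw in c1.items()}
        merged.update({s: '1' + cw for s, cw in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(order), merged))
    return heap[0][2]

code = huffman_code({'A': 0.2, 'B': 0.7, 'C': 0.1})
print({s: len(cw) for s, cw in code.items()})  # B gets 1 bit, A and C get 2 bits
```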
Source Coding – Huffman Code Exercise 1
There are 100 students taking the exam. 20% of them get A, 70% of
them get B, 10% of them get C.
i. Perform Huffman code on the symbols. Provide a codeword for
each symbol and calculate the average bit/symbol.
ii. Perform Huffman code on symbol pairs, provide a codeword for
each symbol pair and calculate the average bit/symbol.
Answer:
i. P(A) = 0.2, P(B) = 0.7, P(C) = 0.1
Merge the two least frequent: A (0.2) + C (0.1) -> 0.3; then 0.3 + B (0.7) -> 1.0.

Symbol Codeword
B 0
A 10
C 11
Avg. bit/sym = 0.2×2+0.7×1+0.1×2 = 1.3 bit/symbol
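The average can be checked directly from the table (Python sketch):

```python
# Average bits per symbol: sum over symbols of p(x) times codeword length.
p = {'A': 0.2, 'B': 0.7, 'C': 0.1}
codewords = {'A': '10', 'B': '0', 'C': '11'}
avg = sum(p[s] * len(codewords[s]) for s in p)
print(round(avg, 3))  # 1.3
```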
Source Coding – Huffman Code Exercise 1
ii. Answer:
Pair probabilities: P(BB)=0.49, P(BA)=0.14, P(AB)=0.14, P(CB)=0.07, P(BC)=0.07, P(AA)=0.04, P(CA)=0.02, P(AC)=0.02, P(CC)=0.01.
Build the tree by repeatedly merging the two least frequent groups:
CC (0.01) + AC (0.02) -> 0.03
0.03 + CA (0.02) -> 0.05
0.05 + AA (0.04) -> 0.09
CB (0.07) + BC (0.07) -> 0.14
0.09 + 0.14 -> 0.23
BA (0.14) + AB (0.14) -> 0.28
0.23 + 0.28 -> 0.51
0.51 + BB (0.49) -> 1.0
Source Coding – Huffman Code Exercise 1 answer
Symbol Probability Codeword
BB 0.49 0
BA 0.14 100
AB 0.14 101
CB 0.07 1100
BC 0.07 1101
AA 0.04 1110
CA 0.02 11110
AC 0.02 111110
CC 0.01 111111
Avg. bit/sym = (0.49×1 + 0.14×3 + 0.14×3 + 0.07×4 + 0.07×4 + 0.04×4 + 0.02×5 + 0.02×6 + 0.01×6)/2
= 2.33/2 = 1.165 bit/symbol
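Part (ii) can be reproduced end to end: build the pair probabilities from P(A), P(B), P(C), then average the codeword lengths from the table (Python sketch):

```python
# Pair probabilities from the single-symbol probabilities, then the average
# codeword length per ORIGINAL symbol (divide by 2: two symbols per pair).
p = {'A': 0.2, 'B': 0.7, 'C': 0.1}
pairs = {x + y: p[x] * p[y] for x in p for y in p}
codewords = {'BB': '0', 'BA': '100', 'AB': '101', 'CB': '1100', 'BC': '1101',
             'AA': '1110', 'CA': '11110', 'AC': '111110', 'CC': '111111'}
avg_per_pair = sum(pairs[s] * len(codewords[s]) for s in pairs)
print(round(avg_per_pair / 2, 3))  # 1.165, better than the 1.3 from part (i)
```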
Source Coding – Huffman Code Exercise 2
Assume that we encode the sequence: 111101100100 by grouping three
binary bits as one symbol and encode one symbol at a time using
Huffman Coding. We assume further the distribution of three
successive binary bits is completely random which means that the
probability of finding a specific sequence of three binary bits is equal
to product of their individual probability as shown below:
P(000) = P(0)×P(0)×P(0); P(001) = P(0)×P(0)×P(1)
P(010) = P(0)×P(1)×P(0); P(011) = P(0)×P(1)×P(1)
P(100) = P(1)×P(0)×P(0); P(101) = P(1)×P(0)×P(1)
P(110) = P(1)×P(1)×P(0); P(111) = P(1)×P(1)×P(1)
1) Compute the probabilities of the 8 different symbols.
2) Draw the code tree and provide the codeword for each symbol.
3) Encode the sequence with the codewords and provide the
encoded sequence.
Source Coding – Huffman Code Exercise 2 answer

1) P(1) = 7/12 ≈ 0.6, P(0) = 5/12 ≈ 0.4 (rounded)


P(000) = 0.4 × 0.4 × 0.4 = 0.064
P(001) = 0.4 × 0.4 × 0.6 = 0.096
P(010) = 0.4 × 0.6 × 0.4 = 0.096
P(011) = 0.4 × 0.6 × 0.6 = 0.144
P(100) = 0.6 × 0.4 × 0.4 = 0.096
P(101) = 0.6 × 0.4 × 0.6 = 0.144
P(110) = 0.6 × 0.6 × 0.4 = 0.144
P(111) = 0.6 × 0.6 × 0.6 = 0.216
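Counting bits in the sequence and forming the products gives the same numbers (Python sketch; note that 7/12 and 5/12 are rounded to 0.6 and 0.4, as in the slides):

```python
seq = '111101100100'
print(seq.count('1') / len(seq))   # 7/12 = 0.5833..., rounded to 0.6
print(seq.count('0') / len(seq))   # 5/12 = 0.4166..., rounded to 0.4

p = {'1': 0.6, '0': 0.4}           # rounded values used by the slides
probs = {a + b + c: p[a] * p[b] * p[c] for a in '10' for b in '10' for c in '10'}
print(round(probs['111'], 3), round(probs['000'], 3))  # 0.216 0.064
```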



Source Coding – Huffman Code Exercise 2 answer
2) [Code-tree figure omitted; the resulting codewords are listed on the next page]


Source Coding – Huffman Code Exercise 2 answer

2) …The codewords are:

Symbol Codeword
000 1111
001 1110
010 011
011 101
100 010
101 100
110 110
111 00

3) Original sequence: 111101100100
Grouped in threes: 111 | 101 | 100 | 100
Codewords: 111 -> 00, 101 -> 100, 100 -> 010, 100 -> 010
Encoded sequence: 00100010010
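The grouping and substitution can be mechanized (Python sketch using the codeword table above):

```python
# Group the 12 bits into triples, then substitute each triple's codeword.
codewords = {'000': '1111', '001': '1110', '010': '011', '011': '101',
             '100': '010', '101': '100', '110': '110', '111': '00'}
seq = '111101100100'
triples = [seq[i:i + 3] for i in range(0, len(seq), 3)]
encoded = ''.join(codewords[t] for t in triples)
print(triples)   # ['111', '101', '100', '100']
print(encoded)   # 00100010010 (11 bits instead of 12)
```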



Source Coding – Huffman Code

 The solution is not necessarily unique, but every valid Huffman code gives the same avg. bits/symbol.
 Huffman coding is completely reversible (assuming no bit errors over storage or transmission). It is lossless.

 More frequently used symbols have shorter codewords.

