Coding and Information Theory
Slide 1: Syllabus
CS5058701
CS5058 Coding and Information Theory
Lecturer: Wei-Chung Teng, http://faculty.csie.ntust.edu.tw/~weichung/
Office hour: Friday 14:00-17:00, room T4-502
Assistant: Ding-Jie Huang, room RB-304-2, tel: 2733-3141#7297, D9515002@mail.ntust.edu.tw
Grading: HW 40%, Mid-term exam 30%, Final exam 30%
Learning Objectives
Upon completion of the course, the students will be able to
define and apply the basic concepts of information theory
differentiate between lossy and lossless data compression methods, and describe the most common ones
design an efficient data compression scheme for a given information source
calculate the capacity of communication channels
Closely Related Courses
Channel Coding
CS5101 Digital Wireless Communications and Networks
CS5085 Data Compression with Applications
CS5067 Image and Video Compression
CS5028 Cryptography and Data Security
Source Coding
Cryptography
Literature
Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, 2nd Edition, Wiley, 2006.
David J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.
Available online: http://www.inference.phy.cam.ac.uk/mackay/itila/ book.html
Peter Sweeney, Error Control Coding: From Theory to Practice, Wiley, 2002.
Shannon's Information Theory
Claude Elwood Shannon (1916-2001), "the father of information theory".
1948: "A Mathematical Theory of Communication", published in The Bell System Technical Journal.
Channel Coding Theorem: For any channel, there exist codes that make it possible to communicate with arbitrarily small probability of error at non-zero rates. The maximum rate at which communication is possible with arbitrarily small error is called the capacity of the channel.
Information Theory
Information theory answers two fundamental questions in communication theory
What is the ultimate data compression?
the entropy H
What is the ultimate transmission rate of communication?
the channel capacity C
A Communication System with Discrete Source and Noisy/Noiseless Channel
[Fig. 1 Schematic diagram of a general communication system: an information source produces a message, the transmitter converts it into a signal, a noise source corrupts the received signal, and the receiver recovers the message for the destination.]
From Shannon's paper: "... a decimal digit is about 3 1/3 bits. A digit wheel on a desk computing machine has ten stable positions and therefore has a storage capacity of one decimal digit. In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by $\log_b a$. ... By a communication system we will mean a system of the type indicated schematically in Fig. 1."
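As a quick worked check of the figures quoted above (added for clarity, not part of the original slide): $\log_2 10 = \frac{\ln 10}{\ln 2} \approx 3.32 \approx 3\tfrac{1}{3}$ bits per decimal digit, and since $\log_b x = (\log_a x)(\log_b a)$, information measured in base $a$ units is converted to base $b$ units by multiplying by $\log_b a$.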
Source Coding & Channel Coding
Discrete source with noiseless channel: source coding
The basic function of source coding is data compression; other functions include cryptography and helping multicast in networks.
Discrete source with noisy channel: channel coding. The basic function is error detection & correction.
A Flowchart Representing the Various Stages of Data Processing
[Flowchart: Message Source → Source Encoder → Encryption → Channel Encoder → Noisy Channel → Channel Decoder → Decryption → Source Decoder → Message Received. Source and channel coding belong to Coding & Information Theory; encryption and decryption belong to Cryptography.]
Channel over Time
We usually assume the communication channel connects here and there (transmission of information), but in fact we may also consider the channel as a means of sending information from now to then (storage of information). The encoding of information for efficient storage, as well as reliable recovery in the presence of noise, is essential in computer science.
Related Fields
Communication Engineering
Computer Science (Kolmogorov Complexity)
Physics (Thermodynamics)
The concept of entropy is borrowed from the second law of thermodynamics
Math (Probability and Statistics)
Economics (Investment)
Everything starts from probabilities of symbols
Discrete data sources, e.g.:
English text: [0-9a-zA-Z/,.:;`()!$%&+-]+
Number: [0-9+-.]
Binary value: +/-, Green/Red, 0/1
If you do not know the probability of any possible value (or symbol) and you are asked to guess the next output, how would you act?
Everything starts from probabilities of symbols (cont.)
We don't feel advantaged if the next output has a uniform distribution over all possible values.
English text: [0-9a-zA-Z/,.:;`()!$%&+-]+
Number: [0-9+-.]
Binary value: +/-, Green/Red, 0/1
If you do not know the probability of any possible value (or symbol) and you are asked to guess the next output, how would you act?
What Is the Unit for Measuring Information?
Shannon promotes the logarithmic measure. For a binary channel, the information of some value with probability p is defined as
$\log_2 \frac{1}{p}$ bits
Ex. If you know that the next output is a decimal digit, but have no clue which one, then this digit has probability $p = 1/10$ and carries $\log_2 \frac{1}{1/10} = \log_2 10 \approx 3.32$ bits of information.
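As an illustration of this logarithmic measure, here is a minimal Python sketch (the helper name self_information is hypothetical, not from the course material) that evaluates $\log_2(1/p)$ for a coin flip and for a decimal digit:

    import math

    def self_information(p):
        """Information, in bits, of an outcome that occurs with probability p."""
        return math.log2(1 / p)

    print(self_information(1 / 2))    # 1.0 bit for a fair coin flip
    print(self_information(1 / 10))   # ~3.32 bits for one decimal digit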
Entropy
Entropy is the expected information of the next output, which we model as a random variable X. The entropy of X with probability mass function p(x) is defined by
$H(X) = -\sum_{x} p(x) \log_2 p(x)$
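A minimal Python sketch of this formula (the helper name entropy is assumed for illustration), checked against a uniform distribution over 8 outcomes where H should equal $\log_2 8 = 3$ bits:

    import math

    def entropy(probs):
        """Entropy H(X) in bits for a probability mass function given as a list."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([1/8] * 8))   # 3.0 bits for a uniform distribution over 8 outcomes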
Example 1.1 & 1.2
Consider a random variable that has a uniform distribution over 8 outcomes. To identify an outcome, we assign a different label (code) to each outcome; thus, 3-bit strings suffice as labels, and the entropy of this random variable is also 3 bits. What if the probability distribution becomes (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64)? We may still use 3-bit labels, but the entropy becomes 2 bits. How could we cut off the redundant bit?
Example 1.2 with coding
As the win probabilities are not uniform, it makes sense to use shorter descriptions for the more probable horses and longer descriptions for the less probable ones. For example, the following strings can be used to represent the eight horses: 1, 01, 001, 0001, 000000, 000001, 000010, 000011. The average description length is then 2 bits (= entropy).
The entropy gives a lower bound for the average description length.
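The match between average description length and entropy here can be checked numerically; the following short Python sketch (illustrative only) uses the code lengths 1, 2, 3, 4, 6, 6, 6, 6 of the strings above:

    import math

    probs   = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
    lengths = [1, 2, 3, 4, 6, 6, 6, 6]   # lengths of 1, 01, 001, 0001, 000000, ...

    avg_length = sum(p * l for p, l in zip(probs, lengths))
    H = -sum(p * math.log2(p) for p in probs)

    print(avg_length)   # 2.0 bits per symbol
    print(H)            # 2.0 bits: the entropy lower bound is met exactly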
Mutual Information
entropy: the uncertainty of a single random variable.
conditional entropy: the entropy of a random variable, given another random variable.
mutual information: the reduction in uncertainty due to another random variable.
For two random variables X and Y, the mutual information is
$I(X;Y) = H(X) - H(X \mid Y) = \sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}$
The mutual information is symmetric in X and Y and non-negative.
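As a concrete illustration of this formula, the Python sketch below (the helper name mutual_information is hypothetical, not from the text) computes I(X;Y) from a joint distribution given as a dictionary, here a uniform binary input sent through a channel that flips the bit with probability 0.1:

    import math
    from collections import defaultdict

    def mutual_information(joint):
        """I(X;Y) in bits from a joint pmf given as {(x, y): p(x, y)}."""
        px, py = defaultdict(float), defaultdict(float)
        for (x, y), p in joint.items():
            px[x] += p
            py[y] += p
        return sum(p * math.log2(p / (px[x] * py[y]))
                   for (x, y), p in joint.items() if p > 0)

    # Uniform input, output flipped with probability 0.1:
    joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
    print(mutual_information(joint))   # ~0.531 bits = 1 - H(0.1)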
Channel Capacity
A communication channel is a system in which the output depends probabilistically on its input. A probability transition matrix determines the conditional distribution of the output given the input. With input X and output Y, the channel capacity is defined by
$C = \max_{p(x)} I(X;Y)$
The capacity is the maximum rate at which we can send information over the channel and recover the information at the output with a vanishingly low probability of error.
Example 1.3 Noiseless Binary Channel
For the noiseless binary channel, the binary input is reproduced exactly at the output. Any transmitted bit is therefore received without error; in each transmission, we can send 1 bit reliably to the receiver and the capacity is 1 bit. We can also calculate the capacity
$C = \max_{p(x)} I(X;Y) = 1$ bit
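A quick numerical check of this capacity (an illustrative Python sketch, not from the slides): for the noiseless channel Y = X, a uniform input already achieves I(X;Y) = 1 bit, and no input distribution can do better, so C = 1 bit.

    import math

    # Noiseless binary channel: Y = X, so all probability mass sits on (x, x).
    # Uniform input p(0) = p(1) = 1/2 gives marginals px = py = (1/2, 1/2).
    joint = {(0, 0): 0.5, (1, 1): 0.5}
    px = {0: 0.5, 1: 0.5}
    py = {0: 0.5, 1: 0.5}

    I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items())
    print(I)   # 1.0 bit: the uniform input attains the capacity C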