Generating code words for groups or
sequences of symbols
A unique identifier or tag is generated for
the sequence to be encoded
This tag is a fraction and it is binary
encoded
The set of tags for representing symbols
are the numbers in the unit interval [0,1)
Number of numbers in the unit interval is
infinite.
Unique tag can be generated for each
sequence.
Cumulative distribution function of the random
variables associated with the source is used for
mapping the sequence to a tag.
Generating the tag:
Example:
Consider a three-letter alphabet A = {a1, a2, a3}
with
P(a1) = 0.7,P(a2) = 0.1, and P(a3) = 0.2.
Fx(1) = 0.7, Fx (2) = 0.8, and Fx(3) = 1. This
partitions
the unit interval as shown in Figure.
0.0
a1
0.7
0.49
0.546
a1
a1
a1
0.49
a2
0.8
0.5558
0.539
a2
0.56
a3
1.0
0.0
0.546
a3
0.70
a2
a3
0.56
a2
0.5572
a3
0.56
The interval in which the tag for a particular
sequence resides is disjoint from all
intervals in which the tag for any other
sequence may reside.
Any number in this interval can be used as
the tag(say, mid point/lower value).
the tag forms a unique representation for
the sequence. This means that the binary
representation of the tag forms a unique
binary code for the sequence.
We can show that for any sequence
x = (x1, x2,.xn)
l ( n ) l ( n 1) (u ( n 1) l ( n 1) ) FX ( xn 1)
u ( n ) l ( n 1) (u ( n 1) l ( n 1) ) FX ( xn )
u (n) l (n)
TX ( x )
2
The only information required by the tag
generation procedure is the cdf of the source,
which can be obtained directly from the probability
model.
Given a DMS A = {a1, a2, a3}
P(a1) = 0.8,
P(a2) =0.02
Obtain tag for a1 a3 a2 a1.
Ans. Tag = 0.772352
P(a3) =0.18
Given Tag Value, Probability model & length
of the block
Tag value = 0.772352
The interval containing this tag value is the
subset of every interval obtained in the
encoding process. While decoding, decode
the elements in such a way that upper and
lower limits always contain this tag value.
1.
2.
3.
4.
5.
Start with l(0) =0, u(0) =1
Let the first element is decoded as x1
Then l(1)= 0 +(1-0)Fx(x1-1)=Fx(x1-1)
u(1) =0 +(1-0)Fx(x1)=Fx(x1)
Find the value of x1 for which this interval
contains the tag value.
If we pick x1=1, the interval is [0,0.8). If
we pick x1=2, the interval is [0.8,0.82),
and if we pick x1=3, the interval is
[0.82,1.0). As 0.77232 lies in the interval
[0.0,0.8), we choose x1=1
Let the second element is x2.
l(2)
=
0 +(0.8-0)Fx(x2-1)=0.8 Fx(x2-1)
u(2)
=
0 +(0.8-0)Fx(x2)=0.8 Fx(x2)
If we pick x2=1, the updated interval is [0, 0.64,),
which does not contain the tag.
If we pick x2=2, the updated interval is [0.64,
0.656), which also does not contain the tag.
If we pick x2=3, the updated interval is [0.656,
0.8) which contains the tag.
Therefore second element is 3.
Let the third element is x3.
l (3)
u (3)
= 0.656 +(0.8-0.656) Fx(x3-1) = 0.656 + 0.144 Fx(x3-1)
= 0.656 +(0.8-0.656) Fx(x3) = 0.656 + 0.144 Fx(x3)
Subtract the value of l
(2)
from both the limits and the tag.
Find the value of x3 for which the interval [0.144 x Fx (x3-1),
0.144 x Fx (x3)) contains 0.772352 - 0.656 (Residual tag)
Divide the residual tag value by 0.144 to get 0.808.
Find the value of x3 for which 0.808 falls in the interval
[Fx(x3-1), Fx(x3) ). It is possible if x3 =2.
Then update the intervals.
Let the fourth element is x4.
(4)
(4)
Subtract the value of l
0.7712 +(0.77408-0.7712) Fx(x4-1) =
0.7712 + 0.00288 Fx(x 4-1)
0.7712 +(0.77408-0.7712) Fx(x4) =
0.7712 + 0.00288 Fx(x 4)
(3)
from both the limits and the tag.
(0.772352 - 0.7712 =0.001152)
Find the value of x4 for which the interval [0. 00288 x Fx (x4-1),
0. 00288 x Fx (x4)) contains 0.001152 (Residual tag)
Divide the residual tag value by 0. 00288 to get 0.4.
Find the value of x4 for which 0.4 falls in the interval
[Fx(x4-1), Fx(x4) ). It is possible if x4 =1.
Data is decoded as 1321 i.e., a1 a3 a2 a1
1.
2.
3.
4.
5.
Initialize l(0) = 0 & u(0) = 1
For each k find t* = (tag - l(k-1))/ (u(k-1)- l(k-1))
Fin the value of xk for which Fx (xk-1)
t*
Fx (xk)
Update u(k) and l(k)
Continue until the entire sequence has been
decoded.
T x( x )
log(1 / p( x)) 1
Eg: A ={a1, a2, a3,a4}
P(a1) = , P(a2) = , P(a3) =
Symbol
Fx
, P(a4) =
Tag
In binary
1
log p( x) 1
Code
0.5
.25
.010
01
0.75
.625
.101
101
0.875
.8125
.1101
1101
1.0
.9375
.1111
1111
Binary code obtained by truncating is uniquely decodable.
Proof: T x( x ) is a unique identifier.
It lies in the interval [Fx(x-1), Fx(x))
We have to show that T ( x) is also contained in this
x
l ( x)
interval
Since we are truncating the binary representation of T x( x )
to T ( x) , T ( x)
is less than or equal to T x( x )
l ( x)
l ( x)
1
0 Tx ( x) Tx ( x) l ( x )
l ( x) 2
Since
< Fx(x), < Fx(x).
T x( x )
Tx ( x)
l ( x)
We have to show that T ( x)
is greater than or equal to
x
l ( x)
Fx(x-1)
1
1
1
2l ( x )
log
1
p( x)
2
1
1
log
p( x)
1
p( x)
1
2
2
p( x)
p( x)
Tx ( x) Fx ( x 1)
2
1
Tx ( x) Fx ( x 1) l ( x )
2
or Tx ( x) Fx ( x 1)
l ( x)
Therefore the code Tx ( x) is a unique
l ( x)
representation of
1
2n
T x( x )
Average codeword length (Rate)
2
H ( X ) lA H ( X )
m