UNIT V Finite Word Length Effects Lecture Notes Modified

This document discusses finite word length effects in digital signal processing. It covers quantization noise, fixed point and binary floating point number representations, overflow and truncation errors, and coefficient quantization error. Examples are provided to illustrate fixed point arithmetic operations like addition, subtraction, and multiplication. Binary floating point representation and arithmetic are also explained.

Uploaded by

ramuamt

AVR 1

UNIT 5 FINITE WORD LENGTH EFFECTS

Syllabus:

Quantization noise – Derivation for quantization noise power – Fixed point and binary floating point
number representation – Comparison – Over flow error – Truncation error – Co-efficient quantization
error – Limit cycle oscillation – Signal scaling – Analytical model of sample and hold operations –
Application of DSP – Model of speech wave form – Vocoder.

Minimal Coverage:
• Quantization noise – Derivation for quantization noise power
• Fixed point and binary floating point number representation – Comparison
• Overflow error – Truncation error
• Coefficient quantization error
• Limit cycle oscillation – Signal scaling

TEXT BOOK
1. John G. Proakis and Dimitris G. Manolakis, “Digital Signal Processing: Principles, Algorithms and Applications”, 3rd Edition, PHI/Pearson Education, 2000.
REFERENCES
1. Sanjit K. Mitra, “Digital Signal Processing: A Computer-Based Approach”, 2nd Edition, Tata McGraw-Hill, 2001. (Covers much of this unit)
2. Alan V. Oppenheim, Ronald W. Schafer and John R. Buck, “Discrete-Time Signal Processing”, 2nd Edition, PHI/Pearson Education, 2000.
3. Johnny R. Johnson, “Introduction to Digital Signal Processing”, Prentice Hall of India/Pearson Education, 2002.
4. http://www-inst.eecs.berkeley.edu/~cs61c/sp06/handout/fixedpt.html
5. Alan V. Oppenheim and Ronald W. Schafer, “Digital Signal Processing” (Many good ideas were taken from this book. It is a different book from the one listed as Ref. 2.)

Fixed point and binary floating point number representation– Comparison


Representation of Numbers:
In any digital filter realization (hardware or software), the filter coefficients (the h(n) values) are stored in memory (registers), which can hold only a finite number of bits. Most filter coefficients have many digits after the decimal point (for example 0.22344556676778895432119990…) and cannot be stored exactly in a finite register, so they are either truncated or rounded off before storage. Truncation and rounding both introduce errors, which are studied in this unit.
Quantization of the filter coefficients h(n) also changes the filter characteristics (sometimes enough to make the filter completely useless).
These are called Finite Word Length Effects.
Word: Group of bits
Word Length: Size of the group of bits, for example one byte (a group of 8 binary bits)
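Binary Fixed point Representation of Numbers:
A real number X can be represented in radix r as:
$$X = (b_{-A} \ldots b_{-1} b_0 \,.\, b_1 \ldots b_B)_r = \sum_{i=-A}^{B} b_i r^{-i}$$
The digits $b_{-A} \ldots b_0$ form the integer part (A sets the number of integer digits), the digits $b_1 \ldots b_B$ form the fractional part (B is the number of fractional digits), and the radix point lies between $b_0$ and $b_1$. For binary numbers, r = 2.
We are going to consider only fractional numbers (which are very common in digital filters). Then,
$$X = 0.b_1 b_2 \ldots b_B = \sum_{i=1}^{B} b_i r^{-i}$$
If the fraction is a negative number, then,
$$X = -0.b_1 b_2 \ldots b_B = -\sum_{i=1}^{B} b_i r^{-i}$$
Negative fractions (fixed point numbers) can be represented in three ways:
• Sign Magnitude Representation: $X_{SM} = 1.b_1 b_2 \ldots b_B$ (the leading 1 denotes a negative fraction)
• 1’s Complement Representation: $X_{1C} = 1.\bar{b}_1 \bar{b}_2 \bar{b}_3 \ldots \bar{b}_B$ (every magnitude bit is complemented)
• 2’s Complement Representation: $X_{2C} = 1.\bar{b}_1 \bar{b}_2 \bar{b}_3 \ldots \bar{b}_B + 0.000 \ldots 01$ (the 1’s complement plus one least significant bit)
A positive fraction is represented identically in all three representations.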
Example:
1. Express the fractions 7/8 and −7/8 in sign magnitude, two’s complement and one’s complement format.
Solution:
$7/8 = 0.875 = 0.1110_{(2)}$, obtained by repeated multiplication by 2:
0.875 × 2 = 1.75 → 1
0.75 × 2 = 1.5 → 1
0.5 × 2 = 1.0 → 1
0 × 2 = 0 → 0
Sign Magnitude: 7/8 = 0.111, −7/8 = 1.111
1’s Complement: 7/8 = 0.111, −7/8 = 1.000
2’s Complement: 7/8 = 0.111, −7/8 = 1.001
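The three encodings in the example can be reproduced with a short Python sketch. The function name `encode_fraction` and the output format `S.bbb` (sign bit, point, fraction bits) are illustrative choices, not part of the notes, and the input is assumed to be exactly representable with the chosen number of fraction bits.

```python
from fractions import Fraction

def encode_fraction(x, frac_bits, fmt):
    """Return 'S.bbb' for x in format 'sm', '1c' or '2c' (hypothetical helper)."""
    x = Fraction(x)
    scale = 1 << frac_bits
    mag = abs(x) * scale
    if mag.denominator != 1 or mag >= scale:
        raise ValueError("x must be exactly representable with |x| < 1")
    mag = int(mag)
    if x >= 0:
        sign, bits = 0, mag                      # positive: same in all formats
    elif fmt == "sm":
        sign, bits = 1, mag                      # sign bit set, magnitude unchanged
    elif fmt == "1c":
        sign, bits = 1, (scale - 1) - mag        # complement every magnitude bit
    elif fmt == "2c":
        sign, bits = 1, (scale - mag) % scale    # 1's complement plus one LSB
    else:
        raise ValueError("fmt must be 'sm', '1c' or '2c'")
    return f"{sign}.{bits:0{frac_bits}b}"

for fmt in ("sm", "1c", "2c"):
    print(fmt, encode_fraction(Fraction(7, 8), 3, fmt),
          encode_fraction(Fraction(-7, 8), 3, fmt))
```

With 3 fraction bits this prints 0.111 for +7/8 in every format and 1.111, 1.000 and 1.001 for −7/8, matching the worked solution.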
Note:
A computer does not store a point (decimal or binary) when a fractional number is entered. Instead, the point is implied at an agreed bit position, and the stored bits are interpreted relative to that position.

Fixed Point Arithmetic:


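2’s complement operations are more common in computers because they are easy to implement (easy for a computer, NOT for us).
Adding two fixed point numbers:
$$\frac{4}{8} + \frac{1}{8} = 0.5 + 0.125 = 0.100_{(2)} + 0.001_{(2)} = 0.101_{(2)} = 0.625_{10}$$
Subtracting two fixed point numbers (add the 2’s complement of the number being subtracted):
$$\frac{4}{8} - \frac{1}{8} = 0.5 - 0.125 = 0.5 + (-0.125) = 0.100_{(2)} + 1.111_{(2)}$$
   0.100
 + 1.111
 -------
  10.011
Ignore the carry: 0.011 is the answer, which is $0.375_{10}$.

Multiplication:
0.111 × 0.001 (do not worry about the binary points for now)
     111
   × 001
   -----
     111
    000
   000
   -----
   00111
Each operand has 3 fractional digits, so place the point after 3 + 3 = 6 digits counted from the right: 0.000111.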
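The three operations above can be checked with integer arithmetic. The following is a minimal sketch that assumes a 4-bit word (1 sign bit and 3 fraction bits); the names `to_code`, `to_value`, `add` and `multiply` are illustrative.

```python
FRAC_BITS = 3                    # assumed: 1 sign bit + 3 fraction bits
WORD = 1 << (FRAC_BITS + 1)      # number of distinct 4-bit codes

def to_code(x):
    """2's complement code of x (x must be a multiple of 2**-FRAC_BITS)."""
    return int(x * (1 << FRAC_BITS)) % WORD

def to_value(code, frac_bits=FRAC_BITS):
    """Value of a 2's complement code with the given number of fraction bits."""
    word = 1 << (frac_bits + 1)
    code %= word
    if code >= word // 2:        # sign bit set: negative number
        code -= word
    return code / (1 << frac_bits)

def add(a, b):
    """Add two codes and ignore the carry out of the sign bit."""
    return (a + b) % WORD

def multiply(a, b):
    """Full-precision product of two positive codes (3 + 3 = 6 fraction bits)."""
    return a * b

print(to_value(add(to_code(0.5), to_code(0.125))))      # 4/8 + 1/8
print(to_value(add(to_code(0.5), to_code(-0.125))))     # 4/8 - 1/8
print(to_value(multiply(to_code(0.875), to_code(0.125)), frac_bits=6))
```

The subtraction works because −0.125 is stored as the code 1111, so the same adder serves both operations once the carry is dropped.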

Dynamic Range and Resolution:

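Range of a positive fraction with B fraction bits: $0 \le X \le 1 - 2^{-B}$
Range of the 2’s complement representation with B bits in total (sign bit included): $-1 \le X \le 1 - 2^{-(B-1)}$
(since the 2’s complement form represents both positive and negative fractions)
Dynamic range = $X_{MAX} - X_{MIN}$
$$\text{Resolution} = \frac{\text{Dynamic Range}}{2^{B} - 1}$$
The resolution tells what the next number in the series is, i.e. the spacing between adjacent representable numbers.
Example:
If B = 2, then considering the 2’s complement representation,
Range: $-1 \le X \le 0.5$
Dynamic Range = 0.5 − (−1) = 1.5
$$\text{Resolution} = \frac{1.5}{2^{2} - 1} = \frac{1.5}{3} = 0.5$$
The representable numbers are −1, −0.5, 0 and 0.5 (so $2^{B}$ = 4 numbers in total), with 2’s complement codes (10), (11), (00) and (01) respectively.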
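The range and resolution formulas can be verified by listing every representable value. This is a small sketch in which `total_bits` counts the sign bit, as in the B = 2 example; both function names are illustrative.

```python
def twos_complement_grid(total_bits):
    """All values of a 2's complement fraction with total_bits bits (sign included)."""
    frac_bits = total_bits - 1
    step = 2.0 ** -frac_bits
    return [k * step for k in range(-(1 << frac_bits), 1 << frac_bits)]

def resolution(values):
    """Dynamic range divided by (number of values - 1)."""
    return (max(values) - min(values)) / (len(values) - 1)

grid = twos_complement_grid(2)
print(grid)                 # the four representable numbers
print(resolution(grid))     # spacing between neighbours
```

Because the grid is uniform, the resolution equals the step between any two neighbouring values.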
Binary Floating point Representation:
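Floating point:
$$X = M \cdot 2^{E}$$
M: Mantissa and E: Exponent
$$0.125 \to 0.001 \times 2^{0} \to 0.100 \times 2^{-2} \to 0.100 \times 2^{110}$$
(In these examples the exponent appears to be written as a 3-bit sign-magnitude binary number: 110 stands for −2, 101 for −1 and 011 for +3.)
Normalization moves the binary point to just before the first 1:
Big number = small number × $2^{+ve}$, for example $101 = 0.101 \times 2^{+3}$
Small number = big number × $2^{-ve}$, for example $0.001 = 0.100 \times 2^{-2}$
Floating Point Arithmetic:
Floating Point Multiplication:
• Multiply the mantissas
• Add the exponents
• Normalize the result (move the binary point to just before the first 1)
Example:
$$X_1 = 5 = 101_{(2)} = 0.101 \times 2^{3} = 0.101 \times 2^{011}$$
$$X_2 = \frac{3}{8} = 0.375 = 0.011_{(2)} = 0.11 \times 2^{-1} = 0.110 \times 2^{-001} = 0.110 \times 2^{101}$$
$$X_1 \times X_2 = (M_1 \times M_2) \times 2^{E_1 + E_2} = (0.101 \times 0.110) \times 2^{(011 + 101)} = 0.011110 \times 2^{011 - 001}$$
$$= 0.011110 \times 2^{010} = 0.111100 \times 2^{001}$$
Floating Point Addition:
If the exponent is the same for both numbers, add the mantissas as in fixed point addition.
If the exponents differ, shift the mantissa of the smaller number to the right until its exponent equals that of the larger number. Taking the previous example:
$$X_1 + X_2 = 0.101 \times 2^{011} + 0.110 \times 2^{101} = 0.101 \times 2^{011} + 0.110 \times 2^{-001}$$
Shifting the smaller number step by step:
$$0.110 \times 2^{-1} = 0.0110 \times 2^{0} = 0.00110 \times 2^{1} = 0.000110 \times 2^{2} = 0.0000110 \times 2^{3}$$
$$X_1 + X_2 = 0.101 \times 2^{011} + 0.0000110 \times 2^{011}$$
Now the exponents are the same, so we can add the mantissas:
$$X_1 + X_2 = 0.1010110 \times 2^{011}$$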
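The multiplication and addition rules can be sketched in Python with exact fractions. Each number is modelled as a pair (mantissa, exponent) with a signed integer exponent; this is an illustrative model, it does not reproduce any particular machine format, and it keeps all mantissa bits (no rounding).

```python
from fractions import Fraction

def normalize(m, e):
    """Shift the mantissa until 0.5 <= |m| < 1, adjusting the exponent."""
    if m == 0:
        return m, 0
    while abs(m) >= 1:
        m, e = m / 2, e + 1
    while abs(m) < Fraction(1, 2):
        m, e = m * 2, e - 1
    return m, e

def fp_multiply(a, b):
    """Multiply the mantissas, add the exponents, then normalize."""
    (m1, e1), (m2, e2) = a, b
    return normalize(m1 * m2, e1 + e2)

def fp_add(a, b):
    """Align the smaller exponent to the larger one, then add the mantissas."""
    (m1, e1), (m2, e2) = a, b
    if e1 < e2:
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    m2 = m2 / 2 ** (e1 - e2)        # right-shift the smaller number
    return normalize(m1 + m2, e1)

x1 = (Fraction(5, 8), 3)     # 0.101 x 2^3  = 5
x2 = (Fraction(3, 4), -1)    # 0.110 x 2^-1 = 0.375
print(fp_multiply(x1, x2))   # mantissa 15/16 = 0.1111, exponent 1
print(fp_add(x1, x2))        # mantissa 43/64 = 0.1010110, exponent 3
```

Both results agree with the worked example: 0.1111 × 2^1 = 1.875 and 0.1010110 × 2^3 = 5.375.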
Comparison of Floating Point and Fixed Point Representation:
S.No | Fixed Point | Floating Point
1. | Advantage: faster computation; fixed range and precision | Disadvantage: relatively slower computation; variable range and precision
2. | Disadvantage: poor dynamic range; uniform resolution throughout the numbers | Advantage: larger dynamic range; fine resolution for small numbers but coarser resolution for large numbers
3. | Example: 0.375 → 0.011 | Example: 0.375 → $0.011 \times 2^{0} = 0.11 \times 2^{-1} = 0.11 \times 2^{101}$
   | −0.375 → 1.101 | −0.375 → $1.011 \times 2^{0} = 1.11 \times 2^{-1} = 1.11 \times 2^{101}$

Overflow Error and Truncation Error


Overflow:
Overflow occurs during arithmetic operations when the result exceeds the representable range.
Fixed point addition and overflow:

When two positive numbers are added and the sum exceeds the range, the sign bit becomes one and the result looks like a negative number. This is overflow. To handle it, a result that exceeds the range should be replaced by the maximum representable number (saturation).
A register can also overflow when two numbers (fixed point or floating point) are multiplied: the product of two b-bit numbers needs 2b bits and cannot be stored in a b-bit register. The product is therefore shortened to b bits by truncation or rounding (this is also called quantization).
For fractional numbers, the 2b-bit product can be quantized back to b bits without much loss of accuracy, because only the least significant bits are discarded.
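The difference between an unhandled overflow and the saturation remedy described above can be seen in a small sketch. It assumes a 2's complement word with 3 fraction bits and works on signed integer codes (code = value × 8); the function names are illustrative.

```python
FRAC_BITS = 3                       # assumed word: 1 sign bit + 3 fraction bits
MAX_CODE = (1 << FRAC_BITS) - 1     # 0.111 =  0.875
MIN_CODE = -(1 << FRAC_BITS)        # 1.000 = -1.0

def add_wrap(a, b):
    """2's complement addition with no overflow handling (result wraps around)."""
    span = 1 << (FRAC_BITS + 1)
    return (a + b - MIN_CODE) % span + MIN_CODE

def add_saturate(a, b):
    """Addition that clamps an out-of-range result to the nearest limit."""
    return max(MIN_CODE, min(MAX_CODE, a + b))

def value(code):
    return code / (1 << FRAC_BITS)

a, b = 5, 4                          # 0.625 and 0.5; the true sum 1.125 is out of range
print(value(add_wrap(a, b)))         # sign bit flips: the sum looks negative
print(value(add_saturate(a, b)))     # clamped to the maximum value
```

Wraparound turns 1.125 into −0.875, an error of 2, whereas saturation limits the error to 0.25 in this case.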

Truncation and Rounding:


Rounding and truncation both cause an error, defined as follows:
Error = Truncated value (or rounded value) − Original value
In general, Error = Quantized value − Original value:
$$e = Q[x] - x$$
Note:
Truncation of a positive number always causes a negative (or zero) error, because truncation can only reduce the value.
Rounding may cause either a positive or a negative error.
$b_1$: Number of bits before quantization
$b$: Number of bits after quantization
$b < b_1$ (always)

Range of Truncation Error in Fixed Point:


Positive Number:
$$-(2^{-b} - 2^{-b_1}) \le e \le 0 \;\to\; -2^{-b} \le e \le 0 \quad \text{since } 2^{-b} \gg 2^{-b_1}$$

Sign-Magnitude Representation Negative Number:
$$0 \le e \le (2^{-b} - 2^{-b_1}) \;\to\; 0 \le e \le 2^{-b} \quad \text{since } 2^{-b} \gg 2^{-b_1}$$

1’s Complement Represented Negative Number:
$$0 \le e \le (2^{-b} - 2^{-b_1}) \;\to\; 0 \le e \le 2^{-b} \quad \text{since } 2^{-b} \gg 2^{-b_1}$$

2’s Complement Represented Negative Number:
$$-(2^{-b} - 2^{-b_1}) \le e \le 0 \;\to\; -2^{-b} \le e \le 0 \quad \text{since } 2^{-b} \gg 2^{-b_1}$$

Range of Rounding Error in Fixed Point:


The rounding error range is the same for positive and negative numbers, in any representation:
$$-\frac{1}{2}(2^{-b} - 2^{-b_1}) \le e \le \frac{1}{2}(2^{-b} - 2^{-b_1}) \;\to\; -\frac{1}{2} 2^{-b} \le e \le \frac{1}{2} 2^{-b}$$
Fig: 5.1 Quantization errors (scan from A V O)
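The fixed point error ranges can be checked by brute force: quantize every $b_1$-bit fraction to $b$ bits and record the smallest and largest error $e = Q[x] - x$. The sketch below is illustrative; the mode names are assumptions, `trunc_sm` (truncation of the magnitude) also covers 1’s complement negative numbers, and rounding is taken as round-half-up.

```python
import math
from fractions import Fraction

def quantize(x, b, mode):
    """Quantize x to b fraction bits."""
    scaled = Fraction(x) * 2 ** b
    if mode == "trunc_2c":          # 2's complement truncation: round toward -infinity
        q = math.floor(scaled)
    elif mode == "trunc_sm":        # magnitude truncation: round toward zero
        q = math.trunc(scaled)
    elif mode == "round":           # round to nearest, halves rounded up (assumption)
        q = math.floor(scaled + Fraction(1, 2))
    else:
        raise ValueError("unknown mode")
    return Fraction(q, 2 ** b)

def error_range(b1, b, mode, negative=False):
    """Smallest and largest error over all nonzero b1-bit fractions of one sign."""
    errors = []
    for k in range(1, 2 ** b1):
        x = Fraction(-k if negative else k, 2 ** b1)
        errors.append(quantize(x, b, mode) - x)
    return min(errors), max(errors)

print(error_range(6, 3, "trunc_2c"))                  # positive numbers
print(error_range(6, 3, "trunc_sm", negative=True))   # sign-magnitude negatives
print(error_range(6, 3, "trunc_2c", negative=True))   # 2's complement negatives
print(error_range(6, 3, "round"))
```

With $b_1 = 6$ and $b = 3$ the truncation extremes are ±7/64, which equals $2^{-b} - 2^{-b_1}$, and the rounding errors stay within $\pm\frac{1}{2} 2^{-b}$.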
Range of Truncation Error in Floating Point: (Hint: multiply the fixed point range by 2)
Positive Number:
$$-2 \cdot 2^{-b} \le e \le 0$$
Sign-Magnitude Representation Negative Number:
$$0 \le e \le 2 \cdot 2^{-b}$$
1’s Complement Represented Negative Number:
$$0 \le e \le 2 \cdot 2^{-b}$$
2’s Complement Represented Negative Number:
$$-2 \cdot 2^{-b} \le e \le 0$$
Range of Rounding Error in Floating Point:
$$-2 \cdot \frac{2^{-b}}{2} \le e \le 2 \cdot \frac{2^{-b}}{2} \;\to\; -2^{-b} \le e \le 2^{-b}$$

Quantization Noise and Quantization Noise Power:


(or Quantization in Sampling Analog Signals)
A sampled signal must be quantized before its samples can be represented in a digital system:
$$x(n) = x_a(nT)$$
In theory, an infinite number of bits is required to represent each sample exactly, so each sample must be truncated or rounded to fit a finite length register. Assume the samples are represented as 2’s complement fixed point fractions of length b + 1 bits (b fraction bits and 1 sign bit), quantized by rounding, and denoted $\hat{x}(n)$.
The error due to quantization can be modeled as additive noise.
Fig: 5.2 a) Sampler (Nonlinear Model) and b) Additive Noise Model for Quantization (Statistical Model)

$$\hat{x}(n) = Q[x(n)] = x(n) + e(n) \quad (\text{Recall: } e = Q[x] - x)$$

Probability density functions are given for rounding and truncation:

Fig: 5.3 Probability density functions for a) rounding and b) truncation (assumed)
(Statistical Characterization of Quantization errors)


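Mean or average noise:
$$m_e = E(e) = \int_{-\infty}^{\infty} e \, p_e(e) \, de$$
Noise variance (noise power):
$$\sigma_e^2 = E\left((e - m_e)^2\right) = \int_{-\infty}^{\infty} (e - m_e)^2 \, p_e(e) \, de$$
When the mean is zero, this reduces to
$$\sigma_e^2 = E(e^2) = \int_{-\infty}^{\infty} e^2 \, p_e(e) \, de$$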
Mean and Variance of Rounding Noise:
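For rounding, the error is uniformly distributed over $[-\Delta/2, +\Delta/2]$ with density $1/\Delta$, where $\Delta = 2^{-b}$.
$$m_e = E(e) = \int_{-\Delta/2}^{+\Delta/2} e \cdot \frac{1}{\Delta} \, de = \frac{1}{\Delta} \left[ \frac{e^2}{2} \right]_{-\Delta/2}^{+\Delta/2} = \frac{1}{2\Delta} \left[ \frac{\Delta^2}{4} - \frac{\Delta^2}{4} \right] = 0$$
Noise power:
$$\sigma_e^2 = \int_{-\Delta/2}^{+\Delta/2} e^2 \cdot \frac{1}{\Delta} \, de = \frac{1}{\Delta} \left[ \frac{e^3}{3} \right]_{-\Delta/2}^{+\Delta/2} = \frac{1}{3\Delta} \left[ \frac{\Delta^3}{8} - \left( \frac{-\Delta^3}{8} \right) \right] = \frac{1}{3\Delta} \left[ \frac{\Delta^3}{8} + \frac{\Delta^3}{8} \right]$$
$$= \frac{1}{3\Delta} \left[ \frac{2\Delta^3}{8} \right] = \frac{\Delta^2}{12} = \frac{2^{-2b}}{12}$$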
Mean and Variance of Truncation Noise:
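Similarly proceeding, with the truncation error uniformly distributed over $[-\Delta, 0]$:
$$m_e = -\frac{\Delta}{2} = -\frac{2^{-b}}{2}; \qquad \sigma_e^2 = \frac{\Delta^2}{12} = \frac{2^{-2b}}{12}$$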
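The mean and variance results for rounding and truncation noise can be checked numerically without random numbers. The sketch below quantizes every value on a fine grid ($b_1$ bits) to $b$ bits and averages the errors; the grid stands in for a uniformly distributed input, which is an assumption of this check, and the function name is illustrative.

```python
import math

def error_statistics(b, b1, mode):
    """Mean and variance of e = Q[x] - x over all b1-bit fractions in [-1, 1)."""
    delta = 2.0 ** -b
    errors = []
    for k in range(-(2 ** b1), 2 ** b1):
        x = k / 2 ** b1
        if mode == "round":
            q = math.floor(x / delta + 0.5) * delta
        else:                      # 2's complement truncation
            q = math.floor(x / delta) * delta
        errors.append(q - x)
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, variance

b, b1 = 3, 12
delta = 2.0 ** -b
for mode in ("round", "truncate"):
    mean, var = error_statistics(b, b1, mode)
    print(mode, mean / delta, var / (delta ** 2 / 12))
```

The printed ratios come out close to 0 and 1 for rounding and close to −0.5 and 1 for truncation. The small deviations arise because the grid is finite (the $2^{-b_1}$ terms neglected in the error ranges).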
Output Quantization Noise Power:
Whenever quantized signals are processed by a digital system (such as a filter), the input error (or noise) manifests itself as an error (or noise) in the resulting output.
Actual input to the system: x(n) + e(n) (Statistical Modeling)
Actual output of the system: y(n) + f(n)
Because the system is linear, the two inputs can be treated separately: x(n) → System → y(n) and e(n) → System → f(n). Ignoring the input signal x(n) and considering only e(n) as the input, the output noise power can be calculated.

Mean of the output:
$$E[y(n) + f(n)] = E\left\{ \sum_{k=-\infty}^{+\infty} h(k) x(n-k) + \sum_{k=-\infty}^{+\infty} h(k) e(n-k) \right\}$$
Ignoring the unquantized input x(n), since we need to find only the effect of quantization noise on the output:
$$m_f = E[f(n)] = E\left[ \sum_{k=-\infty}^{+\infty} h(k) e(n-k) \right] = \sum_{k=-\infty}^{+\infty} h(k) E[e(n-k)] = \sum_{k=-\infty}^{+\infty} h(k) \, m_e$$
$$m_f = m_e \sum_{n=-\infty}^{\infty} h(n) = m_e \sum_{n=-\infty}^{\infty} h(n) e^{-j0n} = m_e H(e^{j0})$$
since
$$H(e^{j\omega}) = \sum_{n=-\infty}^{\infty} h(n) e^{-j\omega n}$$
Similarly, the output noise power is
$$\sigma_f^2 = \sigma_e^2 \sum_{n=-\infty}^{\infty} |h(n)|^2 = \frac{\sigma_e^2}{2\pi} \int_{-\pi}^{\pi} |H(e^{j\omega})|^2 \, d\omega = \frac{\sigma_e^2}{2\pi j} \oint H(z) H(z^{-1}) z^{-1} \, dz$$
using Parseval’s relation:
$$\sum_{n=-\infty}^{\infty} |h(n)|^2 = \frac{1}{2\pi j} \oint H(z) H(z^{-1}) z^{-1} \, dz$$
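The equality of the time-domain and frequency-domain forms of the noise gain can be checked numerically. The sketch below compares $\sum |h(n)|^2$ with a Riemann sum of $\frac{1}{2\pi}\int |H(e^{j\omega})|^2 d\omega$ for a finite impulse response; the example response (a truncated $0.5^n$ sequence) and the function names are illustrative assumptions.

```python
import cmath
import math

def noise_gain_time(h):
    """Sum of |h(n)|^2, the factor that multiplies the input noise power."""
    return sum(abs(v) ** 2 for v in h)

def noise_gain_freq(h, points=512):
    """(1/2pi) * integral of |H(e^jw)|^2 over [-pi, pi), as a uniform Riemann sum."""
    total = 0.0
    for i in range(points):
        w = -math.pi + 2 * math.pi * i / points
        H = sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(h))
        total += abs(H) ** 2
    return total / points          # d(omega) / (2 pi) = 1 / points

h = [0.5 ** n for n in range(20)]  # truncated first-order impulse response
print(noise_gain_time(h))
print(noise_gain_freq(h))
```

Both values are approximately 4/3, so for this filter the output noise power is about 1.33 times the input noise power.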

Co-efficient quantization error –


IIR Filter Coefficient quantization:
An IIR filter implementation requires multiplications by the filter coefficients and additions.

In fixed point arithmetic:
• The products must be rounded or truncated.
• Additions need no rounding or truncation, but they may cause overflow.
In floating point arithmetic:
• Rounding or truncation is needed for both addition and multiplication.
Rounding and truncation are nonlinear operations.
Quantization of IIR filter coefficients causes:
• Non-linearity in the filter structure.
  o It is very difficult to overcome this non-linearity to achieve a good filter.
• Zero-input limit cycle oscillations:
If a stable digital filter implemented with infinite precision arithmetic (practically impossible) has an input that is zero for n greater than some value $n_0$, the output for $n > n_0$ decays asymptotically toward zero.

Fig: 5.4a No limit cycles with unquantized filter coefficients (x(n) → Filter → y(n))

Fig: 5.4b Limit cycles with quantized filter coefficients (x(n) → Filter → y(n))
For the same filter implemented with finite register length arithmetic (quantization), the output may decay only to a nonzero amplitude range, after which it shows oscillatory behaviour. This effect is often referred to as zero-input limit cycle behaviour. (It is due to the nonlinear quantizers in the feedback loop of the filter.)
Example:

Fig: 5.5a Ideal linear system (x(n) → adder → y(n))

Fig: 5.5b Nonlinear system due to quantization (quantizer Q[ ] in the loop)

Assume α = 0.5, x(n) is an impulse of value 7/8, the word length is b = 4 = (3 + 1) bits (3 fraction bits and 1 sign bit), and quantization is done by rounding.
Ideal system: $y(n) = x(n) + \alpha y(n-1)$
y(0) = 7/8 + 0.5 × 0 = 0.875
y(1) = 0 + 0.5 × 0.875 = 0.4375
y(2) = 0 + 0.5 × 0.4375 = 0.21875
y(3) = 0 + 0.5 × 0.21875 = 0.109375
y(4) = 0 + 0.5 × 0.109375 = 0.0546875
The output y(n) decays toward zero.

Quantized system: $y(n) = x(n) + Q[\alpha y(n-1)]$
$y(0) = 7/8 + Q(0.5 \times 0) = 0.875 = 0.111_{(2)}$
$y(1) = 0 + Q(0.100_{(2)} \times 0.111_{(2)}) = Q(0.011100) = 0.100_{(2)} = 0.5$
$y(2) = 0 + Q(0.100_{(2)} \times 0.100_{(2)}) = Q(0.010000) = 0.010_{(2)} = 0.25$
$y(3) = 0 + Q(0.100_{(2)} \times 0.010_{(2)}) = Q(0.001000) = 0.001_{(2)} = 0.125$
$y(4) = 0 + Q(0.100_{(2)} \times 0.001_{(2)}) = Q(0.000100) = 0.001_{(2)} = 0.125$
$y(5) = 0 + Q(0.100_{(2)} \times 0.001_{(2)}) = Q(0.000100) = 0.001_{(2)} = 0.125$
(Plotted in Fig: 5.6)
The output y(n) stays at 0.125 even though the input is zero. Because the output never decays below this level, the behaviour is called a limit cycle oscillation (for this positive α the limit cycle is a constant; with a negative α the sign of the output may alternate).
Also try the above example with α = −0.5.
The amplitude of the limit cycle is confined to a particular range, called the dead band. In this problem, the dead band is $-2^{-b}$ to $+2^{-b}$ with b = 3 fraction bits, i.e. −0.125 to +0.125.
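The limit cycle example can be simulated directly. The sketch below implements $y(n) = x(n) + Q[\alpha y(n-1)]$ with exact fractions; it assumes the product is rounded in magnitude (halves rounded away from zero), a convention that matters only for negative values, and the function names are illustrative.

```python
import math
from fractions import Fraction

def round_magnitude(x, frac_bits):
    """Round |x| to frac_bits fraction bits (halves round up) and restore the sign."""
    scale = 2 ** frac_bits
    q = Fraction(math.floor(abs(x) * scale + Fraction(1, 2)), scale)
    return -q if x < 0 else q

def first_order_response(alpha, x, frac_bits=None):
    """Output of y(n) = x(n) + Q[alpha*y(n-1)]; frac_bits=None means no quantizer."""
    y_prev = Fraction(0)
    out = []
    for sample in x:
        product = alpha * y_prev
        if frac_bits is not None:
            product = round_magnitude(product, frac_bits)
        y_prev = sample + product
        out.append(y_prev)
    return out

impulse = [Fraction(7, 8)] + [Fraction(0)] * 5
ideal = first_order_response(Fraction(1, 2), impulse)
quantized = first_order_response(Fraction(1, 2), impulse, frac_bits=3)
print([float(v) for v in ideal])
print([float(v) for v in quantized])    # [0.875, 0.5, 0.25, 0.125, 0.125, 0.125]
print([float(v) for v in
       first_order_response(Fraction(-1, 2), impulse, frac_bits=3)])
```

With α = −0.5 and this rounding convention, the output settles into an alternating ±0.125 pattern instead of a constant.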

Fig: 5.6 Limit Cycle Oscillations


To find Dead band for General First Order Filter:
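$$|e| = |Q(x) - x| \le \frac{1}{2} 2^{-b} \quad (\text{for rounding error})$$
$$\left| Q[\alpha y(n-1)] - \alpha y(n-1) \right| \le \frac{1}{2} 2^{-b}$$
During a limit cycle, $Q[\alpha y(n-1)] = y(n-1)$ (refer to the example before), so
$$\left| y(n-1) - \alpha y(n-1) \right| \le \frac{1}{2} 2^{-b}$$
$$|y(n-1)| \, (1 - |\alpha|) \le \frac{1}{2} 2^{-b}$$
(For negative α the output alternates in sign, $Q[\alpha y(n-1)] = -y(n-1)$, and the same bound results.)
Dead band or amplitude of the limit cycle:
$$|y(n-1)| \le \frac{\frac{1}{2} 2^{-b}}{1 - |\alpha|}$$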
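As a quick numerical check of the dead band formula, the sketch below evaluates it for the earlier example (α = 0.5 with 3 fraction bits). The function name `dead_band` is illustrative.

```python
def dead_band(alpha, frac_bits):
    """Upper bound on the limit cycle amplitude of a first-order filter with rounding."""
    return 0.5 * 2.0 ** -frac_bits / (1.0 - abs(alpha))

print(dead_band(0.5, 3))     # 0.125, the level reached in the worked example
print(dead_band(-0.5, 3))    # same bound for a negative coefficient
print(dead_band(0.75, 3))    # a pole closer to the unit circle widens the dead band
```

The bound grows as |α| approaches 1, so filters with poles near the unit circle can sustain larger limit cycles for the same word length.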
Statistical Model of Quantization Noise (Fixed Point Representation):
• For simple filters, it is easy to analyze the non-linearity due to quantization (limit cycle behaviour) for simple inputs (such as an impulse input). For higher order filters and complex inputs, this analysis is difficult.
• Hence an approximate analysis is carried out in those cases. The effect of quantization is modeled as additive noise (Statistical Model).
• The filter then becomes a linear filter, but the output has a noise component (noise power) due to rounding (or truncation) of the input sequence.

Fig: 5.6 Statistical Model for fixed point roundoff noise in a first order IIR filter (the noise e(n) is added to the signal path from x(n) to y(n))
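Assumptions about the noise:
• e(n) is a white noise sequence
• e(n) has a uniform distribution over one quantization interval
Roundoff noise error range:
$$-\frac{1}{2} 2^{-b} \le e(n) \le \frac{1}{2} 2^{-b}$$
$$m_e = 0$$
Input noise power:
$$\sigma_e^2 = \frac{2^{-2b}}{12}$$
Output noise power:
$$\sigma_f^2 = \sigma_e^2 \sum_{n=-\infty}^{\infty} |h(n)|^2$$
If $h(n) = \alpha^n u(n)$ (the impulse response is assumed to be the same for both signal and noise), then
$$\sigma_f^2 = \sigma_e^2 \sum_{n=-\infty}^{\infty} \alpha^{2n} \left( u(n) \right)^2 = \sigma_e^2 \sum_{n=0}^{\infty} \alpha^{2n}$$
Since $\sum_{n=0}^{\infty} c^n = \frac{1}{1-c}$ for $|c| < 1$,
$$\sigma_f^2 = \sigma_e^2 \left( \frac{1}{1 - \alpha^2} \right) = \frac{2^{-2b}}{12} \cdot \frac{1}{1 - \alpha^2}$$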
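The closed-form output noise power can be compared with a direct summation of the series. The following sketch assumes rounding with b fraction bits and $|\alpha| < 1$; the function names are illustrative.

```python
def roundoff_noise_power(b):
    """Input noise power 2^(-2b) / 12 for rounding to b fraction bits."""
    return 2.0 ** (-2 * b) / 12.0

def first_order_output_noise(alpha, b, terms=None):
    """Output noise power for h(n) = alpha^n u(n).

    terms=None uses the closed form; otherwise the series is summed
    over the first `terms` samples of the impulse response.
    """
    sigma_e2 = roundoff_noise_power(b)
    if terms is None:
        return sigma_e2 / (1.0 - alpha ** 2)
    return sigma_e2 * sum(alpha ** (2 * n) for n in range(terms))

print(first_order_output_noise(0.5, 3))             # closed form
print(first_order_output_noise(0.5, 3, terms=50))   # partial sum of the series
print(first_order_output_noise(0.9, 3))             # larger alpha, more output noise
```

For α = 0.5 and b = 3 both forms give about 1/576, which is 4/3 times the input noise power of 1/768.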
Overflow Oscillations:
Sometimes, due to overflow, the output reaches the maximum value. The filter output then oscillates with maximum amplitude. These are called overflow oscillations.

Scaling to prevent Overflow:


To avoid overflow, the signal at each node in the filter must be constrained to a magnitude less than unity.
Let x(n) be the filter input, $y_k(n)$ the output of the kth node, and $h_k(n)$ the impulse response from the input to the kth node in the filter.
$$y_k(n) = \sum_{r=-\infty}^{+\infty} h_k(r) \, x(n-r)$$
If $x_{max}$ denotes the maximum of the absolute value of the input, then
$$|y_k(n)| \le \sum_{r=-\infty}^{+\infty} |h_k(r)| \, x_{max} \;\to\; |y_k(n)| \le x_{max} \sum_{r=-\infty}^{+\infty} |h_k(r)|$$
To avoid overflow, $|y_k(n)| < 1$, which is guaranteed if
$$x_{max} \sum_{r=-\infty}^{+\infty} |h_k(r)| < 1$$
$$x_{max} < \frac{1}{\sum_{r=-\infty}^{+\infty} |h_k(r)|}$$
For an FIR filter, this reduces to:
$$x_{max} < \frac{1}{\sum_{r=0}^{M-1} |h_k(r)|}$$
In most cases, the input must be scaled (reduced by a constant factor, x → a x with a < 1) according to the above equation to guarantee that no overflow occurs.
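The scaling bound can be evaluated for a given impulse response. The sketch below uses the sum of absolute values of the impulse response, which is what guarantees the bound for every possible input; the example coefficients and the function names are illustrative.

```python
def max_safe_input(h):
    """Bound 1 / sum(|h(r)|): inputs strictly below it in magnitude cannot overflow."""
    return 1.0 / sum(abs(v) for v in h)

def worst_case_output(h, x_max):
    """Largest possible |y(n)| when |x(n)| <= x_max."""
    return x_max * sum(abs(v) for v in h)

h = [0.5, 1.0, 0.5]                 # hypothetical FIR impulse response
bound = max_safe_input(h)
print(bound)                        # 0.5
print(worst_case_output(h, 0.4))    # stays below 1, so no overflow
print(worst_case_output(h, 0.6))    # exceeds 1, so overflow is possible
```

This bound is conservative: it assumes the worst-case input sequence, so a signal that never takes that shape could tolerate a larger scale factor.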

FIR Filter Coefficient quantization:


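No limit cycle effect (because an FIR filter has no feedback).
Statistical Model of Quantization Noise (Fixed Point Representation):
$$\sigma_f^2 = M \sigma_e^2 = M \, \frac{2^{-2b}}{12}$$
Here M is taken to be the filter length, assuming one rounded product per coefficient.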

Applications of DSP:
Noise Cancellation
Echo Cancellation and generation
Speech signal processing
Image Processing
