6.0 INTRODUCTION
374
‘As we saw in Chapter 5,an LTI system with a rational system function bas the property
that the input and output sequences satisfy linear constant-coefficient difference equa-
tion, Singe the system function is the ¢-transform of the impulse response, and since the
difference equation satisfice by the input and output can be determined by inspection
Of the system function, follows that the difference equation, the impulse response, and
the system function are equivalent characterizations of the input-output relation of an
LU discrete-time system. When such systems arc implemented with discrete-time ana-
og or digital hardware. the difference equation or the system function representation
must be converted to an algorithm or structure that can be realized in the desired tech-
nology. As we will see:in this chapter, systems deseribed by lineztr constant-coefficient
difference equations can be represented hy structures consisting of an interconnection
of the basic operations of addition, multiplication by a comslunt, and delay, the exac!
implementation of which is dictated by the technology to be used.
Asan illustration of the computation associated with 2 difference equation, con-
sider the system described by the system function
by + yet "
Hg = TE. tl > ll. (6)
“The impulse response of this system is
ho] = Byala) + da" tuln 1, (62)
and the ISLorder difference equation that is satisfied by the input and ontpat
sequences is
ylnd = aya — 1] = boxtad + bieln — 1. (63)Section 6.1 Block Diagram Representation of Linear Constant-Coefficient Difference Equations 375
‘Fquation (6.2) gives a formula for the impulse response for this system. However,
since the system impulse response has infinite duration, even if we only wanted to
compute the output over a finite interval, it would not be efficient to do so by discrete
convolution since the amount of computation required to compute y{n] would grow
with 2. However, rewriting Eq. (6.3) in the form
yin} = ayln — 1] + boxtn] + yzin = 1] (64)
provides the basis for an algorithm for recursive computation of the output at any
time n in terms of the previous output yl — 1}, the current input sample x[7], and
the previous input sample x[v ~ 1. As discussed in Section 2.5, if we further assume
initial-rest conditions (i.e. if x\x] = 0 for < 0,then y[n} = 0 for n = 0), and if
swe use Eq. (64) as a recurrence formula for computing the output from past values
of the output and present and past values of the input, the system will be linear and
time invariant. A similar proceclure can be applied to the more general case of an
"order difference equation. However, the algorithm suggested by Eq. (6.4), and
generalization for higher-order difference equations is not the only computational
algorithm for implementing a particular system, and often, itis not the best choice. As
‘we will see. an unlimited variety of computational structures result in the same relation
between the input sequence x{n] and the output sequence y[n}
Inthe remainder of this chapter. we consider the important issuesin the implemen-
tation of LT! discrete-time systerns. We first present the block diagram and signal flow
graph descriptions of computational structures for linear constant-coetficient difference
equations representing LT causal systems.' Using a combination of algebraic manipu-
ations and manipulations of block diagram representations, we derive a number of basic
equivalent structuses for implementing a causal LTT system including lattice structures,
Although two structures may be equivalent with regard to their input-output character
istics for infinite-precision representations of coefficients and variables, they may have
vastly different behavior when the numerical precision is limited. This is the major rea-
son that it is of interest to study different implementation structures, The effects of finite-
precision representation of the system coefficients and the effects of truncation orround-
ing of intermediate computations are considered in the latter sections of the chapter.
6.1 BLOCK DIAGRAM REPRESENTATION OF LINEAR
CONSTANT-COEFFICIENT DIFFERENCE EQUATIONS
The implementation of an ITT discrete-time system by iteratively evaluating a recur-
ence formula obtained from a difference equation requires that delayed values of the
‘output, input, and intermediate sequences be available. The delay of sequence values
implies the need for storage of past sequence values, Also, we must provide means for
multiplication of the delayed sequence values by the coefficients, as well as means for
adding the resulting products. Therefore, the basic elements required for the implemen-
tation of an LTT discrete-time system are adders, multipliers, and memory for storing
#Such flow graphs are also called “networks” ip analogy tovtectrical circuit diagrams. We shall use the
terms low graph, structure, and net work interchangeably with respect gruphic representations of difference
equationsChapter Structures for Discrete-Time Systems
vale
ze) wb + abel
wal ei
LL ,Frizure 6.4 Block ciagram symbols
tn Tie 71] (A) Addition of two sequences.
. (b) Multiplication of a sequence by a
© ‘constant, (t) Unit delay.
delayed sequence values and coefficients. The interconnection of these basic elements
is conveniently depicted by block diagrams composed of the: busic pictorial symbols
shown in Figure 6.1, Figure 6, 1(a) represents the addition of two sequences. In general
block diagram notation. an adder may have any number of inputs. However, in almost
all practical implementations. adders have only two inpuls. In all the diagrams of this
chapter, we indicate this explicitly by limiting the number of inpuls as in Figure 6.1(a}.
Figure 6.1(b) depicts multiplication of a sequence by a constant, and Figure 6.1(c) de-
picts delaying a sequence by one sample, In digital implementations, the delay operation
can be implemented by providing a storage register for each unit delay that is required,
For this reason, we sometimes refer to the operator of Figure 6.1(c} as a delay register.
In analog disercte-time implementations such as switched-capacitor filters, the delays
are implemented by charge storage devices. The unit delay system is represented in Fi
ure 6.1(c) by its system function, 4, Delays of more than one sample can be denoted
as in Figure 6.1(¢), with a system function of 2, where M is the number of samples
of delay; however, the actual implementation of Af samples of delay would generally
be done by cascading M unit delays, In an integrated-ciscuit implementation, these unit
delays might form a shift register that is clocked at the sampling rate of the input signal.
In a software implementation, M cascaded unit delays would be implemented as M
consecutive memory registers
Example 6.1 Block Diagram Representation of a Difference
Equation
Asan example of the representation of a difference equation in terms of the elements
in Figure 6.t, consider the 2-order difference equation
F vin} = ayy — 1] + any — 2] + boat 5)
© The corresponding system function is
HG 6.6)
Tae =a
The block diagram representation ofthe systemt realization based oa Eq. (6.5) isshowa
in Figure 6.2, Such diagrams give a pictorial representation of a computational a
1 gorithm for ienplementing the system. When the system is implemented on either 4Section 6.1
Blook Diagram Representation of Linear Constant-Coefficient Difference Equations 377
©) general-purpose computer or a digital signal processing (DSP) chip, network structures
|| Stich asthe one shown in Figure 6.2serveas the basis for a program that implements the
system. Ifthe system isimplemented with discrete components or asacomplete s
with very large-scale integration (VLSI technology, the block diagram is the basis for
determining a hardware architecture for the system. In both eases, diagrams such as
| Figure 6.2 show expiicitly that we must provide storage for the delayed variables (in
this case, yln—1Tand yr —2)) and also the coetficients ofthe difference equation (in
this case, uj,.<), and hy). Furthermore, we see ftom Figure 62 that an output sequence
value yf] is computed by first forming the products ayn — 1] and ayy {4 ~ 2}. then
adding them, and, finally, adding the result to box[n, Thas, Figure 62 conveniently
depicts the complexity of the associated computational algorithm, the steps of the
algorithm, and the amount of hardware required to realize the system.
‘ yfn-2]
Figure 6.2 Example of a block diagram representation ofa difference equation.
Example 6.1 can be generalized to higher-order difference equations of the form?
x «
vit = Cayln - £1] =) bexin - (67)
‘
with the corresponding system function
u
Dobe
HQ = _. (68)
1-Yae*
Rewriting Eq, (6.7) as a recurrence formula for y{n]in terms of a linear combination of
past values of the output sequence and current and past values of the input sequence
2 the form used in previous chaprers fora general ND order difference equation was
y w
Vevte a= Sie 4
= f=
Tn the remainder of the book. it wil be more convenient 10 use the form in Eq. (67), where the ooetficient
Of y{n) is normalized to unity und the coefficients asociated with the delayed output appear with a positive
sigh afer they have been moved to the right-hand side of the equstion. (See Eq. (6.9).)378
Ghapter6 Structures for Discrete-Time Systems.
Jeads to the relation
8
yn] = Sayin — 41+ 7 basta — kh (69)
i
‘The block diagram of Figure 6.3 is an explicit pictorial representation of Eq, (6.9)
More precisely, it represents the pair of difference equations
vin] = )Sbexin #1, (6.10a)
im
vlad = So ary
mi
“The assumption of a two-input adder implies that the additions are done in a specified
order, hat is, Figure 6.3 shows that the products ay yu — NJ and ay-iyta — N +1]
must be computed, then added, and the resulting sum added to ay—2¥ln — N +2], and
soon, After y[n] has been computed, the delay variables must be updated by moving
o{n — N + LJ into the register holding yl — NJ, and so on, with the newly computed
‘yla} becoming yf — 1] for the next iteration.
A block diagram can be rearranged or modified in a variety of ways without chang-
ing the overall system function. Each appropriate rearrangement represents a different
computational algorithm for implementing the same system. For example, the block
diagram of Figure 6.3 can be viewed as a cascade of two systems, the first represent-
ing the computation of v(1| from x(n and the second representing the computation of
{vl from v[n. Since each of the two systems is an LTT system (assuming initial-rest
conditions for the delay registers), the order in which the two systems are cascaded can
be reversed, as shown in Figure 6.4, without affecting the overall system Function. In
Figure 6.4, for convenience, we have assumed that M = N. Clearly, there is no Loss of
generality, since if M 3 NV, some of the coefficients ay or by in the figure would be zero,
and the diagram could be simplified accordingly
nk) + vl (6.10b)
Figure 6.3 Block diagram
rapresentation for a general N"-orger
yIn-NY ifference equation.Section 6,1 Block Diagram Representation af Linear Constant-Goelficient Diference Equations 379
In terms of the system function H(z) in Eq. (68). Figure 6.3 can be viewed as an
implementation of (z) through the decomposition
a
L
H(2) = Hae) Hile) = 7 (She (11)
1- Yq * |?
=
or, equivalently, through the pair of equations
Vz) = My(@)X (2) (6.128)
1
¥ (5) = Hxz)¥(@) = | —{— ] ¥@) (6.120)
1-Slas*
a
Figure 6.4, on the other hand, represents H(z) as
1
+ (6.13)
1
or, equivalently, through the equations
pees 1 f
Wiz) = Ha(z)X (2) = | —~——_ ] X@), (6.14a)
1- Yaz
a
YG)
a
= Hy(z)Wiz) = (Sact) wes (6.14b)
=Chapter 6 Structures for Discrete-Time Systems
lal by
re
a by
wf]
nN]
Figure 6.4 Rearrangement of block
diagram of Figure 6.3. Wa assume for
‘convenience that N= MIEN 2 M,
win] some of the coefficients wil be zero
Inthe time domain, Figure 6.4 and, equivalently, Eqs. (6.14a) andl (6.14b) can be repre-
sented by the pair of difference equations
wha] = Sawin — b+ xed, (6.153)
i
yin] =) bewtn — kD (6.15b)
‘The block diagrams of Figures 6.3 and 6.4 have several important differences. In
Figure 6.3, the zeros of H(z), represented by H (2), are implemented first, followed by
the poles, represented by H(z). In Figure 6.4, the poles are implemented first, followed
by the zeros. Theoretically, the order of implementation does not affect the overall sys-
tem function. However, as we will see, when a difference equation is implemented with
finite-precision arithmetic, there can be a significant difference between two systems
that are equivalent with the assumption of infinite precision arithmetic in the real num-
ber system. Another important point concerns the number of delay elements in the two
systems, As drawn, the systems in Figures 6.3 and 6.4 each have a total of (V + Mf)
delay elements, However, the block diagram of Figure 6.4 can be redrawn by noting
that exactly the same signal, wf], is stored in the two chains of delay elements in the
Figure. Consequently, the two can be collapsed into one chain, as indicated in Figure 6.5,
‘The total number of delay elementsin Figure 6.5 is tess than or equal to the number
required in either Figure 6.3 or Figure 6.4.and in fact itis the minimum number required
toimplement asystem with system function given by Eq, (6.8). Specifically, the minimum
number of delay elements required is, in general, max(W, Mf). An implementation with
the minimum number of delay elements is commonly referred to a8 a canonic formSection 6:1
Block Diagram Ropresentation af Linear Constant-Coeflicient Difference Equations 381
win}
To]
Figure 6.5 Combination of delays in
Figure 6.4,
implementation. The noncanonic block diagram in Figure 6.3 is referred to as the direct
form Fimplementation of the general Nth-order system because itis a direct realization
of the difference equation satisfied by the input x[n] and the output yfa], which in
Turn can he written directly from the system function by inspection. Figure 6.5 is often
referred to as the direct form I or canonic direct form implementation. Knowing that
Figure 6.5 is an appropriate realization structure for H(2) given by Eq. (6.8), We can 20
directly back and forth in a straightforward manner between the system function and
the block diagram (or the equivalent difference equation).
Example 6.2 Direct Form I and Direct Form |
Implementation of an LTI System
Consider the LTT system with system function
e 14227
4) = ope
OS [rise 1 40.
‘Comparing this system (unction with Eq. (6.8), We find bg = 1, bt = 2,4, = +15,
and a = ~0.9, so it follows from Figure 63 that we can implement the system in
direct form [block disgram as shown in Figure 6,6, Referring io Figure 6.5, we can also
implement the system function in direct form 11, as shown in Figure 6,7. In both cases,
note that the coetficients in the feedback branches in the block diagram have opposite
signs from the corresponding coefficients of 2~? and ¢~? in Eq. (6.16). Although this
‘change of sign is sometimes confusing, itis essential to remember that the feedback
coefficients (ax) slways have the opposite sign in the difference equation from their
sign in the system function, Note also that the direct form 11 requires only two delay
elements to implement #(¢), one less than the direct form | implementation.
(6.16)382
Chapter 6 Structures for Discrete-Time Systerns
Figure 6.6 Direct form | implementation of Ea. (6.16).
Figure 6.7 Direct form It implementation of Eq, (6.16),
In the preceding discussion, we developed two equivalent block diagrams for im-
plementing an LTT system with system function given by Eq. (6.8). These block diagrams,
which represent different computational algorithms for implementing the systern, were
obtained by manipulations based on the Linearity of the system and the algebraic prop-
erties of the system function. Indeed, since the basic difference equations that represent
an LTI system are linear, equivalent sets of difference equations can be obtained sim-
ply by linear transformations of the variables of the difference equations. Thus, there
are an unlimited number of equivalent realizations of any given system. In Section 6.3,
using an approach similar to that employed in this section, we will develop a number of
other important and useful equivalent structures for implementing a system with system
function as in Eq, (6.8). Before discussing these other forms, however, it is convenient
to introduce signal flow graphs as an alternative to block diagrams for representing
difference equations.
6.2 SIGNAL FLOW GRAPH REPRESENTATION OF LINEAR
CONSTANT-COEFFICIENT DIFFERENCE EQUATIONS
A signal flow graph representation of a difference equation is essentially the same as
a block diagram representation, except for a few notational differences. Formally, aSection 6.2
Signal Flow Graph Representation 383
Noe
Ke)
a) / Figure 6.8 Example of nodes and
Nodek branches ina signa flow graph
signal flow graph is a network of directed branches that connect al nodes. Associated
with each node is a variable or node value. The value associated with node & might
be denoted wz, or, since node variables for digital filters are generally sequences, We
often indicate this explicitly with the notation will, Branch (7,4) denotes a branch
originating at node j and terminating at node k, with the direction from j to k being
\dicated by an arrowhead on the branch. This is shown in Figure 6.8. Each branch
has an input signal and an output signal, The input signal from node j to branch (j, 4)
is the node value w;[n|. In a linear signal flow graph, which is the only class we will
consider, the output of a branch is a finear transformation of the input to the branch,
‘The simplest example is a constant gain, ie., when the output of the branch is simply a
constant multiple of the input to the branch. The linear operation represented by the
branch is typically indicated next to the arrowhead showing the direction of the branch.
For the case of a constant multiplier, the constant issimply shown next (o the arrowhead.
When an explicit indication of the branch operatian is omitted, this indicates a branch
transmittance of unity, or the identity transformation. By definition, the value at each
node in a graph is the sum of the outputs of all the branches entering the node.
To complete the definition of signal flow graph notation, we define two special
types of nodes, Source nodes are nodes that have no entering branches, Source nodes
are used to represent the injection of external inputs or signal sources into a graph.
Sink nodes are nodes that have only entering branches. Sink nodes are used to extract
outputs from a graph, Source nodes, sink nodes, and simple branch gains are illustrated
in the signal flow graph of Figure 6,9, The linear equations represented by the figure
are as follows:
wn] = xn] + awin} + bw2 ln],
waln| = cuilnl, (617)
gla] = dein] +ewaln
Source
Rede fn] wf)
Sink
jrjnose Figure 6.9. Example ofa signal flow
graph showing source and sink nodes.
fr
og Anata384
Chapter Structures for Discrete-Time Systems
Souree
rode 0
8
vin]
Figure 6.10 (2) Block diagram
ropresentation ofa 1"'-order digital iter
(b) Structure of the signal flow graph
wy garespanlg fs Boek cea
5) fa.
Addition, multiplication by a constant, and delay are the busie operations required
toimplementa linear constant-cocfficient difference equation. Since these are all lincar
operations, it is possible to use signal flow graph notation to depict algorithms for
implementing LTI discrete-time systems. As an example of how the flow graph concepts
just discussed can be applied to the representation of a difference equation, consider
the block diagram in Figure 6.10(a), which is the direct form II realization of the system
whose system function is given by Eq. (6.1). A signal flow graph corresponding to this
system is shown in Figure 6.10(b). In the signal flow graph representation of difference
equations, the node variables are sequences. In Figure 6.10(b), node Qis a source node
whose value is determined by the input sequence x[n], and node 5 is a sink node whose
value is denoted yl]. Notice that the source and sink nodes are connected to the
rest of the graph by unity-gain branches to clearly denote the input and output of the
system, Obviously, nodes 3 and 5 have icentical values, The extra branch with unity
gain is simply used to highlight the fact that node 3 is the output of the system. Tn
Figure 6.10(b), all branches except one (the delay branch (2, 4)) can be represented
by a simple branch gain; ie, the output signal is a constant multiple of the branch
input, A delay cannot be represented in the time domain by a branch gain, However,
the z-transform representation of a unit delay is multiplication by the factor 2". Ifwe
represented the difference equations by their corresponding z-transform equations, all
the branches would be characterized by theirsystem {unctions. In this ease, each branch
ain would be a function of z; eg., a unit delay branch would have a gain of 2~!. By
convention, we represent the variables in a signal flow graph ax sequences rather than
as z-transforms of sequences. However, to simplify the notation, we normally indicate
2 delay branch by showing its branch gain as <~!, bul it is understood that the output
of such a branch is the branch input delayed by one sequence value. That is, the use of
2! in a signal flow graph is in the sense of an operator that produces a delay of one
sample, The geaph of Figure 6,10(b) is shown in Figure 6.11 with this convention. TheSection 82
‘Signal Flow Graph Representation 385
will whe a, wale
. 11 M = N, then Np = M — N; otherwise, the first summation in
Eq. (6.34) is not included. If the coctficients ay and hy are real in Eq, (6.27), then the
quantities Ay, Bg, Ce, ce, and eg are all real. In this form, the system function can be
interpreted as representing a paraltel combination of i*- and 2°4-order LR systems,
with possibly .V, simple scaled delay paths. Alternatively, we may group the real poles
in pairs, so that 1(z) can be expressed as
(6.35)
YD Ny
He) = Deets Yet
kad k ee.
aye
where, as in the cascade form, Ny = L(V +1)/2| is the largest integer contained in
(N-+1)/2,and if Np = M —N is negative, the first sumis not present. A typical example
for N = M = Gisshownin Figure 6.20. The general difference equations for the parallel
form with 2"-order direct form I] sections are
wel] = aywaln — 1 bawele 24 ala}, k= 122M (6.360)
Yale) = eoewaled + crawler ~ 1), kaL2 My (6,36b)
, Ne
yb = So Cutln — E+) vile (6.360)
ra] ai
If M < NW, then the first summation in Eq, (6.36c) is not included,394 Chepter6 Structures for Discrete-Time Systems
&
will ey
Figure 6.20 Parallel torm structure for 6"*-order system (MM — WW = 6) with the
real and complex poles grouped in pairs.
Example 6.6 Illustration of Parallel Form Structures
Consider again the system function used in Examples 6.4 and 6.5. For the parallel form,
fe mustexpress H(z) in the form of either Eq, (6.34) or Eq, (6.35). If we use order
sections,
l4mlyet T+ 87!
Toonset yodas ? 7° * parm Ty ase
| The parallel form realization for this example with a 2°-order section is shown in
Figure 6.21
Since all the poles are real, we can obtain an alternative parallel orm realization
by expanding H(z) as
He (637)
448s
1-052) ~ 1-028
‘The resulting parallel form with 1%-order sections is shown in Figure 6.22. As in the
general ease, the difference equations represented by both Figures 6.21 and 6.22 can
“: be written down by inspection.
HE (638)
)Section 6.3 Basic Structures for IR Systems 395
xia) |
Figure 6.21 Paral form structure for Example 6.6 using a 2"-order
system.
a
Is
ala via}
05
2s
028
Figure 6.22 Parallel form structure for Example 6.6 using 1*-order systems.
6.3.4 Feedback it
IR Systems
All the flow graphs of this section have feedback loops; ie., they have closed paths that
begin at a node and return to that node by traversing branches only in the dircetion of
their arrowheads, Such a structure in the flow graph implies that a node variable in aloop
depends directly or indirectly on itself, A simple example is shown in Figure 6.23(a).
which represents the difference equation
ple] = ayfn — 11+ xfn]. (6.39)Chapter 8 Structures for Discrete-Time Systems
ch vial
a ° } Pa
in) — le
a Figure 6.23 (a) System with feedback
loop. (b} FIR system with feedback loop
te {c) Noncomputable system.
Such loops are necessary (but not suificient) to generate infinitely long impulse
responses. This can be seen if we consider a network with no feedback loops. In such a
case, any path from the input to the output can pass through each delay element only
once. Therefore, the longest delay between the input and output would occur for a path
that passes through all of the delay elements in the network. Thus, For a network with
no loops, the impulse response is no longer than the total number of delay ctements
in the network. From this, we conclude that if a network hus no loops, then the system
funetion has only zeros (except for poles at z = 0), and the number ef zeros can be no
more than the number of delay elements in the network
Returning to the simple example of Figure 6.23(a), we see that when the input
is the impulse sequence 3{n, the single-input sample continually recirculates in the
feedback loop with either increasing (if jal > 1) or decreasing (if Jal = 1) amplitude
owing to multiplication by the constant a, so that the impulse response is hf] = aul]
This illustrates how feedback can create an infinitely long impulse response.
Ifa system function has poles a corresponding block diagram or signal flow graph
will have feedback loops. On the other hand, neither poles in the system function nor
loops in the network are sufficient for the impulse response to be infinitely long. Fig-
ure 6.23(b) shows a network with a feedback Toop, but with an impulse response of
finite length. This is because the pole of the system function cancels with a zero; he for
Figure 6.23(b),
(6.0)
The impulse response of this system is ffm] = 3|7] + a3{n — 1). The system is a simple
example of a general class of FIR systems called freguency-sampling systerns. This class
of systems is considered in more detail in Problems 6.39 and 6,51.Section 6.4 Transposed Forms 397
Loops in a network pose special problems in implementing the computations im-
plied by the network, As we have discussed, it must be possible to compute the node
variables in a network in sequence such that all necessary values are available when
needed. In. some cases, there is no way to order the Computations so that the node
variables of a flow graph can be computed in sequence. Such a network is called non
computable (see Crochiere and Oppenheim, 1975). A simple noncomputable network
is shown in Figure 6.23(c). The difference equation for this network is
yfe] = aytn} + af). (641)
In this form, we cannot compute yin] because the right-hand side of the equation in-
volves the quantity we wish to compute. The fact that a flow graph is noncomputable
does not mean the equations represented by the flow graph cannot be solved; indeed,
the solution to Eq. (6.41) is yfa] = x[u]/(1 a}. ILsimply means that the flow graph does
nol represent a set of difference equations that can be solved successively for the mode
variables. The key to the computability of a flow graph is that all loops must contain at
east one unit delay element. Thus, in manipulating flow graphs representing implemen-
ations of LL systems, we must be careful not to create delay-free loops. Problem 6.37
deals witha system having a delay-free loop. Problem 7.51 shows how a delay-free loop
can be introduced.
6.4 TRANSPOSED FORMS
‘The theory oflinear signal flow graphs provides a variety of procedures for transforming,
such geaphs into different forms while leaving the overall system function between
input and output unchanged. One of these procedures, called flow graph reversal or
transpasition, leads to a set of transposed system structures that provide some useful
alternatives to the structures discussed in the previous section,
Transposition of a flow graph is accomplished by reversing the directions of all
branches in the metwork while keeping the branch transmittances as they were and
reversing the roles of the input and output so that source nodes become sink nodes and
vice versa. For single-input, single-output systems, the resulting flow graph has the same
system function as the original graph if the input and output nodes are interchanged.
Although we will not formally prove this result here,* we will demonstrate that iLis
valid with two examples.
Example 6.7 Transposed Form for a 1*-Order System with
No Zeros
The ISLorder system corresponding to the flow graph in Figure 6.24a) has system
function
(62)
The theorem fotiows dtecty from Mason’s gain formuta of signal flow graph theory (Sec Mason and
Zimmermann. 1960: Chow and Cassignol, 1962; 0: Phillips and Nagle, 1995.)398,
Chanter Structures for Diserate-Timo Systems
28 To obtain the teansposed form for this system, we reverse the directions of all the
branch arrows, taking the output where the input was and injecting the input where
the output was. The resull is shown in Figure 6.24{b). Tis usually convenient to draw
the transposed network with the inputon the left and the output on the right, as shown
in Figure 6,24(c). Comparing Figures 6,24(a) and 6.24(c) we note that the only differ
cence is that in Figure 6.24(a), we multiply the delayed output sequence yin — 1 by the
coefficient a, whereas in Figure 624(c) we multiply the output yfn] by the coetficient
aand then delay the resulting prod uct. Since the two operations can be interchanged,
‘we can conclude by inspection that the original system in Figure &24(a) and the cor
responding transposed system in Figure 6.24(c) have the same system function.
Stee
ae
x td
(a)
ae - “Tie
5
ie rc)
Weal
ap
¢
2 ©)
EB
a Figure 6.24 (a) Flow graph of simple 1**-order system, (b) Transposed form af
(9 6) Sacro) redrawn wih np ona.
In Example 6.7, itis straightforward tosee that the original system and its transpose
have the same system function. However, for more complicated graphs. the result is
often not so obvious. This is illustrated by the next example,
Example 6.8 Transposed Form for a Basic 2"¢-Order Section
Consider the basic 2"-ordex section depicted in Figure 6.25. The corresponding ditt
ference equations for this system are
vf] = ay wher — 11 + agteln — 2] =x Enh, (6.434)
la] = bpwler} + by wl — 1) + bwhn — 2) (6.430)Section 64 Transposed Forms 399
‘The transposed flow graph is shown in Figure 6.26; its corresponding difference
equations are
gla] = &pxie] + vba — 1, (@.44a)
vf] = tot (6.44h)
uyled = ay ybnd + byt} + natn — 1 (6.44)
vbr] = agylay 4+ bax (4a)
Equutions (643n)-(6.43b) and Eqs (6.440)-(6.44d) are different ways to ong
nize the computation of the output samples [a] from the input samples x[n), and it
is not immediately clent that the wo sets of difference equations are equivalent, One
vay to show this equivalence is €0 use the z-transform representations of both sets
of equations, solve for the ratio ¥(:)/X (s) = H(z) in both cases, and compare the
results. Another way is £0 substitute Eq. (6.444) into Eq. (6.44e), substitute the result
into Eq. (6.44), and finally, substitute that result into Ea. (6.44b). The final result is
yim = aryl — 1) + avin —21-+ Bpxin] + bya ~ 1} + boxy — 21 (645)
Since the network of Figure 625 isa direct form I structure, itis easily seen that the
input and output of the system in Figure 6.25 also satisfies the difference Eq. (6.45).
‘Therefore, for initial-rest conditions, the systems in Figures 6.25 and 6.26 are
equivalent.
whol by
Figure 6.25 Direct form I! structure for Example 6.8.
ty tala)
Figure 6.26 Transposed direct form Il structure for Example 6.8,
The transposition theorem can be applied to any of the structures that we have
discussed so far, For example, the result of applying the theorem to the direct form I
structure of Figure 6.14 is shown in Figure 6.27, and similarly, the structure obtained by400
Chapler6 Structures for Discrete-Time Systems
Figure6.27 _Generalfiow graph resulting from applying the transposition theorem
tothe direct form I structure of Figure 6.14.
transposing the direct form Il structure of Figure 6.15 isshown in Figure 6.28. If signal
flow graph configuration is transposed, the number of delay branches and the number
of coefficients remain the same. Thus, the transposed direct form If structure is also a
canonic structure. Transposed structures derived from direct forms are also “direct” in
the sense that they can be constructed by inspection of the numerator and denominator
of the system function,
‘An important point becomes evident through a comparison of Figures 6.15 and
6.28, Whereas the direct form II structure implements the poles first and then the
zeros, the iransposed direct form IT structure implements the zeros first and then the
poles. These differences can become important in the presence of quantization in finite-
precision digital implementations or in the presence of noise in discrete-time analog
implementations
Figure6.28 _Goneraltlow graph resulting from applying the transposition theorem
ta the direct form il structure of Figure 6.15.Section 65 asic Network Structures for FIR Systems aot
When the transposition theorem is applied to cascade or parallel structures, the
individual 2"-order systems are replaced by transposed structures. For example, ap-
plying the transposition theorem 1o Figure 6, 18 results in a cascade of three transposed
direct form IT sections (cach like the one in Example 6.8) with the same coefficients as
in Figure 6.18, but with the order of the cascade reversed. A similar statement car be
made about the transposition of Figure. 6.20.
The transposition theorem further emphasizes that un infinite variety of imple-
mentation structures exists for any given rational system function, The transposition
theorem provides a simple procedure for generating new struetures. The problems of
{implementing systems with finite-precision arithmetic have motivated the development
of many more classes of equivalent structures than we can discuss here. Thus, we con-
centrate only on the most commonly used structures.
6.5 BASIC NETWORK STRUCTURES FOR FIR SYSTEMS
The direct, cascade, and parallel form structures discussed in Sections 6.3 and 6.4 are
the most common basic structures for IIR systems, These structures were developed
under the assumption that the system function had both poles and zeros. Although the
direct and cascade forms for IIR systems include FIR systems as a special case, there
are additional specific forms for FIR
6.5.1 Direct Form
For causal FIR systems, the system function has only zeros (except for poles at z = 0).
and since the coefficients ay are all zera, the difference equation of By. (6.9) reduces to
“
vba] = So baxta 41, (6.46)
i
This can be recognized as the discrete convolution of xn] with the impulse response
vin=[
In this case, the direct form J and direct form II structures in Figures 6.14 and 6.15 both
reduce to the direct form FIR structure as redrawn in Figure 6.29. Because of the chain
of delay elements across the top of the diagram, this structure is also referred to asa
rapped delay line structure or a transversal filter structure. As seen from Figure 6.29. the
signal at each tap along this chain is weighted by the appropriate coefficient {impulse-
response value), and the resulting products are summed to form the output yl].
‘The transposed direct form for the FIR case is obtained by applying the transpo-
sition theorem to Figure 6.29 or, equivalently, by setting the coefficients a, 0 zero in
Figure 6.27 or Figure 6.28. The result is shown in Figure 6.30.
(6.47)oz Chapteré Structures for Discrete-Time Systems
zt
Tne f i hia) fala
Figure 6.29 Direct form realization of an FIR systern
~ ~ ~ vin
iy Angst] bata] dae] att) fo)
Figure 6.30 Transposition of the network of Figure 6.29.
ln]
6.5.2 Cascade Form
The cascade form for FIR systems is obtained by factoring the polynomial system func-
tion. ‘That is, we represent H(z) a8
Me
ue
ad = Sita" = [tbo + beet bo
= i
f (6.48)
where M, = [(M + 1)/2] is the largest integer contained in (M + 1)/2. If M is odd, one
of the coetficients bag will be zero, since A(z) in that case would have an odd number
of real zeros. The flow graph representing Eq. (6.48) is shown in Figure 6.31, which is
identical in form to Figure 6.18 with the coefficients ay, and az. all zero. Each of the
2"-order sections in Figure 6.31 uses the direet form shown in Figure 629. Another
alternative is to use trunsposed dizeet form 2*€-order sections or, equivalently. to apply
the transposition theorem to Figure 6.31
Figure 6.31 Cascade form realization af an FIR system,Section 6.5 Basic Network Structures for FIR Systems 403
6.5.3 Structures for Linear-Phase FIR Systems
Tn Chapter 5, we showed that causal FTR systems have generalized linear phase if the
impulse response satisfies the symmetry condi
AIM —n] = hin]
M (6.40)
AIM =n) =—Aln] n= 0,1,.... (6.496)
With either of these conditions, the number of coefficient multipliers can be es-
sentially halved. To see this, consider the following manipulations of the discrete con-
volution equation, assuming that Mf is an even integer corresponding to type T or type
IIL systems:
ve) = Salk lxtn — k1
im
una a
=O miktehe 1-4 HIM Qt — MP2 DS nko A
i= 7a
Mpa Mpa
= Metahe 1-4 mE 2c — MAE data = MF
i ard
For type T systems, we use Eq. (6.49a) to obtain
Mpa
yl] = SD wleledn — kl tala — M+ RD + ALM 21x10 — M/2}. (6.50)
mS
For type Il systems, we use Bq. (6.49b) to obtain
Mant
ylal = s Alk](cln — k] —xla— M4 k)). (6.51)
for}
For the ease of M an odd integer, the corresponding equations are, for type TI systems,
orn
sil = SS Atal CeIn A+ ale = MF RD, (6.52)
fod
and, for type IV systems,
otaay2
vin= Yo blkline] xdn AD. (6.53)
ra)
Equations (6.50)-{6.53) imply structures with either M/2+ 1, Mj2, or (M+ 1)/2
coefficient multipliers, rather than the M coefficient multipliers of the gencral direct
form structure of Figure 6.29. Figure 6.32 shows the structure implied by Eq. (6.50), and
Figure 6.33 shows the structure implied by Eq. (6.52).
In our discussion of lincar-phase systems in Section 5.7.3, we showed that the
symmetry conditions of Eqs. (6.49a) and (6.496) cause the zeros of H(z) to occur in
mirror-image pairs. ‘That is, if zp is a zero of H(z), then 1/z9 is also a zero of Hz).
Furthermore, if h{nl is real, then the zeros of H(z) occur in complex-conjugate pairs.Chapteré Structures for Discrete-Time Systems.
tnd
viel
Figure 6.32 Direct form structure for an Fil
even integer.
so), ae
f
et} oa
Paicar 3/2] falcan 1972],
Figure 6.33 Direct form structure for an FIR Linear-phase system when Mis an
6d integer.
As a consequence, real zeros not on the unit circle occur in reciprocal pairs Complex
zexos not on the unit circle occur in groups of four, corresponding to the complex
conjugates and reciprocals. [fa zeroison the unit citee, its reciprocal isalso its conjugate.
Consequently, complex zeros on the unit citcle are conveniently grouped into pairs.
Zeros at : = +1 are their own reciprocal and complex conjugate. The four cases are
summarized in Figure 6.34, where the zeros at 21, 27, 1/21, and 1/2} are considered
as a group of four. The zeros at zz and I/z2 are considered as a group of two, as are
the zeros at z3 and z$, The zero at z4 is considered singly, If H(2) has the zeros shown
in Figure 6.34, it can be factored into a product of 1°-, 2°4., and 4*h-order factors.
Each of these factors is a polynomial whose coefficients have the same symmetry as
the coeflicients of H(z); ie. each factor is a linear-phase polynomial in z~!, Therefore,
the system can be implemented as a cascade of 1%, 2"*., and 4!"-order systems. For
example, the system function corresponding to the zerus of Figure 6.34 can be expressed
as
H(z) = hfO\ + ¢
x(Ltee + de ter
Mi +az
(6.54)
where
a@=2tl fiz), b= WRRelssh, = -2Relr + Mal, d= 2H le + 1 fer?Section 6.6
Lattice Filters 405
Figure 6.34 | Symmetry of zeres tor a
‘inear-phase FIR filter.
‘This representation suggests a cascade structure consisting of linear-phase elements. It
can be seen thet the order of the system function polynomial is M = 9 and the number
of different coefficient multipliers is five. This is the same number (M + 1)/2 = 5) of
constant multipliers required for implementing the system in the linear-phase direct
form of Figure 6.32. Thus, with no additional multiplications, we obtain a modular
structure in terms of a cascade of short lincar-phase FIR systems.
6.6 LATTICE FILTERS
In Sections 6.3.2 and 6.52, we discussed cascade forms for both IR and FIR systems
obtained by factoring their system functions into 1*- and 2"!-order sections. Another
interesting and usctul cascade structure is based on a cascade (output to input) con-
nection of the basie structure shown in Figure 6.35(a). In the case of Figure 6.35(a) the
basic building block system has two inputs and two outputs, and is called a two-port
flow graph. Figure 6.35(b) shows the equivalent flow graph representation. Figure 6.36
shows a cascade of M of these basic elements with a “lermination” at each end of the
cascade so that the overall system is a single-input single-output system with input xfn]
feeding both inputs of two-port building block (1) and output y{n] defined to be a |],
the upper output of the last two-port building block M/, (The lower output of the a"
stage is generally ignored.) Although such a structure could take many different forms
Two-Port
Vow
Graph =
wa
edgy 8g) ula ite
fa) 0)
Figure 6.35 One section of the lattice structure for FIR late filters. (2) Block
iagram representation of a two-port building biock (b) Equivalert flow graph.408 Chapter6 Structures for Discrete-Time Systems
aa an an] any an)
Two-Port Two-Part Two-Part =stel
Flow Flow Flow
Graph Graph Graph
@ @) ca
Ppl aad ra Ha}
Figure-6.36 Cascade connection of Mf basic building block sections.
depending on the definition of the basic building block, we will limit our attention to
the particular choice in Figure 6.35(b), which leads to a widely used class of FIR and
IIR filter structures known as lattice filters.
6.6.1 FIR Lattice Filters
If the basic butterfly-shaped two-port building block in Figure 6.35(b) is used in the
cascade of Figure 6.36, we obtain a flow praph like the one shown in Figure 637, whose
lattice shape motivates the name lattice filer. The coefficients fy, k,..., ky, atereferted
to generally as the k-parameters of the lattice structure, In Chapter 11, we will see that
the &-parameters have a special meaning in the context of all-pole modeling of signals,
and the lattice filter of Figure 6,37 is an implementation structure fora linear prediction
of signal samples. In the eurrent chapter, our focus is only on the use of lattice filters to
implement FIR and all-pole ITR transfer functions.
‘The node variables a'{n] and b'[n] in Figure 6.37 are intermediate sequences
that depend upon the input x{] through the set of difference equations
al [nt = OO [a] = ated (6.55a)
a! tn] =a py — bP — 1 i= M (6.556)
BOM = bn — kal Py i= aM (6.55e)
yin = a" [a (655d)
As we can see, the &-parameters are coefficients in this set of M coupled difference
equations represented by Figure 6.37 and Eqs (6.55a)-(6.55d). It should be clear that
xlel_o on tn) a] _ yin)
“kn
fay
pis Bo fw Ny
pa) oe ree ron
Figure 6.37 Lattice flow graph for an FIR system based on a cascade of M two-
port building blocks of Figure 6.35(b).Section 66 Lattice Filters 407
these equations must be computed in the order shown {i = 0.1, ..., M)since the output
of stage (i — 1) is needed as input to stage ti), etc.
‘The lattice structure in Figure 6,37 is clearly an UTI system, since itis a finear signal
flow graph with only delays and constant hranch coefficients. Furthermore, note that
there are no feedhack loops, which implies that the system hay a finite- duration impulye
response, In fact, a straightforward argument is sufficient to show that the impulse
response from the input to any internal node has finite length. Specifically, consider the
impulse response from x(n] to the node variable an], ic., from the input to the i*
upper ftode. Its clear that if x[n] = Sp], then a0] — | for every i, since the impulse
propagates with no delay through the top branch of all the stages. All other paths to any
node variable a![n] or 6m] pass through at least one delay, with the greatest delay
being along the bottom path and then up to node variable a”[n| through the coefficient
—k;. This will be the last impulse that arrives at a" [1], so the impulse response will have
length i + 1 samples. All other paths fo an internal node zigeag between the top and
bottom of the graph. thereby passing through at least one, but not all, of the delays
occurring before the outputs of section (0),
‘Note that in our introduction to lattice Alters, a[a] and 6 {n} were used in
Figure 6.37 and Eqs. (6.55a}-(6.55d) to denote the node variables of building block (7)
for any input x[n]. However, for the remainder of our discussion, it is convenient to
assume specifically that fa] = 4{n] so that a" [a] and b(n] arc the resulting impulse
Fesponses at the associated nodes, and that the corresponding z-trans forms A¥(:) and
B'(z) are the transfer functions between the input and the i nodes, Consequently,
the transfer function betwcen the input and the upper i" node is
1-Dawes, (656)
=
where in the second form, the coefficients ay’ for m < i are composed of sums of
products of the coefficients k; for j = m. As we have shown, the coefficient for the
fongest delay from the input to the upper node / is a”
impulse response from x[1] to node variable al!"[n] is,
&. In this notation, the
1 aso
ainl=} a! Len 0.
As indicated earlier, the transfer functions A(z) and B“'(z) can be computed
recursively using Eq, (6.584) and (6.58b). These transfer functions are *-order polyno-
mials, and it is particularly useful to obtain a direct relationship among the coefficients
of the polynomials. Toward this end, the right side of Eq. (6.57) defines the coeficients
of A®() to be —at!, for m = 1.2,.... 4 with the leading coefficient equal to one: ic,
as in Eq, (6.56),
AO@)
Daven, (661)
and similarly,
AN UG) = 1 Sal
mmol
F (6.62)Section 6.6 Lattice Filters 409
To obtain a direct recursive relationship for the coefficients e\’ in terms of eft"! and
ki, we combine Eqs, (6.60a) and (6.62) from which it follows that
Big aa Dayy =
(6.63)
Substituting Eqs. (6.62) and (6.63) into Eq. (6.58a), Al) can also he expressed as
Re indexing the second summation by reversing the ordering of the terms (ie..replacing
m by i —m and re-summing) and combining terms in Eq, (6.64) leads to
in
- wt
=F fog aol]
where we see that, as indicated earlier, the coefficient of <~ is —k;, Comparing Eqs. (6.65)
and (6.61) shows that
AOE
hit (6.65)
wid (6.66a)
ait) = [al Y — kal?) m=
af? = ki (6.66b)
Equations (6.66) are the desired direct recursion between the coefficients of A(z) and
the coefficients of A! (z), These equations, together with Eq. (6.60a) also determine
the transfer function B)(z),
The recursion of Fys, (6.66) can also be expressed compactly in matrix form, We
denote by a1 the vector of transfer function coefficients for AY" (2) and by &—1
these coefficients in reverse order, ic,
yt
ey ie wy
and
:
(ght), g(h HY
fia = [al a? af]
Then Eqs, (6.66) can be expressed as the matrix equation
a (6.67)
-1
The recursion in Eqs, (6.66) or Eqs, (6.67) isthe basis foranalgorithm for analyzing
an FIR lattice structure to obtain its transfer function, We begin with the flow graph
specified as in Figure 6.37 by the set of &-parameters (Ay, ka, ., Aye} Then we can use
Eqs. (6.66) recursively to compute the transfer functions of successively higher-order
FIR filters until we come to end of the cascade giving us
wu
AD =1-P Gat
¥@
~XxG)"
(6.682)410
Chapter Structures for Discrete-Time Systoms
k-Parameters-to-Coefficients Algorithm
‘Given ki, ka,....ku
for i=1,2...,M
al = ky Eq. (6.66b)
if i> I then for j = 1,2...
af = al) ali? Eq. (6.66a) |
end i
snd: Figure 6.38 Algorithm for converting
a fF 24,.2..,M Eq. (6.68b) orn paramo tn FR er.
coe
where
iw
aq = al M. (6.68)
‘The steps of this algorithm are represented in Figure 6.38,
It js also of interest Lo obtain the k-parameters in the FIR lattice structure that
lize a given desired transfer function from input [7] to the output y[a] = afahic.,
wwe wish Lo go from A(z) specified as a polynomial by Eqs. (6.68a) and (6.68) to the sct
of keparameters for the lattice structure in Figure 6.37. This can be done by reversing the
recursion of Egs. (6.66) or (6.67) to obtain successively the transfer function AY "'(2) in.
terms of A"'}(z) fori — M, M—1,....2. The &-parameters are obtained as.a by-product
of this recursion.
Specifically, we assume that the coefficients af!” = ce, for m = 1,..., M are spec-
ified and we want to obtain the k-parameters ky, ... ky to realize this transfer function,
in lattice form. We start with the last stage of the FIR lattice, ie., with i = M. From
Eq. (6.666).
ku =a! am (6.69)
with A(*? (2) defined in terms of the specified coefficients as
AM) =1— Salt - (6.70)
Inverting Eqs. (6.66) or equivalently Eq. (6.67), with ¢ = M and ky Of then
determines oy, the vector of transform coefficients ut the next to last stage t= M1.
“This process is repeated until we reach Al (2),
To oblain a general recursion formula for af
1 “
in terms of af? from Bq. (6.668)
note that «/'!? must be eliminated. To do this, replace m by i — m in Eq. (6.66a) and
multiply both sides of the resulting equation by &; thereby obtaining
kya, = hat? — ot»
Adding this equation to Eq, (6.66a) results in
Bag)
from which it follows that
ty
67a)Section 6.6 Lattice Filters an
With af” calculated for m
(671b)
‘Thus, starting with oy!” = an,
2, ..M we can use Eqs. (6.71a) and (6.71b)
to compute wf!!—"!, form = 1,2,...,M — Land kjy—1, and then repeat this process
recursively to obtain all of the transfer functions A'?(z) and, as a by-product, all of the &-
parameters needed for the lattice structure. The algorithm is represented in Figure 6.39.
Coefficients-to--Purameters Algorithm
Given al) = aj) 7 = 1.2,
all) Fq. (6.69)
‘for M.M- 1, 2
for j= 1,2.-44-1
Eq. (6.71a)
Eq. (6.71b)
Figure 6.39 Algorithm for converting from FIR filter coetficients to t-parameters.
Example 6.9 #Parameters for a 3"°-Order FIR System
Consider the FIR system shown in Figure: 6.401 whose system function is,
1 AG) = 1-095! 40.6427 ~ 0.576:
Consequently, M = 3 and the coefficients af” in Eq. (6.70), are
09 af =064 a
‘We begin by observing that és = af") = 0.576,
Next we want to caleulate the coefficients for transfer function 4/22) using
Eq, (6.71a). Specifically, applying Eq. (6.72a), we obtain (rounded to three decimal
places):
S16,
el) 4 tgal?
pee
i
From Bg, (6706) we then ident iy = a = -0.182
To obtain A") (2) we again apply Eq, (6.71a) obtaining
© 5 tal?
af) 1 thay
1 ae
673,anz Chapter Structures for Discrete-Time Systems
We then identify & — a
ure 6.40b.
= 0.673, The resulting lattice structure is shown in Fig-
“9+ “ oo
a _—
vie
@
aia] yd
wg pe .
= \ Aw Lat
ora Joe Jos
oS
(o)
Figure 6.40 Flow graphs for example. (a) Direct form. (ty
cients rounded),
6.6.2 Al-Pole Lattice Structure
A lattice structure for implementing the all-pole system function H(z) = 1/A(2) can
be developed from the FIR lattice of the previous section by recognizing that Hic)
is the inverse filter for the FIR system function A(z). To derive the all-pole lattice
structure, assume that we are given s{7] = a!*fn}, and we wish to compute the input
ain} = x{n]. This can be done by working from right to left to invert the computations
in Figure 6,37. More specifically, if we solve Eq. (6.58a) for A ')(z) in terms of A(z)
and BY!) @) and leave Eq, (6.58b) as it is, we obtain the pair of equations
AvP) 2) jet BED
(6.72a)
B® Ale 4B, (6.720)
)
which have the flow graph representation shown in Figure 6.41, Note that in this case,
the signal flow is from i to? — 1 along the top of the diagram and from é — 1 to é along
the bottom. Suecessive connection of M stages of Figure 6.41 with the appropriate &;
in each section takes the input a!{n] to the output a°)[a] as shown in the flow graph.
of Figure 6.42. Finally, the condition x{r] = a! {n] = 4 (n] at the terminals of the last
stage in Figure 6.42 causes a feedback connection that provides the sequences [nj
that propagate in the reverse direction, Such feedback is, of course, necessary for an
IIR systemSection 6.6 Lattice Filters 413
Figure 6.41 One stage of computation
bea 2! bs] for an all-nole lattice system.
The set of difference equations represented by Figure 6.42 ist
ain] = sla] (6.73a)
a Yin} =a in] +6" Pt) i= MM 1. (6.730)
bn] = ba 1] — kai Mn) i= MM 1. (6.73c)
ala] = 2 In] = ba, (6.734)
ause of the feedback inherent in Figure 6.42 and these corresponding equations,
al conditions must he specified for all of the node variables associated with delays.
Typically, we would specify b'""[—1] = 0 for initial rest conditions. Then, if Eq. (6.73b)
is evaluated first, a!'~))[n] will be available at times 7 = 0 for evaluation of Eq. (6.73¢)
with the values of b)fa ~ 1] having been provided by the previous iteration
Now we can state that all the analysis of Section 6.6.1 applies to the all-pole lattice
system of Figure 6.42. If we wish to obtain a lattice implementation of an all-pole system
with system function H(z) = 1/A(z), we can simply use the algorithms in Figures 6.39
and 6.38 to obtain &-parameters from denominator polynomial coefficients or vice-versa.
wel = fn] Opn)
apa.
4a] Fay
Figure 6.42 _At-pole lattice system.
Nowe that by basing our derivation of the all-pole lattice on the FIR lattice in Figure: 637, we hawe
‘ended up with the input denoted yin} and the ourput x{a} in opposition to our normal convention. This
labeling i, of course, arbitrary once the derivation has been completed,a4 Chapter 8 Structures for Discrete-Time Systems
Example 6.10 Lattice Impfementation of an IIR System
26 Asan example of an IIR system, consider the system function
&
He = ——— (6.740
@ 1092-140. 2 0.5762-3 « )
L
(108) + Oizo THE 0.927)
(6.740)
hich is the inverse system for the system in Example 6.9. Figure 6.43(a) shows the
direel form realization of this system, whereas Figure 6.43(b) shows the equivalent
TIR lattice systent using the &-parameters computed as in Example 69. Note that the
Jatice structure has the same number of delays (memory registers) as the
© structure, However, the number of multipliers is twice the number of the:
This is obviously true for any order Mf
ze ln)
ta) te
tN / / a ey /
0182,
cose of 2A.
LZNef Nef Ne!
o
Figure 6.43 Signal tlow graph of II filter (a) direct form, (b) lattice form,
Since the lattice structure of Figure 6.42 is an IIR system, we must be concerned
about its stability, We will see in Chapter 13 thal a necessary and sufficient condi-
tion for all the zeros of a polynomial A(z) to be inside the unit circle is |ky| < 1,
i= 1,2,...,M. (Gee Markel and Gray, 1976.) Example 6.10 confirms this fact since, as
shown in Eq, (6.74b) the poles of H(z) (zeros of A(c}) are located inside the unit circle
of the z-plane and all the &-parameters have magnitude less than one. For LIK systems,
the guarantee of stability inherent in the condition |&;| < 1 is particularly important.
Even though the lattice structure requires twice the number of multiplications per out-
put sample as the direct form, itis insensitive to quantization of the k-parameters. This,
property accounts for the popularity of lattice filters in speech synthesis applications,
(See Quatieri, 2002 and Rabiner and Schafer. 1978.)Section 6.7
Overview of Finite-Precision Numerical Efects a5
6.6.3 Generalization of Lattice Systems
We have shown that FIR systems and all-pole IIR systems have a lattice structure
representation, When the system function has both poles and zeros itis still possible to
find a lattiee structure based upon a modification of the all-pole striictute of Figure 6.42.
The derivation will not he provided here (Sec Gray and Markel, 1973, 1976.), but it is
outlined in Problem 11.27.
6.7 OVERVIEW OF FINITE-PRECISION NUMERICAL
EFFECTS
We have seen that a particular LIL discrete-time system can be implemented by a
variety of computational structures, One motivation for considering alternatives to the
simple direct form structures is that different structures that are equivalent for intinite-
precision arithmetic may behave differently when implemented with finite numerical
precision. In this section, we give a brief introduction 10 the major numerical proble
that arise in implementing discrete-timesysiems, A more detailed analysis of these finite
‘word-length effects is given in Sections 6.8-6.10.
6.7.1 Number Representations
In theoretical analyses of discrete-time systems, we generally assume that signal values
and system coefficients are represented in the real-number system. However, with ana-
log discrete-time systems, the limited precision of the components of a circuit makes
it difficult to realize coefficients exactly. Similarly, when implementing digital signal-
Processing systems, we must represent signals and coefficients in some digital number
system that must always have finite precision
The problem of finite numerical precision has already been discussed in Sec-
tion 4.82 in the context of A/D conversion. We showed there that the output samples
from an A/D converter are quantized and thus can be represented by fixed-point binary
numbers For compactness and simplicity in implementing arithmetic, one of the bits
of the binary number is assumed to indicate the algebraie sign of the number. Formats
such as sign and magnitude, one’s complement, and two's complement are possible, but
two's complement is most common.> A real number can be represented with infinite
precision in two's-complement form as
Rm (e+ S02 ‘ (6:75)
Where Xj_ is an arbitrary scale factor and the bys are either 0 or 1. ‘The quantity by is,
referred to as the sign bit. If bp = 0, then 0 In this ease & would be replaced by (4 ~ X21) in the bounds om the size
‘ofthe quantization error.Overview of Finite-Precision Numerical Efects 47
{Ol
+
Lt po
4 a a8 .
©
[2a
! ! 1
7 7 ya, 8
42 4 | 4 2 4
“3
Figure 6.44 Nonlinear relationships
representing :wo's-complement
(a) rounding and (b) truncation for
©) B=?
So far, we have simply stated that the quantity X,_ is am arbitrary scale factor;
however, thisfactorhas several useful interpretations. In A/D conversion, we considered
Xp tobe the full-scale amplitude of the A/D converter. In this case, Xj, would probably
represent a voltage or current in the analog part of the system, Thus, Xqr serves as a
calibration constant for relating binary numbers in the range —1 <= ft» < 1 to analog
signal amplitudes.
In digital signal-processing implementations, itis common to assume thatall signal
variables and all coefficients are binary fractions. Thus, if we multiply a (B + 1)-bit signal
variable by a (B + 1)-bit coefficient, the result is a (2B + 1)-bit fraction that can be
conveniently reduced to (B+ |) bits by rounding or truncating the least significant bits.
With this convention, the quantity X,, can be thought of as a scale factor that allows the418
au
010
oo
Chapter 6
f= Ola]
Structures for Discrete-Time Systems
ou
ono
oo
101
100
m
3A
43—
@)
OL
01
106
ou
oo
101
@)
Figure 6.45 Two's-complement rounding. (a} Natural over‘low. (b) Saturation,
representation of numbers that are greater than unity in magnitude, For example, in
fixed-point computations, it is common to assume that each binary number h
factor of the farm Xyq
actually located betw
a scale
2°. Accordingly, a value c= 2 implies that the binary point is
in bp and b3 of the binary word in Eq. {6.78}. Often, this scl
factor is not explicitly represented; instead, itis implicit in the implementation program
or hardware architecture.Section 6.7 Overview of Finite-Precision Numerical Effects aig
Still another way of thinking about the scale factor Xq, leads to the floating-point
representations, in which the exponent c of the stale factor is called the characteristic
and the fractional part Zp is called the mantissa. ‘The characteristic and the mantissa
are each represented explicitly as binary numbers in floating-point arithmetic systems
Floating-point representations provide a convenient means for maintaining both a wide
dynamic range and a small quantization noise: however, quantization error manifests
itself in a somewhat different way.
6.7.2 Quantization in Implementing Systems
Numerical quantization affects the implementation of LT discrete-time systems in sev-
eral ways, As a simple illustration, consider Figure 6.46(a), which shows a block diagram
ein _slel |
~
10
+
©
cote, ato, Bel ne
ad | T gan
T r
é a= Ontal
©)
dat
sa s)_ tte d tn stl oe
a ° ~O ‘O 5
! eam el r
ln]
©
Figure 6.46 _(mplementation of discrete-time filtering ofan analog signal. (a) ideal
system. (b} Nonfinear model. (c} Linearized model420
Chapter 6 Structures for Diseete-Time Systems
for a system in which a bandlimited continuous-time signal x47) is sampled to oblain
the sequence x{n], which is the input to an LIT system whose system function is,
1
HO=TaT (6.80)
‘The output of this system, yf}, is converted by ideal bandlimited interpolation to the
bandlimited signal ye).
‘A more realistic madel is shown in Figure 6.46(b). In a practical setting, sampling
would be done with an A/D converter with finite precision of (8; + 1) bits. The sys
tem would be implemented with binary arithmetic of (B + 1) bits, The coefficient « in
Figure 6.46(a) would be represented with (B + 1) bits of precision. Also, the cl
vatiable é{n — 1] would be stored in a (# + 1)-bit register, and when the (B + 1)-bit
number 5[u — 1] is multiplied by the (B | 1)-bit number a, the resulting product would
be (28 +1) bits in length. If we assume that a (B + 1)-bit adder is used. the product
a0 (n — 1} must be quantized (1.e., rounded or truncated) to (B +1) bits before it can
be added to the (8; + 1)-bit input sample é[n]. When B; < B, the (B, + 1) bits of the
input samples can be placed anywhere in the (2 + 1)-bit binary word with appropriate
extension of the sign. Different choices correspond to different scalings of the input.
The coeflicient a has been quantized, so leaving aside the other quantization errors,
the system response cannot in general be the same as in Figure 6.46(a). Finally, the
(B 4 1)-bit samples én]. computed by iterating the difference equation represented
by the block diagram, would be converted to an analog signal by a (8, + 1)-bit DIA
converter. When B,, = #, the output samples must be quantized further before DIA
conversion
Although the model of Figure 6.46(b) could be an accurate representation of
4 real system, it would be difficult to analyze. The system is nonlinear owing to the
quantizers and the possibility of overflow at the adder, Also, quantization errors ate
introduced at many points in the system. ‘The effets of these efrors are impossible to
analyze precisely, since they depend on the input signal, which we generally consider to
be unknown. ‘Fhus, we are forced to adopt several different approximation approaches
to simplify the analysis of such systems.
The effect of quantizing the system parameters, such as the coetficient « in Fig-
ure 6.46(a), is generally determined separately from the effect of quantization in data
conversion or in implementing difference equations. That is, the ideal coefficients of a
system function are replaced by their quantized values, and the resulting response fune-
tions are tested to see whether, in the absence of other quantization in the arithmetic,
quantization of the filter coefficients has degraded the performance of the system to
‘unacceptable levels. For the example of Figure 6.46, if the real number a is quantized
to (B +1) bits, we must consider whether the resulting system with system function
1
Toit
A= (681)
is close enough to the desired system function H(z) given by Eq. (6.80), Since there are
only 2+? different (8 + 1)-bit binary numbers, the pole of H(z) can occur only at2!4!
locations on the real axis of the z-plane, and. while itis possible thatd = a, inmost cases
some deviation from the idcal response would result. This type of analysis is discussed
in more general terms in Section 6.8.Section 68 The Effects of Coefficient Quantization a2
The nonlinearity of the system of Figure 6.46(b) causes behavior that cannot occur
ina linear system. Specifically, systems such as this can exhibit zero-input limit eycles,
where the output oscillates periodically when the input becomes zero after having been
nonzero. Limit cycles are caused both by quantization and by overflow. Although the
analysis of such phenomena is difficult, some useful approximate results have been
developed. Limit cycles are discussed briefly in Section 6.10
Ti care is taken in the design of a digital implementation, we can ensure that
overflow occurs only rarely and quantization errors are small. Under these conditions,
the system of Figure 6.46(b) behaves very much like a lincar system (with quantized
coefficients) in which quantization errors are injected at the input and output and at
internal points in the structure where rounding or truncation occurs, Therefore, we can
replace the model of Figure 6.46(b) by the linearized model of Figure 6.46(c), where the
quantizers are replaced by additive noise sources (sce Gold and Rader, 1969; Jackson,
1970a, 1970b). Figure 6.46(c) is equivalent to Figure 6.46(b) if we know cach of the noise
sources exactly. However, as discussed in Section 4.8.3, useful results are obtained if
we assume a random noise model for the quantization noisc in A/D conversion. This
same approach can be used in analyzing the effvets of arithmetic quantization in digital
implementations of linear systems. As seen in Figure 6.46(¢), each noise soure injects
a random signal that is processed by a different part of the sysiem, but since we assume
that all parts of the system are lincar, we can compute the overall cffcet by superposition.
In Section 6.9, we iblustrate this style of analysis for several important systems.
In the simple example of Figure 6.46, there is little flexibility in the choice of
structure, However, for higher-ordcr systems, we have seen that there is a wide variety
of choices. Some of the structures are less sensitive to coefficient quantization than
others, Similarly, because different structures have different quantization noise sources
and because these noise sources are filtered in different ways by the system, we will find
that structures that are theoretically equivalent sometimes perform quite differently
when finite-precision arithmetic is used to implement them.
6.8 THE EFFECTS OF COEFFICIENT QUANTIZATION
ITI discrete-time systems are generally used to perform a filtering operation. Methods
for designing FIR and IIR filters, which are discussed in Chapter 7, typically assume
4 particular form for the system function. The result of the filter desig process is a
system function for which we must choose an implementation structure (4 set of dif-
ference equations) from an unlithiled number of theoretically equivalent implementa
tions. Although we are almost always interested in implementations that require the
least hardware or soltware complexity, it is not always possible to base the choice of
yplementation structure on this criterion alone. As we will sce in Scetion 6.9, the im-
plementation structure determines the quantization noise generated internally in the
system, Also, some structures are more scasitive than others to perturbations of the
coefficients. As we pointed out in Section 6.7, the standard approach to the study of co-
efficient quantization and round-off noise is to-treat them independently. In this section,
we consider the effects of quantizing the system parameters,Chapteré Structures ‘or Discrete-Time Systems
6.8.1 Effects of Coefficient Quantization in UR Systems
‘When the parameters of. rational system function or corresponding difference equation
sare quantized, the poles and zeros of the system function move to new positions in the
z-plane. Equivalently, the frequency response is perturbed from its original value. Ifthe
system implementation structure is highly sensitive to perturbations of the eoeflicients,
the resulting system may no longer meet the original design specifications, or an IIR
system might even become unstable.
‘A detailed sensitivity analysis for the general case is complicated and usually of
limited value in specific cases of digital filter implementation. Using powerfulsimulation
tools, it is usually easy to simply quantize the coefficients of the difference equations
employed in implementing the system and then compute the corresponding frequency
response and compare it with the desired frequency-response function, Even though
simulation of the system is generally necessary in specific cases, it is still worthwhile
to consider, in general, how the system function is affected hy quantization of the co-
efficients of the difference equations. For eximple, the system function representation
corresponding to both direct forms (and their corresponding transposed versions) is the
ratio of polynomials
(6.82)
‘The sets of coefficients {ay} and {bi} are the ideal infinite-precision coefficients in both
direct form implementation structures (and corresponding transposed structures). If we
quantize these coefficients, we obtain the system function
\
Yur
is
He) (6.83)
i
where de = ag + Aay and by = by + Aby are the quantized coefficients that differ from
the original coefficients by the quantization errors Aay and Ab.
Now consider how the roots of the denominator and numerator polynomials (the
poles and zeros of H(z}) are affected by the errors in the coefficients, Each polynomial
Toot is affected by al! of the errors in the coefficients of the polynomial since each root
isa function of all the coefficients of thc polynomial, Thus, cach pole and zero will be
affected by all of the quantization errors in the denominator and numerator polynomi-
als, respectively. More specifically, Kaiser (1966) showed that if the poles (or zeros) are
tightly clustered. itis possible that small errors in the denominator (numerator) coetfi~
cients can cause large shifts of the poles (zeros) for the direct form structures. Thus, if
the poles (zeros) are tightly clustered, corresponding to a narrow-bandpass filter or a
narfow-bandwidth lowpass filter, then we can expect the poles of the direct form struc-
ture to be quite sensitive to quantization errors in the coefficients. Furthermore, Kaiser'sSection 6.8
‘The E'fects of Coefficient Quantization aa
analysis showed that the larger the number of clustered poles (zeros), the greater is the
sensitivity.
‘The cascade and parallel form system functions, which are given by Eqs (6.30) and
(6.35), res -onsist of combinations of 2°0-order direct form systems. However,
in both eases, each pair of complex-conjugate poles is realized independently of all the
other poles. Thus, the error in a particular pole pair is independent of its distance from
the other poles of the system function. For the cascade form, the same argument holds
for the zeros, since they are realized as independent 2°2-order factors Thus, the cascade
form is generally much less sensitive to coefficient quantization than the equivalent
direct form realization.
‘Asscen in Eq, (6.35), the zeros of the parallel form system function are realized
implicitly, through combining the quantized 2"-order sections to obisin a common
denominator, Thus, a particular zerois affected by quantization errors in the numerator
and denominator coellicients of ail the 2"-order sections, However, for most practical
filter designs, the parallel form is also found to be much less sensitive to cocfticient
quantization than the cquivalcnt dircet forms becausc the 2°4-order subsystems are
not extremely sensitive to quantization. In many practical filters, the zeros are often
widely distributed around the unit circle, or in some cases they may all be located at
z= 41. In thelatter situation. the zeros mainly provide much higher attenuation around
frequencies w = 0 and w = 1 than is specified, and thus, movements of zeros away from
£1 do not significantly degrade the performance of the system.
6.8.2 Example of Coefficient Quantization in an Elliptic
Filter
As an illustration of the effect of coefficient quantization, consider the example of an
IIR bandpass elliptic filter designed using approximation techniques to be discussed in
Chapter 7. The filter was designed to meet the following specification:
0.99 < |M(e!*)| = LOL O.3n = |oi = 04m,
[M(e!)| < 0014.2, —40dB), foo] < 0.29.
H(ei)| hk
= H(2) + AH), (687)430 Chapter Structures for Discrete-Time Systems
zplane
© Realizable pole positions
Re
ae plane
i)
=
~
os} \
Figure 6.52 Pole locations tor coupled
form 2°orver IIR system of
9 4 a ‘Re Figure 6.51. (2) Four-bit quantization of
&) coeticients, (2) Soven-bit quantization
where
ANE (6.88)
‘Thus, the system function (and therefore, also the Frequeney response) of the quantized
system is linearly related! to the quantization errors in the impulse-response coefficients
For this rcason, the quantized system can be represented as in Figure 6.53, which shows
te unquantized system in parallel with an error system whose impulse response is
the sequence of quantization exo samples (Afr) aud whose system Lunction is the
corresponding z-transform, A/(2).Section 68
The Effects of Coeicient Quantization 4a
ee)
xin viel
Figure 6.53 Representation of
‘coefficient quantization in FIR systems.
TABLE 6.5 _UNQUANTIZED AND QUANTIZED COEFFICIENTS FOR AN OPTIMUM
FIR LOWPASS FILTER (M = 27),
Costfcient ——_Unquantized obits
MAKIN LaseeT* IO ASD |S
A{)=h(25]— —Lere993 x 10-853
1738032 = 10-3 =284 x 2-18
286841 10 21S
1255246 410-24 x 2-18
esyisay 10-7 2ibe2
22172 «10% PDB
124663 = 10-2 S00 2-18
B20 x 10212192 nsx 2 IF
3288892 % 10-2 10592-18265 2-13
653057 x10 2142 «2-1 336 2-18
AT52K754 10-2467 x21 617 x
AZ] AlIs| —1560970% 10-5115 «2-H 279 «2-8
ALLS} = Alls} — 4394004 107114399. 2-15 3600 2°13
Another approach to studying the sensitivity of the direct form FIR structure
would be to examine the sensitivity of the zeros to quantization errors in the impulse-
response coeflicients, which are, of course the coefficients of the polynomial H(z). If
the zeros of H(z) are tightly clustered, then their locations will be highly sensitive to
quantization crrorsin the impulse-response coefficients. The reason that the direet form
PIR system is widely used is that, for most linear phase FIR filters, the zeros are more
or less uniformly spread in the z-plane, We demonstrate this by the following example.
6.8.5 Example of Quantization of an Optimum FIR Filter
‘As an example of the effect of coefficient quantization in the FIR case, consider a
linear-phase lowpass filter designed to meet the following specifications:
0.99 = [Hel = 101, 0s al = 04x,
IH(e*)| $0.001(.e,-600B), 0.6
This filter was designed using the Parks-McCletlan design technique, which will be
discussed in Section 7.7.3, The details of the design for this example are considered in
Section 7.8.1.
Table 6.5 shows the unquantized impulse-response coefficients for the system,
along with quantized coefficients for 16-, 14-, 13-, and &-bit quantization. Figure 6.5432
0
40
“0
-80~
100
a 7
Ralan Frequency (op
@)
on
0.605
005
ooio|—-—L» —t__1_
ae Oe
Radian Frequency (a)
(
oa10.- ————— —————
‘nos
E
5
—
a
0 ae Oar O67 O8e 7
Radian frequency (0)
©
Figure 6.54 FIR quantization example, (a) Log magnitude for unquantized case.
(©) Approximation error forun quantized case. (Err not defined in transition band)
{6) Approximation errr for 16-bit quantizationopi
0s
Amplitude
0 O29 Ode ‘Ge oar 7
Radium frequency (a)
(a)
oo
8.005
3 |
q 1
=
=oons =
-0010 A ccna Hl camo
02 nae em Or 7
Radian frequency (
©
Radian frequency («)
0
Figure 6.54 (contiaved) {d) Approximation error for 14-bit quantization,
(} Approximation error for 13-bit quantization. {f) Approximation error for 8-bit
‘quantization.
433434
Chapter® Structures for Discrate-Time Systems
gives a comparison of the frequency responses of the various systems. Figure 6.54{a)
shows the log magnitude in dB of the frequency response for unquantized coefficients
Figures 6.54(b), (c), (d), (c), and (1) show the passband and stopband approximation
errors (errors in approximating unily in the passband and zero in the stopband) for
the unquantized, 16-, 14-, 13-, and 8-bil quantized cascs, respectively. From Figure 6.54,
‘we See that the system meets the specifications for the unquantized case and both the
16-bit and 14-bit quantized cases. However, with 13-bit quantization the stopband ap-
proximation error becomes greater than 0.001, and with 8-bit quantization the stopband
approximation erroris over 10 timesas large as specified. ‘Thus, we see that at least 14-bit
coclicients are required for a direct form implementation of the system, However, this
is not a serious limitation, since 16-bit or 14-bit coefficients are well matched to many
of the technologies that might be used to implement such a filter.
‘The effect of quantization of the filter coefficieats on the locations of the zeros
of the filter is shown in Figure 6.55. Note that in the unquantized case, shown in Fig-
ure 6.55(a), the zeros are spread around the z-plane, although there is some clustering
‘on the unit circle. The: zeros on the unit cirele are primarily responsible for developing,
the stopband attenuation, whereas those at conjugate reciprocal locations off the unit
circle are primarily responsible for forming the passband. Note that little difference
is observed in Figure 6.554b) for 16-bit quantization, but in Figure 6.55(c), showing
13-bit quantization, the zeros on the unit cirele have moved significantly. Finally, in Fig
ure 6.55(d), we sve that 8-hil quantization causes several of the zeros on the unit circle
to pair up and move off the circle to conjugate reciprocal locations. This behavior of
the zeros explains the behavior of the frequency response shown in Figure 6.54.
‘A final point about this example is worth mentioning. All of the unquantized
coetficients have magnitudes less than 0.5. Consequently, if all of the coefficients (and
therefore, the impulse response) are doubled prior to quantization, more efficient use
of the available bits will result, corresponding in effect to increasing & by 1, In Table 6.
and Figure 6.54, we did not take this potential for increased accuracy into account.
6.8.6 Maintaining Linear Phase
So far, we have not made any assumptions about the phase response of the FLR sys-
tem, However, the possibility of generalized linear phase is one of the major advan-
tages of an FIR system, Recall that a linear-phase FIR system has either a symmetric
(ALM =n} = hlml) or an antisymmetric (1M — n] = —hln)) impulse response. ‘These
Jinear-phase conditions are easily preserved for the direct form quantized system. Thus,
all the systems discussed in the example of the: previous subsection have a precisely
linear phase, regardless of the coarseness of the quantization. This can be seen in the
way in which the conjugate reciprocal locations are preserved in Figure 6.55
Figure 6.55(d) suggests that, in situations where quantization is very coarse or for
high-order systems with closely spaced zeros, it may be worthwhile to realize smaller
sets of zeros independently with a cascade form FIR system. To maintain linear phase,
each of the sections in the cascade must also have linear phase, Recall that the zeros
of a linear-phase system must occur as illustrated in Figure 6.34. For example, if-we use
2" .order sections of the form (1 +az~' +2°?) for each complex-conjugate pair of zeros:
on the unit cirele. the zero can move only on the unit circle when the coefficient a isSection 68
The Etfects of Coeiient Quantization 435,
FIR Lops Filter: FIR Lospuss Biter:
Ungsinced Coeliients 165it Celcents
a z
e 2
é =
Real part
@) ®
FIR Low pass Fiter, FIR Lowpass Fates
Ishi Cooficiens Sit Coeliients
Imaginary past
maginary part
Real part Real part
©) a
Figure 6.55 Effect of impulse response quantization on zeros of Hi2) (a) Un~
‘quantized. (b) 16-bit quantization. (c) 13-bit quantization, (d) 8-bit quantization.
‘quantized. This prevents zeros from moving away from the unit circle, thereby lessening
their attenuating effect. Similarly, real zeros inside the unit circle and at the reciprocal
location outside the unit circle would remain real. Also, 2er08 at = +L can be realized
exactly by 1*-order systems. Ifa pair of complex-conjugate zcros inside the unit circle is
realized by a 2"¢-order system rather than a 4"*-order system, then we must ensure that,
for each complex zero inside the unit circle, there is a conjugate reciprocal zero outside
the unit circle, This can be done by expressing the 4""-order factor corresponding to
zeros at z = rel and 2 =rte-#* as
Leet - dz? $e:
(6.89)
(1 = 2r cosdz
= 2reos dz! +2436 Chapter Structures for Discrete-Time Systems
Figure 6.56 Subsystem to implement 4"-order factors in a linear-phase FIR
system such that linearity of the phase is maintained independently of parameter
‘quantization
‘This condition corresponds to the subsystem shown in Figure 6.56, This system uses the
same coefficients, ~2r cos # and r2, to realize both the zeros inside the unit circle and
the conjugate reciprocal zeros outside the unit circle. Thus, the linear-phase condition is
preserved undex quantization. Notice that the factor (1~2r cos 27} 4 r22~2) is identical
tothe denominator of the 2-order direct form TTR system of Figure 6,49, Therefore, the
set of quantized zeros is as depicted in Figure 6.50. More details on cascade realizations
of FIR systems are given by Herrmann and Schiissler (1970b),
6.9 EFFECTS OF ROUND-OFF NOISE IN DIGITAL FILTERS
Difference equations realized with finite-precision arithmetic are nonlinear systems
Although itis important in general to understand how this nonlinearity affects the per-
formance of discrete-time systems, a precise analysis of arithmetic quantization effects
is generally not required in practical applications, where we are typically concerned with
the performance of a specific system. Indeed, just as with coefficient quantization, the
most effective approach is often to simulate the system and measure its performance.
For example, a common objective in quantization error analysis is to choose the digital
word length such that the digital system is a sufficiently accurate realization of the de-
sired linear system and at the same time requires a minimum of hardware or software
complesity, The digital word length can, of course, be changed only in steps of 1 bit,
and as we have already seen in Section 4.8.2, the addition of 1 bit to the word length
reduces the size of quantization errors by a factor of 2. Thus, the choice of word length is
insensitive to inaccuracies in the quantization error analysis; an analysis that is correct
to within 30 to 40 percent is often adequate. For this reason, many of the important
effects of quantization can he studied using lincar additive noise approximations, We
develop such approximations in this section and illustrate their use with several ex
amples. An cxception is the phenomenon of zero-input limit cycles, which are striclly
nonlinear phenomena. We restrict our sttidy of nonlinear models for digital filters to a
brief introduction to zero-inpul limit cycles in Section 6.10.
6.9.1 Analysis of the Direct Form IR Structures
Tointroduce the basic ideas, let us consider the direct form structure for an LI discrete
time system. The flow graph of a direct form 1 2"!-order system is shown inSection 6.9
Effects of Round-off Noise in Digital Fiters
se
yal
437
Figure 6.57 Models for direct form 1
system, (2 Infnite-precision madel
(b) Nonlinear quantized model
{c) Linear noise model
Figure 6.$7(a), The general "order difference equation for the direct form T
structure is
x
a
Late e+ Yoextn -A, (6.90)
fe ms438
Chapter Structures for Discrete-Time Systems
and the system function is,
(691)
Let us assume that all signal values and coefficients are represented by (B + 1)-bit
fixed-point binary numbers. Then, in implementing Eq. (6.90) with a (B + 1)-bit adder,
it would be necessary to reduce the length of the (28 + 1)-bit products resulting from
multiplying two (B + 1)-bit numbers back to (# + 1) bits. Since all numbers are treated as
fractions, we would discard the least significant & bits by either rounding or truncation.
This is represented by replacing each constant multiplier branch in Figure 6.57(a) by a
constant multiplier followed by a quantizer, as in the nonlinear model of Figure 6.57(b)..
The difference equation corresponcting to Figure 6.57(b) is the nonlinear equation
¥ u
Sta} = YO Olax dla — kl) =P Otbextn — ka (692)
1 7
Figure 6.57(c) shows an alternative representation in which the quantizers are
Teplaced by noise sources that arc equal to the quantization error at the autput of cach
quantizer. For example, rounding or truncation of a product dxf] is represented by a
noise source of the form
eln] = Qlbrin}} ~ bet). (6.93)
If the noise sources are known exactly, then Figure 6.57(c) is exactly equivalent to
Figure 6.57(b). However, Figure 6.57(c) is most useful when we assume that each quan-
tization noise source has the following properties:
1. Ewch quantization noiscsource efn] isa wide-scnse-stationary white-noise process,
2, Each quantization noise source has a uniform distribution of amplitudes over one
quatitization interval
3. Each quantization noise source is uncorrelated with the input to the corresponding
quantizer, all other quantization neise sources. and the input to the system.
‘These assumptions are identical to those made in the analysis of A/D conversion
in Section 48. Strictly speaking, our assumptions here cannot be valid, since the quanti-
zation ertor depends directly on the input to the quantizer. This is readily apparent for
constant and sinusoidal signals. However, experimental and theoretical analyses have
shown (see Bennett, 1948; Widrow, 1956, 1961; Widrow and Kollar, 2008) that in many
situations the approximation just described leads to accurate predictions of measured
statistical averages such as the mean, variance, and correlation function. This is truc
when the input signal is a complicated wideband signal such as speech, in which the
signal fhuctuates rapidly among all the quantization levels and traverses many of those
levels in going from sample to sample (see Gold and Rader, 1969), The simple linear-
noise approximation presenicd here allows us to characterize the noise generated in the
system by averages such as the mean and variance and to determine how these averages
are modiied by the system.Section 6.9
Effects of Round-off fotse in Digital Filters 439
Pele)
1
is
Ano
a
2
(a)
pee)
1
&
ay Figure 6.58 Probability density
function for quantization errors,
wv) (@) Rounding. (bj Truncation.
For (B + 1)-bit quantization, we showed in Section 6.7.1 that, for rounding,
lin low
52 "seul 52"%, (6.94a)
and for two’s-complement truncation,
-2-8 elnl <0. (6.94b)
‘Thus, according to our second assumption, the probability density functions for the
random variables representing quantization error are the uniform densities shown in
Figure 6.58(a) for rounding and in Figure 6,58(b) for truncation. The mean and variance
for rounding are, respectively.
(6.95a)
(6.95b)
For two's-complement truncation, the mean and varianee are
— (6.96a)
(6.96b)
In general, the autocorrelation sequence of a quantization noise source is, according to
the first assumption,
deel] = 07
In the case of rounding, which we will assume for convenience henceforth m = 0,80 the
autocorrelation function is d,e[2] = 07 4[n], and the power spectrum is ®,-(e/) = a?
for \w| < m. In this case, the variance and the average power are identical. In the
Bin] + m2. (6.97)Chapter 8 Structures for Discrete-Time Systerrs
elal
Bla} = vn} Fe
Figure 6.59 Linear-noise model for direct form | with noise sources combined,
case of truncation, the mean is not zero, so average-power results derived for rounding
must be corrected by computing the mean of the signal and adding its square to the
average-power resulls for rounding.
With this mode! for each of the noise sourcesin Figure 6.57(c), we can now proceed
to determine the effect of the quantization noise on the output of the system. To aid
Us in doing this, it is helpful to observe that all of the noise sources in that figure are
effectively injected between the part of the system that implements the zeros and the
part that implements the poles. Thus, Figure 6.59 is equivalent to Figure 6.57(c) if e[a]
in Figure 6.59 is
eln] = enn} + e,{n] + e2ln] + esfn] + eatn (698)
Since we assume that all the noise sources are independent of the input and independent
of each other, the variance of the combined noise sources for the 2"4-order direct form
Tease is
2 val agh to2ag? as tt
oh to2 to, +02, +02 =5 : 6.99)
Ty +86, + ey + Os + 8 7 (6.99)
and for the general direct form I case, itis
28
of =(M41+N) iz (6.100),
To obtain an expression for the output noise, we note from Figure 6.59 that the
system has two inputs, x[n] and ef]. and since the system is now assumed to be linear,
the output can be represented as 121] = yInl + fin], where yfa] is the response of the
ideal unquantized system Lo the input sequence x[a] and f[a] is the response of the
system to the mput ejn]. ‘The output y[n] is given by the difierence equation (6.90), but
since elit] is injected after the zeros und before the poles, the output noise satisfies the
difference equation
Fl) = Daa sin e+e: (6.101)
1
i.e., the properties of the output noise in the direct form I implementation depend only
on the poles of the system.
To determine the mean and variance of the output noise sequence, we can use
some results from Section 2.10. Consider a linear system with system function H.y(z)Section 6.9
Effects of Round-off Noise in Digital Filters 4a
with a white-noise input el] and corresponding output fl]. Then, from Es. (2.184)
and (2.185), the mean of the output is
me S2 hep] = me gle (6.102)
Since me = 0 for rounding, the mean of the output will be zero, so we need not be
concerned with the mean value of the noise if we assume rounding, From Eqs. (6.97)and
(2.190), it follows that, because, for rounding, elm ]is azero-mean white-noise sequence,
the power density spectrum of the output noise is simply
Preto) = Pppfeh”) = o2|Heple™)P. (6.103)
From Eq. (2.192), the variance of the output noise can be shown to be
ef 1 f” iy
oan _Pritoldes = 085 [Hey (2)Pde. (6.104)
Using Parseval's theorem in the form of Eq. (2.162), we can also express o7 as
(6.10:
When the system function corresponding to her[n] isa rational function, as it will always
be for difference equations of the type considered in this chapter, we can use Eq. (A.66)
in Appendix A for evaluating infinite sums of squares of the form of Eq, (6.105).
‘We will use the results summarized in Eqs. (6.102)-(6.105) often in our analysis
cof quantization noise in linear systems. For example, for the direct form 1 system of
Figure 6.59, H,;(z) = 1/A(2); Le, the system function from the point where all the
noise Sources are injected to the output consists only of the poles of the system function
A(z) in Eq. (6.91). Thus, we conclude that, in general, the total output variance owing
to internal round-off or truncation is
2781 pt do
T2 2x Jig (ACE
2S 2
= ELE NIE D bert,
oF =(M+14N)
(6.106)
where hey{nt| is the impulse response corresponding to Her (2) = 1/A(e). The use of the
preceding results is illustrated by the following examples.
Example 6.11 Round-off Nolse in a 1*-Order System
2 Suppose thal we wish to implement a stable system having the system function
q b
We) =
@= Ta
Figure 6.60 shows the flow graph of tae lineur-noise model for the implementation
in which products ate quantized before addition. Each noise source is filtered by the
"system from en] to the output, for which the impulse response is ef) = aula.
iid (6.107)az Chapter 6 Structures for Discrete-Time Systems
4g Since M = Oand = 1 for this example, from Eq. (6,103), the power spectrum of the
‘ouEpUt noise is
er) (6.108)
and the total noise variance athe output is
we,
Set = (6109)
From Pa, (6.109), we see that the output noise variance increases as the pole at
approaches the unit circle, Thus, to maintain the noise variance below a specified
level as jal approaches unity, we must use longer word lengths, The following example
also illustrates this point.
el) = ele] + exla)
In] Bln] =o) + Fl)
Figure 6.60 1*-order linear noise made!
Example 6.12 Round-off Noise in a 2"-Order System
$8 Consider a stable 2"J-order direct form I system with system function
bot bye7! + byzm?
He) Nd ree-Hy
(6.110)
Tei,
‘The linear-noise model for this system is shown in Figure 6.57(0), or equivalently,
Figure 6.59, with a; — 2r cos and az = —r?. In this case, the total output noise power
can be expressed in the form.
ora pe des
5 Se og Ware eTou are Pee oy
Using Eq. (A.66) in Appendix A, the output noise power is found to be
28
eas (He (1)
As in Exumple 6.11, we see thal as the complex conjugate poles approach the unit
circle (r — 1), the (otal autput noise variance increases, thus requiring longer word
lengths to maintain the variance below a prescribed level,
The techniques of analysis developed so far for the direct form structure can also
bbe applied to the direct form II structure. The nonlinear difference equations for theSection 6.9
Effects of Round-off Noise in Digital Fiters 443
direct form HI structure are of the form
w
Bln] = D> Olaxiile — KI] + xLn], (6.1134)
=
aM
$lal = D> Olbxtbln — Fy] (6.1136)
far]
Figure 6.61(a) shows the linear-noise model for a 2-order direct form II system, A
noise source has been introduced after each multiplication, indicating that the products
are quantized to (+ 1) bits before addition, Figure 6.61(b) shows an equivalent linear
model, wherein we have moved the noise sources resulting from implementation of the
poles and combined them into a single noise source e,{1] = e3in| + eal] at the input.
Likewise. the noise sources due to implementation of the zeros are combined into the
single noise source ¢p[] = eg{n]-tex{nl--eala) that is added directly to the output. From
this equivalent model. it follows that for M zeros and N poles and rounding (m, = 0),
the power spectrum of the output noise is
28 qe
Prplo) = NAS IMP + MED (e114)
and the output noise variance is
P 2 28
a fine Pda ony
a 4 (ot1sy
DY hea? aren
2
That is, the white noise produced in implementing the poles is filtered by the entire
system, whereas the white noise produced in implementing the zeros is added dircetly
to the output of the system. In writing Eq. (6.115), we have assumed that the N noise
sources at the input are independent, so that their sum has N’ times the variance of a
single quantization noise source. The same assumption was made about the (M + 1)
noise sources at the output. These results are easily modified for two's-complement
truncation, Recall from Eqs. (6.95a)-(6.95b) and Eqs. (6.96a)-(6.96b) that the variance
of a truncation noise source is the same as that of a rounding noise source, but the mean
of a truncation noise source is not zero. Consequently, the formulas in Eqs. (6.106) and
(6-115) for the total output noise variance also hold for truncation. However, the output
noise will have a nonzero average value that can be computed using Eq. (6.102).
A comparison of Fg. (6.106) with Eq, (6.1 15) shows thatthe direct form [and direct,
form I structures are affected differently by the quantization of products in implement-
ing the corresponding difference equations. In general, other equivalent structures such
as cascade, parallel, and transposed forms will have a total output noise variance differ-
ent from that of either of the direct form structures. However, even though Egs. (6.106)
and (6.115) axe different, we cannot say which system will have the smaller output noise
variance unless we know specific valucs for the coefficients of the system. Inother words,
it is not possible to state that a particular structural form will always produce the least
output noiseChapter 6 Structures for Discrete-Time Systems
bp
“TT ic)
eal ele
@
eal] ‘
a a
sh i Sl
Liat
4%
oe
|
< | igure 6.61 Linear-noise models for
a | direct form I. (a) Showing quantization
of individual products. (b) With noise
a) sources combined.
It is possible to improve the noise performance of the dircet form systems (and
therefore cascade and parallel forms as well) by using a (2B-+1)-bit adder to accumulate
the sum of products required in both direct form systems. For example, for the direct
form implementation, we could use a difference equation of the form
¥ ow
Sin] = o| Spout Soh -ah (6.116)
m
i.e, the sums of products are accumulated with (28 + 1)-or (28 ++ 2)+bit accuracy, and
the result is quantized to (B + 1) bits for output and storage:in the d
direct form I case, this means that the quantization noise is still filtered by the poles, but
the factor (M-++1+N ) in Eq. (6, 106) is replaced by unity. Similarly. for the direct form TT
realization, the difference equations (6.113a)-(6.113b) can respectively be replaced by
win] =O [Zam -# at] (6.107)
and
i
ait 0[ Src —a (6.1176)
io FSection 6.9
Effects of Round-off Noise in Digital Fitters 45
This implies a single noise source at both the input and output, so the factors V and
(M ~1)in Eq. (6.115) are each replaced by unity. Thus, the double-length accumulator
provided in most DSP chips can be used to significantly reduce quantization noise in
direct form systems
6.9.2 Scaling in Fixed-Point Implementations of IIR
Systems
“The possibility of overflow is another important consideration in the implements
of IIR systems using fixed-point arithmetic. If we follow the convention that each fixed-
point number represents a fraction (possibly times a known scale factor), each node in
the structure must be constrained to have a magnitude less than unity to avoid overflow.
If we[n] denotes the value of the k'" node variable, and Ay[m] denotes the impulse
response [rom the input x[71] lo the node variable wel], then
B= | YD xe = onytete (6.118)
The bound
WwelnT| Samax Velo (6.119)
is obtained by replacing x[n — ml by its maximum value xmax and using the fact that the
magnitude of a sum is less than or equal to the sum of the magnitudes of the summands.
Therefore, a sufficient condition for |w|7]| < Lis
1
aa (6.120)
YS thamtl
for all nodes in the flow graph. If Ymax does not satisfy Eq. (6.120), then we can multiply
xin] by a sealing multiplier s at the input to the system so that sxmax Satisfies Eq. (6.120)
for all nodes in the flow graph; ic.,
a (6.121)
max [2mm]
Scaling the input in this way guarantees that overflow never occurs at any of the modes in
the flow graph, Equation (6.120) is necessary as well as sufficient, since an input always
exists such that Eq. (6.119) is satisfied with equality. (See Eg, (2.70) in the discussion
of stability in Section 2.4.) However, Eq, (6.120) leads to a very conservative scaling of
the input for most signals
Another approach to scaling is to assume that the input is a narrowband signal,
modeled as x[ 1] = Xinax COS gan. In this case, the node variables will have the form
wel] = | ae!) xmax coslogn + £ Hele!) (6.122)
Therefore, overflow is avoided for all sinusoidal
RAS, Hele! <1 (6.123)
nals ifChapter 6 Structures for Discrete-Time Systems
or if the input is scated by the scale factor s such that
1
Simex < ——_— (6.124)
max [Ha(e!)]
Still another sealing approach is based on the energy F = %y x(n? of the input
signal, We can derive the scale factor in this case by applying the Schwarz inequality
(see Bartle, 2000) to obtain the following inequality relating the square of the node
signal lo the energies of the input signal and the node impulse response:
lr
puxia? = Lf" neti steed
s (2f lately) (s f Ix(e™Pde) sete)
Therefore, if we scale the input sequence values by s and apply Parseval’s theorem, we
see that {ay [n]/? < 1 for all nodes k i
s (= pe?) EB pl (6.125)
op x wna
Since it can be shown that for the 4" node,
% 1
‘Ds van} é max Mite]
= YS Malell (61m)
it follows that (for most input signals) gs. (6.121), (6.124), and (6.126) give three de-
creasingly conservative ways of scaling the input to a digital filter (equivalently decreas-
ing the gain of the filter). Ot the three, Eq. (6.126) is generally the easiest to evaluate
analytically because the purtial Lraction method of Appendix A can be used: however
use of Eq. (6.126) requires an assumption about the mean-squared value of the signal
E, On the other hand, Eq, (6.121) is difficult to evaluate analytically, except for thesim-
plest systems, Of course, if the filter coefficients are fixed numbers, the scale factors can
be estimated by computing the impulse response or frequency response numerically.
If the input must be scaled down (s < 1), the signal-to-noise ratio (SNR) at the
output of the system will be reduced because the signal power is reduced, but the noise
power is dependent only on the rounding operation. Figure 6.62 shows 2"-order direct
form Land direct form II systems with scaling multipliers at the input. In determining
the scaling multiplier for these systems, it is not necessary to examine each node in
the flow graph. Some nodes do not represent addition and thus cannot overflow: Other
nodes represent partial sums. If we use nonsaturation two's-complement arithmetic,
such nodes are permitted to overflow if certain key nodes do not. For example, in
re 6,62(a), we can focus om the node enclosed by the dashed cirele, In the figure, the
scaling multipticr is shown combined with the his. so that the noise source is the sameSection 6.9
Effects of Round-off Noise in Digital Fiters a7
Figure 6.62 Scaling of direct form
systems. (a) Direct form |. (b} Direct
form I
as in Figure 6.59; ie., it has five times the power of a single quantization noise source.'"”
Since the noise source is again filtered only by the poles, the output noise power is the
same in Figures 6.59 and 6,62(a). However, the overall system function of the system
in Figure 6.62(a) is s/1(2) instead of H(z), 80 the unquantized component of the output
3fn] is sy{a] instead of yf. Since the noise és injected after the scaling, the ratio of
signal power to noise power in the scaled system is s* times the SNR for Figure 6.59.
Because s < 1 if scaling is required to avoid overflow, the SNR is reduced by sealing,
‘The same is true for the direct form IT system of Figure 6.62(b). In this ease, we
must determine the scaling multiplier to avoid overflow at both of the circled nodes
Again, the overall gain of the system is s times the gain of the system in Figure 6.61(B)
but it may be necessary to implement the scaling multiplier explicitly in this case to
avoid overflow at the node on the left. This scaling multiplier adds an additional noise
component to ea[n], $0 the noise power at the input is, in general, (N + 1)2~24/12.
Otherwise, the noise sources are filtered by the system in. exactly the same way in both
Figure 6.61(b) and Figure 6.62(b), Therefore, the signal power is multiplied by s?, and
the noise power at the output is again given by Eq. (6.115), with N replaced by (N + 1).
The SNR is again reduced if scaling is required to avoid overflow.
"This eliminates a separate scaling multiplication and quantization naise soures, However, seing
{and quantizing) the ys can change the frequency response ol the system, Ifa separate input scaling multiplier
precedesthe implementation the zeros in Figure 6.62(a), thea un additional quantization aise source would
contribute tothe output noise after going through the entire systemChapter 6 Structures for Discrete-Time Systems
Example 6.13 Interaction Between Scaling and Round-off
Noise
{To illostrate the interaction of scaling and round-off noise, consider the system of
“) Example 6.11 with system function given by Eo. (6.107). If the scaling multiplier is
combined with the coefficient b. we oblain the flow graph of Figure 6.63 forthe scaled
system. Suppose that the input is white noise with amplitudes uniformly distributed
between —1 and +1, Then the total signal variance is a? = 1/3. To guarantee no
‘overflow in computing §[n}, we use Eg, ¢6.121) to compute the scale factor
1 11a)
=> (6.128)
Yair
a0
‘The output noise variance was determined in Example 6,11 to be
28
2222 2
wate (6.129)
and since we again have two (B + )-bit rounding operations, the noise power i the
cutput is the same, Lc.,0, = 0. The variance ofthe output yf] due tothe scaled
input so{n]is
=s7of. (6.130)
( =a)? 2
a(S (6.131)
my) ae
TS at
‘As the pole of the system approaches the unit circle, the SNR decreases because the
quantization noise is amplified by the system and because the high gain of the system
forces the input to be sealed down (o avoid overflow. Again, we see that overflow and
quantization noise work in opposition to decrease the performance of the system.
eal
sb
To
sind Sled oyled
Figure 6.63 Scaled 1*-arder system,
6.9.3 Example of Analysis of a Cascade HR Structure
‘The previous results of this section can be applied directly to the analysis of either paral-
fel or cascade structures composed of 2°4-order direet form subsystems. The interaction
of scaling and quantization is particularly interesting in the cascade form. Our generalSection 6.9
Effects of Round-off Notse in Digital Fiters 449
comments on cascade systems will be interwoven with a specific example.
An elliptic lowpass filter was designed to meet the following specifications:
0.99 = [He = 1.01, — \a < 05x,
L(e) = 0.01, 056m < ol Sm
“The system function of the 1
sulting system is
Gy = 0.079459 T] (;
el
the coefficients are given in Table 6.6. Notice that all the zeros of H(z) are on the
le in this example; however, that need not be the case in general,
Figure 6.64(a) shows a flow graph of a possible implementation of this system asa
cascade of 2"4-order transposed direct form LI subsystems, The gain constant, 0.079459,
issuch that the overall gain of the system is approximately unity in the passband, and it
is assumed that this guarantees no overfiow at the output of the system. Figure 6.64(a)
shows the gain constant placed at the input to the system. This approach reduces the
amplitude of the signal immediately, with the result that the subsequent filter sections
must have high gain to produce an overall gain of unity. Since the quantization noise
sources are introduced after the gain of 0.079459 but are likewise amplified by the rest
of the system. this is not a good approach. Ideally, the overall gain constant, being, less
than unity, should be placed at the very end of the cascade, so that the signal and noise
will be attenuated by the same amount. However. this creates the possibility of overflow
along the cascade, Therefore, a better approach is to distribute the gain among the three
stages of the system, so that overflow is just avoided al each stage of the cascade. This
distribution is represented by
HG (6.133)
where sjs253 = 0.079459. The scaling multipliers can be incorporated into the coetfi-
cients of the nume rators of the individual system functions H(z) = sy H(z), as in
5 afi age 3
4G) =[] ( ai bu ) Te. (6.134)
where bj = by = se and 61, = 4614. The resulting scaled system is depicted in
Figure 6.64(b),
Also shown in Figure 6.64(b) are quantization noise sources re presenting the quan-
tization of the products before addition. Figure 6.64 (c) shows an equivalent noise model,
) = 0.079459] 17402). (6.132)
sy (sq HazhssH3 (2).
2
2
+0,
TABLE 6.6 COEFFICIENTS FOR
ELLIPTIC LOWPASS FILTER IN
CASCADE FORM
kaw ae bu
1 OATBRRE 0.72150. 71985
2 OASTIT 0.010077 081109
32 —DoSMTF9 DOU Oa LLAS2450
o7D4s9
Ghapter'6 Structures for Discrete-Time Systems
Sta
Figure 6.64 Models for 6"-order cascade system with transposed direct form lt subsys-
‘tems. (a) Infinite-precision model, (b) Linear-noise model for scaled system, showing quan-
ization of individual multiplications. (c} Linear-noise model with noise sources combined.
for which it is recognized that all the noise sources in a particular section are filtered
only by the poles of that section (and the subsequent subsystems). Figure 6.64(c) also
uses the fact that delayed white-noise sources are still white noise and are independent
of all the other noise sources, so that all five sources in a subsection can be combined
into a single noise source having five times the variance of a single quantization noise
source;! Since the noise sources are assumed independent, the variance of the output
“this discussion can be generalized se show that the transpased direst form II nas the same noise
behavior asthe ditect form Esystem,Section 6.
Effects of Round-off Noise in Digit Fers 451
noise is the sum of the variances owing to the three noise sources in Figure 6.64(c).
Therefore, for rounding, the power spectrum of the output noise is
0 BHy ei Pat Ag(ely? — BiH ei? 1
Pr pla) Se eel Cie
®) [ Teese Tan * jae) (ens)
and the total output noise variance is
2-28 fa f SBOP HEMP
1 [2x Tyee
(6.136)
L ft stitts(e!®)h 1
time ‘Axle A Dae L meme
‘Ifa double-length accumulator is available, it would be necessary to quantize only
the sums that are the inputs to the delay elements in Figure 6.64(b). In this case the
factor of 5 in Eqs. (6.135) and (6.136) would be changed to 3. Furthermore, if a double-
length register were used to implement the delay elements, only the variables ity]
would have to be quantized, and there would be only one quantization noise source per
subsystem. In that case, the factor of 5 in Eqs, (6.135) and (6.136) would be changed to
unity,
The scale factoxs s, are chosen to avoiel overffow at points along the cascade system.
We will use the scaling convention of Eq. (6.124). Therefore, the scaling constants are
chosen to satisfy
si max (eI <1, (6.137)
suse may (He!) Fate!) = 1 (6.137b)
sisps3 = 0.079459, (6.137)
The last condition ensures that there will be no overflow al the outputof thesystem
for unit-amplitude sinusoidal inputs, because the maximum overall gain of the filter is
unity. For the coefficients of Table 6.6, the resulting scale factors are s; = 0.186447,
$9 = 0.529236, and s3 = 0.805267.
Equations (6.135) and (6.136) show that the shape of the output noise power spec-
trum and the total output noise variance depends on the way that zeros and poles are
paired to form the 2"4-order sections and on the order of the 2"order sections in the
cascade form realization. Indeed, it is easily seen that, for N sections, there are (NI):
ways to pair the poles and zeros, and there are likewise (N!) ways to order the result
ing 2"4-order sections, a total of (A)? different systems. In addition, we can choose
either direct form 1 or direct form II (or their transposes} for the implementation of
the 2""-order sections, In our example, this implies that there are 144 different cascade
systems to consider, if we wish to find the system with the lowest output noise variance.
For five cascaded sections, there would be 57,600 different systems. Clearly, the complete
analysis of even low-order systems is a tedious task, since an expression like Eq. (6.136)
must be evaluated for each pairing and ordering. Hwang (1974) used dynamic pro-
gramming and Liu and Peled (1975) used a heuristic approach to reduce the amount of
computation,452
Ghapter 6 Structures for Discrete-Time Systems
e-plane
Unit cele
Figure 6.65 Pole-zer0 plot for
6""-arder system of Figure 6.64,
i showing pairing of poles and zeros,
Even though finding the best pairing and orderingmay require computer optimiza-
tion, Jackson (1970a, 1970b, 1996) found that good results are: almost always oblained
by applying simple rules of the following form:
1. The pole that is efosest to the unit circle should be paired with the zero that is
closest to it in the z-plane.
2. Rule I should be repeatedly applied until all the poles and zeros have been paired.
3. The resulting 2""-order sections should be ordered according to either increasing
closeness to the unit cirele or decreasing closeness to the unit circle,
‘The pairing rules are based on the observation that subsystems with high peak
gain are undesirable because they can cause overflow and because they can amplify
quantization noise. Pairing a pole that is close to the unit circle with an adjacent zero
tends to reduce the peak gain of that section. These heuristic rules are implemented in
design and analysis tools such as the MATLAB Iunction 2p2sos.
One motivation for rule 3 is suggested by Eq. (6.135). We see that the frequency re-
sponses of some of the subsystems appear more than once in the equation for the power
spectrum of the output noise. If we do not Want the output noise variance spectrum to
have a high peak around a pole that is close to the unit circle, then it is advantageous
to have the frequency-response component owing to that pole not appear frequently in
Eq, (6.135). This suggests moving such ‘“high Q poles to the beginning of the cascade.
On the other hand, the frequency response from the input to a particular node in the
flow graph will involve a product of the frequency responses of the subsystems that pre-
cede the node. Thus, to avoid excessive reduction of the signal level in the early stages
of the cascade, we should place the poles that are close to the unit circle last in order.Section 6.9 Effects of Round-off Noise in Digita Filters 453
Clearly then, the question of ordering hinges on a variety of considerations, including
total output noise variance and the shape of the output noise spectrum, Jackson (1970,
1970) used L , norms to quantify the analysis of the pairing-and-ordering problem and
gave a much more detailed set of “rules of thumb” for obtaining good results without
having to evaluate all possibilities
The pole-zero plot for the system in our example is shown in Figure 6.65. The
paired poles and zeros are circled. In this case, we have chosen to order the sections
from least peaked to most peaked frequency response. Figure 6.66 illustrates. how the
frequency responses of the individual sections combine to form the overall frequency
response. Figures 6.66(a)-(c) show the frequency responses of the individual unscaled
subsystems. Figures 6.66(d)-(I) show how the overall frequency response is built up.
Notice that Figures 6.66(d)-(f) demonstrate that the scaling Figs. (6.137a)-(6.137¢) en-
sure that the maximum gain from the input to the output of any subsystem is less than
unity, The solid curve in Figure 6.67 shows the power spectrum of the output noise for
the ordering 123 (least peaked to most peaked). We assume that B + 1 = 16 for the
plot. Note that the spectrum peaks in the vicinity of the pole that is closest to the unit
Circle, The dotted curve shows the power spectrum of the output noise when the scetion
order is reversed (i.e., 321). Since section 1 has high gain at low frequencies, the noise
spectrum is appreciably larger at low frequencies and slightly lower around the peak
‘The high Q pole still filters the noise sources of the first section in the cascade. soit still
tends to dominate the spectrum, The total noise power for the tWo orderings turns out
to be almost the same in this case.
‘The example we have just presented shows the complexity of the issues that arise
in fixed-point implementations of cascade LR systems, ‘The parallel form is somewhat
simpler because the issue of pairing and ordering does not arise. However, scaling still
required (o avoid overfiow in individual 2" order subsystems and when the outputs of
the subsystems are summed to produce the overall output. The techniques that we have
developed must therefore be applied for the parallel form as well. Jackson (1996) dis-
cusses the analysis of the parallel form in detail and concludes that its total output noise
power is typically comparable to that of the best pairings and orderings of the cascade
form. Even sa, the cascade form is more common, because, for widely used TTR filters
such that the zeros of the system function are on the unit circle, the cascade form can
be implemented with fewer multipliers and with more control over the locations of the
zeros,
6.9.4 Analysis of Direct-Form FIR Systems
Since the direct form [and direct form ILIIR systems include the direct form FIR system
asa special case (ie, the case where all coefficients ay in Figures 6.14 and 6.15 are zero),
the results and analysis techniques of Sections 6.9.1 and 6.9.2 apply to FIR systems if we
eliminate all reference to the poles. of the system function and eliminate the feedback
paths in all the signal flow graphs.
‘The direct form FIR system is simply the discrete convolution
a
yl] = > Alelatn — 41 (6.138)
i=20 —
of
3 -2of
woh
-) ! 1 ! 1
0 oT 7
Radian frequency (o)
®
En
oF
§ 20h
40
a a
° a 7
Radian frequency (a)
ws
a0
2 2
#0
Figure 6.66 Frequency-response
2a : functions for example system,
0 ae Der (Re (a) 2010949 1H (2*.
Radian frequerey (0) (0) 2010939 IH(6"|
Cr {€) 2010949 13 (2%)
454Oe Oar Om
Radian frequency (0)
@
1
Oar leer
Radian frequency (a)
©
08 *
or Or
Radian frequency (o)
0
Figure 6.66 (continued
(0) 20iogig 1H4co*D,
fe) 20logy9 iM; (eh HH5 (2h),
{1 2010040 hcp col age)
= 20 log ig 1H'(e 9h.
455455
Chapter Structures for Discrete-Time Systems
0 Gf 02 03 04 OS 06 07 08 09 1
Radian frequency ¢a)
Figure 6.67 utput noise power spectrum for 123 ordering (solid line) and 321
‘ordering (dashed line) of 2"-order sections.
Figure 6.68(a) shows the ideal unquantized direct form FLR system, and Figure 6.68(b)
shows the lincar-noisc mode! for the system, assuming that all products are quantized
before additions are performed. The effect is to inject (M + 1) white-noise sources
directly at the output of the system, so that the total output noise variance is
2-8
oF a +) (6.139)
This is cxactly the result we would obtain by setting N = 0 and hes{n] = Slr} in
Eqs. (6.106) and (6.115). When a double-length accumulator is available, we would
need to quantize only the output. Therefore, the factor (Mf +1) in Eq. (6.139) would be
replaced by unity. This makes the double-length accumblator a very attractive hardware
feature for implementing FIR systems
Overflow is also a problem for fixed-point realizations of FIR systems in direct
form. For two's-complement arithmetic, we need to be concerned only about the size
of the output, since all the other sums in Figure 6.68(b) are partial sums, Thus, the
impulse-response coe fficients.can be scaled to reduce the possibility of overflow, Scaling,
multipliers can be determined using any of the altematives discussed in Section 6.9.2.
Of course, scaling the impulse response reduces the gain of the system, and therefore
the SNR at the output is reduced as discussed in that section.Section 89 Effects of Rourd-off Noise in Digital Filters 487
go) Paty dae) patsy iyAM
all) ated
>
eateh) ewe
or] + Fle}
)
Figure 6.68. Direct form realization of an FIR system. (2) nfnte- precision model
{b) Linear-noise model
Example 6.14 Scaling Considerations for the FIR System in
Section 6.8.5
‘The impulse-response coetficients forthe system in Section 6.8.5 are given in Table 65.
Simple calculations show, and from Figure 6.54(b) we see, that
Spun =
‘ (5 | ’
4 max | (e!)] = 1,009,
751382,
1679442,
12 These numbers satisfy the ordering relationship in Eq, (6.127). Thus, the system, as
2. given, is scaled so that overflow is theoretically possible for a sinusoidal signal whose
| amplitude is greater than 1/1.009 = 0.9911, but even so, overflow is unlikely for most
signals, Indeed, since the filter has a finear phase, we can argue thal, for widchand
signals, since the gain in the passband is approximately unity, and the gain elsewhere
is ess han unity, the output signal should be smaller than the input signal
In Section 6.5.3, we showed that linear-phase systems like the one in Example 6.14
can be implemented with about half the number of multiplications of the general FIR
system, This is evident from the signal flow graphs of Figures 6.32 and 6,33. In these
cases, it should be clear that the output noise variance would be halved if products
were quantized before addition, However, the utilization of such structures involves a
more complicated indexing algorithm than the dircet form. ‘The architecture of most
DSP chips combines a double-length accumulator with an efficient pipelined multiply.458
Chapter 6 Structures for Discrete-Time Systems
accumulate operation and simple lonping control to optimize for the case of the direct
form FIR system, For this reason, direct form FIR implementations are often most
attractive, even compared with IIR filters that meet frequency-response specifications
with fewer multiplications, since cascade or parallel structures do not permit long se-
quences of multiply-accumulate operations
It Section 6.5.3, we discussed cascade realizations of FIR systems. The results and
analysis techniques of Section 6.9.3 apply to these realizations; but for FIR systems
with no- poles, the pairing and ordering problem reduces to just an ordering problem.
As in the case of IIR cascade systems, the analysis of all possible orderings can be very
difficult ifthe system is composed of many subsystems. Chan and Rabiner (97a, 1973b)
studied this problem and found experimentally that the noise performance is relatively
insensitive to the ordering. Their results suggest that a good ordering is an ordering for
which the frequency response from cach noise source to the outputiis relatively Nal and
for which the peak gain is small
6.9.5 Floating-Point Realizations of Discrete-Time
Systems
From the preceding discussion, itis clear that the limited dynamic range of fixed-point
arithmetic makes it necessary to carefully scale the input and intermediate signal levels
in fixed-point digital realizations of diserete-time systems. The need for such scaling can
be essentially eliminated by using floating-point numeric representations and floating
point arithmetic
In foating-point representations, a real number x is represented by the binary
number 2°, where the exponent c of the scale factor is called the characteristic and
isa fractional part called the mantissa, Both the characteristic and the mantissa are
represented explicitly as fixed-point binary numbers in floating-point arithmetic sys
tems. Floating-point representations provide a convenient means for maintaining both
a wide dynamic range and low quantization noise; however, quantization error mani-
fests itseif ina somewhat different way, Floating-point arithmetic generally maintains its
high accuracy and wide dynamic range by adjusting the characteristic and normalizing
the mantissa so that 0.5 = fy < 1. When floating-point numbers are multiplied, their
characteristics are added and their mantissas are multiplied. Thus, the mantissa must
be quantized. When two floating-point numbers are added, their characteristics must
he adjusted to be the same hy moving the binary point of the mantissa of the smaller
number. Hence, addition results in quantization, too. If we assume that the range of the
characteristic is sufficient so that no numbers become larger than 2°. then quantization
affects only the mantissa, but the error in the mantissa is also sealed by 2°. Thus, a
quantized loating-point number is conveniently represented as
Bax tear tex (6.140)
By representing the quantization error as a fraction ¢ of x, we automatically represent
the fact that the quantization error is scaled up and down with the signal level.
‘The aforementioned properties of floating-point arithmetic complicate the quan-
tization error analysis of floating, point implementations of discrete-time systems First,
noise sources must he inserted both after each multiplication and after each addition.
‘An important consequence is that, in contrast to fixed-point arithmetic, the order inSection 6.10 Zero-Input Lint Cycles in Fixed-Point Realzations of HR Digital Fitters 459
which multiplications and additions are performed can sometimes make a big differ.
ence. More important for analysis, we can no longer justify the assumption that the
quantization noise sources are white noise and are independent of the signal. In fact,
in Eq, (6.140), the noise is expressed explicitly in terms of the signal. Therefore, we can
no longer analyze the noise without making assumptions about the mature of the input
signal. If the input is assumed to be known (e.g, white noise), a reasonable assumption
is that the relative error ¢ is independent of x and is uniformly distributed white noise.
With these types of assumptions, useful results have been obtained by Sandberg
(1967), Liu and Kaneko (1969), Weinstein and Oppenheim (1969), and Kan and Ag-
garwal (1971). In particular, Weinstein and Oppenheim, comparing floating-point and
fixed-point realizations of 1*- and 2"-order IIR systems, showed that if the umber
of bits representing the floating-point mantissa is equal to the length of the fixed-point
word, then floating-point arithmetic leads to higher SNR at the output. Not surpris-
ingly, the difference was found to be greater for poles close to the unit circle. However,
additional bits are required to represent the characteristic, and the greater the desired
dynamic range, the more bits are required for the characteristic. Also, the hardware
to implement floating-point arithmetic is much more complex than that for fixed-point
arithmetic. Therefore, the use of floating-point arithmetic entails an increased word
length and increased complexity in the arithmetic unit, Tts major advantage is that it
essentially eliminates the problem of overflow, and if a sufficiently long mantissa is used,
quantization also hecomes much less of a problem. This translates into greater simplicity
in system design and implementation
‘Nowadays, digital iltering of multi-media signalsis often implemented on personal
‘computers or workstations that have very accurate floating point numerical represen-
tions and high speed arithmetic units. In such cases, the quantization issues discussed in
Sections 6.7 6:9 are generally af lltle or no concern, However. in high volume systems,
fixed point arithmetic is generally required to achieve low cost.
10 ZERO-INPUT LIMIT CYCLES IN FIXED-POINT
REALIZATIONS OF IIR DIGITAL FILTERS:
For stable IIR discrete-time systems implemented with infinite-precision arithmetic,
if the excitation becomes zero and remains zero for n greater than some value 129,
the output for a = np will decay asymptotically toward zero, For the same system,
implemented with inite-register-length arithmetic, the output may continue to oscillate
indefinitely with a periodic pattern while the input remains equal to zero. This effect is
often referred to as zera-inpur limit cycle behavior and is a consequence either of the
nonlinear quantizers in the feedback loop of the system or of overflow of additions. The
limit cycle behavior of a digital filter is complex and difficult to analyze, and we will not
allempt to treat the topic in any general sense, To illustrate the point, however, we will
give two simple examples that will show how such limit q
6.10.1 Limit Cycles Owing to Round-off and Truncation
Successive roundoff or truncation of products in an iterated difference equation can
create repeating patterns. This is illustrated in the following example.Chapter 6 Structures for Discrete-Time Systerns
Example 6.15
imit Cycle Behavior in a 1"-Order System
S% Consider the 1*Lorder system characterized by the difference equation
& ybt) = ayln — 1) + xl, tal
1m)
The signal flow graph of this system is shown in Figure 6,65%a), Let us assume
that the register length tor storing the coefficient a, the input x(n], and the flver node
, Yatiable yin ~ 1] 4 bits (ic, a sign bit to the left of the binary point and 3 bits to the
right of the binary point), Because of the fnite-length reyister, the produet ayn — T]
must be tounded o truncated t0 4 bits before being added co «in]. The flow graph
representing the actual realization based on Eq. (6.141) is shown in Figure 6.69(b).
Assuming rounding of the product the netual output $f] satisfies the nonlinear dit
ference equation
Stn = Olas in — 11) + xh (6.142)
where Q-] represents the rounding operation, Let us assume that @ = 1/2 = 06106
and that the input isx{al = (7/8)8la] = (oI 118i). Using Bq, (6.142), we see that for
40, $10} = 7/8 = 00111. To obiain S11), we muliply $10] by a, obtaining the resuk
y[0] = 0-011100, a 7-bit number that must be rounded to 4 bits. This number, 7/16,
is exactly halfway between the two 4-bit quantization levels 4/8 and 3/8. If we choose
always to round upward in such cases, then 0.01 1100 rounded to 4 bits is 0; 100 = 1/2
‘Since [1] = 0, it follows that §{1] = 0,100 = 1/2. Continuing to iterate the difference
‘equation gives §[2] = QlaS11}) = 0.010 = 1/4 and $13] = 00001 = 1/8. In both
these cases, no rounding is necessary. However, to obtain $[4]. we must round the
7-bit sumber 513] = 05090100 to 0.0L, The same result is obtained for all values of
n= 3. The output sequence for this example is shown in Figure 6.70(@). Ifa = —1/2,
‘we can carry oul the preceding computation again to demonstrate thal the output is as
shown in Figure 6.70(b). Thus, because of rounding ofthe product aa — 1], the output
reaches a constant value of 1/8 when a = 1/2 and a periodic steady-state oscillation
between 41/8 and 1/8 whena = —1/2. These are periodic outputs similar to those
that would be obtained from a I"-order pole at 2 = =I instead of at » = +1/2
alnl in
lal orl
4 ()
Figure 6.69 1*-order IIR system. (a) Ininte-precision tmear system. (b) Nan-
‘near system due to quantizationSection 6.10 Zero-Input Limit Cycles in Fixed-Point Realizations ct 1IR Digital Fiers 461
2
a
3 fa) @= 2
Sr) (a 3)
1
1
i 4
4d, ay Gy i
: 1 yy
2 Ae 123 4 5 6) ~ ”
(a
i. apap
x ei ms
°
Figure 6.70 Resparse of the 1-order system of Figure 6.69 to an impulse.
(b)a=-}
} When a = 41/2, the period of the oscillation is 1, and when @ = —1/2, the
period of oscillation is 2. Such steady-state periodic outputs are called limit eycles, and
their existence was firsl noted by Blackman (1965), who referred to the amplitude
intervals to which such limit cycles are confined as dead bunds. Tn this ease, the desd
band is -2-F «© jn] = 42°", where B = 3.
‘The foregoing example has iflustrated that a zero-input limit cycle can result from
rounding ina 1*-order DIR system. Similar results can be demonstrated for truncation,
2"4-order systems can also exhibit limit cycle behavior. In the case of parallel realizationsChapter 6 Structures for Discrete-Time Systerns
of higher-order systems, the outputs of the individual 2"-order systems are indepen:
dent when the input is zero. In this case, one or more of the 2%-order sections could
contribute a limit cycle to the output sum. In the case of cascade realizations, only the
first section has zero input; succeeding sections may exhibit their own characteristic
limit eycle behavior, or they may appear to be simply filtering the limit eyele output of
a previous section. For higher-order systems realized by other filter structures, the limit
eyele behavior becomes more complex, as does its analysis.
In addition to giving an understanding of limit cycle effects in digital filters, the
preceding results are useful when the zero-input limit cycle response of a system is the
desired output. This is the case, for example, when one is concerned with digital sine
‘wave oscillators for signal generation or for the generation of coefficients for caleulation,
of the discrete Fourier transform.
6.10.2 Limit Cycles Owing to Overflow
In addition to the classes of limit cycles discussed in the preceding section, a more severe
type of limit eycle can occur owing 1o overflow. The effect of overflow is to inserta gross
error in the Output, and in some cases the filter output thereafter oscillates between
large-amplitude limits. Such limit cycles have heen referred to as averfiow oscillation.
‘The problem of oscillations caused by overflow is discussed in detail by Ebert et al
(1969). Overflow oscillations are illustrated by the following example.
Example 6.16 Overflow Oscillations in a 2"-Order System:
%) Consider a 2%4-order system realized by the difference equation
ala] + Ola, 51n— 111+ Qlaysln — I, (6143)
where QL-] represents two's-complemeat rounding with a word fenath of 3 bits plus |
Dil for the sign, Overflow can occur with (wo's-complement addition of the rounded
proglucts. Suppose that a, = 3/4 = 001 10 and a = —3/4 = 10610, and assume tral
‘[n] Femains equal to zoro for n = 0, Furthermore, assume that §[-1] = 3/4 = Qld
and §—2] = =3/4 = 1,010, Now the output at sample n = 0s,
510] = 05110 x 0110+ 1,010 x 16010.
I we evaluate the products using (wo's-complement arithmetic, we obtain
510] = 0-100100 + 0,100100,
and if we choose to round upward when a numberis halfway between two quantization
levels, the result of two's-complement addition is
510] = 0101 + 06101 = 1,010 — 3)
Inthiscase the binary carry overflows into the sign bit, thus changing the positive sum.
into a negative number. Repeating the process ives
= 1011+ 1.011 0.110 = 3
‘The binary carry resulting from the sum ofthe sign bits is lost, amd the negative sum is
mapped into a positive number. Clearly. stn] will continue to oscillate between 13/4
- and ~3/4 until an input is applied. Thus, §{n} has entered a periodic limit cycle with 2
£2 period of Zand an amplitude of almost the full-scale omplitude of the implementation,Section 6.19 Summary 463
‘The preceding example illustrates how overflow oscillations occur. Much more
complex behavior cam be exhibited by higher-order systems, and other frequencies can
occur, Some results are available for predicting when overllow oscillations can be sup-
ported by a difference equation (see Ebert et al., 1969). Overflow oscillations ean be
avoided by using the saturation overflow charactcristic of Figure 6.45(b) (see Ebert et
al., 1969).
6.10.3 Avoiding Limit Cycles
The possible existence of a zero-input limit cycle is important in applications where a
digital filter is to be in continuous operation, since itis generally desired that the output
approach zero when the input is zero. For example, suppose that a speech signal is
sampled, filtered by a digital filter, and then converted back to an acoustic signal using
a D/A converter. In such a situation it would be very undesirable for the filter to enter
a periodic limit cycle whenever the input is zero, since the limit cycle would produce an
audible tone
‘One approach to the general problem of limit cycles is to seek structures that do
ot support limit cycle oscillations. Such structures have been derived by using state~
space representations (see Barnes and Fam, 1977; Mills, Mullis and Roberts, 1978)
and concepts analogous to passivity in analog systems (sce Ruo and Kailath, 1984;
Fettweis, 1986}. However, these structures generally require more computation than
an equivalent cascade or parallel form implementation. By adding more bits to the
computational wordlength, we can generally avoid overflow. Similarly, since round-off
Jimit cycles usually are limited to the least significant bits of the binary word, additional
bits can be used to reduce the effective amplitude of the limit cycle. Also, Claasen
et al. (1973) showed that if a double-length accumulator is used so that quantization
‘occurs after the accumulation of products, then limit cycles owing to round-off are much
less likely to occur in 2°4-order systems, Thus, the trade-off between word length and
computational algorithm complexity arises for limit cycles just as it does for coefficient
quantization and round-off noise.
Finally. itis important to point out that zero-input limit cycles due to both overflow
and round-off are a phenomenon unique to ITR systems: FIR systems cannot support
zero-input limit cycles, because they have no feedback paths. The output of an FIR
system will be zero no later than (M + 1) samples after the input goes to zero and
remains there. This is a major advantage of FIR systems in applications wherein limit
cycle oscillations cannot be tolerated.
6.11 SUMMARY
In this chapter, we have considered many aspects of the problem of implementing an,
LTT discrete-time system. The first half of the chapter was devoted to basic implemen-
tation structures. After introducing block diagram and signal flow graphs as pictorial
Tepresentations of difference equations, we discussed a number of basic structures forChapter 6 Structures for Discrete-Time Systems
IIR and PIR discrete-time systems. ‘These included the direct form J, direct form Hl, cas-
cade form, parallel form, lattice form. and transposed version of all the basic forms. We
showed that these forms are all equivalent when implemented with infinite-precision
arithmetic. However, the different structures are most significant in the context of finite-
precision implementations. Therefore, the remainder of the chapter addressed problems
associated with finite precision or quantization in fixed-point digital implementations
of the basic structures
We began the discussion of finite precision effects with a brief review of digital
number representation and an overview showing that the quantization effects that are
important in sampting (discussed in Chapter 4) are also important in representing the
coefficients of a discrete-time system and in implementing systems using finite-precision
arithmetic, We ilustrated the effect of quantization of the coefficients of a difference
‘equation through several examples, This issuc was treated independently of the effects
of finite-precision arithmetic, which we showed introduces nonlinearity into the system,
We demonstrated that in some cases this nonlinearity was responsible for limit cycles
that may persist after the input toa systern has become zero, We also showed that quan-
tization effects can be modeled in terms of independent random white-noise sourc
thal are injected internally into the flow graph. Such linesr-noise models were devel
oped for the direct form structures and for the caseade structure. In all of our discussion
of quantization effects, the underlyingtheme was the conflict between the desire forline
quantization and the need {o maintain a wide range of signal amplitudes. We saw that
in fixed-point implementations, one can be improved at the expense of the other, but to
improve one while leaving the other unalfected requires that we increase the number
of bits used to represent coefficients and signal amplitudes. This can be done either by
increasing the fixed-point word length or by adopting a floating-point representation
Our discussion of quantization effects serves two purposes, Hirst, we developed
several results that can be useful in guiding the desiga of practical implementations We
fonnd that quantization effects depend greatly on the structure used and on the specific
parameters of the system to be implemented, and even though simulation of the system
is generally necessary to evaluate its performance, many of the results discussed are
useful in making intelligent decisions in the design process. A second. equally important
purpose of this part of the chapter was to illustrate style of analysis that can be applied
in studying quantization effects in a variety of digital signal-processing algorithms. The
examples of the chapter indicate the types of assumptions and approximations that
are commonly made in studying quantization effects. In Chapter 9, we will apply the
analysis techniques developed here to the study of quantization in the computation of
the discrete Fourier transform
Basic Problems with Answers
6.1, Determine the system function of the (wo Row graphs in Figure P6.1, and show that they
have the same pales.bar 6 Problems 465
ar
'
Neewor
im
a 77
reas in
rin
le
% reos ag
NN 4
MN
etmork2
oy Figure P6.1
6.2. ‘The signal flow graph of Figure P62 represents 4 Tinear difference equation with constant
coefficients. Determine the difference equation that relates the ourput yln to the input
shy).
Figure P6.2
6.4, Figure P63 shows six systems, Determine which one of the last five, (b)-(). has the same
system function as (2). You shauld be able to eliminate some of the possibilities by inspec:
tion,“hn
co)
w)
©
‘a Figure P63Chapter 6
Problems. 467
ated vied
Figure P6.3 (continued)
64, Consider the system in Figure P6.3(d).
(a) Determine the system function relating the :-transforms of the input and output
(b) Write the difference equation that issatisfied by the inputsequence x[n)and the output
sequence yin}
6.5. An LIV system i realized by the flow graph shown in Figure Pa.5.
Figure P6.5
{a) Write the difference equation relating xn] and yl] for this flow graph.
{b) What is the system function of the system?
{e) Inthe realization 07 Figure P6.5, how many real multiplications and real additions are
required to compute each sample of the output? (Assume that x[x] is real, and ussume
that multiplication by 1 does not count in the total.)
{@) The realization of Figure P65 requires four storage registers (delay elements). Is it
possible to reduce the number of storage registers by using a different structure? If
so, draw the flow graph; if not, explain why the aumber of storage registers cannol be
reduced,Chapter 6 Structures for Discrete-Time Systems
66, Deterinine the impulse response of each of the systems in Figure P66
tel
st]
@
Figure P6.6
6.7. Let x{n] and y[n} be sequences related by the following difference equation:
1 1
yin] = 5yln = 2] = xf —2}— 5rin
Draw a direct form 1 signal flow graph for the causal LIT system corresponding to this
difference equation.Shepter 6
Problems 469
628. The signal flow graph in Figure P68 represents un LTH system. Determine a difference
‘equation that gives a relationship between the input x[n] and the output y[n] of this system,
As usual, all branches of the signal low graph have unity gain unless specifically indicated
otherwise
la}
3 Figure P6.8
63,
Figure P6.9 shows the signal Nlow graph for 4 causal discrete-time LTH system. Branches
‘without gains explicitly indicated have a gain of unity.
(a) By tracing the path of an impulse through the flowgraph, determine hil, the impulse
response at
(b) Determine the difference equation relating x{n] and y(n].
Figure P6.9
6.10. Consider the signal flow graph shown in Figure P6.10,
(a) Using the node variables indicated, write the set of difference equations represented
by this flow graph,
(b) Draw the flow graph of an equivatent system that is the cascade of two order
systems,
(@) Is the system stable? Explain
a”
1
a
win} Figure P6.10470 Chapter 6 Structures for Discrete-Time Systems
6.1L. Consider a causal LTT system with impulse response h(v] and system function
(a) Draw a direct form II Blow graph for the system.
(b)_ Draw the transposed form of the flow graph in part (a).
6.2. For the LTT system described by the Now graph in Figure P6.12, determine the difference
equation relating the input x{2] to the output yl}
Sa —\ fl
Figure P6.12
6.13. Draw the signal flow graph for the direct form | implementation of the fT system with
system function
Hw
6.14. Draw the signal flow graph for the direct form I implementation of the LTT system with
system function
GAS. Draw the signal flow graph for the transposed direct form I] implementation of the LT1 sys
tem with system function
6.16. Consider the signal flow graph shown in Figure P6.16.
(2), Draw the signal flow graph that results from applying the transposition theorem to this,
signal flow graph
0). Confirm that the transposed signal flow graph that you foundin (a) has thesame system
fanction H(z) as the original system in the figure.
st)
Figure P6.16Chapter &
Problems an
6.17, Consider the causal TI system with system function
(a) Draw the signal flow graph for the direct form implementation of this system.
(b) Draw the signal flow graph for the transposed direet form implementation of the
system
6.18. For some nonzero choices of the parameter 4, the signal flow graph in Figure P6.18 can
be replaced by a 2"4-order direct form II signal low graph implementing the same system
function. Give one such choice for a and the systern function H() that results.
Figure P6.18
6.19, Consider the causal LTT sysiem with the system function
Draw a signal fow graph that implements this system as a parallel combination of 1"order
transposed direct form II secti
6.20, Draw a signal flow graph implementing the system function
HG)
as a cascade of 2°¢-order transposed direct form I sections with real coefficients.
Basic Problems
6.21. For many applications. itis useful to have asystom that will generate a sinusoidal sequence.
One possible way to do this is with a system whose impulse response is An] =e!" Tal.
‘he real and imaginary parts of hla} are therefore hyn] = (coswynduln] and hyn} =
(sin eqa jul}, respectively.
Tn implementing a system with x complex impulse response, the real and imaginary
parisare distinguished as separate outputs. By frst writing the complex difference equation
required to produce the desired impulse response and then separating it into its real uod
imaginary parts, draw a flow graph that will implement this system. The flow graph that
‘you draw should have only real cocflicients. This implementation is sometimes called the
caupled form oscillaior, since, when the input is excited by an impulse, the outputs are
sinusoidalar
Chapter Structures for Discrete-Time Systerns
6.22, For the system function
draw the flow graphs of all possible realizations for this system as cascades of 1%-order
systems.
623, We want to implement a causal system H(<) with the pole-zero diagram shown in Fig.
ure P6.23. For all parts of this problem, cj, 22. pj, and pz are real and a gain constant that
is independent of frequency can be absorbed into a gain coefficient in the outpat branch of
each flow graph,
' Figure P6.23
(a) eww the flow graph of the direct form IT implementation. Determine an expression
for each of the branch gains in terms of the variables z,, 22. py, and po.
(b) Draw the flow graph of an implementation as a cascade of 2"4-order dicect form 1
sections. Determine an expression for each of the bramch gains in terms of the variables
24527» Ps 08 pr»
(€), Draw the flow graph of a parallel form implementation with 1*-order direct form 11
sections. Specify a system of linear equations that can be solved to express the branch
ins in terms ofthe variables zy, 22, pj.nd pa,
624, Consider a causal LTT system whose system function is
(a) Draw the signal flow graphs for implementations ofthe system in each ofthe Following
Forms
(@) Diroct form
i) Direct form 11
(iii) Cascade form using 1*- and 2"4-order direet form II sections
fv) Parallel form using 1"- and 2"4-order direct form I seetions
(8) Transposed direct form 1
(b) Write the difference equations for the low graph of part (¥) in (a) and show that this
system has the correct system functionChapter Problems ars
625. A causal LTI system is defined by the signal flow graph shown in Figure P6.25, which
represents the system as a cascade of a 2°-order system with a 15!-order system,
sal
ea ost
— 4
Figure P6.25
(a) What isthe system fanction of the overall cascade system?
4h) Is the overall system stable? Explain briefly
{¢) Is the overall system a minimam-phase system? Explain briefiy.
(a) Draw the signal Sow graph ofa transposed direct form IT implementation of this system.
6.26. A causal LATE system has sys
sm function given by the following expression:
geek
i+ {opt
(a) Is this system stable? Explain briefly
(b)_ Draw the signal flow graph of a parallel form implementation of this system,
(©) Draw the signal flow graph of a cascade form implementation of this system as &
cascade of a 1" order system and a 2"-order system. Use a transposed direct form II
implementation for the 2°4-order system.
627. An LTI system with system function
Aeam Chapter Structures for Discrete-Time Systems
(a) Fill in all the coefficients in the diagram of Figure P6.27. Is your solution unique?
(b) Define appropriate node variables in Figure P6.27, and write the set of difference
‘equations that is represented by the flow praph.
62K. (a) Determine the system function, 1/(z), from xf] to yl] for the flow graph shown in,
Figure P6.28-1 (note that the location where the diagonal lines criss-cross isnot a single
node).
z Figure P6284
(b) Draw the direct form (I and II) flow graph of systems having the system funetion H()
(©) Design A(z) such that H(z) in Figure P6.28-2 has a causal stable inverse and
H,(e!)| — |H(e)), Note: Zeto-pole cancellation is permitied.
ST ney LOL ace
Ti
Figure P6.28-2
(a) Draw the transposed direct form II flow graph for H:(2),
6.29. (a) Determine the system function H (¢> relating the input xin] to the output pln forthe
FIR lattice fitter depicted in Figure P628.
yi]
eg ge pe
vt zt a Figure P6.29
(b) Draw the lattice filler structure for the all-pole filler 1/2).Chapter Problems 475
6.40, Determine and draw the lattice filter implementation of the following causal all-pole system
function:
Ts the system stale?
GL, An IIR lattice filter is shown in Figure Pé.31
xin vay
z Figure P6.31
(a) By tracing the path of an impulse through the flowgraph, determine y{1] for input
fee} = Bf.
tb) Determine a flow graph for the corresponding inverse filter
{c) Determine the transfer function for the IIR filter in Figure P6.31,
6.32. ‘he flow graph shown in Figure P6.32 is an implementation of a causal, LT! system.
ral
Figure PB.32
(a) Draw the transpose of the signal flow graph.
(b) Foreither the original system or its transpose, determine the difference equation relat-
ing the input xr] to the output y[n]. (Note: The difference equations wil be the same
for both structures)
(6) Is the sysiem BIBO stable?
(a) Determine y12] if a) = (1/2)
ln418 Chapter 6 Structures for Discrete-Time Systems
Advanced Problems
6.33, Consider the LTT system represented by (he FIR lattice structure in Figure P6.33-1
vin
xf] Seal
Figure P6.23-1
(a) Determine the system function from the input x[n] to the output ef] (NOT yf.
(b) Let #(<) be the system function from the input 2[7] to the output yl], and let gl]
be the result of expanding the associated impulke response kl} by 2 as shown in
Figure. P6.33-2.
S0| pg
Figure P6.33-2
The impulse response ef] defines a new system with system function (42). We would
Tike to implement Giz) using an FIR lattice structure. Determine the &-parameters
necescary for an FIR lattice implementation of Gi), Note: You should think carefully
before diving into a long calculation.
6.34, Figure P6.34-1 shows an impulse response k(n, specified as
A , '
hin = | (2)"" sta, torn an integer maple of
constant in between asindieated
Alay
12
is 3
0 4 8 Figure P6.34-1
(a) Determine a choice for 4 [7] and hy|n] such that
‘e] = fy fal eal,
where hy [n] isan FIR filterand where fi] = 0 for n/4 not an integer. fs gf] an FIR
or IIR filter?Chapter6 Problems: an
{b} ‘The impulse response h[n] #8 to he used in a downsampling system as indicated in
Figure P6:34-2.
ON int Ll ye
Figure P6.34-2
Draw a flow graph implementation of the system in Figure P6.34-2 that requires the
minimum number of nonzero and nomuaity coefficient multipliers, You may tse unit
delay elements, coefficient muttiptiers, adders and compressors. (Multiplication by a
zero or a one does not require a multiplier)
(e} For your system, state how many multiplications per input sample and per output
sample are required, giving a brief explanation.
5. Consider the system shown in Figure P6.35-L
gu lr) 44 al
Figure P6.35-1
We want to implement this system using the polyphase structure shown in Figure: P6.35-
se
+4 extn]
i
vi 7
yt]
sy
4 extn]
Figure P6.35-2 Polyphase structure of the system.
For parts (a) andl (b) only, assume fifn] is defined ia Figure P6.35-3,
wow ye
Ata
va
ttle.
+s 67 8 ! lvoe WT "Figure PB.35-3
(hl) =O for all n = Dand n > 12).ame Ghapter6 Structures for Discrete-Time Systems
(@) Give the sequences ep|J, cyl Jeli], and e3[n] that result in acorrect implementation.
(b)_ We want to minimize the total number of multiplies per output sample for the imple-
mentation of the structure in Figure Pé.35-2. Using the appropriate choice of fn).
elt, eala!, and e3|n] from part (a), determine the minimum number of multiplies
per output sample for the overall system. Also, determine the minimum number of
‘multiplies per input sample for the overall system. Explain,
(6) Instead of using the sequences gol |, eye}. ealit}, and e3[n] identified in part (a), now
assume that Fy(e®) and £2(e!®) the DTFTS of eolrrl andl epln], respectively, are as
aliven in Figure P6.35-4, and E,(e!®) = Ex(e) = 0.
Pony Ble) YE sa 209
ar ar
Figure P6.35-4
‘Sketch und label A1¢e!*") from (x, 2),
6.36. Consider a general flow graph (denoted Network A) consisting of coefficient multipliers
and delay clements, as shown in Figure P6.36-1. Ifthe systems initially at rest, its behavior
is completely specified by its impulse response ln]. We wish to modify the system to create
a new flow graph (denoted Network A ;) with impulse response: hy [nl = (— L)*aln
so
Network A Figure P6.36-1
(a) Hf H(e4%} is as given in Figure P6.36.2, sketch Hf (2/4),
Me™)
2 Figure P5362
(b) Explain how to modify Network A by simple modifications ofits coefficient multipliers
andior the delay branches to form the new Network 4, whose impulse response is
ruleGhapter6 Problems 479
(c) IE Network A is as given in Figure P6.36-3, show how to modify it by simple moditica-
tions to only the coefficient multipliers so that the resulting Network A has impulse
response [1].
al)
Figure P6.36-3
6.37. The flow graph shown in igure 6.37 is noncomputable; ic. itis not possible to compute
the output using the difference equations represented by the flow graph because it contains
a closed loop having no delay elements
Figure P6.37
(a) Write the difference equations for the system of Figure P6.37, and from them, find the
system function of the flow graph.
(b) From the system function, obtain a flow graph that
‘computable,
6.38. ‘The impulse response of an LTT system is
a, Went,
ninr= (5 Siac
(a) Draw the flow graph of a direct form nonrecursive. implementation of the system.
(b) Show that the corresponding system function can he expressed as
a8
A) = lel ah
(€) Draw the flow graph of an. implementation of (2), as expressed in part (b), eorte-
sponding toacascade of an FIR system (numerator) with an IIR system (denominator).
(@) Is the implementation in part (c) recursive or nonrecursive? Is the overall system FIR
or IR?Chapter 6 Structures for Discrete-Time Systerns
(e) Which implementation of the system requires
(i) the most storage (delay elements)?
(i) the most arithmetic (mukiplcations and additions per ouspat sample)?
6.39, Consider an FIR system whose impulse response is
iin) = [ HOTEOARH/AG ng). On 14
o otherwise
This ystem is un example ofa lass of filer: known a Frequeney-sampling filters, Prob-
lem 6351 discusses these filters in detall. Ln this problem, we consider just one specific case.
5/2
(a) Sketch the impulse response of the system for the cases ny = Oand my =
(b) Show that the system function of the system can be expressed as
Jen s2xnadl5 Leitenos 5
H@=d-2 8)
sintiolS/2) 1 sinflw —277/18)15/2]
singo/3) * 2 sinllo— 2a/15)/21
1 sino + 29 /15)15/2]
2 sin[(o + 29/15/21
‘Use this expression to sketch the magnitude of the frequency response ofthe system
for ny = 15/2. Obtain a similar expression for rig = 0. Sketeh the magnitude response
{for np =0. For which choices of ng does the system have a generalized linear phase?
(a) Drawa signal flow graph of an implementation of the system asa cascade of an FIR sys-
tem whose system function is 2! and a parallelcombinationof a1*t-and 2"4-order
AIR system.
6.0. Consider the discrete-time system depicted in Figure P6.40-1.
Figure P6.40-4
(a), Write the set of difference equations represented by the flow graph of Figure Pé.ath!
(b) Determine the system function #7; (:) = ¥(2)/ X (2) of the system in Figure Pé6.40-1
and determine the magnitudes and angles of the poles of Fz) asa function of r for
-leral
(©) Figure P6.40-2 shows a different flow graph obtained from the flow graph of Fig:
ure P640-I by moving the delay elements to the opposite top branch. How is the
system function F(z) = ¥p(z)/ X (2) related to H(z)?
Figure Pé.40-2Chapter 6
Problems 481
GAl. ‘The three flow graphs in Figure P6.41 are all equivalent implementations of the same two-
input, two-output LTE system.
‘ 4
Saal ier aT xe aot
ln) la
Nework A Network
« 6)
©
Figure Pé.44
(a) Write the difference equations for Network A.
(b) Determine values of a, b,c, and d for Network 8 in terms of rin Network A such that
the two systems are equivalent
(c) Determine values of e and f for Network C in terms of r in Network 4 such that the
‘ovo systerns are equivalent,
(4) Why might Network B or C be preferred over Network A? What possible advantage
could Network 4 have over Network B or C?
6.42. Consider an all-pass system with system function
ut
1-054
‘A flow graph for un implementation of this systema is shown in Figure P6.42,
H(2) = 054
Figure P6.42
(a) Determine the coefficients fc, and d such that the flow graph in Figure P6.42 is a
direct eealization of H().
(b) Ina practical implementation of the network in Figure P6.42, the coefficients h, ¢, and
might be quantized by rounding the exact walue to the nearest tenth (¢., 0.54 will
be rounded 10 (h5 and. 1/054 = 1.8518... Will be rounded to 1,9), Would the resulting
system stil be an all-pass system:Chapter Structures for Discrete-Time Systems
(©) Show that the difference equation relating the input and output of the all-pass system
‘with system function #2) can be expressed as
yl] = O.54¢[a 1) ~ tad + she = 1.
Draw the flow graph of « network that implements this difference oquation with to
delay elements, hut only one multiplication by a constant other than +1
(@) With quantized coetficients, would the flow graph of part (c) be an all-pess system?
The primary disadvantage of the implementation in part {c) compared with the im
plementation in part (a) is that it requires two delay elements. However, for higherorder
systems, itis necessary to implement a cascade of all-pass systerns. Ror W allepass sections
in cascade, itis possible to use all-pass sections in the form determined in part (c) while
requiring Only (.V + 1) delay elements, This is accomplished by sharing a delay element
between sections
(e) Consider the all-pass system with system function
(8) (=e)
‘Draw the flow graph of a “cascade” realization composed of two sections of the form
‘oblained in part {c) with one delay clement shared between the sections. The resulting
ow graph should have only three delay elements,
(0) With quantized coefficients « and 6, would the flow graph in part {e) be an all-pass
system?
643. All branches of the signal flow graphs in this problem have unity gain unless specifically
indicated otherwise
ia] Figure P6.43-1
(a) “The signa! flow graph of Systern A, shown in Figure P6.43-1. represents a causal LTI sys
‘tem. Isil possible to implement the same input-output relationship using fewer delays?
{itis possible, whatis the minimum number of delays required to implement an equiv.
alent system? If it is not possible, explain why not.
(h) Does the System B shown in Figure P6.43-2 represent the same input-outpul relation:
ship as System A in Figure P6.43-1? Explain clearly,
va)
1 Figure P5.43-2Chapter 6
Problems
68,
483
(a) Draw the direct form | signal flow graph for the system. How many delays and multi
pliers do you need’? (Do not count multiplying by +1.)
(b) Draw a signal flow graph for the systers that usesene multiplier. Minimize the number
of delays
(e) Now consider another all-pass system whose system function is
Determine and draw a signal flow graph for the system with wo multipliers and three:
delays
With infinite-precision arithmetic, the flow graphs shown in Figure P6.45 have the same
system function, but with quantizcd fixed-point arithmetic they behave differently, Assume
that w and bare eeal mumbers and