Iterative Methods for Solving Linear Systems (Anne Greenbaum)

128 Initial-Boundary Value Problems and the Navier-Stokes Equations

and therefore we obtain the basic energy estimate from

Bounds for $\|D_+ \partial^q v/\partial t^q\|_h$ independent of $h$ can be obtained along the same
lines as the a priori estimates for $u$ in the previous section. Thus we can proceed
as in Section 3.2: Fourier interpolate $v = v_h$ with respect to $x$ and let $h \to 0$.
We obtain existence in some interval 0 < t < T, T depending only on

4.1.4. Existence via Iteration


To prove existence via the iteration (4.1.2), we estimate the functions $u^n$
independently of $n$. Again, our estimates will not take advantage of the diffusion
term $\varepsilon u_{xx}$, and so will not depend on $\varepsilon > 0$. All of the arguments are without
reference to the maximum principle and thus generalize from scalar equations to
systems.
To show uniform smoothness of the sequence $u^n$, we start with the following
analogue of Lemma 4.1.3.

Lemma 4.1.5. For a suitable time T\ = T\ (H/H/^) > 0 it holds that

Proof. We use induction on n; the case n = 0 is trivial. As abbreviations, let


$w = u^n$, $v = u^{n+1}$; thus

Note that

For the derivative $v_x = v_1$ it holds that


Iterative Methods for Solving Linear Systems
FRONTIERS IN APPLIED
MATHEMATICS
The SIAM series on Frontiers in Applied Mathematics publishes monographs dealing
with creative work in a substantive field involving applied mathematics or scientific
computation. All works focus on emerging or rapidly developing research areas that
report on new techniques to solve mainstream problems in science or engineering.

The goal of the series is to promote, through short, inexpensive, expertly written
monographs, cutting edge research poised to have a substantial impact on the solutions
of problems that advance science and technology. The volumes encompass a broad
spectrum of topics important to the applied mathematical areas of education,
government, and industry.

EDITORIAL BOARD
H.T. Banks, Editor-in-Chief, North Carolina State University

Richard Albanese, U.S. Air Force Research Laboratory, Brooks AFB

Carlos Castillo Chavez, Cornell University

Doina Cioranescu, Université Pierre et Marie Curie (Paris VI)

Pat Hagan, Banque Paribas, New York

Matthias Heinkenschloss, Rice University

Belinda King, Virginia Polytechnic Institute and State University

Jeffrey Sachs, Independent Technical Consultants

Ralph Smith, North Carolina State University

Anna Tsao, Institute for Defense Analyses, Center for Computing Sciences


BOOKS PUBLISHED IN FRONTIERS
IN APPLIED MATHEMATICS

Lewis, F. L.; Campos, J.; and Selmic, R., Neuro-Fuzzy Control of Industrial Systems with Actuator
Nonlinearities

Bao, Gang; Cowsar, Lawrence; and Masters, Wen, editors, Mathematical Modeling in Optical
Science

Banks, H.T.; Buksas, M. W.; and Lin, T., Electromagnetic Material Interrogation Using Conductive
Interfaces and Acoustic Wavefronts

Oostveen, Job, Strongly Stabilizable Distributed Parameter Systems

Griewank, Andreas, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation

Kelley, C.T., Iterative Methods for Optimization

Greenbaum, Anne, Iterative Methods for Solving Linear Systems

Kelley, C.T., Iterative Methods for Linear and Nonlinear Equations

Bank, Randolph E., PLTMG: A Software Package for Solving Elliptic Partial Differential Equations.
Users' Guide 7.0

Moré, Jorge J. and Wright, Stephen J., Optimization Software Guide

Rüde, Ulrich, Mathematical and Computational Techniques for Multilevel Adaptive Methods

Cook, L. Pamela, Transonic Aerodynamics: Problems in Asymptotic Theory

Banks, H.T., Control and Estimation in Distributed Parameter Systems

Van Loan, Charles, Computational Frameworks for the Fast Fourier Transform

Van Huffel, Sabine and Vandewalle, Joos, The Total Least Squares Problem: Computational Aspects and Analysis

Castillo, Jose E., Mathematical Aspects of Numerical Grid Generation

Bank, R. E., PLTMG: A Software Package for Solving Elliptic Partial Differential Equations.
Users' Guide 6.0

McCormick, Stephen F., Multilevel Adaptive Methods for Partial Differential Equations

Grossman, Robert, Symbolic Computation: Applications to Scientific Computing

Coleman, Thomas F. and Van Loan, Charles, Handbook for Matrix Computations

McCormick, Stephen F., Multigrid Methods

Buckmaster, John D., The Mathematics of Combustion

Ewing, Richard E., The Mathematics of Reservoir Simulation


Iterative Methods for Solving Linear Systems

Anne Greenbaum
University of Washington
Seattle, Washington

Society for Industrial and Applied Mathematics


Philadelphia
Copyright © 1997 by Society for Industrial and Applied Mathematics.

109876543

All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the publisher.
For information, write to the Society for Industrial and Applied Mathematics, 3600 University City
Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Greenbaum, Anne.
Iterative methods for solving linear systems / Anne Greenbaum.
p. cm. -- (Frontiers in applied mathematics ; 17)
Includes bibliographical references and index.
ISBN 0-89871-396-X (pbk.)
1. Iterative methods (Mathematics) 2. Equations, Simultaneous
-Numerical solutions. I. Title. II. Series.
QA297.8.G74 1997
519.4-dc21 97-23271

Exercise 3.2 is reprinted with permission from K.-C. Toh, GMRES vs. ideal GMRES, SIAM
Journal on Matrix Analysis and Applications, 18 (1997), pp. 30-36. Copyright 1997 by the Society
for Industrial and Applied Mathematics. All rights reserved.

Exercise 5.4 is reprinted with permission from N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen,
How fast are nonsymmetric matrix iterations?, SIAM Journal on Matrix Analysis and Applications, 13
(1992), pp. 778-795. Copyright 1992 by the Society for Industrial and Applied Mathematics. All
rights reserved.

SIAM is a registered trademark.
Contents

List of Algorithms xi

Preface xiii

CHAPTER 1. Introduction 1
1.1 Brief Overview of the State of the Art 3
1.1.1 Hermitian Matrices 3
1.1.2 Non-Hermitian Matrices 5
1.1.3 Preconditioners 6
1.2 Notation 6
1.3 Review of Relevant Linear Algebra 7
1.3.1 Vector Norms and Inner Products 7
1.3.2 Orthogonality 8
1.3.3 Matrix Norms 9
1.3.4 The Spectral Radius 11
1.3.5 Canonical Forms and Decompositions 13
1.3.6 Eigenvalues and the Field of Values 16

I Krylov Subspace Approximations 23

CHAPTER 2. Some Iteration Methods 25

2.1 Simple Iteration 25
2.2 Orthomin(1) and Steepest Descent 29
2.3 Orthomin(2) and CG 33
2.4 Orthodir, MINRES, and GMRES 37
2.5 Derivation of MINRES and CG from the Lanczos Algorithm 41

CHAPTER 3. Error Bounds for CG, MINRES, and GMRES 49


3.1 Hermitian Problems—CG and MINRES 49
3.2 Non-Hermitian Problems—GMRES 54

CHAPTER 4. Effects of Finite Precision Arithmetic 61


4.1 Some Numerical Examples 62
4.2 The Lanczos Algorithm 63
4.3 A Hypothetical MINRES/CG Implementation 64
4.4 A Matrix Completion Problem 66
4.4.1 Paige's Theorem 67
4.4.2 A Different Matrix Completion 68
4.5 Orthogonal Polynomials 71

CHAPTER 5. BiCG and Related Methods 77


5.1 The Two-Sided Lanczos Algorithm 77
5.2 The Biconjugate Gradient Algorithm 79
5.3 The Quasi-Minimal Residual Algorithm 80
5.4 Relation Between BiCG and QMR 84
5.5 The Conjugate Gradient Squared Algorithm 88
5.6 The BiCGSTAB Algorithm 90
5.7 Which Method Should I Use? 92

CHAPTER 6. Is There a Short Recurrence for a Near-Optimal Approximation? 97
6.1 The Faber and Manteuffel Result 97
6.2 Implications 102

CHAPTER 7. Miscellaneous Issues 105


7.1 Symmetrizing the Problem 105
7.2 Error Estimation and Stopping Criteria 107
7.3 Attainable Accuracy 109
7.4 Multiple Right-Hand Sides and Block Methods 113
7.5 Computer Implementation 115

II Preconditioners 117
CHAPTER 8. Overview and Preconditioned Algorithms 119

CHAPTER 9. Two Example Problems 125


9.1 The Diffusion Equation 125
9.1.1 Poisson's Equation 129
9.2 The Transport Equation 134

CHAPTER 10. Comparison of Preconditioners 147


10.1 Jacobi, Gauss-Seidel, SOR 147
10.1.1 Analysis of SOR 149
10.2 The Perron-Frobenius Theorem 156
10.3 Comparison of Regular Splittings 160
10.4 Regular Splittings Used with the CG Algorithm 163

10.5 Optimal Diagonal and Block Diagonal Preconditioners 165

CHAPTER 11. Incomplete Decompositions 171


11.1 Incomplete Cholesky Decomposition 171
11.2 Modified Incomplete Cholesky Decomposition 175

CHAPTER 12. Multigrid and Domain Decomposition Methods 183
12.1 Multigrid Methods 183
12.1.1 Aggregation Methods 184
12.1.2 Analysis of a Two-Grid Method for the Model Problem 187
12.1.3 Extension to More General Finite Element Equations 193
12.1.4 Multigrid Methods 193
12.1.5 Multigrid as a Preconditioner for Krylov Subspace Methods 197
12.2 Basic Ideas of Domain Decomposition Methods 197
12.2.1 Alternating Schwarz Method 198
12.2.2 Many Subdomains and the Use of Coarse Grids 201
12.2.3 Nonoverlapping Subdomains 203

References 205

Index 213
List of Algorithms

Algorithm 1. Simple Iteration 26


Algorithm 2. Conjugate Gradient Method (CG) 35
Algorithm 3. Generalized Minimal Residual Algorithm (GMRES) 41
Algorithm 4. Minimal Residual Algorithm (MINRES) 44
Algorithm 5. Quasi-Minimal Residual Method (QMR) 83
Algorithm 6. BiCGSTAB 91
Algorithm 7. CG for the Normal Equations (CGNR and CGNE) 105
Algorithm 8. Block Conjugate Gradient Method (Block CG) 114
Algorithm 2P. Preconditioned Conjugate Gradient Method (PCG) 121
Algorithm 4P. Preconditioned Minimal Residual Algorithm (PMINRES) 122

Preface

In recent years much research has focused on the efficient solution of large
sparse or structured linear systems using iterative methods. A language full
of acronyms for a thousand different algorithms has developed, and it is often
difficult for the nonspecialist (or sometimes even the specialist) to identify the
basic principles involved. With this book, I hope to discuss a few of the most
useful algorithms and the mathematical principles behind their derivation and
analysis. The book does not constitute a complete survey. Instead I have
tried to include the most useful algorithms from a practical point of view and
the most interesting analysis from both a practical and mathematical point of
view.
The material should be accessible to anyone with graduate-level knowledge
of linear algebra and some experience with numerical computing. The relevant
linear algebra concepts are reviewed in a separate section and are restated
as they are used, but it is expected that the reader will already be familiar
with most of this material. In particular, it may be appropriate to review
the QR decomposition using the modified Gram-Schmidt algorithm or Givens
rotations, since these form the basis for a number of algorithms described here.
Part I of the book, entitled Krylov Subspace Approximations, deals with
general linear systems, although it is noted that the methods described are
most often useful for very large sparse or structured matrices, for which
direct methods are too costly in terms of computer time and/or storage. No
specific applications are mentioned there. Part II of the book deals with
Preconditioners, and here applications must be described in order to define and
analyze some of the most efficient preconditioners, e.g., multigrid methods. It
is assumed that the reader is acquainted with the concept of finite difference
approximations, but no detailed knowledge of finite difference or finite element
methods is assumed. This means that the analysis of preconditioners must
generally be limited to model problems, but, in most cases, the proof techniques
carry over easily to more general equations. It is appropriate to separate the
study of iterative methods into these two parts because, as the reader will
see, the tools of analysis for Krylov space methods and for preconditioners are
really quite different. The field of preconditioners is a much broader one, since
the derivation of the preconditioners can rely on knowledge of an underlying
problem from which the linear system arose.
This book arose out of a one-semester graduate seminar in Iterative
Methods for Solving Linear Systems that I taught at Cornell University during
the fall of 1994. When iterative methods are covered as part of a broader
course on numerical linear algebra or numerical solution of partial differential
equations, I usually cover the overview in section 1.1, sections 2.1-2.4 and 3.1,
and some material from Chapter 5 and from Part II on preconditioners.
The book has a number of features that may be different from other books
on this subject. I hope that these may attract the interest of graduate students
(since a number of interesting open problems are discussed), of mathematicians
from other fields (since I have attempted to relate the problems that arise in
analyzing iterative methods to those that arise in other areas of mathematics),
and also of specialists in this field. These features include:

• A brief overview of the state of the art in section 1.1. This gives the
reader an understanding of what has been accomplished and what open
problems remain in this field, without going into the details of any
particular algorithm.
• Analysis of the effect of rounding errors on the convergence rate of the
conjugate gradient method in Chapter 4 and discussion of how this
problem relates to some other areas of mathematics. In particular, the
analysis is presented as a matrix completion result or as a result about
orthogonal polynomials.
• Discussion of open problems involving error bounds for GMRES in
section 3.2, along with exercises in which some recently proved results
are derived (with many hints included).
• Discussion of the transport equation as an example problem in section
9.2. This important equation has received far less attention from nu-
merical analysts than the more commonly studied diffusion equation
of section 9.1, yet it serves to illustrate many of the principles of non-
Hermitian matrix iterations.
• Inclusion of multigrid methods in the part of the book on preconditioners
(Chapter 12). Multigrid methods have proved extremely effective for
solving the linear systems that arise from differential equations, and they
should not be omitted from a book on iterative methods. Other recent
books on iterative methods have also included this topic; see, e.g., [77].
• A small set of recommended algorithms and implementations. These are
enclosed in boxes throughout the text.

This last item should prove helpful to those interested in solving particular
problems as well as those more interested in general properties of iterative
methods. Most of these algorithms have been implemented in the Templates for
the Solution of Linear Systems: Building Blocks for Iterative Methods [10], and
the reader is encouraged to experiment with these or other iterative routines for
solving linear systems. This book could serve as a supplement to the Templates
documentation, providing a deeper look at the theory behind these algorithms.
I would like to thank the graduate students and faculty at Cornell
University who attended my seminar on iterative methods during the fall of
1994 for their many helpful questions and comments. I also wish to thank a
number of people who read through earlier drafts or sections of this manuscript
and made important suggestions for improvement. This list includes Michele
Benzi, Jim Ferguson, Martin Gutknecht, Paul Holloway, Zdenek Strakos, and
Nick Trefethen.
Finally, I wish to thank the Courant Institute for providing me the
opportunity for many years of uninterrupted research, without which this book
might not have developed. I look forward to further work at the University of
Washington, where I have recently joined the Mathematics Department.

Anne Greenbaum
Seattle
Chapter 1

Introduction

The subject of this book is what at first appears to be a very simple problem—
how to solve the system of linear equations $Ax = b$, where $A$ is an $n$-by-$n$
nonsingular matrix and $b$ is a given $n$-vector. One well-known method is
Gaussian elimination. In general, this requires storage of all $n^2$ entries of
the matrix $A$ as well as approximately $2n^3/3$ arithmetic operations (additions,
subtractions, multiplications, and divisions). Matrices that arise in practice,
however, often have special properties that can be only partially exploited by
Gaussian elimination. For example, the matrices that arise from differencing
partial differential equations are often sparse, having only a few nonzeros per
row. If the $(i,j)$-entry of matrix $A$ is zero whenever $|i - j| > m$, then a banded
version of Gaussian elimination can solve the linear system by storing only the
approximately $2mn$ entries inside the band (i.e., those with $|i - j| \le m$) and
performing about $2m^2 n$ operations. The algorithm cannot take advantage of
any zeros inside the band, however, as these fill in with nonzeros during the
process of elimination.
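The banded operation count can be checked with a short sketch. The Python function below (the name `banded_lu_solve` is hypothetical) assumes that no pivoting is needed, which holds, for example, when $A$ is strictly diagonally dominant. It performs Gaussian elimination while reading and writing only entries with $|i - j| \le m$. For clarity it keeps the matrix in a full two-dimensional list rather than in compact band storage. Its counter includes the multipliers and the forward and back substitutions, so it reports the leading term $2m^2 n$ plus lower-order $O(mn)$ work.

```python
def banded_lu_solve(A, b, m):
    """Solve A x = b by Gaussian elimination without pivoting, touching only
    entries with |i - j| <= m. Returns (x, number_of_arithmetic_operations).
    Assumes no pivoting is needed (e.g. A strictly diagonally dominant)."""
    n = len(A)
    a = [list(row) for row in A]      # working copy; fill-in stays inside the band
    y = list(b)
    ops = 0
    for k in range(n):
        last = min(k + m, n - 1)       # last row/column that can be nonzero
        for i in range(k + 1, last + 1):
            mult = a[i][k] / a[k][k]
            ops += 1
            for j in range(k + 1, last + 1):
                a[i][j] -= mult * a[k][j]
                ops += 2
            y[i] -= mult * y[k]        # forward elimination on the right-hand side
            ops += 2
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back substitution with the banded U
        s = y[i]
        for j in range(i + 1, min(i + m, n - 1) + 1):
            s -= a[i][j] * x[j]
            ops += 2
        x[i] = s / a[i][i]
        ops += 1
    return x, ops
```

With $m$ fixed, doubling $n$ doubles the count. Doubling $m$ roughly quadruples its leading term.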
In contrast, sparseness and other special properties of matrices can often be
used to great advantage in matrix-vector multiplication. If a matrix has just
a few nonzeros per row, then the number of operations required to compute
the product of that matrix with a given vector is only a small multiple of $n$, instead of the
$2n^2$ operations required for a general dense matrix-vector multiplication. The
storage required is only that for the nonzeros of the matrix, and, if these
are sufficiently simple to compute, even this can be avoided. For certain
special dense matrices, a matrix-vector product can also be computed with
just $O(n)$ operations. For example, a Cauchy matrix is one whose $(i,j)$-entry
is $1/(z_i - z_j)$ for $i \ne j$, where $z_1, \ldots, z_n$ are some complex numbers. The
product of this matrix with a given vector can be computed in time $O(n)$
using the fast multipole method [73], and the actual matrix entries need never
be computed or stored. This leads one to ask whether the system of linear
equations $Ax = b$ can be solved (or an acceptably good approximate solution
obtained) using matrix-vector multiplications. If this can be accomplished
with a moderate number of matrix-vector multiplications and little additional
work, then the iterative procedure that does this may far outperform Gaussian
elimination in terms of both work and storage.
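To make the sparse operation count concrete, here is a minimal sketch of a matrix-vector product in compressed sparse row (CSR) form. CSR is one common storage layout, chosen here as an assumption; the text does not prescribe a format. Only the stored nonzeros are touched, so the work is two operations per nonzero rather than $2n^2$. The helper `tridiag_csr`, which builds tridiag(-1, 2, -1), is a hypothetical example input.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A x for A stored in compressed sparse row (CSR) form:
    the nonzeros of row i are data[indptr[i]:indptr[i+1]], in the columns
    given by indices[indptr[i]:indptr[i+1]]."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for p in range(indptr[i], indptr[i + 1]):   # only stored nonzeros of row i
            s += data[p] * x[indices[p]]
        y[i] = s
    return y


def tridiag_csr(n):
    """Hypothetical example input: CSR arrays for tridiag(-1, 2, -1) of order n."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```

For tridiag(-1, 2, -1) of order $n$ there are $3n - 2$ stored entries, so one product costs about $6n$ operations.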


One might have an initial guess for the solution, but if this is not the case,
then the only vector associated with the problem is the right-hand side vector
$b$. Without computing any matrix-vector products, it seems natural to take
some multiple of $b$ as the first approximation to the solution:

$$x_1 \in \operatorname{span}\{b\}.$$

One then computes the product $Ab$ and takes the next approximation to be
some linear combination of $b$ and $Ab$:

$$x_2 \in \operatorname{span}\{b, Ab\}.$$

This process continues so that the approximation at step $k$ satisfies

$$x_k \in \operatorname{span}\{b, Ab, \ldots, A^{k-1}b\}, \quad k = 1, 2, \ldots. \tag{1.1}$$

The space represented on the right in (1.1) is called a Krylov subspace for the
matrix $A$ and initial vector $b$.
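Here is a minimal sketch of how the Krylov vectors in (1.1) are generated using nothing but matrix-vector products. The dense `matvec` below is only a stand-in, assumed for illustration, for whatever fast product routine is available. Both function names are hypothetical. In floating point arithmetic these monomial vectors quickly become nearly linearly dependent. That is one reason practical methods build orthogonal bases for the same space instead.

```python
def matvec(A, x):
    """Dense matrix-vector product; a stand-in for any fast A*x routine."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def krylov_vectors(A, b, k):
    """Return [b, Ab, ..., A^(k-1) b], using k - 1 matrix-vector products."""
    vecs = [list(b)]
    for _ in range(k - 1):
        vecs.append(matvec(A, vecs[-1]))
    return vecs
```

For example, for $A = \operatorname{diag}(1, 2)$ the Cayley-Hamilton theorem gives $A^2 - 3A + 2I = 0$, so $A^{-1}b = (3b - Ab)/2$ already lies in $\operatorname{span}\{b, Ab\}$. With $b = (1, 1)^T$, the Krylov vectors are $(1, 1)^T$ and $(1, 2)^T$, and $(3b - Ab)/2 = (1, 0.5)^T$ solves $Ax = b$ exactly.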
Given that $x_k$ is to be taken from the Krylov space in (1.1), one must ask
the following two questions:
(i) How good an approximate solution is contained in the space (1.1)?
(ii) How can a good (optimal) approximation from this space be computed
with a moderate amount of work and storage?
These questions are the subject of Part I of this book.
If it turns out that the space (1.1) does not contain a good approximate
solution for any moderate size k or if such an approximate solution cannot be
computed easily, then one might consider modifying the original problem to
obtain a better Krylov subspace. For example, one might use a preconditioner
$M$ and effectively solve the modified problem

$$M^{-1}Ax = M^{-1}b$$

by generating approximate solutions $x_1, x_2, \ldots$ satisfying

$$x_k \in \operatorname{span}\{M^{-1}b, (M^{-1}A)M^{-1}b, \ldots, (M^{-1}A)^{k-1}M^{-1}b\}. \tag{1.2}$$

At each step of the preconditioned algorithm, it is necessary to compute the
product of $M^{-1}$ with a vector or, equivalently, to solve a linear system with
coefficient matrix $M$, so $M$ should be chosen so that such linear systems are
much easier to solve than the original problem. The subject of finding good
preconditioners is a very broad one on which much research has focused in
recent years, most of it designed for specific classes of problems (e.g., linear
systems arising from finite element or finite difference approximations for
elliptic partial differential equations). Part II of this book deals with the topic
of preconditioners.
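As an illustration (a sketch, not one of the book's boxed algorithms), the preconditioned iteration $x_{k+1} = x_k + M^{-1}(b - Ax_k)$ with $x_0 = 0$ generates iterates from the preconditioned Krylov space. For instance, $x_1 = M^{-1}b$ and $x_2 = 2M^{-1}b - M^{-1}AM^{-1}b$. Below, $M$ is taken to be the diagonal of $A$ (Jacobi preconditioning, an assumed choice), so each solve with $M$ is just $n$ divisions. The function name is hypothetical. Convergence is only expected when the spectral radius of $I - M^{-1}A$ is less than one, for example when $A$ is strictly diagonally dominant.

```python
def jacobi_preconditioned_iteration(A, b, num_steps):
    """Iterate x <- x + M^{-1}(b - A x) with M = diag(A), starting from x = 0.
    Each step needs one product with A and one (trivial) solve with M.
    Returns the final iterate and the max-norm of each residual used."""
    n = len(b)
    diag = [A[i][i] for i in range(n)]
    x = [0.0] * n
    history = []
    for _ in range(num_steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        z = [r[i] / diag[i] for i in range(n)]     # solve M z = r, M diagonal
        x = [x[i] + z[i] for i in range(n)]
        history.append(max(abs(ri) for ri in r))   # residual before this update
    return x, history
```

For $A$ = tridiag(1, 4, 1) of order 3 and $b = (5, 6, 5)^T$, the first iterate is $M^{-1}b = (1.25, 1.5, 1.25)^T$. The iterates then converge to the solution $(1, 1, 1)^T$.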

1.1. Brief Overview of the State of the Art.


In dealing with questions (i) and (ii), one must consider two types of matrices—
Hermitian and non-Hermitian. These questions are essentially solved for the
Hermitian case but remain wide open in the case of non-Hermitian matrices.
A caveat here is that we are now referring to the preconditioned matrix. If
A is a Hermitian matrix and the preconditioner M is Hermitian and positive
definite, then instead of using left preconditioning as described above one can
(implicitly) solve the modified linear system

$$L^{-1}AL^{-H}y = L^{-1}b, \qquad x = L^{-H}y, \tag{1.3}$$

where $M = LL^H$ and the superscript $H$ denotes the complex conjugate
transpose ($(L^H)_{ij} = \bar{L}_{ji}$). Of course, as before, one does not actually form the
matrix $L^{-1}AL^{-H}$, but approximations to the solution $y$ are considered to
come from the Krylov space based on $L^{-1}AL^{-H}$ and $L^{-1}b$, so the computed
approximations to $x$ come from the space in (1.2). If the preconditioner $M$
is indefinite, then the preconditioned problem cannot be cast in the form
(1.3) and treated as a Hermitian problem. In this case, methods for
non-Hermitian problems may be needed. The subject of special methods for
Hermitian problems with Hermitian indefinite preconditioners is an area of
current research.
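The claim that the split-preconditioned matrix is Hermitian can be checked numerically. The sketch below assumes real symmetric $A$ and real symmetric positive definite $M$, so that $L^H = L^T$. It computes the Cholesky factor $M = LL^T$ and forms $L^{-1}AL^{-T}$ explicitly with triangular solves. Forming this matrix is for illustration only; as the text notes, an iterative method applies it implicitly. All function names are hypothetical.

```python
import math


def cholesky(M):
    """Lower-triangular L with M = L L^T (M real symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(M[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L, v):
    """Solve L w = v for lower-triangular L."""
    w = []
    for i in range(len(v)):
        w.append((v[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    return w


def split_preconditioned(A, M):
    """Return C = L^{-1} A L^{-T} as a list of rows (illustration only)."""
    n = len(A)
    L = cholesky(M)
    # B = L^{-1} A, computed one column of A at a time
    cols = [forward_solve(L, [A[i][j] for i in range(n)]) for j in range(n)]
    B = [[cols[j][i] for j in range(n)] for i in range(n)]
    # Row i of C = B L^{-T} satisfies (row i)^T = L^{-1} (row i of B)^T
    return [forward_solve(L, B[i]) for i in range(n)]
```

Because $L^{-1}AL^{-T} = L^T(M^{-1}A)L^{-T}$ is similar to $M^{-1}A$, the two matrices have the same eigenvalues. For example, with A = [[2, 1], [1, 3]] and M = [[4, 2], [2, 2]], the sketch returns the symmetric matrix diag(0.5, 2.5). The values 0.5 and 2.5 are also the eigenvalues of $M^{-1}A$.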
Throughout the remainder of this section, we will let $A$ and $b$ denote the
already preconditioned matrix and right-hand side. The matrix $A$ is Hermitian
if the original problem is Hermitian and the preconditioner is Hermitian and
positive definite; otherwise, it is non-Hermitian.

1.1.1. Hermitian Matrices. For real symmetric or complex Hermitian
matrices $A$, there is a known short recurrence for finding the "optimal"
approximation of the form (1.1), if "optimal" is taken to mean the approximation
whose residual, $b - Ax_k$, has the smallest Euclidean norm. An algorithm that
generates this optimal approximation is called MINRES (minimal residual)
[111]. If $A$ is also positive definite, then one might instead minimize the $A$-norm
of the error, $\|e_k\|_A = \langle A^{-1}b - x_k,\, b - Ax_k \rangle^{1/2}$. The conjugate gradient (CG)
algorithm [79] generates this approximation. For each of these algorithms, the
work required at each iteration is the computation of one matrix-vector prod-
uct (which is always required to generate the next vector in the Krylov space)
plus a few vector inner products, and only a few vectors need be stored. Since
these methods find the "optimal" approximation with little extra work and
storage beyond that required for the matrix-vector multiplication, they are
almost always the methods of choice for Hermitian problems. (Of course, one
can never really make such a blanket statement about numerical methods. On
some parallel machines, for instance, inner products are very expensive, and
methods that avoid inner products, even if they generate nonoptimal approx-
imations, may be preferred.)
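Here is a minimal sketch of CG in the common Hestenes-Stiefel form, assuming a real symmetric positive definite $A$ and $x_0 = 0$. It shows the costs described above: one product with $A$ and two inner products per step, with only the vectors $x$, $r$, $p$, and $Ap$ stored. The operator is passed as a function `A_mul`, an assumed interface, so the matrix need never be formed. This is an illustration, not necessarily the organization of the book's boxed CG algorithm.

```python
def dot(u, v):
    """Real inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cg(A_mul, b, num_steps):
    """Conjugate gradient for a real SPD operator, starting from x_0 = 0.
    A_mul(v) must return A v. Returns the list of iterates x_1, x_2, ..."""
    x = [0.0] * len(b)
    r = list(b)                    # r_0 = b - A x_0 = b
    p = list(r)
    rr = dot(r, r)
    iterates = []
    for _ in range(num_steps):
        if rr == 0.0:              # exact solution reached
            break
        Ap = A_mul(p)              # the one matrix-vector product per step
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterates.append(x)
    return iterates
```

For example, with `A_mul` applying tridiag(-1, 2, -1) of order 10 and $b$ the vector of ones, the residual is at rounding-error level after at most 10 steps.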
Additionally, we can describe precisely how good these optimal approximations
are (for the worst possible right-hand side vector $b$) in terms of the
eigenvalues of the matrix. Consider the 2-norm of the residual in the MINRES
algorithm. It follows from (1.1) that the residual $r_k = b - Ax_k$ can be written
in the form

$$r_k = P_k(A)\,b,$$

where $P_k$ is a certain $k$th-degree polynomial with value 1 at the origin, and,
for any other such polynomial $p_k$, we have

$$\|r_k\| \le \|p_k(A)\,b\|. \tag{1.4}$$

Writing the eigendecomposition of $A$ as $A = Q\Lambda Q^H$, where
$\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix of eigenvalues and $Q$ is a unitary
matrix whose columns are eigenvectors, expression (1.4) implies that

$$\|r_k\| \le \|Q\,p_k(\Lambda)\,Q^H b\| \le \|p_k(\Lambda)\| \cdot \|b\| = \max_{i=1,\ldots,n} |p_k(\lambda_i)| \cdot \|b\|,$$

and since this holds for any $k$th-degree polynomial $p_k$ with $p_k(0) = 1$, we have

$$\frac{\|r_k\|}{\|b\|} \le \min_{p_k:\, p_k(0)=1} \; \max_{i=1,\ldots,n} |p_k(\lambda_i)|. \tag{1.5}$$

It turns out that the bound (1.5) on the size of the MINRES residual at step
$k$ is sharp; that is, for each $k$ there is a vector $b$ for which this bound will be
attained [63, 68, 85]. Thus the question of the size of the MINRES residual
at step $k$ is reduced to a problem in approximation theory: how well can one
approximate zero on the set of eigenvalues of $A$ using a $k$th-degree polynomial
with value 1 at the origin? One can answer this question precisely with a
complicated expression involving all of the eigenvalues of $A$, or one can give
simple bounds in terms of just a few of the eigenvalues of $A$. The important
point is that the norm of the residual (for the worst-case right-hand side vector)
is completely determined by the eigenvalues of the matrix, and we have at least
intuitive ideas of what constitutes good and bad eigenvalue distributions. The
same reasoning shows that the $A$-norm of the error $e_k = A^{-1}b - x_k$ in the CG
algorithm satisfies

$$\frac{\|e_k\|_A}{\|A^{-1}b\|_A} \le \min_{p_k:\, p_k(0)=1} \; \max_{i=1,\ldots,n} |p_k(\lambda_i)|, \tag{1.6}$$

and, again, this bound is sharp.
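One classical way to turn the polynomial bound for CG into a simple bound involving only the extreme eigenvalues is as follows. Suppose the eigenvalues lie in $[\lambda_{\min}, \lambda_{\max}]$ with $\lambda_{\min} > 0$. Choose $p_k$ to be the Chebyshev polynomial $T_k$, shifted and scaled so that $p_k(0) = 1$. Its maximum absolute value on the interval is $1/T_k((\kappa+1)/(\kappa-1))$, where $\kappa = \lambda_{\max}/\lambda_{\min}$. This is at most $2\left((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\right)^k$. The sketch below evaluates these quantities numerically; the function names are hypothetical.

```python
import math


def chebyshev(k, t):
    """Chebyshev polynomial T_k(t) via the three-term recurrence (any real t)."""
    if k == 0:
        return 1.0
    t_prev, t_cur = 1.0, t
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2.0 * t * t_cur - t_prev
    return t_cur


def shifted_chebyshev(k, lam, lo, hi):
    """p_k(lam) = T_k((hi + lo - 2 lam)/(hi - lo)) / T_k((hi + lo)/(hi - lo)).
    Satisfies p_k(0) = 1 and |p_k| <= 1/T_k((hi + lo)/(hi - lo)) on [lo, hi]."""
    return chebyshev(k, (hi + lo - 2.0 * lam) / (hi - lo)) / chebyshev(k, (hi + lo) / (hi - lo))


def chebyshev_bound(k, kappa):
    """The simpler upper bound 2 ((sqrt(kappa) - 1)/(sqrt(kappa) + 1))^k."""
    s = math.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** k
```

For $\kappa = 100$ and $k = 5$, the maximum of $|p_5|$ on the interval is roughly 0.65, and the simpler bound gives about 0.73.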


Thus, for Hermitian problems, we not only have good algorithms for finding
the optimal approximation from the Krylov space (1.1), but we can also say
just how good that approximation will be, based on simple properties of the
coefficient matrix (i.e., the eigenvalues). It is therefore fair to say that the
iterative solution of Hermitian linear systems is well understood—except for
one thing. All of the above discussion assumes exact arithmetic. It is well
known that in finite precision arithmetic the CG and MINRES algorithms do
not find the optimal approximation from the Krylov space (1.1) or, necessarily,
anything close to it! In fact, the CG algorithm originally lost favor partly
because it did not behave the way exact arithmetic theory predicted [43]. More
recent work [65, 71, 35, 34] has gone a long way toward explaining the behavior
of the MINRES and CG algorithms in finite precision arithmetic, although
open problems remain. This work is discussed in Chapter 4.

1.1.2. Non-Hermitian Matrices. While the methods of choice and convergence
analysis for Hermitian problems are well understood in exact arithmetic and
the results of Chapter 4 go a long way towards explaining the behavior of these
methods in finite precision arithmetic, the state of the art for
non-Hermitian problems is not nearly so well developed. One difficulty is that
no method is known for finding the optimal approximation from the space
(1.1) while performing only O(n) operations per iteration in addition to the
matrix-vector multiplication. In fact, a theorem due to Faber and Manteuffel
[45] shows that for most non-Hermitian matrices A there is no short recurrence
for the optimal approximation from the Krylov space (1.1). To fully under-
stand this result, one must consider the statement and hypotheses carefully,
and we give more of these details in Chapter 6. Still, the current options for
non-Hermitian problems are either to perform extra work (O(nk) operations
at step k) and use extra storage (O(nk) words to perform k iterations) to find
the optimal approximation from the Krylov space or to settle for a nonoptimal
approximation. The (full) GMRES (generalized minimal residual) algorithm
[119] (and other mathematically equivalent algorithms) finds the approxima-
tion of the form (1.1) for which the 2-norm of the residual is minimal at the cost
of this extra work and storage, while other non-Hermitian iterative methods
(e.g., BiCG [51], CGS [124], QMR [54], BiCGSTAB [134], restarted GMRES
[119], hybrid GMRES [103], etc.) generate nonoptimal approximations.
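For reference, here is a compact sketch of full GMRES. It assumes real arithmetic, a nonsingular $A$, and $x_0 = 0$. It is built from the Arnoldi process with modified Gram-Schmidt and from Givens rotations, which are standard ingredients; it is not necessarily how the book's boxed algorithm is organized. The list `V` of basis vectors grows by one vector per step, which is exactly the $O(nk)$ storage and work at step $k$ mentioned above. The function name and the `A_mul` interface are assumptions.

```python
import math


def gmres(A_mul, b, max_steps):
    """Full GMRES from x_0 = 0 in real arithmetic. A_mul(v) returns A v.
    Returns (x, residual_norms), where residual_norms[k] = ||b - A x_k||."""
    n = len(b)
    beta = math.sqrt(sum(v * v for v in b))
    V = [[v / beta for v in b]]           # orthonormal basis of the Krylov space
    H = []                                # rotated Hessenberg columns (become R)
    cs, sn = [], []                       # Givens rotation coefficients
    g = [beta]                            # rotated right-hand side beta * e_1
    norms = [beta]
    for k in range(max_steps):
        w = A_mul(V[k])
        h = []
        for j in range(k + 1):            # modified Gram-Schmidt
            hjk = sum(wi * vi for wi, vi in zip(w, V[j]))
            w = [wi - hjk * vi for wi, vi in zip(w, V[j])]
            h.append(hjk)
        h_next = math.sqrt(sum(wi * wi for wi in w))
        h.append(h_next)
        for j in range(k):                # apply earlier rotations to new column
            h[j], h[j + 1] = cs[j] * h[j] + sn[j] * h[j + 1], -sn[j] * h[j] + cs[j] * h[j + 1]
        r = math.hypot(h[k], h[k + 1])    # new rotation annihilates h[k + 1]
        cs.append(h[k] / r)
        sn.append(h[k + 1] / r)
        h[k], h[k + 1] = r, 0.0
        g.append(-sn[k] * g[k])
        g[k] = cs[k] * g[k]
        H.append(h)
        norms.append(abs(g[k + 1]))       # residual norm without forming x
        if h_next == 0.0 or k + 1 == max_steps:
            break
        V.append([wi / h_next for wi in w])
    m = len(H)                            # back substitution: R y = g[:m]
    y = [0.0] * m
    for i in range(m - 1, -1, -1):
        y[i] = (g[i] - sum(H[j][i] * y[j] for j in range(i + 1, m))) / H[i][i]
    x = [sum(y[j] * V[j][i] for j in range(m)) for i in range(n)]
    return x, norms
```

The returned residual norms are nonincreasing, as they must be for a method that minimizes over nested spaces. The last one agrees with $\|b - Ax\|$ up to rounding.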
Even if one does generate the optimal approximation of the form (1.1),
we are still unable to answer question (i). That is, there is no known way to
describe how good this optimal approximation will be (for the worst-case right-
hand side vector) in terms of simple properties of the coefficient matrix. One
might try an approach based on eigenvalues, as was done for the Hermitian
case. Assume that $A$ is diagonalizable and write an eigendecomposition of $A$
as $A = V\Lambda V^{-1}$, where $\Lambda$ is a diagonal matrix of eigenvalues and the columns
of $V$ are eigenvectors. Then it follows from (1.4) that the GMRES residual at
step $k$ satisfies

$$\frac{\|r_k\|}{\|b\|} \le \kappa(V) \min_{p_k:\, p_k(0)=1} \; \max_{i=1,\ldots,n} |p_k(\lambda_i)|, \tag{1.7}$$

where $\kappa(V) = \|V\| \cdot \|V^{-1}\|$ is the condition number of the eigenvector matrix.
The scaling of the columns of V can be chosen to minimize this condition
number. When A is a normal matrix, so that V can be taken to have condition
number one, it turns out that the bound (1.7) is sharp, just as for the Hermitian
case. Thus for normal matrices, the analysis of GMRES again reduces to a
question of polynomial approximation: how well can one approximate zero on
the set of (complex) eigenvalues using a $k$th-degree polynomial with value 1 at
the origin? When $A$ is nonnormal, however, the bound (1.7) may not be sharp.
If the condition number of V is huge, the right-hand side of (1.7) may also be
large, but this does not necessarily imply that GMRES converges poorly. It
may simply mean that the bound (1.7) is a large overestimate of the actual
GMRES residual norm. An interesting open question is to determine when
an ill-conditioned eigenvector matrix implies poor convergence for GMRES
and when it simply means that the bound (1.7) is a large overestimate. If
eigenvalues are not the key, then one would like to be able to describe the
behavior of GMRES in terms of some other characteristic properties of the
matrix A. Some ideas along these lines are discussed in section 3.2.
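A hypothetical $2 \times 2$ example shows how loose the eigenvalue bound $\|r_k\|/\|b\| \le \kappa(V) \min_{p_k} \max_i |p_k(\lambda_i)|$ can be for a nonnormal matrix. Take $A = \begin{pmatrix} 1 & a \\ 0 & 2 \end{pmatrix}$, whose eigenvalues are 1 and 2. Its eigenvector matrix can be taken as $V = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$. This $V$ is not scaled to minimize $\kappa(V)$, so the bound computed here can only be larger than the best one. At step $k = 1$ the minimax factor is $\min_c \max(|1 - c|, |1 - 2c|) = 1/3$, attained at $c = 2/3$, so the bound reads $\|r_1\|/\|b\| \le \kappa(V)/3$. The sketch compares this with the actual first GMRES residual, which minimizes $\|b - cAb\|$ over scalars $c$.

```python
import math


def kappa_2x2(M):
    """2-norm condition number of a real nonsingular 2x2 matrix."""
    (p, q), (r, s) = M
    fro2 = p * p + q * q + r * r + s * s          # sigma_max^2 + sigma_min^2
    det = p * s - q * r                           # sigma_max * sigma_min = |det|
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    smax2 = (fro2 + disc) / 2.0                   # sigma_max^2
    return smax2 / abs(det)                       # sigma_max / sigma_min


def one_step_comparison(a, b):
    """For A = [[1, a], [0, 2]] with V = [[1, a], [0, 1]], return
    (actual ||r_1|| / ||b|| for GMRES from x_0 = 0, eigenvalue bound)."""
    Ab = [b[0] + a * b[1], 2.0 * b[1]]
    bAb = b[0] * Ab[0] + b[1] * Ab[1]
    AbAb = Ab[0] ** 2 + Ab[1] ** 2
    bb = b[0] ** 2 + b[1] ** 2
    # min over c of ||b - c A b||^2 equals ||b||^2 - (b . Ab)^2 / ||Ab||^2
    actual = math.sqrt(max(bb - bAb * bAb / AbAb, 0.0) / bb)
    minimax = 1.0 / 3.0     # min_c max(|1 - c|, |1 - 2c|), attained at c = 2/3
    return actual, kappa_2x2([[1.0, a], [0.0, 1.0]]) * minimax
```

With $a = 0$ (a normal matrix) and $b = (1, 1)^T$, the actual ratio is about 0.316 against a bound of 1/3. With $a = 100$, the bound exceeds 3000 even though the ratio can never exceed 1.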
Finally, since the full GMRES algorithm may be impractical if a fairly large
number of iterations are required, one would like to have theorems relating the
convergence of some nonoptimal methods (that do not require extra work and
storage) to that of GMRES. Unfortunately, no fully satisfactory theorems of
this kind are currently available, and this important open problem is discussed
in Chapter 6.

1.1.3. Preconditioners. The tools used in the derivation of preconditioners
are much more diverse than those applied to the study of iteration methods.
There are some general results concerning comparison of preconditioners and
optimality of preconditioners of certain forms (e.g., block-diagonal), and these
are described in Chapter 10. Many of the most successful preconditioners,
however, have been derived for special problem classes, where the origin of the
problem suggests a particular type of preconditioner. Multigrid and domain
decomposition methods are examples of this type of preconditioner and are dis-
cussed in Chapter 12. Still other preconditioners are designed for very specific
physical problems, such as the transport equation. Since one cannot assume
familiarity with every scientific application, a complete survey is impossible.
Chapter 9 contains two example problems, but the problem of generalizing
application-specific preconditioners to a broader setting remains an area of
active research.

1.2. Notation.
We assume complex matrices and vectors throughout this book. The results
for real problems are almost always the same, and we point out any differences
that might be encountered. The symbol $i$ is used for $\sqrt{-1}$, and a
superscript $H$ denotes the Hermitian transpose ($(A^H)_{ij} = \bar{a}_{ji}$,
where the overbar denotes the complex conjugate). The symbol $\|\cdot\|$ will
always denote the 2-norm for vectors and the induced spectral norm for
matrices. An arbitrary norm will be denoted $|||\cdot|||$.
The linear system (or sometimes the preconditioned linear system) under
consideration is denoted $Ax = b$, where $A$ is an n-by-n nonsingular matrix
and $b$ is a given n-vector. If $x_k$ is an approximate solution, then the
residual $b - Ax_k$ is denoted as $r_k$ and the error $A^{-1}b - x_k$ as $e_k$.
The symbol $\xi_j$ denotes the $j$th unit vector, i.e., the vector whose $j$th
entry is 1 and whose other entries are 0, with the size of the vector being
determined by the context.
A number of algorithms are considered, and these are first stated in the
form most suitable for presentation. This does not always correspond to
the best computational implementation. Algorithms enclosed in boxes are
the recommended computational procedures, although details of the actual
implementation (such as how to carry out a matrix-vector multiplication or
how to solve a linear system with the preconditioner as coefficient matrix)
are not included. Most of these algorithms are implemented in the Templates
[10], with MATLAB, Fortran, and C versions. To see what is available in this
package, type

mail netlib@ornl.gov
send index for templates

or explore the website: http://www.netlib.org/templates. Then, to obtain
a specific version, such as the MATLAB version of the Templates, type

mail netlib@ornl.gov
send mltemplates.shar from templates

or download the appropriate file from the web. The reader is encouraged to
experiment with the iterative methods described in this book, either through
use of the Templates or another software package or through coding the
algorithms directly.

1.3. Review of Relevant Linear Algebra.


1.3.1. Vector Norms and Inner Products. We assume that the reader
is already familiar with vector norms. Examples of vector norms are

• the Euclidean norm or 2-norm, $\|v\|_2 = \bigl(\sum_{i=1}^n |v_i|^2\bigr)^{1/2}$;
• the 1-norm, $\|v\|_1 = \sum_{i=1}^n |v_i|$;
• the $\infty$-norm, $\|v\|_\infty = \max_i |v_i|$.

From here on we will denote the 2-norm by, simply, $\|\cdot\|$. If $|||\cdot|||$ is a
vector norm and $G$ is a nonsingular n-by-n matrix, then
$|||v|||_{G^HG} \equiv |||Gv|||$ is also a vector norm. (This is sometimes denoted
$|||v|||_G$ instead of $|||v|||_{G^HG}$, but we will use the latter notation and will
refer to $|||\cdot|||_{G^HG}$ as the $G^HG$-norm in order to be consistent with
standard notation used in describing the CG algorithm.)
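The Euclidean norm is associated with an inner product:
$$(v, w) = w^H v = \sum_{i=1}^n v_i \bar{w}_i .$$
We refer to this as the standard inner product. Similarly, the $G^HG$-norm is
associated with an inner product:
$$(v, w)_{G^HG} \equiv (Gv, Gw) = w^H G^H G v .$$
By definition we have $\|v\|^2 = (v, v)$, and it follows that
$\|v\|^2_{G^HG} = (Gv, Gv) = (v, v)_{G^HG}$. If $|||\cdot|||$ is any norm associated
with an inner product $\langle\langle \cdot, \cdot \rangle\rangle$, then there is a nonsingular matrix $G$
such that
$$\langle\langle v, w \rangle\rangle = (v, w)_{G^HG} \quad \text{for all } v, w \in \mathbb{C}^n .$$
The $(i,j)$ entry of $G^HG$ is $\langle\langle \xi_j, \xi_i \rangle\rangle$, where $\xi_i$ and $\xi_j$ are the unit vectors
with a one in position $i$ and $j$, respectively, and zeros elsewhere.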
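These definitions map directly onto code. The following is a minimal sketch in plain Python, using only the standard library. The helper names (`norm_1`, `inner_GHG`, and so on) are illustrative. The inner product follows the convention $(v, w) = w^H v$ used above, and the matrix `G` is an arbitrary nonsingular example.

```python
import math

def norm_1(v):
    """1-norm: sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)

def norm_2(v):
    """Euclidean (2-)norm."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def norm_inf(v):
    """Infinity-norm: largest absolute value of an entry."""
    return max(abs(x) for x in v)

def inner(v, w):
    """Standard inner product (v, w) = w^H v = sum_i v_i * conj(w_i)."""
    return sum(vi * wi.conjugate() for vi, wi in zip(v, w))

def matvec(G, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(g * x for g, x in zip(row, v)) for row in G]

def inner_GHG(v, w, G):
    """G^H G inner product: (v, w)_{G^H G} = (Gv, Gw)."""
    return inner(matvec(G, v), matvec(G, w))

def norm_GHG(v, G):
    """G^H G-norm built on the 2-norm: ||v||_{G^H G} = ||G v||."""
    return norm_2(matvec(G, v))

v = [3, -4j, 0]
G = [[2, 0, 0], [1, 1, 0], [0, 0, 1]]   # any nonsingular G would do
print(norm_1(v), norm_2(v), norm_inf(v))  # 7.0 5.0 4.0
```

Squaring `norm_GHG(v, G)` reproduces `inner_GHG(v, v, G)` up to rounding, which is the identity $\|v\|^2_{G^HG} = (v, v)_{G^HG}$.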

1.3.2. Orthogonality. The vectors $v$ and $w$ are said to be orthogonal if
$(v, w) = 0$ and to be orthonormal if, in addition, $\|v\| = \|w\| = 1$. The vectors
$v$ and $w$ are said to be $G^HG$-orthogonal if $(v, G^HGw) = 0$, that is, if
$(v, w)_{G^HG} = 0$.
The $G^HG$-projection of a vector $v$ in the direction $w$ is
$$\frac{(v, w)_{G^HG}}{(w, w)_{G^HG}}\, w ,$$
and to minimize the $G^HG$-norm of $v$ in the direction $w$, one subtracts off the
$G^HG$-projection of $v$ onto $w$. That is, if
$$\hat{v} = v - \frac{(v, w)_{G^HG}}{(w, w)_{G^HG}}\, w ,$$
then of all vectors of the form $v - \alpha w$, where $\alpha$ is a scalar, $\hat{v}$ has the
smallest $G^HG$-norm.
An n-by-n complex matrix with orthonormal columns is called a unitary
matrix. For a unitary matrix $Q$, we have $Q^HQ = QQ^H = I$, where $I$ is the
n-by-n identity matrix. If the matrix $Q$ is real, then it can also be called an
orthogonal matrix.
Given a set of linearly independent vectors $\{v_1, \ldots, v_n\}$, one can construct
an orthonormal set $\{u_1, \ldots, u_n\}$ using the Gram-Schmidt procedure:
$$u_1 = \frac{v_1}{\|v_1\|}, \qquad \tilde{u}_k = v_k - \sum_{i=1}^{k-1} (v_k, u_i)\, u_i, \quad u_k = \frac{\tilde{u}_k}{\|\tilde{u}_k\|}, \quad k = 2, \ldots, n .$$

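In actual computations, a mathematically equivalent procedure called the
modified Gram-Schmidt method is often used:

Modified Gram-Schmidt Algorithm.

Set $u_1 = v_1/\|v_1\|$. For $k = 2, \ldots, n$,
    Set $u_k = v_k$.
    For $i = 1, \ldots, k-1$, set $u_k \leftarrow u_k - (u_k, u_i)\, u_i$.
    Set $u_k \leftarrow u_k/\|u_k\|$.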

Here, instead of computing the projection of $v_k$ onto each of the basis vectors
$u_i$, $i = 1, \ldots, k-1$, the next basis vector is formed by first subtracting off the
projection of $v_k$ in the direction of one of the basis vectors and then subtracting
off the projection of the new vector $u_k$ in the direction of another basis vector,
etc. The modified Gram-Schmidt procedure forms the core of many iterative
methods for solving linear systems.
If $U_k$ is the matrix whose columns are the orthonormal vectors $u_1, \ldots, u_k$,
then the closest vector to a given vector $v$ from the space $\mathrm{span}\{u_1, \ldots, u_k\}$ is
the projection of $v$ onto this space,
$$U_k U_k^H v .$$
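The modified Gram-Schmidt loop and the projection $U_kU_k^Hv$ can be sketched as follows. This is a minimal plain-Python illustration with illustrative function names, not a production orthogonalization routine. The key point is the comment in the inner loop: each coefficient is taken from the partially updated vector.

```python
import math

def inner(v, w):
    """(v, w) = w^H v."""
    return sum(vi * wi.conjugate() for vi, wi in zip(v, w))

def modified_gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors by modified Gram-Schmidt."""
    basis = []
    for v in vectors:
        u = [complex(x) for x in v]
        for q in basis:
            # Coefficient uses the current, already-updated u (not the original v_k).
            coeff = inner(u, q)
            u = [ui - coeff * qi for ui, qi in zip(u, q)]
        nrm = math.sqrt(sum(abs(x) ** 2 for x in u))
        basis.append([x / nrm for x in u])
    return basis

def project(v, basis):
    """Closest vector to v in span(basis), i.e. U_k U_k^H v."""
    p = [0j] * len(v)
    for q in basis:
        c = inner(v, q)  # q^H v
        p = [pi + c * qi for pi, qi in zip(p, q)]
    return p

U = modified_gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
p = project([1, 2, 3], U[:2])  # projection onto span{u_1, u_2}
```

The residual `[1, 2, 3]` minus `p` is orthogonal to `U[0]` and `U[1]`. This is the characterization of the closest vector given above.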

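1.3.3. Matrix Norms. Let $M_n$ denote the set of n-by-n complex matrices.
DEFINITION 1.3.1. A function $|||\cdot||| : M_n \to \mathbb{R}$ is called a matrix norm if,
for all $A, B \in M_n$ and all complex scalars $c$,
1. $|||A||| \ge 0$, and $|||A||| = 0$ if and only if $A = 0$;
2. $|||cA||| = |c|\, |||A|||$;
3. $|||A + B||| \le |||A||| + |||B|||$;
4. $|||AB||| \le |||A|||\, |||B|||$.
Example. The Frobenius norm defined by
$$\|A\|_F = \Bigl( \sum_{i,j=1}^n |a_{ij}|^2 \Bigr)^{1/2}$$
is a matrix norm because, in addition to properties 1-3, we have
$$\|AB\|_F^2 = \sum_{i,j} \Bigl| \sum_k a_{ik} b_{kj} \Bigr|^2 \le \sum_{i,j} \Bigl( \sum_k |a_{ik}|^2 \Bigr) \Bigl( \sum_k |b_{kj}|^2 \Bigr) = \|A\|_F^2\, \|B\|_F^2 .$$
This is just the Cauchy-Schwarz inequality.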


DEFINITION 1.3.2. Let $|||\cdot|||$ be a vector norm on $\mathbb{C}^n$. The induced norm,
also denoted $|||\cdot|||$, is defined on $M_n$ by
$$|||A||| = \max_{|||y||| = 1} |||Ay||| .$$
The "max" in the above definition could be replaced by "sup." The two are
equivalent since $|||Ay|||$ is a continuous function of $y$ and the unit sphere
$\{y : |||y||| = 1\}$ in $\mathbb{C}^n$, being a compact set, contains the vector for which
the sup is attained. Another equivalent definition is
$$|||A||| = \max_{y \ne 0} \frac{|||Ay|||}{|||y|||} .$$
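The norm $\|\cdot\|_1$ induced on $M_n$ by the 1-norm for vectors is
$$\|A\|_1 = \max_j \sum_{i=1}^n |a_{ij}| ,$$
the maximal absolute column sum of $A$. To see this, write $A$ in terms of its
columns $A = [a_1, \ldots, a_n]$. Then
$$\|Ay\|_1 = \Bigl\| \sum_{j=1}^n y_j a_j \Bigr\|_1 \le \sum_{j=1}^n |y_j|\, \|a_j\|_1 \le \|A\|_1\, \|y\|_1 .$$
Thus, $\max_{\|y\|_1 = 1} \|Ay\|_1 \le \|A\|_1$. But if $y$ is the unit vector with a 1 in the
position of the column of $A$ having the greatest 1-norm and zeros elsewhere,
then $\|Ay\|_1 = \|A\|_1$, so we also have $\max_{\|y\|_1 = 1} \|Ay\|_1 \ge \|A\|_1$.
The norm $\|\cdot\|_\infty$ induced on $M_n$ by the $\infty$-norm for vectors is
$$\|A\|_\infty = \max_i \sum_{j=1}^n |a_{ij}| ,$$
the largest absolute row sum of $A$. To see this, first note that
$$\|Ay\|_\infty = \max_i \Bigl| \sum_{j=1}^n a_{ij} y_j \Bigr| \le \max_i \sum_{j=1}^n |a_{ij}|\, |y_j| \le \|A\|_\infty\, \|y\|_\infty ,$$
and hence $\max_{\|y\|_\infty = 1} \|Ay\|_\infty \le \|A\|_\infty$. On the other hand, suppose the $k$th
row of $A$ has the largest absolute row sum. Take $y$ to be the vector whose
$j$th entry is $\bar{a}_{kj}/|a_{kj}|$ if $a_{kj} \ne 0$ and 1 if $a_{kj} = 0$. Then $\|y\|_\infty = 1$, and the $k$th
entry of $Ay$ is the sum of the absolute values of the entries in row $k$ of $A$, so
we have $\|Ay\|_\infty \ge \|A\|_\infty$.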
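Both arguments are constructive: each one names a vector that attains the maximum. The short plain-Python sketch below checks this on a hypothetical 2-by-2 complex matrix. It builds the column selector $\xi_j$ for the 1-norm and the phase vector $\bar{a}_{kj}/|a_{kj}|$ for the $\infty$-norm. The helper names are illustrative.

```python
def norm_1_mat(A):
    """Induced 1-norm: maximal absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_inf_mat(A):
    """Induced infinity-norm: maximal absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def matvec(A, y):
    return [sum(a * x for a, x in zip(row, y)) for row in A]

def maximizer_1(A):
    """Unit vector selecting the column of A with the largest 1-norm."""
    n = len(A[0])
    j = max(range(n), key=lambda c: sum(abs(row[c]) for row in A))
    return [1 if c == j else 0 for c in range(n)]

def maximizer_inf(A):
    """y_j = conj(a_kj)/|a_kj| (or 1 if a_kj = 0) for the row k of largest sum."""
    k = max(range(len(A)), key=lambda r: sum(abs(a) for a in A[r]))
    return [complex(a).conjugate() / abs(a) if a != 0 else 1 for a in A[k]]

A = [[1, -2j], [3j, 4]]
y1, yinf = maximizer_1(A), maximizer_inf(A)
print(norm_1_mat(A), sum(abs(x) for x in matvec(A, y1)))       # 6.0 6.0
print(norm_inf_mat(A), max(abs(x) for x in matvec(A, yinf)))   # 7.0 7.0
```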
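The norm $\|\cdot\|_2$ induced on $M_n$ by the 2-norm for vectors is
$$\|A\|_2 = \max_{\|y\| = 1} \|Ay\| = \sqrt{\lambda_{\max}(A^HA)} ,$$
the square root of the largest eigenvalue of the Hermitian matrix $A^HA$. To
see this, recall the variational characterization of eigenvalues of a Hermitian
matrix:
$$\lambda_{\max}(H) = \max_{y \ne 0} \frac{y^H H y}{y^H y} ,$$
and apply it to $H = A^HA$, noting that $\|Ay\|^2 = y^H A^H A y$.
This matrix norm will also be denoted, simply, as $\|\cdot\|$ from here on.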
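THEOREM 1.3.1. If $|||\cdot|||$ is a matrix norm on $M_n$ and if $G \in M_n$ is
nonsingular, then
$$|||A|||_{G^HG} \equiv |||GAG^{-1}|||$$
is a matrix norm. If $|||\cdot|||$ is induced by a vector norm $|||\cdot|||$, then $|||\cdot|||_{G^HG}$
is induced by the vector norm $|||\cdot|||_{G^HG}$.
Proof. Axioms 1-3 of Definition 1.3.1 are easy to verify, and axiom 4 follows
from
$$|||AB|||_{G^HG} = |||(GAG^{-1})(GBG^{-1})||| \le |||GAG^{-1}|||\, |||GBG^{-1}||| = |||A|||_{G^HG}\, |||B|||_{G^HG} .$$
If $|||A||| = \max_{y \ne 0} |||Ay|||/|||y|||$, then, substituting $y = Gw$,
$$|||A|||_{G^HG} = \max_{y \ne 0} \frac{|||GAG^{-1}y|||}{|||y|||} = \max_{w \ne 0} \frac{|||GAw|||}{|||Gw|||} = \max_{w \ne 0} \frac{|||Aw|||_{G^HG}}{|||w|||_{G^HG}} ,$$
so $|||\cdot|||_{G^HG}$ is the matrix norm induced by the vector norm $|||\cdot|||_{G^HG}$.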

1.3.4. The Spectral Radius. The spectral radius, or largest absolute
value of an eigenvalue of a matrix, is of importance in the analysis of certain
iterative methods.
DEFINITION 1.3.3. The spectral radius $\rho(A)$ of a matrix $A \in M_n$ is
$$\rho(A) = \max\{ |\lambda| : \lambda \text{ is an eigenvalue of } A \} .$$
THEOREM 1.3.2. If $|||\cdot|||$ is any matrix norm and $A \in M_n$, then
$\rho(A) \le |||A|||$.
Proof. Let $\lambda$ be an eigenvalue of $A$ with $|\lambda| = \rho(A)$, and let $v$ be the
corresponding eigenvector. Let $V$ be the matrix in $M_n$ each of whose columns
is $v$. Then $AV = \lambda V$, and if $|||\cdot|||$ is any matrix norm,
$$|\lambda|\, |||V||| = |||\lambda V||| = |||AV||| \le |||A|||\, |||V||| .$$
Since $|||V||| > 0$, it follows that $\rho(A) \le |||A|||$.


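THEOREM 1.3.3. Let $A \in M_n$ and $\epsilon > 0$ be given. There is a matrix norm
$|||\cdot|||$ induced by a certain vector norm such that
$$\rho(A) \le |||A||| \le \rho(A) + \epsilon .$$
Proof. The proof of this theorem will use the Schur triangularization, which
is stated as a theorem in the next section. According to this theorem, there is
a unitary matrix $Q$ and an upper triangular matrix $U$ whose diagonal entries
are the eigenvalues of $A$ such that $A = QUQ^H$. Set $D_t = \mathrm{diag}(t, t^2, \ldots, t^n)$
and note that
$$D_t U D_t^{-1} = \begin{pmatrix} \lambda_1 & t^{-1}u_{12} & t^{-2}u_{13} & \cdots & t^{-n+1}u_{1n} \\ & \lambda_2 & t^{-1}u_{23} & \cdots & t^{-n+2}u_{2n} \\ & & \ddots & \ddots & \vdots \\ & & & \lambda_{n-1} & t^{-1}u_{n-1,n} \\ & & & & \lambda_n \end{pmatrix} .$$
For $t$ sufficiently large, the sum of absolute values of all off-diagonal elements
in a column is less than $\epsilon$, so $\|D_tUD_t^{-1}\|_1 \le \rho(A) + \epsilon$. Thus if we define the
matrix norm $|||\cdot|||$ by
$$|||B||| = \|D_t Q^H B Q D_t^{-1}\|_1$$
for any matrix $B \in M_n$, then we will have $|||A||| = \|D_tUD_t^{-1}\|_1 \le \rho(A) + \epsilon$. It
follows from Theorem 1.3.1 that $|||\cdot|||$ is a matrix norm since
$$|||B||| = \|(D_tQ^H)\, B\, (D_tQ^H)^{-1}\|_1 ,$$
and that it is induced by the vector norm
$$|||v||| = \|D_t Q^H v\|_1 .$$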
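The effect of the diagonal scaling $D_t$ is easy to see numerically. The minimal sketch below assumes the matrix is already upper triangular ($Q = I$) and evaluates $\|D_tUD_t^{-1}\|_1$ for growing $t$. The example matrix is hypothetical. Its spectral radius is 0.9, but its unscaled 1-norm is much larger.

```python
def scaled_one_norm(U, t):
    """||D_t U D_t^{-1}||_1 for upper triangular U and D_t = diag(t, ..., t^n).

    Entry (i, j) of D_t U D_t^{-1} is t**(i - j) * u_ij, so every entry
    above the diagonal is damped by a negative power of t."""
    n = len(U)
    return max(
        sum(abs(U[i][j]) * t ** (i - j) for i in range(j + 1))  # column j sum
        for j in range(n)
    )

# Hypothetical triangular matrix: rho(U) = 0.9 (largest |diagonal entry|).
U = [[0.5, 10.0, -3.0],
     [0.0, -0.9, 7.0],
     [0.0, 0.0, 0.2]]
for t in [1, 10, 100, 1e4]:
    print(t, scaled_one_norm(U, t))
```

As $t$ grows, the printed values fall from 10.9 toward $\rho(U) = 0.9$, as the proof predicts. The price is the strongly graded scaling $D_t$, which is what makes the resulting norm unnatural in practice.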

To study the convergence of simple iteration, we will be interested in
conditions under which the powers of a matrix $A$ converge to the zero matrix.
THEOREM 1.3.4. Let $A \in M_n$. Then $\lim_{k\to\infty} A^k = 0$ if and only if
$\rho(A) < 1$.
Proof. First suppose $\lim_{k\to\infty} A^k = 0$. Let $\lambda$ be an eigenvalue of $A$ with
eigenvector $v$. Since $A^kv = \lambda^kv \to 0$ as $k \to \infty$, this implies $|\lambda| < 1$.
Conversely, if $\rho(A) < 1$, then by Theorem 1.3.3 there is a matrix norm $|||\cdot|||$
such that $|||A||| < 1$. It follows that $|||A^k||| \le |||A|||^k \to 0$ as $k \to \infty$. Since all
matrix norms on $M_n$ are equivalent, this implies, for instance, that $\|A^k\|_F \to 0$
as $k \to \infty$, which implies that all entries of $A^k$ must approach 0.
COROLLARY 1.3.1. Let $|||\cdot|||$ be a matrix norm on $M_n$. Then
$$\rho(A) = \lim_{k\to\infty} |||A^k|||^{1/k}$$
for all $A \in M_n$.
Proof. Since $\rho(A)^k = \rho(A^k) \le |||A^k|||$, we have $\rho(A) \le |||A^k|||^{1/k}$ for all
$k = 1, 2, \ldots$. For any $\epsilon > 0$, the matrix $\tilde{A} = [\rho(A) + \epsilon]^{-1}A$ has spectral radius
strictly less than one, and so $|||\tilde{A}^k||| \to 0$ as $k \to \infty$. There is some number
$K = K(\epsilon)$ such that $|||\tilde{A}^k||| < 1$ for all $k \ge K$, and this is just the statement
that $|||A^k||| < [\rho(A) + \epsilon]^k$, or $|||A^k|||^{1/k} < \rho(A) + \epsilon$, for all $k \ge K$. We thus have
$\rho(A) \le |||A^k|||^{1/k} < \rho(A) + \epsilon$ for all $k \ge K(\epsilon)$, and since this holds for any
$\epsilon > 0$, it follows that $\lim_{k\to\infty} |||A^k|||^{1/k}$ exists and is equal to $\rho(A)$.
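The following minimal plain-Python sketch evaluates $\|A^k\|_\infty^{1/k}$ for a hypothetical 2-by-2 triangular matrix whose spectral radius is 0.5. Its $\infty$-norm is 3.5. The estimates stay above $\rho(A)$, as the first step of the proof requires, and they approach it slowly.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def norm_inf_mat(A):
    """Induced infinity-norm (maximal absolute row sum)."""
    return max(sum(abs(a) for a in row) for row in A)

def gelfand_estimates(A, ks):
    """Map each k in ks to ||A^k||_inf ** (1/k)."""
    out, P = {}, A
    for k in range(1, max(ks) + 1):
        if k > 1:
            P = matmul(P, A)  # P = A^k
        if k in ks:
            out[k] = norm_inf_mat(P) ** (1.0 / k)
    return out

# rho(A) = 0.5 (the larger diagonal entry), but ||A||_inf = 3.5.
A = [[0.5, 3.0], [0.0, 0.4]]
est = gelfand_estimates(A, [1, 10, 50, 200])
for k, value in est.items():
    print(k, value)
```

The slow approach comes from the off-diagonal entry, which contributes a constant factor raised to the power $1/k$. This is one reason $\rho$ describes asymptotic behavior rather than behavior in the first few steps.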

1.3.5. Canonical Forms and Decompositions. Matrices can be reduced
to a number of different forms through similarity transformations, and they
can be factored in a variety of ways. These forms are often useful in the
analysis of numerical algorithms. Several such canonical representations
and decompositions are described here, without proofs of their existence or
algorithms for their computation.
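THEOREM 1.3.5 (Jordan form). Let $A$ be an n-by-n matrix. There is a
nonsingular matrix $S$ such that
$$A = SJS^{-1}, \qquad J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix},$$
where
$$J_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix} \quad (n_i\text{-by-}n_i)$$
and $\sum_{i=1}^m n_i = n$.
The matrix $J$ is called the Jordan form of $A$. The columns of $S$ are called
principal vectors. The number $m$ of Jordan blocks is the number of independent
eigenvectors of $A$.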
The matrix $A$ is diagonalizable if and only if $m = n$, and in this case the
columns of $S$ are called eigenvectors. The set of diagonalizable matrices is dense
in the set of all matrices. To see this, consider perturbing the diagonal entries
of a Jordan block by arbitrarily tiny amounts so that they are all different. The
matrix then has distinct eigenvalues, and any matrix with distinct eigenvalues
is diagonalizable.
In the special case when $A$ is diagonalizable and the columns of $S$ are
orthogonal, the matrix $A$ is said to be a normal matrix.
DEFINITION 1.3.4. A matrix $A$ is normal if it can be written in the form
$A = Q\Lambda Q^H$, where $\Lambda$ is a diagonal matrix and $Q$ is a unitary matrix.
A matrix is normal if and only if it commutes with its Hermitian transpose:
$AA^H = A^HA$. Any Hermitian matrix is normal.

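It can be shown by induction that the $k$th power of a $j$-by-$j$ Jordan block $J$
corresponding to the eigenvalue $\lambda$ is given by
$$J^k = \begin{pmatrix} \lambda^k & \binom{k}{1}\lambda^{k-1} & \cdots & \binom{k}{j-1}\lambda^{k-j+1} \\ & \lambda^k & \ddots & \vdots \\ & & \ddots & \binom{k}{1}\lambda^{k-1} \\ & & & \lambda^k \end{pmatrix},$$
where $\binom{k}{i}$ is taken to be 0 if $i > k$. For $\lambda \ne 0$, the 2-norm of $J^k$ satisfies
$$\|J^k\| \sim \binom{k}{j-1} |\lambda|^{k-j+1} \quad \text{as } k \to \infty ,$$
where the symbol $\sim$ means that, asymptotically, the left-hand side behaves
like the right-hand side. The 2-norm of an arbitrary matrix $A^k$ satisfies
$$\|A^k\| \sim \nu \binom{k}{j-1} \rho(A)^{k-j+1} \quad \text{as } k \to \infty ,$$
where $j$ is the largest order of all diagonal submatrices $J_r$ of the Jordan form
with $\rho(J_r) = \rho(A)$ and $\nu$ is a positive constant.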
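The closed form for $J^k$ can be checked against repeated multiplication in exact integer arithmetic. The sketch below is a minimal plain-Python check. It assumes Python 3.8 or later for `math.comb`, which returns 0 when the lower index exceeds the upper one, matching the convention $\binom{k}{i} = 0$ for $i > k$.

```python
from math import comb

def jordan_block(lam, j):
    """j-by-j Jordan block with eigenvalue lam."""
    return [[lam if c == r else (1 if c == r + 1 else 0) for c in range(j)]
            for r in range(j)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def matpow(A, k):
    """A^k by repeated multiplication, starting from the identity."""
    n = len(A)
    P = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for _ in range(k):
        P = matmul(P, A)
    return P

def jordan_power(lam, j, k):
    """J^k from the closed form: entry (r, r + l) is C(k, l) * lam**(k - l)."""
    P = [[0] * j for _ in range(j)]
    for r in range(j):
        for l in range(j - r):
            c = comb(k, l)
            # The power is only evaluated when the coefficient is nonzero.
            P[r][r + l] = c * lam ** (k - l) if c else 0
    return P

print(jordan_power(2, 3, 4))  # [[16, 32, 24], [0, 16, 32], [0, 0, 16]]
```

The growing binomial factors in the superdiagonals are the source of the $\binom{k}{j-1}$ term in the asymptotic estimate.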
THEOREM 1.3.6 (Schur form). Let $A$ be an n-by-n matrix with eigenvalues
$\lambda_1, \ldots, \lambda_n$ in any prescribed order. There is a unitary matrix $Q$ such that
$A = QUQ^H$, where $U$ is an upper triangular matrix and $u_{ii} = \lambda_i$.
Note that while the transformation S taking a matrix to its Jordan form
may be extremely ill conditioned (that is, S in Theorem 1.3.5 may be nearly
singular), the transformation to upper triangular form is perfectly conditioned
(Q in Theorem 1.3.6 is unitary). Consequently, the Schur form often proves
more useful in numerical analysis.
The Schur form is not unique, since the diagonal entries of U may appear in
any order, and the entries of the upper triangle may be very different depending
on the ordering of the diagonal entries. For example, the upper triangular
matrices

are two Schur forms of the same matrix, since they are unitarily equivalent via

THEOREM 1.3.7 (LU decomposition). Let $A$ in $M_n$ be nonsingular. Then
$A$ can be factored in the form
$$A = PLU, \qquad (1.10)$$
where $P$ is a permutation matrix, $L$ is lower triangular, and $U$ is upper
triangular.
The LU decomposition is a standard direct method for solving a linear
system $Ax = b$. Factor the matrix $A$ into the form $PLU$, solve $Ly = P^Hb$
(since $P^H = P^{-1}$), and then solve $Ux = y$. Unfortunately, even if the matrix
$A$ is sparse, the factors $L$ and $U$ are usually dense (at least within a band about
the diagonal). In general, the work required to compute the LU decomposition
is $O(n^3)$ and the work to backsolve with the computed LU factors is $O(n^2)$.
It is for this reason that iterative linear system solvers, which may require
far less work and storage, are important. In Chapter 11, we discuss the use
of "incomplete" LU decompositions as preconditioners for iterative methods.
The idea is to drop entries of L and U that are small or are outside of certain
positions.
For certain matrices $A$, the permutation matrix $P$ in (1.10) is not necessary;
that is, $P$ can be taken to be the identity. For Hermitian positive definite
matrices, for instance, no permutation is required for the LU decomposition,
and if $L$ and $U^H$ are taken to have the same diagonal entries, then this
decomposition becomes $A = LL^H$. This is sometimes referred to as the
Cholesky decomposition.
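The direct solve described above (factor $A = PLU$, solve $Ly = P^Hb$, then $Ux = y$) can be sketched with Gaussian elimination using partial pivoting, one standard way to choose $P$. This is a minimal dense plain-Python illustration for small examples. The permutation is stored as a list `perm` with $A$ row `perm[i]` equal to row $i$ of $LU$. It does nothing to limit fill-in, which is exactly the weakness noted above for sparse matrices.

```python
def lu_factor(A):
    """Gaussian elimination with partial pivoting.

    Returns (perm, L, U) with L unit lower triangular, U upper triangular,
    and A[perm[i]] equal to row i of L U, i.e. A = P L U with P having a 1
    in position (perm[i], i)."""
    n = len(A)
    U = [list(row) for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(U[i][k]))  # pivot row
        if U[p][k] == 0:
            raise ValueError("matrix is singular")
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]          # carry earlier multipliers along
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m
            U[i] = [uij - m * ukj for uij, ukj in zip(U[i], U[k])]
    for i in range(n):
        L[i][i] = 1.0
    return perm, L, U

def lu_solve(perm, L, U, b):
    """Solve L y = P^H b by forward substitution, then U x = y by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[0.0, 2.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 3.0]]  # a_11 = 0 forces a row swap
b = [0.0, 2.0, 7.0]                                     # b = A [1, -1, 2]^T
print(lu_solve(*lu_factor(A), b))  # [1.0, -1.0, 2.0]
```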
Another direct method for solving linear systems or least squares problems
is the QR decomposition.
THEOREM 1.3.8 (QR decomposition). Let $A$ be an m-by-n matrix with
$m \ge n$. There is an m-by-n matrix $Q$ with orthonormal columns and an n-by-n
upper triangular matrix $R$ such that $A = QR$. Columns can be added to the
matrix $Q$ to form an m-by-m unitary matrix $\hat{Q}$ such that $A = \hat{Q}\hat{R}$, where $\hat{R}$ is
an m-by-n matrix with $R$ as its top n-by-n block and zeros elsewhere.
One way to compute the QR decomposition of a matrix A is to apply
the modified Gram-Schmidt algorithm of section 1.3.2 to the columns of A.
Another way is to apply a sequence of unitary matrices to A to transform it to
an upper triangular matrix. Since the product of unitary matrices is unitary
and the inverse of a unitary matrix is unitary, this also gives a QR factorization
of A. If A is square and nonsingular and the diagonal elements of R are taken
to be positive, then the Q and R factors are unique, so this gives the same
QR factorization as the modified Gram-Schmidt algorithm. Unitary matrices
that are often used in the QR decomposition are reflections (Householder
transformations) and, for matrices with special structures, rotations (Givens
transformations). A number of iterative linear system solvers apply Givens
rotations to a smaller upper Hessenberg matrix in order to solve a least squares
problem with this smaller coefficient matrix. Once an m-by-n matrix has
been factored in the form $QR$, a least squares problem (find $y$ to minimize
$\|QRy - b\|$) can be solved by solving the upper triangular system $Ry = Q^Hb$,
provided $R$ is nonsingular (that is, the matrix has full column rank).
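The Givens-rotation approach to a small upper Hessenberg least squares problem can be sketched as follows. Each rotation zeros one subdiagonal entry. The rotated right-hand side then supplies both the triangular system $Ry = Q^Hb$ and, in its last entry, the residual norm. This is a minimal plain-Python sketch that assumes real arithmetic and full column rank. The function name and the example data are illustrative. A right-hand side that is a multiple of $\xi_1$, as here, is the form that typically arises in GMRES-type methods.

```python
import math

def hessenberg_lstsq(H, g):
    """Minimize ||H y - g||_2 for a real (k+1)-by-k upper Hessenberg H.

    Applies one Givens rotation per column to zero the subdiagonal entry,
    then back-substitutes in the k-by-k upper triangular system.
    Returns (y, residual_norm)."""
    R = [list(map(float, row)) for row in H]
    rhs = list(map(float, g))
    k = len(R[0])
    for j in range(k):
        a, b = R[j][j], R[j + 1][j]
        r = math.hypot(a, b)
        c, s = a / r, b / r
        # Rotate rows j and j+1; columns < j are already zero in both rows.
        for col in range(j, k):
            x, z = R[j][col], R[j + 1][col]
            R[j][col], R[j + 1][col] = c * x + s * z, -s * x + c * z
        x, z = rhs[j], rhs[j + 1]
        rhs[j], rhs[j + 1] = c * x + s * z, -s * x + c * z
    y = [0.0] * k
    for i in reversed(range(k)):
        y[i] = (rhs[i] - sum(R[i][m] * y[m] for m in range(i + 1, k))) / R[i][i]
    return y, abs(rhs[k])  # the last rotated entry is the residual norm

H = [[1.0, 2.0], [3.0, 4.0], [0.0, 5.0]]   # 3-by-2 upper Hessenberg
g = [1.0, 0.0, 0.0]
y, res = hessenberg_lstsq(H, g)
```

For this example, the normal equations give $y = (17/254,\ 6/254)$ and residual norm $\sqrt{57150}/254 \approx 0.941$. The rotation approach reaches the same answer without forming $H^TH$.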


THEOREM 1.3.9 (singular value decomposition). If $A$ is an m-by-n matrix
with rank $k$, then it can be written in the form
$$A = V \Sigma W^H ,$$
where $V$ is an m-by-m unitary matrix, $W$ is an n-by-n unitary matrix, and $\Sigma$
is an m-by-n matrix with $\sigma_{ij} = 0$ for all $i \ne j$ and
$\sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{kk} > \sigma_{k+1,k+1} = \cdots = \sigma_{qq} = 0$, where $q = \min\{m, n\}$.
The numbers $\sigma_{ii} \equiv \sigma_i$, known as singular values of $A$, are the nonnegative
square roots of the $q$ largest eigenvalues of $AA^H$ (equivalently, of $A^HA$). The
columns of $V$, known as left singular vectors of $A$, are eigenvectors of $AA^H$; the
columns of $W$, known as right singular vectors of $A$, are eigenvectors of $A^HA$.
Using the singular value decomposition and the Schur form, we are able
to define a certain measure of the departure from normality of a matrix. It is
the difference between the sum of squares of the singular values and the sum
of squares of the absolute values of the eigenvalues, and it is also equal to the
sum of squares of the entries in the strict upper triangle of a Schur form. For a
normal matrix, each of these quantities is, of course, zero.
THEOREM 1.3.10. Let $A \in M_n$ have eigenvalues $\lambda_1, \ldots, \lambda_n$ and singular
values $\sigma_1, \ldots, \sigma_n$, and let $A = QUQ^H$ be a Schur decomposition of $A$. Let $\Lambda$
denote the diagonal of $U$, consisting of the eigenvalues of $A$, in some order,
and let $T$ denote the strict upper triangle of $U$. Then
$$\sum_{i=1}^n \sigma_i^2 - \sum_{i=1}^n |\lambda_i|^2 = \|A\|_F^2 - \|\Lambda\|_F^2 = \|T\|_F^2 .$$
Proof. From the definition of the Frobenius norm, it is seen that $\|A\|_F^2 =
\mathrm{tr}(A^HA)$, where $\mathrm{tr}(\cdot)$ denotes the trace, i.e., the sum of the diagonal entries.
If $A = V\Sigma W^H$ is the singular value decomposition of $A$, then
$$\mathrm{tr}(A^HA) = \mathrm{tr}(W \Sigma^H \Sigma W^H) = \mathrm{tr}(\Sigma^H \Sigma) = \sum_{i=1}^n \sigma_i^2 .$$
If $A = QUQ^H$ is a Schur form of $A$, then it is also clear that
$$\mathrm{tr}(A^HA) = \mathrm{tr}(U^HU) = \|U\|_F^2 = \|\Lambda\|_F^2 + \|T\|_F^2 = \sum_{i=1}^n |\lambda_i|^2 + \|T\|_F^2 .$$
DEFINITION 1.3.5. The Frobenius norm of the strict upper triangle of a
Schur form of $A$ is called the departure from normality of $A$ with respect to
the Frobenius norm.
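For a 2-by-2 matrix, the eigenvalues follow from the characteristic polynomial. This makes Theorem 1.3.10 easy to check without computing a Schur form. The minimal plain-Python sketch below is restricted to the 2-by-2 case, and its function names are illustrative. It computes $\bigl(\|A\|_F^2 - \sum_i |\lambda_i|^2\bigr)^{1/2}$, which in the 2-by-2 case equals $|u_{12}|$ for any Schur form.

```python
import cmath
import math

def eig2(A):
    """Eigenvalues of a 2-by-2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def departure_from_normality(A):
    """sqrt(||A||_F^2 - sum_i |lambda_i|^2), equal to ||T||_F by Theorem 1.3.10."""
    fro2 = sum(abs(x) ** 2 for row in A for x in row)
    lam2 = sum(abs(lam) ** 2 for lam in eig2(A))
    return math.sqrt(max(fro2 - lam2, 0.0))  # clip tiny negative rounding errors

print(departure_from_normality([[1, 1], [0, 2]]))    # 1.0
print(departure_from_normality([[0, 10], [0, 0]]))   # 10.0
```

A Hermitian (hence normal) input such as `[[2, 1j], [-1j, 3]]` gives a value that is zero up to rounding.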

1.3.6. Eigenvalues and the Field of Values. We will see later that the
eigenvalues of a normal matrix provide all of the essential information about
that matrix, as far as iterative linear system solvers are concerned. There is no
corresponding simple set of characteristics of a nonnormal matrix that provide
such complete information, but the field of values captures certain important
properties.
We begin with the useful theorem of Gerschgorin for locating the eigenval-
ues of a matrix.
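THEOREM 1.3.11 (Gerschgorin). Let $A$ be an n-by-n matrix and let
$$R_i(A) = \sum_{j \ne i} |a_{ij}|, \qquad i = 1, \ldots, n,$$
denote the sum of the absolute values of all off-diagonal entries in row $i$. Then
all eigenvalues of $A$ are located in the union of disks
$$\bigcup_{i=1}^n \{ z \in \mathbb{C} : |z - a_{ii}| \le R_i(A) \} . \qquad (1.13)$$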
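Proof. Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $v$. Let
$v_p$ be the element of $v$ with largest absolute value, $|v_p| \ge \max_{j \ne p} |v_j|$. Then
since $Av = \lambda v$, we have
$$\lambda v_p = \sum_{j=1}^n a_{pj} v_j$$
or, equivalently,
$$v_p (\lambda - a_{pp}) = \sum_{j \ne p} a_{pj} v_j .$$
From the triangle inequality it follows that
$$|v_p|\, |\lambda - a_{pp}| \le \sum_{j \ne p} |a_{pj}|\, |v_j| \le |v_p|\, R_p(A) .$$
Since $|v_p| > 0$, it follows that $|\lambda - a_{pp}| \le R_p(A)$; that is, the eigenvalue $\lambda$ lies
in the Gerschgorin disk for the row corresponding to its eigenvector's largest
entry, and hence all of the eigenvalues lie in the union of the Gerschgorin disks.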
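The disks are cheap to compute. The minimal plain-Python sketch below builds the row disks of a hypothetical 2-by-2 matrix. It also builds the column disks by applying the same function to the transpose, which anticipates the corollary that follows. It then checks the closed-form 2-by-2 eigenvalues against both sets.

```python
import cmath

def gerschgorin_disks(A):
    """Row disks as (center a_ii, radius R_i(A)); pass the transpose for column disks."""
    return [(row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
            for i, row in enumerate(A)]

def in_union(z, disks, tol=1e-12):
    """True if z lies in at least one closed disk (with a small tolerance)."""
    return any(abs(z - c) <= r + tol for c, r in disks)

def eig2(A):
    """Eigenvalues of a 2-by-2 matrix, used here only to test the bound."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

A = [[4, 1], [2, -3]]
row_disks = gerschgorin_disks(A)                            # [(4, 1), (-3, 2)]
col_disks = gerschgorin_disks([list(c) for c in zip(*A)])   # [(4, 2), (-3, 1)]
print([in_union(lam, row_disks) and in_union(lam, col_disks) for lam in eig2(A)])
# [True, True]
```

Neither row disk contains the origin, so this matrix is nonsingular by the diagonal dominance argument given below.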

It can be shown further that if a union of $k$ of the Gerschgorin disks forms
a connected region that is disjoint from the remaining $n - k$ disks, then that
region contains exactly $k$ of the eigenvalues of $A$ (counted with multiplicity).
Since the eigenvalues of $A$ are the same as those of its transpose $A^T$, the
following corollary is immediate.
COROLLARY 1.3.2. Let $A$ be an n-by-n matrix and let
$$C_j(A) = \sum_{i \ne j} |a_{ij}|, \qquad j = 1, \ldots, n,$$
denote the sum of the absolute values of all off-diagonal entries in column $j$.
Then all eigenvalues of $A$ are located in the union of disks
$$\bigcup_{j=1}^n \{ z \in \mathbb{C} : |z - a_{jj}| \le C_j(A) \} .$$
A matrix is said to be (rowwise) diagonally dominant if the absolute value
of each diagonal entry is strictly greater than the sum of the absolute values
of the off-diagonal entries in its row. In this case, the matrix is nonsingular,
since the Gerschgorin disks in (1.13) do not contain the origin. If the absolute
value of each diagonal entry is greater than or equal to the sum of the absolute
values of the off-diagonal entries in its row, then the matrix is said to be
weakly (rowwise) diagonally dominant. Analogous definitions of (columnwise)
diagonal dominance and weak diagonal dominance can be given.
We say that a Hermitian matrix is positive definite if its eigenvalues are all
positive. (Recall that the eigenvalues of a Hermitian matrix are all real.) A
diagonally dominant Hermitian matrix with positive diagonal entries is positive
definite. A Hermitian matrix is positive semidefinite if its eigenvalues are all
nonnegative. A weakly diagonally dominant Hermitian matrix with positive
diagonal entries is positive semidefinite.
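The dominance tests are one-line row checks. A successful Cholesky factorization certifies positive definiteness, linking these definitions to the decomposition $A = LL^H$ of section 1.3.5. The sketch below is a minimal plain-Python version. It assumes real symmetric input and treats a nonpositive pivot as failure. The example matrices are hypothetical.

```python
import math

def diagonally_dominant(A, weak=False):
    """Rowwise diagonal dominance test (strict by default, weak if weak=True)."""
    for i, row in enumerate(A):
        off = sum(abs(x) for j, x in enumerate(row) if j != i)
        if abs(row[i]) < off or (not weak and abs(row[i]) == off):
            return False
    return True

def cholesky(A):
    """Factor L with A = L L^T for a real symmetric matrix.

    Raises ValueError if a nonpositive pivot appears, i.e. A is not
    positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

T = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]  # strictly dominant, positive diagonal
S = [[1, -1], [-1, 1]]                     # only weakly dominant (and singular)
print(diagonally_dominant(T), diagonally_dominant(S), diagonally_dominant(S, weak=True))
# True False True
```

`cholesky(T)` succeeds, consistent with $T$ being positive definite. `cholesky(S)` hits a zero pivot, because $S$ is only positive semidefinite.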
Another useful theorem for obtaining bounds on the eigenvalues of a
Hermitian matrix is the Cauchy interlace theorem, which we state here without
proof. For a proof see, for instance, [81] or [112].
THEOREM 1.3.12 (Cauchy interlace theorem). Let $A$ be an n-by-n Hermitian
matrix with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$, and let $H$ be any m-by-m principal
submatrix of $A$ (obtained by deleting $n - m$ rows and the corresponding columns
from $A$), with eigenvalues $\mu_1 \le \cdots \le \mu_m$. Then for each $i = 1, \ldots, m$ we have
$$\lambda_i \le \mu_i \le \lambda_{i+n-m} .$$
For non-Hermitian matrices, the field of values is sometimes a more useful
concept than the eigenvalues.
DEFINITION 1.3.6. The field of values of $A \in M_n$ is
$$\mathcal{F}(A) = \{ y^HAy : y \in \mathbb{C}^n,\ \|y\| = 1 \} .$$
This set is also called the numerical range. An equivalent definition is
$$\mathcal{F}(A) = \Bigl\{ \frac{y^HAy}{y^Hy} : y \in \mathbb{C}^n,\ y \ne 0 \Bigr\} .$$
The field of values is a compact set in the complex plane, since it is the
continuous image of a compact set (the surface of the Euclidean ball). It can
also be shown to be a convex set. This is known as the Toeplitz-Hausdorff
theorem. See, for example, [81] for a proof. The numerical radius $\nu(A)$ is the
largest absolute value of an element of $\mathcal{F}(A)$:
$$\nu(A) = \max\{ |z| : z \in \mathcal{F}(A) \} .$$
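If $A$ is an n-by-n matrix and $\alpha$ is a complex scalar, then
$$\mathcal{F}(A + \alpha I) = \mathcal{F}(A) + \alpha , \qquad (1.16)$$
since
$$y^H(A + \alpha I)y = y^HAy + \alpha\, y^Hy = y^HAy + \alpha \quad \text{for } \|y\| = 1 .$$
Also,
$$\mathcal{F}(\alpha A) = \alpha\, \mathcal{F}(A) , \qquad (1.17)$$
since
$$y^H(\alpha A)y = \alpha\, y^HAy .$$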
For any n-by-n matrix $A$, $\mathcal{F}(A)$ contains the eigenvalues of $A$, since
$v^HAv = \lambda v^Hv = \lambda$ if $\lambda$ is an eigenvalue and $v$ is a corresponding normalized
eigenvector. Also, if $Q$ is any unitary matrix, then $\mathcal{F}(Q^HAQ) = \mathcal{F}(A)$, since
every value $y^HQ^HAQy$ with $y^Hy = 1$ in $\mathcal{F}(Q^HAQ)$ corresponds to a value
$w^HAw$ with $w = Qy$, $w^Hw = 1$ in $\mathcal{F}(A)$, and vice versa.
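For normal matrices the field of values is the convex hull of the spectrum.
To see this, write the eigendecomposition of a normal matrix $A$ in the form
$A = Q\Lambda Q^H$, where $Q$ is unitary and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. By the unitary
similarity invariance property, $\mathcal{F}(A) = \mathcal{F}(\Lambda)$. Since
$$y^H \Lambda y = \sum_{i=1}^n |y_i|^2 \lambda_i , \qquad \sum_{i=1}^n |y_i|^2 = 1 ,$$
it follows that $\mathcal{F}(A)$ is just the set of all convex combinations of the eigenvalues
$\lambda_1, \ldots, \lambda_n$.
For a general matrix $A$, let $H(A) = \frac{1}{2}(A + A^H)$ denote the Hermitian part
of $A$. Then
$$\mathcal{F}(H(A)) = \mathrm{Re}(\mathcal{F}(A)) . \qquad (1.18)$$
To see this, note that for any vector $y \in \mathbb{C}^n$,
$$y^H H(A) y = \tfrac{1}{2}\bigl( y^HAy + y^HA^Hy \bigr) = \tfrac{1}{2}\bigl( y^HAy + \overline{y^HAy} \bigr) = \mathrm{Re}(y^HAy) .$$
Thus each point in $\mathcal{F}(H(A))$ is of the form $\mathrm{Re}(z)$ for some $z \in \mathcal{F}(A)$ and vice
versa.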
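The analogue of Gerschgorin's theorem for the field of values is as follows.
THEOREM 1.3.13. Let $A$ be an n-by-n matrix and let
$$R_i(A) = \sum_{j \ne i} |a_{ij}| , \qquad C_i(A) = \sum_{j \ne i} |a_{ji}| , \qquad i = 1, \ldots, n .$$
Then the field of values of $A$ is contained in
$$G_F(A) \equiv \mathrm{Co}\Bigl( \bigcup_{i=1}^n \bigl\{ z \in \mathbb{C} : |z - a_{ii}| \le \tfrac{1}{2}\bigl(R_i(A) + C_i(A)\bigr) \bigr\} \Bigr) , \qquad (1.19)$$
where $\mathrm{Co}(\cdot)$ denotes the convex hull.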


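Proof. First note that the real part of $\mathcal{F}(A)$ is equal to $\mathcal{F}(H(A))$, and that
$\mathcal{F}(H(A))$ is the convex hull of the eigenvalues of the Hermitian (hence normal)
matrix $H(A)$. The off-diagonal entries in row $i$ of $H(A)$ are $\frac{1}{2}(a_{ij} + \bar{a}_{ji})$, so
$R_i(H(A)) \le \frac{1}{2}(R_i(A) + C_i(A))$. It therefore follows from Gerschgorin's theorem
applied to $H(A)$ that
$$\mathrm{Re}(\mathcal{F}(A)) \subseteq \mathrm{Co}\Bigl( \bigcup_{i=1}^n \bigl\{ x \in \mathbb{R} : |x - \mathrm{Re}(a_{ii})| \le \tfrac{1}{2}\bigl(R_i(A) + C_i(A)\bigr) \bigr\} \Bigr) . \qquad (1.20)$$
Let $G_F(A)$ denote the set in (1.19). If $G_F(A)$ is contained in the open right
half-plane $\{z : \mathrm{Re}(z) > 0\}$, then $\mathrm{Re}(a_{ii}) > \frac{1}{2}(R_i(A) + C_i(A))$ for all $i$, and
hence the set on the right in (1.20) is contained in the open right half-plane.
It follows that $\mathcal{F}(A)$ lies in the open right half-plane.
Now suppose only that $G_F(A)$ is contained in some open half-plane whose
boundary passes through the origin. Since $G_F(A)$ is convex, this is equivalent
to the condition $0 \notin G_F(A)$. Then there is some $\theta \in [0, 2\pi)$ such that
$e^{i\theta}G_F(A) = G_F(e^{i\theta}A)$ is contained in the open right half-plane. It follows from
the previous argument that $\mathcal{F}(e^{i\theta}A) = e^{i\theta}\mathcal{F}(A)$ lies in the open right half-plane,
and hence $0 \notin \mathcal{F}(A)$.
Finally, for any complex number $\alpha$, if $\alpha \notin G_F(A)$ then $0 \notin G_F(A - \alpha I)$,
and the previous argument implies that $0 \notin \mathcal{F}(A - \alpha I)$. Using (1.16), it follows
that $\alpha \notin \mathcal{F}(A)$. Therefore, $\mathcal{F}(A) \subseteq G_F(A)$.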
The following procedure can be used to approximate the field of values
numerically. First note that since $\mathcal{F}(A)$ is convex and compact, it is necessary
only to compute the boundary. If many well-spaced points are computed
around the boundary, then the convex hull of these points is a polygon $p(A)$
that is contained in $\mathcal{F}(A)$, while the intersection of the half-planes determined
by the support lines at these points is a polygon $P(A)$ that contains $\mathcal{F}(A)$.
To compute points around the boundary of $\mathcal{F}(A)$, first note from (1.18)
that the rightmost point in $\mathcal{F}(A)$ has real part equal to the rightmost point in
$\mathcal{F}(H(A))$, which is the largest eigenvalue of $H(A)$. If we compute the largest
eigenvalue $\lambda_{\max}$ of $H(A)$ and the corresponding unit eigenvector $v$, then $v^HAv$
is a boundary point of $\mathcal{F}(A)$ and the vertical line $\{\lambda_{\max} + ti : t \in \mathbb{R}\}$ is a
support line for $\mathcal{F}(A)$; that is, $\mathcal{F}(A)$ is contained in the half-plane to the left
of this line.
Note also that since $e^{-i\theta}\mathcal{F}(e^{i\theta}A) = \mathcal{F}(A)$, we can use this same procedure
for rotated matrices $e^{i\theta}A$, $\theta \in [0, 2\pi)$. If $\lambda_\theta$ denotes the largest eigenvalue of
$H(e^{i\theta}A)$ and $v_\theta$ the corresponding unit eigenvector, then $v_\theta^HAv_\theta$ is a boundary
point of $\mathcal{F}(A)$ and the line $\{e^{-i\theta}(\lambda_\theta + ti) : t \in \mathbb{R}\}$ is a support line. By choosing
values of $\theta$ throughout the interval $[0, 2\pi)$, the approximating polygons $p(A)$
and $P(A)$ can be made arbitrarily close to the true field of values $\mathcal{F}(A)$.
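The rotation procedure can be sketched for the special case of a 2-by-2 matrix. In that case the largest eigenpair of the Hermitian part has a closed form, so no eigensolver library is needed. This restriction is an assumption made to keep the example self-contained. For larger matrices, one would substitute a Hermitian eigensolver. The sampled boundary points give an inner polygon. The largest $|z|$ among them is a lower bound for $\nu(A)$.

```python
import cmath
import math

def fov_boundary_2x2(A, num_angles=360):
    """Boundary points v_theta^H A v_theta of F(A) for a 2-by-2 matrix A.

    For each theta: form the Hermitian part of e^{i theta} A, take a unit
    eigenvector of its largest eigenvalue (closed form for 2-by-2 Hermitian
    matrices), and evaluate the Rayleigh quotient of A."""
    points = []
    for m in range(num_angles):
        w = cmath.exp(2j * math.pi * m / num_angles)
        B = [[w * a for a in row] for row in A]
        h11, h22 = B[0][0].real, B[1][1].real         # diagonal of H(B)
        h12 = (B[0][1] + B[1][0].conjugate()) / 2     # off-diagonal of H(B)
        lam = (h11 + h22) / 2 + math.sqrt(((h11 - h22) / 2) ** 2 + abs(h12) ** 2)
        if abs(h12) > 1e-14:
            v = [h12, lam - h11]                      # solves (H(B) - lam I) v = 0
        else:
            v = [1, 0] if h11 >= h22 else [0, 1]
        Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
        vHv = abs(v[0]) ** 2 + abs(v[1]) ** 2
        points.append((v[0].conjugate() * Av[0] + v[1].conjugate() * Av[1]) / vHv)
    return points

# F(J) for this Jordan block is the disk of radius 1/2 about the origin.
J = [[0, 1], [0, 0]]
pts = fov_boundary_2x2(J)
nu_estimate = max(abs(z) for z in pts)  # lower bound for nu(J); exact here
```

For `J` every computed point has modulus 1/2, so $\nu(J) = 1/2 = \frac{1}{2}\|J\|$. This shows the first inequality of (1.21) below can hold with equality. For a diagonal (normal) input, the points land on the segment joining the two eigenvalues, as expected for a convex hull of the spectrum.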
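The numerical radius $\nu(A)$ also has a number of interesting properties. For
any two matrices $A$ and $B$, it is clear that
$$\nu(A + B) \le \nu(A) + \nu(B) .$$
Although the numerical radius is not itself a matrix norm (since the
requirement $\nu(AB) \le \nu(A)\,\nu(B)$ does not always hold), it is closely related
to the 2-norm:
$$\tfrac{1}{2}\|A\| \le \nu(A) \le \|A\| . \qquad (1.21)$$
The second inequality in (1.21) follows from the fact that for any vector $y$ with
$\|y\| = 1$, we have
$$|y^HAy| \le \|y\|\, \|Ay\| \le \|A\| .$$
The first inequality in (1.21) is derived as follows. First note that $\nu(A) =
\nu(A^H)$. Write $A$ in the form $A = H(A) + N(A)$, where $N(A) = (A - A^H)/2$.
Both $H(A)$ and $N(A)$ are normal matrices, so their 2-norms equal their
numerical radii, and we observe that
$$\|A\| \le \|H(A)\| + \|N(A)\| = \nu(H(A)) + \nu(N(A)) .$$
Using the triangle inequality for $\nu$ and $\nu(A^H) = \nu(A)$, this becomes
$$\|A\| \le \tfrac{1}{2}\bigl(\nu(A) + \nu(A^H)\bigr) + \tfrac{1}{2}\bigl(\nu(A) + \nu(A^H)\bigr) = 2\nu(A) .$$
The numerical radius also satisfies the power inequality
$$\nu(A^m) \le \nu(A)^m , \qquad m = 1, 2, \ldots . \qquad (1.22)$$
For an elementary proof, see [114] or [80, Ex. 27, p. 333].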


An important property of the set of eigenvalues $\Lambda(A)$ is that if $p$ is any
polynomial, then $\Lambda(p(A)) = p(\Lambda(A))$. This can be seen from the Jordan form
$A = SJS^{-1}$. If $p$ is any polynomial, then $p(A) = Sp(J)S^{-1}$, and since $p(J)$ is
upper triangular, its eigenvalues are just its diagonal elements, $p(\Lambda(A))$.
Unfortunately, the field of values does not have this property: in general,
$\mathcal{F}(p(A)) \ne p(\mathcal{F}(A))$. We will see in later sections, however, that the weaker
property (1.22) can still be useful in deriving error bounds for iterative methods.
From (1.22) it follows that if the field of values of $A$ is contained in a disk of
radius $r$ centered at the origin, then the field of values of $A^m$ is contained in a
disk of radius $r^m$ centered at the origin.

There are many interesting generalizations of the field of values. One that
is especially relevant to the analysis of iterative methods is as follows.
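DEFINITION 1.3.7. The generalized field of values of a set of matrices
$\{A_i\}_{i=1}^k$ in $M_n$ is the subset of $\mathbb{C}^k$ defined by
$$\mathcal{F}_k(\{A_i\}_{i=1}^k) = \Bigl\{ \begin{pmatrix} y^HA_1y \\ \vdots \\ y^HA_ky \end{pmatrix} : y \in \mathbb{C}^n,\ \|y\| = 1 \Bigr\} .$$
Note that for $k = 1$, this is just the ordinary field of values. One can also
define the conical generalized field of values as
$$\hat{\mathcal{F}}_k(\{A_i\}_{i=1}^k) = \Bigl\{ \begin{pmatrix} y^HA_1y \\ \vdots \\ y^HA_ky \end{pmatrix} : y \in \mathbb{C}^n \Bigr\} .$$
It is clear that this object is a cone, in the sense that if $z \in \hat{\mathcal{F}}_k(\{A_i\}_{i=1}^k)$ and
$a > 0$, then $az \in \hat{\mathcal{F}}_k(\{A_i\}_{i=1}^k)$ (replace $y$ by $\sqrt{a}\,y$). Note also that the conical
generalized field of values is preserved by simultaneous congruence
transformation: for $P \in M_n$ nonsingular, $\hat{\mathcal{F}}_k(\{A_i\}_{i=1}^k) = \hat{\mathcal{F}}_k(\{P^HA_iP\}_{i=1}^k)$.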

Comments and Additional References.


The linear algebra facts reviewed in this chapter, along with a wealth of
additional interesting material, can be found in the excellent books by Horn
and Johnson [80, 81].
Part I

Krylov Subspace
Approximations
Chapter 2

Some Iteration Methods

In this chapter the conjugate gradient (CG), minimal residual (MINRES),
and generalized minimal residual (GMRES) algorithms are derived. These
algorithms, designed for Hermitian positive definite, Hermitian indefinite, and
general non-Hermitian matrices, respectively, each generate the "optimal" (in
a sense, to be described later) approximation from the Krylov space (1.1).
The algorithms are first derived from other simpler iterative methods such
as the method of steepest descent and Orthomin. This corresponds (roughly)
to the historical development of the methods, with the optimal versions being
developed as improvements upon nonoptimal algorithms. It shows how these
algorithms are related to other iterative methods for solving linear systems.
Once it is recognized, however, that the goal in designing an iterative
method is to generate the optimal approximation from the space (1.1), these
methods can be derived from standard linear algebra techniques for locating
the closest vector in a subspace to a given vector. This derivation is carried
out in the latter part of section 2.4 for GMRES and in section 2.5 for MINRES
and CG. This derivation has a number of advantages, such as demonstrating
that the possible failure of CG for indefinite matrices corresponds to a singular
tridiagonal matrix in the Lanczos algorithm and providing a basis for the
derivation of other iterative techniques, such as those described in Chapter 5.

2.1. Simple Iteration.

Given a preconditioner $M$ for the linear system $Ax = b$, a natural idea for
generating approximate solutions is the following. Since a preconditioner is
designed so that $M^{-1}A$ in some sense approximates the identity, $M^{-1}(b - Ax_k)$
can be expected to approximate the error $A^{-1}b - x_k$ in an approximate solution
$x_k$. A better approximate solution $x_{k+1}$ might therefore be obtained by taking
$$x_{k+1} = x_k + M^{-1}(b - Ax_k) . \qquad (2.1)$$
This procedure of starting with an initial guess $x_0$ for the solution and
generating successive approximations using (2.1) for $k = 0, 1, \ldots$ is sometimes
called simple iteration, but more often it goes by different names according to
the choice of $M$. For $M$ equal to the diagonal of $A$, it is called Jacobi iteration;
for $M$ equal to the lower triangle of $A$, it is the Gauss-Seidel method; for $M$ of
the form $\omega^{-1}D - L$, where $D$ is the diagonal of $A$, $-L$ is the strict lower triangle
of $A$, and $\omega$ is a relaxation parameter, it is the successive overrelaxation or
SOR method. Preconditioners will be discussed in Part II of this book, but
our concern in this section is to describe the behavior of iteration (2.1) for a
given preconditioner $M$ in terms of properties of the preconditioned matrix
$M^{-1}A$.
We will see in later sections that the simple iteration procedure (2.1) can
be improved upon in a number of ways. Still, it is not to be abandoned.
All of these improvements require some extra work, and if the preconditioned
matrix $M^{-1}A$ is sufficiently close to the identity, this extra work may not be
necessary. Multigrid methods, which will be discussed in Chapter 12, can be
thought of as very sophisticated preconditioners used with the simple iteration (2.1).
An actual implementation of (2.1) might use the following algorithm.

Algorithm 1. Simple Iteration.

Given an initial guess $x_0$, compute $r_0 = b - Ax_0$, and solve $Mz_0 = r_0$.
For $k = 1, 2, \ldots$,
    Set $x_k = x_{k-1} + z_{k-1}$.
    Compute $r_k = b - Ax_k$.
    Solve $Mz_k = r_k$.
Let $e_k = A^{-1}b - x_k$ denote the error in the approximation $x_k$. It follows
from (2.1) that
$$e_k = (I - M^{-1}A)\, e_{k-1} = \cdots = (I - M^{-1}A)^k e_0 . \qquad (2.2)$$
Taking norms on both sides in (2.2), we find that
$$|||e_k||| \le |||(I - M^{-1}A)^k|||\; |||e_0||| , \qquad (2.3)$$
where $|||\cdot|||$ can be any vector norm, provided that the matrix norm is taken
to be the one induced by the vector norm, $|||B||| = \max_{|||y|||=1} |||By|||$. In this
case, the bound in (2.3) is sharp, since, for each $k$, there is an initial error $e_0$
for which equality holds.
LEMMA 2.1.1. The norm of the error in iteration (2.1) will approach zero
and $x_k$ will approach $A^{-1}b$ for every initial error $e_0$ if and only if
$$\lim_{k\to\infty} |||(I - M^{-1}A)^k||| = 0 .$$
Proof. It is clear from (2.3) that if $\lim_{k\to\infty} |||(I - M^{-1}A)^k||| = 0$, then
$\lim_{k\to\infty} |||e_k||| = 0$. Conversely, suppose $|||(I - M^{-1}A)^k||| \ge \alpha > 0$ for infinitely
many values of $k$. The vectors $e_{0,k}$ with norm 1 for which equality holds in (2.3)
form a bounded infinite set in $\mathbb{C}^n$, so, by the Bolzano-Weierstrass theorem,
they contain a convergent subsequence. Let $e_0$ be the limit of this subsequence.
Then for $k$ sufficiently large in this subsequence, we have $|||e_0 - e_{0,k}||| \le \epsilon < 1$,
and
$$|||(I - M^{-1}A)^k e_0||| \ge |||(I - M^{-1}A)^k e_{0,k}||| - |||(I - M^{-1}A)^k (e_0 - e_{0,k})||| \ge (1 - \epsilon)\, |||(I - M^{-1}A)^k||| \ge (1 - \epsilon)\, \alpha .$$
Since this holds for infinitely many values of $k$, it follows that
$\lim_{k\to\infty} |||(I - M^{-1}A)^k e_0|||$, if it exists, is greater than 0.
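It was shown in section 1.3.1 that, independent of the matrix norm used in (2.3), the quantity $|||(I - M^{-1}A)^k|||^{1/k}$ approaches the spectral radius $\rho(I - M^{-1}A)$ as $k \to \infty$. Thus we have the following result.

THEOREM 2.1.1. The iteration (2.1) converges to $A^{-1}b$ for every initial error $e_0$ if and only if $\rho(I - M^{-1}A) < 1$.

Proof. If $\rho(I - M^{-1}A) < 1$, choose $\epsilon > 0$ with $\rho(I - M^{-1}A) + \epsilon < 1$; then for all sufficiently large $k$,

$$|||(I - M^{-1}A)^k||| \le \left(\rho(I - M^{-1}A) + \epsilon\right)^k \to 0,$$

while if $\rho(I - M^{-1}A) \ge 1$, then $\lim_{k\to\infty} |||(I - M^{-1}A)^k|||$, if it exists, must be greater than or equal to 1, since $|||(I - M^{-1}A)^k||| \ge \rho(I - M^{-1}A)^k \ge 1$ for every $k$. In either case the result then follows from Lemma 2.1.1.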
Having established necessary and sufficient conditions for convergence, we must now consider the rate of convergence. How many iterations will be required to obtain an approximation that is within, say, $\delta$ of the true solution? In general, this question is not so easy to answer.

Taking norms on each side in (2.2), we can write

$$|||e_k||| \le |||I - M^{-1}A|||\;|||e_{k-1}|||,$$

from which it follows that if $|||I - M^{-1}A||| < 1$, then the error is reduced by at least this factor at each iteration. The error will satisfy $|||e_k|||/|||e_0||| \le \delta$ provided that

$$k \ge \frac{\log \delta}{\log |||I - M^{-1}A|||}.$$
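The iteration-count estimate is easy to evaluate. A minimal sketch, assuming the contraction factor $c = |||I - M^{-1}A|||$ is already known and satisfies $0 < c < 1$; `iterations_needed` is a hypothetical helper name.

```python
import math

def iterations_needed(delta, contraction):
    """Smallest integer k with contraction**k <= delta, i.e. k >= log(delta)/log(contraction).

    `contraction` plays the role of |||I - M^{-1}A||| and must satisfy 0 < contraction < 1.
    """
    if not (0.0 < delta < 1.0 and 0.0 < contraction < 1.0):
        raise ValueError("need 0 < delta < 1 and 0 < contraction < 1")
    return math.ceil(math.log(delta) / math.log(contraction))

print(iterations_needed(1e-6, 0.5))   # 20
print(iterations_needed(1e-6, 0.9))   # 132
```

The estimate uses the norm of the iteration matrix, not its spectral radius; for nonnormal matrices the two can differ greatly.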

It was shown in section 1.3.1 that for any $\epsilon > 0$, there is a matrix norm such that $|||I - M^{-1}A||| \le \rho(I - M^{-1}A) + \epsilon$. Hence if $\rho(I - M^{-1}A) < 1$, then there is a norm for which the error is reduced monotonically, and convergence is at least linear with a reduction factor approximately equal to $\rho(I - M^{-1}A)$. Unfortunately, however, this norm is sometimes a very strange one (as might be deduced from the proof of Theorem 1.3.3, since the matrix $D_t$ involved an exponential scaling), and it is unlikely that one would really want to measure convergence in terms of this norm!
It is usually the 2-norm or the $\infty$-norm or some closely related norm of the error that is of interest. For the class of normal matrices (diagonalizable
28 Iterative Methods for Solving Linear Systems

matrices with a complete set of orthonormal eigenvectors), the 2-norm of the matrix and its spectral radius coincide. Thus if $I - M^{-1}A$ is a normal matrix, then the 2-norm of the error is reduced by at least the factor $\rho(I - M^{-1}A)$ at each step. For nonnormal matrices, however, it is often the case that $\rho(I - M^{-1}A) < 1 < \|I - M^{-1}A\|$. In this case, the error may grow over some finite number of steps, and it is impossible to predict the number of iterations required to obtain a given level of accuracy while knowing only the spectral radius.

An example is shown in Figure 2.1. The matrix $A$ was taken to be

and $M$ was taken to be the lower triangle of $A$. For problem size $n = 30$, the spectral radius of $I - M^{-1}A$ is about .74, while the 2-norm of $I - M^{-1}A$ is about 1.4. As Figure 2.1 shows, the 2-norm of the error increases by about four orders of magnitude over its original value before starting to decrease.

FIG. 2.1. Simple iteration for a highly nonnormal matrix, $\rho = .74$.
While the spectral radius generally does not determine the convergence rate of early iterations, it does describe the asymptotic convergence rate of (2.1). We will prove this only in the case where $M^{-1}A$ is diagonalizable and $I - M^{-1}A$ has a single eigenvalue of largest absolute value. Then, writing the eigendecomposition of $M^{-1}A$ as $V\Lambda V^{-1}$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and, say,

$$|1 - \lambda_1| > \max_{i \ge 2} |1 - \lambda_i|,$$

we have $V^{-1}e_k = (I - \Lambda)^k V^{-1}e_0$. Assuming that the first component of $V^{-1}e_0$ is nonzero, for $k$ sufficiently large, the largest component of $V^{-1}e_k$ will be the first, $(V^{-1}e_k)_1$. At each subsequent iteration, this dominant component is multiplied by the factor $(1 - \lambda_1)$, and so we have

$$\lim_{k\to\infty} \frac{|||e_{k+1}|||}{|||e_k|||} = |1 - \lambda_1| = \rho(I - M^{-1}A).$$
Instead of considering the error ratios at successive steps as defining the asymptotic convergence rate, one might consider ratios of errors $(|||e_{k+j}|||/|||e_k|||)^{1/j}$ for any $k$ and for $j$ sufficiently large. Then, a more general proof that this quantity approaches the spectral radius $\rho(I - M^{-1}A)$ as $j \to \infty$ can be carried out using the Jordan form, as described in section 1.3.2. Note in Figure 2.1 that eventually the error decreases by about the factor $\rho(I - M^{-1}A) = .74$ at each step, since this matrix is diagonalizable with a single eigenvalue of largest absolute value.
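The matrix behind Figure 2.1 is not reproduced in this text, so the sketch below uses a hypothetical 2-by-2 iteration matrix $G = I - M^{-1}A$ with eigenvalues 0.5 and 0.4 and a large off-diagonal entry. It is diagonalizable with a single eigenvalue of largest absolute value, so $\rho(G) = 0.5$, but $\|G\|_2$ is about 10. Applying $e_{k+1} = Ge_k$ shows the error norm first jumping by a factor of about 10 and only later settling into reduction by roughly $\rho(G)$ per step.

```python
import math

def matvec(G, x):
    return [sum(g * xj for g, xj in zip(row, x)) for row in G]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

# Hypothetical nonnormal iteration matrix (NOT the matrix used for Figure 2.1).
G = [[0.5, 10.0], [0.0, 0.4]]

e = [0.0, 1.0]                       # initial error e_0 with ||e_0|| = 1
norms = [norm2(e)]
for _ in range(80):
    e = matvec(G, e)                 # e_{k+1} = (I - M^{-1}A) e_k, as in (2.2)
    norms.append(norm2(e))

peak = max(norms)                    # transient growth caused by nonnormality
late_ratio = norms[-1] / norms[-2]   # ||e_80|| / ||e_79||, close to rho(G) = 0.5
print(f"peak error norm {peak:.3f}, late ratio {late_ratio:.6f}")
```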

2.2. Orthomin(1) and Steepest Descent.


In this section we discuss methods to improve on the simple iteration (2.1) by introducing dynamically computed parameters into the iteration. For ease of notation here and throughout the remainder of Part I of this book, we avoid explicit reference to the preconditioner $M$ and consider $A$ and $b$ to be the coefficient matrix and right-hand side vector for the already preconditioned system. Sometimes we will assume that the preconditioned matrix is Hermitian. If the original coefficient matrix is Hermitian and the preconditioner $M$ is Hermitian and positive definite, then one obtains a Hermitian preconditioned matrix by (implicitly) working with the modified linear system

$$L^{-1}AL^{-H}y = L^{-1}b, \qquad x = L^{-H}y,$$

where $M = LL^H$ and the superscript $H$ denotes the Hermitian transpose. If the original problem is Hermitian but the preconditioner is indefinite, then we consider this as a non-Hermitian problem.
One might hope to improve the iteration (2.1) by introducing a parameter $a_k$ and setting

$$x_{k+1} = x_k + a_kr_k. \qquad (2.5)$$

Since the residual satisfies $r_{k+1} = r_k - a_kAr_k$, one can minimize the 2-norm of $r_{k+1}$ by choosing

$$a_k = \frac{(r_k, Ar_k)}{(Ar_k, Ar_k)}. \qquad (2.6)$$

If the matrix $A$ is Hermitian and positive definite, one might instead minimize the $A$-norm of the error, $\|e_{k+1}\|_A = (e_{k+1}, Ae_{k+1})^{1/2}$. Since the error satisfies $e_{k+1} = e_k - a_kr_k$, the coefficient that minimizes this error norm is

$$a_k = \frac{(r_k, r_k)}{(r_k, Ar_k)}. \qquad (2.7)$$
For Hermitian positive definite problems, the iteration (2.5) with coefficient formula (2.7) is called the method of steepest descent because if the problem of solving the linear system is identified with that of minimizing the quadratic form $x^HAx - 2b^Hx$ (which has its minimum where $Ax = b$), then the negative gradient or direction of steepest descent of this function at $x = x_k$ is, up to a factor of 2, $r_k = b - Ax_k$. The coefficient formula (2.6), which can be used with arbitrary nonsingular matrices $A$, does not have a special name like steepest descent but is a special case of a number of different methods. In particular, it can be called Orthomin(1).
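To make the two coefficient choices concrete, here is a minimal Python sketch of iteration (2.5) with $a_k = (r_k, Ar_k)/(Ar_k, Ar_k)$ (Orthomin(1)) and with $a_k = (r_k, r_k)/(r_k, Ar_k)$ (steepest descent). It assumes real arithmetic, so that $(x, y) = y^Tx$, and dense list-of-lists matrices; the function names and example matrices are illustrative.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    # Real inner product (x, y) = y^T x; complex data would need y^H x.
    return sum(xi * yi for xi, yi in zip(x, y))

def orthomin1(A, b, x0, steps):
    """Iteration (2.5) with coefficient (2.6); returns x and the residual 2-norms."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    history = [math.sqrt(dot(r, r))]
    for _ in range(steps):
        if dot(r, r) == 0.0:
            break
        Ar = matvec(A, r)
        a = dot(r, Ar) / dot(Ar, Ar)                   # (2.6)
        x = [xi + a * ri for xi, ri in zip(x, r)]      # x_{k+1} = x_k + a_k r_k
        r = [ri - a * ari for ri, ari in zip(r, Ar)]   # r_{k+1} = r_k - a_k A r_k
        history.append(math.sqrt(dot(r, r)))
    return x, history

def steepest_descent(A, b, x0, steps):
    """Iteration (2.5) with coefficient (2.7); A must be symmetric positive definite."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    for _ in range(steps):
        rr = dot(r, r)
        if rr == 0.0:
            break
        Ar = matvec(A, r)
        a = rr / dot(r, Ar)                            # (2.7)
        x = [xi + a * ri for xi, ri in zip(x, r)]
        r = [ri - a * ari for ri, ari in zip(r, Ar)]
    return x

# Nonsymmetric example whose Hermitian part diag(2, 3) is positive definite.
A = [[2.0, 1.0], [-1.0, 3.0]]
x, hist = orthomin1(A, [3.0, 2.0], [0.0, 0.0], 60)     # exact solution (1, 1)

# Symmetric positive definite example with exact solution (1/11, 7/11).
S = [[4.0, 1.0], [1.0, 3.0]]
y = steepest_descent(S, [1.0, 2.0], [0.0, 0.0], 50)
```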
By choosing $a_k$ as in (2.6), the Orthomin(1) method produces a residual $r_{k+1}$ that is equal to $r_k$ minus its projection onto $Ar_k$. It follows that $\|r_{k+1}\| \le \|r_k\|$, with equality if and only if $r_k$ is already orthogonal to $Ar_k$. Recall the definition of the field of values $\mathcal{F}(B)$ of a matrix $B$ as the set of all complex numbers of the form $y^HBy/y^Hy$, where $y$ is any complex vector other than the zero vector.
THEOREM 2.2.1. The 2-norm of the residual in iteration (2.5) with coefficient formula (2.6) decreases strictly monotonically for every initial vector $r_0$ if and only if $0 \notin \mathcal{F}(A^H)$.

Proof. If $0 \in \mathcal{F}(A^H)$ and $r_0$ is a nonzero vector satisfying $(r_0, Ar_0) = r_0^HA^Hr_0 = 0$, then $\|r_1\| = \|r_0\|$. On the other hand, if $0 \notin \mathcal{F}(A^H)$, then $(r_k, Ar_k)$ cannot be 0 for any nonzero $r_k$, and so $\|r_{k+1}\| < \|r_k\|$.

Since the field of values of $A^H$ is just the complex conjugate of the field of values of $A$, the condition in the theorem can be replaced by $0 \notin \mathcal{F}(A)$.
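Suppose $0 \notin \mathcal{F}(A^H)$. To show that the method (2.5)-(2.6) converges to the solution $A^{-1}b$, we will show not only that the 2-norm of the residual is reduced at each step but that it is reduced by at least some fixed factor, independent of $k$. Since the field of values is a closed set, if $0 \notin \mathcal{F}(A^H)$, then there is a positive number $d$ (the distance of $\mathcal{F}(A^H)$ from the origin) such that

$$\left|\frac{y^HA^Hy}{y^Hy}\right| \ge d$$

for all complex vectors $y \ne 0$. From (2.5) it follows that

$$r_{k+1} = r_k - a_kAr_k,$$

and, taking the inner product of $r_{k+1}$ with itself, we have

$$(r_{k+1}, r_{k+1}) = (r_k, r_k) - \frac{|(r_k, Ar_k)|^2}{(Ar_k, Ar_k)},$$

which can be written in the form

$$\frac{\|r_{k+1}\|^2}{\|r_k\|^2} = 1 - \left(\frac{|r_k^HA^Hr_k|}{r_k^Hr_k}\right)^2 \frac{r_k^Hr_k}{(Ar_k)^H(Ar_k)}. \qquad (2.8)$$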

Bounding the last two factors in (2.8) independently, in terms of $d$ and $\|A\|$, we have

$$\frac{\|r_{k+1}\|^2}{\|r_k\|^2} \le 1 - \frac{d^2}{\|A\|^2}.$$

We have proved the following theorem.


THEOREM 2.2.2. The iteration (2.5) with coefficient formula (2.6) converges to the solution $A^{-1}b$ for all initial vectors $r_0$ if and only if $0 \notin \mathcal{F}(A^H)$. In this case, the 2-norm of the residual satisfies

$$\|r_{k+1}\| \le \left(1 - \frac{d^2}{\|A\|^2}\right)^{1/2} \|r_k\| \qquad (2.9)$$

for all $k$, where $d$ is the distance from the origin to the field of values of $A^H$.
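A quick numerical check of (2.9), assuming the hypothetical real matrix $A = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}$. Its Hermitian part is $\mathrm{diag}(2, 3)$, so, by the remark that follows, $d = 2$; $\|A\|$ is obtained from the eigenvalues of $A^TA = \begin{pmatrix} 5 & -1 \\ -1 & 10 \end{pmatrix}$, a closed form valid only for this example. The sketch records the worst one-step residual reduction of Orthomin(1) and compares it with $(1 - d^2/\|A\|^2)^{1/2}$.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[2.0, 1.0], [-1.0, 3.0]]
d = 2.0                                              # smallest eigenvalue of H(A) = diag(2, 3)
norm_A = math.sqrt((15.0 + math.sqrt(29.0)) / 2.0)   # ||A||_2 for this particular A
bound = math.sqrt(1.0 - d ** 2 / norm_A ** 2)        # reduction factor in (2.9)

r = [1.0, 0.0]                                       # arbitrary nonzero initial residual
worst = 0.0
for _ in range(30):
    Ar = matvec(A, r)
    a = dot(r, Ar) / dot(Ar, Ar)                     # Orthomin(1) coefficient (2.6)
    r_next = [ri - a * ari for ri, ari in zip(r, Ar)]
    worst = max(worst, math.sqrt(dot(r_next, r_next) / dot(r, r)))
    r = r_next

print(f"worst observed factor {worst:.4f}, bound (2.9) {bound:.4f}")
```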
In the special case where $A$ is real and the Hermitian part of $A$, $H(A) = (A + A^H)/2$, is positive definite, the distance $d$ in (2.9) is just the smallest eigenvalue of $H(A)$. This is because the field of values of $H(A)$ is the real part of the field of values of $A^H$, which is convex and symmetric about the real axis and hence has its closest point to the origin on the real axis.
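The bound (2.9) on the rate at which the 2-norm of the residual is reduced is not necessarily sharp, since the vectors $r_k$ for which the first factor in (2.8) is equal to $d^2$ are not necessarily the ones for which the second factor is $1/\|A\|^2$. Sometimes a stronger bound can be obtained by noting that, because of the choice of $a_k$,

$$\|r_{k+1}\| \le \|(I - aA)r_k\| \le \|I - aA\|\,\|r_k\| \qquad (2.10)$$

for any coefficient $a$. In the special case where $A$ is Hermitian and positive definite, consider $a = 2/(\lambda_n + \lambda_1)$, where $\lambda_n$ is the largest and $\lambda_1$ the smallest eigenvalue of $A$. Since $I - aA$ is then Hermitian with eigenvalues $1 - 2\lambda_i/(\lambda_n + \lambda_1)$, all of absolute value at most $(\lambda_n - \lambda_1)/(\lambda_n + \lambda_1)$, inequality (2.10) implies that

$$\|r_{k+1}\| \le \frac{\kappa - 1}{\kappa + 1}\,\|r_k\|, \qquad (2.11)$$

where $\kappa = \lambda_n/\lambda_1$ is the condition number of $A$. (For the Hermitian positive definite case, expression (2.9) gives the significantly weaker bound $\|r_{k+1}\| \le (1 - 1/\kappa^2)^{1/2}\,\|r_k\|$.)

The same argument applied to the steepest descent method for Hermitian positive definite problems shows that for that algorithm,

$$\|e_{k+1}\|_A \le \frac{\kappa - 1}{\kappa + 1}\,\|e_k\|_A. \qquad (2.12)$$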
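The bounds (2.11) and (2.12) can be checked the same way. This sketch assumes the hypothetical diagonal matrix $A = \mathrm{diag}(1, 2, 5, 10)$, for which $\kappa = 10$ and $(\kappa - 1)/(\kappa + 1) = 9/11$. It records the largest one-step reduction factor of the residual 2-norm for Orthomin(1) and of the error $A$-norm for steepest descent.

```python
import math

lams = [1.0, 2.0, 5.0, 10.0]             # A = diag(lams), Hermitian positive definite
kappa = max(lams) / min(lams)
factor = (kappa - 1.0) / (kappa + 1.0)   # factor in (2.11) and (2.12)

def A_times(v):
    return [lam * vi for lam, vi in zip(lams, v)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

# Orthomin(1): ratios ||r_{k+1}|| / ||r_k||, checked against (2.11).
r = [1.0, 1.0, 1.0, 1.0]
om_worst = 0.0
for _ in range(25):
    Ar = A_times(r)
    a = dot(r, Ar) / dot(Ar, Ar)
    r_new = [ri - a * ari for ri, ari in zip(r, Ar)]
    om_worst = max(om_worst, math.sqrt(dot(r_new, r_new) / dot(r, r)))
    r = r_new

# Steepest descent, tracked through the error: r_k = A e_k, e_{k+1} = e_k - a_k r_k.
e = [1.0, 1.0, 1.0, 1.0]
sd_worst = 0.0
for _ in range(25):
    r = A_times(e)
    a = dot(r, r) / dot(r, A_times(r))
    e_new = [ei - a * ri for ei, ri in zip(e, r)]
    ratio = math.sqrt(dot(e_new, A_times(e_new)) / dot(e, A_times(e)))
    sd_worst = max(sd_worst, ratio)
    e = e_new

print(om_worst, sd_worst, factor)
```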

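In the more general non-Hermitian case, suppose the field of values of $A$ is contained in a disk $D = \{z \in \mathbb{C} : |z - c| \le s\}$ which does not contain the origin. Consider the choice $a = 1/c$ in (2.10). It follows from (1.16)-(1.17) that the field of values of $I - aA$ is contained in the disk $\{z \in \mathbb{C} : |z| \le s/|c|\}$, that is, its numerical radius satisfies

$$w(I - aA) \le \frac{s}{|c|}.$$

Using relation (1.21) between the numerical radius and the norm of a matrix, we conclude that for this choice of $a$

$$\|I - aA\| \le 2\,\frac{s}{|c|}$$

and hence

$$\|r_{k+1}\| \le 2\,\frac{s}{|c|}\,\|r_k\|. \qquad (2.13)$$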
This estimate may be stronger or weaker than that in (2.9), depending on the exact size and shape of the field of values. For example, if $\mathcal{F}(A)$ is contained in a disk of radius $s = (\|A\| - d)/2$ centered at $c = (\|A\| + d)/2$, then (2.13) implies

$$\|r_{k+1}\| \le 2\,\frac{\|A\| - d}{\|A\| + d}\,\|r_k\|.$$

This is smaller than the bound in (2.9) if $d/\|A\|$ is greater than about .37; otherwise, (2.9) is smaller.
Stronger results have recently been proved by Eiermann [38, 39]. Suppose $\mathcal{F}(A) \subset \Omega$, where $\Omega$ is a compact convex set with $0 \notin \Omega$. Eiermann has shown that if $\phi_m(z) = F_m(z)/F_m(0)$, where $F_m(z)$ is the $m$th-degree Faber polynomial for the set $\Omega$ (the analytic part of $(\Phi(z))^m$, where $\Phi(z)$ maps the exterior of $\Omega$ to the exterior of the unit disk), then

$$\|\phi_m(A)\| \le c_m \min_{p_m} \max_{z \in \Omega} |p_m(z)|, \qquad (2.14)$$

where the minimum is over all $m$th-degree polynomials with value one at the origin and the constant $c_m$ depends on $\Omega$ but is independent of $A$. For $m = 1$, as in (2.10), inequality (2.14) is of limited use because the constant $c_m$ may be larger than the inverse of the norm of the first-degree minimax polynomial on $\Omega$. We will see later, however, that for methods such as (restarted) GMRES involving higher-degree polynomials, estimate (2.14) can sometimes lead to very good bounds on the norm of the residual.
Orthomin(1) can be generalized by using different inner products in (2.6). That is, if $B$ is a Hermitian positive definite matrix, then, instead of minimizing the 2-norm of the residual, one could minimize the $B$-norm of the residual by taking

$$a_k = \frac{(r_k, BAr_k)}{(Ar_k, BAr_k)}.$$

Writing $a_k$ in the equivalent form

$$a_k = \frac{\left(B^{1/2}r_k,\ (B^{1/2}AB^{-1/2})B^{1/2}r_k\right)}{\left((B^{1/2}AB^{-1/2})B^{1/2}r_k,\ (B^{1/2}AB^{-1/2})B^{1/2}r_k\right)},$$

it is clear that for this variant the $B$-norm of the residual decreases strictly monotonically for all $r_0$ if and only if $0 \notin \mathcal{F}((B^{1/2}AB^{-1/2})^H)$. Using arguments similar to those used to establish Theorem 2.2.2, it can be seen that

$$\|r_{k+1}\|_B \le \left(1 - \frac{d_B^2}{\|B^{1/2}AB^{-1/2}\|^2}\right)^{1/2} \|r_k\|_B,$$

where $d_B$ is the distance from the origin to the field of values of $(B^{1/2}AB^{-1/2})^H$. If $0 \in \mathcal{F}(A^H)$, it may still be possible to find a Hermitian positive definite matrix $B$ such that $0 \notin \mathcal{F}((B^{1/2}AB^{-1/2})^H)$.

2.3. Orthomin(2) and CG.


The Orthomin(1) and steepest descent iterations of the previous section can be written in the form

$$x_{k+1} = x_k + a_kp_k,$$

where the direction vector $p_k$ is equal to the residual $r_k$. The new residual and error vectors then satisfy

$$r_{k+1} = r_k - a_kAp_k, \qquad e_{k+1} = e_k - a_kp_k,$$

where the coefficient $a_k$ is chosen so that either $r_{k+1}$ is orthogonal to $Ap_k$ (Orthomin(1)) or, in the case of Hermitian positive definite $A$, so that $e_{k+1}$ is $A$-orthogonal to $p_k$ (steepest descent). Note, however, that $r_{k+1}$ is not, in general, orthogonal to the previous vector $Ap_{k-1}$ in the Orthomin(1) method, and $e_{k+1}$ is not, in general, $A$-orthogonal to the previous vector $p_{k-1}$ in the steepest descent method.

If, in the Orthomin iteration, instead of subtracting off the projection of $r_k$ in the direction $Ar_k$ we subtracted off its projection in a direction like $Ar_k$ but orthogonal to $Ap_{k-1}$, i.e., in the direction $Ap_k$, where

$$p_k = r_k - b_{k-1}p_{k-1}, \qquad b_{k-1} = \frac{(Ar_k, Ap_{k-1})}{(Ap_{k-1}, Ap_{k-1})},$$

then we would have

$$(r_{k+1}, Ap_k) = (r_{k+1}, Ap_{k-1}) = 0.$$

Now the residual norm is minimized in the plane spanned by $Ar_k$ and $Ap_{k-1}$, since we can write

$$r_{k+1} = r_k - a_kAr_k + a_kb_{k-1}Ap_{k-1},$$

and the coefficients force orthogonality between $r_{k+1}$ and $\mathrm{span}\{Ar_k, Ap_{k-1}\}$.
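The new algorithm, known as Orthomin(2), is the following.

Given an initial guess $x_0$, compute $r_0 = b - Ax_0$ and set $p_0 = r_0$.
For $k = 1, 2, \ldots$,
Compute $Ap_{k-1}$.
Set $x_k = x_{k-1} + a_{k-1}p_{k-1}$, $r_k = r_{k-1} - a_{k-1}Ap_{k-1}$, where $a_{k-1} = \dfrac{(r_{k-1}, Ap_{k-1})}{(Ap_{k-1}, Ap_{k-1})}$.
Compute $Ar_k$.
Set $p_k = r_k - b_{k-1}p_{k-1}$, where $b_{k-1} = \dfrac{(Ar_k, Ap_{k-1})}{(Ap_{k-1}, Ap_{k-1})}$.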
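A minimal real-arithmetic sketch of Orthomin(2): each step computes $Ap_{k-1}$, sets $a_{k-1} = (r_{k-1}, Ap_{k-1})/(Ap_{k-1}, Ap_{k-1})$, updates $x_k$ and $r_k$, computes $Ar_k$, and sets $p_k = r_k - b_{k-1}p_{k-1}$ with $b_{k-1} = (Ar_k, Ap_{k-1})/(Ap_{k-1}, Ap_{k-1})$. The early-exit test and the example are illustrative assumptions. For the symmetric positive definite example, Theorem 2.3.1 below predicts that the residual vanishes, up to rounding, after $n = 3$ steps.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    # Real inner product; complex data would need conjugation.
    return sum(xi * yi for xi, yi in zip(x, y))

def orthomin2(A, b, x0, steps):
    """Orthomin(2) in real arithmetic; returns (x, residual_norms).

    Stops early if A p_{k-1} = 0, which happens either at convergence or at a
    breakdown with p_{k-1} = 0.
    """
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]    # r_0
    p = list(r)                                            # p_0 = r_0
    history = [math.sqrt(dot(r, r))]
    for _ in range(steps):
        Ap = matvec(A, p)                                  # compute A p_{k-1}
        ApAp = dot(Ap, Ap)
        if ApAp == 0.0:
            break
        a = dot(r, Ap) / ApAp                              # a_{k-1}
        x = [xi + a * pj for xi, pj in zip(x, p)]          # x_k
        r = [ri - a * apj for ri, apj in zip(r, Ap)]       # r_k
        history.append(math.sqrt(dot(r, r)))
        Ar = matvec(A, r)                                  # compute A r_k
        b_coef = dot(Ar, Ap) / ApAp                        # b_{k-1}
        p = [ri - b_coef * pj for ri, pj in zip(r, p)]     # p_k = r_k - b_{k-1} p_{k-1}
    return x, history

# Symmetric positive definite example with exact solution (1, 2, 3).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = matvec(A, [1.0, 2.0, 3.0])
x, hist = orthomin2(A, b, [0.0, 0.0, 0.0], 3)
print(hist)
```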

The new algorithm can also fail. If $(r_0, Ar_0) = 0$, then $r_1$ will be equal to $r_0$, and $p_1$ will be 0. An attempt to compute the coefficient $a_1$ will result in dividing 0 by 0. As for Orthomin(1), however, it can be shown that Orthomin(2) cannot fail if $0 \notin \mathcal{F}(A^H)$. If an Orthomin(2) step does succeed, then the 2-norm of the residual is reduced by at least as much as it would be reduced by an Orthomin(1) step from the same point. This is because the residual norm is minimized over a larger space, $r_k + \mathrm{span}\{Ar_k, Ap_{k-1}\}$ instead of $r_k + \mathrm{span}\{Ar_k\}$. It follows that the bound (2.9) holds also for Orthomin(2) when $0 \notin \mathcal{F}(A^H)$. Unfortunately, no stronger a priori bounds on the residual norm are known for Orthomin(2) applied to a general matrix whose field of values does not contain the origin, although, in practice, it may perform significantly better than Orthomin(1).
In the special case when $A$ is Hermitian, if the vectors at steps 1 through $k + 1$ of the Orthomin(2) algorithm are defined, then the norm of $r_{k+1}$ is minimized not just over the two-dimensional affine space $r_k + \mathrm{span}\{Ar_k, Ap_{k-1}\}$ but over the $(k + 1)$-dimensional affine space $r_0 + \mathrm{span}\{Ap_0, \ldots, Ap_k\}$.

THEOREM 2.3.1. Suppose that $A$ is Hermitian, the coefficients $a_0, \ldots, a_{k-1}$ are nonzero, and the vectors $r_1, \ldots, r_{k+1}$ and $p_1, \ldots, p_{k+1}$ in the Orthomin(2) algorithm are defined. Then

$$(r_{k+1}, Ap_j) = (Ap_{k+1}, Ap_j) = 0 \quad \forall j \le k.$$

It follows that of all vectors in the affine space

$$r_0 + \mathrm{span}\{Ar_0, A^2r_0, \ldots, A^{k+1}r_0\}, \qquad (2.16)$$

$r_{k+1}$ has the smallest Euclidean norm. It also follows that if $a_0, \ldots, a_{n-2}$ are nonzero and $r_1, \ldots, r_n$ and $p_1, \ldots, p_n$ are defined, then $r_n = 0$.
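Proof. By construction, we have $(r_1, Ap_0) = (Ap_1, Ap_0) = 0$. Assume that $(r_k, Ap_j) = (Ap_k, Ap_j) = 0$ for all $j \le k - 1$. The coefficients at step $k + 1$ are chosen to force $(r_{k+1}, Ap_k) = (Ap_{k+1}, Ap_k) = 0$. For $j \le k - 1$, we have

$$(r_{k+1}, Ap_j) = (r_k, Ap_j) - a_k(Ap_k, Ap_j) = 0$$

by the induction hypothesis. Also,

$$(Ap_{k+1}, Ap_j) = (Ar_{k+1}, Ap_j) - b_k(Ap_k, Ap_j) = \left(Ar_{k+1},\ a_j^{-1}(r_j - r_{j+1})\right) = \left(r_{k+1},\ a_j^{-1}A(r_j - r_{j+1})\right) = 0,$$

with the next-to-last equality holding because $A = A^H$, and the last because $Ar_i = Ap_i + b_{i-1}Ap_{i-1}$ is orthogonal to $r_{k+1}$ for $i \le k$. It is justified to write $a_j^{-1}$ since, by assumption, $a_j \ne 0$.

It is easily checked by induction that $r_{k+1}$ lies in the space (2.16) and that $\mathrm{span}\{Ap_0, \ldots, Ap_k\} = \mathrm{span}\{Ar_0, \ldots, A^{k+1}r_0\}$. Since $r_{k+1}$ is orthogonal to $\mathrm{span}\{Ap_0, \ldots, Ap_k\}$, it follows that $r_{k+1}$ is the vector in the space (2.16) with minimal Euclidean norm. For $k = n - 1$, this implies that $r_n = 0$.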
The assumption in Theorem 2.3.1 that $r_1, \ldots, r_{k+1}$ and $p_1, \ldots, p_{k+1}$ are defined is actually implied by the other hypotheses. It can be shown that these vectors are defined provided that $a_0, \ldots, a_{k-1}$ are defined and nonzero and $r_k$ is nonzero.
An algorithm that approximates the solution of a Hermitian linear system $Ax = b$ by minimizing the residual over the affine space (2.16) is known as the MINRES algorithm. It should not be implemented in the form given here, however, unless the matrix is positive (or negative) definite because, as noted, this iteration can fail if $0 \in \mathcal{F}(A)$. An appropriate implementation of the MINRES algorithm for Hermitian indefinite linear systems is derived in section 2.5.
In a similar way, the steepest descent method for Hermitian positive definite matrices can be modified so that it eliminates the $A$-projection of the error in a direction that is already $A$-orthogonal to the previous direction vector, i.e., in the direction

$$p_k = r_k - \frac{(r_k, Ap_{k-1})}{(p_{k-1}, Ap_{k-1})}\,p_{k-1}.$$

Then we have

$$(e_{k+1}, Ap_k) = (e_{k+1}, Ap_{k-1}) = 0,$$

and the $A$-norm of the error is minimized over the two-dimensional affine space $e_k + \mathrm{span}\{r_k, p_{k-1}\}$. The algorithm that does this is called the conjugate gradient (CG) method. It is usually implemented with slightly different (but equivalent) coefficient formulas, as shown in Algorithm 2.

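Algorithm 2. Conjugate Gradient Method (CG)
(for Hermitian positive definite problems)

Given an initial guess $x_0$, compute $r_0 = b - Ax_0$ and set $p_0 = r_0$.
For $k = 1, 2, \ldots$,
Compute $Ap_{k-1}$.
Set $x_k = x_{k-1} + a_{k-1}p_{k-1}$, where $a_{k-1} = \dfrac{(r_{k-1}, r_{k-1})}{(p_{k-1}, Ap_{k-1})}$.
Compute $r_k = r_{k-1} - a_{k-1}Ap_{k-1}$.
Set $p_k = r_k + b_{k-1}p_{k-1}$, where $b_{k-1} = \dfrac{(r_k, r_k)}{(r_{k-1}, r_{k-1})}$.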

It is left as an exercise for the reader to prove that these coefficient formulas are equivalent to the more obvious expressions

$$a_{k-1} = \frac{(r_{k-1}, p_{k-1})}{(p_{k-1}, Ap_{k-1})}, \qquad b_{k-1} = -\frac{(r_k, Ap_{k-1})}{(p_{k-1}, Ap_{k-1})}.$$
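A minimal sketch of Algorithm 2 in real arithmetic (so $A$ is assumed symmetric positive definite and $(x, y) = y^Tx$). It uses $a_{k-1} = (r_{k-1}, r_{k-1})/(p_{k-1}, Ap_{k-1})$, $b_{k-1} = (r_k, r_k)/(r_{k-1}, r_{k-1})$, and $p_k = r_k + b_{k-1}p_{k-1}$; the residual tolerance and the default limit of $n$ steps are illustrative choices.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    # Real inner product; Hermitian (complex) problems would need conjugation.
    return sum(xi * yi for xi, yi in zip(x, y))

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    """Algorithm 2 for a real symmetric positive definite A; returns (x, steps)."""
    max_iter = len(b) if max_iter is None else max_iter
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]    # r_0
    p = list(r)                                            # p_0 = r_0
    rr = dot(r, r)
    steps = 0
    while steps < max_iter and math.sqrt(rr) > tol:
        Ap = matvec(A, p)                                  # A p_{k-1}
        a = rr / dot(p, Ap)                                # a_{k-1}
        x = [xi + a * pj for xi, pj in zip(x, p)]          # x_k
        r = [ri - a * apj for ri, apj in zip(r, Ap)]       # r_k
        rr_new = dot(r, r)
        b_coef = rr_new / rr                               # b_{k-1}
        p = [ri + b_coef * pj for ri, pj in zip(r, p)]     # p_k = r_k + b_{k-1} p_{k-1}
        rr = rr_new
        steps += 1
    return x, steps

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = matvec(A, [1.0, 2.0, 3.0])
x, steps = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x, steps)
```

In exact arithmetic at most $n$ steps are needed; in floating point the residual test is what normally stops the loop on larger problems.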

Since the CG algorithm is used only with positive definite matrices, the coefficients are always defined, and it can be shown, analogous to the MINRES method, that the $A$-norm of the error is actually minimized over the much larger affine space $e_0 + \mathrm{span}\{p_0, p_1, \ldots, p_k\}$.
THEOREM 2.3.2. Assume that $A$ is Hermitian and positive definite. The CG algorithm generates the exact solution to the linear system $Ax = b$ in at most $n$ steps. The error, residual, and direction vectors generated before the exact solution is obtained are well defined and satisfy

$$(e_{k+1}, Ap_j) = (p_{k+1}, Ap_j) = (r_{k+1}, r_j) = 0 \quad \forall j \le k.$$

It follows that of all vectors in the affine space

$$e_0 + \mathrm{span}\{Ae_0, A^2e_0, \ldots, A^{k+1}e_0\}, \qquad (2.18)$$

$e_{k+1}$ has the smallest $A$-norm.


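Proof. Since $A$ is positive definite, it is clear that the coefficients in the CG algorithm are well defined unless a residual vector is zero, in which case the exact solution has been found. Assume that $r_0, \ldots, r_k$ are nonzero. By the choice of $a_0$, it is clear that $(r_1, r_0) = (e_1, Ap_0) = 0$, and from the choice of $b_0$ it follows that

$$(p_1, Ap_0) = (r_1, Ap_0) + b_0(p_0, Ap_0) = \left(r_1,\ a_0^{-1}(r_0 - r_1)\right) + a_0^{-1}(r_1, r_1) = 0,$$

where the last equality holds because $(r_1, r_0) = 0$ and $a_0^{-1}$ is real. Assume that

$$(e_k, Ap_j) = (p_k, Ap_j) = (r_k, r_j) = 0 \quad \forall j \le k - 1.$$

Then we also have

$$(r_k, p_j) = (e_k, Ap_j) = 0 \quad \forall j \le k - 1,$$

so that $(e_k, Ap_k) = (r_k, p_k) = (r_k, r_k)$, and, by the choice of $a_k$, it follows that

$$(e_{k+1}, Ap_j) = (e_k, Ap_j) - a_k(p_k, Ap_j) = 0 \quad \forall j \le k,$$

and therefore also

$$(r_{k+1}, r_j) = (e_{k+1}, Ar_j) = \left(e_{k+1},\ Ap_j - b_{j-1}Ap_{j-1}\right) = 0 \quad \forall j \le k.$$

From the choice of $b_k$, we have

$$(p_{k+1}, Ap_k) = (r_{k+1}, Ap_k) + b_k(p_k, Ap_k) = \left(r_{k+1},\ a_k^{-1}(r_k - r_{k+1})\right) + a_k^{-1}(r_{k+1}, r_{k+1}) = 0.$$

For $j < k$ we have

$$(p_{k+1}, Ap_j) = (r_{k+1}, Ap_j) + b_k(p_k, Ap_j) = \left(r_{k+1},\ a_j^{-1}(r_j - r_{j+1})\right) = 0,$$

so, by induction, the desired equalities are established.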


It is easily checked by induction that $e_{k+1}$ lies in the space (2.18) and that $\mathrm{span}\{p_0, \ldots, p_k\} = \mathrm{span}\{Ae_0, \ldots, A^{k+1}e_0\}$. Since $e_{k+1}$ is $A$-orthogonal to $\mathrm{span}\{p_0, \ldots, p_k\}$, it follows that $e_{k+1}$ is the vector in the space (2.18) with minimal $A$-norm, and it also follows that if the exact solution is not obtained before step $n$, then $e_n = 0$.

2.4. Orthodir, MINRES, and GMRES.


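Returning to the case of general matrices $A$, the idea of minimizing over a larger subspace can be extended, at the price of having to save and orthogonalize against additional vectors at each step. To minimize the 2-norm of the residual $r_{k+1}$ over the $j$-dimensional affine space

$$r_k + \mathrm{span}\{Ar_k, Ap_{k-1}, \ldots, Ap_{k-j+1}\},$$

one can set $x_{k+1} = x_k + a_kp_k$ with $a_k = (r_k, Ap_k)/(Ap_k, Ap_k)$, where

$$p_k = r_k - \sum_{i=1}^{j-1} b_{k,i}\,p_{k-i}, \qquad b_{k,i} = \frac{(Ar_k, Ap_{k-i})}{(Ap_{k-i}, Ap_{k-i})}. \qquad (2.19)$$

This defines the Orthomin($j$) procedure. Unfortunately, the algorithm can still fail if $0 \in \mathcal{F}(A^H)$, and again the only proven a priori bound on the residual norm is estimate (2.9), although this bound is often pessimistic.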
It turns out that the possibility of failure can be eliminated by replacing $r_k$ in formula (2.19) with $Ap_{k-1}$. This algorithm, known as Orthodir, generally has worse convergence behavior than Orthomin for $j < n$, however. The bound (2.9) can no longer be established because the space over which the norm of $r_{k+1}$ is minimized may not contain the vector $Ar_k$.
An exception is the case of Hermitian matrices, where it can be shown that for $j = 3$, the Orthodir($j$) algorithm minimizes the 2-norm of the residual $r_k$ over the entire affine space

$$r_0 + \mathrm{span}\{Ar_0, A^2r_0, \ldots, A^kr_0\}. \qquad (2.20)$$

This provides a reasonable implementation of the MINRES algorithm described in section 2.3.

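Given an initial guess $x_0$, compute $r_0 = b - Ax_0$ and set $p_0 = r_0$.
Compute $s_0 = Ap_0$. For $k = 1, 2, \ldots$,
Set $x_k = x_{k-1} + a_{k-1}p_{k-1}$, $r_k = r_{k-1} - a_{k-1}s_{k-1}$, where $a_{k-1} = \dfrac{(r_{k-1}, s_{k-1})}{(s_{k-1}, s_{k-1})}$.
Compute $As_{k-1}$.
Set $p_k = s_{k-1} - b_{k-1}p_{k-1} - c_{k-1}p_{k-2}$, $s_k = As_{k-1} - b_{k-1}s_{k-1} - c_{k-1}s_{k-2}$, where $b_{k-1} = \dfrac{(As_{k-1}, s_{k-1})}{(s_{k-1}, s_{k-1})}$ and $c_{k-1} = \dfrac{(As_{k-1}, s_{k-2})}{(s_{k-2}, s_{k-2})}$. For $k = 1$, the terms involving $p_{k-2}$ and $s_{k-2}$ are omitted.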

A difficulty with this algorithm is that in finite precision arithmetic, the vectors $s_k$, which are supposed to be equal to $Ap_k$, may differ from this if there is much cancellation in the computation of $s_k$. This could be corrected with an occasional extra matrix-vector multiplication to explicitly set $s_k = Ap_k$ at the end of an iteration. Another possible implementation is given in section 2.5.
For general non-Hermitian matrices, if $j = n$, then the Orthodir($n$) algorithm minimizes the 2-norm of the residual at each step $k$ over the affine space in (2.20). It follows that the exact solution is obtained in $n$ or fewer steps (assuming exact arithmetic) but at the cost of storing up to $n$ search directions $p_k$ (as well as auxiliary vectors $s_k = Ap_k$) and orthogonalizing against $k$ direction vectors at each step $k = 1, \ldots, n$. If the full $n$ steps are required, then Orthodir($n$) requires $O(n^2)$ storage and $O(n^3)$ work, just as would be required by a standard dense Gaussian elimination routine. The power of the method lies in the fact that at each step the residual norm is minimized over the space (2.20) so that, hopefully, an acceptably good approximate solution can be obtained in far fewer than $n$ steps.
There is another way to compute the approximation $x_k$ for which the norm of $r_k$ is minimized over the space (2.20). This method requires about half the storage of Orthodir($n$) (no auxiliary vectors) and has better numerical properties. It is the GMRES method.
The GMRES method uses the modified Gram-Schmidt process to construct an orthonormal basis for the Krylov space $\mathrm{span}\{r_0, Ar_0, \ldots, A^kr_0\}$. When the modified Gram-Schmidt process is applied to this space in the form given below, it is called Arnoldi's method.

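Arnoldi Algorithm.

Given $q_1$ with $\|q_1\| = 1$. For $j = 1, 2, \ldots$,
Set $\tilde{q}_{j+1} = Aq_j$.
For $i = 1, \ldots, j$, set $h_{ij} = (\tilde{q}_{j+1}, q_i)$ and $\tilde{q}_{j+1} \leftarrow \tilde{q}_{j+1} - h_{ij}q_i$.
Set $h_{j+1,j} = \|\tilde{q}_{j+1}\|$ and $q_{j+1} = \tilde{q}_{j+1}/h_{j+1,j}$.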
If $Q_k$ is the $n$-by-$k$ matrix with the orthonormal basis vectors $q_1, \ldots, q_k$ as columns, then the Arnoldi iteration can be written in matrix form as

$$AQ_k = Q_kH_k + h_{k+1,k}\,q_{k+1}\xi_k^T = Q_{k+1}H_{k+1,k}. \qquad (2.21)$$

Here $H_k$ is the $k$-by-$k$ upper Hessenberg matrix with $(i, j)$-element equal to $h_{ij}$ for $j = 1, \ldots, k$, $i = 1, \ldots, \min\{j + 1, k\}$, and all other elements zero. The vector $\xi_k$ is the $k$th unit $k$-vector, $(0, \ldots, 0, 1)^T$. The $(k+1)$-by-$k$ matrix $H_{k+1,k}$ is the matrix whose top $k$-by-$k$ block is $H_k$ and whose last row is zero except for the $(k+1, k)$-element, which is $h_{k+1,k}$. Pictorially, the matrix equation (2.21) looks like

$$\underbrace{A}_{n \times n}\ \underbrace{Q_k}_{n \times k} = \underbrace{Q_{k+1}}_{n \times (k+1)}\ \underbrace{H_{k+1,k}}_{(k+1) \times k}.$$
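A minimal Python sketch of the Arnoldi algorithm in real arithmetic, followed by a column-by-column check of (2.21), $Aq_j = \sum_{i \le j+1} h_{ij}q_i$. Indices in the code are 0-based, so `Q[j]` holds $q_{j+1}$; the function name, the early exit, and the example matrix are illustrative assumptions.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    # Real inner product; complex data would need conjugation.
    return sum(xi * yi for xi, yi in zip(x, y))

def arnoldi(A, q1, k):
    """k steps of the Arnoldi algorithm (modified Gram-Schmidt), real arithmetic.

    Returns Q (list of up to k+1 orthonormal vectors q_1, q_2, ...) and H, a
    (k+1)-by-k list of lists holding the h_ij.  Stops early if h_{j+1,j} = 0.
    """
    nrm = math.sqrt(dot(q1, q1))
    Q = [[v / nrm for v in q1]]
    H = [[0.0] * k for _ in range(k + 1)]
    for j in range(k):
        w = matvec(A, Q[j])                       # tilde q_{j+1} = A q_j
        for i in range(j + 1):                    # orthogonalize against q_1, ..., q_j
            H[i][j] = dot(w, Q[i])
            w = [wv - H[i][j] * qv for wv, qv in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] == 0.0:                    # the Krylov space is invariant
            break
        Q.append([wv / H[j + 1][j] for wv in w])
    return Q, H

A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0], 2)
for j in range(2):
    lhs = matvec(A, Q[j])
    rhs = [sum(H[i][j] * Q[i][m] for i in range(j + 2)) for m in range(3)]
    print(max(abs(u - v) for u, v in zip(lhs, rhs)))
```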

In the GMRES method, the approximate solution $x_k$ is taken to be of the form $x_k = x_0 + Q_ky_k$ for some vector $y_k$; that is, $x_k$ is $x_0$ plus some linear combination of the orthonormal basis vectors for the Krylov space. To obtain the approximation for which $r_k = r_0 - AQ_ky_k$ has a minimal 2-norm, the vector $y_k$ must solve the least squares problem

$$\min_y \|r_0 - AQ_ky\| = \min_y \|r_0 - Q_{k+1}H_{k+1,k}y\| = \min_y \|\beta\xi_1 - H_{k+1,k}y\|,$$

where $\beta = \|r_0\|$, $\xi_1$ is the first unit $(k+1)$-vector $(1, 0, \ldots, 0)^T$, and the second equality is obtained by using the fact that $Q_{k+1}$ has orthonormal columns and that $Q_{k+1}\xi_1$, the first orthonormal basis vector, is just $r_0/\beta$.
The basic steps of the GMRES algorithm are as follows.

Given $x_0$, compute $r_0 = b - Ax_0$ and set $q_1 = r_0/\|r_0\|$. For $k = 1, 2, \ldots$,
Compute $q_{k+1}$ and $h_{i,k}$, $i = 1, \ldots, k + 1$, using the Arnoldi algorithm.
Form $x_k = x_0 + Q_ky_k$, where $y_k$ is the solution to the least squares problem $\min_y \|\beta\xi_1 - H_{k+1,k}y\|$.

A standard method for solving the least squares problem $\min_y \|\beta\xi_1 - H_{k+1,k}y\|$ is to factor the $(k+1)$-by-$k$ matrix $H_{k+1,k}$ into the product of a $(k+1)$-by-$(k+1)$ unitary matrix $F^H$ and a $(k+1)$-by-$k$ upper triangular matrix $R$ (that is, the top $k$-by-$k$ block is upper triangular and the last row is 0). This factorization, known as the QR factorization, can be accomplished using plane rotations. The solution $y_k$ is then obtained by solving the upper triangular system

$$R_{k\times k}\,y_k = \beta\,(F\xi_1)_{k\times 1}, \qquad (2.22)$$

where $R_{k\times k}$ is the top $k$-by-$k$ block of $R$ and $(F\xi_1)_{k\times 1}$ consists of the top $k$ entries of the first column of $F$.

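Given the QR factorization of $H_{k+1,k}$, we would like to be able to compute the QR factorization of the next matrix $H_{k+2,k+1}$ with as little work as possible. To see how this can be done, let $F_i$ denote the rotation matrix that rotates the unit vectors $\xi_i$ and $\xi_{i+1}$ through the angle $\theta_i$:

$$F_i = \begin{pmatrix} I & & & \\ & c_i & \bar{s}_i & \\ & -s_i & c_i & \\ & & & I \end{pmatrix},$$

where $c_i = \cos(\theta_i)$ and $s_i = \sin(\theta_i)$, and the $2 \times 2$ block occupies rows and columns $i$ and $i+1$. The dimension of the matrix $F_i$, that is, the size of the second identity block, will depend on the context in which it is used. Assume that the rotations $F_i$, $i = 1, \ldots, k$, have previously been applied to $H_{k+1,k}$ so that

$$F_k \cdots F_1 H_{k+1,k} = R^{(k)} = \begin{pmatrix} \times & \times & \cdots & \times \\ & \times & \cdots & \times \\ & & \ddots & \vdots \\ & & & \times \\ & & & 0 \end{pmatrix},$$

where the $\times$'s denote nonzeros. In order to obtain $R^{(k+1)}$, the upper triangular factor for $H_{k+2,k+1}$, first premultiply the last column of $H_{k+2,k+1}$ by the previous rotations to obtain

$$F_k \cdots F_1 H_{k+2,k+1} = \begin{pmatrix} \times & \cdots & \times & \times \\ & \ddots & \vdots & \vdots \\ & & \times & \times \\ & & & d \\ & & & h \end{pmatrix},$$

where the $(k+2, k+1)$-entry, $h$, is just $h_{k+2,k+1}$, since this entry is unaffected by the previous rotations. The next rotation, $F_{k+1}$, is chosen to eliminate this entry by setting $c_{k+1} = |d|/\sqrt{|d|^2 + |h|^2}$, $s_{k+1} = c_{k+1}h/d$ if $d \ne 0$, and $c_{k+1} = 0$, $s_{k+1} = 1$ if $d = 0$. Note that the $(k+1)$st diagonal entry of $R^{(k+1)}$ is nonzero since $h$ is nonzero (assuming the exact solution to the linear system has not already been computed), and, if $d \ne 0$, then this diagonal element is $(d/|d|)\sqrt{|d|^2 + |h|^2}$, while if $d = 0$, the diagonal element is $h$.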
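The rotation just described can be computed as follows. This sketch assumes the rotation acts on a pair as $\begin{pmatrix} c & \bar{s} \\ -s & c \end{pmatrix}$, which is what makes $s = ch/d$ annihilate $h$ when $d$ and $h$ are complex. It uses `math.hypot` to avoid overflow in $|d|^2 + |h|^2$ but is not the full BLAS DROTG logic; the function names are illustrative.

```python
import math

def rotation(d, h):
    """Return (c, s) so that [[c, conj(s)], [-s, c]] maps (d, h) to (rho, 0).

    c = |d| / sqrt(|d|^2 + |h|^2) and s = c*h/d if d != 0; c = 0, s = 1 if d = 0.
    Works for real or complex d and h.
    """
    if d == 0:
        return 0.0, 1.0
    c = abs(d) / math.hypot(abs(d), abs(h))
    return c, c * h / d

def apply_rotation(c, s, u, v):
    """Apply [[c, conj(s)], [-s, c]] to the pair (u, v)."""
    return c * u + s.conjugate() * v, -s * u + c * v

for d, h in [(3.0, 4.0), (1 + 2j, 2 - 1j), (0.0, 2.5)]:
    c, s = rotation(d, h)
    top, bottom = apply_rotation(c, s, d, h)
    print(d, h, "->", top, bottom)
```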
The right-hand side in (2.22) is computed by applying each of the rotations $F_1, \ldots, F_k$ to the unit vector $\xi_1$. The absolute value of the last entry of this $(k+1)$-vector, multiplied by $\beta$, is the 2-norm of the residual at step $k$, since

$$\|r_k\| = \|\beta\xi_1 - H_{k+1,k}y_k\| = \|F(\beta\xi_1 - H_{k+1,k}y_k)\| = \|\beta F\xi_1 - Ry_k\|$$

and $\beta F\xi_1 - Ry_k$ is zero except for its bottom entry, which is just the bottom entry of $\beta F\xi_1$.

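The GMRES algorithm can be written in the following form.

Algorithm 3. Generalized Minimal Residual Algorithm (GMRES).

Given $x_0$, compute $r_0 = b - Ax_0$ and set $q_1 = r_0/\|r_0\|$.
Initialize $\xi = (1, 0, \ldots, 0)^T$, $\beta = \|r_0\|$. For $k = 1, 2, \ldots$,
Compute $q_{k+1}$ and $h_{i,k} = H(i, k)$, $i = 1, \ldots, k + 1$, using the Arnoldi algorithm.
Apply $F_1, \ldots, F_{k-1}$ to the last column of $H$; that is,
For $i = 1, \ldots, k - 1$,
$$\begin{pmatrix} H(i,k) \\ H(i+1,k) \end{pmatrix} \leftarrow \begin{pmatrix} c_i & \bar{s}_i \\ -s_i & c_i \end{pmatrix} \begin{pmatrix} H(i,k) \\ H(i+1,k) \end{pmatrix}.$$
Compute the $k$th rotation, $c_k$ and $s_k$, to annihilate the $(k+1, k)$ entry of $H$.¹
Apply the $k$th rotation to $\xi$ and to the last column of $H$:
$$\begin{pmatrix} \xi(k) \\ \xi(k+1) \end{pmatrix} \leftarrow \begin{pmatrix} c_k & \bar{s}_k \\ -s_k & c_k \end{pmatrix} \begin{pmatrix} \xi(k) \\ 0 \end{pmatrix}, \qquad H(k,k) \leftarrow c_kH(k,k) + \bar{s}_kH(k+1,k), \quad H(k+1,k) \leftarrow 0.$$
If the residual norm estimate $\beta|\xi(k+1)|$ is sufficiently small, then
Solve the upper triangular system $H_{k\times k}\,y_k = \beta\,\xi_{k\times 1}$.
Compute $x_k = x_0 + Q_ky_k$.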
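A compact real-arithmetic sketch of Algorithm 3 (full GMRES). It keeps the rotated columns of $H$, applies the previous rotations to each new column, updates the rotated vector $\xi$, and stops when the estimate $\beta|\xi(k+1)|$ is small or after `max_iter` steps (default $n$). Dense list-of-lists storage, the function name, and the tolerance are assumptions; complex data and restarting are not handled.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    # Real inner product; complex data would need conjugation.
    return sum(xi * yi for xi, yi in zip(x, y))

def gmres(A, b, x0, tol=1e-10, max_iter=None):
    """Full GMRES following Algorithm 3 (real case); returns (x_k, beta*|xi(k+1)|)."""
    max_iter = len(b) if max_iter is None else max_iter
    r0 = [bi - axi for bi, axi in zip(b, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    if beta == 0.0 or max_iter == 0:
        return list(x0), beta
    Q = [[v / beta for v in r0]]          # q_1
    cols = []                             # cols[k] = rotated column k+1 of H
    cs, sn = [], []                       # rotations F_1, F_2, ...
    xi = [1.0]                            # rotated first unit vector
    resid = beta
    for k in range(max_iter):
        w = matvec(A, Q[k])               # Arnoldi step (modified Gram-Schmidt)
        col = []
        for i in range(k + 1):
            h = dot(w, Q[i])
            col.append(h)
            w = [wv - h * qv for wv, qv in zip(w, Q[i])]
        h_next = math.sqrt(dot(w, w))
        col.append(h_next)
        for i in range(k):                # apply the previous rotations
            col[i], col[i + 1] = (cs[i] * col[i] + sn[i] * col[i + 1],
                                  -sn[i] * col[i] + cs[i] * col[i + 1])
        d, h = col[k], col[k + 1]         # new rotation annihilates h
        if d == 0.0:
            c, s = 0.0, 1.0
        else:
            c = abs(d) / math.hypot(d, h)
            s = c * h / d
        cs.append(c)
        sn.append(s)
        col[k], col[k + 1] = c * d + s * h, 0.0
        cols.append(col)
        xi.append(-s * xi[k])             # xi(k+1) = -s_k xi(k)
        xi[k] = c * xi[k]                 # xi(k) = c_k xi(k)
        resid = beta * abs(xi[k + 1])
        if resid <= tol or h_next == 0.0 or k == max_iter - 1:
            break
        Q.append([wv / h_next for wv in w])
    m = len(cols)                         # back substitution, R(i, j) = cols[j][i]
    y = [0.0] * m
    for i in range(m - 1, -1, -1):
        acc = beta * xi[i] - sum(cols[j][i] * y[j] for j in range(i + 1, m))
        y[i] = acc / cols[i][i]
    x = list(x0)
    for j in range(m):
        x = [xv + y[j] * qv for xv, qv in zip(x, Q[j])]
    return x, resid

A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
b = matvec(A, [1.0, -1.0, 2.0])
x, est = gmres(A, b, [0.0, 0.0, 0.0])
x2, est2 = gmres(A, b, [0.0, 0.0, 0.0], max_iter=2)
r2 = [bi - axi for bi, axi in zip(b, matvec(A, x2))]
print(x, est)
print(est2, math.sqrt(dot(r2, r2)))   # estimate vs. true residual after 2 steps
```

Stopping after two steps and comparing $\beta|\xi(k+1)|$ with the explicitly computed $\|b - Ax_2\|$ illustrates that the estimate is the actual residual norm, up to rounding.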

The (full) GMRES algorithm described above may be impractical because of increasing storage and work requirements if the number of iterations needed to solve the linear system is large. The GMRES($j$) algorithm is defined by simply restarting GMRES every $j$ steps, using the latest iterate as the initial guess for the next GMRES cycle. In Chapter 3, we discuss the convergence rate of full and restarted GMRES.

2.5. Derivation of MINRES and CG from the Lanczos Algorithm.


When the matrix A is Hermitian, the Arnoldi algorithm of the previous section
can be simplified to a 3-term recurrence known as the Lanczos algorithm.
Slightly different (but mathematically equivalent) coefficient formulas are
normally used in the Hermitian case.

¹The formula is $c_k = |H(k,k)|/\sqrt{|H(k,k)|^2 + |H(k+1,k)|^2}$, $s_k = c_kH(k+1,k)/H(k,k)$, but a more robust implementation should be used. See, for example, BLAS routine DROTG [32].

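Lanczos Algorithm (for Hermitian matrices $A$).

Given $q_1$ with $\|q_1\| = 1$, set $\beta_0 = 0$ and $q_0 = 0$. For $j = 1, 2, \ldots$,
Set $\tilde{q}_{j+1} = Aq_j - \beta_{j-1}q_{j-1}$.
Set $\alpha_j = (\tilde{q}_{j+1}, q_j)$ and $\tilde{q}_{j+1} \leftarrow \tilde{q}_{j+1} - \alpha_jq_j$.
Set $\beta_j = \|\tilde{q}_{j+1}\|$ and $q_{j+1} = \tilde{q}_{j+1}/\beta_j$.

To see that the vectors constructed by this algorithm are the same as those constructed by the Arnoldi algorithm when the matrix $A$ is Hermitian, we must show that they form an orthonormal basis for the Krylov space formed from $A$ and $q_1$. It is clear that the vectors lie in this Krylov space and each vector has norm one because of the choice of the $\beta_j$'s. From the formula for $\alpha_j$, it follows that $(q_{j+1}, q_j) = 0$. Suppose $(q_k, q_i) = 0$ for $i \ne k$ whenever $k, i \le j$. Then

$$(q_{j+1}, q_{j-1}) = \beta_j^{-1}\left[(Aq_j, q_{j-1}) - \beta_{j-1}\right] = \beta_j^{-1}\left[(q_j, Aq_{j-1}) - \beta_{j-1}\right] = \beta_j^{-1}\left[(q_j,\ \beta_{j-1}q_j + \alpha_{j-1}q_{j-1} + \beta_{j-2}q_{j-2}) - \beta_{j-1}\right] = 0.$$

For $i < j - 1$, we have

$$(q_{j+1}, q_i) = \beta_j^{-1}(Aq_j, q_i) = \beta_j^{-1}(q_j, Aq_i) = \beta_j^{-1}(q_j,\ \beta_iq_{i+1} + \alpha_iq_i + \beta_{i-1}q_{i-1}) = 0.$$

Thus the vectors $q_1, \ldots, q_{j+1}$ form an orthonormal basis for the Krylov space $\mathrm{span}\{q_1, Aq_1, \ldots, A^jq_1\}$.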
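The Lanczos algorithm can be written in matrix form as

$$AQ_k = Q_kT_k + \beta_kq_{k+1}\xi_k^T = Q_{k+1}T_{k+1,k},$$

where $Q_k$ is the $n$-by-$k$ matrix whose columns are the orthonormal basis vectors $q_1, \ldots, q_k$, $\xi_k$ is the $k$th unit $k$-vector, and $T_k$ is the $k$-by-$k$ Hermitian tridiagonal matrix of recurrence coefficients:

$$T_k = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ & & \beta_{k-1} & \alpha_k \end{pmatrix}.$$

The $(k+1)$-by-$k$ matrix $T_{k+1,k}$ has $T_k$ as its upper $k$-by-$k$ block and $\beta_k\xi_k^T$ as its last row.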
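A minimal sketch of the Lanczos recurrence for a real symmetric matrix, returning the $\alpha_j$ and $\beta_j$ that fill $T_k$. No reorthogonalization is done, so on larger problems the computed $q_j$ can gradually lose orthogonality in floating point; the function name and the example are illustrative.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def lanczos(A, q1, k):
    """k steps of the Lanczos recurrence for a real symmetric A (no reorthogonalization).

    Returns Q = [q_1, ..., q_{k+1}] (fewer if some beta_j = 0), and the lists
    alphas = [alpha_1, ..., alpha_k] and betas = [beta_1, ..., beta_k].
    """
    nrm = math.sqrt(dot(q1, q1))
    Q = [[v / nrm for v in q1]]
    alphas, betas = [], []
    q_prev, beta_prev = [0.0] * len(q1), 0.0          # q_0 = 0, beta_0 = 0
    for j in range(k):
        w = [av - beta_prev * pv for av, pv in zip(matvec(A, Q[j]), q_prev)]
        alpha = dot(w, Q[j])                           # alpha_j
        w = [wv - alpha * qv for wv, qv in zip(w, Q[j])]
        beta = math.sqrt(dot(w, w))                    # beta_j
        alphas.append(alpha)
        betas.append(beta)
        if beta == 0.0:                                # invariant subspace found
            break
        q_prev, beta_prev = Q[j], beta
        Q.append([wv / beta for wv in w])
    return Q, alphas, betas

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
Q, alphas, betas = lanczos(A, [1.0, 0.0, 0.0, 0.0], 3)
print(alphas, betas)   # [4.0, 3.0, 2.0] [1.0, 1.0, 1.0]
```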
It was shown in section 2.3 that the MINRES and CG algorithms generate the Krylov space approximations $x_k$ for which the 2-norm of the residual and the $A$-norm of the error, respectively, are minimal. That is, if $q_1$ in the Lanczos algorithm is taken to be $r_0/\beta$, $\beta = \|r_0\|$, then the algorithms generate approximate solutions of the form $x_k = x_0 + Q_ky_k$, where $y_k$ is chosen to minimize the appropriate error norm. For the MINRES algorithm, $y_k$ solves the least squares problem

$$\min_y \|\beta\xi_1 - T_{k+1,k}y\|, \qquad (2.25)$$

similar to the GMRES algorithm of the previous section.


For the MINRES algorithm, however, there is no need to save the orthonormal basis vectors generated by the Lanczos algorithm. Hence a different formula is needed to compute the approximate solution $x_k$. Let $R_{k\times k}$ be the upper $k$-by-$k$ block of the triangular factor $R$ in the QR decomposition of $T_{k+1,k} = F^HR$, as described in the previous section. Since $T_{k+1,k}$ is tridiagonal, $R_{k\times k}$ has only three nonzero diagonals. Define $P_k \equiv (p_0, \ldots, p_{k-1}) = Q_kR_{k\times k}^{-1}$. Then $p_0$ is a multiple of $q_1$, and successive columns of $P_k$ can be computed using the fact that $P_kR_{k\times k} = Q_k$:

$$p_{k-1} = \left(q_k - b_{k-2}\,p_{k-2} - b_{k-3}\,p_{k-3}\right)/b_{k-1},$$

where $b_{k-\ell}$ is the $(k - \ell + 1, k)$-entry of $R_{k\times k}$ (terms with a negative index are omitted). Recall from the arguments of the previous section that $b_{k-1}$ is nonzero, provided that the exact solution to the linear system has not been obtained already. The approximate solution $x_k$ can be updated from $x_{k-1}$ since

$$x_k = x_0 + Q_ky_k = x_0 + P_kR_{k\times k}\,y_k = x_0 + \beta P_k(F\xi_1)_{k\times 1} = x_{k-1} + a_{k-1}p_{k-1},$$

where $a_{k-1}$ is the $k$th entry of $\beta(F\xi_1)$. This leads to the following implementation of the MINRES algorithm.

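Algorithm 4. Minimal Residual Algorithm (MINRES)
(for Hermitian Problems).

Given $x_0$, compute $r_0 = b - Ax_0$ and set $q_1 = r_0/\|r_0\|$.
Initialize $\xi = (1, 0, \ldots, 0)^T$, $\beta = \|r_0\|$. For $k = 1, 2, \ldots$,
Compute $q_{k+1}$, $\alpha_k = T(k,k)$, and $\beta_k = T(k+1,k) = T(k,k+1)$ using the Lanczos algorithm.
Apply $F_{k-2}$ and $F_{k-1}$ to the last column of $T$; that is,
$$\begin{pmatrix} T(k-2,k) \\ T(k-1,k) \end{pmatrix} \leftarrow \begin{pmatrix} c_{k-2} & \bar{s}_{k-2} \\ -s_{k-2} & c_{k-2} \end{pmatrix} \begin{pmatrix} 0 \\ T(k-1,k) \end{pmatrix} \ \text{if } k > 2, \qquad \begin{pmatrix} T(k-1,k) \\ T(k,k) \end{pmatrix} \leftarrow \begin{pmatrix} c_{k-1} & \bar{s}_{k-1} \\ -s_{k-1} & c_{k-1} \end{pmatrix} \begin{pmatrix} T(k-1,k) \\ T(k,k) \end{pmatrix} \ \text{if } k > 1.$$
Compute the $k$th rotation, $c_k$ and $s_k$, to annihilate the $(k+1, k)$-entry of $T$.²
Apply the $k$th rotation to $\xi$ and to the last column of $T$:
$$\begin{pmatrix} \xi(k) \\ \xi(k+1) \end{pmatrix} \leftarrow \begin{pmatrix} c_k & \bar{s}_k \\ -s_k & c_k \end{pmatrix} \begin{pmatrix} \xi(k) \\ 0 \end{pmatrix}, \qquad T(k,k) \leftarrow c_kT(k,k) + \bar{s}_kT(k+1,k), \quad T(k+1,k) \leftarrow 0.$$
Compute
$$p_{k-1} = \left[q_k - T(k-1,k)\,p_{k-2} - T(k-2,k)\,p_{k-3}\right]/T(k,k),$$
where terms involving $p_j$ with $j < 0$ are zero.
Set $x_k = x_{k-1} + a_{k-1}p_{k-1}$, where $a_{k-1} = \beta\,\xi(k)$.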
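The following sketch codes Algorithm 4 in real arithmetic (so $A$ is assumed real symmetric and $\bar{s} = s$). Only the two most recent Lanczos vectors, two rotations, and three direction vectors are kept. The function name, the stopping test, and the symmetric indefinite example are illustrative assumptions.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    # Real inner product; this sketch does not handle complex Hermitian A.
    return sum(xi * yi for xi, yi in zip(x, y))

def minres(A, b, x0, tol=1e-10, max_iter=None):
    """Algorithm 4 (MINRES) for real symmetric A; returns (x_k, beta*|xi(k+1)|)."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = list(x0)
    r0 = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    beta = math.sqrt(dot(r0, r0))
    if beta == 0.0:
        return x, 0.0
    q_old, q = [0.0] * n, [v / beta for v in r0]   # q_{k-1}, q_k
    beta_old = 0.0                                  # beta_{k-1}
    c1, s1 = 1.0, 0.0                               # rotation F_{k-1}
    c2, s2 = 1.0, 0.0                               # rotation F_{k-2}
    p1, p2 = [0.0] * n, [0.0] * n                   # p_{k-2}, p_{k-3}
    xi = 1.0                                        # current entry xi(k)
    resid = beta
    for k in range(1, max_iter + 1):
        # Lanczos step: alpha_k, beta_k and the unnormalized q_{k+1}.
        w = [av - beta_old * qo for av, qo in zip(matvec(A, q), q_old)]
        alpha = dot(w, q)
        w = [wv - alpha * qv for wv, qv in zip(w, q)]
        beta_k = math.sqrt(dot(w, w))
        # Last column of T: rows k-2, k-1, k hold (t2, t1, t0); row k+1 holds beta_k.
        t2, t1, t0 = 0.0, beta_old, alpha
        if k > 2:                                   # apply F_{k-2}
            t2, t1 = s2 * t1, c2 * t1
        if k > 1:                                   # apply F_{k-1}
            t1, t0 = c1 * t1 + s1 * t0, -s1 * t1 + c1 * t0
        if t0 == 0.0:                               # rotation F_k annihilating beta_k
            c, s = 0.0, 1.0
        else:
            c = abs(t0) / math.hypot(t0, beta_k)
            s = c * beta_k / t0
        t0 = c * t0 + s * beta_k
        xi_k, xi = c * xi, -s * xi                  # new xi(k) and xi(k+1)
        # p_{k-1} = (q_k - t1 p_{k-2} - t2 p_{k-3}) / t0 and x_k = x_{k-1} + a_{k-1} p_{k-1}.
        p = [(qv - t1 * pa - t2 * pb) / t0 for qv, pa, pb in zip(q, p1, p2)]
        x = [xv + beta * xi_k * pv for xv, pv in zip(x, p)]
        resid = beta * abs(xi)
        if resid <= tol or beta_k == 0.0:
            break
        p2, p1 = p1, p                              # shift stored quantities
        c2, s2, c1, s1 = c1, s1, c, s
        q_old, q = q, [wv / beta_k for wv in w]
        beta_old = beta_k
    return x, resid

A = [[2.0, 1.0, 0.0], [1.0, -1.0, 1.0], [0.0, 1.0, 3.0]]   # symmetric indefinite
b = matvec(A, [1.0, 2.0, -1.0])
x, est = minres(A, b, [0.0, 0.0, 0.0])
x2, est2 = minres(A, b, [0.0, 0.0, 0.0], tol=0.0, max_iter=2)
r2 = [bi - axi for bi, axi in zip(b, matvec(A, x2))]
print(x, est)
print(est2, math.sqrt(dot(r2, r2)))
```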

For the CG method, yk is chosen to make the residual rk orthogonal to the


columns of Qk- For positive definite matrices A, this minimizes the A-norm of
the error since ek = eo — QfcJ/fc has minimal A-norm when it is A-orthogonal
to the columns of Q^, i.e., when Q^Aek = Q^rk = 0. Note that the criterion
that rf. be orthogonal to the columns of Qk can be enforced for Hermitian
indefinite problems as well, although it does not correspond to minimizing any
obvious error norm. The vector yk for the CG algorithm satisfies

that is, yk is the solution to the k-by-k linear system T^y — (3£i- While the
least squares problem (2.25) always has a solution, the linear system (2.26) has
a unique solution if and only if Tk is nonsingular. When A is positive definite,
it follows from the minimax characterization of eigenvalues of a Hermitian
2
The formula is ck = \T(k,k)\/^\T(k,k)\* + \T(k + l,fc)| 2 , sk = ckT(k + l,k)/T(k,k), but a
more robust implementation should be used. See, for example, BLAS routine DROTG [32].
Some Iteration Methods 45

matrix that the tridiagonal matrix TI, = Q^AQk is also positive definite, since
its eigenvalues lie between the smallest and largest eigenvalues of A.
If $T_k$ is positive definite, and sometimes even if it is not, then one way to
solve (2.26) is to factor $T_k$ in the form

$$ T_k = L_k D_k L_k^H, \qquad (2.27) $$

where $L_k$ is a unit lower bidiagonal matrix and $D_k$ is a diagonal matrix. One
would like to be able to compute not only $y_k$ but the approximate solution
$x_k = x_0 + Q_k y_k$ without saving all of the basis vectors $q_1, \dots, q_k$. The
factorization (2.27) can be updated easily from one step to the next, since
$L_k$ and $D_k$ are the $k$-by-$k$ principal submatrices of $L_{k+1}$ and $D_{k+1}$. If we
define $P_k = (p_0, \dots, p_{k-1}) = Q_k L_k^{-H}$, then the columns of $P_k$ are $A$-orthogonal
since

$$ P_k^H A P_k = L_k^{-1} Q_k^H A Q_k L_k^{-H} = L_k^{-1} T_k L_k^{-H} = D_k, $$

and since $P_k$ satisfies $P_k L_k^H = Q_k$, the columns of $P_k$ can be computed in
succession via the recurrence

$$ p_{k-1} = q_k - \bar{b}_{k-1}\, p_{k-2}, $$

where $b_{k-1}$ is the $(k, k-1)$-entry of $L_k$. It is not difficult to see that the
columns of $P_k$ are, to within constant factors, the direction vectors from the
CG algorithm of section 2.3. The Lanczos vectors are normalized versions of the
CG residuals, with opposite signs at every other step. With this factorization,
then, $x_k$ is given by

$$ x_k = x_0 + Q_k T_k^{-1} \beta \xi_1 = x_0 + P_k D_k^{-1} L_k^{-1} \beta \xi_1, $$

and since $x_{k-1}$ satisfies

$$ x_{k-1} = x_0 + P_{k-1} D_{k-1}^{-1} L_{k-1}^{-1} \beta \xi_1, $$

it can be seen that $x_k$ satisfies

$$ x_k = x_{k-1} + a_{k-1}\, p_{k-1}, $$

where $a_{k-1} = d_k^{-1} \beta (L_k^{-1})_{k,1}$ and $d_k$ is the $(k,k)$-entry of $D_k$. The coefficient
$a_{k-1}$ is defined, provided that $L_k$ is invertible and $d_k \neq 0$.
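The derivation above can be checked numerically. The sketch below, in Python with NumPy, runs $k$ steps of Lanczos on a small symmetric positive definite matrix. It factors $T_k = L_k D_k L_k^T$ on the fly, builds the direction vectors from $P_k L_k^T = Q_k$, and accumulates $x_k = x_{k-1} + a_{k-1} p_{k-1}$. It then compares the result with $k$ steps of ordinary CG. The helper names, the diagonal test matrix, and the use of real rather than complex Hermitian arithmetic are illustrative assumptions.

```python
import numpy as np


def lanczos(A, r0, k):
    """k steps of the Lanczos recurrence for real symmetric A, started from
    q_1 = r0 / ||r0|| (no reorthogonalization). Returns Q_k and T_k = Q_k^T A Q_k."""
    n = r0.size
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q_old, q = np.zeros(n), r0 / np.linalg.norm(r0)
    for j in range(k):
        Q[:, j] = q
        v = A @ q - (beta[j - 1] * q_old if j > 0 else 0.0)
        alpha[j] = q @ v
        v = v - alpha[j] * q
        beta[j] = np.linalg.norm(v)
        q_old, q = q, v / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T


def cg_from_lanczos(A, b, k):
    """x_k = x_0 + P_k D_k^{-1} L_k^{-1} beta xi_1 with x_0 = 0 and T_k = L_k D_k L_k^T,
    accumulated one direction vector at a time as x_{j+1} = x_j + a_j p_j."""
    Q, T = lanczos(A, b, k)          # r_0 = b because x_0 = 0
    n = b.size
    P = np.zeros((n, k))             # direction vectors p_0, ..., p_{k-1}
    d = np.zeros(k)                  # diagonal of D_k
    x = np.zeros(n)
    z = np.linalg.norm(b)            # z_j = beta * (L_k^{-1})_{j+1,1}
    for j in range(k):
        if j == 0:
            d[0] = T[0, 0]
            P[:, 0] = Q[:, 0]
        else:
            lj = T[j, j - 1] / d[j - 1]            # subdiagonal entry of L_k
            d[j] = T[j, j] - lj * T[j, j - 1]
            P[:, j] = Q[:, j] - lj * P[:, j - 1]   # from P_k L_k^T = Q_k
            z = -lj * z                            # forward substitution with L_k
        x += (z / d[j]) * P[:, j]                  # a_j = z_j / d_j
    return x, P, d


def cg(A, b, k):
    """k steps of standard CG from x_0 = 0, for comparison."""
    x, r = np.zeros_like(b), b.copy()
    p, rr = r.copy(), r @ r
    for _ in range(k):
        Ap = A @ p
        a = rr / (p @ Ap)
        x, r = x + a * p, r - a * Ap
        rr_new = r @ r
        p, rr = r + (rr_new / rr) * p, rr_new
    return x


rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 10.0, 30))     # simple SPD test matrix (assumption)
b = rng.standard_normal(30)
x_lan, P, d = cg_from_lanczos(A, b, 6)
print(np.linalg.norm(x_lan - cg(A, b, 6)))  # the two iterates agree up to rounding
```

Because $P_k^T A P_k = D_k$, the matrix `P.T @ A @ P` should be diagonal up to rounding for small $k$. For larger $k$ the Lanczos vectors lose orthogonality (see Chapter 4) and the agreement degrades.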
With this interpretation it can be seen that if the CG algorithm of section
2.3 were applied to a Hermitian indefinite matrix, then it would fail at step k
if and only if the $LDL^H$ factorization of $T_k$ does not exist. If this factorization
exists for $T_1, \dots, T_{k-1}$, then it can fail to exist at step $k$ only if $T_k$ is singular.
For indefinite problems, it is possible that $T_k$ will be singular, but subsequent
tridiagonal matrices, e.g., $T_{k+1}$, will be nonsingular. The CG algorithm of
section 2.3 cannot recover from a singular intermediate matrix $T_k$. To overcome
this difficulty, Paige and Saunders [111] proposed a CG-like algorithm based
on the LQ-factorization of the tridiagonal matrices. This algorithm is known as
SYMMLQ. In the SYMMLQ algorithm, the 2-norm of the error is minimized,
but over a different Krylov subspace. The CG iterates can be derived stably
from the SYMMLQ algorithm, but for Hermitian indefinite problems they do
not minimize any standard error or residual norm. We will show in Chapter
5, however, that the residuals in the CG and MINRES methods are closely
related.
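To make the breakdown concrete, here is a hypothetical 3-by-3 example in Python/NumPy. The symmetric indefinite tridiagonal matrix below has positive off-diagonals, so Lanczos applied to it with starting vector $\xi_1$ simply reproduces it as $T_3$. Its leading 2-by-2 block is singular, so the second pivot of the $LDL^T$ factorization is zero. CG (started from $x_0 = 0$ with $b = \xi_1$) therefore divides by $p^T A p = 0$ at step 2, even though $T_3$ itself is nonsingular. The function names are our own.

```python
import numpy as np


def ldl_pivots(T):
    """Pivots d_1, d_2, ... of T = L D L^T for a symmetric tridiagonal T.
    Stops after the first zero pivot, since the next multiplier would divide by it."""
    d = [T[0, 0]]
    for j in range(1, T.shape[0]):
        if d[-1] == 0.0:
            break
        lj = T[j, j - 1] / d[-1]
        d.append(T[j, j] - lj * T[j, j - 1])
    return d


def cg_breakdown_step(A, b, kmax):
    """Run CG from x_0 = 0 and return the first step k at which p^T A p = 0
    (so the step length is undefined), or None if no breakdown occurs."""
    r = b.copy()
    p, rr = r.copy(), r @ r
    for k in range(1, kmax + 1):
        if rr == 0.0:                 # exact solution reached, not a breakdown
            return None
        Ap = A @ p
        pAp = p @ Ap
        if pAp == 0.0:
            return k
        a = rr / pAp
        r = r - a * Ap
        rr_new = r @ r
        p, rr = r + (rr_new / rr) * p, rr_new
    return None


T3 = np.array([[1.0, 1.0, 0.0],
               [1.0, 1.0, 1.0],
               [0.0, 1.0, 1.0]])
e1 = np.array([1.0, 0.0, 0.0])
print(ldl_pivots(T3))                 # second pivot is 0: T_2 is singular
print(cg_breakdown_step(T3, e1, 3))   # CG breaks down at step 2
print(np.linalg.det(T3))              # yet T_3 itself is nonsingular
```

A method whose factorization does not require nonzero pivots, such as the LQ-factorization used in SYMMLQ or the least squares formulation of MINRES, does not stop at this point.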

Comments and Additional References.


For a discussion of simple iteration methods (e.g., Jacobi, Gauss-Seidel, SOR),
the classic books of Varga [135] and Young [144] are still highly recommended.
The CG algorithm was originally proposed by Hestenes and Stiefel [79] and
appeared in a related work at the same time by Lanczos [90]. The Orthomin
method described in this chapter was first introduced by Vinsome [137], and
the Orthodir algorithm was developed by Young and Jea [145]. Subsequently,
Saad and Schultz invented the GMRES algorithm [119]. Theorem 2.2.2 was
proved in the special case when the Hermitian part of the matrix is positive
definite by Eisenstat, Elman, and Schultz [40].
The MINRES implementation described in section 2.5 was given by Paige
and Saunders [111], as was the first identification of the CG algorithm with
the Lanczos process followed by $LDL^H$-factorization of the tridiagonal matrix.
The idea of minimizing the 2-norm of the residual for Hermitian indefinite
problems was contained in the original Hestenes and Stiefel paper, implemented
in the form we have referred to here as Orthomin(2). This is sometimes
called the conjugate residual method. For an implementation that uses the
Orthomin(2) algorithm when it is safe to do so and switches to Orthodir(3) in
other circumstances, see Chandra [25].
A useful bibliography of work on the CG and Lanczos algorithms between
1948 and 1976 is contained in [58].

Exercises.
2.1. Use the Jordan form discussed in section 1.3.2 to describe the asymptotic
convergence of simple iteration for a nondiagonalizable matrix.

2.2. Show that if $A$ and $b$ are real and $0 \in \mathcal{F}(A^H)$, then there is a real initial
vector for which the Orthomin iteration fails.

2.3. Give an example of a problem for which Orthomin(1) converges but
simple iteration (with no preconditioner) does not. Give an example
of a problem for which simple iteration converges but Orthomin(j) does
not for any j < n.

2.4. Verify that the coefficient formulas in (2.17) are equivalent to those in
Algorithm 2.

2.5. Show that for Hermitian matrices, the MINRES algorithm given in
section 2.4 minimizes $\|r_{k+1}\|$ over the space in (2.20).
2.6. Express the entries of the tridiagonal matrix generated by the Lanczos
algorithm in terms of the coefficients in the CG algorithm (Algorithm
2). (Hint: Write down the 3-term recurrence for $q_{k+1} = (-1)^k r_k / \|r_k\|$ in
terms of $q_k$ and $q_{k-1}$.)
2.7. Derive a necessary and sufficient condition for the convergence of the
restarted GMRES algorithm, GMRES(j), in terms of the generalized
field of values defined in section 1.3.3; that is, show that GMRES(j)
converges to the solution of the linear system for all initial vectors if and
only if the zero vector is not contained in the set

2.8. Show that for a normal matrix whose eigenvalues have real parts greater
than or equal to the imaginary parts in absolute value, the GMRES(2)
iteration converges for all initial vectors. (Hint: Since $r_2 = p_2(A) r_0$
for a certain second-degree polynomial $p_2$ with $p_2(0) = 1$ and since $p_2$
minimizes the 2-norm of the residual over all such polynomials, we have
$\|r_2\| \le \min_{p_2} \|p_2(A)\| \cdot \|r_0\|$. If a second-degree polynomial $p_2$ with value
1 at the origin can be found which satisfies $\|p_2(A)\| < 1$, then this will
show that each GMRES(2) cycle reduces the residual norm by at least a
constant factor. Since $A$ is normal, its eigendecomposition can be written
in the form $A = U \Lambda U^H$, where $U$ is unitary and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$;
it follows that $\|p_2(A)\| = \max_{i=1,\dots,n} |p_2(\lambda_i)|$. Hence, find a polynomial
$p_2(z)$ that is strictly less than 1 in absolute value throughout a region
containing the spectrum of $A$.)


Chapter 3

Error Bounds for CG, MINRES, and GMRES

It was shown in Chapter 2 that the CG, MINRES, and GMRES algorithms
each generate the optimal approximate solution from a Krylov subspace, where
"optimal" is taken to mean having an error with minimal A-norm in the case
of CG or having a residual with minimal 2-norm in the case of MINRES and
GMRES. In this chapter we derive bounds on the appropriate error norm for
the optimal approximation from a Krylov subspace.
A goal is to derive a sharp upper bound on the reduction in the A-norm
of the error for CG or in the 2-norm of the residual for both MINRES and
GMRES—that is, an upper bound that is independent of the initial vector but
that is actually attained for certain initial vectors. This describes the worst-
case behavior of the algorithms (for a given matrix A). It can sometimes
be shown that the "typical" behavior of the algorithms is not much different
from the worst-case behavior. That is, if the initial vector is random, then
convergence may be only moderately faster than for the worst initial vector.
For certain special initial vectors, however, convergence may be much faster
than the worst-case analysis would suggest. Still, it is usually the same analysis
that enables one to identify these "special" initial vectors, and it is often clear
how the bounds must be modified to account for special properties of the initial
vector.
For normal matrices, a sharp upper bound on the appropriate error norm is
known. This is not the case for nonnormal matrices, and a number of possible
approaches to this problem are discussed in section 3.2.

3.1. Hermitian Problems—CG and MINRES.


It was shown in Chapter 2 that the A-norm of the error in the CG algorithm
for Hermitian positive definite problems and the 2-norm of the residual in the
MINRES algorithm for general Hermitian problems are minimized over the
spaces

$$ e_0 + \mathrm{span}\{A e_0, \dots, A^k e_0\} \quad \text{and} \quad r_0 + \mathrm{span}\{A r_0, \dots, A^k r_0\}, $$

respectively. It follows that the CG error vector and the MINRES residual
vector at step $k$ can be written in the form

$$ e_k = P_k^C(A)\, e_0, \qquad r_k = P_k^M(A)\, r_0, \qquad (3.1) $$

where $P_k^C$ and $P_k^M$ are $k$th-degree polynomials with value 1 at the origin and,
of all such polynomials that could be substituted in (3.1), $P_k^C$ gives the error
of minimal $A$-norm in the CG algorithm and $P_k^M$ gives the residual of minimal
2-norm in the MINRES algorithm. In other words, the error $e_k$ in the CG
approximation satisfies

$$ \|e_k\|_A = \min_{p_k} \|p_k(A)\, e_0\|_A \qquad (3.2) $$

and the residual $r_k$ in the MINRES algorithm satisfies

$$ \|r_k\| = \min_{p_k} \|p_k(A)\, r_0\|, \qquad (3.3) $$

where the minimum is taken over all polynomials $p_k$ of degree $k$ or less with
$p_k(0) = 1$.
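Equality (3.2) says that the CG error is the best one obtainable from any polynomial $p_k$ with $p_k(0) = 1$. The Python/NumPy sketch below checks this numerically by comparing the CG error norms with a direct least squares minimization of $\|p_k(A) e_0\|_A$ over such polynomials. It uses the monomial basis $A e_0, \dots, A^k e_0$, which is adequately conditioned only for small $k$. The random SPD test matrix and the helper names are assumptions.

```python
import numpy as np


def cg_errors(A, b, x_true, kmax):
    """A-norms of the CG errors e_0, ..., e_kmax, starting from x_0 = 0."""
    def anorm(v):
        return np.sqrt(v @ A @ v)
    x, r = np.zeros_like(b), b.copy()
    p, rr = r.copy(), r @ r
    errs = [anorm(x_true - x)]
    for _ in range(kmax):
        Ap = A @ p
        a = rr / (p @ Ap)
        x, r = x + a * p, r - a * Ap
        rr_new = r @ r
        p, rr = r + (rr_new / rr) * p, rr_new
        errs.append(anorm(x_true - x))
    return np.array(errs)


def best_poly_error(A, e0, k):
    """min ||p_k(A) e0||_A over p_k of degree <= k with p_k(0) = 1.
    Writes p_k(A) e0 = e0 - sum_j c_j A^j e0 and solves a least squares
    problem in the A^{1/2}-weighted 2-norm (monomial basis: small k only)."""
    lam, U = np.linalg.eigh(A)
    A_half = U @ np.diag(np.sqrt(lam)) @ U.T
    K = np.column_stack([np.linalg.matrix_power(A, j) @ e0 for j in range(1, k + 1)])
    c, *_ = np.linalg.lstsq(A_half @ K, A_half @ e0, rcond=None)
    return np.linalg.norm(A_half @ (e0 - K @ c))


rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((20, 20)))
A = U @ np.diag(np.linspace(1.0, 5.0, 20)) @ U.T    # hypothetical SPD test matrix
A = (A + A.T) / 2
x_true = rng.standard_normal(20)
b = A @ x_true
errs = cg_errors(A, b, x_true, 5)
for k in range(1, 6):
    print(k, errs[k], best_poly_error(A, x_true, k))  # e_0 = x_true since x_0 = 0
```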
In this section we derive bounds on the expressions in the right-hand sides
of (3.2) and (3.3) that are independent of the direction of the initial error $e_0$
or residual r0, although they do depend on the size of these quantities. A
sharp upper bound is derived involving all of the eigenvalues of A, and then
simpler (but nonsharp) bounds are given based on knowledge of just a few of
the eigenvalues of A.
Let an eigendecomposition of $A$ be written as $A = U \Lambda U^H$, where $U$ is a
unitary matrix and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix of eigenvalues.
If $A$ is positive definite, define $A^{1/2}$ to be $U \Lambda^{1/2} U^H$. Then the $A$-norm of a
vector $v$ is just the 2-norm of the vector $A^{1/2} v$. Equalities (3.2) and (3.3) imply
that

$$ \|e_k\|_A = \min_{p_k} \|U p_k(\Lambda) U^H A^{1/2} e_0\| \le \min_{p_k} \|p_k(\Lambda)\| \cdot \|e_0\|_A, \qquad (3.4) $$

$$ \|r_k\| = \min_{p_k} \|U p_k(\Lambda) U^H r_0\| \le \min_{p_k} \|p_k(\Lambda)\| \cdot \|r_0\|, \qquad (3.5) $$

with the inequalities following, because if $p_k$ is the polynomial that minimizes
$\|p_k(\Lambda)\|$, then

$$ \|U p_k(\Lambda) U^H w\| \le \|p_k(\Lambda)\| \cdot \|w\| $$

for any vector $w$. Of course, the polynomial that minimizes the expressions in
the equalities of (3.4) and (3.5) is not necessarily the same one that minimizes
$\|p_k(\Lambda)\|$ in the inequalities. The MINRES and CG polynomials depend on the
initial vector, while this polynomial does not. Hence it is not immediately
obvious that the bounds in (3.4) and (3.5) are sharp, that is, that they can
actually be attained for certain initial vectors. It turns out that this is the case,
however. See, for example, [63, 68, 85]. For each $k$ there is an initial vector
$e_0$ for which the CG polynomial at step $k$ is the polynomial that minimizes
$\|p_k(\Lambda)\|$ and for which equality holds in (3.4). An analogous result holds for
MINRES.
The sharp upper bounds (3.4) and (3.5) can be written in the form

$$ \frac{\|e_k\|_A}{\|e_0\|_A} \le \min_{p_k} \max_{i=1,\dots,n} |p_k(\lambda_i)|, \qquad (3.6) $$

$$ \frac{\|r_k\|}{\|r_0\|} \le \min_{p_k} \max_{i=1,\dots,n} |p_k(\lambda_i)|. \qquad (3.7) $$
The problem of describing the convergence of these algorithms therefore
reduces to one in approximation theory: how well can one approximate zero
on the set of eigenvalues of $A$ using a $k$th-degree polynomial with value 1 at
the origin? While there is no simple expression for the maximum value of
the minimax polynomial on a discrete set of points, this minimax polynomial
can be calculated if the eigenvalues of $A$ are known; more importantly, this
sharp upper bound provides intuition as to what constitutes "good" and "bad"
eigenvalue distributions. Eigenvalues tightly clustered around a single point
(away from the origin) are good, for instance, because the polynomial $(1 - z/c)^k$
is small in absolute value at all points near c. Widely spread eigenvalues,
especially if they lie on both sides of the origin, are bad, because a low-degree
polynomial with value 1 at the origin cannot be small at a large number of
such points.
Since one usually has only limited information about the eigenvalues of
A, it is useful to have error bounds that involve only a few properties of the
eigenvalues. For example, in the CG algorithm for Hermitian positive definite
problems, knowing only the largest and smallest eigenvalues of A, one can
obtain an error bound by considering the minimax polynomial on the interval
from $\lambda_{\min}$ to $\lambda_{\max}$, i.e., the Chebyshev polynomial shifted to the interval and
scaled to have value 1 at the origin.
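THEOREM 3.1.1. Let $e_k$ be the error at step $k$ of the CG algorithm applied
to the Hermitian positive definite linear system $Ax = b$. Then

$$ \frac{\|e_k\|_A}{\|e_0\|_A} \le 2\left[\left(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)^k + \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k\right]^{-1} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k, \qquad (3.8) $$

where $\kappa = \lambda_{\max}/\lambda_{\min}$ is the ratio of the largest to smallest eigenvalue of $A$.

Proof. Consider the $k$th scaled and shifted Chebyshev polynomial on the
interval $[\lambda_{\min}, \lambda_{\max}]$

$$ p_k(z) = T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2z}{\lambda_{\max} - \lambda_{\min}}\right) \Big/ T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right), \qquad (3.9) $$

where $T_k(z)$ is the Chebyshev polynomial of the first kind on the interval $[-1, 1]$
satisfying

$$ T_0(z) = 1, \qquad T_1(z) = z, \qquad T_{j+1}(z) = 2z\, T_j(z) - T_{j-1}(z), \quad j = 1, 2, \dots. $$

In the interval $[-1, 1]$, we have $T_k(z) = \cos(k \cos^{-1} z)$, so $|T_k(z)| \le 1$ and the
absolute value of the numerator in (3.9) is bounded by 1 for $z$ in the interval
$[\lambda_{\min}, \lambda_{\max}]$. It attains this bound at the endpoints of the interval and at $k - 1$
interior points. To determine the size of the denominator in (3.9), note that
for $z > 1$ we have

$$ T_k(z) = \cosh(k \cosh^{-1} z), $$

so if $z$ is of the form $z = \cosh(\ln y) = \frac{1}{2}(y + y^{-1})$, then $T_k(z) = \frac{1}{2}(y^k + y^{-k})$. The
argument in the denominator of (3.9) can be expressed in the form $\frac{1}{2}(y + y^{-1})$
if $y$ satisfies

$$ \frac{1}{2}(y + y^{-1}) = \frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}} = \frac{\kappa + 1}{\kappa - 1}, $$

which is equivalent to the quadratic equation

$$ y^2 - 2\,\frac{\kappa + 1}{\kappa - 1}\, y + 1 = 0. $$

Solving this equation, we find

$$ y = \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1} \qquad \text{or} \qquad y = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}. $$

In either case, the denominator in (3.9) has absolute value equal to

$$ \frac{1}{2}\left[\left(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)^k + \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k\right], $$

and from this the result (3.8) follows.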
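Bound (3.8) is easy to test numerically. The Python/NumPy sketch below runs CG on a hypothetical diagonal matrix with eigenvalues uniformly spaced in $[1, 100]$, so that $\kappa = 100$, and compares $\|e_k\|_A/\|e_0\|_A$ with $2((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1))^k$. A diagonal matrix suffices here because CG is invariant under unitary similarity transformations when the right-hand side is transformed accordingly. The helper names are our own.

```python
import numpy as np


def cg_relative_A_errors(A, b, x_true, kmax):
    """||e_k||_A / ||e_0||_A for k = 0, ..., kmax, CG started from x_0 = 0."""
    def anorm(v):
        return np.sqrt(v @ A @ v)
    x, r = np.zeros_like(b), b.copy()
    p, rr = r.copy(), r @ r
    e0 = anorm(x_true)
    out = [1.0]
    for _ in range(kmax):
        Ap = A @ p
        a = rr / (p @ Ap)
        x, r = x + a * p, r - a * Ap
        rr_new = r @ r
        p, rr = r + (rr_new / rr) * p, rr_new
        out.append(anorm(x_true - x) / e0)
    return np.array(out)


def chebyshev_bound(kappa, k):
    """Right-hand side 2 ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))^k of (3.8)."""
    s = np.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** k


rng = np.random.default_rng(2)
lam = np.linspace(1.0, 100.0, 200)          # hypothetical spectrum, kappa = 100
A = np.diag(lam)
x_true = rng.standard_normal(lam.size)
ratios = cg_relative_A_errors(A, A @ x_true, x_true, 40)
bounds = chebyshev_bound(lam[-1] / lam[0], np.arange(41))
print(np.all(ratios <= bounds))
```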


Knowing only the largest and smallest eigenvalues of a Hermitian positive
definite matrix A, bound (3.8) is the best possible. If the interior eigenvalues
of $A$ lie at the points where the Chebyshev polynomial $p_k$ in (3.9) attains its
maximum absolute value on $[\lambda_{\min}, \lambda_{\max}]$, then for a certain initial error $e_0$, the
CG polynomial will be equal to the Chebyshev polynomial, and the bound in
(3.8) will actually be attained at step k.
If additional information is available about the interior eigenvalues of A,
one can often improve on the estimate (3.8) while maintaining a simpler
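expression than the sharp bound (3.6). Suppose, for example, that $A$ has
one eigenvalue much larger than the others, say, $\lambda_1 \le \cdots \le \lambda_{n-1} \ll \lambda_n$, that
is, $\lambda_n/\lambda_{n-1} \gg 1$. Consider a polynomial $p_k$ that is the product of a linear
factor that is zero at $\lambda_n$ and the $(k-1)$st-degree scaled and shifted Chebyshev
polynomial on the interval $[\lambda_1, \lambda_{n-1}]$:

$$ p_k(z) = \left[ T_{k-1}\!\left(\frac{\lambda_{n-1} + \lambda_1 - 2z}{\lambda_{n-1} - \lambda_1}\right) \Big/ T_{k-1}\!\left(\frac{\lambda_{n-1} + \lambda_1}{\lambda_{n-1} - \lambda_1}\right) \right] \cdot \frac{\lambda_n - z}{\lambda_n}. $$

Since the second factor is zero at $\lambda_n$ and less than one in absolute value at
each of the other eigenvalues, the maximum absolute value of this polynomial
on $\{\lambda_1, \dots, \lambda_n\}$ is less than the maximum absolute value of the first factor on
$\{\lambda_1, \dots, \lambda_{n-1}\}$. Using arguments like those in Theorem 3.1.1, it follows that

$$ \frac{\|e_k\|_A}{\|e_0\|_A} \le 2\left(\frac{\sqrt{\kappa_{n-1}} - 1}{\sqrt{\kappa_{n-1}} + 1}\right)^{k-1}, \qquad \kappa_{n-1} \equiv \frac{\lambda_{n-1}}{\lambda_1}. \qquad (3.10) $$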
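Similarly, if the matrix $A$ has just a few large outlying eigenvalues, say,
$\lambda_1 \le \cdots \le \lambda_{n-\ell} \ll \lambda_{n-\ell+1} \le \cdots \le \lambda_n$ (i.e., $\lambda_{n-\ell+1}/\lambda_{n-\ell} \gg 1$), one can
consider a polynomial $p_k$ that is the product of an $\ell$th-degree factor that is
zero at each of the outliers (and less than one in magnitude at each of the other
eigenvalues) and a scaled and shifted Chebyshev polynomial of degree $k - \ell$ on
the interval $[\lambda_1, \lambda_{n-\ell}]$. Bounding the size of this polynomial gives

$$ \frac{\|e_k\|_A}{\|e_0\|_A} \le 2\left(\frac{\sqrt{\kappa_{n-\ell}} - 1}{\sqrt{\kappa_{n-\ell}} + 1}\right)^{k-\ell}, \qquad \kappa_{n-\ell} \equiv \frac{\lambda_{n-\ell}}{\lambda_1}. \qquad (3.11) $$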
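Analogous results hold for the 2-norm of the residual in the MINRES
algorithm applied to a Hermitian positive definite linear system, and the proofs
are identical. For example, for any $\ell \ge 0$, with $\kappa_{n-\ell} = \lambda_{n-\ell}/\lambda_1$, we have

$$ \frac{\|r_k\|}{\|r_0\|} \le 2\left(\frac{\sqrt{\kappa_{n-\ell}} - 1}{\sqrt{\kappa_{n-\ell}} + 1}\right)^{k-\ell}. \qquad (3.12) $$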
For Hermitian indefinite problems, a different polynomial must be considered.
We derive only a simple estimate in the case when the eigenvalues of
$A$ are contained in two intervals $[a, b] \cup [c, d]$, where $a < b < 0 < c < d$ and
$b - a = d - c$. In this case, the $k$th-degree polynomial with value 1 at the origin
that has minimal maximum deviation from 0 on $[a, b] \cup [c, d]$ is given by

$$ p_k(z) = \frac{T_\ell(q(z))}{T_\ell(q(0))}, \qquad q(z) = 1 + \frac{2(z - b)(z - c)}{ad - bc}, \qquad (3.13) $$

where $\ell = [k/2]$, $[\cdot]$ denotes the integer part, and $T_\ell$ is the $\ell$th Chebyshev
polynomial. Note that the function $q(z)$ maps each of the intervals $[a, b]$
and $[c, d]$ to the interval $[-1, 1]$. It follows that for $z \in [a, b] \cup [c, d]$, the
absolute value of the numerator in (3.13) is bounded by 1. The size of the
denominator is determined in the same way as before: if $q(0) = \frac{1}{2}(y + y^{-1})$,
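then $T_\ell(q(0)) = \frac{1}{2}(y^\ell + y^{-\ell})$. To determine $y$, we must solve the equation

$$ \frac{1}{2}(y + y^{-1}) = q(0) = \frac{ad + bc}{ad - bc}, $$

or the quadratic equation

$$ y^2 - 2\,\frac{ad + bc}{ad - bc}\, y + 1 = 0. $$

This equation has the solutions

$$ y = \frac{\sqrt{|ad|} \pm \sqrt{|bc|}}{\sqrt{|ad|} \mp \sqrt{|bc|}}. $$

It follows that the norm of the $k$th MINRES residual is bounded by

$$ \frac{\|r_k\|}{\|r_0\|} \le 2\left(\frac{\sqrt{|ad|} - \sqrt{|bc|}}{\sqrt{|ad|} + \sqrt{|bc|}}\right)^{[k/2]}. \qquad (3.14) $$

In the special case when $a = -d$ and $b = -c$ (so the two intervals are placed
symmetrically about the origin), the bound in (3.14) becomes

$$ \frac{\|r_k\|}{\|r_0\|} \le 2\left(\frac{d - c}{d + c}\right)^{[k/2]}. \qquad (3.15) $$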
This is the bound one would obtain at step $[k/2]$ for a Hermitian positive
definite matrix with condition number $(d/c)^2$. It is as difficult to approximate
zero on two intervals situated symmetrically about the origin as it is to
approximate zero on a single interval lying on one side of the origin whose
ratio of largest to smallest point is equal to the square of that in the
two-interval problem. Remember, however, that estimate (3.14) implies better
approximation properties for intervals not symmetrically placed about the
origin. For further discussion of approximation problems on two intervals,
see [50, secs. 3.3-3.4].
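Bound (3.14) can be checked in the same way. For Hermitian matrices MINRES and GMRES produce the same iterates in exact arithmetic, so the Python/NumPy sketch below computes minimal residual norms with a simple Arnoldi-based GMRES rather than the short-recurrence MINRES implementation of Chapter 2. That substitution, the helper names, and the test spectrum (25 points in each of $[-4, -1]$ and $[1, 4]$, the symmetric case (3.15) with $(d - c)/(d + c) = 0.6$) are assumptions.

```python
import numpy as np


def min_residual_history(A, b, kmax):
    """||r_k|| / ||r_0|| for the minimal residual iterates from x_0 = 0, k = 0..kmax.
    GMRES-style: Arnoldi with modified Gram-Schmidt plus a small least squares
    problem; for Hermitian A these equal the MINRES residual norms in exact arithmetic."""
    n = b.size
    beta = np.linalg.norm(b)
    Q = np.zeros((n, kmax + 1))
    H = np.zeros((kmax + 1, kmax))
    Q[:, 0] = b / beta
    hist = [1.0]
    for k in range(kmax):
        v = A @ Q[:, k]
        for i in range(k + 1):
            H[i, k] = Q[:, i] @ v
            v = v - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(v)
        rhs = np.zeros(k + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], rhs, rcond=None)
        hist.append(np.linalg.norm(rhs - H[:k + 2, :k + 1] @ y) / beta)
        if H[k + 1, k] == 0.0:        # invariant subspace reached: exact solution
            break
        Q[:, k + 1] = v / H[k + 1, k]
    return np.array(hist)


def two_interval_bound(a, b, c, d, k):
    """Right-hand side of (3.14) for eigenvalues in [a, b] U [c, d], a < b < 0 < c < d."""
    s_ad, s_bc = np.sqrt(abs(a * d)), np.sqrt(abs(b * c))
    return 2.0 * ((s_ad - s_bc) / (s_ad + s_bc)) ** (k // 2)


rng = np.random.default_rng(3)
a, b, c, d = -4.0, -1.0, 1.0, 4.0
lam = np.concatenate([np.linspace(a, b, 25), np.linspace(c, d, 25)])
A = np.diag(lam)
f = rng.standard_normal(lam.size)            # right-hand side
hist = min_residual_history(A, f, 30)
bounds = np.array([two_interval_bound(a, b, c, d, k) for k in range(len(hist))])
print(np.all(hist <= bounds))
```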

3.2. Non-Hermitian Problems—GMRES.


Like MINRES for Hermitian problems, the GMRES algorithm for general
linear systems produces a residual at step k whose 2-norm satisfies (3.3).
To derive a bound on the expression in (3.3) that is independent of the
direction of $r_0$, we could proceed as in the previous section by employing an
eigendecomposition of $A$. To this end, assume that $A$ is diagonalizable and
let $A = V \Lambda V^{-1}$ be an eigendecomposition, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is a
diagonal matrix of eigenvalues and the columns of $V$ are right eigenvectors of
$A$ (normalized in any desired way). Then it follows from (3.3) that

$$ \|r_k\| = \min_{p_k} \|V p_k(\Lambda) V^{-1} r_0\| \le \kappa(V) \min_{p_k} \max_{i=1,\dots,n} |p_k(\lambda_i)| \cdot \|r_0\|, \qquad (3.16) $$

where $\kappa(V) = \|V\| \cdot \|V^{-1}\|$ is the condition number of the eigenvector matrix $V$.
We will assume that the columns of $V$ have been scaled to make this condition
number as small as possible. As in the Hermitian case, the polynomial that
minimizes $\|V p_k(\Lambda) V^{-1} r_0\|$ is not necessarily the one that minimizes $\|p_k(\Lambda)\|$,
and it is not clear whether the bound in (3.16) is sharp. It turns out that if $A$ is
a normal matrix (a diagonalizable matrix with a complete set of orthonormal
eigenvectors), then $\kappa(V) = 1$ and the bound in (3.16) is sharp [68, 85]. In
this case, as in the Hermitian case, the problem of describing the convergence
of GMRES reduces to a problem in approximation theory: how well can
one approximate zero on the set of complex eigenvalues using a $k$th-degree
polynomial with value 1 at the origin? We do not have simple estimates, such
as that obtained in Theorem 3.1.1 based on the ratio of largest to smallest
eigenvalue, but one's intuition about good and bad eigenvalue distributions in
the complex plane still applies. Eigenvalues tightly clustered about a single
point (away from the origin) are good, since the polynomial $(1 - z/c)^k$ is
small at all points close to c in the complex plane. Eigenvalues all around the
origin are bad because (by the maximum principle) it is impossible to have a
polynomial that is 1 at the origin and less than 1 everywhere on some closed
curve around the origin. Similarly, a low-degree polynomial cannot be 1 at the
origin and small in absolute value at many points distributed all around the
origin.
If the matrix A is nonnormal but has a fairly well-conditioned eigenvector
matrix V, then the bound (3.16), while not necessarily sharp, gives a reasonable
estimate of the actual size of the residual. In this case again, it is A's eigenvalue
distribution that essentially determines the behavior of GMRES.
In general, however, the behavior of GMRES cannot be determined from
eigenvalues alone. In fact, it is shown in [72, 69] that any nonincreasing curve
represents a plot of residual norm versus iteration number for the GMRES
method applied to some problem; moreover, that problem can be taken to
have any desired eigenvalues. Thus, for example, eigenvalues tightly clustered
around 1 are not necessarily good for nonnormal matrices, as they are for
normal ones.
A simple way to see this is to consider a matrix A with the following
sparsity pattern:

$$ A = \begin{pmatrix} 0 & * & & & \\ 0 & * & * & & \\ \vdots & \vdots & & \ddots & \\ 0 & * & \cdots & * & * \\ * & * & \cdots & * & * \end{pmatrix}, \qquad (3.17) $$

where the $*$'s represent any values and the other entries are 0. If the initial
residual $r_0$ is a multiple of the first unit vector $\xi_1 = (1, 0, \dots, 0)^T$, then $A r_0$ is
a multiple of $\xi_n$, $A^2 r_0$ is a linear combination of $\xi_n$ and $\xi_{n-1}$, etc. All vectors
$A^k r_0$, $k = 1, \dots, n-1$, are orthogonal to $r_0$, so the optimal approximation from
the space $x_0 + \mathrm{span}\{r_0, A r_0, \dots, A^{k-1} r_0\}$ is simply $x_k = x_0$ for $k = 1, \dots, n-1$;
i.e., GMRES makes no progress until step $n$! Now, the class of matrices of the
form (3.17) includes, for example, all companion matrices:

$$ \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ c_0 & c_1 & \cdots & c_{n-1} \end{pmatrix}. $$

The eigenvalues of this matrix are the roots of the polynomial $z^n - \sum_{j=0}^{n-1} c_j z^j$,
and the coefficients $c_0, \dots, c_{n-1}$ can be chosen to make this matrix have any
desired eigenvalues. If GMRES is applied to (3.17) with a different initial
residual, say, a random $r_0$, then, while some progress will be made before step
n, it is likely that a significant residual component will remain, until that final
step.
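The stagnation example can be reproduced with a small companion matrix. In the Python/NumPy sketch below, the helper names and the particular eigenvalues $1, 1.02, \dots, 1.10$ are hypothetical. `np.poly` supplies the coefficients of $\prod_i (z - \lambda_i)$, and $c_j$ is taken as the negated coefficient of $z^j$. With $r_0 = \xi_1$, the minimal residual norm over each Krylov space is computed directly by least squares.

```python
import numpy as np


def companion(eigs):
    """Companion matrix with ones on the superdiagonal and last row (c_0, ..., c_{n-1}),
    so that its characteristic polynomial is z^n - sum_j c_j z^j."""
    n = len(eigs)
    a = np.poly(eigs)                 # [1, a_1, ..., a_n]: coefficients, highest power first
    C = np.zeros((n, n))
    C[np.arange(n - 1), np.arange(1, n)] = 1.0
    C[-1, :] = -a[:0:-1]              # c_j = -(coefficient of z^j), j = 0, ..., n-1
    return C


def gmres_residual_norms(A, r0):
    """||r_k|| = min_y ||r0 - (A r0, ..., A^k r0) y|| for k = 1, ..., n.
    Uses the monomial Krylov basis, which is acceptable only for small n."""
    cols, v, out = [], r0, []
    for _ in range(A.shape[0]):
        v = A @ v
        cols.append(v)
        K = np.column_stack(cols)
        y, *_ = np.linalg.lstsq(K, r0, rcond=None)
        out.append(np.linalg.norm(r0 - K @ y))
    return np.array(out)


n = 6
eigs = 1.0 + 0.02 * np.arange(n)      # hypothetical eigenvalues tightly clustered near 1
A = companion(eigs)
r0 = np.eye(n)[:, 0]                  # r_0 = xi_1
print(gmres_residual_norms(A, r0))    # stays at ||r_0|| = 1 until the last step
```

Companion matrices define notoriously ill-conditioned eigenvalue problems, so numerically computed eigenvalues of `A` may differ noticeably from the requested ones. The stagnation, however, follows from the sparsity pattern alone.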
Of course, one probably would not use the GMRES algorithm to solve a
linear system with the sparsity pattern of that in (3.17), but the same result
holds for any matrix that is unitarily similar to one of the form (3.17). Note
that (3.17) is simply a permuted lower triangular matrix. Every matrix is
unitarily similar to a lower triangular matrix, but, fortunately, most matrices
are not unitarily similar to one of the form (3.17)!
When the eigenvector matrix V is extremely ill-conditioned, the bound
(3.16) is less useful. It may be greater than 1 for all k < n, but we know
from other arguments that $\|r_k\|/\|r_0\| \le 1$ for all $k$. In such cases, it is not
clear whether GMRES converges poorly or whether the bound (3.16) is simply
a large overestimate of the actual residual norm. Attempts have been made
to delineate those cases in which GMRES actually does converge poorly from
those for which GMRES converges well and the bound (3.16) is just a large
overestimate.
Different bounds on the residual norm can be obtained based on the field
of values of $A$, provided $0 \notin \mathcal{F}(A)$. For example, suppose $\mathcal{F}(A)$ is contained in
a disk $D = \{z \in \mathbb{C} : |z - c| \le s\}$ which does not contain the origin. Consider
the polynomial $p_k(z) = (1 - z/c)^k$. It follows from (1.16)-(1.17) that

$$ \mathcal{F}(I - (1/c)A) \subseteq \{z \in \mathbb{C} : |z| \le s/|c|\}, $$

and hence that $\nu(I - (1/c)A) \le s/|c|$. The power inequality (1.22) implies that
$\nu((I - (1/c)A)^k) \le (s/|c|)^k$ and hence, by (1.21),

$$ \|(I - (1/c)A)^k\| \le 2\,(s/|c|)^k. $$

It follows that the GMRES residual norm satisfies

$$ \frac{\|r_k\|}{\|r_0\|} \le 2\left(\frac{s}{|c|}\right)^k, \qquad (3.18) $$

and this bound holds for the restarted GMRES algorithm GMRES($j$) provided
that $j \ge k$. It is somewhat stronger than the bound (2.13) for GMRES(1)
(which is the same as Orthomin(1)), because the factor 2 does not have to be
raised to the $k$th power. Still, (3.18) sometimes gives a significant overestimate
of the actual GMRES residual norm. In many cases, a disk $D$ may have to be
much larger than $\mathcal{F}(A)$ in order to include $\mathcal{F}(A)$.
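The chain of inequalities leading to (3.18) can be illustrated numerically. The sketch below (Python/NumPy; helper names and the test matrix are assumptions) estimates the numerical radius from the standard characterization $\nu(B) = \max_\theta \lambda_{\max}((e^{i\theta}B + e^{-i\theta}B^H)/2)$, sampled on a grid of angles. It then checks $\|(I - (1/c)A)^k\| \le 2\,\nu(I - (1/c)A)^k$ for $A = I + 0.3N$, where $N$ is the nilpotent shift. For this $A$, the field of values is the disk about $c = 1$ of radius $s = 0.3\cos(\pi/(n+1))$.

```python
import numpy as np


def numerical_radius(B, num_angles=720):
    """Estimate nu(B) = max{|w^H B w| : ||w|| = 1} via
    nu(B) = max over theta of lambda_max((e^{i theta} B + e^{-i theta} B^H) / 2).
    Sampling theta gives a lower bound that converges to nu(B) as num_angles grows."""
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        H = (np.exp(1j * theta) * B + np.exp(-1j * theta) * B.conj().T) / 2.0
        best = max(best, np.linalg.eigvalsh(H)[-1])   # eigvalsh sorts ascending
    return best


n, c = 5, 1.0
N = np.diag(np.ones(n - 1), 1)        # nilpotent shift
A = np.eye(n) + 0.3 * N               # hypothetical nonnormal example
B = np.eye(n) - A / c
nu = numerical_radius(B)              # plays the role of s/|c|
for k in range(1, n + 2):
    lhs = np.linalg.norm(np.linalg.matrix_power(B, k), 2)
    print(k, lhs, 2.0 * nu**k)        # ||(I - A/c)^k|| <= 2 (s/|c|)^k
```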
Using the recent result (2.14), however, involving the Faber polynomials
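for an arbitrary convex set $S$ that contains $\mathcal{F}(A)$ and not the origin, one can
more closely fit $\mathcal{F}(A)$ while choosing $S$ so that the Faber polynomials for $S$ are
close to the minimax polynomials. For example, suppose $\mathcal{F}(A)$ is contained in
the ellipse

$$ E_s = \left\{ \delta + \frac{\gamma}{2}\left(w + w^{-1}\right) : 1 \le |w| \le s \right\} $$

with foci $\delta \pm \gamma$ and semi-axes $|\gamma|(s \pm s^{-1})/2$. Assume that $0 \notin E_s$. The $k$th
Faber polynomial for $E_s$ is just the $k$th Chebyshev polynomial of the first kind
translated to the interval $[\delta - \gamma, \delta + \gamma]$. When this polynomial is normalized
to have value one at the origin, its maximum value on $E_s$ can be shown to be

$$ \frac{s^k + s^{-k}}{|\kappa^k + \kappa^{-k}|}, $$

where

$$ \kappa = \frac{\delta}{\gamma} - \sqrt{\left(\frac{\delta}{\gamma}\right)^2 - 1}, $$

and the branch of the square root is chosen so that $|\kappa| < 1$. We assume here
that $s < |\kappa|^{-1}$. For further details, see [38, 39].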
Inequality (2.14) still does not lead to a sharp bound on the residual
norm in most cases, and it can be applied only when $0 \notin \mathcal{F}(A)$. Another
approach to estimating $\|p(A)\|$ in terms of the size of $p(z)$ in some region of
the complex plane has been suggested by Trefethen [129]. It is the idea of
pseudo-eigenvalues.
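For any polynomial $p$, the matrix $p(A)$ can be written as a Cauchy integral

$$ p(A) = \frac{1}{2\pi i} \int_\Gamma p(z)\,(zI - A)^{-1}\, dz, \qquad (3.19) $$

where $\Gamma$ is any simple closed curve or union of simple closed curves containing
the spectrum of $A$. Taking norms on each side in (3.19) and replacing the
norm of the integral by the length $\mathcal{L}(\Gamma)$ of the curve times the maximum norm
of the integrand gives

$$ \|p(A)\| \le \frac{\mathcal{L}(\Gamma)}{2\pi} \max_{z \in \Gamma} \|p(z)\,(zI - A)^{-1}\|. \qquad (3.20) $$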
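Now, if we consider a curve $\Gamma_\epsilon$ on which the resolvent norm $\|(zI - A)^{-1}\|$ is
constant, say, $\|(zI - A)^{-1}\| = \epsilon^{-1}$, then (3.20) implies

$$ \|p(A)\| \le \frac{\mathcal{L}(\Gamma_\epsilon)}{2\pi\epsilon} \max_{z \in \Gamma_\epsilon} |p(z)|. \qquad (3.21) $$

The curve on which $\|(zI - A)^{-1}\| = \epsilon^{-1}$ is referred to as the boundary of the
$\epsilon$-pseudospectrum of $A$:

$$ \Lambda_\epsilon(A) = \{ z \in \mathbb{C} : \|(zI - A)^{-1}\| \ge \epsilon^{-1} \}. $$

From (3.21) and the optimality of the GMRES approximation, it follows
that the GMRES residual $r_k$ satisfies

$$ \frac{\|r_k\|}{\|r_0\|} \le \frac{\mathcal{L}(\Gamma_\epsilon)}{2\pi\epsilon} \min_{p_k} \max_{z \in \Gamma_\epsilon} |p_k(z)| \qquad (3.22) $$

for any choice of the parameter $\epsilon$. For certain problems, and with carefully
chosen values of $\epsilon$, the bound (3.22) may be much smaller than that in (3.16).
Still, the bound (3.22) is not sharp, and for some problems there is no choice
of $\epsilon$ that yields a realistic estimate of the actual GMRES residual [72]. It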
is easy to see where the main overestimate occurs. In going from (3.19) to
(3.20) and replacing the norm of the integral by the length of the curve times
the maximum norm of the integrand, one may lose important cancellation
properties of the integral.
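Pseudospectra are usually computed from the identity $\|(zI - A)^{-1}\|^{-1} = \sigma_{\min}(zI - A)$, so that $z \in \Lambda_\epsilon(A)$ exactly when the smallest singular value of $zI - A$ is at most $\epsilon$. A minimal grid-based Python/NumPy sketch follows. It is a brute-force illustration (one SVD per grid point), not an efficient pseudospectrum code, and the example matrices are hypothetical.

```python
import numpy as np


def sigma_min_grid(A, xs, ys):
    """S[i, j] = sigma_min(z I - A) = 1 / ||(z I - A)^{-1}|| at z = xs[j] + 1j * ys[i].
    The eps-pseudospectrum is {z : S <= eps}; the contour S = eps traces Gamma_eps."""
    n = A.shape[0]
    S = np.empty((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            M = (x + 1j * y) * np.eye(n) - A
            S[i, j] = np.linalg.svd(M, compute_uv=False)[-1]   # singular values descend
    return S


# Normal example: sigma_min(zI - A) is just the distance from z to the spectrum.
D = np.diag([1.0, 2.0, 3.0])
# Nonnormal example: double eigenvalue 1 with a large off-diagonal coupling.
J = np.array([[1.0, 100.0], [0.0, 1.0]])
print(sigma_min_grid(D, [1.5], [0.0]))   # distance 0.5 to the nearest eigenvalue
print(sigma_min_grid(J, [1.1], [0.0]))   # far below 0.1: z = 1.1 is in a tiny-eps pseudospectrum
```

Plotting contours of `S` at several levels $\epsilon$ gives curves $\Gamma_\epsilon$ of the kind used in (3.21) and (3.22).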
Each of the inequalities (3.16), (3.18), and (3.22) provides bounds on the
GMRES residual by bounding the quantity $\min_{p_k} \|p_k(A)\|$. Now, the worst-case
behavior of GMRES is given by

$$ \max_{\|r_0\| = 1} \|r_k\| = \max_{\|r_0\| = 1} \min_{p_k} \|p_k(A)\, r_0\|. \qquad (3.23) $$

The polynomial $p_k$ depends on $r_0$. Until recently, it was an open question
whether the right-hand side of (3.23) was equal to the quantity

$$ \min_{p_k} \|p_k(A)\|. \qquad (3.24) $$

It is known that the right-hand sides of (3.23) and (3.24) are equal if $A$ is a
normal matrix or if the dimension of $A$ is less than or equal to 3 or if $k = 1$,
and many numerical experiments have shown that these two quantities are
equal (to within the accuracy limits of the computation) for a wide variety of
matrices and values of k. Recently, however, it has been shown that the two
quantities may differ. Faber et al. [44] constructed an example in which the
right-hand side of (3.24) is 1, while that of (3.23) is .9995. Subsequently, Toh
[127] generated examples in which the ratio of the right-hand side of (3.24)
to that of (3.23) can be made arbitrarily large by varying a parameter in the
matrix. Thus neither of the approaches leading to inequalities (3.16) and (3.22)
can be expected to yield a sharp bound on the size of the GMRES residual,
and it remains an open problem to describe the convergence of GMRES in
terms of some simple characteristic properties of the coefficient matrix.

Exercises.
3.1. Suppose a positive definite matrix has a small, well-separated eigenvalue,
$\lambda_1 \ll \lambda_2 \le \cdots \le \lambda_n$ (that is, $\lambda_1/\lambda_2 \ll 1$). Derive an error bound for
CG or MINRES using the maximum value of a polynomial that is the
product of a linear factor that is 0 at $\lambda_1$ and a $(k-1)$st-degree Chebyshev
polynomial on the interval $[\lambda_2, \lambda_n]$. Is it more advantageous to have a
small, well-separated eigenvalue or a large, well-separated eigenvalue as
in (3.10)? (For the derivation of many other such error bounds, see [132].)

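3.2. Consider the 4-by-4 matrix

$$ A = \begin{pmatrix} 1 & \epsilon & 0 & 0 \\ 0 & -1 & \epsilon^{-1} & 0 \\ 0 & 0 & 1 & \epsilon \\ 0 & 0 & 0 & -1 \end{pmatrix}. $$

This is the example devised by Toh [127] to demonstrate the difference
between expressions (3.23) and (3.24).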

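(a) Show that the polynomial of degree 3 or less with value one at the
origin that minimizes $\|p(A)\|$ over all such polynomials is

$$ p_*(z) = 1 - \tfrac{3}{5} z^2 $$

and that $\|p_*(A)\| = \tfrac{4}{5}$, independent of $\epsilon$. (Hint: First show that
$p_*(z)$ must be even by using the uniqueness of $p_*$ and the fact that
$A^T$ is unitarily similar to $-A$ via the matrix

$$ J = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}, $$

which implies that $\|p(-A)\| = \|p(A^T)\| = \|p(A)\|$ for any polynomial
$p$. Now consider polynomials $p_\gamma$ of the form $1 + \gamma z^2$ for various
scalars $\gamma$. Determine the singular values of $p_\gamma(A)$ analytically to
show that

$$ \|p_\gamma(A)\| = \sqrt{(1 + \gamma)^2 + \gamma^2/4} + |\gamma|/2. $$

Differentiate with respect to $\gamma$ and set the derivative to zero to show
that $\gamma = -\tfrac{3}{5}$ minimizes $\|p_\gamma(A)\|$.)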
(b) Show that for any vector $b$ with $\|b\| = 1$ there is a polynomial $p_b$ of
degree 3 or less with $p_b(0) = 1$ such that

(Hint: First note that if $b = (b_1, b_2, b_3, b_4)^T$, then

Chapter 4

Effects of Finite Precision Arithmetic

In the previous chapter, error bounds were derived for the CG, MINRES,
and GMRES algorithms, using the fact that these methods find the optimal
approximation from a Krylov subspace. In the Arnoldi algorithm, on which
the GMRES method is based, all of the Krylov space basis vectors are retained,
and a new vector is formed by explicitly orthogonalizing against all previous
vectors using the modified Gram-Schmidt procedure. The modified Gram-
Schmidt procedure is known to yield nearly orthogonal vectors if the vectors
being orthogonalized are not too nearly linearly dependent. In the special
case where the vectors are almost linearly dependent, the modified Gram-
Schmidt procedure can be replaced by Householder transformations, at the cost
of some extra arithmetic [139]. In this case, one would expect the basis vectors
generated in the GMRES method to be almost orthogonal and the approximate
solution obtained to be nearly optimal, at least in the space spanned by these
vectors. For discussions of the effect of rounding errors on the GMRES method,
see [33, 70, 2].
This is not the case for the CG and MINRES algorithms, which use short
recurrences to generate orthogonal basis vectors for the Krylov subspace.
The proof of orthogonality, and hence of the optimality of the approximate
solution, relies on induction (e.g., Theorems 2.3.1 and 2.3.2 and the arguments
after the Lanczos algorithm in section 2.5), and such arguments may be
destroyed by the effects of finite precision arithmetic. In fact, the basis vectors
generated by the Lanczos algorithm (or the residual vectors generated by
the CG algorithm) in finite precision arithmetic frequently lose orthogonality
completely and may even become linearly dependent! In such cases, the
approximate solutions generated by the CG and MINRES algorithms are not
the optimal approximations from the Krylov subspace, and it is not clear that
any of the results from Chapter 3 should hold.
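This loss of orthogonality is easy to observe. The Python/NumPy sketch below runs plain Lanczos (no reorthogonalization) for 40 steps on a 24-by-24 matrix with eigenvalues $\lambda_i = \lambda_1 + \frac{i-1}{n-1}(\lambda_n - \lambda_1)\rho^{n-i}$, with $\lambda_1 = .001$, $\lambda_n = 1$, $\rho = 0.8$ (our reading of the test problem of section 4.1). The random orthogonal $U$, the seed, and the helper names are assumptions. Beyond step 24 the computed vectors cannot all be orthonormal, since 40 vectors in $\mathbb{R}^{24}$ are linearly dependent, so $\|Q_k^T Q_k - I\| \ge 1$ is forced there. The informative outputs are the values at earlier steps and the size of $A Q_k - Q_k T_k - \beta_k q_{k+1} \xi_k^T$ (the rounding-error matrix written $F_k$ in section 4.2), which should remain at the level of rounding errors.

```python
import numpy as np


def strakos_eigenvalues(n, lam1, lamn, rho):
    """lambda_i = lambda_1 + (i-1)/(n-1) (lambda_n - lambda_1) rho^(n-i), i = 1, ..., n."""
    i = np.arange(1, n + 1)
    return lam1 + (i - 1) / (n - 1) * (lamn - lam1) * rho ** (n - i)


def lanczos(A, q1, k):
    """Plain Lanczos (no reorthogonalization). Returns Q = (q_1, ..., q_{k+1}), T_k and
    beta_k, so that ideally A Q_k = Q_k T_k + beta_k q_{k+1} xi_k^T."""
    n = q1.size
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        v = A @ Q[:, j]
        if j > 0:
            v -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ v
        v -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(v)
        Q[:, j + 1] = v / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T, beta[-1]


rng = np.random.default_rng(4)
n, k = 24, 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))        # random orthogonal U
A = U @ np.diag(strakos_eigenvalues(n, 1e-3, 1.0, 0.8)) @ U.T
A = (A + A.T) / 2
Q, T, beta_k = lanczos(A, rng.standard_normal(n), k)
Qk = Q[:, :k]
F = A @ Qk - Qk @ T                                      # rounding-error matrix F_k
F[:, -1] -= beta_k * Q[:, k]
for m in (10, 24, 40):
    print(m, np.linalg.norm(Qk[:, :m].T @ Qk[:, :m] - np.eye(m), 2))
print(np.linalg.norm(F, 2) / np.linalg.norm(A, 2))
```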
In this chapter we show why the nonorthogonal vectors generated by the
Lanczos algorithm can still be used effectively for solving linear systems and
which of the results from Chapter 3 can and cannot be expected to hold (to a
close approximation) in finite precision arithmetic. It is shown that for both
the MINRES and CG algorithms, the 2-norm of the residual is essentially
determined by the tridiagonal matrix produced in the finite precision Lanczos
computation. This tridiagonal matrix is, of course, quite different from the one
that would be produced in exact arithmetic. It follows, however, that if the
same tridiagonal matrix would be produced by the exact Lanczos algorithm
applied to some other problem, then exact arithmetic bounds on the residual
for that problem will hold for the finite precision computation. In order to
establish exact arithmetic bounds for the different problem, it is necessary to
have some information about the eigenvalues of the new coefficient matrix.
Here we make use of results already established in the literature about the
eigenvalues of the new coefficient matrix, but we do not include the proofs.
The analysis presented here is by no means a complete rounding-error
analysis of the algorithms given in Chapter 2. As anyone who has done a
rounding-error analysis knows, the arguments can quickly become complicated
and tedious. Here we attempt to present some of the more interesting aspects
of the error analysis for the CG and MINRES algorithms, without becoming
bogged down in the details. We consider a hypothetical implementation of
these algorithms for which the analysis is easier and refer to the literature for
arguments about the precise nature of the roundoff terms at each step.
This analysis deals with the rate at which the A-norm of the error in the
CG algorithm and the 2-norm of the residual in the MINRES algorithm are
reduced before the ultimately attainable accuracy is achieved. A separate issue
is the level of accuracy that can be attained if the iteration is carried out for
sufficiently many steps. This question is discussed in section 7.3.

4.1. Some Numerical Examples.


To illustrate the numerical behavior of the CG algorithm of section 2.3,
we have applied this algorithm to linear systems Ax = b with coefficient
matrices of the form $A = U \Lambda U^H$, where $U$ is a random orthogonal matrix
and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, where

$$ \lambda_i = \lambda_1 + \frac{i-1}{n-1}\,(\lambda_n - \lambda_1)\,\rho^{n-i}, \qquad i = 2, \dots, n-1, $$

and the parameter $\rho$ is chosen between 0 and 1. For $\rho = 1$, the eigenvalues
are uniformly spaced, and for smaller values of $\rho$ the eigenvalues are tightly
clustered at the lower end of the spectrum and are far apart at the upper
end. We set $n = 24$, $\lambda_1 = .001$, and $\lambda_n = 1$. A random right-hand side and
zero initial guess were used in all cases. Figure 4.1a shows a plot of the
A-norm of the error versus the iteration number for $\rho = .4, .6, .8, 1$. Experiments
were performed using double precision Institute of Electrical and Electronics
Engineers (IEEE) arithmetic, with machine precision $\epsilon \approx 1.1 \times 10^{-16}$. Figure 4.1b
shows what these curves would look like if exact arithmetic had been used.
(Exact arithmetic can be simulated in the CG algorithm by saving all of the
basis vectors in the Lanczos algorithm and explicitly orthogonalizing against
them at every step. This is how the data for Figure 4.1b was produced.)

FIG. 4.1. CG convergence curves for (a) finite precision arithmetic and (b) exact
arithmetic; $\rho = 1$ solid, $\rho = .8$ dash-dot, $\rho = .6$ dotted, $\rho = .4$ dashed.

Note that although the theory of Chapter 3 guarantees that the exact
solution is obtained after n = 24 steps, the computations with $\rho = .6$ and
$\rho = .8$ do not generate good approximate solutions by step 24. It is at about
step 31 that the error in the $\rho = .8$ computation begins to decrease rapidly.
The $\rho = .4$ computation has reduced the A-norm of the error to $10^{-12}$ by step
24, but the corresponding exact arithmetic calculation would have reduced it
to this level after just 14 steps. In contrast, for $\rho = 1$ (the case with equally
spaced eigenvalues), the exact and finite precision computations behave very
similarly. In all cases, the finite precision computation eventually finds a good
approximate solution, but it is clear that estimates of the number of iterations
required to do so cannot be based on the error bounds of Chapter 3. In this
chapter we develop error bounds that hold in finite precision arithmetic.

4.2. The Lanczos Algorithm.


When the Lanczos algorithm is implemented in finite precision arithmetic, the
recurrence of section 2.5 is perturbed slightly. It is replaced by a recurrence
that can be written in matrix form as

$$AQ_k = Q_kT_k + \beta_k q_{k+1}\xi_k^T + F_k, \qquad (4.2)$$

where the columns of $F_k$ represent the rounding errors at each step. Let $\epsilon$
denote the machine precision and define

where m is the maximum number of nonzeros in any row of A. Under the
assumptions that

and ignoring higher-order terms in $\epsilon$, Paige [109] showed that the rounding
error matrix $F_k$ satisfies

Paige also showed that the coefficient formulas in the Lanczos algorithm can
be implemented sufficiently accurately to ensure that

We will assume throughout that the inequalities (4.4) and hence (4.5-4.7) hold.
Although the individual roundoff terms are tiny, their effect on the
recurrence (4.2) can be large. The Lanczos vectors may lose orthogonality
and even become linearly dependent. The recurrence coefficients generated
in finite precision arithmetic may be quite different from those that would be
generated in exact arithmetic.
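A small NumPy sketch makes the loss of orthogonality visible. It monitors $\|I - Q_k^T Q_k\|_2$ for the vectors produced by the plain three-term recurrence. The helper `lanczos` is hypothetical, and the test matrix is the same assumed $\rho = .6$, n = 24 example as above. Beyond step n the vectors cannot be orthonormal at all, since more than n vectors in $\mathbb{R}^n$ are linearly dependent.

```python
import numpy as np

def lanczos(A, q1, k):
    """k steps of the plain three-term Lanczos recurrence (no reorthogonalization).
    Returns Q = [q_1, ..., q_{k+1}] and the coefficients alpha_1..alpha_k, beta_1..beta_k."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

# Assumed test matrix: n = 24, rho = 0.6, random orthogonal eigenvectors.
n, rho = 24, 0.6
i = np.arange(1, n + 1)
lam = 1e-3 + (i - 1) / (n - 1) * (1 - 1e-3) * rho ** (n - i)
U, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))
A = U @ np.diag(lam) @ U.T

Q, alpha, beta = lanczos(A, np.random.default_rng(1).standard_normal(n), 40)

def orth_loss(k):
    """||I - Q_k^T Q_k||_2 for the first k Lanczos vectors."""
    Qk = Q[:, :k]
    return np.linalg.norm(np.eye(k) - Qk.T @ Qk, 2)

for k in (2, 5, 10, 15, 20, 25, 30):
    print(f"k = {k:2d}: ||I - Q_k^T Q_k|| = {orth_loss(k):.1e}")
```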

4.3. A Hypothetical MINRES/CG Implementation.


Although the computed Lanczos vectors may not be orthogonal, one might
still consider using them in the CG or MINRES algorithms for solving linear
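systems; that is, one could still choose an approximate solution $x_k$ of the form

$$x_k = x_0 + Q_k y_k, \qquad (4.8)$$

where $y_k$ solves the least squares problem

$$\min_y \|\beta\xi_1 - T_{k+1,k}\,y\|, \qquad \beta = \|r_0\|, \qquad (4.9)$$

for the MINRES method or the linear system

$$T_k y = \beta\xi_1 \qquad (4.10)$$

for the CG algorithm. Of course, in practice, one does not first compute the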
Lanczos vectors and then apply formulas (4.8-4.10), since this would require
saving all of the Lanczos vectors. Still, it is reasonable to try to separate the
effects of roundoff on the three-term Lanczos recurrence from those on other
aspects of the (implicit) evaluation of (4.8-4.10). It is the effect of using
the nonorthogonal vectors produced by a finite precision Lanczos computation
that is analyzed here, so from here on we assume that formulas (4.8-4.10) hold
exactly, where $Q_k$, $T_k$, and $T_{k+1,k}$ satisfy (4.2).

The residual in the CG algorithm, which we denote here as $r_k^C$, then satisfies

where $y_k^C$ denotes the solution to (4.10). The 2-norm of the residual satisfies

Using (4.5) and (4.6), this becomes

It follows that at steps k where $\sqrt{k}\,\epsilon_1\|A\|\,\|y_k^C\|/\beta$ is much smaller than
the residual norm, the 2-norm of the residual is essentially determined by the
tridiagonal matrix $T_k$ and the next recurrence coefficient $\beta_k$.
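This claim can be checked numerically. The sketch below uses hypothetical helper names, the same assumed $\rho = .6$ test matrix, and $x_0 = 0$. It forms $y_k^C$ from (4.10) using the finite precision Lanczos coefficients, then compares the true residual norm $\|b - AQ_ky_k^C\|$ with $\beta_k|\xi_k^T y_k^C|$. The two should agree up to the rounding-error term $F_k y_k^C$.

```python
import numpy as np

def lanczos(A, q1, k):
    """Plain three-term Lanczos recurrence; returns Q (n x (k+1)), alpha (k,), beta (k,)."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

# Assumed test matrix (rho = 0.6) and right-hand side; x_0 = 0, so r_0 = b.
n, rho = 24, 0.6
i = np.arange(1, n + 1)
lam = 1e-3 + (i - 1) / (n - 1) * (1 - 1e-3) * rho ** (n - i)
U, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))
A = U @ np.diag(lam) @ U.T
b = np.random.default_rng(1).standard_normal(n)
beta0 = np.linalg.norm(b)

k = 6
Q, alpha, beta = lanczos(A, b, k)
T_k = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
rhs = np.zeros(k)
rhs[0] = beta0
y = np.linalg.solve(T_k, rhs)            # y_k^C from (4.10)
x = Q[:, :k] @ y                         # x_k = x_0 + Q_k y_k with x_0 = 0
true_res = np.linalg.norm(b - A @ x)
recurrence_res = beta[-1] * abs(y[-1])   # beta_k |xi_k^T y_k^C|
print(f"true {true_res / beta0:.6e}  from T_k and beta_k {recurrence_res / beta0:.6e}")
```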
The residual in the MINRES algorithm, which we denote here as $r_k^M$,
satisfies

where $y_k^M$ denotes the solution to (4.9). The 2-norm of the residual satisfies

It follows from (4.6) that

so, with (4.5), we have

It follows that at steps k where $\sqrt{k}\,\epsilon_1\|A\|\,\|y_k^M\|/\beta$ is tiny compared to the
residual norm, the 2-norm of the residual is essentially bounded, to within a
possible factor of $\sqrt{k+1}$ (which is usually an overestimate), by an expression
involving only the (k+1)-by-k tridiagonal matrix $T_{k+1,k}$.
Thus, for both the MINRES and CG algorithms, the 2-norm of the residual
(or at least a realistic bound on the 2-norm of the residual) is essentially
determined by the recurrence coefficients computed in the finite precision
Lanczos computation and stored in the tridiagonal matrix $T_{k+1,k}$. Suppose
the exact Lanczos algorithm, applied to a matrix (or linear operator) $\mathcal{A}$ with
initial vector $\varphi_1$, generates the same tridiagonal matrix $T_{k+1,k}$. It would follow
that the 2-norm of the residual $r_k^M$ or $r_k^C$ in the finite precision computation
would be approximately the same as the 2-norm of the corresponding residual in
the exact MINRES or CG algorithm for solving the linear system (or operator
equation) $\mathcal{A}\chi = \varphi$, with right-hand side $\varphi = \beta\varphi_1$; in this case, we would have

Compare with (4.12) and (4.14).


Note also that if T is any Hermitian tridiagonal matrix (even an infinite
one) whose upper left (k+1)-by-k block is $T_{k+1,k}$, then the exact Lanczos
algorithm applied to T with initial vector $\xi_1$ will generate the matrix $T_{k+1,k}$ at
step k. This follows because the reduction of a Hermitian matrix to tridiagonal
form (with nonnegative off-diagonal entries) is uniquely determined once the
initial vector is set.
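This observation is easy to check: with $q_1 = \xi_1$, the recurrence simply reads off the columns of T one at a time. In the sketch below, T is a hypothetical random 8-by-8 tridiagonal matrix with positive off-diagonal entries, and `lanczos` is a hypothetical helper.

```python
import numpy as np

def lanczos(A, q1, k):
    """Plain three-term Lanczos recurrence; returns Q (n x (k+1)), alpha (k,), beta (k,)."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

rng = np.random.default_rng(3)
m, k = 8, 5
a = rng.uniform(-1.0, 1.0, m)          # diagonal of the hypothetical T
c = rng.uniform(0.5, 1.5, m - 1)       # positive off-diagonal entries
T = np.diag(a) + np.diag(c, 1) + np.diag(c, -1)

xi1 = np.zeros(m)
xi1[0] = 1.0
Q, alpha, beta = lanczos(T, xi1, k)
# The first k steps reproduce the upper left (k+1)-by-k block of T.
print(np.max(np.abs(alpha - a[:k])), np.max(np.abs(beta - c[:k])))
```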
With this observation, we can now use results about the convergence of
the exact MINRES and CG algorithms applied to any such matrix T to derive
bounds on the residuals $r_k^M$ and $r_k^C$ in the finite precision computation. To do
this, one must have some information about the eigenvalues of such a matrix T.

4.4. A Matrix Completion Problem.


With the arguments of the previous section, the problem of bounding the
residual norm in finite precision CG and MINRES computations becomes
a matrix completion problem: given the (k+1)-by-k tridiagonal matrix
$T_{k+1,k}$ generated by a finite precision Lanczos computation, find a Hermitian
tridiagonal matrix T with $T_{k+1,k}$ as its upper left block,

whose eigenvalues are related to those of A in such a way that the exact
arithmetic error bounds of Chapter 3 yield useful results about the convergence
of the exact CG or MINRES algorithms applied to linear systems with
coefficient matrix T. In this section we state such results but refer the reader
to the literature for their proofs.

4.4.1. Paige's Theorem. The following result of Paige [110] shows that
the eigenvalues of $T_{k+1}$, the tridiagonal matrix generated at step k + 1 of a
finite precision Lanczos computation, lie essentially between the largest and
smallest eigenvalues of A.
THEOREM 4.4.1 (Paige). The eigenvalues $\theta_i^{(j)}$, $i = 1, \ldots, j$, of the
tridiagonal matrix $T_j$ satisfy

where $\lambda_1$ is the smallest eigenvalue of A, $\lambda_n$ is the largest eigenvalue of A, and

$\epsilon_0$ and $\epsilon_1$ are defined in (4.3).
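Here is a quick numerical illustration of Theorem 4.4.1. It is a sketch using the same assumed $\rho = .6$ test matrix and a hypothetical `lanczos` helper. Even after 100 steps of the plain recurrence, far more than n = 24, the eigenvalues of $T_{100}$ stay within a tiny distance of $[\lambda_1, \lambda_n]$. The tolerance $10^{-10}$ in the check is an illustrative choice, not Paige's bound.

```python
import numpy as np

def lanczos(A, q1, k):
    """Plain three-term Lanczos recurrence; returns Q (n x (k+1)), alpha (k,), beta (k,)."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

# Assumed test matrix (rho = 0.6, n = 24).
n, rho = 24, 0.6
i = np.arange(1, n + 1)
lam = 1e-3 + (i - 1) / (n - 1) * (1 - 1e-3) * rho ** (n - i)
U, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))
A = U @ np.diag(lam) @ U.T

k = 100
Q, alpha, beta = lanczos(A, np.random.default_rng(1).standard_normal(n), k)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
theta = np.linalg.eigvalsh(T)          # eigenvalues of T_k
print(f"lambda_1 - min theta = {lam[0] - theta.min():.1e}")
print(f"max theta - lambda_n = {theta.max() - lam[-1]:.1e}")
```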
Using this result with the arguments of the previous section, we obtain the
following result for Hermitian positive definite matrices A.
THEOREM 4.4.2. Let A be a Hermitian positive definite matrix with
eigenvalues $\lambda_1 \le \cdots \le \lambda_n$ and assume that $\lambda_1 - (k+1)^{5/2}\epsilon_2\|A\| > 0$. Let $r_k^M$
and $r_k^C$ denote the residuals at step k of the MINRES and CG computations
satisfying (4.8) and (4.9) or (4.10), respectively, where $Q_k$, $T_k$, and $T_{k+1,k}$
satisfy (4.2). Then

where

and $\epsilon_0$, $\epsilon_1$, and $\epsilon_2$ are defined in (4.3) and (4.15).


Proof. It follows from Theorem 4.4.1 that $T_k$ is nonsingular, and since
$y_k^C = T_k^{-1}\beta\xi_1$, we have for the second term on the right-hand side of (4.12)

Since the expression $|\beta_k\,\xi_k^T T_k^{-1}\xi_1|$ in (4.12) is the size of the residual at step k
of the exact CG algorithm applied to a linear system with coefficient matrix
$T_{k+1}$ and right-hand side $\xi_1$, and since the eigenvalues of $T_{k+1}$ satisfy (4.15), it
follows from (3.8) that

where $\kappa$ is given by (4.18). Here we have used the fact that since the expression
in (3.8) bounds the reduction in the $T_{k+1}$-norm of the error in the exact
CG iterate, the reduction in the 2-norm of the residual for this exact CG
iterate is bounded by $\sqrt{\kappa}$ times the expression in (3.8); i.e.,
$\|Bv\|/\|Bw\| \le \sqrt{\kappa(B)}\,\|v\|_B/\|w\|_B$ for any vectors v and w and any positive definite matrix
B. Making these substitutions into (4.12) gives the desired result (4.16).
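The inequality $\|Bv\|/\|Bw\| \le \sqrt{\kappa(B)}\,\|v\|_B/\|w\|_B$ follows from two bounds: $\|Bv\|^2 = v^H B^2 v \le \lambda_{\max}(B)\|v\|_B^2$ and $\|Bw\|^2 \ge \lambda_{\min}(B)\|w\|_B^2$. The NumPy sketch below checks it on random vectors for an arbitrarily generated positive definite B. It also shows that equality holds when v and w are the extreme eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
M = rng.standard_normal((n, n))
B = M @ M.T + 0.1 * np.eye(n)          # an arbitrary symmetric positive definite B
evals, evecs = np.linalg.eigh(B)
kappa = evals[-1] / evals[0]

def b_norm(u):
    return np.sqrt(u @ B @ u)

def ratio(v, w):
    """Left side divided by right side of ||Bv||/||Bw|| <= sqrt(kappa) ||v||_B / ||w||_B."""
    lhs = np.linalg.norm(B @ v) / np.linalg.norm(B @ w)
    rhs = np.sqrt(kappa) * b_norm(v) / b_norm(w)
    return lhs / rhs

worst = max(ratio(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(1000))
sharp = ratio(evecs[:, -1], evecs[:, 0])   # v = top eigenvector, w = bottom eigenvector
print(worst, sharp)
```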
For the MINRES algorithm, it can be seen from the Cauchy interlace
theorem (Theorem 1.3.12) applied to $T_{k+1,k}^H T_{k+1,k}$, which is the leading k-by-k
principal submatrix of $T_{k+1}^2$, that the smallest singular value of $T_{k+1,k}$ is
greater than or equal to the smallest eigenvalue of $T_{k+1}$.
Consequently, we have

so, similar to the CG algorithm, the second term on the right-hand side of
(4.14) satisfies

Since the expression $\|\xi_1 - T_{k+1,k}y_k^M/\beta\|$ in (4.14) is the size of the residual at
step k of the exact MINRES algorithm applied to the linear system $T_{k+1}\chi = \xi_1$,
where the eigenvalues of $T_{k+1}$ satisfy (4.15), it follows from (3.12) that

Making these substitutions into (4.14) gives the desired result (4.17).
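The singular value bound used for MINRES can also be checked numerically. In this sketch, $T_{k+1}$ is an arbitrary tridiagonal matrix that is diagonally dominant and hence positive definite; the sizes and seed are illustrative. $T_{k+1,k}$ is formed from the first k columns of $T_{k+1}$, so $T_{k+1,k}^H T_{k+1,k}$ is the leading k-by-k block of $T_{k+1}^2$ and interlacing applies.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 10
a = rng.uniform(2.5, 3.5, k + 1)                   # diagonal of T_{k+1}
c = rng.uniform(0.1, 1.0, k)                       # off-diagonal of T_{k+1}
T = np.diag(a) + np.diag(c, 1) + np.diag(c, -1)    # diagonally dominant, hence SPD
T_rect = T[:, :k]                                  # T_{k+1,k}: first k columns of T_{k+1}
T_k = T[:k, :k]

# T_{k+1,k}^T T_{k+1,k} is the leading k-by-k principal submatrix of T_{k+1}^2.
leading_block_ok = np.allclose(T_rect.T @ T_rect, (T @ T)[:k, :k])

sigma_min = np.linalg.svd(T_rect, compute_uv=False).min()
lam_min_k1 = np.linalg.eigvalsh(T).min()
lam_min_k = np.linalg.eigvalsh(T_k).min()
print(sigma_min, lam_min_k, lam_min_k1)
```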
Theorem 4.4.2 shows that, at least to a close approximation, the exact
arithmetic residual bounds based on the size of the Chebyshev polynomial
on the interval from the smallest to the largest eigenvalue of A hold in finite
precision arithmetic as well. Exact arithmetic bounds such as (3.11) and (3.12)
for i > 0, based on approximation on discrete subsets of the eigenvalues of
A, may fail, however, as may the sharp bounds (3.6) and (3.7). This was
illustrated in section 4.1. Still, stronger bounds than (4.16) and (4.17) may
hold in finite precision arithmetic, and such bounds are derived in the next
subsection.

4.4.2. A Different Matrix Completion. Paige's theorem about the
eigenvalues of $T_{k+1}$ lying essentially between the smallest and largest
eigenvalues of A is of little use in the case of indefinite A, since in that case, $T_{k+1}$
could be singular. Moreover, we would like to find a completion T of $T_{k+1,k}$
whose eigenvalues can be more closely related to the discrete eigenvalues of A
in order to obtain finite precision analogues of the sharp error bounds (3.6)
and (3.7).
It was shown by Greenbaum that $T_{k+1,k}$ can be extended to a larger
Hermitian tridiagonal matrix T whose eigenvalues all lie in tiny intervals
about the eigenvalues of A [65], the size of the intervals being a function of
the machine precision. Unfortunately, the proven bound on the interval size
appears to be a large overestimate. The bound on the interval size established
in [65] involves a number of constants as well as a factor of the form $n^3k^2\sqrt{\epsilon}\,\|A\|$
or, in some cases, $n^3k\,\epsilon^{1/4}\|A\|$, but better bounds are believed possible.
Suppose the eigenvalues of such a matrix T have been shown to lie in
intervals of width $\delta$ about the eigenvalues of A. One can then relate the size
of the residual at step k of a finite precision computation to the maximum
value of the minimax polynomial on the union of tiny intervals containing the
eigenvalues of T, using the same types of arguments as given in Theorem 4.4.2.
THEOREM 4.4.3. Let A be a Hermitian matrix with eigenvalues
$\lambda_1 \le \cdots \le \lambda_n$ and let $T_{k+1,k}$ be the (k+1)-by-k tridiagonal matrix generated by a
finite precision Lanczos computation. Assume that there exists a Hermitian
tridiagonal matrix T, with $T_{k+1,k}$ as its upper left (k+1)-by-k block, whose
eigenvalues all lie in the intervals

$$S \equiv \bigcup_{i=1}^{n} [\lambda_i - \delta,\ \lambda_i + \delta], \qquad (4.19)$$

where none of the intervals contains the origin. Let d denote the distance from
the origin to the set S. Then the MINRES residual $r_k^M$ satisfies

If A is positive definite, then the CG residual $r_k^C$ satisfies
Proof. Since the expression $\|\xi_1 - T_{k+1,k}y_k^M/\beta\|$ in (4.14) is the size of the
residual at step k of the exact MINRES algorithm applied to the linear system
$T\chi = \xi_1$, where the eigenvalues of T lie in S, it follows from (3.7) that

To bound the second term in (4.14), note that the approximate solution
generated at step k of this corresponding exact MINRES calculation is of the
form $\chi_k = \mathcal{Q}_k y_k^M/\beta$, where the columns of $\mathcal{Q}_k$ are orthonormal and the vector
$y_k^M$ is the same one generated by the finite precision computation. It follows
that $\|y_k^M\|/\beta = \|\chi_k\|$. Since the 2-norm of the residual decreases monotonically
in the exact algorithm, we have

Making these substitutions in (4.14) gives



from which the desired result (4.20) follows.


When A, and hence T, is positive definite, the expression $|\beta_k\,\xi_k^T T_k^{-1}\xi_1|$ in
(4.12) is the size of the residual at step k of the exact CG algorithm applied
to the linear system $T\chi = \xi_1$. It follows from (3.6) that

where the factor $\sqrt{(\lambda_n + \delta)/d}$, which bounds $\sqrt{\kappa(T)}$, must be included, since this gives a
bound on the 2-norm of the residual instead of the T-norm of the error. The
second term in (4.12) can be bounded as in Theorem 4.4.2. Since $y_k^C = T_k^{-1}\beta\xi_1$
and since, by the Cauchy interlace theorem, the smallest eigenvalue of $T_k$ is
greater than or equal to that of T, we have

Making these substitutions in (4.12) gives the desired result (4.21).


Theorem 4.4.3 shows that, to a close approximation, the exact arithmetic
residual bounds based on the size of the minimax polynomial on the discrete
set of eigenvalues of A can be replaced, in finite precision arithmetic, by
the size of the minimax polynomial on the union of tiny intervals in (4.19).
Bounds such as (3.11) and (3.12) for i > 0 will not hold in finite precision
arithmetic. Instead, if A has a few large outlying eigenvalues, one must consider
a polynomial that is the product of one with enough roots in the outlying
intervals to ensure that it is tiny throughout these intervals, with a lower-degree
Chebyshev polynomial on the remainder of the spectrum. The maximum value
of this polynomial throughout the set S in (4.19) provides an error bound that
holds in finite precision arithmetic. It is still advantageous, in finite precision
arithmetic, to have most eigenvalues concentrated in a small interval with just
a few outliers (as opposed to having eigenvalues everywhere throughout the
larger interval), but the advantages are less than in exact arithmetic.
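The construction in the preceding paragraph can be illustrated with a toy spectrum. This is a hypothetical example, not one from the text: 20 eigenvalues in [1, 2] plus one outlier at 100, intervals of assumed half-width $\delta = 10^{-10}$, and polynomials of degree 11. A single root at the outlier gives a tiny maximum on the discrete eigenvalues but is useless on the intervals. Placing two roots there and lowering the Chebyshev degree by one restores a small bound.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Hypothetical spectrum: 20 eigenvalues in [1, 2] and one large outlier at 100.
eigs = np.append(np.linspace(1.0, 2.0, 20), 100.0)
outlier, delta, deg = 100.0, 1e-10, 11     # delta: assumed interval half-width

def p(z, m):
    """(1 - z/outlier)^m times the degree-(deg - m) Chebyshev polynomial for
    [1 - delta, 2 + delta], scaled so that p(0) = 1."""
    lo, hi = 1.0 - delta, 2.0 + delta
    coef = np.zeros(deg - m + 1)
    coef[-1] = 1.0
    cheb = C.chebval((lo + hi - 2 * z) / (hi - lo), coef) / C.chebval((lo + hi) / (hi - lo), coef)
    return (1 - z / outlier) ** m * cheb

def max_on_intervals(m, npts=201):
    """max |p| over the union of the intervals [lambda_i - delta, lambda_i + delta]."""
    return max(np.abs(p(np.linspace(e - delta, e + delta, npts), m)).max() for e in eigs)

exact_m1 = np.abs(p(eigs, 1)).max()   # exact arithmetic: max over the eigenvalues only
smeared_m1 = max_on_intervals(1)      # same polynomial on the tiny intervals
smeared_m2 = max_on_intervals(2)      # two roots at the outlier, Chebyshev degree 9
print(f"{exact_m1:.1e}  {smeared_m1:.1e}  {smeared_m2:.1e}")
```

With these assumed numbers, the three printed values are of order $10^{-8}$, $10^{6}$, and $10^{-7}$, respectively.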
It is shown in [65] that not only the residual norm bound (4.21) but also
the corresponding bound on the A-norm of the error,

holds to a close approximation in finite precision arithmetic. In Figure 4.2, this
error bound is plotted along with the actual A-norm of the error in a finite
precision computation with a random right-hand side and zero initial guess for
the case $\rho = .6$ described in section 4.1. The interval width $\delta$ was taken to
be $10^{-15}$, or about $10\epsilon$. For comparison, the sharp error bound for exact
arithmetic (3.6) is also shown in Figure 4.2. It is evident that the bound (4.22)
is applicable to the finite precision computation and that it gives a reasonable
estimate of the actual error when the initial residual is random.

FIG. 4.2. Exact arithmetic error bound (dotted), finite precision arithmetic
error bound (assuming $\delta = 10^{-15}$) (dashed), and actual error in a finite precision
computation (solid).

4.5. Orthogonal Polynomials.


The theorems of the previous section identified the behavior of the first k steps
of the CG and MINRES algorithms in finite precision arithmetic with that
of the first k steps of the exact algorithms applied to a different problem.
(That is, the tridiagonal matrices generated during the first k steps are the
same, and so the residual norms agree to within the factors given in the
theorems.) Of course, if the bound $\delta$ on the interval size in Theorem 4.4.3 were
independent of k, then this would imply that the identity (between tridiagonal
matrices generated in finite precision arithmetic and those generated by the
exact algorithm applied to a linear operator with eigenvalues contained in
intervals of width $\delta$ about the eigenvalues of A) would hold for arbitrarily
many steps. It is not known whether the assumption of Theorem 4.4.3 can be
satisfied for some small value of $\delta$ that does not depend on k.
The analysis of section 4.4 is somewhat unusual in linear algebra. Normally,
the approximate solution generated by a finite precision computation is
identified with the exact solution of a nearby problem of the same dimension.
The matrix T in Theorem 4.4.3 can be of any dimension greater than or equal
to k+1. It represents a nearby problem only in the sense that its eigenvalues lie
close to those of A. The arguments of the previous sections have a somewhat
more natural interpretation in terms of orthogonal polynomials.
The 3-term recurrence of the Lanczos algorithm (in exact arithmetic)
implicitly constructs the orthonormal polynomials for a certain set of weights
on the eigenvalues of the matrix, the weights being the squared components
of the initial vector in the direction of each eigenvector of A. To see this, let
$A = U\Lambda U^H$ be an eigendecomposition of A and let $\tilde q_j = U^H q_j$, where the
vectors $q_j$, $j = 1, 2, \ldots$, are the Lanczos vectors generated by the algorithm in
section 2.5. Then, following the algorithm of section 2.5, we have

where

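It follows that the ith component of $\tilde q_{j+1}$ is equal to a certain jth-degree
polynomial, say, $\psi_j(z)$, evaluated at $\lambda_i$, times the ith component of $\tilde q_1$. The
polynomials $\psi_j(z)$, $j = 1, 2, \ldots$, satisfy

$$\beta_j\psi_j(z) = (z - \alpha_j)\,\psi_{j-1}(z) - \beta_{j-1}\psi_{j-2}(z), \qquad (4.24)$$

where $\psi_{-1}(z) = 0$ and $\psi_0(z) = 1$. If we define the w-inner product of two
polynomials $\phi$ and $\psi$ by

$$(\phi(z), \psi(z))_w = \sum_{i=1}^{n} |\tilde q_{i1}|^2\,\phi(\lambda_i)\,\overline{\psi(\lambda_i)}, \qquad (4.25)$$

where $\tilde q_{i1}$ is the ith component of $\tilde q_1$, then the coefficients in the Lanczos
algorithm are given by

$$\alpha_j = (z\psi_{j-1}(z), \psi_{j-1}(z))_w, \qquad (4.26)$$

$$\beta_j = \|z\psi_{j-1}(z) - \alpha_j\psi_{j-1}(z) - \beta_{j-1}\psi_{j-2}(z)\|_w, \qquad (4.27)$$

where $\|\phi(z)\|_w = (\phi(z), \phi(z))_w^{1/2}$.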


Equation (4.24) with coefficient formulas (4.26-4.27) defines the orthonor-
mal polynomials for the measure corresponding to the w-inner product in
(4.25). It follows from the orthonormality of the Lanczos vectors that these
polynomials satisfy $(\psi_j(z), \psi_k(z))_w = \delta_{jk}$.
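The correspondence between Lanczos vectors and the polynomials $\psi_j$ can be checked directly. This sketch assumes a diagonal A (so U = I and $\tilde q_j = q_j$) with hypothetical equally spaced eigenvalues. It evaluates $\psi_0, \ldots, \psi_k$ at the eigenvalues with the recurrence $\beta_j\psi_j(z) = (z - \alpha_j)\psi_{j-1}(z) - \beta_{j-1}\psi_{j-2}(z)$, using the computed Lanczos coefficients. It then forms their Gram matrix in the w-inner product with weights $|q_{i1}|^2$. The helper `lanczos` is hypothetical.

```python
import numpy as np

def lanczos(A, q1, k):
    """Plain three-term Lanczos recurrence; returns Q (n x (k+1)), alpha (k,), beta (k,)."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

lam = np.linspace(1.0, 10.0, 12)       # hypothetical eigenvalues; A is diagonal, so U = I
A = np.diag(lam)
q1 = np.random.default_rng(5).standard_normal(lam.size)
q1 /= np.linalg.norm(q1)
k = 6
Q, alpha, beta = lanczos(A, q1, k)

# psi_j at the eigenvalues: beta_j psi_j = (z - alpha_j) psi_{j-1} - beta_{j-1} psi_{j-2}
psi = np.zeros((k + 1, lam.size))
psi[0] = 1.0
for j in range(1, k + 1):
    older = beta[j - 2] * psi[j - 2] if j >= 2 else 0.0
    psi[j] = ((lam - alpha[j - 1]) * psi[j - 1] - older) / beta[j - 1]

w = q1 ** 2                             # weights |q_{i1}|^2 of the w-inner product
gram = (psi * w) @ psi.T                # gram[j, l] = (psi_j, psi_l)_w
print(np.max(np.abs(gram - np.eye(k + 1))))
```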
A perturbation vector $f_j$ in the Lanczos algorithm, due to finite precision
arithmetic, corresponds to a perturbation $\tilde f_j = U^H f_j$ of the same size in (4.23).
The finite precision analogue of recurrence (4.24) is

where $\zeta_j(\lambda_i)\tilde q_{i1} = \tilde f_{ij}$. If we imagine that the coefficient formulas (4.26-4.27)
hold exactly in finite precision arithmetic, where the functions $\psi_j(z)$ now
come from the perturbed recurrence (4.28), we still find that the intended
orthogonality relation $(\psi_j(z), \psi_k(z))_w = \delta_{jk}$ may fail completely. (It is
reasonable to assume that the coefficient formulas (4.26-4.27) hold exactly,
since they can be implemented very accurately and any differences between the
exact formulas and the computed values can be included in the perturbation
term $\zeta_j(z)$.)
It is possible that some coefficient $\beta_j$ in a finite precision Lanczos
computation will be exactly 0 and that the recurrence will terminate, but this
is unlikely. If $\beta_j$ is not 0, then it is positive because of formula (4.27). It follows
from a theorem due to Favard [48] that the recurrence coefficients constructed
in a finite precision Lanczos computation are the exact recurrence coefficients
for the orthonormal polynomials corresponding to some nonnegative measure.
That is, if we define $p_{-1}(z) = 0$, $p_0(z) = 1$, and

$$\beta_j p_j(z) = (z - \alpha_j)\,p_{j-1}(z) - \beta_{j-1}p_{j-2}(z) \qquad (4.29)$$

for $j = 1, 2, \ldots$, where $\alpha_j$ and $\beta_j$ are defined by (4.26-4.28), then we have the
following theorem.
THEOREM 4.5.1 (Favard). If the coefficients $\beta_j$ in (4.29) are all positive
and the $\alpha_j$'s are real, then there is a measure $d\omega(z)$ such that

$$\int p_j(z)\,p_k(z)\,d\omega(z) = \delta_{jk}$$

for all $j, k = 0, 1, \ldots, \infty$.


The measure $d\omega(z)$ in Favard's theorem is (substantially) uniquely
determined, whereas there are infinitely many measures for which the first k
polynomials $p_0, \ldots, p_{k-1}$ are orthonormal. One such measure (a measure with
weights on the eigenvalues of $T_{k+1}$, the weights being the squared first
components of each eigenvector of $T_{k+1}$) was given in section 4.4.1, and another
such measure (a measure with weights on points in tiny intervals about the
eigenvalues of A) was given in section 4.4.2. It was also shown in [65] that the
weight on each interval is approximately equal to the original weight on the
corresponding eigenvalue of A; that is, the squared component of $\tilde q_1$. Thus,
the matrix completion result of section 4.4.2 can also be stated in the following
way: when one attempts to construct the first k orthonormal polynomials for
a measure corresponding to weights on discrete points using the Lanczos algo-
rithm, what one actually obtains are the first k orthonormal polynomials for a
slightly different measure—one in which the weights are smeared out over tiny
intervals about the original points. Exactly how the weights are distributed
over these intervals depends on exactly what rounding errors occur (not just
on their size).
It remains an open question whether the measure defined by Favard's
theorem has its support in such tiny intervals (i.e., whether $\delta$ in Theorem
4.4.3 can be taken to be small and independent of k). If this is not the case,
it might still be possible to show that the measure in Favard's theorem is tiny
everywhere outside such intervals.

Comments and Additional References.


It should come as no surprise that the Lanczos vectors and tridiagonal matrix
can be used for many purposes besides solving linear systems. For example,
the eigenvalues of $T_k$ can be taken as approximations to some of the eigenvalues
of A. It is given as an exercise to show that the orthogonal polynomials
defined in section 4.5 are the characteristic polynomials of the successive
tridiagonal matrices generated by the Lanczos algorithm. This interpretation
enables one to use known properties of the roots of orthogonal polynomials
to describe the eigenvalue approximations. In finite precision arithmetic, the
fact that the polynomials (or at least a finite sequence of these polynomials)
are orthogonal with respect to a slightly smeared-out version of the original
measure helps to explain the nature of eigenvalue approximations generated
during a finite precision Lanczos computation. Depending on how the tiny
intervals of Theorem 4.4.3 are distributed, the corresponding orthogonal
polynomials might have several roots in some of the intervals before having
any roots in some of the others. This is usually the case with an interval
corresponding to a large well-separated eigenvalue. This explains the observed
phenomenon of multiple close approximations to some eigenvalues appearing
in finite precision Lanczos computations before any approximations to some of
the other eigenvalues appear.
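Here is a sketch of the multiple-copies phenomenon. It uses the same assumed test matrix as the earlier examples ($\rho = .6$, n = 24) and a hypothetical `lanczos` helper. After 200 steps of the plain recurrence, well beyond n, several eigenvalues of $T_{200}$ typically cluster at the large, well-separated eigenvalue $\lambda_n = 1$. In exact arithmetic, by contrast, the process would terminate after at most n steps with one Ritz value per eigenvalue. The threshold $10^{-6}$ is an illustrative choice.

```python
import numpy as np

def lanczos(A, q1, k):
    """Plain three-term Lanczos recurrence; returns Q (n x (k+1)), alpha (k,), beta (k,)."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

# Assumed test matrix (rho = 0.6, n = 24).
n, rho = 24, 0.6
i = np.arange(1, n + 1)
lam = 1e-3 + (i - 1) / (n - 1) * (1 - 1e-3) * rho ** (n - i)
U, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))
A = U @ np.diag(lam) @ U.T

k = 200
Q, alpha, beta = lanczos(A, np.random.default_rng(1).standard_normal(n), k)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
theta = np.linalg.eigvalsh(T)
copies = int(np.sum(np.abs(theta - lam[-1]) < 1e-6))   # Ritz values at lambda_n
print(f"{copies} eigenvalues of T_{k} lie within 1e-6 of lambda_n = {lam[-1]}")
```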
The Lanczos vectors and tridiagonal matrix can also be used very effectively
to compute the matrix exponential $\exp(tA)\varphi$, which is the solution at time t
to the system of differential equations $y' = Ay$, $y(0) = \varphi$. Similar arguments
to those used here show why the nonorthogonal vectors generated by a finite
precision Lanczos computation can still be used effectively for this purpose
[34]. For a number of other applications, including discussions of the effects of
finite precision arithmetic, see, for example, [35, 60].
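One common way to use the Lanczos quantities for this purpose is the approximation $\exp(tA)\varphi \approx \|\varphi\|\,Q_k \exp(tT_k)\xi_1$. The sketch below applies it with the plain (non-reorthogonalized) recurrence to a hypothetical diagonal test matrix with eigenvalues in [-10, 0], for which the exact answer is known componentwise. The matrix $\exp(tT_k)$ is formed from the eigendecomposition of the symmetric matrix $T_k$. The helper names are illustrative.

```python
import numpy as np

def lanczos(A, q1, k):
    """Plain three-term Lanczos recurrence; returns Q (n x (k+1)), alpha (k,), beta (k,)."""
    n = len(q1)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

def expm_lanczos(A, phi, t, k):
    """Approximate exp(tA) phi by ||phi|| Q_k exp(t T_k) xi_1."""
    Q, alpha, beta = lanczos(A, phi, k)
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    theta, S = np.linalg.eigh(T)
    # exp(tT) xi_1 = S diag(exp(t theta)) S^T xi_1, and S^T xi_1 is the first row of S.
    y = S @ (np.exp(t * theta) * S[0, :])
    return np.linalg.norm(phi) * (Q[:, :k] @ y)

d = np.linspace(-10.0, 0.0, 200)        # hypothetical spectrum; A is diagonal
A = np.diag(d)
phi = np.random.default_rng(6).standard_normal(d.size)
approx = expm_lanczos(A, phi, 1.0, 30)
exact = np.exp(d) * phi                 # exp(A) phi for diagonal A
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(rel_err)
```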
The effect of rounding errors on the CG algorithm has been a subject
of concern since the algorithm was first introduced in 1952 by Hestenes and
Stiefel [79]. It was recognized at that time that the algorithm did not always
behave the way exact arithmetic theory predicted. For example, Engeli et
al. [43] applied the CG method (without a preconditioner) to the biharmonic
equation and observed that convergence did not occur until well after step n.
For this and other reasons, the algorithm did not gain widespread popularity
at that time.
With the idea of preconditioning in the CG method, interest in this
algorithm was revived in the early 1970s [115, 27], and it quickly became
the method of choice for computations involving large Hermitian positive
definite matrices. Whatever the effect of roundoff, it was observed that the
method performed very well in comparison to other iterative methods. Further
attempts were made to explain the success of the method, mostly using the
interpretation given in section 2.3 that the algorithm minimizes the A-norm
of the error in a plane that includes the direction of steepest descent. Using
this argument, Wozniakowski [143] showed that a special version of the CG
algorithm does, indeed, reduce the A-norm of the error at each step by at least
as much as a steepest descent step, even in finite precision arithmetic. Cullum
and Willoughby [30] proved a similar result for a more standard version of the

algorithm. Still, a more global approach was needed to explain why the CG
algorithm converges so much faster than the method of steepest descent; e.g.,
it converges at least as fast as the Chebyshev algorithm. Paige's work on the
Lanczos algorithm [109] provided a key in this direction. A number of analyses
were developed to explain the behavior of the CG algorithm using information
from the entire computation (i.e., the matrix equation (2.23)), instead of just
one or two steps (e.g., [35, 62, 65, 121]). The analogy developed in this chapter,
identifying the finite precision computation with the exact algorithm applied to
a different matrix, appears to be very effective in explaining and predicting the
behavior of the CG algorithm in finite precision arithmetic [71]. The numerical
examples presented in section 4.1 were first presented in [126].

Exercises.
4.1. Show that the orthonormal polynomials defined by (4.24) are the
characteristic polynomials of the tridiagonal matrices generated by the
Lanczos algorithm.
4.2. How must the error bound you derived in Exercise 3.1 for a matrix
with a small, well-separated eigenvalue be modified for finite precision
arithmetic? Does the finite precision error bound differ more from that
of exact arithmetic in the case when a positive definite coefficient matrix
has one eigenvalue much smaller than the others or in the case when it
has one eigenvalue much larger than the others? (This comparison can
be used to explain why one preconditioner might be considered better
based on exact arithmetic theory, but a different preconditioner might
perform better in actual computations. See [133] for a comparison of
incomplete Cholesky and modified incomplete Cholesky decompositions,
which will be discussed in Chapter 11.)
Chapter 5

BiCG and Related Methods

Since the GMRES method for non-Hermitian problems requires increasing
amounts of work and storage per iteration, it is important to consider other
methods with a fixed amount of work and storage, even though they will
require more iterations to reduce the 2-norm of the residual to a given level.
Several such methods have already been presented, e.g., simple iteration,
Orthomin(j), and GMRES(j). All have the possibility of failure: simple
iteration may diverge, Orthomin(j) may encounter an undefined coefficient,
and both Orthomin(j) and GMRES(j) may stagnate (cease to reduce the
residual norm).
In this chapter we consider several other iteration methods that, in practice,
have often been found to perform better than the previously listed algorithms.
These algorithms also have the possibility of failure, although that can be
alleviated through the use of look-ahead. With look-ahead, however, the
methods no longer require a fixed amount of work and storage per iteration.
The work and storage grow with the number of look-ahead steps, just as they
grow in GMRES. Unfortunately, there are no a priori theoretical estimates
comparing the error at each step of these methods to that of the optimal
GMRES approximation, unless an unlimited number of look-ahead steps are
allowed. This problem is discussed further in Chapter 6.

5.1. The Two-Sided Lanczos Algorithm.


When the matrix A is Hermitian, the Gram-Schmidt procedure for con-
structing an orthonormal basis for the Krylov space of A reduces to a
3-term recurrence. Unfortunately, this is not the case when A is non-Hermitian.
One can, however, use a pair of 3-term recurrences, one involving A and the
other involving $A^H$, to construct biorthogonal bases for the Krylov spaces
corresponding to A and $A^H$. Let $\mathcal{K}_k(B, v)$ denote the Krylov space
$\mathrm{span}\{v, Bv, \ldots, B^{k-1}v\}$. Then one constructs two sets of vectors,
$v_1, \ldots, v_k \in \mathcal{K}_k(A, r_0)$ and $w_1, \ldots, w_k \in \mathcal{K}_k(A^H, \hat r_0)$, such that
$(v_i, w_j) = 0$ for $i \ne j$. This procedure is called the two-sided Lanczos algorithm.


Two-Sided Lanczos Algorithm (without look-ahead).

Given $r_0$ and $\hat r_0$ with $(r_0, \hat r_0) \ne 0$, set $v_1 = r_0/\|r_0\|$ and $w_1 = \hat r_0/(\hat r_0, v_1)$.

Set $\beta_0 = \gamma_0 = 0$ and $v_0 = w_0 = 0$. For $j = 1, 2, \ldots$,

Compute $Av_j$ and $A^H w_j$.

Set $\alpha_j = (Av_j, w_j)$.

Compute

Set

Set

Here we have given the non-Hermitian Lanczos formulation that scales so that
each basis vector $v_j$ has norm 1 and $(w_j, v_j) = 1$. The scaling of the basis
vectors can be chosen differently. Another formulation of the algorithm uses
the ordinary transpose $A^T$ instead of $A^H$.
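One possible real-arithmetic (transpose) version of the two-sided recurrence with this scaling is sketched below. It is an illustration under stated assumptions, not necessarily the exact formulas of the listing. It takes $\gamma_j = \|\tilde v_{j+1}\|$ and $\beta_j = \tilde w_{j+1}^T v_{j+1}$, so that $Av_j = \gamma_j v_{j+1} + \alpha_j v_j + \beta_{j-1}v_{j-1}$ and $A^T w_j = \beta_j w_{j+1} + \alpha_j w_j + \gamma_{j-1}w_{j-1}$. The function name is hypothetical. Termination and breakdown are detected only by exact zeros, whereas a practical code would test against a tolerance.

```python
import numpy as np

def two_sided_lanczos(A, r0, rt0, k):
    """Real two-sided Lanczos without look-ahead, scaled so ||v_j|| = 1 and w_j^T v_j = 1.
    Returns V, W (n x (k+1)) and alpha, beta, gamma (each of length k)."""
    n = len(r0)
    V, W = np.zeros((n, k + 1)), np.zeros((n, k + 1))
    alpha, beta, gamma = np.zeros(k), np.zeros(k), np.zeros(k)
    V[:, 0] = r0 / np.linalg.norm(r0)
    s = rt0 @ V[:, 0]
    if s == 0:
        raise ValueError("need (r0, rt0) != 0")
    W[:, 0] = rt0 / s
    for j in range(k):
        Av, Atw = A @ V[:, j], A.T @ W[:, j]
        alpha[j] = W[:, j] @ Av
        vt = Av - alpha[j] * V[:, j]
        wt = Atw - alpha[j] * W[:, j]
        if j > 0:
            vt -= beta[j - 1] * V[:, j - 1]
            wt -= gamma[j - 1] * W[:, j - 1]
        gamma[j] = np.linalg.norm(vt)
        if gamma[j] == 0:
            raise RuntimeError("regular termination: A-invariant subspace found")
        V[:, j + 1] = vt / gamma[j]
        beta[j] = wt @ V[:, j + 1]
        if beta[j] == 0:
            raise RuntimeError("left invariant subspace or serious breakdown")
        W[:, j + 1] = wt / beta[j]
    return V, W, alpha, beta, gamma

rng = np.random.default_rng(3)
n, k = 40, 8
A = rng.standard_normal((n, n)) / np.sqrt(n)      # arbitrary nonsymmetric test matrix
r0, rt0 = rng.standard_normal(n), rng.standard_normal(n)
V, W, alpha, beta, gamma = two_sided_lanczos(A, r0, rt0, k)

T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(gamma[:-1], -1)
recurrence_err = np.linalg.norm(A @ V[:, :k] - V[:, :k] @ T
                                - gamma[-1] * np.outer(V[:, k], np.eye(k)[-1]))
biorth_err = np.linalg.norm(W.T @ V - np.eye(k + 1))
print(recurrence_err, biorth_err)
```

With a symmetric matrix and $\hat r_0 = r_0$, the same routine should return $w_j \approx v_j$ and $\beta_j \approx \gamma_j$, consistent with the reduction to the Hermitian process noted below.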
Letting $V_k$ be the matrix with columns $v_1, \ldots, v_k$ and $W_k$ be the matrix
with columns $w_1, \ldots, w_k$, this pair of recurrences can be written in matrix form
as

where $T_k$ is the k-by-k tridiagonal matrix of recurrence coefficients

The (k+1)-by-k matrices $T_{k+1,k}$ and $\hat T_{k+1,k}$ have $T_k$ and $\hat T_k$, respectively, as
their top k-by-k blocks, and their last rows consist of zeros except for the last
entry, which is $\gamma_k$ and $\beta_k$, respectively. The biorthogonality condition implies
that

Note that if $A = A^H$ and $\hat r_0 = r_0$, then the two-sided Lanczos recurrence
reduces to the ordinary Hermitian Lanczos process.
THEOREM 5.1.1. If the two-sided Lanczos vectors are defined at steps

Proof. Assume that (5.4) holds for $i, j \le k$. The choice of the coefficients
$\beta_j$ and $\gamma_j$ assures that for all j, $(w_j, v_j) = 1$ and $\|v_j\| = 1$. By construction of
the coefficient $\alpha_k$, we have, using the induction hypothesis,

Using the recurrences for $\tilde v_{k+1}$ and $w_k$ along with the induction hypothesis, we
have

and, similarly, it follows that $(\tilde w_{k+1}, v_{k-1}) = 0$. Finally, for $j < k - 1$, we have

and, similarly, it is seen that $(\tilde w_{k+1}, v_j) = 0$. Since $v_{k+1}$ and $w_{k+1}$ are just
multiples of $\tilde v_{k+1}$ and $\tilde w_{k+1}$, the result (5.4) is proved.
The vectors generated by the two-sided Lanczos process can become
undefined in two different situations. First, if $\tilde v_{j+1} = 0$ or $\tilde w_{j+1} = 0$, then
the Lanczos algorithm has found an invariant subspace. If $\tilde v_{j+1} = 0$, then the
right Lanczos vectors $v_1, \ldots, v_j$ form an A-invariant subspace. If $\tilde w_{j+1} = 0$,
then the left Lanczos vectors $w_1, \ldots, w_j$ form an $A^H$-invariant subspace. This
is referred to as regular termination.
The second case, referred to as serious breakdown, occurs when
$(\tilde v_{j+1}, \tilde w_{j+1}) = 0$ but neither $\tilde v_{j+1} = 0$ nor $\tilde w_{j+1} = 0$. In this case, nonzero
vectors $v_{j+1} \in \mathcal{K}_{j+1}(A, r_0)$ and $w_{j+1} \in \mathcal{K}_{j+1}(A^H, \hat r_0)$ satisfying
$(v_{j+1}, w_i) = (w_{j+1}, v_i) = 0$ for all $i \le j$ and $(v_{j+1}, w_{j+1}) \ne 0$ simply do not exist.
Note, however, that while such vectors may not exist at step j + 1, at some later
step $j + \ell$ there may be nonzero vectors $v_{j+\ell} \in \mathcal{K}_{j+\ell}(A, r_0)$ and $w_{j+\ell} \in \mathcal{K}_{j+\ell}(A^H, \hat r_0)$
such that $v_{j+\ell}$ is orthogonal to $\mathcal{K}_{j+\ell-1}(A^H, \hat r_0)$ and $w_{j+\ell}$ is orthogonal to $\mathcal{K}_{j+\ell-1}(A, r_0)$.
Procedures that simply skip steps at which the Lanczos vectors are undefined
and construct the Lanczos vectors for the steps at which they are defined are
referred to as look-ahead Lanczos methods. We will not discuss look-ahead
Lanczos methods here but refer the reader to [101, 113, 20] for details.

5.2. The Biconjugate Gradient Algorithm.


Let us assume for the moment that the Lanczos recurrence does not break
down. (From here on, we will refer to the two-sided Lanczos algorithm as
simply the Lanczos algorithm, since it is the only Lanczos algorithm for
non-Hermitian matrices.) Then the basis vectors might be used to approximate the
solution of a linear system, as was done in the Hermitian case. If $x_k$ is taken
to be of the form

$$x_k = x_0 + V_k y_k, \tag{5.5}$$

then there are several natural ways to choose the vector $y_k$. One choice is
to force $r_k = r_0 - A V_k y_k$ to be orthogonal to $w_1, \dots, w_k$. This leads to the
equation

$$W_k^H (r_0 - A V_k y_k) = 0.$$

It follows from (5.1) and the biorthogonality condition (5.3) that $W_k^H A V_k =
T_k$ and that $W_k^H r_0 = \beta \xi_1$, $\beta = \|r_0\|$, so the equation for $y_k$ becomes

$$T_k y_k = \beta \xi_1. \tag{5.6}$$
When $A$ is Hermitian and $\hat r_0 = r_0$, this reduces to the CG algorithm, as was
described in section 2.5. If $T_k$ is singular, this equation may have no solution.
In this case, there is no approximation $x_k$ of the form (5.5) for which $W_k^H r_k = 0$.
Note that this type of failure is different from the possible breakdown of the
underlying Lanczos recurrence. The Lanczos vectors may be well defined, but if
the tridiagonal matrix is singular or near singular, an algorithm that attempts
to solve this linear system will have difficulty. An algorithm that attempts to
generate approximations of the form (5.5), where $y_k$ satisfies (5.6), is called
the biconjugate gradient (BiCG) algorithm.
The BiCG algorithm can be derived from the non-Hermitian Lanczos
process and the $LDU$-factorization of the tridiagonal matrix $T_k$ in much
the same way that the CG algorithm was derived from the Hermitian
Lanczos process in section 2.5. As with CG, the BiCG method fails
when a singular tridiagonal matrix is encountered, and, with the
standard implementation of the algorithm, one cannot recover from a singular
tridiagonal matrix at one step, even if later tridiagonal matrices are well
conditioned. We will not carry out this derivation, since it is essentially the
same as that in section 2.5, but will simply state the algorithm as follows.

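Biconjugate Gradient Algorithm (BiCG).

Given $x_0$, compute $r_0 = b - A x_0$, and set $p_0 = r_0$. Choose $\hat r_0$ such
that $\langle r_0, \hat r_0 \rangle \neq 0$, and set $\hat p_0 = \hat r_0$. For $k = 1, 2, \dots$,
Set $x_k = x_{k-1} + a_{k-1} p_{k-1}$, where $a_{k-1} = \dfrac{\langle r_{k-1}, \hat r_{k-1} \rangle}{\langle A p_{k-1}, \hat p_{k-1} \rangle}$.
Compute $r_k = r_{k-1} - a_{k-1} A p_{k-1}$ and $\hat r_k = \hat r_{k-1} - \bar a_{k-1} A^H \hat p_{k-1}$.
Set $p_k = r_k + b_k p_{k-1}$ and $\hat p_k = \hat r_k + \bar b_k \hat p_{k-1}$, where $b_k = \dfrac{\langle r_k, \hat r_k \rangle}{\langle r_{k-1}, \hat r_{k-1} \rangle}$.

A better implementation can be derived from that of the QMR algorithm
described in section 5.3.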
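A sketch of BiCG for real matrices (so $A^H = A^T$ and the conjugations disappear) follows. It uses the coefficient formulas $a_{k-1} = \langle r_{k-1}, \hat r_{k-1} \rangle / \langle A p_{k-1}, \hat p_{k-1} \rangle$ and $b_k = \langle r_k, \hat r_k \rangle / \langle r_{k-1}, \hat r_{k-1} \rangle$. The function name `bicg`, the stopping test $\|r_k\| \le \mathtt{tol} \cdot \|b\|$, and the decision to raise an exception on an exactly zero denominator are illustrative assumptions; the sketch makes no attempt to recover from breakdown.

```python
import math


def bicg(A, b, x0, rhat0, maxiter=100, tol=1e-10):
    """BiCG sketch for real A (hypothetical helper). Returns (x, iterations)."""
    At = [list(col) for col in zip(*A)]

    def mv(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def dot(x, y):
        return sum(a * c for a, c in zip(x, y))

    x = list(x0)
    r = [bi - ax for bi, ax in zip(b, mv(A, x))]
    p, rhat, phat = list(r), list(rhat0), list(rhat0)
    rho = dot(r, rhat)                          # <r_{k-1}, rhat_{k-1}>
    bnorm = math.sqrt(dot(b, b))
    for k in range(1, maxiter + 1):
        Ap, Atphat = mv(A, p), mv(At, phat)     # one product with A and one with A^T
        denom = dot(Ap, phat)                   # <A p_{k-1}, phat_{k-1}>
        if denom == 0.0 or rho == 0.0:
            raise ZeroDivisionError("BiCG breakdown")
        a = rho / denom
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * api for ri, api in zip(r, Ap)]
        rhat = [ri - a * ai for ri, ai in zip(rhat, Atphat)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, k
        rho_new = dot(r, rhat)
        bk = rho_new / rho
        p = [ri + bk * pi for ri, pi in zip(r, p)]
        phat = [ri + bk * pi for ri, pi in zip(rhat, phat)]
        rho = rho_new
    return x, maxiter
```

With a real symmetric positive definite $A$ and $\hat r_0 = r_0$, the hatted sequences coincide with the unhatted ones and the loop performs the CG iteration, at the price of one unnecessary extra product per step.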

5.3. The Quasi-Minimal Residual Algorithm.


In the quasi-minimal residual (QMR) algorithm, the approximate solution $x_k$ is
again taken to be of the form (5.5), but now $y_k$ is chosen to minimize a quantity
that is closely related to the 2-norm of the residual. Since $r_k = r_0 - A V_k y_k$,
we can write

$$r_k = V_{k+1} \left( \beta \xi_1 - T_{k+1,k} \, y_k \right), \tag{5.7}$$

so the norm of $r_k$ satisfies

$$\|r_k\| \le \|V_{k+1}\| \, \|\beta \xi_1 - T_{k+1,k} \, y_k\|. \tag{5.8}$$

Since the columns of $V_{k+1}$ are not orthogonal, it would be difficult to choose $y_k$
to minimize $\|r_k\|$, but $y_k$ can easily be chosen to minimize the second factor in
(5.8). Since the columns of $V_{k+1}$ each have norm one, the first factor in (5.8)
satisfies $\|V_{k+1}\| \le \sqrt{k+1}$. In the QMR method, $y_k$ solves the least squares
problem

$$\min_y \|\beta \xi_1 - T_{k+1,k} \, y\|, \tag{5.9}$$

which always has a solution, even if the tridiagonal matrix $T_k$ is singular. Thus
the QMR iterates are defined provided that the underlying Lanczos recurrence
does not break down.
The norm of the QMR residual can be related to that of the optimal
GMRES residual as follows.
THEOREM 5.3.1 (Nachtigal [101]). If $r_k^G$ denotes the GMRES residual at
step $k$ and $r_k^Q$ denotes the QMR residual at step $k$, then

$$\|r_k^Q\| \le \kappa(V_{k+1}) \, \|r_k^G\|, \tag{5.10}$$

where $V_{k+1}$ is the matrix of basis vectors for the space $\mathcal{K}_{k+1}(A, r_0)$ constructed
by the Lanczos algorithm and $\kappa(\cdot)$ denotes the condition number.
Proof. The GMRES residual is also of the form (5.7), but the vector $y_k^G$ is
chosen to minimize the 2-norm of the GMRES residual. It follows that

$$\|r_k^G\| = \min_y \|V_{k+1} (\beta \xi_1 - T_{k+1,k} \, y)\| \ge \sigma_{\min}(V_{k+1}) \, \min_y \|\beta \xi_1 - T_{k+1,k} \, y\|,$$

where $\sigma_{\min}(V_{k+1})$ is the smallest singular value. Combining this with inequality
(5.8) for the QMR residual gives the desired result (5.10).
Unfortunately, the condition number of the basis vectors $V_{k+1}$ produced
by the non-Hermitian Lanczos algorithm cannot be bounded a priori. This
matrix may be ill conditioned, even if the Lanczos vectors are well defined. If
one could devise a short recurrence that would generate well-conditioned basis
vectors, then one could use the quasi-minimization strategy (5.9) to solve the
problem addressed in Chapter 6.
The actual implementation of the QMR algorithm, without saving all of
the Lanczos vectors, is similar to that of the MINRES algorithm described in
section 2.5. The least squares problem (5.9) is solved by factoring the $(k+1)$-by-$k$
matrix $T_{k+1,k}$ into the product of a $(k+1)$-by-$(k+1)$ unitary matrix $F^H$
and a $(k+1)$-by-$k$ upper triangular matrix $R$. This is accomplished by using
$k$ Givens rotations $F_1, \dots, F_k$, where $F_i$ rotates the unit vectors $\xi_i$ and $\xi_{i+1}$
through angle $\theta_i$. Since $T_{k+1,k}$ is tridiagonal, $R$ has the form

The QR decomposition of $T_{k+1,k}$ is easily updated from that of $T_{k,k-1}$. To
obtain $R$, first premultiply the last column of $T_{k+1,k}$ by the rotations from
steps $k-2$ and $k-1$ to obtain a matrix of the form

where the $\times$'s denote nonzeros, $d$ denotes the resulting $(k,k)$-entry, and the $(k+1,k)$-entry, $h$, is just $\gamma_k$,
since this entry is unaffected by the previous rotations. The next rotation, $F_k$,
is chosen to annihilate this entry by setting $c_k = |d| / \sqrt{|d|^2 + |h|^2}$, $s_k = c_k h / d$
if $d \neq 0$, and $c_k = 0$, $s_k = 1$ if $d = 0$. To solve the least squares problem,
the successive rotations are also applied to the right-hand side vector $\beta \xi_1$ to
obtain $g = F_k \cdots F_1 \beta \xi_1$. Clearly, $g$ differs from the corresponding vector at
step $k-1$ only in positions $k$ and $k+1$. If $R_{k \times k}$ denotes the top $k$-by-$k$ block
of $R$ and $g_{k \times 1}$ denotes the first $k$ entries of $g$, then the solution to the least
squares problem is the solution of the triangular linear system

$$R_{k \times k} \, y_k = g_{k \times 1}.$$
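The rotation defined by $c_k = |d| / \sqrt{|d|^2 + |h|^2}$, $s_k = c_k h / d$ (and $c_k = 0$, $s_k = 1$ when $d = 0$) can be coded as follows for real data; the complex case needs conjugates and is not covered. Computing $\sqrt{d^2 + h^2}$ with `math.hypot` avoids the intermediate overflow that squaring can cause, and the sign of $s_k$ is taken from $d$ instead of dividing by $d$, which gives the same value in exact arithmetic. The helper names `rotation` and `apply_rotation` are hypothetical.

```python
import math


def rotation(d, h):
    """Real Givens rotation (c, s), c >= 0, that maps (d, h) to (+-sqrt(d^2 + h^2), 0)."""
    if d == 0.0:
        return 0.0, 1.0
    rho = math.hypot(d, h)                 # sqrt(d**2 + h**2) without intermediate overflow
    c = abs(d) / rho
    s = math.copysign(1.0, d) * h / rho    # equals c * h / d in exact arithmetic
    return c, s


def apply_rotation(c, s, top, bottom):
    """Apply the 2-by-2 rotation [[c, s], [-s, c]] to the pair (top, bottom)."""
    return c * top + s * bottom, -s * top + c * bottom
```

The same pair $(c_k, s_k)$ is applied both to the last column of $T$ and to entries $k$ and $k+1$ of the right-hand side vector.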

In order to update the iterates $x_k$, we define auxiliary vectors as the columns of

$$P_k = V_k R_{k \times k}^{-1}.$$

Then since

and

we can write

where $a_{k-1}$ is the $k$th entry of $g$. Finally, from the equation $P_k R_{k \times k} = V_k$, we
can update the auxiliary vectors using

This leads to the following implementation of the QMR algorithm.

Algorithm 5. Quasi-Minimal Residual Method (QMR)
(without look-ahead).

Given $x_0$, compute $r_0 = b - A x_0$ and set $v_1 = r_0 / \|r_0\|$.
Given $\hat r_0$, set $w_1 = \hat r_0 / \|\hat r_0\|$. Initialize $\xi = (1, 0, \dots, 0)^T$, $\beta = \|r_0\|$.
For $k = 1, 2, \dots$,
Compute $v_{k+1}$, $w_{k+1}$, $\alpha_k$, and the remaining new entries of $T$, using the two-sided Lanczos algorithm.
Apply $F_{k-2}$ and $F_{k-1}$ to the last column of $T$; that is,

Compute the $k$th rotation, $c_k$ and $s_k$, to annihilate the $(k+1,k)$ entry of $T$.¹
Apply the $k$th rotation to $\xi$ and to the last column of $T$:

Compute

where undefined terms are zero.

¹The formula is $c_k = |T(k,k)| / \sqrt{|T(k,k)|^2 + |T(k+1,k)|^2}$, $s_k = c_k T(k+1,k) / T(k,k)$, but a
more robust implementation should be used. See, for example, BLAS routine DROTG [32].
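For experiments it can help to compute a QMR iterate non-incrementally: run $k$ steps of the two-sided Lanczos process, form $T_{k+1,k}$, solve (5.9) with Givens rotations, and set $x_k = x_0 + V_k y_k$. The sketch below does this for real matrices. It is not Algorithm 5 itself, since it stores all Lanczos vectors instead of updating $x_k$ with short recurrences, and it has no look-ahead. It assumes the normalization $\|v_j\| = 1$, $\langle v_j, w_j \rangle = 1$ and the recurrence $A v_j = \beta_{j-1} v_{j-1} + \alpha_j v_j + \gamma_j v_{j+1}$, so that $T_{k+1,k}$ has diagonal $\alpha_j$, superdiagonal $\beta_j$, and subdiagonal $\gamma_j$; this coefficient naming and the helper name `qmr_iterate` are assumptions. The second return value is the minimum in (5.9), that is, $|g_{k+1}|$.

```python
import math


def qmr_iterate(A, b, x0, rhat0, k):
    """Step-k QMR iterate for real A, computed non-incrementally (hypothetical helper).

    Returns (x_k, minimal value of ||beta*xi_1 - T_{k+1,k} y||). Raises on breakdown.
    """
    n = len(b)
    At = [list(col) for col in zip(*A)]

    def mv(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def dot(x, y):
        return sum(p * q for p, q in zip(x, y))

    r0 = [bi - ai for bi, ai in zip(b, mv(A, x0))]
    beta0 = math.sqrt(dot(r0, r0))
    V = [[x / beta0 for x in r0]]
    W = [[x / dot(V[0], rhat0) for x in rhat0]]   # <v_1, w_1> = 1
    T = [[0.0] * k for _ in range(k + 1)]         # T_{k+1,k}, overwritten by R below
    v_old, w_old, beta_old, gamma_old = [0.0] * n, [0.0] * n, 0.0, 0.0
    for j in range(k):
        Av, Atw = mv(A, V[j]), mv(At, W[j])
        alpha = dot(Av, W[j])
        vt = [a - alpha * x - beta_old * y for a, x, y in zip(Av, V[j], v_old)]
        gamma = math.sqrt(dot(vt, vt))
        T[j][j], T[j + 1][j] = alpha, gamma
        if j > 0:
            T[j - 1][j] = beta_old
        if j == k - 1:
            break                                  # v_{k+1} itself is not needed for x_k
        wt = [a - alpha * x - gamma_old * y for a, x, y in zip(Atw, W[j], w_old)]
        if gamma == 0.0:
            raise ArithmeticError("invariant subspace reached before step k")
        v_new = [x / gamma for x in vt]
        beta = dot(v_new, wt)
        if beta == 0.0:
            raise ArithmeticError("serious Lanczos breakdown")
        V.append(v_new)
        W.append([x / beta for x in wt])
        v_old, w_old, beta_old, gamma_old = V[j], W[j], beta, gamma
    # Least squares problem (5.9) by Givens QR of the Hessenberg matrix T_{k+1,k}.
    g = [beta0] + [0.0] * k
    for i in range(k):
        d, h = T[i][i], T[i + 1][i]
        if d == 0.0:
            c, s = 0.0, 1.0
        else:
            rho = math.hypot(d, h)
            c, s = abs(d) / rho, math.copysign(1.0, d) * h / rho
        for col in range(i, k):
            top, bot = T[i][col], T[i + 1][col]
            T[i][col], T[i + 1][col] = c * top + s * bot, -s * top + c * bot
        g[i], g[i + 1] = c * g[i] + s * g[i + 1], -s * g[i] + c * g[i + 1]
    y = [0.0] * k
    for i in reversed(range(k)):                   # back substitution with R_{k x k}
        y[i] = (g[i] - sum(T[i][m] * y[m] for m in range(i + 1, k))) / T[i][i]
    x = [x0[m] + sum(V[j][m] * y[j] for j in range(k)) for m in range(n)]
    return x, abs(g[k])
```

Calling it for $k = 1, 2, \dots$ shows the minimum in (5.9) never increasing from one step to the next, while the true residual norm stays within the factor $\sqrt{k+1}$ of it.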

5.4. Relation Between BiCG and QMR.


The observant reader may have noted that the solutions of the linear system
(5.6) and the least squares problem (5.9) are closely related. Hence one might
expect a close relationship between the residual norms in the BiCG and QMR
algorithms. Here we establish such a relationship, assuming that the Lanczos
vectors are well defined and that the tridiagonal matrix in (5.6) is nonsingular.
We begin with a general theorem about the relationship between upper
Hessenberg linear systems and least squares problems. Let $H_k$, $k = 1, 2, \dots$,
denote a family of upper Hessenberg matrices, where $H_k$ is $k$-by-$k$ and $H_{k-1}$ is
the $(k-1)$-by-$(k-1)$ principal submatrix of $H_k$. For each $k$, define the $(k+1)$-by-$k$
matrix $H_{k+1,k}$ by

$$H_{k+1,k} = \begin{pmatrix} H_k \\ h_{k+1,k} \, \xi_k^T \end{pmatrix}.$$
The matrix $H_{k+1,k}$ can be factored in the form $F^H R$, where $F$ is a $(k+1)$-by-$(k+1)$
unitary matrix and $R$ is a $(k+1)$-by-$k$ upper triangular matrix. This
factorization can be performed using plane rotations in the manner described
for the GMRES algorithm in section 2.4:

Note that the first $k-1$ sines and cosines $s_i$, $c_i$, $i = 1, \dots, k-1$, are those used
in the factorization of $H_{k,k-1}$.
Let $\beta > 0$ be given and assume that $H_k$ is nonsingular. Let $y_k^L$ denote the
solution of the linear system $H_k y = \beta \xi_1$, and let $y_k^G$ denote the solution of the
least squares problem $\min_y \|H_{k+1,k} \, y - \beta \xi_1\|$. Finally, let

$$v_k^L = H_{k+1,k} \, y_k^L - \beta \xi_1, \qquad v_k^G = H_{k+1,k} \, y_k^G - \beta \xi_1.$$
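LEMMA 5.4.1. Using the above notation, the norms of $v_k^G$ and $v_k^L$ are related
to the sines and cosines of the Givens rotations by

$$\|v_k^G\| = \beta \, |s_1 \cdots s_k|, \qquad \|v_k^L\| = \frac{\beta \, |s_1 \cdots s_k|}{|c_k|}. \tag{5.13}$$

It follows that

$$\|v_k^L\| = \frac{\|v_k^G\|}{\sqrt{1 - \left( \|v_k^G\| / \|v_{k-1}^G\| \right)^2}}. \tag{5.14}$$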
Proof. The least squares problem with the extended Hessenberg matrix
$H_{k+1,k}$ can be written in the form

$$\min_y \|H_{k+1,k} \, y - \beta \xi_1\| = \min_y \|F^H (R y - \beta F \xi_1)\| = \min_y \|R y - \beta F \xi_1\|,$$

and the solution $y_k^G$ is determined by solving the upper triangular linear
system with coefficient matrix equal to the top $k$-by-$k$ block of $R$ and right-hand
side equal to the first $k$ entries of $\beta F \xi_1$. The remainder $R y_k^G - \beta F \xi_1$
is therefore zero except for the last entry, which is just the last entry of
$-\beta F \xi_1 = -\beta (F_k \cdots F_1) \xi_1$, which is easily seen to be $-\beta s_1 \cdots s_k$. This
establishes the first equality in (5.13).
For the linear system solution $y_k^L = H_k^{-1} \beta \xi_1$, we have

$$v_k^L = \begin{pmatrix} H_k y_k^L - \beta \xi_1 \\ h_{k+1,k} \, \xi_k^T y_k^L \end{pmatrix} = \begin{pmatrix} 0 \\ \beta \, h_{k+1,k} \, \xi_k^T H_k^{-1} \xi_1 \end{pmatrix},$$

which is zero except for the last entry, which is $\beta h_{k+1,k}$ times the $(k,1)$-entry
of $H_k^{-1}$. Now $H_k$ can be factored in the form $\tilde F^H \tilde R$, where $\tilde F = \tilde F_{k-1} \cdots \tilde F_1$ and
$\tilde F_i$ is the $k$-by-$k$ principal submatrix of $F_i$. The matrix $H_{k+1,k}$, after applying
the first $k-1$ plane rotations, has the form

where $r$ is the $(k,k)$-entry of $\tilde R$ and $h = h_{k+1,k}$. The $k$th rotation is chosen to
annihilate the nonzero entry in the last row:

Note that $r$ and $c_k$ are nonzero since $H_k$ is nonsingular.
We can write $H_k^{-1} = \tilde R^{-1} \tilde F$, and the $(k,1)$-entry of this is $1/r$ times the
$(k,1)$-entry of $\tilde F = \tilde F_{k-1} \cdots \tilde F_1$, and this is just $s_1 \cdots s_{k-1}$. It follows that
the nonzero entry of $v_k^L$ is $\beta (h_{k+1,k} / r) \, s_1 \cdots s_{k-1}$. Finally, using the fact that
$|s_k / c_k| = |h / r| = |h_{k+1,k} / r|$, we obtain the second equality in (5.13).
From (5.13) it is clear that

$$|s_k| = \frac{\|v_k^G\|}{\|v_{k-1}^G\|}, \qquad \|v_k^L\| = \frac{\|v_k^G\|}{|c_k|}.$$

The result (5.14) follows upon replacing $|c_k|$ by $\sqrt{1 - |s_k|^2}$.
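Lemma 5.4.1 is easy to check numerically. The sketch below, for real data, applies rotations with $c_i = |d| / \sqrt{d^2 + h^2} \ge 0$ to a small $(k+1)$-by-$k$ upper Hessenberg matrix. After the first $k-1$ rotations the top $k$-by-$k$ block is already upper triangular, and back substitution there solves the square system $H_k y = \beta \xi_1$; the $k$th rotation then yields the least squares solution. The function returns both residual norms together with the sines and cosines, so that (5.13) and (5.14) can be compared directly. The helper names and the test matrix in the usage lines are illustrative choices.

```python
import math


def rotate(M, g, i, c, s):
    """Apply [[c, s], [-s, c]] to rows i, i+1 of M and to entries i, i+1 of g."""
    for col in range(len(M[0])):
        top, bot = M[i][col], M[i + 1][col]
        M[i][col], M[i + 1][col] = c * top + s * bot, -s * top + c * bot
    g[i], g[i + 1] = c * g[i] + s * g[i + 1], -s * g[i] + c * g[i + 1]


def back_solve(M, g, k):
    """Solve the upper triangular system given by the top k-by-k block of M."""
    y = [0.0] * k
    for i in reversed(range(k)):
        y[i] = (g[i] - sum(M[i][j] * y[j] for j in range(i + 1, k))) / M[i][i]
    return y


def hessenberg_residuals(H, beta):
    """H: (k+1)-by-k upper Hessenberg matrix as a list of rows.

    Returns (||H y_lin - beta e1||, ||H y_ls - beta e1||, sines, cosines), where y_lin
    solves the square system H_k y = beta e1 and y_ls solves the least squares problem.
    """
    k = len(H[0])
    M = [row[:] for row in H]
    g = [beta] + [0.0] * k
    sines, cosines = [], []
    for i in range(k):
        if i == k - 1:
            y_lin = back_solve(M, g, k)   # top k-by-k block is triangular at this point
        d, h = M[i][i], M[i + 1][i]
        rho = math.hypot(d, h)
        c, s = (0.0, 1.0) if d == 0.0 else (abs(d) / rho, math.copysign(1.0, d) * h / rho)
        rotate(M, g, i, c, s)
        sines.append(s)
        cosines.append(c)
    y_ls = back_solve(M, g, k)

    def resid(y):
        r = [sum(H[i][j] * y[j] for j in range(k)) for i in range(k + 1)]
        r[0] -= beta
        return math.sqrt(sum(t * t for t in r))

    return resid(y_lin), resid(y_ls), sines, cosines
```

For instance, with `H = [[2.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.0, 0.5, 1.0], [0.0, 0.0, 0.8]]` and $\beta = 1.5$, the least squares residual norm equals $1.5\,|s_1 s_2 s_3|$, and dividing it by $c_3$ gives the residual norm of the square system, as (5.13) predicts.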


An immediate consequence of this lemma is the following relationship
between the BiCG residual r^ and the quantity

which is related to the residual rjj? in the QMR algorithm:

We will refer to z® as the QMR quasi-residual—the vector whose norm is


actually minimized in the QMR algorithm.
86 Iterative Methods for Solving Linear Systems

TABLE 5.1
Relation between QMR quasi-residual norm reduction and ratio of BiCG residual
norm to QMR quasi-residual norm.

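THEOREM 5.4.1. Assume that the Lanczos vectors at steps 1 through $k$ are
defined and that the tridiagonal matrix generated by the Lanczos algorithm at
step $k$ is nonsingular. Then the BiCG residual $r_k^B$ and the QMR quasi-residual
$z_k^Q$ are related by

$$\|r_k^B\| = \frac{\|z_k^Q\|}{\sqrt{1 - \left( \|z_k^Q\| / \|z_{k-1}^Q\| \right)^2}}. \tag{5.17}$$

Proof. From (5.1), (5.5), and (5.6), it follows that the BiCG residual can
be written in the form

$$r_k^B = r_0 - A V_k y_k^B = V_{k+1} \left( \beta \xi_1 - T_{k+1,k} \, y_k^B \right).$$

The quantity in parentheses has only one nonzero entry (in its $(k+1)$st
position), and since $\|v_{k+1}\| = 1$, we have

$$\|r_k^B\| = \|T_{k+1,k} \, y_k^B - \beta \xi_1\|.$$

The desired result now follows from Lemma 5.4.1 and the definition (5.15) of
$z_k^Q$.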
In most cases, the quasi-residual norms and the actual residual norms in
the QMR algorithm are of the same order of magnitude. Inequality (5.16)
shows that the latter can exceed the former by at most a factor of $\sqrt{k+1}$, and
a bound in the other direction is given by

$$\|r_k^Q\| \ge \sigma_{\min}(V_{k+1}) \, \|z_k^Q\|,$$

where $\sigma_{\min}$ denotes the smallest singular value. While it is possible that
$\sigma_{\min}(V_{k+1})$ is very small (especially in finite precision arithmetic), it is unlikely
that $\|r_k^Q\|$ would be much smaller than $\|z_k^Q\|$, since the vector $y_k^Q$ is chosen to solve
the least squares problem (5.9) without regard to the matrix $V_{k+1}$.
FIG. 5.1. BiCG residual norms (dashed), QMR residual norms (dotted), and
QMR quasi-residual norms (solid).

Theorem 5.4.1 shows that if the QMR quasi-residual norm is reduced by a
significant factor at step $k$, then the BiCG residual norm will be approximately
equal to the QMR quasi-residual norm at step $k$, since the denominator in the
right-hand side of (5.17) will be close to 1. If the QMR quasi-residual norm
remains almost constant, however, then the denominator in the right-hand side
of (5.17) will be close to 0, and the BiCG residual norm will be much larger.
Table 5.1 shows the relation between the QMR quasi-residual norm reduction
and the ratio of BiCG residual norm to QMR quasi-residual norm. Note that
the QMR quasi-residual norm must be very flat before the BiCG residual norm
is orders of magnitude larger.
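The entries of such a table follow directly from (5.17). If $\rho$ denotes the factor by which the QMR quasi-residual norm is reduced at step $k$, then the BiCG residual norm is $1/\sqrt{1 - \rho^2}$ times the QMR quasi-residual norm. The short sketch below evaluates this factor; the reduction factors in the loop are chosen for illustration and are not taken from Table 5.1, and the function name is hypothetical.

```python
import math


def bicg_over_qmr(reduction):
    """Ratio ||BiCG residual|| / ||QMR quasi-residual|| implied by (5.17).

    `reduction` is ||quasi-residual at step k|| / ||quasi-residual at step k-1||.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction factor must lie in [0, 1)")
    return 1.0 / math.sqrt(1.0 - reduction ** 2)


for rho in (0.5, 0.9, 0.99, 0.999, 0.9999):
    print(f"reduction {rho:<7} ratio {bicg_over_qmr(rho):.3g}")
```

A reduction factor of 0.99 per step still leaves the BiCG residual norm only about 7 times the quasi-residual norm; a ratio of 100 requires a reduction factor above 0.9999.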
Figure 5.1 shows a plot of the logarithms of the norms of the BiCG
residuals (dashed line), the QMR residuals (dotted line), and the QMR
quasi-residuals (solid line) versus iteration number for a simple example problem.
The matrix $A$, a real 103-by-103 matrix, was taken to have 50 pairs of complex
conjugate eigenvalues, randomly distributed in the rectangle $[1,2] \times [-i, i]$,
and 3 additional real eigenvalues at 4, .5, and $-1$. A random matrix $V$ was
generated and $A$ was set equal to $V D V^{-1}$, where $D$ is a block-diagonal matrix
with 3 1-by-1 blocks corresponding to the separated real eigenvalues of $A$ and
50 2-by-2 blocks of the form

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix},$$

corresponding to the pairs of eigenvalues $a \pm ib$.
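A test matrix of this kind can be generated as sketched below. The uniform distribution of the eigenvalues, the Gaussian entries of $V$, the block form $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$, the random seed, and all helper names are assumptions, so the matrix produced is not the one used for Figure 5.1. The inverse is computed by Gauss-Jordan elimination with partial pivoting to keep the example free of external libraries.

```python
import random


def block_diagonal(reals, pairs):
    """D with 1-by-1 blocks for `reals` and blocks [[a, b], [-b, a]] (eigenvalues a +/- ib)."""
    n = len(reals) + 2 * len(pairs)
    D = [[0.0] * n for _ in range(n)]
    for i, lam in enumerate(reals):
        D[i][i] = lam
    i = len(reals)
    for a, b in pairs:
        D[i][i], D[i][i + 1] = a, b
        D[i + 1][i], D[i + 1][i + 1] = -b, a
        i += 2
    return D


def matmul(X, Y):
    Yt = list(zip(*Y))
    return [[sum(p * q for p, q in zip(row, col)) for col in Yt] for row in X]


def inverse(M):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def test_matrix(seed=0, npairs=50):
    """Return (A, D) with A = V D V^{-1}; eigenvalue pairs a +/- ib, a in [1, 2], b in [0, 1]."""
    rng = random.Random(seed)
    pairs = [(rng.uniform(1.0, 2.0), rng.uniform(0.0, 1.0)) for _ in range(npairs)]
    D = block_diagonal([4.0, 0.5, -1.0], pairs)
    n = len(D)
    V = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return matmul(matmul(V, D), inverse(V)), D
```

Since $A$ is similar to $D$, quantities such as the trace agree up to rounding error, which gives a quick sanity check on the construction.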


In this example, the QMR residual and quasi-residual norm curves are
barely distinguishable. As predicted by Theorem 5.4.1, peaks in the BiCG
residual norm curve correspond to plateaus in the QMR convergence curve.
At steps where the QMR quasi-residual norm is reduced by a large factor, the
BiCG residual norm is reduced by an even greater amount so that it "catches
up" with QMR.
Relation (5.17) implies roughly that the BiCG and QMR algorithms will
either both converge well or both perform poorly for a given problem. While
the QMR quasi-residual norm cannot increase during the iteration, it is no
more useful to have a near constant residual norm than it is to have an
increasing one. The analysis here assumes exact arithmetic, however. In finite
precision arithmetic, one might expect that a very large intermediate iterate
(corresponding to a very large residual norm) could lead to inaccuracy in the
final approximation, and, indeed, such a result was established in [66]. This will
be discussed further in section 7.3. A thorough study of the effect of rounding
errors on the BiCG and QMR algorithms has not been carried out, however.
Since the BiCG and QMR algorithms require essentially the same amount
of work and storage per iteration and since the QMR quasi-residual norm is
always less than or equal to the BiCG residual norm, it seems reasonable to
choose QMR over BiCG, although the difference may not be great.

5.5. The Conjugate Gradient Squared Algorithm.


The BiCG and QMR algorithms require multiplication by both $A$ and $A^H$
at each step. This means extra work, and, additionally, it is sometimes
much less convenient to multiply by $A^H$ than it is to multiply by $A$. For
example, there may be a special formula for the product of $A$ with a given
vector when $A$ represents, say, a Jacobian, but a corresponding formula for the
product of $A^H$ with a given vector may not be available. In other cases, data
may be stored on a parallel machine in such a way that multiplication by $A$
is efficient but multiplication by $A^H$ involves extra communication between
processors. For these reasons it is desirable to have an iterative method
that requires multiplication only by $A$ and that generates good approximate
solutions from the Krylov spaces of dimension equal to the number of
matrix-vector multiplications. A method that attempts to do this is the conjugate
gradient squared (CGS) method.
Returning to the BiCG algorithm of section 5.2, note that we can write

$$r_k = \varphi_k(A) \, r_0, \qquad p_k = \psi_k(A) \, r_0$$

for certain $k$th-degree polynomials $\varphi_k$ and $\psi_k$. If the algorithm is converging
well, then $\|\varphi_k(A) r_0\|$ is small and one might expect that $\|\varphi_k^2(A) r_0\|$ would be
even smaller. If $\varphi_k^2(A) r_0$ could be computed with about the same amount of
work as $\varphi_k(A) r_0$, then this would likely result in a faster converging algorithm.
This is the idea of CGS.
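To see why $\varphi_k^2(A) r_0$ can be updated without products with $A^H$, suppose (as an assumption about how the BiCG coefficients are indexed) that the BiCG polynomials satisfy $\varphi_k(z) = \varphi_{k-1}(z) - a_{k-1} z \, \psi_{k-1}(z)$ and $\psi_k(z) = \varphi_k(z) + b_k \, \psi_{k-1}(z)$, with $\varphi_0 = \psi_0 = 1$. Expanding the squares and mixed products gives a closed set of recurrences:

$$
\begin{aligned}
\varphi_k^2 &= \varphi_{k-1}^2 - 2 a_{k-1} z \, \varphi_{k-1} \psi_{k-1} + a_{k-1}^2 z^2 \, \psi_{k-1}^2, \\
\varphi_k \psi_{k-1} &= \varphi_{k-1} \psi_{k-1} - a_{k-1} z \, \psi_{k-1}^2, \\
\varphi_k \psi_k &= \varphi_k^2 + b_k \, \varphi_k \psi_{k-1}, \\
\psi_k^2 &= \varphi_k^2 + 2 b_k \, \varphi_k \psi_{k-1} + b_k^2 \, \psi_{k-1}^2, \\
\varphi_{k-1} \psi_{k-1} &= \varphi_{k-1}^2 + b_{k-1} \, \varphi_{k-1} \psi_{k-2}.
\end{aligned}
$$

Every right-hand side involves only squared and mixed polynomials from earlier steps, multiplied by powers of $z$, so the corresponding vectors (these polynomials applied to $r_0$) can be updated using products with $A$ alone; the $z^2$ term accounts for the second matrix-vector product per step.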
Rewriting the BiCG recurrence in terms of these polynomials, we see that

where

Note that the coefficients can be computed if we know $\hat r_0$ and $\varphi_j^2(A) r_0$ and
$\psi_j^2(A) r_0$, $j = 1, 2, \dots$.
From (5.19-5.20), it can be seen that the polynomials $\varphi_k(z)$ and $\psi_k(z)$
satisfy the recurrences

and squaring both sides gives

Multiplying $\varphi_k$ by the recurrence for $\psi_k$ gives

and multiplying the recurrence for $\varphi_k$ by $\psi_{k-1}$ gives

Defining

these recurrences become

Let $r_k^S = \Phi_k(A) r_0$, $p_k^S = \Psi_k(A) r_0$, and $q_k^S = \Theta_k(A) r_0$. Then the following
algorithm generates an approximate solution $x_k^S$ with the required residual $r_k^S$.

Conjugate Gradient Squared Algorithm (CGS).


Given $x_0^S = x_0$, compute $r_0^S = r_0 = b - A x_0^S$, set $u_0 = r_0^S$, $p_0^S = r_0^S$,
$q_0^S = 0$, and $v_0^S = A p_0^S$. Choose an arbitrary vector $\hat r_0$. For $k = 1, 2, \dots$,
Compute where
Set
Then
Compute where

Set and

The CGS method requires two matrix-vector multiplications at each step
but no multiplications by the Hermitian transpose. For problems where the
BiCG method converges well, CGS typically requires only about half as many
steps and, therefore, half the work of BiCG (assuming that multiplication
by $A$ or $A^H$ requires the same amount of work). When the norm of the
BiCG residual increases at a step, however, that of the CGS residual usually
increases by approximately the square of the increase of the BiCG residual
norm. The CGS convergence curve may therefore show wild oscillations that
can sometimes lead to numerical instabilities.

5.6. The BiCGSTAB Algorithm.


To avoid the large oscillations in the CGS convergence curve, one might try to
produce a residual of the form

$$r_k = \chi_k(A) \, \varphi_k(A) \, r_0,$$

where $\varphi_k$ is again the BiCG polynomial but $\chi_k$ is chosen to try to keep the
residual norm small at each step while retaining the rapid overall convergence
of CGS. For example, if $\chi_k(z)$ is of the form

$$\chi_k(z) = (1 - \omega_k z)(1 - \omega_{k-1} z) \cdots (1 - \omega_1 z),$$

then the coefficients $\omega_j$ can be chosen at each step to minimize

$$\|r_j\| = \|(I - \omega_j A) \, \chi_{j-1}(A) \, \varphi_j(A) \, r_0\|.$$

This leads to the BiCGSTAB algorithm, which might be thought of as a
combination of BiCG with Orthomin(1).
Again letting $\varphi_k(A) r_0$ denote the BiCG residual at step $k$ and $\psi_k(A) r_0$
denote the BiCG direction vector at step $k$, recall that these polynomials satisfy
recurrences (5.19-5.20). In the BiCGSTAB scheme we will need recurrences
for

$$r_k = \chi_k(A) \, \varphi_k(A) \, r_0 \quad \text{and} \quad p_k = \chi_k(A) \, \psi_k(A) \, r_0.$$

It follows from (5.23) and (5.19-5.20) that

Finally, we need to express the BiCG coefficients $a_{k-1}$ and $b_k$ in terms
of the new vectors. Using the biorthogonality properties of the BiCG
polynomials (see Exercise 5.3), namely
$\langle \varphi_k(A) r_0, (A^H)^j \hat r_0 \rangle = \langle A \psi_k(A) r_0, (A^H)^j \hat r_0 \rangle = 0$ for $j = 0, 1, \dots, k-1$,
together with the recurrence relations (5.19-5.20), we
derive the following expressions for inner products appearing in the coefficient
formulas (5.21-5.22):

It also follows from these same biorthogonality and recurrence relations that
the BiCGSTAB vectors satisfy

and hence the coefficient formulas (5.21-5.22) can be replaced by
This leads to the following algorithm.

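Algorithm 6. BiCGSTAB.

Given $x_0$, compute $r_0 = b - A x_0$ and set $p_0 = r_0$.
Choose $\hat r_0$ such that $\langle r_0, \hat r_0 \rangle \neq 0$. For $k = 1, 2, \dots$,
Compute $A p_{k-1}$.
Set $x_{k-1/2} = x_{k-1} + a_{k-1} p_{k-1}$, where $a_{k-1} = \dfrac{\langle r_{k-1}, \hat r_0 \rangle}{\langle A p_{k-1}, \hat r_0 \rangle}$.
Compute $r_{k-1/2} = r_{k-1} - a_{k-1} A p_{k-1}$.
Compute $A r_{k-1/2}$.
Set $x_k = x_{k-1/2} + \omega_k r_{k-1/2}$, where $\omega_k = \dfrac{\langle r_{k-1/2}, A r_{k-1/2} \rangle}{\langle A r_{k-1/2}, A r_{k-1/2} \rangle}$.
Compute $r_k = r_{k-1/2} - \omega_k A r_{k-1/2}$.
Compute $p_k = r_k + b_k (p_{k-1} - \omega_k A p_{k-1})$, where $b_k = \dfrac{a_{k-1}}{\omega_k} \cdot \dfrac{\langle r_k, \hat r_0 \rangle}{\langle r_{k-1}, \hat r_0 \rangle}$.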
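A sketch of BiCGSTAB for real matrices follows. Each step takes the BiCG step $x_{k-1} + a_{k-1} p_{k-1}$, with residual $s = r_{k-1} - a_{k-1} A p_{k-1}$, and then applies an Orthomin(1)-type correction whose coefficient $\omega$ minimizes $\|s - \omega A s\|$. The coefficient formulas are those of van der Vorst's method; the default $\hat r_0 = r_0$, the convergence test, and the function name `bicgstab` are assumptions, and breakdowns simply raise an exception.

```python
import math


def bicgstab(A, b, x0, rhat0=None, maxiter=100, tol=1e-10):
    """BiCGSTAB sketch for real A (hypothetical helper). Returns (x, iterations)."""
    def mv(x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]

    def dot(x, y):
        return sum(p * q for p, q in zip(x, y))

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, mv(x))]
    rhat = list(r) if rhat0 is None else list(rhat0)
    p = list(r)
    rho = dot(r, rhat)
    bnorm = math.sqrt(dot(b, b))
    for k in range(1, maxiter + 1):
        Ap = mv(p)
        sigma = dot(Ap, rhat)
        if rho == 0.0 or sigma == 0.0:
            raise ZeroDivisionError("BiCGSTAB breakdown")
        a = rho / sigma
        x_half = [xi + a * pi for xi, pi in zip(x, p)]        # BiCG step
        s = [ri - a * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(s, s)) <= tol * bnorm:
            return x_half, k
        As = mv(s)
        omega = dot(s, As) / dot(As, As)                      # minimizes ||s - omega * A s||
        if omega == 0.0:
            raise ZeroDivisionError("BiCGSTAB breakdown: omega = 0")
        x = [xi + omega * si for xi, si in zip(x_half, s)]
        r = [si - omega * asi for si, asi in zip(s, As)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, k
        rho_new = dot(r, rhat)
        beta = (a / omega) * (rho_new / rho)
        p = [ri + beta * (pi - omega * api) for ri, pi, api in zip(r, p, Ap)]
        rho = rho_new
    return x, maxiter
```

Like CGS, the loop uses two products with $A$ per step and none with $A^T$.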

5.7. Which Method Should I Use?


For Hermitian problems, the choice of an iterative method is fairly
straightforward: use CG or MINRES for positive definite problems and MINRES
for indefinite problems. One can also use a form of the CG algorithm for
indefinite problems. The relation between residual norms for CG and MINRES
is like that for BiCG and QMR, however, as shown in Exercise 5.1. For this
reason, the MINRES method is usually preferred. By using a simpler iteration,
such as the simple iteration method described in section 2.1 or the Chebyshev
method [94], which has not been described here, one can avoid the inner products
required in the CG and MINRES algorithms. This gives some savings in the
cost of an iteration, but the price in terms of number of iterations usually
outweighs the savings. An exception might be the case in which one has such a
good preconditioner that even simple iteration requires only one or two steps.
For some problems, multigrid methods provide such preconditioners.
The choice of an iterative method for non-Hermitian problems is not so
easy. If matrix-vector multiplication is extremely expensive (e.g., if A is dense
and has no special properties to enable fast matrix-vector multiplication), then
(full) GMRES is probably the method of choice because it requires the fewest
matrix-vector multiplications to reduce the residual norm to a desired level.
If matrix-vector multiplication is not so expensive or if storage becomes a
problem for full GMRES, then one of the methods described in this chapter is
probably a good choice. Because of relation (5.17), we generally recommend
QMR over BiCG.
The choice between QMR, CGS, and BiCGSTAB is problem dependent.
There are also transpose-free versions of QMR that have not been described
here [53]. Another approach, to be discussed in section 7.1, is to symmetrize the
problem. For example, instead of solving $Ax = b$, one could solve $A^H A x = A^H b$
or $A A^H y = b$ (so that $x = A^H y$) using the CG method. Of course, one
does not actually form the normal equations; it is necessary only to compute
matrix-vector products with $A$ and $A^H$. How this approach compares with
the methods described in this chapter is also problem dependent. In [102], the
GMRES (full or restarted), CGS, and CGNE (CG for $A A^H y = b$) iterations
were considered. For each method an example was constructed for which that
method was by far the best and another example was given for which that
method was by far the worst. Thus none of these methods can be eliminated
as definitely inferior to one of the others, and none can be recommended as
the method of choice for non-Hermitian problems.
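The point that the normal equations never need to be formed can be made concrete with a sketch of CG applied to $A A^T y = r_0$ with $x = x_0 + A^T y$ (CGNE, also known as Craig's method), for real $A$. Only products with $A$ and $A^T$ appear, and neither $A A^T$ nor $y$ is stored. The function name and stopping rule are illustrative assumptions.

```python
import math


def cgne(A, b, x0, maxiter=100, tol=1e-10):
    """CG on A A^T y = r_0 with x = x_0 + A^T y, for real A (hypothetical helper)."""
    At = [list(col) for col in zip(*A)]

    def mv(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def dot(x, y):
        return sum(p * q for p, q in zip(x, y))

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, mv(A, x))]  # residual of A x = b and of A A^T y = r_0
    p = mv(At, r)                                  # p = A^T d, d = CG direction for y
    rr = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    for k in range(1, maxiter + 1):
        if math.sqrt(rr) <= tol * bnorm:
            return x, k - 1
        a = rr / dot(p, p)                         # <d, A A^T d> = ||A^T d||^2
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * api for ri, api in zip(r, mv(A, p))]
        rr_new = dot(r, r)
        p = [ai + (rr_new / rr) * pi for ai, pi in zip(mv(At, r), p)]
        rr = rr_new
    return x, maxiter
```

Because CG is applied to $A A^T$, the convergence of this iteration is governed by the singular values of $A$ rather than by its eigenvalues.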
To give some indication of the performance of the methods, we show here
plots of residual norm and error norm versus the number of matrix-vector
multiplications and versus the number of floating point operations (additions,
subtractions, multiplications, and divisions) assuming that multiplication by A
or $A^H$ requires $9n$ operations. This is the cost of applying a 5-diagonal matrix
$A$ to a vector and probably is a lower bound on the cost of matrix-vector
multiplication in most practical applications. The methods considered are full
GMRES, GMRES(10) (that is, GMRES restarted after every 10 steps), QMR,
CGS, BiCGSTAB, and CGNE.

FIG. 5.2. Performance of full GMRES (solid with o's), GMRES(10) (dashed),
QMR (solid), CGS (dotted), BiCGSTAB (dash-dot), and CGNE (solid with x's).
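The $9n$ figure corresponds to 5 multiplications and 4 additions per row. The sketch below counts the floating point operations in a 5-diagonal matrix-vector product stored by diagonals; the storage layout, the offsets $(-m, -1, 0, 1, m)$ (as for a 5-point stencil on an $m$-by-$m$ grid), and the helper name are assumptions. Rows near the boundary have fewer nonzeros, so the count comes out slightly below $9n$.

```python
def banded_matvec(diags, offsets, x):
    """y = A x for A stored by diagonals: diags[d][i] = A[i][i + offsets[d]].

    Returns (y, flops), counting one flop per multiplication and per addition.
    """
    n = len(x)
    y = [0.0] * n
    flops = 0
    for i in range(n):
        terms = 0
        for vals, off in zip(diags, offsets):
            j = i + off
            if 0 <= j < n:
                prod = vals[i] * x[j]
                flops += 1                      # multiplication
                if terms:
                    y[i] += prod
                    flops += 1                  # addition
                else:
                    y[i] = prod
                terms += 1
    return y, flops
```

For $m = 4$ ($n = 16$) the matrix has 70 nonzeros, and the count is $2 \cdot 70 - 16 = 124$ operations, compared with $9n = 144$; for large $n$ the boundary savings become negligible.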
The problem is the one described in section 5.4: a real 103-by-103 matrix
$A$ with random eigenvectors and with 50 pairs of complex conjugate eigenvalues
randomly distributed in $[1,2] \times [-i, i]$ and 3 additional real eigenvalues
at 4, .5, and $-1$. Results are shown in Figure 5.2. The full GMRES
algorithm necessarily requires the fewest matrix-vector multiplications to
achieve a given residual norm. In terms of floating point operations, however,
when matrix-vector multiplication requires only $9n$ operations, full GMRES
is the most expensive method. The QMR algorithm uses two matrix-vector
multiplications per step (one with $A$ and one with $A^H$) to generate
an approximation whose residual lies in the same Krylov space as the
GMRES residual. Hence QMR requires at least twice as many matrix-vector
multiplications to reduce the residual norm to a given level, and for this
problem it requires only slightly more than this. A transpose-free variant
of QMR would likely be more competitive. Since the CGS and BiCGSTAB
methods construct a residual at step $k$ that comes from the Krylov space
of dimension $2k$ (using two matrix-vector multiplications per step), these
methods could conceivably require as few matrix-vector multiplications as
GMRES. For this example they require a moderate number of additional
matrix-vector multiplications, but they appear to be the most efficient in terms
of floating point operations. The CGNE method proved very inefficient for this
problem, hardly reducing the error at all over the first 52 steps (104
matrix-vector multiplications). The condition number of the matrix $A A^H$ is $10^8$, so
this is not so surprising.
The results of this one test problem should not be construed as indicative of
the relative performance of these algorithms for all or even most applications.
In the Exercises, we give examples in which some of these methods perform far
better or far worse than the others. It remains an open problem to characterize
the classes of problems for which one method outperforms the others. For
additional experimental results, see [128, 28].

Comments and Additional References.


The QMR algorithm was developed by Freund and Nachtigal [54]. Theorem
5.3.1 was given in [101].
Relation (5.13) in Lemma 5.4.1 has been established in a number of places
(e.g., [22, 53, 54, 75, 111, 140]), but, surprisingly, the one-step leap to relation
(5.14) and its consequence (5.17) seems not to have been taken explicitly until
[29]. A similar relation between the GMRES and FOM residuals was observed
in [22].
The CGS algorithm was developed by Sonneveld [124] and BiCGSTAB by
van der Vorst [134].
An excellent survey article on iterative methods based on the nonsymmetric
Lanczos algorithm, along with many references, is given in [76].

Exercises.
5.1. Use Lemma 5.4.1 to show that for Hermitian matrices $A$, the CG residual
$r_k^C$ is related to the MINRES residual $r_k^M$ by

$$\|r_k^C\| = \frac{\|r_k^M\|}{\sqrt{1 - \left( \|r_k^M\| / \|r_{k-1}^M\| \right)^2}},$$

provided that the tridiagonal matrix $T_k$ generated by the Lanczos
algorithm is nonsingular.
5.2. Let $r_k^Q$ denote the residual at step $k$ of the QMR algorithm and let
$T_{k+1,k}$ denote the $(k+1)$-by-$k$ tridiagonal matrix generated by the Lanczos
algorithm. Let $T$ be any tridiagonal matrix whose upper left $(k+1)$-by-$k$
block is $T_{k+1,k}$. Use the fact that the Arnoldi algorithm, applied to $T$
with initial vector $\xi_1$, generates the same matrix $T_{k+1,k}$ at step $k$ to show
that

$$\|r_k^Q\| \le \sqrt{k+1} \, \|r_k^G(T)\|,$$

where $r_k^G(T)$ is the residual at step $k$ of the GMRES algorithm applied to
the linear system $T x = \|r_0\| \xi_1$. If $T$ is taken to be the tridiagonal matrix
generated at step $n$ of the Lanczos algorithm (assuming the algorithm
does not break down or terminate before step $n$), then the eigenvalues
of $T$ are the same as those of $A$. Thus the convergence of QMR is like
that of GMRES applied to a matrix with the same eigenvalues as $A$.
This does not provide useful a priori information about the convergence
rate of QMR, however, as it was noted in section 3.2 that eigenvalue
information alone tells nothing about the behavior of GMRES. (This
result was proved in [54] for the special case $T = T_{k+1}$, and it was proved
in [28] for $T = T_n$.)

5.3. Prove the biconjugacy relations

for the BiCG algorithm.

5.4. The following examples are taken from [102]. They demonstrate that
the performance of various iterative methods can differ dramatically for
a given problem and that the best method for one problem may be the
worst for another.

(a) CGNE wins. Suppose $A$ is the unitary shift matrix

and $b$ is the first unit vector $\xi_1$. How many iterations will the
full GMRES method need to solve $Ax = b$, with a zero initial
guess? What is a lower bound on the number of matrix-vector
multiplications required by CGS? How many iterations are required
if one applies CG to the normal equations $A A^H y = b$, $x = A^H y$?
(b) CGNE loses. Suppose $A$ is the block diagonal matrix

What is the degree of the minimal polynomial of $A$? How many
steps will GMRES require to obtain the solution to a linear system
$Ax = b$? How many matrix-vector multiplications will CGS require,
assuming that $\hat r_0 = r_0$? The singular values of this matrix lie
approximately in the range $[2/n, n/2]$. Would you expect CGNE
to require few or many iterations if $n$ is large?

(c) CGS wins. Suppose $A$ is a Hermitian matrix with many eigenvalues
distributed throughout an interval $[c, d]$ on the positive real axis.
Which method would you expect to require the least amount of
work to solve a linear system $Ax = b$: (full) GMRES, CGS, or
CGNE? Explain your answer. (Of course, one would do better to
solve a Hermitian problem using CG or MINRES, but perhaps it is
not known that $A$ is Hermitian.)
(d) CGS loses. Let $A$ be the skew-symmetric matrix

that is, an $n$-by-$n$ block diagonal matrix with 2-by-2 blocks. Show
that this matrix is normal and has eigenvalues $\pm i$ and singular value
1. How many steps are required to solve a linear system $Ax = b$
using CGNE? GMRES? Show, however, that for any real initial
residual $r_0$, if $\hat r_0 = r_0$ then CGS breaks down with a division by 0
at the first step.

5.5. When the two-sided Lanczos algorithm is used in the solution of linear
systems, the right starting vector is always the initial residual, $r_0 / \|r_0\|$,
but the left starting vector $\hat r_0$ is not specified. Consider an arbitrary
3-term recurrence:

where
The $\gamma$'s are chosen so that the vectors have norm 1, but the $\alpha$'s and $\beta$'s
can be anything. Show that if this recurrence is run for no more than
$[(n+2)/2]$ steps, then there is a nonzero vector $w_1$ such that

i.e., assuming there is no exact breakdown with $\langle v_j, w_j \rangle = 0$, the arbitrary
recurrence is the two-sided Lanczos algorithm for a certain left starting
vector $w_1$. (Hint: The condition (5.24) is equivalent to $\langle w_1, A^i v_j \rangle = 0$
for all $i < j-1$, $j = 2, \dots, [(n+2)/2]$. Show that there are only $n-1$ linearly
independent vectors to which $w_1$ must be orthogonal.)
This somewhat disturbing result suggests that some assumptions must
be made about the left starting vector, if we are to have any hope of
establishing good a priori error bounds for the Lanczos-based linear
system solvers [67]. In practice, however, it is observed that the
convergence behavior of these methods is about the same for most
randomly chosen left starting vectors or for $\hat r_0 = r_0$, which is sometimes
recommended.
Chapter 6

Is There a Short Recurrence for a Near-Optimal Approximation?

Of the many non-Hermitian iterative methods described in the previous
chapter, none can be shown to generate a near-optimal approximate solution
for every initial guess. It sometimes happens that the QMR approximation
at step k is almost as good as the (optimal) GMRES approximation, but
sometimes this is not the case. It was shown by Faber and Manteuffel [45]
that if "optimal" is taken to mean having the smallest possible error in some
inner product norm that is independent of the initial vector, then the optimal
approximation cannot be generated with a short recurrence. The details of
this result are provided in section 6.1. The result should not necessarily be
construed as ruling out the possibility of a clear "method of choice" for non-
Hermitian problems. Instead, it may suggest directions in the search for such
a method. Possibilities are discussed in section 6.2.

6.1. The Faber and Manteuffel Result.


Consider a recurrence of the following form. Given $x_0$, compute $p_0 = b - A x_0$,
and for $k = 1, 2, \dots$, set

for some coefficients $a_{k-1}$ and $b_{k-1,j}$, $j = k-s+1, \dots, k-1$, where $s$ is some
integer less than $n$. It is easy to show by induction that the approximate
solution $x_k$ generated by this recurrence is of the form

and that the direction vectors $p_0, \dots, p_{k-1}$ form a basis for the Krylov space

The recurrence Orthodir(3) is of the form (6.1-6.2), with $s = 3$, as is
Orthomin(2). To see that Orthomin(2) is of this form, note that in that
algorithm we have

Substituting for $r_k$ in the recurrence for $p_k$ gives

and using the fact that $r_{k-1} = p_{k-1} + b_{k-1} p_{k-2}$ gives

The normalization of $p_k$ is of no concern, since that can be accounted
for by choosing the coefficient $a_{k-1}$ appropriately, so if $p_k$ is replaced by
$\left[ (-1)^k \prod_{j=0}^{k-1} a_j^{-1} \right] p_k$, then Orthomin(2) fits the pattern (6.1-6.2). The MINRES
algorithm for Hermitian problems is also of this form, and it has the desirable
property of generating, at each step, the approximation of the form (6.3) for
which the 2-norm of the residual is minimal. The CG algorithm for Hermitian
positive definite problems is also of the form (6.1-6.2), and at each step it
generates the approximation of the form (6.3) for which the $A$-norm of the
error is minimal.
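The rescaling of the Orthomin(2) direction vectors can be checked explicitly. Assume that Orthomin(2) updates $r_k = r_{k-1} - a_{k-1} A p_{k-1}$ and $p_k = r_k - b_k p_{k-1}$, which is consistent with the relation $r_{k-1} = p_{k-1} + b_{k-1} p_{k-2}$ quoted above (the sign convention for $b_k$ is this assumption). Substitution gives

$$
\begin{aligned}
p_k &= r_{k-1} - a_{k-1} A p_{k-1} - b_k \, p_{k-1} \\
    &= -a_{k-1} A p_{k-1} + (1 - b_k) \, p_{k-1} + b_{k-1} \, p_{k-2},
\end{aligned}
$$

and with $\tilde p_k = \left[ (-1)^k \prod_{j=0}^{k-1} a_j^{-1} \right] p_k$, so that $\tilde p_k / \tilde p_{k-1}$ carries the factor $-a_{k-1}^{-1}$,

$$
\tilde p_k = A \tilde p_{k-1} - \frac{1 - b_k}{a_{k-1}} \, \tilde p_{k-1} + \frac{b_{k-1}}{a_{k-1} a_{k-2}} \, \tilde p_{k-2}.
$$

The new direction is $A$ times the previous one minus a combination of the two most recent directions, which is the three-term pattern with $s = 3$; the rescaling is absorbed into the step length in the update of $x_k$.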
For what matrices $A$ can one construct a recurrence of the form (6.1-6.2)
with the property that for any initial vector $x_0$, the approximation $x_k$ at step $k$
is the "optimal" approximation from the space (6.3), where "optimal" means
that the error $e_k = A^{-1} b - x_k$ is minimal in some inner product norm, the
inner product being independent of the initial vector? This is essentially the
question answered by Faber and Manteuffel [45]. See also [46, 5, 86, 138]. We
will not include the entire proof, but the answer is that for $s < \sqrt{n}$, except for a
few anomalies, the matrices for which such a recurrence exists are those of the
form $B^{-1/2} C B^{1/2}$, where $C$ is either Hermitian or of the form $C = e^{i\theta}(dI + F)$,
with $d$ real and $F^H = -F$, and $B$ is a Hermitian positive definite matrix.
Equivalently (Exercise 6.1), such a recurrence exists for matrices $A$ of the form

If $A$ is of the form (6.4), then $B^{1/2} A B^{-1/2}$ is just a shifted and rotated
Hermitian matrix.
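To see why this class of matrices is special, note that the error $e_k$ in a recurrence of the form (6.1-6.2) satisfies

$$e_k \in e_0 + \operatorname{span}\{p_0, \ldots, p_{k-1}\}. \qquad (6.5)$$

If $\langle\langle\cdot,\cdot\rangle\rangle$ denotes the inner product in which the norm of $e_k$ is minimized, then $e_k$ must be the unique vector of the form (6.5) satisfying

$$\langle\langle e_k, p_j\rangle\rangle = 0, \qquad j = 0, \ldots, k-1.$$

It follows that since $e_k = e_{k-1} - a_{k-1}p_{k-1}$, the coefficient $a_{k-1}$ must be

$$a_{k-1} = \frac{\langle\langle e_{k-1}, p_{k-1}\rangle\rangle}{\langle\langle p_{k-1}, p_{k-1}\rangle\rangle}.$$

For $j < k-1$, we have

$$\langle\langle e_k, p_j\rangle\rangle = \langle\langle e_{k-1}, p_j\rangle\rangle - a_{k-1}\langle\langle p_{k-1}, p_j\rangle\rangle,$$

so if $\langle\langle e_{k-1}, p_j\rangle\rangle = 0$, then in order to have $\langle\langle e_k, p_j\rangle\rangle = 0$, it is necessary that either $a_{k-1} = 0$ or $\langle\langle p_{k-1}, p_j\rangle\rangle = 0$. If it is required that $\langle\langle p_k, p_j\rangle\rangle = 0$ for all $k$ and all $j < k$, then the coefficients $b_{k-1,j}$ must be given by

$$b_{k-1,j} = \frac{\langle\langle Ap_{k-1}, p_j\rangle\rangle}{\langle\langle p_j, p_j\rangle\rangle}.$$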
A precise statement of the Faber and Manteuffel result is given in the following definition and theorem.
DEFINITION 6.1.1. An algorithm of the form (6.1-6.2) is an $s$-term CG method for $A$ if, for every $p_0$, the vectors $p_k$, $k = 1, 2, \ldots, m-1$, satisfy $\langle\langle p_k, p_j\rangle\rangle = 0$ for all $j < k$, where $m$ is the number of steps required to obtain the exact solution $x_m = A^{-1}b$.
THEOREM 6.1.1 (Faber and Manteuffel [45]). An $s$-term CG method exists for the matrix $A$ if and only if either

(i) the minimal polynomial of $A$ has degree less than or equal to $s$, or

(ii) $A^*$ is a polynomial of degree less than or equal to $s-2$ in $A$, where $A^*$ is the adjoint of $A$ with respect to some inner product, that is, $\langle\langle Av, w\rangle\rangle = \langle\langle v, A^*w\rangle\rangle$ for all vectors $v$ and $w$.
Proof (of sufficiency only). The choice of coefficients $a_i$ and $b_{i,j}$, $i = 0, \ldots, s-1$, $j < i$, not only forces $\langle\langle p_k, p_j\rangle\rangle = 0$, $k = 1, \ldots, s-1$, $j < k$, but also ensures that the error at steps 1 through $s$ is minimized in the norm corresponding to the given inner product. Since the error at step $k$ is equal to a certain $k$th-degree polynomial in $A$ times the initial error, if the minimal polynomial of $A$ has degree $k \le s$, then the algorithm will discover this minimal polynomial (or another one for which $e_k = p_k(A)e_0 = 0$), and the exact solution will be obtained after $k \le s$ steps. In this case, then, iteration (6.1-6.2) is an $s$-term CG method.
For $k > s$ and $i < k-s+1$, it follows from (6.2) that

$$\langle\langle p_k, p_i\rangle\rangle = \langle\langle Ap_{k-1}, p_i\rangle\rangle - \sum_{j=k-s+1}^{k-1} b_{k-1,j}\langle\langle p_j, p_i\rangle\rangle.$$

If $\langle\langle p_j, p_i\rangle\rangle = 0$ for $j = k-s+1, \ldots, k-1$, then we will have $\langle\langle p_k, p_i\rangle\rangle = 0$ if and only if

$$\langle\langle Ap_{k-1}, p_i\rangle\rangle = \langle\langle p_{k-1}, A^*p_i\rangle\rangle = 0. \qquad (6.6)$$

If $A^* = q_{s-2}(A)$ for some polynomial $q_{s-2}$ of degree $s-2$ or less, then (6.6) will hold, since $p_{k-1}$ is orthogonal to the space

$$\operatorname{span}\{p_0, \ldots, p_{k-2}\} = \operatorname{span}\{p_0, Ap_0, \ldots, A^{k-2}p_0\},$$

which contains $q_{s-2}(A)p_i$ since $i + s - 2 \le k - 2$.
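To see the theorem at work numerically, the following Python sketch runs the recurrence (6.1-6.2) with $a_{k-1} = \langle\langle e_{k-1}, p_{k-1}\rangle\rangle/\langle\langle p_{k-1}, p_{k-1}\rangle\rangle$ and $b_{k-1,j} = \langle\langle Ap_{k-1}, p_j\rangle\rangle/\langle\langle p_j, p_j\rangle\rangle$. It is a minimal sketch under stated assumptions: real matrices, the Euclidean inner product ($B = I$), and, purely for illustration, a known exact solution so that $a_{k-1}$ can be evaluated (a practical method would choose the inner product so that $a_{k-1}$ is computable from the residual, as CG does with $B = A$). The function names are hypothetical. With $s = 3$, the direction vectors stay orthogonal for a diagonal (Hermitian) matrix but not for a $4\times 4$ Jordan block, which is not normal.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ip(u, v, B):
    """Inner product <<u, v>> = <u, B v> for real vectors (B symmetric positive definite)."""
    return sum(ui * wi for ui, wi in zip(u, matvec(B, v)))


def s_term_recurrence(A, b, x_true, B, s, steps):
    """Run (6.1)-(6.2) from x_0 = 0. x_true is used only to evaluate a_{k-1} in this demo."""
    x = [0.0] * len(b)
    p = [list(b)]                                   # p_0 = b - A x_0 with x_0 = 0
    for k in range(1, steps + 1):
        q = p[k - 1]
        e = [xt - xi for xt, xi in zip(x_true, x)]  # e_{k-1}
        a = ip(e, q, B) / ip(q, q, B)               # forces <<e_k, p_{k-1}>> = 0
        x = [xi + a * qi for xi, qi in zip(x, q)]   # (6.1)
        Aq = matvec(A, q)
        new = list(Aq)
        for j in range(max(0, k - s + 1), k):       # only the s - 1 most recent directions
            bkj = ip(Aq, p[j], B) / ip(p[j], p[j], B)
            new = [ni - bkj * pj for ni, pj in zip(new, p[j])]
        p.append(new)                               # (6.2)
    return x, p


def orthogonality_loss(p, B):
    """Largest |cosine| between two distinct direction vectors in the B-inner product."""
    worst = 0.0
    for k in range(len(p)):
        for j in range(k):
            c = abs(ip(p[k], p[j], B)) / math.sqrt(ip(p[k], p[k], B) * ip(p[j], p[j], B))
            worst = max(worst, c)
    return worst


n = 4
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # B = I: 2-norm of the error

# Hermitian A: A* = A is a polynomial of degree 1 = s - 2 in A, so s = 3 suffices.
D = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
x_sym, p_sym = s_term_recurrence(D, [1.0] * n, [1.0, 1 / 2, 1 / 3, 1 / 4], I, s=3, steps=n)

# Non-normal A (upper Jordan block): the same 3-term recurrence loses orthogonality.
J = [[1.0 if j in (i, i + 1) else 0.0 for j in range(n)] for i in range(n)]
x_jor, p_jor = s_term_recurrence(J, [1.0] * n, [0.0, 1.0, 0.0, 1.0], I, s=3, steps=n)

print(orthogonality_loss(p_sym[:n], I))   # rounding-error level
print(orthogonality_loss(p_jor[:n], I))   # about 0.23, from <<p_3, p_0>> != 0
```

In the Jordan-block case $\langle\langle p_3, p_1\rangle\rangle$ and $\langle\langle p_3, p_2\rangle\rangle$ vanish by construction; only the direction that dropped out of the short recurrence, $p_0$, loses orthogonality, exactly as (6.6) predicts when $A^*$ is not a low-degree polynomial in $A$.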


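To clarify condition (ii) in Theorem 6.1.1, first recall (section 1.3.1) that for any inner product $\langle\langle\cdot,\cdot\rangle\rangle$ there is a Hermitian positive definite matrix $B$ such that

$$\langle\langle v, w\rangle\rangle = \langle v, Bw\rangle$$

for all vectors $v$ and $w$, where $\langle\cdot,\cdot\rangle$ denotes the standard Euclidean inner product. The $B$-adjoint of $A$, denoted $A^*$ in the theorem, is the unique matrix satisfying

$$\langle Av, Bw\rangle = \langle v, BA^*w\rangle$$

for all $v$ and $w$. From this definition it follows that

$$A^* = B^{-1}A^HB,$$

where the superscript $H$ denotes the adjoint with respect to the Euclidean inner product, $A^H = \bar{A}^T$. The matrix $A$ is said to be $B$-normal if and only if $A^*A = AA^*$. If $B^{1/2}$ denotes the Hermitian positive definite square root of $B$, then this is equivalent to the condition that

$$(B^{1/2}AB^{-1/2})^H(B^{1/2}AB^{-1/2}) = (B^{1/2}AB^{-1/2})(B^{1/2}AB^{-1/2})^H,$$

which is the condition that $B^{1/2}AB^{-1/2}$ be normal.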
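A small worked example of the $B$-adjoint may help. The sketch below uses plain Python and real matrices (so $A^H = A^T$); the helper names are hypothetical. It takes $B = \mathrm{diag}(1, 4)$ and $A = B^{-1/2}CB^{1/2}$ with the normal matrix $C = \begin{pmatrix}1 & -2\\ 2 & 1\end{pmatrix}$, checks $\langle\langle Av, w\rangle\rangle = \langle\langle v, A^*w\rangle\rangle$ for one pair of vectors, and shows that $A$ is $B$-normal although it is not normal. Here $A^* = 2I - A$, so $A$ is in fact $B$-normal(1).

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def b_adjoint(A, B, B_inv):
    """A* = B^{-1} A^H B (real case, so A^H = A^T)."""
    return matmul(B_inv, matmul(transpose(A), B))


def b_ip(v, w, B):
    """<<v, w>> = <v, B w> for real vectors."""
    Bw = [sum(b * x for b, x in zip(row, w)) for row in B]
    return sum(vi * bi for vi, bi in zip(v, Bw))


B = [[1.0, 0.0], [0.0, 4.0]]
B_inv = [[1.0, 0.0], [0.0, 0.25]]
A = [[1.0, -4.0], [1.0, 1.0]]        # = B^{-1/2} C B^{1/2} with C = [[1, -2], [2, 1]]
A_star = b_adjoint(A, B, B_inv)       # [[1, 4], [-1, 1]], i.e. 2I - A

v, w = [1.0, 2.0], [3.0, -1.0]
Av = [sum(a * x for a, x in zip(row, v)) for row in A]
Asw = [sum(a * x for a, x in zip(row, w)) for row in A_star]
print(b_ip(Av, w, B), b_ip(v, Asw, B))                     # -33.0 -33.0

print(matmul(transpose(A), A) == matmul(A, transpose(A)))  # False: A is not normal
print(matmul(A_star, A) == matmul(A, A_star))              # True: A*A = AA* = 5I
```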


Let $B$ be fixed and let $\tilde{A}$ denote the matrix $B^{1/2}AB^{-1/2}$. It can be shown (Exercise 6.2) that $\tilde{A}$ is normal ($A$ is $B$-normal) if and only if $\tilde{A}^H$ can be written as a polynomial (of some degree) in $\tilde{A}$. If $\eta$ is the smallest degree for which this is true, then $\eta$ is called the $B$-normal degree of $A$. For any integer $t \ge \eta$, $A$ is said to be $B$-normal($t$). With this notation, condition (ii) of Theorem 6.1.1 can be stated as follows:

(ii') $A$ is $B$-normal($s-2$).

Condition (ii') still may seem obscure, but the following theorem, also from [45], shows that matrices $A$ with $B$-normal degree $\eta$ greater than 1 but less than $\sqrt{n}$ also have minimal polynomials of degree less than $n$. These matrices belong to a subspace of $\mathbb{C}^{n\times n}$ of dimension less than $n^2$, so they might just be considered anomalies. The more interesting case is $\eta = 1$, or the $B$-normal(1) matrices in (ii').
THEOREM 6.1.2 (Faber and Manteuffel [45]). If $A$ has $B$-normal degree $\eta > 1$, then the minimal polynomial of $A$ has degree less than or equal to $\eta^2$.

Proof. The degree $d(A)$ of the minimal polynomial of $A$ is the same as that of $\tilde{A} = B^{1/2}AB^{-1/2}$. Since $\tilde{A}$ is normal, it has exactly $d(A)$ distinct eigenvalues, and we will have $\tilde{A}^H = q(\tilde{A})$ if and only if

$$q(\lambda_i) = \bar{\lambda}_i, \qquad i = 1, \ldots, d(A).$$

How many distinct complex numbers $z$ can satisfy $q(z) = \bar{z}$? Note that $q(z) = \bar{z}$ implies $\overline{q(z)} = z$, or $\bar{q}(q(z)) - z = 0$, where $\bar{q}$ denotes the polynomial whose coefficients are the complex conjugates of those of $q$. The expression $\bar{q}(q(z)) - z$ is a polynomial of degree exactly $\eta^2$ if $q$ has degree $\eta > 1$. (If the degree of $q$ were 1, this expression could be identically zero.) It follows that there are at most $\eta^2$ distinct roots, so $d(A) \le \eta^2$. □
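The degree count in this proof can be checked with a few lines of polynomial arithmetic. The sketch below is plain Python, with coefficient lists ordered from the constant term upward; all function names are hypothetical. It forms $\bar{q}(q(z)) - z$: for a quadratic $q$ the result has degree $4 = \eta^2$, while for a linear $q(z) = az + b$ with $a = -b/\bar{b}$ it vanishes identically, which is the exceptional case taken up in Theorem 6.1.3.

```python
import cmath


def trim(c, tol=1e-12):
    """Drop (numerically) zero leading coefficients."""
    c = list(c)
    while c and abs(c[-1]) <= tol:
        c.pop()
    return c


def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p, q):
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(z)), evaluated by Horner's rule on polynomials."""
    result = [0j]
    for c in reversed(outer):
        result = poly_add(poly_mul(result, inner), [c])
    return result


def conj_poly(q):
    return [complex(c).conjugate() for c in q]


def fixed_point_poly(q):
    """Coefficients of conj(q)(q(z)) - z; its roots include every z with q(z) = conj(z)."""
    return trim(poly_add(compose(conj_poly(q), q), [0j, -1]))


q2 = [1, 2 - 1j, 0.5j]                # a quadratic q, so eta = 2
print(len(fixed_point_poly(q2)) - 1)  # 4

b = 2 * cmath.exp(0.3j)
a = -b / b.conjugate()                # |a| = 1 and a = -b / conj(b)
print(fixed_point_poly([b, a]))       # []: identically zero for q(z) = a z + b
```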
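The $B$-normal(1) matrices, for which a 3-term CG method exists, are characterized in the following theorem.
THEOREM 6.1.3 (Faber and Manteuffel [45]). If $A$ is $B$-normal(1), then $d(A) = 1$, $A^* = A$, or

$$\tilde{A} = B^{1/2}AB^{-1/2} = e^{-i\theta}\left(\tfrac{r}{2}I + F\right),$$

where $\theta$ and $r$ are real and $F = -F^H$.

Proof. Since $\tilde{A}$ is normal, if $\tilde{A}$ has all real eigenvalues, then $\tilde{A}^H = \tilde{A}$, or $A^* = A$.
Suppose $\tilde{A}$ has at least one complex eigenvalue. There is a linear polynomial $q$ such that each of the eigenvalues $\lambda_i$ of $\tilde{A}$ satisfies $q(\lambda_i) = \bar{\lambda}_i$. This implies that $\bar{q}(q(\lambda_i)) - \lambda_i = 0$. In general, this equation has just one root $\lambda_i$, and if this is the case, then $d(A) = 1$.
Let $q(z) = az + b$. The expression $\bar{q}(q(\lambda_i)) - \lambda_i = 0$ can be written as

$$(|a|^2 - 1)\lambda_i + \bar{a}b + \bar{b} = 0.$$

There is more than one root $\lambda_i$ only if the expression on the left is identically zero, which means that $|a| = 1$ and $a = -b/\bar{b}$. Let $b = re^{i\theta}$, $i = \sqrt{-1}$. Then

$$a = -e^{2i\theta}.$$

If $q(z) = \bar{z}$, then

$$-e^{2i\theta}z + re^{i\theta} = \bar{z},$$

which yields

$$e^{i\theta}z + \overline{e^{i\theta}z} = r.$$

Thus, if $\lambda$ is an eigenvalue of $\tilde{A}$, the real part of $\lambda e^{i\theta}$ is $r/2$. This implies that

$$F \equiv e^{i\theta}\tilde{A} - \tfrac{r}{2}I$$

has only pure imaginary eigenvalues; hence, since $\tilde{A}$ is normal, $F = -F^H$.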
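As a numerical check of this characterization, the sketch below (Python, `cmath` only) uses the sign convention $q(z) = az + b$ with $a = -e^{2i\theta}$ and $b = re^{i\theta}$; the values $r = 3$, $\theta = 0.7$ and the skew-Hermitian $F$ are arbitrary choices made here. It builds $\tilde{A} = e^{-i\theta}(\tfrac{r}{2}I + F)$, confirms $\tilde{A}^H = q(\tilde{A})$, and checks that $q(z) = \bar{z}$ holds on the line $\mathrm{Re}(e^{i\theta}z) = r/2$ but not off it.

```python
import cmath

r, theta = 3.0, 0.7
a = -cmath.exp(2j * theta)
b = r * cmath.exp(1j * theta)


def q(z):
    return a * z + b


F = [[1j, 2.0], [-2.0, 0.5j]]                     # F^H = -F (skew-Hermitian)
rot = cmath.exp(-1j * theta)
At = [[rot * ((r / 2 if i == j else 0.0) + F[i][j]) for j in range(2)] for i in range(2)]
At_H = [[At[j][i].conjugate() for j in range(2)] for i in range(2)]
q_At = [[a * At[i][j] + (b if i == j else 0.0) for j in range(2)] for i in range(2)]
err = max(abs(q_At[i][j] - At_H[i][j]) for i in range(2) for j in range(2))

on_line = [rot * (r / 2 + 1j * t) for t in (-2.0, 0.0, 1.5)]   # Re(e^{i theta} z) = r/2
off_line = rot * (r / 2 + 1.0)                                  # Re(e^{i theta} z) = r/2 + 1

print(err < 1e-12)                                              # True: At^H = q(At)
print([abs(q(z) - z.conjugate()) < 1e-12 for z in on_line])     # [True, True, True]
print(abs(q(off_line) - off_line.conjugate()))                  # about 2.0
```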



6.2. Implications.
The $B$-normal(1) matrices of the previous section form a class for which CG methods are already known. They are diagonalizable matrices whose spectrum is contained in a line segment in the complex plane. See [26, 142].
Theorems 6.1.1-6.1.3 imply that for most non-Hermitian problems, one
cannot expect to find a short recurrence that generates the optimal approxi-
mation from successive Krylov spaces, if "optimality" is defined in terms of an
inner product norm that is independent of the initial vector. It turns out that
most non-Hermitian iterative methods actually do find the optimal approximation in some norm [11] (see Exercise 6.3). Unfortunately, however, it is a norm that cannot be related easily to the 2-norm or the $\infty$-norm or any other norm that is likely to be of interest. For example, the BiCG approximation is optimal in the $P_n^{-H}P_n^{-1}$-norm, where the columns of $P_n$ are the biconjugate direction vectors. The QMR approximation is optimal in the $A^HV_n^{-H}V_n^{-1}A$-norm, where the columns of $V_n$ are the biorthogonal basis vectors.
The possibility of a short recurrence that would generate optimal approxi-
mations in some norm that depends on the initial vector but that can be shown
to differ from, say, the 2-norm by no more than some moderate size factor re-
mains. This might be the best hope for developing a clear "method of choice"
for non-Hermitian linear systems.
It should also be noted that the Faber and Manteuffel result deals only
with a single recurrence. It is still an open question whether coupled short
recurrences can generate optimal approximations. For some preliminary
results, see [12].
It remains a major open problem to find a method that generates provably "near-optimal" approximations in some standard norm while still requiring only $O(n)$ work and storage (in addition to the matrix-vector multiplication) at each iteration, or to prove that such a method does not exist.

Exercises.
6.1. Show that a matrix $A$ is of the form (6.4) if and only if it is of the form $B^{-1/2}CB^{1/2}$, where $C$ is either Hermitian or of the form $e^{i\theta}(dI + F)$, with $d$ real and $F^H = -F$.

6.2. Show that a matrix $A$ is normal if and only if $A^H = q(A)$ for some polynomial $q$. (Hint: If $A$ is normal, write $A$ in the form $A = U\Lambda U^H$, where $\Lambda$ is diagonal and $U$ is unitary, and determine a polynomial $q$ for which $q(\Lambda) = \bar{\Lambda}$.)

6.3. The following are special instances of results due to Barth and Manteuffel [11]:

(a) Assume that the BiCG iteration does not break down or find the exact solution before step $n$. Use the fact that the BiCG error at step $k$ is of the form

$$e_k = e_0 - P_ky_k \quad\text{for some } y_k \in \mathbb{C}^k, \qquad P_k = [p_0, \ldots, p_{k-1}],$$

and the residual satisfies

$$\hat{P}_k^Hr_k = 0, \qquad \hat{P}_k = [\hat{p}_0, \ldots, \hat{p}_{k-1}],$$

where $\hat{p}_0, \ldots, \hat{p}_{k-1}$ are the BiCG shadow direction vectors (so that $\hat{P}_n^HAP_n$ is diagonal), to show that the BiCG approximation at each step is optimal in the $P_n^{-H}P_n^{-1}$-norm, where the columns of $P_n$ are the biconjugate direction vectors.
(b) Assume that the two-sided Lanczos recurrence does not break down or terminate before step $n$. Use the fact that the QMR error at step $k$ is of the form $e_k = e_0 - V_ky_k$ and that the QMR residual satisfies $r_k^HV_n^{-H}V_n^{-1}AV_k = 0$ to show that the QMR approximation at each step is optimal in the $A^HV_n^{-H}V_n^{-1}A$-norm, where the columns of $V_n$ are the biorthogonal basis vectors.

6.4. Write down a CG method for matrices of the form $I - F$, where $F = -F^H$, which minimizes the 2-norm of the residual at each step. (Hint: Note that one can use a 3-term recurrence to construct an orthonormal basis for the Krylov space $\operatorname{span}\{q_1, (I-F)q_1, \ldots, (I-F)^{k-1}q_1\}$ when $F$ is skew-Hermitian.)
Chapter 7

Miscellaneous Issues

7.1. Symmetrizing the Problem.


Because of the difficulties in solving non-Hermitian linear systems, one might consider converting a non-Hermitian problem to a Hermitian one by solving the normal equations. That is, one applies an iterative method to one of the linear systems

$$A^HAx = A^Hb \qquad\text{or}\qquad AA^Hy = b, \quad x = A^Hy. \qquad (7.1)$$

As usual, this can be accomplished without actually forming the matrices $A^HA$ or $AA^H$, and, in the latter case, one need not explicitly generate approximations to $y$ but instead can carry along approximations $x_k = A^Hy_k$. For instance, if the CG method is used to solve either of the systems in (7.1), then the algorithms, sometimes called CGNR and CGNE, respectively, can be implemented as follows.

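Algorithm 7. CG for the Normal Equations (CGNR and CGNE).

Given an initial guess $x_0$, compute $r_0 = b - Ax_0$.
Compute $A^Hr_0$ and set $p_0 = A^Hr_0$. For $k = 1, 2, \ldots$,

Compute $Ap_{k-1}$.

Set $x_k = x_{k-1} + a_{k-1}p_{k-1}$, where
$a_{k-1} = \dfrac{\langle A^Hr_{k-1}, A^Hr_{k-1}\rangle}{\langle Ap_{k-1}, Ap_{k-1}\rangle}$ (CGNR) or $a_{k-1} = \dfrac{\langle r_{k-1}, r_{k-1}\rangle}{\langle p_{k-1}, p_{k-1}\rangle}$ (CGNE).

Compute $r_k = r_{k-1} - a_{k-1}Ap_{k-1}$.

Compute $A^Hr_k$.

Set $p_k = A^Hr_k + b_{k-1}p_{k-1}$, where
$b_{k-1} = \dfrac{\langle A^Hr_k, A^Hr_k\rangle}{\langle A^Hr_{k-1}, A^Hr_{k-1}\rangle}$ (CGNR) or $b_{k-1} = \dfrac{\langle r_k, r_k\rangle}{\langle r_{k-1}, r_{k-1}\rangle}$ (CGNE).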
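Algorithm 7 translates almost line for line into code. The following is a minimal Python sketch for real $A$ (so $A^H = A^T$), with dense lists standing in for matrix-vector products. The function name, the `variant` switch, and the relative-residual stopping test are assumptions made here, not part of the algorithm as stated. Note that $A^HA$ and $AA^H$ are never formed; only products with $A$ and $A^T$ are used.

```python
import math


def matvec(A, v):
    return [sum(a * t for a, t in zip(row, v)) for row in A]


def matvec_t(A, v):
    """A^H v for a real matrix A, i.e. A^T v, without forming A^T."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def cg_normal_equations(A, b, x0, variant="CGNR", tol=1e-12, max_iter=100):
    x = list(x0)
    r = [bi - t for bi, t in zip(b, matvec(A, x))]     # r_0 = b - A x_0
    s = matvec_t(A, r)                                 # A^H r_0
    p = list(s)                                        # p_0 = A^H r_0
    bnorm = math.sqrt(dot(b, b))
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * bnorm:       # assumed stopping test
            break
        Ap = matvec(A, p)
        if variant == "CGNR":
            a = dot(s, s) / dot(Ap, Ap)
        else:                                          # CGNE
            a = dot(r, r) / dot(p, p)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r_new = [ri - a * t for ri, t in zip(r, Ap)]
        s_new = matvec_t(A, r_new)                     # A^H r_k
        if variant == "CGNR":
            beta = dot(s_new, s_new) / dot(s, s)
        else:
            beta = dot(r_new, r_new) / dot(r, r)
        p = [si + beta * pi for si, pi in zip(s_new, p)]
        r, s = r_new, s_new
    return x


A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # nonsymmetric, det(A) = 16
b = [6.0, 11.0, 8.0]                                     # exact solution (1, 2, 3)
for variant in ("CGNR", "CGNE"):
    print(variant, cg_normal_equations(A, b, [0.0, 0.0, 0.0], variant))
```

The two variants share every vector operation and differ only in the scalars $a_{k-1}$ and $b_{k-1}$, which is why a single routine with a switch is a natural way to organize them.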

The CGNR algorithm minimizes the $A^HA$-norm of the error, which is the 2-norm of the residual $b - Ax_k$, over the affine space

$$x_0 + \operatorname{span}\{A^Hr_0, (A^HA)A^Hr_0, \ldots, (A^HA)^{k-1}A^Hr_0\}.$$

The CGNE algorithm minimizes the $AA^H$-norm of the error in $y_k$, which is the 2-norm of the error $x - x_k$, over the affine space

$$x_0 + \operatorname{span}\{A^Hr_0, A^H(AA^H)r_0, \ldots, A^H(AA^H)^{k-1}r_0\}.$$

Note that these two spaces are the same, and both involve powers of the symmetrized matrix $A^HA$ or $AA^H$.
Numerical analysts sometimes cringe at the thought of solving the normal
equations for two reasons. First, since the condition number of AHA or AAH
is the square of the condition number of A, if there were an iterative method
for solving $Ax = b$ whose convergence rate was governed by the condition
number of A, then squaring this condition number would significantly degrade
the convergence rate. Unfortunately, however, for a non-Hermitian matrix
A, there is no iterative method whose convergence rate is governed by the
condition number of A.
The other objection to solving the normal equations is that one cannot expect to achieve as high a level of accuracy when solving a linear system $Cy = d$ as when solving $Ax = b$, if the condition number of $C$ is greater than that of $A$. This statement can be based on a simple perturbation argument. Since the entries of $C$ and $d$ probably cannot be represented exactly on the computer (or in the case of iterative methods, the product of $C$ with a given vector cannot be computed exactly), the best approximation $\hat{y}$ to $y$ that one can hope to find numerically is one that satisfies a nearby system $(C + \delta C)\hat{y} = d + \delta d$, where the sizes of $\delta C$ and $\delta d$ are determined by the machine precision. If $\hat{y}$ is the solution of such a perturbed system, then it can be shown, for sufficiently small perturbations (specifically, $\kappa(C)\|\delta C\|/\|C\| < 1$), that

$$\frac{\|y - \hat{y}\|}{\|y\|} \le \frac{\kappa(C)}{1 - \kappa(C)\|\delta C\|/\|C\|}\left(\frac{\|\delta C\|}{\|C\|} + \frac{\|\delta d\|}{\|d\|}\right).$$

If $C$ and $d$ were the only data available for the problem, then the terms $\|\delta C\|/\|C\|$ and $\|\delta d\|/\|d\|$ on the right-hand side of this inequality could not be expected to be less than about $\epsilon$, the machine precision. For the CGNR and CGNE methods, however, not only is $C = A^HA$ available, but the matrix $A$ itself is available. (That is, one can apply $A$ to a given vector.) As a result, it will be shown in section 7.3 that the achievable level of accuracy is about the same as that for the original linear system.
Thus, neither of the standard arguments against solving the normal
equations is convincing. There are problems for which the CGNR and CGNE
methods are best (Exercise 5.4a), and there are other problems for which one
of the non-Hermitian matrix iterations far outperforms these two (Exercise
5.4b). In practice, the latter situation seems to be more common. There is
little theory characterizing problems for which the normal equations approach
is or is not to be preferred to a non-Hermitian iterative method. (See Exercise
7.1, however.)

7.2. Error Estimation and Stopping Criteria.


When a linear system is "solved" by a computer using floating point arithmetic,
one does not obtain the exact solution, whether a direct or an iterative method
is employed. With iterative methods especially, it is important to have some
idea of what constitutes an acceptably good approximate solution, so the
iteration can be stopped when this level of accuracy is achieved. Often it
is desired to have an approximate solution for which some standard norm of
the error, say, the 2-norm or the $\infty$-norm, is less than some tolerance. One can
compute the residual $b - Ax_k$, but one cannot compute the error $A^{-1}b - x_k$.
Hence one might try to estimate the desired error norm using the residual or
other quantities generated during the iteration.
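The relative error norm is related to the relative residual norm by

$$\frac{1}{\kappa(A)}\,\frac{|||b - Ax_k|||}{|||b|||} \le \frac{|||A^{-1}b - x_k|||}{|||A^{-1}b|||} \le \kappa(A)\,\frac{|||b - Ax_k|||}{|||b|||}, \qquad (7.2)$$

where $\kappa(A) = |||A||| \cdot |||A^{-1}|||$ and $|||\cdot|||$ represents any vector norm and its induced matrix norm. To see this, note that since $b - Ax_k = A(A^{-1}b - x_k)$, we have

$$|||b - Ax_k||| \le |||A||| \cdot |||A^{-1}b - x_k|||, \qquad (7.3)$$

$$|||A^{-1}b - x_k||| \le |||A^{-1}||| \cdot |||b - Ax_k|||. \qquad (7.4)$$

Since we also have $|||A^{-1}b||| \le |||A^{-1}||| \cdot |||b|||$, combining this with (7.3) gives the first inequality in (7.2). Using the inequality $|||b||| \le |||A||| \cdot |||A^{-1}b|||$ with (7.4) gives the second inequality in (7.2). To obtain upper and lower bounds on the desired error norm, one might therefore attempt to estimate the condition number of $A$ in this norm.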
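A small numerical instance of (7.2) in the $\infty$-norm may make the two inequalities concrete. The Python sketch below assumes a $2\times 2$ matrix so that $A^{-1}$ can be written explicitly; with $A = \begin{pmatrix}4 & 1\\ 2 & 3\end{pmatrix}$ one has $\|A\|_\infty = 5$, $\|A^{-1}\|_\infty = 0.6$, and $\kappa_\infty(A) = 3$. The approximate solution $x_k$ is an arbitrary choice made here.

```python
def inf_norm_vec(v):
    return max(abs(t) for t in v)


def inf_norm_mat(M):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(t) for t in row) for row in M)


def inverse_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


A = [[4.0, 1.0], [2.0, 3.0]]
x_true = [1.0, 1.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]        # [5, 5]
x_k = [1.1, 0.8]                                                 # some approximate solution

r = [bi - sum(a * t for a, t in zip(row, x_k)) for bi, row in zip(b, A)]
e = [xt - xk for xt, xk in zip(x_true, x_k)]
kappa = inf_norm_mat(A) * inf_norm_mat(inverse_2x2(A))           # 5 * 0.6 = 3

rel_res = inf_norm_vec(r) / inf_norm_vec(b)
rel_err = inf_norm_vec(e) / inf_norm_vec(x_true)
print(rel_res / kappa, rel_err, kappa * rel_res)   # approximately 0.027, 0.2, 0.24
```

Here the actual relative error (0.2) lies near the upper end of the interval; for larger $\kappa(A)$ the interval widens and the residual alone says correspondingly less about the error.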
It was noted at the end of Chapter 4 that the eigenvalues of the tridiagonal matrix $T_k$ generated by the Lanczos algorithm (and, implicitly, by the CG and MINRES algorithms for Hermitian matrices) provide estimates of some of the eigenvalues of $A$. Hence, if $A$ is Hermitian and the norm in (7.2) is the 2-norm, then the eigenvalues of $T_k$ can be used to estimate $\kappa(A)$. It is easy to show (Exercise 7.2) that the ratio of largest to smallest eigenvalue of $T_k$ gives a lower bound on the condition number of $A$, but in practice it is usually a very good estimate, even for moderate size values of $k$. Hence one might stop the iteration when

$$\kappa(T_k)\,\frac{\|b - Ax_k\|}{\|b\|} \le \mathrm{tol}, \qquad (7.5)$$

where tol is the desired tolerance for the 2-norm of the relative error. This could cause the iteration to terminate too soon, since $\kappa(T_k) \le \kappa(A)$, but, more often, it results in extra iterations because the right-hand side of (7.2) is an overestimate of the actual error norm.
Unfortunately, for most other norms in (7.2), the condition number of $A$ cannot be well approximated by the condition number (or any other simple function) of $T_k$. For non-Hermitian matrix iterations such as BiCG, QMR, and GMRES, the condition number of $A$ cannot be approximated easily using the underlying non-Hermitian tridiagonal or upper Hessenberg matrix. (In the GMRES algorithm, one might consider approximating $\kappa(A)$ in the 2-norm by the ratio of largest to smallest singular value of $H_k$, since, at least for $k = n$, we have $\kappa(H_n) = \kappa(A)$. Unfortunately, however, for $k < n$, the singular values of $H_k$ do not usually provide good estimates of the singular values of $A$.)
Since the right-hand side of (7.2) is an overestimate of the actual error
norm, one might try to use quantities generated during the iteration in different
ways. Sometimes an iteration is stopped when the difference between two
consecutive iterates or between several consecutive iterates is less than some
tolerance. For most iterative methods, however, this could lead to termination
before the desired level of accuracy is achieved.
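Consider the CG algorithm where $A$ is Hermitian and positive definite and it is desired to reduce the $A$-norm of the error to a certain tolerance. The error $e_k = x - x_k$ satisfies

$$\|e_{k-1}\|_A^2 = \|e_k\|_A^2 + \|x_k - x_{k-1}\|_A^2, \qquad (7.6)$$

since $e_{k-1} = e_k + a_{k-1}p_{k-1}$ and $e_k$ is $A$-orthogonal to $p_{k-1}$. The $A$-norm of the difference $x_k - x_{k-1} = a_{k-1}p_{k-1}$ therefore gives a lower bound on the error at step $k-1$:

$$\|e_{k-1}\|_A \ge \|x_k - x_{k-1}\|_A. \qquad (7.7)$$

To obtain an upper bound, one can use the fact that the $A$-norm of the error is reduced by at least the factor $(\kappa - 1)/(\kappa + 1)$ at every step, or, more generally, that the $A$-norm of the error is reduced by at least the factor

$$\gamma_d \equiv \frac{1}{T_d\left(\frac{\kappa+1}{\kappa-1}\right)} \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^d \qquad (7.8)$$

after every $d$ steps, where $T_d$ is the $d$th-degree Chebyshev polynomial. Here $\kappa$ denotes the condition number of $A$ in the 2-norm and is the ratio of largest to smallest eigenvalue of $A$. This error bound follows from the fact that the CG polynomial is optimal for minimizing the $A$-norm of the error and hence is at least as good as the product of the $(k-d)$th-degree CG polynomial with the $d$th-degree Chebyshev polynomial. (See Theorem 3.1.1.)

Summing (7.6) over steps $k-d+1, \ldots, k$ and using $\|e_k\|_A \le \gamma_d\|e_{k-d}\|_A$ from (7.8), we find

$$\|e_k\|_A^2 \le \frac{\gamma_d^2}{1 - \gamma_d^2}\sum_{j=k-d+1}^{k}\|x_j - x_{j-1}\|_A^2. \qquad (7.9)$$

One could again estimate $\kappa$ using the ratio of largest to smallest eigenvalue of $T_k$ and stop when the quantity in (7.9) is less than a given tolerance for some value of $d$.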
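The upper bound just described uses only quantities that CG produces anyway: with the CG step size $a_{j-1} = \langle r_{j-1}, r_{j-1}\rangle/\langle p_{j-1}, Ap_{j-1}\rangle$, one has $\|x_j - x_{j-1}\|_A^2 = a_{j-1}\langle r_{j-1}, r_{j-1}\rangle$. Combining $\|e_{k-d}\|_A^2 = \|e_k\|_A^2 + \sum_{j=k-d+1}^{k}\|x_j - x_{j-1}\|_A^2$ with $\|e_k\|_A \le \gamma_d\|e_{k-d}\|_A$, $\gamma_d = 1/T_d\bigl(\tfrac{\kappa+1}{\kappa-1}\bigr)$, gives the estimate $\frac{\gamma_d^2}{1-\gamma_d^2}\sum_{j=k-d+1}^{k}\|x_j - x_{j-1}\|_A^2$. The Python sketch below compares it with the true $\|e_k\|_A^2$. Assumptions made here for illustration: $A$ is diagonal with eigenvalues $1, \ldots, 10$, $\kappa = 10$ is taken as known exactly rather than estimated from $T_k$, and $d = 2$; the function name is hypothetical.

```python
import math


def cg_with_error_bound(diag, b, x_true, kappa, d, steps):
    """CG on A = diag(diag) from x_0 = 0. For k >= d returns (k, ||e_k||_A^2, estimate)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = sum(v * v for v in r)
    gamma = 1.0 / math.cosh(d * math.acosh((kappa + 1) / (kappa - 1)))   # 1 / T_d(...)
    step_sq = []                                 # ||x_j - x_{j-1}||_A^2 for j = 1, 2, ...
    out = []
    for k in range(1, steps + 1):
        Ap = [di * pi for di, pi in zip(diag, p)]
        a = rr / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * api for ri, api in zip(r, Ap)]
        step_sq.append(a * rr)                   # a^2 <p, Ap> = a <r, r>
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        if k >= d:
            e = [xt - xi for xt, xi in zip(x_true, x)]
            true_sq = sum(di * ei * ei for di, ei in zip(diag, e))
            estimate = gamma ** 2 / (1 - gamma ** 2) * sum(step_sq[-d:])
            out.append((k, true_sq, estimate))
    return out


diag = [float(i) for i in range(1, 11)]          # eigenvalues 1, ..., 10, so kappa = 10
b = [1.0] * 10
x_true = [1.0 / di for di in diag]
results = cg_with_error_bound(diag, b, x_true, kappa=10.0, d=2, steps=8)
for k, true_sq, estimate in results:
    print(k, math.sqrt(true_sq), math.sqrt(estimate))   # true A-norm error <= estimate
```

The true error is needed here only to display the comparison; a stopping test would use the estimate alone.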
Since (7.9) still involves an upper bound on the error which, for some
problems, is not a very good estimate, different approaches have been
considered. One of the more interesting ones involves looking at the quadratic
form $r_k^HA^{-1}r_k$ as an integral and using different quadrature formulas to obtain
upper and lower bounds for this integral [57]. The bounds obtained in this
way appear to give very good estimates of the actual A-norm of the error. The
subject of effective stopping criteria for iterative methods remains a topic of
current research.

7.3. Attainable Accuracy.


Usually, the accuracy required from an iterative method is considerably less
than it is capable of ultimately achieving. The important property of the
method is the number of iterations or total work required to achieve a fairly
modest level of accuracy. Occasionally, however, iterative methods are used
for very ill-conditioned problems, and then it is important to know how the
machine precision and the condition number of the matrix limit the attainable
accuracy. Such analysis has been carried out for a number of iterative methods,
and here we describe some results for a class of methods which includes the
CG algorithm (Algorithm 2), the CGNR and CGNE algorithms (Algorithm
7), and some implementations of the MINRES, BiCG, and CGS algorithms
(although not the ones recommended here). We will see in Part II of this book
that the preconditioned versions of these algorithms also fall into this category.
The analysis applies to algorithms in which the residual vector $r_k$ is updated rather than computed directly, using formulas of the form

$$x_k = x_{k-1} + a_{k-1}p_{k-1}, \qquad r_k = r_{k-1} - a_{k-1}Ap_{k-1}. \qquad (7.10)$$

Here $p_{k-1}$ is some direction vector and $a_{k-1}$ is some coefficient. It is assumed that the initial residual is computed directly as $r_0 = b - Ax_0$.
It can be shown that when formulas (7.10) are implemented in finite precision arithmetic, the difference between the true residual $b - Ax_k$ and the updated vector $r_k$ satisfies

$$\frac{\|b - Ax_k - r_k\|}{\|A\|\,\|x\|} \le \epsilon\,O(k)\left(1 + \max_{j \le k}\frac{\|x_j\|}{\|x\|}\right), \qquad (7.11)$$

where $\epsilon$ is the machine precision [66]. The growth in intermediate iterates, reflected on the right-hand side of (7.11), appears to play an important role in determining the size of the quantity on the left-hand side.

It is often observed numerically (but in most cases has not been proved) that the vectors $r_k$ converge to zero as $k \to \infty$ or, at least, that their norms become many orders of magnitude smaller than the machine precision. In such cases, the right-hand side of (7.11) (without the $O(k)$ factor, which is an overestimate) gives a reasonable estimate of the best attainable actual residual:

$$\frac{\|b - Ax_k\|}{\|A\|\,\|x\|} \approx c\,\epsilon\left(1 + \max_{j \le k}\frac{\|x_j\|}{\|x\|}\right), \qquad (7.12)$$

where $c$ is a moderate size constant.


The quantity on the left in (7.12) gives a measure of the backward error in $x_k$, since if this quantity is bounded by $\zeta$, then $x_k$ is the exact solution of a nearby problem $(A + \delta A)x_k = b$, where $\|\delta A\|/\|A\| \le \zeta + O(\zeta^2)$. In general, the best one can hope for from a computed "solution" is that its backward error be about the machine precision $\epsilon$, since errors of this order are made in simply representing the matrix $A$ (either through its entries or through a procedure for computing the product of $A$ with a given vector) on the computer.
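One standard way to exhibit such a nearby problem (a construction chosen here; the text does not specify one) is the rank-one perturbation $\delta A = r_kx_k^H/\|x_k\|^2$ with $r_k = b - Ax_k$, for which $(A + \delta A)x_k = b$ and $\|\delta A\|_2 = \|r_k\|/\|x_k\|$. The minimal Python check below uses a real $2\times 2$ example picked for illustration; since $\delta A$ has rank one, its Frobenius norm equals its 2-norm.

```python
import math


def residual(A, b, x):
    return [bi - sum(a * t for a, t in zip(row, x)) for bi, row in zip(b, A)]


def norm2(v):
    return math.sqrt(sum(t * t for t in v))


def backward_perturbation(A, b, x_k):
    """Rank-one delta_A = r_k x_k^H / ||x_k||^2 (real case), so that (A + delta_A) x_k = b."""
    r = residual(A, b, x_k)
    nx2 = sum(t * t for t in x_k)
    return [[ri * xj / nx2 for xj in x_k] for ri in r]


A = [[4.0, 1.0], [2.0, 3.0]]
b = [5.0, 5.0]
x_k = [1.1, 0.8]                                   # an approximate solution
dA = backward_perturbation(A, b, x_k)
A_pert = [[a + d for a, d in zip(ra, rd)] for ra, rd in zip(A, dA)]

print(norm2(residual(A_pert, b, x_k)))             # rounding level: x_k solves the perturbed system
fro = math.sqrt(sum(d * d for row in dA for d in row))
ratio = norm2(residual(A, b, x_k)) / norm2(x_k)
print(fro, ratio)                                  # equal: ||delta_A|| = ||r_k|| / ||x_k||
```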
Based on (7.12), one can expect a small backward error from iterative methods of the form (7.10) if the norms of the iterates $x_k$ do not greatly exceed that of the true solution. For the CG algorithm for Hermitian positive definite linear systems, this is the case. It can be shown in exact arithmetic that the 2-norm of the error decreases monotonically in the CG algorithm [79]. From the inequality $\|x - x_k\| \le \|x - x_0\|$, it follows that $\|x_k\| \le 2\|x\| + \|x_0\|$. Assuming that $\|x_0\| \le \|x\|$, the quantity $\max_k \|x_k\|/\|x\|$ in (7.12) is therefore bounded by about 3, and one can expect to eventually obtain an approximate solution whose backward error is a moderate multiple of $\epsilon$. In finite precision arithmetic, the relation established in Chapter 4 between the error norms in finite precision arithmetic and exact arithmetic error norms for a larger problem can be used to show that a monotone reduction in the 2-norm of the error can be expected (to a close approximation) in finite precision arithmetic as well.
If the BiCG algorithm is implemented in the form (7.10), as in the
algorithm stated in section 5.2, the norms of intermediate iterates may grow.
Since no special error norm (that can be related easily to the 2-norm) is
guaranteed to decrease in the BiCG algorithm, the norms of the iterates cannot
be bounded a priori, and growth of the iterates will cause loss of accuracy in
the final approximate solution, even assuming that the updated vectors $r_k$
converge to zero.
An example is shown in Figure 7.1. Here A was taken to be a discretization
of the convection-diffusion operator

on the unit square with Dirichlet boundary conditions, using centered differences on a 32-by-32 mesh. The solution was taken to be $u(x, y) = x(x-1)^2y^2(y-1)^2$, and the initial guess was set to zero. The initial vector $\hat{r}_0$ was set equal to $r_0$.

FIG. 7.1. Actual residual norm (solid) and updated residual norm (dashed). Top curves are for CGS, bottom ones for BiCG. The asterisk shows the maximum ratio $\|x_k\|/\|x\|$.
The lower solid line in Figure 7.1 represents the true BiCG residual norm $\|b - Ax_k\|/(\|A\|\,\|x\|)$, while the lower dashed line shows the updated residual norm, $\|r_k\|/(\|A\|\,\|x\|)$. The lower asterisk in the figure shows the maximum ratio $\|x_k\|/\|x\|$ at the step at which it occurred. The experiment was run on a machine with unit roundoff $\epsilon \approx 1.1\times 10^{-16}$, and the maximum ratio $\|x_k\|/\|x\|$ was approximately $10^3$. As a result, instead of achieving a final residual norm of about $\epsilon$, the final residual norm is about $10^{-13} \approx 10^3\epsilon$.
Also shown in Figure 7.1 (upper solid and dashed lines) are the results of running the CGS algorithm given in section 5.5 for this same problem. Again, there are no a priori bounds on the size of intermediate iterates, and in this case we had $\max_k \|x_k\|/\|x\| \approx 4\cdot 10^{10}$. As a result, the final actual residual norm reaches a level of a few times $10^{-6}$, which is roughly $4\cdot 10^{10}\epsilon$.
Since the CGNE and CGNR algorithms are also of the form (7.10), the estimate (7.12) is applicable to them as well. Since CGNE minimizes the 2-norm of the error, it follows, as for CG, that $\|x_k\| \le 2\|x\| + \|x_0\|$, so the backward error in the final approximation will be a moderate multiple of the machine precision. The CGNR method minimizes the 2-norm of the residual, but since it is equivalent to CG for the linear system $A^HAx = A^Hb$, it follows that the 2-norm of the error also decreases monotonically. Hence we again expect a final backward error of order $\epsilon$.
An example for the CGNE method is shown in Figure 7.2. The matrix $A$ was taken to be of the form $A = U\Sigma V^T$, where $U$ and $V$ are random orthogonal matrices and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, with

FIG. 7.2. Actual residual norm (solid) and updated residual norm (dashed) for CGNE. The asterisk shows the maximum ratio $\|x_k\|/\|x\|$.

For a problem of size $n = 40$, a random solution was set and a zero initial guess was used. The solid line in Figure 7.2 shows the actual residual norm $\|b - Ax_k\|/(\|A\|\,\|x\|)$, while the dashed line represents the updated residual norm $\|r_k\|/(\|A\|\,\|x\|)$. The maximum ratio $\|x_k\|/\|x\|$ is approximately 1, as indicated by the asterisk in Figure 7.2. Note that while rounding errors greatly affect the convergence rate of the method (in exact arithmetic, the exact solution would be obtained after 40 steps), the ultimately attainable accuracy is as great as one could reasonably expect: a backward error of size approximately $4\epsilon$. There is no loss of final accuracy due to the fact that we are (implicitly) solving the normal equations.
When a preconditioner is used with the above algorithms, the 2-norm of the
error may not decrease monotonically, and then one must use other properties
to establish bounds on the norms of the iterates.
It is sometimes asked whether one can accurately solve a very ill-conditioned linear system if a very good preconditioner is available. That is, suppose $\kappa(A)$ is very large but $\kappa(M^{-1}A)$ or $\kappa(M^{-1/2}AM^{-1/2})$, where $M$ is a known preconditioning matrix, is not. We will see in Part II that the preconditioned CG algorithm, for instance, still uses formulas of the form (7.10). There is simply an additional formula, $Mz_k = r_k$, to determine a preconditioned residual $z_k$. The final residual norm is still given approximately by (7.12), and, unless the final residual vector is deficient in certain eigencomponents of $A$, this suggests an error satisfying

$$\frac{\|x - x_k\|}{\|x\|} \approx c\,\epsilon\,\kappa(A)\left(1 + \max_{j \le k}\frac{\|x_j\|}{\|x\|}\right).$$
The presence of even an excellent preconditioner $M$ (such as the LU factors from direct Gaussian elimination) does not appear to improve this error bound for algorithms of the form (7.10).
For a discussion of the effect of rounding errors on the attainable accuracy
with some different implementations, see, for example, [33, 70, 122].

7.4. Multiple Right-Hand Sides and Block Methods.


Frequently, it is desired to solve several linear systems with the same coefficient
matrix but different right-hand sides. Sometimes the right-hand side vectors
are known at the start and sometimes one linear system must be solved
before the right-hand side for the next linear system can be computed (as, for
example, in time-dependent partial differential equations). One might hope
that information gained in the solution of one linear system could be used to
facilitate the solution of subsequent problems with the same coefficient matrix.
We will consider only the case in which the right-hand sides are all available
at the start. In that case, block versions of the previously described algorithms
can be used. Suppose there are $s$ right-hand sides. Then the linear systems can be written in the form

$$AX = B,$$
where X is the n-by-s matrix of solution vectors and B is the n-by-s matrix
of right-hand sides.
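Let $A$ be Hermitian positive definite and consider the block CG algorithm. Instead of minimizing the $A$-norm of the error for each linear system over a single Krylov space, one can minimize

$$\sum_{\ell=1}^{s}\|x^{(\ell)} - x_k^{(\ell)}\|_A^2 = \operatorname{tr}\left[(X - X_k)^HA(X - X_k)\right]$$

over all $X_k$ of the form

$$X_k = X_0 + \sum_{j=0}^{k-1}A^jR_0\,\Gamma_j, \qquad \Gamma_j \in \mathbb{C}^{s\times s},$$

where $x^{(\ell)}$, $x_k^{(\ell)}$, and $r_0^{(\ell)}$ denote the $\ell$th columns of $X$, $X_k$, and $R_0 = B - AX_0$. That is, the approximation $x_k^{(\ell)}$ for the $\ell$th equation is equal to $x_0^{(\ell)}$ plus a linear combination of vectors from all of the Krylov spaces

$$\operatorname{span}\{r_0^{(i)}, Ar_0^{(i)}, \ldots, A^{k-1}r_0^{(i)}\}, \qquad i = 1, \ldots, s.$$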
The following algorithm accomplishes this minimization.


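Algorithm 8. Block Conjugate Gradient Method (Block CG)
(for Hermitian positive definite problems with multiple right-hand sides).

Given an initial guess $X_0$, compute $R_0 = B - AX_0$ and set $P_0 = R_0$.
For $k = 1, 2, \ldots$,

Compute $AP_{k-1}$.

Set $X_k = X_{k-1} + P_{k-1}\alpha_{k-1}$, where $\alpha_{k-1} = (P_{k-1}^HAP_{k-1})^{-1}R_{k-1}^HR_{k-1}$.

Compute $R_k = R_{k-1} - AP_{k-1}\alpha_{k-1}$.

Set $P_k = R_k + P_{k-1}\beta_{k-1}$, where $\beta_{k-1} = (R_{k-1}^HR_{k-1})^{-1}R_k^HR_k$.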
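A minimal sketch of Algorithm 8 in plain Python, assuming real symmetric positive definite $A$ (so $^H$ becomes a transpose) and dense list-of-rows matrices. As the text suggests, the $s$-by-$s$ systems are solved with a Cholesky factorization rather than by forming inverses. The helper names are hypothetical, and there is no handling of rank deficiency in $P_k$ or $R_k$. In the example $n = 4$ and $s = 2$, and the block Krylov space $\operatorname{span}\{R_0, AR_0\}$ already has dimension 4, so $n/s = 2$ steps suffice.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(X, Y, sign=1.0):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def cholesky_solve(M, R):
    """Solve M Z = R for symmetric positive definite M via M = L L^T, column by column."""
    s = len(M)
    L = [[0.0] * s for _ in range(s)]
    for i in range(s):
        for j in range(i + 1):
            t = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(t) if i == j else t / L[j][j]
    cols = []
    for c in transpose(R):
        y = [0.0] * s
        for i in range(s):                       # forward solve L y = c
            y[i] = (c[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        z = [0.0] * s
        for i in reversed(range(s)):             # back solve L^T z = y
            z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, s))) / L[i][i]
        cols.append(z)
    return transpose(cols)


def block_cg(A, B, X0, steps):
    X = [row[:] for row in X0]
    R = add(B, matmul(A, X), -1.0)               # R_0 = B - A X_0
    P = [row[:] for row in R]                    # P_0 = R_0
    RtR = matmul(transpose(R), R)
    for _ in range(steps):
        AP = matmul(A, P)
        alpha = cholesky_solve(matmul(transpose(P), AP), RtR)   # (P^H A P)^{-1} R^H R
        X = add(X, matmul(P, alpha))
        R = add(R, matmul(AP, alpha), -1.0)
        RtR_new = matmul(transpose(R), R)
        beta = cholesky_solve(RtR, RtR_new)                     # (R_{k-1}^H R_{k-1})^{-1} R_k^H R_k
        P = add(R, matmul(P, beta))
        RtR = RtR_new
    return X, R


A = [[float(i + 1) if i == j else 0.0 for j in range(4)] for i in range(4)]  # SPD, n = 4
B = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]                         # s = 2
X, R = block_cg(A, B, [[0.0, 0.0] for _ in range(4)], steps=2)               # n / s = 2 steps
X_true = [[B[i][j] / A[i][i] for j in range(2)] for i in range(4)]
print(max(abs(X[i][j] - X_true[i][j]) for i in range(4) for j in range(2)))  # rounding level
```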

It is left as an exercise to show that the following block orthogonality properties hold:

$$R_j^HR_k = 0 \quad\text{and}\quad P_j^HAP_k = 0 \qquad\text{for } j \ne k.$$

As long as the matrices $P_k$ and $R_k$ retain full rank, the algorithm is well defined. If their columns become linearly dependent, then equations corresponding to dependent columns can be treated separately, and the algorithm will be continued with the remaining equations. A number of strategies have been developed for varying the block size. See, for example, [107, 130, 105].
A block algorithm requires somewhat more work than running $s$ separate recurrences because $s$ separate inner products and scalar divisions are replaced by the formation of an $s$-by-$s$ matrix and the solution of $s$ linear systems with this coefficient matrix. (Note, as usual, that it is not necessary to actually invert the matrices $P_{k-1}^HAP_{k-1}$ and $R_{k-1}^HR_{k-1}$ in the block CG algorithm, but instead one can solve $s$ linear systems with right-hand sides given by the columns of $R_{k-1}^HR_{k-1}$ and $R_k^HR_k$, respectively. This can be accomplished by factoring the coefficient matrices using Cholesky decomposition and then backsolving with the triangular factors $s$ times. The total work is $O(s^3)$.) If $s^3 < n$, then the extra work will be negligible.
How much improvement is obtained in the number of iterations required
due to the fact that the error is minimized over a larger space? This depends
on the right-hand side vectors. If all of the vectors in all of the s Krylov spaces
are linearly independent, then at most n/s steps are required for the block
algorithm, as compared to n for the nonblock version. Usually, however, the
number of iterations required even for the nonblock iteration is significantly less
than n/s. In this case, if the right-hand sides are unrelated random vectors,
then the improvement in the number of iterations is usually modest. Special
relations between right-hand side vectors, however, may lead to significant
advantages for the block algorithm.

7.5. Computer Implementation.


If two iterative methods are both capable of generating a sufficiently accurate
approximation to a system of linear equations, then we usually compare
the two methods by counting operations—how many additions, subtractions,
multiplications, and divisions will each method require? If the number of
operations per iteration is about the same for the two methods, then we might
just compare the number of iterations. The method that requires fewer iterations
is chosen. (One should be very careful about iteration count comparisons,
however, to be sure that the algorithms being compared really do require the
same amount of work per iteration!)
These are approximate measures of the relative computer time that will be
required by the two algorithms, but they are only approximate. Computational
time may also depend on data locality and potential for parallelism, that is, the
ability to effectively use multiple processors simultaneously. These factors vary
from one machine to another, so it is generally impossible to give a definitive
answer to the question of which algorithm is faster.
Almost all of the algorithms discussed in this book perform vector inner
products during the course of the iteration. If different pieces of the vectors are
stored on different processors, then this requires some global communication
to add together the inner products of the subvectors computed on the
different processors. It has sometimes been thought that this would be a
major bottleneck for distributed memory multiprocessors, but on today's
supercomputers, this does not appear to be the case. It is a relatively small
amount of data that must be passed between processors, and the bulk of the
time for the iterative solver still lies in the matrix-vector multiplication and
in the preconditioning step.
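The communication pattern for a distributed inner product can be seen in a small serial simulation. The sketch below runs in ordinary Python, so it illustrates the data flow rather than providing a parallel implementation. On a real machine the final sum would typically be a collective reduction such as MPI's `MPI_Allreduce`. Each simulated processor forms the inner product of its own subvectors, and only p scalars then have to be combined.

```python
def partition(v, p):
    """Split v into p contiguous pieces, as if each piece were stored on its own processor."""
    n = len(v)
    bounds = [(n * r) // p for r in range(p + 1)]
    return [v[bounds[r]:bounds[r + 1]] for r in range(p)]

def local_dot(x_piece, y_piece):
    """Computed independently on one processor; no communication is needed."""
    return sum(a * b for a, b in zip(x_piece, y_piece))

def distributed_dot(x, y, p):
    """Inner product of x and y formed the way p processors would form it."""
    partial_sums = [local_dot(xp, yp) for xp, yp in zip(partition(x, p), partition(y, p))]
    # Global step: only p scalars are combined (an all-reduce on a real machine).
    return sum(partial_sums)
```

Because the partial sums are added in a different order than in a serial loop, the result may differ from the serial inner product in the last few bits.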
Sparse matrix-vector multiplication is often parallelized by assigning
different rows or different blocks of the matrix to different processors, along
with the corresponding pieces of the vectors on which they must operate. Many
different distribution schemes are possible.
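As an illustration of one such scheme, the sketch below assigns contiguous blocks of rows of a matrix in compressed sparse row (CSR) form to p simulated processors. The helper names and the 1-D test matrix are hypothetical choices made here. The `needed_columns` helper lists the entries of x that a processor reads but does not own. Those are the values that would have to be communicated before the local multiplication.

```python
def laplacian_1d_csr(n):
    """CSR arrays (indptr, indices, data) for tridiag(-1, 2, -1) of order n."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

def local_rows_matvec(indptr, indices, data, x, start, stop):
    """Rows start..stop-1 of A x: the share of one processor."""
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(start, stop)]

def needed_columns(indptr, indices, start, stop):
    """Entries of x read by rows start..stop-1 that lie outside the processor's own block."""
    cols = {indices[k] for k in range(indptr[start], indptr[stop])}
    return sorted(c for c in cols if not start <= c < stop)

def row_block_matvec(indptr, indices, data, x, p):
    """Simulate p processors, each owning a contiguous block of rows (and of x)."""
    n = len(indptr) - 1
    bounds = [(n * r) // p for r in range(p + 1)]
    y = []
    for r in range(p):
        y.extend(local_rows_matvec(indptr, indices, data, x, bounds[r], bounds[r + 1]))
    return y
```

For a banded matrix like this one, each processor needs only a couple of values from its neighbors. A less local sparsity pattern would increase that exchange.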
The most difficult part of an iterative method to parallelize is often the
preconditioning step. This may require the solution of a sparse triangular
system, which is a largely sequential operation—one solution component must
be known before the next can be computed. For this reason, a number of more
parallelizable preconditioners have been proposed. Examples include sparse
approximate inverses (so the preconditioning step becomes just another sparse
matrix-vector multiplication) and domain decomposition methods. Domain
decomposition methods, to be discussed in Chapter 12, divide the physical
domain of the problem into pieces and assign different pieces to different
processors. The preconditioning step involves each processor solving a problem
on its subdomain. In order to prevent the number of iterations from growing
with the number of subdomains, however, some global communication is
required, in the form of a coarse grid solve.
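The sequential dependence in a sparse triangular solve, mentioned above as the main obstacle, can be made visible with a small sketch. It performs forward substitution and also computes a level schedule, a standard technique for exposing whatever parallelism the sparsity pattern allows. Unknowns with the same level depend only on earlier levels and could be computed simultaneously. The row-list storage format is an assumption made for brevity.

```python
def forward_substitution(rows, b):
    """Solve L x = b; rows[i] lists (j, L_ij) pairs with j <= i, diagonal included."""
    x = [0.0] * len(b)
    for i, row in enumerate(rows):
        total, diag = b[i], None
        for j, value in row:
            if j == i:
                diag = value
            else:
                total -= value * x[j]   # x[j] must already be known: the sequential dependence
        x[i] = total / diag
    return x

def level_schedule(rows):
    """Level of unknown i = 1 + the largest level among the unknowns it depends on.

    Unknowns with the same level could be computed at the same time.
    """
    level = []
    for i, row in enumerate(rows):
        deps = [level[j] for j, _ in row if j != i]
        level.append(1 + max(deps) if deps else 0)
    return level
```

For a lower bidiagonal factor every unknown lands on its own level, so the solve is completely sequential. Reordering the unknowns can reduce the number of levels.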
A number of parallel iterative method packages have been developed
for different machines. Some examples are described in [118, 82]. More
information about the parallelization of iterative methods can be found in
[117].

Exercises.
7.1. Let A be a matrix of the form $I - F$, where $F = -F^H$. Suppose the
eigenvalues of A are contained in the line segment $[1 - i\gamma, 1 + i\gamma]$. It was
shown by Freund and Ruscheweyh [55] that if a MINRES algorithm is
applied to this matrix, then the residual at step k satisfies

Moreover, if A has eigenvalues throughout the interval, then this
bound is sharp.
Determine a bound on the residual in the CGNR method. Will CGNR
require more or fewer iterations than the MINRES method for this
problem?
7.2. Use the fact that $T_k = Q_k^H A Q_k$ in the Hermitian Lanczos algorithm
to show that the eigenvalues of Tk lie between the smallest and largest
eigenvalues of A.
7.3. Prove the block orthogonality properties
for the block CG algorithm.
Part II

Preconditioners
Chapter 8

Overview and Preconditioned Algorithms

All of the iterative methods discussed in Part I of this book converge very
rapidly if the coefficient matrix A is close to the identity. Unfortunately,
in most applications, A is not close to the identity, but one might consider
replacing the original linear system Ax = b by the modified system

$$M^{-1}Ax = M^{-1}b \quad\text{or}\quad AM^{-1}y = b,\ \ x = M^{-1}y. \tag{8.1}$$

These are referred to as left and right preconditioning, respectively. If M is
Hermitian and positive definite, then one can precondition symmetrically and
solve the modified linear system

$$L^{-1}AL^{-H}y = L^{-1}b,\ \ x = L^{-H}y, \tag{8.2}$$

where $M = LL^H$. The matrix L could be the Hermitian square root of M
or the lower triangular Cholesky factor of M or any other matrix satisfying
$M = LL^H$. In each case, it is necessary only to be able to solve linear systems
with coefficient matrix M, not to actually compute $M^{-1}$ or L.
If the preconditioner M can be chosen so that
1. linear systems with coefficient matrix M are easy to solve, and
2. $M^{-1}A$ or $AM^{-1}$ or $L^{-1}AL^{-H}$ approximates the identity,
then an efficient solution technique results from applying an iterative method
to the modified linear system (8.1) or (8.2).
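The list above allows any of the three preconditioned matrices. Using only $M = LL^H$, a short derivation shows that they are similar to one another and therefore have the same eigenvalues:

$$M^{-1}A = L^{-H}L^{-1}A = L^{-H}\left(L^{-1}AL^{-H}\right)L^{H}, \qquad AM^{-1} = AL^{-H}L^{-1} = L\left(L^{-1}AL^{-H}\right)L^{-1}.$$

They need not share other properties, however. For instance, $M^{-1}A$ is generally not Hermitian even when $L^{-1}AL^{-H}$ is.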
The exact sense in which the preconditioned matrix should approximate
the identity depends on the iterative method being used. For simple iteration,
one would like $\rho(I - M^{-1}A) \ll 1$ to achieve fast asymptotic convergence or
$\|I - M^{-1}A\| \ll 1$ to achieve large error reduction at each step.
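For simple iteration the error satisfies $e_{k+1} = (I - M^{-1}A)e_k$. This is why $\rho(I - M^{-1}A)$ governs the asymptotic rate and $\|I - M^{-1}A\|$ bounds the reduction at each step. The minimal Python sketch below assumes a dense matrix stored as a list of rows. It uses the Jacobi choice $M = \operatorname{diag}(A)$ purely for illustration. The function name and test matrix are hypothetical.

```python
import math

def simple_iteration(A, b, solve_M, x0, iters):
    """Run x <- x + M^{-1}(b - A x); return the last iterate and the residual norms seen."""
    x = list(x0)
    residual_norms = []
    for _ in range(iters):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
        residual_norms.append(math.sqrt(sum(ri * ri for ri in r)))
        z = solve_M(r)                      # solve M z = r
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, residual_norms

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]                          # exact solution (1, 1, 1)
jacobi = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]
x, hist = simple_iteration(A, b, jacobi, [0.0, 0.0, 0.0], 60)
```

Here $I - M^{-1}A$ has eigenvalues $0$ and $\pm\sqrt{2}/4 \approx \pm 0.354$, so the error is eventually reduced by roughly that factor per iteration.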
For the CG or MINRES methods for Hermitian positive definite problems,
one would like the condition number of the symmetrically preconditioned
matrix $L^{-1}AL^{-H}$ to be close to one, in order for the error bound based on the
Chebyshev polynomial to be small. Alternatively, a preconditioned matrix with
just a few large eigenvalues and the remainder tightly clustered would also be
good for the CG and MINRES algorithms, as would a preconditioned matrix
with just a few distinct eigenvalues. For MINRES applied to a Hermitian
indefinite linear system but with a positive definite preconditioner, it is again
the eigenvalue distribution of the preconditioned matrix that is of importance.
The eigenvalues should be distributed in such a way that a polynomial of
moderate degree with value one at the origin can be made small at all of the
eigenvalues.
For GMRES, a preconditioned matrix that is close to normal and whose
eigenvalues are tightly clustered around some point away from the origin would
be good, but other properties might also suffice to define a good preconditioner.
It is less clear exactly what properties one should look for in a preconditioner
for some of the other non-Hermitian matrix iterations (such as BiCG, QMR,
CGS, or BiCGSTAB), but again, since each of these methods converges in one
iteration if the coefficient matrix is the identity, there is the intuitive concept
that the preconditioned matrix should somehow approximate the identity.
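It is easy to modify the algorithms of Part I to use left preconditioning:
simply replace A by $M^{-1}A$ and b by $M^{-1}b$ everywhere they appear. Right
or symmetric preconditioning requires a little more thought since we want to
generate approximations $x_k$ to the solution of the original linear system, not
the modified one in (8.1) or (8.2).
If the CG algorithm is applied directly to equation (8.2), with residuals $\hat r_k$ and
direction vectors $\hat p_k$, then the iterates satisfy

$$y_k = y_{k-1} + a_{k-1}\hat p_{k-1}, \qquad a_{k-1} = \frac{\langle \hat r_{k-1}, \hat r_{k-1}\rangle}{\langle \hat p_{k-1}, L^{-1}AL^{-H}\hat p_{k-1}\rangle},$$
$$\hat r_k = \hat r_{k-1} - a_{k-1}L^{-1}AL^{-H}\hat p_{k-1},$$
$$\hat p_k = \hat r_k + b_{k-1}\hat p_{k-1}, \qquad b_{k-1} = \frac{\langle \hat r_k, \hat r_k\rangle}{\langle \hat r_{k-1}, \hat r_{k-1}\rangle}.$$

Defining

$$x_k = L^{-H}y_k, \qquad r_k = L\hat r_k = b - Ax_k, \qquad p_k = L^{-H}\hat p_k, \qquad z_k = M^{-1}r_k,$$

we obtain the following preconditioned CG algorithm for Ax = b.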


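Algorithm 2P. Preconditioned Conjugate Gradient Method (PCG)
(for Hermitian positive definite problems, with Hermitian positive definite
preconditioners).

Given an initial guess $x_0$, compute $r_0 = b - Ax_0$ and solve
$Mz_0 = r_0$. Set $p_0 = z_0$. For $k = 1, 2, \ldots$,

Compute $Ap_{k-1}$.

Set $x_k = x_{k-1} + a_{k-1}p_{k-1}$, where $a_{k-1} = \dfrac{\langle r_{k-1}, z_{k-1}\rangle}{\langle p_{k-1}, Ap_{k-1}\rangle}$.

Compute $r_k = r_{k-1} - a_{k-1}Ap_{k-1}$.

Solve $Mz_k = r_k$.

Set $p_k = z_k + b_{k-1}p_{k-1}$, where $b_{k-1} = \dfrac{\langle r_k, z_k\rangle}{\langle r_{k-1}, z_{k-1}\rangle}$.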
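A minimal dense-matrix sketch of preconditioned CG (compute $r_0$, solve $Mz_0 = r_0$, then iterate) is given below in standard-library Python. It assumes A and M are real symmetric positive definite and that the caller supplies `solve_M(r)` returning z with $Mz = r$. The relative-residual stopping test and the Jacobi choice $M = \operatorname{diag}(A)$ in the usage lines are assumptions added here, not part of the algorithm as stated.

```python
import math

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def pcg(A, b, solve_M, x0=None, tol=1e-10, maxiter=1000):
    """Preconditioned CG for real SPD A and SPD M. Returns (x_k, k)."""
    x = list(x0) if x0 is not None else [0.0] * len(b)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    z = solve_M(r)                                      # solve M z_0 = r_0
    p = list(z)                                         # p_0 = z_0
    rz = dot(r, z)                                      # <r_0, z_0>
    bnorm = math.sqrt(dot(b, b)) or 1.0
    for k in range(1, maxiter + 1):
        Ap = matvec(A, p)                               # A p_{k-1}
        a = rz / dot(p, Ap)                             # a_{k-1}
        x = [xi + a * pi for xi, pi in zip(x, p)]        # x_k
        r = [ri - a * api for ri, api in zip(r, Ap)]     # r_k
        if math.sqrt(dot(r, r)) <= tol * bnorm:         # stopping test (our choice)
            return x, k
        z = solve_M(r)                                  # solve M z_k = r_k
        rz_new = dot(r, z)
        beta = rz_new / rz                              # b_{k-1}
        p = [zi + beta * pi for zi, pi in zip(z, p)]     # p_k
        rz = rz_new
    return x, maxiter

# Usage: a tridiagonal SPD matrix with a varying diagonal and the Jacobi preconditioner.
n = 10
A = [[2.0 + i if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iters = pcg(A, b, lambda r: [ri / A[i][i] for i, ri in enumerate(r)])
```

Only `solve_M` touches the preconditioner, which matches the remark that one needs to solve systems with M but never to form $M^{-1}$ or L.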

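The same modifications can be made to any of the MINRES implementations, provided that the preconditioner M is positive definite. To obtain a
preconditioned version of Algorithm 4, first consider the Lanczos algorithm
applied directly to the matrix $L^{-1}AL^{-H}$ with initial vector $\hat q_1$. Successive
vectors satisfy

$$\hat v_j = L^{-1}AL^{-H}\hat q_j - \beta_{j-1}\hat q_{j-1}, \qquad \alpha_j = \langle \hat v_j, \hat q_j\rangle, \qquad \hat v_j \leftarrow \hat v_j - \alpha_j\hat q_j,$$
$$\beta_j = \|\hat v_j\|, \qquad \hat q_{j+1} = \hat v_j/\beta_j.$$

If we define $q_j = L\hat q_j$, $v_j = L\hat v_j$, and $w_j = M^{-1}q_j$, then the same equations
can be written in terms of $q_j$, $v_j$, and $w_j$.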

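Preconditioned Lanczos Algorithm (for Hermitian matrices A,
with Hermitian positive definite preconditioners M).

Given $v_0$, solve $Mw_1 = v_0$, and set $\beta_0 = \langle v_0, w_1\rangle^{1/2}$.
Set $q_1 = v_0/\beta_0$ and $w_1 = w_1/\beta_0$. Define $q_0 \equiv 0$. For $j = 1, 2, \ldots$,

Set $v_j = Aw_j - \beta_{j-1}q_{j-1}$.

Compute $\alpha_j = \langle v_j, w_j\rangle$ and update $v_j \leftarrow v_j - \alpha_j q_j$.

Solve $M\tilde w_{j+1} = v_j$.

Set $q_{j+1} = v_j/\beta_j$ and $w_{j+1} = \tilde w_{j+1}/\beta_j$, where $\beta_j = \langle v_j, \tilde w_{j+1}\rangle^{1/2}$.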
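The preconditioned Lanczos recurrence can be checked with a short standard-library Python sketch. It assumes real symmetric A and symmetric positive definite M, with the caller supplying `solve_M`. It has no breakdown handling, so it assumes no $\beta_j$ vanishes. In exact arithmetic $\langle q_i, w_j\rangle = \langle q_i, M^{-1}q_j\rangle = \langle \hat q_i, \hat q_j\rangle = \delta_{ij}$, and the test checks that property for a few steps.

```python
import math

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def preconditioned_lanczos(A, v0, solve_M, steps):
    """`steps` steps of preconditioned Lanczos (real symmetric A, SPD M).

    Returns alphas, betas (beta_1, ...), and the lists q_1, q_2, ... and w_j = M^{-1} q_j.
    No breakdown handling: assumes no beta_j is zero.
    """
    w = solve_M(v0)                                  # solve M w_1 = v_0
    beta = math.sqrt(dot(v0, w))                     # beta_0
    q = [vi / beta for vi in v0]                     # q_1
    w = [wi / beta for wi in w]                      # w_1
    q_prev = [0.0] * len(v0)                         # q_0 = 0
    alphas, betas, qs, ws = [], [], [q], [w]
    for _ in range(steps):
        v = [avi - beta * qpi for avi, qpi in zip(matvec(A, w), q_prev)]  # A w_j - beta_{j-1} q_{j-1}
        alpha = dot(v, w)                            # alpha_j
        v = [vi - alpha * qi for vi, qi in zip(v, q)]
        w_tilde = solve_M(v)                         # solve M w~_{j+1} = v_j
        beta = math.sqrt(dot(v, w_tilde))            # beta_j
        q_prev, q = q, [vi / beta for vi in v]       # q_{j+1}
        w = [wi / beta for wi in w_tilde]            # w_{j+1}
        alphas.append(alpha)
        betas.append(beta)
        qs.append(q)
        ws.append(w)
    return alphas, betas, qs, ws
```

The $\alpha_j$ and $\beta_j$ are the diagonal and off-diagonal entries of the tridiagonal matrix T used by the preconditioned MINRES algorithm below.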

If Algorithm 4 of section 2.5 is applied directly to the preconditioned
linear system (8.2), and if we let $y_k$ and $\hat p_k$ denote the iterates and direction
vectors generated by that algorithm and then define $x_k = L^{-H}y_k$ and
$p_k = L^{-H}\hat p_k$, then these vectors are generated by the following preconditioned
algorithm.

Algorithm 4P. Preconditioned Minimal Residual Algorithm (PMINRES)
(for Hermitian problems, with Hermitian positive definite preconditioners).

Given $x_0$, compute $r_0 = b - Ax_0$ and solve $Mz_0 = r_0$.
Set $\beta = \langle r_0, z_0\rangle^{1/2}$, $q_1 = r_0/\beta$, and $w_1 = z_0/\beta$.
Initialize $\xi = (1, 0, \ldots, 0)^T$. For $k = 1, 2, \ldots$,

Compute $q_{k+1}$, $w_{k+1}$, $\alpha_k = T(k,k)$, and $\beta_k = T(k+1,k) = T(k,k+1)$
using the preconditioned Lanczos algorithm.

Apply $F_{k-2}$ and $F_{k-1}$ to the last column of T; that is

Compute the kth rotation, $c_k$ and $s_k$, to annihilate the $(k+1,k)$ entry of T.¹

Apply the kth rotation to ξ and to the last column of T:

Compute
where undefined terms are zero for $k \le 2$.

Set where

¹The formula is $c_k = |T(k,k)|/\sqrt{|T(k,k)|^2 + |T(k+1,k)|^2}$, $s_k = c_k T(k+1,k)/T(k,k)$, but a
more robust implementation should be used. See, for example, BLAS routine DROTG [32].

Right-preconditioned algorithms for non-Hermitian matrices are derived similarly,
so that the algorithms actually generate and store approximations to
the solution of the original linear system.
Preconditioners can be divided roughly into three categories:

I. Preconditioners designed for general classes of matrices; e.g., matrices
with nonzero diagonal entries, positive definite matrices, M-matrices.
Examples of such preconditioners are the Jacobi, Gauss-Seidel, and
SOR preconditioners, the incomplete Cholesky, and modified incomplete
Cholesky preconditioners.
II. Preconditioners designed for broad classes of underlying problems; e.g.,
elliptic partial differential equations. Examples are multigrid and domain
decomposition preconditioners.
III. Preconditioners designed for a specific matrix or underlying problem;
e.g., the transport equation. An example is the diffusion synthetic
acceleration (DSA) preconditioner, which will be mentioned but not
analyzed in section 9.2.
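The footnote to Algorithm 4P gives a direct formula for the rotation and recommends a more robust implementation. Here is a small real-arithmetic sketch in that spirit. It is not a port of DROTG, which also returns extra information. For $a \ne 0$ it produces the same $c$ and $s$ as the footnote's formula. It computes $\sqrt{a^2 + b^2}$ with Python's `math.hypot`, which avoids the overflow or underflow that squaring very large or very small entries can cause. The function name is hypothetical.

```python
import math

def rotation(a, b):
    """Real c, s, rho such that [[c, s], [-s, c]] maps (a, b) to (rho, 0), with c >= 0.

    Matches c = |a| / sqrt(a^2 + b^2), s = c * b / a when a != 0,
    but uses math.hypot instead of forming a^2 + b^2 explicitly.
    """
    if b == 0.0:
        return 1.0, 0.0, a
    if a == 0.0:
        return 0.0, 1.0, b
    h = math.hypot(a, b)
    sign = 1.0 if a > 0 else -1.0
    return abs(a) / h, sign * b / h, sign * h
```

In MINRES, a would be $T(k,k)$ after the previous rotations have been applied and b would be $T(k+1,k)$. The returned rho becomes the new $T(k,k)$.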
An advantage of category I preconditioners is that they can be used in
settings where the exact origin of the problem is not necessarily known—
for example, in software packages for solving systems of ordinary differential
equations or optimization problems. Most of the preconditioners in category I
require knowledge of at least some of the entries of A. Usually, this information
is readily available, but sometimes it is much easier to compute matrix-vector
products, through some special formula, than it is to compute the actual
entries of the matrix (in the standard basis). For example, one might use
a finite difference approximation to the product of a Jacobian matrix with
a given vector, without ever computing the Jacobian itself [23]. For such
problems, efficient general preconditioners may be difficult to derive, and we
do not address this topic here.
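The finite difference approximation to a Jacobian-vector product mentioned above can be sketched in a few lines of Python. It uses a one-sided difference $J(u)v \approx (F(u + \epsilon v) - F(u))/\epsilon$. The default step size is one common heuristic, the square root of machine epsilon scaled by the sizes of u and v. It is an assumption made here, not a prescription from the text or from [23].

```python
import math
import sys

def jacobian_vector_product(F, u, v, eps=None):
    """Approximate J(u) v by (F(u + eps v) - F(u)) / eps without forming J.

    F maps a list of floats to a list of floats.
    """
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        return [0.0] * len(F(u))
    if eps is None:
        # A common heuristic step size (an assumption, not from the text).
        norm_u = math.sqrt(sum(ui * ui for ui in u))
        eps = math.sqrt(sys.float_info.epsilon) * (1.0 + norm_u) / norm_v
    Fu = F(u)
    Fshift = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(fs - f0) / eps for fs, f0 in zip(Fshift, Fu)]
```

Each product costs one extra evaluation of F. Because the entries of J are never available, preconditioners that need the entries of A cannot be built directly, which is the difficulty noted in the text.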
For practical preconditioners designed for general matrices, there are few
quantitative theorems to describe just how good the preconditioner is, e.g., how
much smaller the condition number of the preconditioned matrix is compared
to that of the original matrix. There are, however, comparison theorems
available. For large classes of problems, one may be able to prove that a
preconditioner $M_1$ is better than another preconditioner $M_2$, in that, say,
$\rho(I - M_1^{-1}A) < \rho(I - M_2^{-1}A)$. Such results are discussed in Chapter 10. These
results may lead to theorems about the optimal preconditioner of a given form;
e.g., the optimal diagonal preconditioner.
For classes of problems arising from partial differential equations, it is
sometimes possible to show that a preconditioner alters the dependence of the
condition number on the mesh size used in a finite difference or finite element
approximation. That is, instead of considering a single matrix A and asking
how much a particular preconditioner reduces the condition number of A, we
consider a class of matrices $A_h$ and preconditioners $M_h$ parameterized by a
mesh spacing h. It can sometimes be shown that while the condition number
of $A_h$ grows like $O(h^{-2})$ as $h \to 0$, the condition number of $M_h^{-1/2}A_hM_h^{-1/2}$
is only $O(h^{-1})$ or $O(1)$. This is not of much help if one's goal is to solve a
specific linear system Ax = b, but if the goal is to solve the underlying partial
differential equation, it quantifies the difficulty of solving the linear system in
relation to the accuracy of the finite difference or finite element scheme. In
Chapter 11, incomplete decompositions are considered, and it is shown that for
a model problem, a modified incomplete Cholesky decomposition reduces the
condition number of A from $O(h^{-2})$ to $O(h^{-1})$. Multigrid methods, discussed
in Chapter 12, have proved especially effective for solving problems arising from
partial differential equations because they often eliminate the dependence of
the condition number on h entirely.
Despite great strides in developing preconditioners for general linear
systems or for broad classes of underlying problems, it is still possible in many
situations to use physical intuition about a specific problem to develop a more
effective preconditioner. Note that this is different from the situation with
the iterative techniques themselves. Seldom (if ever) can one use physical
properties of the problem being solved to devise an iteration strategy (i.e., a
choice of the polynomial $P_k$ for which $r_k = P_k(A)r_0$) that is better than, say,
the CG method. For this reason, the subject of preconditioners is still a very
broad one, encompassing all areas of science. No complete survey can be given.
In this book, we present some known general theory about preconditioners and
a few example problems to illustrate their use in practice.

Comments and Additional References.


The idea of preconditioning the CG method actually appeared in the original
Hestenes and Stiefel paper [79]. It was not widely used until much later,
however, after works such as [27] and [99]. See also [87], which describes
some early applications. It was the development of effective preconditioning
strategies that helped bring the CG algorithm into widespread use as an
iterative method.
Chapter 9

Two Example Problems

In the study of preconditioners, it is useful to have some specific problems
in mind. Here we describe two such problems: one of which (the diffusion
equation) gives rise to a symmetric positive definite linear system, and one
of which (the transport equation) gives rise to a nonsymmetric linear system.
Many other examples could equally well have been chosen for presentation, but
these two problems are both physically important and illustrative of many of
the principles to be discussed. Throughout this chapter we will deal only with
real matrices.

9.1. The Diffusion Equation.


A number of different physical processes can be described by the diffusion
equation:

$$\frac{\partial u}{\partial t} = \nabla\cdot\left(a(x)\nabla u\right) + f(x,t), \qquad x \in \Omega. \tag{9.1}$$

Here u might represent the temperature distribution at time t in an object Ω,
to which an external heat source f is applied. The positive coefficient a(x)
is the thermal conductivity of the material. To determine the temperature at
time t, we need to know an initial temperature distribution u(x, 0) and some
boundary conditions, say,

$$u(x,t) = 0, \qquad x \in \partial\Omega, \tag{9.2}$$

corresponding to the boundary of the region being held at a fixed temperature
(which we have denoted as 0).
Other phenomena lead to an equation of the same form. For example,
equation (9.1) also represents the diffusion of a substance through a permeable
region Ω, if u is interpreted as the concentration of the substance, a as the
diffusion coefficient of the material, and f as the specific rate of generation of
the substance by chemical reactions or outside sources.
A standard method for obtaining approximate solutions to partial differential equations such as (9.1) is the method of finite differences. Here the region
Ω is divided into small pieces, and at each point of a grid on Ω, the derivatives
in (9.1) are replaced by difference quotients that approach the true derivatives
as the grid becomes finer.
Fig. 9.1. Finite difference discretization, natural ordering.

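For example, suppose the region Ω is the unit square $[0,1]\times[0,1]$. Introduce
a uniform grid $\{(x_i, y_j) : i = 0, 1, \ldots, n_x + 1,\ j = 0, 1, \ldots, n_y + 1\}$ with spacing
$h_x = 1/(n_x + 1)$ in the x-direction and $h_y = 1/(n_y + 1)$ in the y-direction,
as shown in Figure 9.1 for $n_x = 3$, $n_y = 5$. A standard centered difference
approximation to the partial derivative in the x-direction in (9.1) is

$$\frac{\partial}{\partial x}\left(a\frac{\partial u}{\partial x}\right)(x_i, y_j) \approx \frac{a_{i+1/2,j}(u_{i+1,j} - u_{ij}) - a_{i-1/2,j}(u_{ij} - u_{i-1,j})}{h_x^2},$$

where $a_{i\pm1/2,j} = a(x_i \pm h_x/2, y_j)$ and $u_{ij}$ represents the approximation to
$u(x_i, y_j)$. An analogous expression is obtained for the partial derivative in the
y-direction:

$$\frac{\partial}{\partial y}\left(a\frac{\partial u}{\partial y}\right)(x_i, y_j) \approx \frac{a_{i,j+1/2}(u_{i,j+1} - u_{ij}) - a_{i,j-1/2}(u_{ij} - u_{i,j-1})}{h_y^2},$$

where $a_{i,j\pm1/2} = a(x_i, y_j \pm h_y/2)$. We will sometimes be interested in problems
for which a(x, y) is discontinuous along a mesh line. In such cases, the
values $a_{i\pm1/2,j}$ and $a_{i,j\pm1/2}$ will be taken to be averages of the surrounding
values. For instance, if a(x, y) is discontinuous along the line $y = y_j$, then
$a_{i\pm1/2,j} = \lim_{\epsilon\to0^+}\left(a(x_i \pm h_x/2, y_j + \epsilon) + a(x_i \pm h_x/2, y_j - \epsilon)\right)/2$.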
If the steady-state version of problem (9.1-9.2),

is approximated by this finite difference technique, then we obtain the following
system of $n_xn_y$ linear algebraic equations to solve for the unknown function
values $u_{ij}$ at the interior mesh points:

For the time-dependent equation, a backward or centered difference
approximation in time is often used, resulting in a system of linear algebraic
equations to solve at each time step. For example, if the solution $u_{ij}^{\ell}$ at time
$t_\ell$ is known, and if backward differences in time are used, then in order to
obtain the approximate solution $u_{ij}^{\ell+1}$ at time $t_{\ell+1} = t_\ell + \Delta t$, one must solve
the following system of equations:

Here we have considered a two-dimensional problem for illustration, but it
should be noted that iterative methods are especially important for three-dimensional problems, where direct methods become truly prohibitive in terms
of both time and storage. The extension of the difference scheme to the unit
cube is straightforward.
To write the equations (9.3) or (9.4) in matrix form, we must choose an
ordering for the equations and unknowns. A common choice, known as the
natural ordering, is to number the gridpoints from left to right and bottom to
top, as shown in Figure 9.1. With this ordering, equations (9.3) can be written
in the form

where A is a block tridiagonal matrix with $n_y$ diagonal blocks, each of
dimension $n_x$ by $n_x$; u is the $n_xn_y$-vector of function values with $u_{ij}$ stored in
position $(j-1)n_x + i$; and f is the $n_xn_y$-vector of right-hand side values with
$f_{ij}$ in position $(j-1)n_x + i$. Define

Then the coefficient matrix A can be written in the form

For the time-dependent problem (9.4), the diagonal entries of A are increased
by $1/\Delta t$, and the terms $u_{ij}^{\ell}/\Delta t$ are added to the right-hand side vector.
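The assembly just described can be sketched in Python. The code builds the 5-point matrix for $-\nabla\cdot(a\nabla u) = f$ on the unit square with zero boundary values, using the natural ordering (shifted to 0-based indices) and, optionally, the $1/\Delta t$ diagonal shift. It assumes the $1/h^2$ scaling and the sign convention (positive diagonal, nonpositive off-diagonals) on which Theorem 9.1.1 relies. It is not claimed to reproduce the text's (9.5-9.8) entry for entry. The function name and its arguments are hypothetical, and the matrix is stored densely only to keep the sketch short.

```python
def five_point_matrix(nx, ny, a=lambda x, y: 1.0, dt=None):
    """Dense 5-point matrix for -div(a grad u) on the unit square, zero Dirichlet data.

    u_ij (1 <= i <= nx, 1 <= j <= ny) is stored at index (j-1)*nx + (i-1), the 0-based
    form of the natural ordering. If dt is given, 1/dt is added to every diagonal entry.
    """
    hx, hy = 1.0 / (nx + 1), 1.0 / (ny + 1)
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    idx = lambda i, j: (j - 1) * nx + (i - 1)
    for j in range(1, ny + 1):
        for i in range(1, nx + 1):
            x, y = i * hx, j * hy
            row = idx(i, j)
            # Coefficients a_{i+-1/2,j}/h_x^2 and a_{i,j+-1/2}/h_y^2 at the half-way points.
            east, west = a(x + hx / 2, y) / hx**2, a(x - hx / 2, y) / hx**2
            north, south = a(x, y + hy / 2) / hy**2, a(x, y - hy / 2) / hy**2
            A[row][row] = east + west + north + south + (1.0 / dt if dt else 0.0)
            for ii, jj, c in ((i + 1, j, east), (i - 1, j, west), (i, j + 1, north), (i, j - 1, south)):
                if 1 <= ii <= nx and 1 <= jj <= ny:   # boundary neighbors are known zeros
                    A[row][idx(ii, jj)] = -c
    return A
```

Rows next to the boundary lose one or more off-diagonal entries but keep the full diagonal. This gives the strict diagonal dominance in at least one row that the proof of Theorem 9.1.1 uses.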
THEOREM 9.1.1. Assume that $a(x,y) \ge \alpha > 0$ in $(0,1)\times(0,1)$. Then the
coefficient matrix A defined in (9.5-9.8) is symmetric and positive definite.
Proof. Symmetry is obvious. The matrix is weakly diagonally dominant,
so by Gerschgorin's theorem (Theorem 1.3.11) its eigenvalues are all greater
than or equal to zero. Suppose there is a nonzero vector v such that $Av = 0$,
and suppose that the component of v with the largest absolute value is the
one corresponding to the $(i, j)$ grid point. We can choose the sign of v so that
this component is positive. From the definition of A and the assumption that
$a(x, y) > 0$, it follows that $v_{ij}$ can be written as a weighted average of the
surrounding values of v:

$$v_{ij} = w_{i+1,j}v_{i+1,j} + w_{i-1,j}v_{i-1,j} + w_{i,j+1}v_{i,j+1} + w_{i,j-1}v_{i,j-1},$$

where terms corresponding to boundary nodes are replaced by zero. The
weights $w_{i\pm1,j}$ and $w_{i,j\pm1}$ are positive and sum to 1. It follows that if
all neighbors of $v_{ij}$ are interior points, then they must all have the same
maximum value, since none can be greater than $v_{ij}$. Repeating this argument
for neighboring points, we eventually find a point with this same maximum
value which has at least one neighbor on the boundary. But now the value of
v at this point is a weighted sum of neighboring interior values, where the sum
of the weights is less than 1. It follows that the value of v at one of these other
interior points must be greater than $v_{ij}$ if $v_{ij} > 0$, which is a contradiction.
Therefore the only vector v for which $Av = 0$ is the zero vector, and A is
positive definite.
It is clear that the coefficient matrix for the time-dependent problem (9.4)
is also positive definite, since it is strictly diagonally dominant.
The argument used in Theorem 9.1.1 is a type of discrete maximum
principle. Note that it did not make use of the specific values of the entries
of A—only that A has positive diagonal entries and nonpositive off-diagonal
entries (so that the weights in the weighted average are positive); that A is
rowwise weakly diagonally dominant, with strong diagonal dominance in at
least one row; and that starting from any point $(i, j)$ in the grid, one can
reach any other point through a path connecting nearest neighbors. This last
property will be associated with an irreducible matrix to be defined in section
10.2.
Other orderings of the equations and unknowns are also possible. These
change the appearance of the matrix but, provided that the equations and
unknowns are ordered in the same way (that is, provided that the rows
and columns of A are permuted symmetrically to form a matrix $P^TAP$),
the eigenvalues remain the same. For example, if the nodes of the grid in
Figure 9.1 are colored in a checkerboard fashion, with red nodes coupling only
to black nodes and vice versa, and if the red nodes are ordered first and the
black nodes second, then the matrix A takes the form

where $D_1$ and $D_2$ are diagonal matrices.
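The red-black reordering can be checked directly in a small Python sketch. It uses the constant-coefficient 5-point matrix scaled by $h^2$ (a simplifying assumption; the scaling does not affect the sparsity pattern) on the $3 \times 5$ grid of Figure 9.1. Nodes with $i + j$ even are colored red and ordered first. The sketch forms $P^TAP$ and confirms that the red-red and black-black blocks are diagonal. All helper names are hypothetical.

```python
def laplacian_5pt(nx, ny):
    """Constant-coefficient 5-point matrix (scaled by h^2), natural ordering, 0-based indices."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for j in range(ny):
        for i in range(nx):
            r = j * nx + i
            A[r][r] = 4.0
            for ii, jj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ii < nx and 0 <= jj < ny:
                    A[r][jj * nx + ii] = -1.0
    return A

def red_black_permutation(nx, ny):
    """Natural-order indices with all red nodes ((i + j) even) first, then the black ones."""
    red = [j * nx + i for j in range(ny) for i in range(nx) if (i + j) % 2 == 0]
    black = [j * nx + i for j in range(ny) for i in range(nx) if (i + j) % 2 == 1]
    return red + black, len(red)

def permute(A, perm):
    """Return P^T A P: rows and columns reordered symmetrically by perm."""
    return [[A[p][q] for q in perm] for p in perm]

A = laplacian_5pt(3, 5)
perm, n_red = red_black_permutation(3, 5)
B = permute(A, perm)
```

Because each red node's four neighbors are all black, the only nonzeros that remain inside each diagonal block are the diagonal entries themselves.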


A matrix of the form (9.6-9.8) is sometimes called a 5-point approxima-
tion, since the second derivatives at a point ( i , j ) are approximated in terms
of the function values at that point and its four neighbors. A more accurate
approximation can be obtained with a 9-point approximation, coupling func-
tion values at each point with its eight nearest neighbors. Another approach
to obtaining approximate solutions to partial differential equations is the finite
element method. The idea of a finite element method is to approximate the
solution by a piecewise polynomial—piecewise linear functions on triangles or
piecewise bilinear functions on rectangles, etc.—and then to choose the piece-
wise polynomial to minimize a certain error norm (usually the A-norm of the
difference between the true and approximate solution). For piecewise constant
a(x, y), the 5-point finite difference matrix turns out to be the same as the
matrix arising from a piecewise linear finite element approximation.

9.1.1. Poisson's Equation. In the special case when the diffusion coefficient a(x, y) is constant, say, a(x, y) = 1, the coefficient matrix (with the
natural ordering of nodes) for the steady-state problem (now known as Poisson's equation) takes on a very special form:

This is known as a block-TST matrix, where "TST" stands for Toeplitz
(constant along diagonals), symmetric, tridiagonal [83]. It is a block-TST
matrix because the blocks along a given diagonal of the matrix are the same,
the matrix is symmetric and block tridiagonal, and each of the blocks is a
TST matrix. The eigenvalues and eigenvectors of such matrices are known
explicitly.
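LEMMA 9.1.1. Let G be an m-by-m TST matrix with diagonal entries α
and off-diagonal entries β. Then the eigenvalues of G are

$$\lambda_k = \alpha + 2\beta\cos\left(\frac{k\pi}{m+1}\right), \qquad k = 1, \ldots, m, \tag{9.12}$$

and the corresponding orthonormal eigenvectors are

$$q_\ell^{(k)} = \sqrt{\frac{2}{m+1}}\,\sin\left(\frac{\ell k\pi}{m+1}\right), \qquad \ell, k = 1, \ldots, m. \tag{9.13}$$

Proof. It is easy to verify (9.12-9.13) from the definition of a TST matrix,
but here we provide a derivation of these formulas.
Assume that $\beta \ne 0$, since otherwise G is just a multiple of the identity
and the lemma is trivial. Suppose λ is an eigenvalue of G with corresponding
eigenvector q. Letting $q_0 = q_{m+1} = 0$, we can write $Gq = \lambda q$ in the form

$$\beta q_{\ell-1} + (\alpha - \lambda)q_\ell + \beta q_{\ell+1} = 0, \qquad \ell = 1, \ldots, m. \tag{9.14}$$

This is a linear difference equation, and it can be solved similarly to a corresponding linear differential equation. Specifically, we consider the characteristic polynomial

$$\chi(z) = \beta z^2 + (\alpha - \lambda)z + \beta.$$

If the roots of this polynomial are denoted $z_+$ and $z_-$, then the general solution
of the difference equation (9.14) can be seen to be

$$q_\ell = c_1 z_+^\ell + c_2 z_-^\ell,$$

and the constants are determined by the boundary conditions $q_0 = q_{m+1} = 0$.
The roots of $\chi(z)$ are

$$z_\pm = \frac{\lambda - \alpha \pm \sqrt{(\lambda - \alpha)^2 - 4\beta^2}}{2\beta}, \tag{9.15}$$

and the condition $q_0 = 0$ implies $c_1 + c_2 = 0$. The condition $q_{m+1} = 0$ implies
$z_+^{m+1} = z_-^{m+1}$. There are m + 1 solutions to this equation, namely,

$$z_+ = z_-\exp\left(\frac{2\pi k i}{m+1}\right), \qquad k = 0, 1, \ldots, m, \tag{9.16}$$

but the k = 0 case can be discarded because it corresponds to $z_+ = z_-$ and
hence $q_\ell = 0$.
Multiplying by $\exp(-\pi k i/(m+1))$ in (9.16) and substituting the values of
$z_\pm$ from (9.15) yields

$$\left(\lambda - \alpha + \sqrt{(\lambda-\alpha)^2 - 4\beta^2}\right)e^{-\pi k i/(m+1)} = \left(\lambda - \alpha - \sqrt{(\lambda-\alpha)^2 - 4\beta^2}\right)e^{\pi k i/(m+1)}.$$

Rearranging, we find

$$\sqrt{(\lambda-\alpha)^2 - 4\beta^2}\,\cos\left(\frac{k\pi}{m+1}\right) = i(\lambda - \alpha)\sin\left(\frac{k\pi}{m+1}\right),$$

and squaring both sides and solving the quadratic equation for λ gives

$$\lambda = \alpha \pm 2\beta\cos\left(\frac{k\pi}{m+1}\right).$$

Taking the plus sign we obtain (9.12), while the minus sign repeats these same
values and can be discarded.
Substituting (9.12) for λ in (9.15), we find

$$z_\pm = \cos\left(\frac{k\pi}{m+1}\right) \pm i\sin\left(\frac{k\pi}{m+1}\right) = \exp\left(\pm\frac{\pi k i}{m+1}\right),$$

and therefore

$$q_\ell = c_1\left(z_+^\ell - z_-^\ell\right) = 2ic_1\sin\left(\frac{\ell k\pi}{m+1}\right).$$

If we take $c_1 = -(i/2)\sqrt{2/(m+1)}$, as in (9.13), then it is easy to check that
each vector $q^{(k)}$ has norm one. The eigenvectors are orthogonal since the
matrix is symmetric.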
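Lemma 9.1.1 is easy to confirm numerically. The standard-library Python sketch below (hypothetical helper names) forms each eigenpair from (9.12) and (9.13). It multiplies by the TST matrix without storing it and reports the largest entry of $Gq - \lambda q$ over all m eigenpairs, which should be at rounding level.

```python
import math

def tst_eigenpair(alpha, beta, m, k):
    """Eigenvalue (9.12) and unit eigenvector (9.13) of the m-by-m TST matrix."""
    theta = k * math.pi / (m + 1)
    lam = alpha + 2.0 * beta * math.cos(theta)
    q = [math.sqrt(2.0 / (m + 1)) * math.sin(l * theta) for l in range(1, m + 1)]
    return lam, q

def tst_matvec(alpha, beta, q):
    """Multiply the TST matrix with diagonal alpha and off-diagonals beta by q."""
    m = len(q)
    return [alpha * q[l] + beta * ((q[l - 1] if l > 0 else 0.0) + (q[l + 1] if l < m - 1 else 0.0))
            for l in range(m)]

def max_residual(alpha, beta, m):
    """Largest |G q - lambda q| entry over all m eigenpairs."""
    worst = 0.0
    for k in range(1, m + 1):
        lam, q = tst_eigenpair(alpha, beta, m, k)
        Gq = tst_matvec(alpha, beta, q)
        worst = max(worst, max(abs(g - lam * x) for g, x in zip(Gq, q)))
    return worst
```

For example, with $\alpha = 2$, $\beta = -1$, and $m = 3$, formula (9.12) gives the eigenvalues $2 - \sqrt{2}$, $2$, and $2 + \sqrt{2}$.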
COROLLARY 9.1.1. All m-by-m TST matrices commute with each other.
Proof. According to (9.13), all such matrices have the same orthonormal
eigenvectors. If $G_1 = Q\Lambda_1Q^T$ and $G_2 = Q\Lambda_2Q^T$, then $G_1G_2 = Q\Lambda_1\Lambda_2Q^T =
Q\Lambda_2\Lambda_1Q^T = G_2G_1$.
THEOREM 9.1.2. The eigenvalues of the matrix A defined in (9.10-9.11)
are

and the corresponding eigenvectors are

where $u^{(j,k)}_{m,\ell}$ denotes the component corresponding to grid point $(m, \ell)$ in the
eigenvector associated with $\lambda_{j,k}$.
Proof. Let λ be an eigenvalue of A with corresponding eigenvector u, which
can be partitioned in the form

The equation $Au = \lambda u$ can be written in the form

where we have set $u_0 = u_{n_y+1} = 0$. From Lemma 9.1.1, we can write
$S = Q\Lambda_SQ^T$ and $T = Q\Lambda_TQ^T$, where $\Lambda_S$ and $\Lambda_T$ are diagonal, with jth
diagonal entries

The mth entry of column j of Q is

Multiply (9.19) by $Q^T$ on the left to obtain

Since the matrices here are diagonal, equations along different vertical lines in
the grid decouple:

If, for a fixed value of j, the vector $(y_{j,1}, \ldots, y_{j,n_y})^T$ is an eigenvector of the
TST matrix

with corresponding eigenvalue λ, and if the other components of the vector y
are 0, then equations (9.20) will be satisfied. By Lemma 9.1.1, the eigenvalues
of this matrix are

The corresponding eigenvectors are

Since the ℓth block of $u^{(j,k)}$ is equal to Q times the ℓth block of y and since
only the jth entry of the ℓth block of y is nonzero, we have

Deriving the eigenvalues $\lambda_{j,k}$ and corresponding vectors $u^{(j,k)}$ for each
$j = 1, \ldots, n_x$, we obtain all $n_xn_y$ eigenpairs of A.
COROLLARY 9.1.2. Assume that $h_x = h_y = h$. Then the smallest and
largest eigenvalues of A in (9.10-9.11) behave like

as $h \to 0$, so the condition number of A is $(4/\pi^2)h^{-2} + O(1)$.
Proof. The smallest eigenvalue of A is the one with j = k = 1 and the
largest is the one with $j = k = n_x = n_y$ in (9.17):

Expanding $\sin(x)$ and $\sin(\pi/2 - x)$ in a Taylor series gives the desired result
(9.21), and dividing $\lambda_{\max}$ by $\lambda_{\min}$ gives the condition number estimate.
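The estimate in Corollary 9.1.2 can be checked numerically. The proof's reference to $\sin(x)$ and $\sin(\pi/2 - x)$ suggests that $\lambda_{\min}$ and $\lambda_{\max}$ are the same multiple of $\sin^2(\pi h/2)$ and $\cos^2(\pi h/2)$, respectively. That is an assumption here, since (9.17) is not reproduced. Under it the condition number is exactly $\cot^2(\pi h/2)$, whatever the scaling of A. The expansion $\cot^2 x = x^{-2} - 2/3 + O(x^2)$ then shows that the $O(1)$ term tends to $-2/3$.

```python
import math

def model_condition_number(n):
    """kappa(A) = cot^2(pi h / 2) for the model problem with h = 1/(n + 1).

    Assumes lambda_min and lambda_max are the same multiple of sin^2(pi h/2) and
    cos^2(pi h/2), so that the ratio does not depend on how A is scaled.
    """
    h = 1.0 / (n + 1)
    x = math.pi * h / 2.0
    return (math.cos(x) / math.sin(x)) ** 2

for n in (10, 100, 1000):
    h = 1.0 / (n + 1)
    print(n, model_condition_number(n), 4.0 / (math.pi * h) ** 2)
```

The two printed columns grow like $h^{-2}$ and differ by roughly $2/3$.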

The proof of Theorem 9.1.2 provides the basis for a direct solution
technique for Poisson's equation known as a fast Poisson solver. The idea
is to separate the problem into individual tridiagonal systems that can be
solved independently. The only difficult part is then applying the eigenvector
matrix Q to the vectors y obtained from the tridiagonal systems, and this is
accomplished using the fast Fourier transform. We will not discuss fast Poisson
solvers here but refer the reader to [83] for a discussion of this subject.
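The separation step of a fast Poisson solver can be sketched in standard-library Python. The eigenvector matrix Q of Lemma 9.1.1 (orthonormal sine vectors, with $Q = Q^T = Q^{-1}$) is applied along each grid row. One tridiagonal system per x-mode is then solved, and Q is applied again. Assumptions: the operator is the 5-point approximation of $-\nabla^2 u = f$ with $1/h_x^2$ and $1/h_y^2$ scaling and zero boundary values. The transform is also done naively in $O(n_x^2)$ work per row. A genuine fast Poisson solver would use the fast Fourier transform at that step, so this shows the structure but not the speed. All function names are hypothetical.

```python
import math

def sine_transform(v):
    """Multiply v by the orthonormal TST eigenvector matrix Q (naive O(m^2) version)."""
    m = len(v)
    c = math.sqrt(2.0 / (m + 1))
    return [c * sum(math.sin(i * j * math.pi / (m + 1)) * v[i - 1] for i in range(1, m + 1))
            for j in range(1, m + 1)]

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with constant off-diagonal entries lower and upper."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def poisson_solve(f, hx, hy):
    """Solve the 5-point system for -Laplace(u) = f with zero boundary values.

    f[l][m] is the value at grid point (m+1, l+1): l indexes rows (y), m columns (x).
    """
    ny, nx = len(f), len(f[0])
    f_hat = [sine_transform(row) for row in f]              # apply Q to every grid row
    u_hat = [[0.0] * nx for _ in range(ny)]
    for j in range(1, nx + 1):                              # one tridiagonal system per x-mode
        mu = (4.0 / hx**2) * math.sin(j * math.pi / (2 * (nx + 1))) ** 2
        col = thomas(-1.0 / hy**2, [mu + 2.0 / hy**2] * ny, -1.0 / hy**2,
                     [f_hat[l][j - 1] for l in range(ny)])
        for l in range(ny):
            u_hat[l][j - 1] = col[l]
    return [sine_transform(row) for row in u_hat]           # transform back with Q
```

Each tridiagonal system is diagonally dominant, so the elimination without pivoting in `thomas` is stable here.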
Because the eigenvalues and eigenvectors of the 5-point finite difference
matrix for Poisson's equation on a square are known, preconditioners are often
analyzed and even tested numerically on this particular problem, known as
the model problem. It should be noted, however, that except for multigrid
methods, none of the preconditioned iterative methods discussed in this book is
competitive with a fast Poisson solver for the model problem. The advantage of
iterative methods is that they can be applied to more general problems, such
as the diffusion equation with a nonconstant diffusion coefficient, Poisson's
equation on an irregular region, or Poisson's equation with a nonuniform grid.
Fast Poisson solvers apply only to block-TST matrices. They are sometimes
used as preconditioners in iterative methods for solving more general problems.
Analysis of a preconditioner for the model problem is useful only to the extent
that it can be expected to carry over to more general situations.

9.2. The Transport Equation.


The transport equation is an integro-differential equation that describes the
motion of particles (neutrons, photons, etc.) that move in straight lines with
constant speed between collisions but which are subject to a certain probability
of colliding with outside objects and being scattered, slowed down, absorbed,
or multiplied. A sufficiently large aggregate of particles is treated so that they
may be regarded as a continuum, and statistical fluctuations are ignored. In the
most general setting, the unknown neutron flux is a function of spatial position
$r = (x, y, z)$, direction $\Omega = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$, energy E, and time t.
Because of the large number of independent variables, the transport equation
is seldom solved numerically in its most general form. Instead, a number of
approximations are made.
First, a finite number of energy groups are considered and integrals over
energy are replaced by sums over the groups. This results in a weakly coupled
set of equations for the flux associated with each energy group. These equations
are usually solved by a method that we will later identify as a block Gauss-
Seidel iteration. The flux in the highest energy group is calculated using
previously computed approximations for the other energy groups. This newly
computed flux is then substituted into the equation for the next energy group,
and so on, down to the lowest energy group, at which point the entire process
is repeated until convergence. We will be concerned with the mono-energetic
transport equation that must be solved for each energy group, at each step of
this outer iteration.
The mono-energetic transport equation with isotropic scattering can be
written as

Here ψ is the unknown angular flux corresponding to a fixed speed v, $\sigma_t$ is
the known total cross section, $\sigma_s$ is the known scattering cross section of the
material, and f is a known external source. The scalar flux φ is the angular
flux integrated over directions on the unit sphere $S^2$. (Actually, the scalar flux
is defined without the factor $1/(4\pi)$ in (9.22), but we will include this factor
for convenience.) Initial values $\psi(r, \Omega, 0)$ and boundary values are needed to
specify the solution. If the problem is defined on a region $\mathcal{R}$ with outward
normal n(r) at point r, then the incoming flux can be specified by

Finite difference techniques and preconditioned iterative linear system
solvers are often used for the solution of (9.22-9.23). To simplify the discussion
here, however, we will consider a one-dimensional version of these equations.
The difference methods used and the theoretical results established all have
analogues in higher dimensions. Let $\mathcal{R}$ be the region $a < x < b$. The one-dimensional mono-energetic transport equation with isotropic scattering is
A standard approach to solving (9.24-9.26) numerically is to require
that the equations hold at discrete angles $\mu_j$, which are chosen to be Gauss
quadrature points, and to replace the integral in (9.24) by a weighted Gauss
quadrature sum. This is called the method of discrete ordinates:

Here $\psi_j$ is the approximation to $\psi(x,\mu_j,t)$, and the quadrature points $\mu_j$ and weights $w_j$ are such that for any polynomial $p(\mu)$ of degree $2n_\mu - 1$ or less,

$$\int_{-1}^{1} p(\mu)\,d\mu = \sum_{j=1}^{n_\mu} w_j\,p(\mu_j).$$

We assume an even number of quadrature points $n_\mu$, so that the points $\mu_j$ are nonzero and symmetric about the origin: $\mu_{n_\mu-j+1} = -\mu_j$.
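As an illustration of the quadrature property above, the sketch below computes an $n_\mu$-point Gauss-Legendre rule by Newton's method on the Legendre polynomial $P_{n_\mu}$, started from the usual cosine initial guesses. This is a standard textbook procedure, not necessarily the one used for the experiments in this section; the function names and tolerances are our own choices. It returns the points in increasing order, so the first $n_\mu/2$ points are negative.

```python
import math


def legendre_and_derivative(n, x):
    """Evaluate the Legendre polynomial P_n and its derivative at x (|x| < 1)."""
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(2, n + 1):
        # Three-term recurrence: k P_k = (2k - 1) x P_{k-1} - (k - 1) P_{k-2}
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)  # (x^2 - 1) P_n' = n (x P_n - P_{n-1})
    return p, dp


def gauss_legendre(n, tol=1e-14, max_newton=100):
    """Return (points, weights) of the n-point Gauss-Legendre rule on [-1, 1]."""
    points, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess for the i-th root
        for _ in range(max_newton):
            p, dp = legendre_and_derivative(n, x)
            step = p / dp
            x -= step
            if abs(step) < tol:
                break
        _, dp = legendre_and_derivative(n, x)
        points.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    # Roots were generated from largest to smallest; return them in increasing order.
    return points[::-1], weights[::-1]


mu, w = gauss_legendre(8)
omega = [wj / 2.0 for wj in w]  # omega_j = w_j / 2, so the omega_j sum to 1
```

With $n_\mu = 8$, the rule integrates $\mu^k$ exactly for $k \le 15$ but not for $k = 16$.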
Equation (9.27) can be approximated further by a method known as diamond differencing: replacing derivatives in $x$ by centered differences and approximating function values at zone centers by the average of their values at the surrounding nodes. Let the domain in $x$ be discretized by

$$a = x_0 < x_1 < \cdots < x_{n_x} = b,$$

and define $(\Delta x)_{i+1/2} = x_{i+1} - x_i$ and $x_{i+1/2} = (x_{i+1} + x_i)/2$. Equation (9.27) is replaced by

The combination of discrete ordinates and diamond differencing is by no means the only (or necessarily the best) technique for solving the transport equation. For a discussion of a variety of different approaches, see, for example, [93]. Still, this method is widely used, so we consider methods for solving the linear systems arising from this finite difference scheme.
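Consider the time-independent version of (9.28):

$$\mu_j\,\frac{\psi_{i+1,j} - \psi_{i,j}}{(\Delta x)_{i+1/2}} + \sigma_{t,i+1/2}\,\frac{\psi_{i+1,j} + \psi_{i,j}}{2} = \sigma_{s,i+1/2}\,\phi_{i+1/2} + f_{i+1/2}, \qquad \phi_{i+1/2} = \frac{1}{2}\sum_{j'=1}^{n_\mu} w_{j'}\,\frac{\psi_{i+1,j'} + \psi_{i,j'}}{2}, \tag{9.29}$$

for $i = 0, \ldots, n_x - 1$ and $j = 1, \ldots, n_\mu$, with boundary conditions

$$\psi_{0,j} = \psi_a(\mu_j) \ \text{ for } \mu_j > 0, \qquad \psi_{n_x,j} = \psi_b(\mu_j) \ \text{ for } \mu_j < 0. \tag{9.30}$$

Equations (9.29-9.30) can be written in matrix form as follows. Define

Define $(n_x+1)$-by-$(n_x+1)$ triangular matrices $H_j$ by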

and $(n_x+1)$-by-$n_x$ diagonal matrices $\Sigma_{s,j}$ by

where $\sigma_{s,i+1/2} = \sigma_s(x_{i+1/2})$. Finally, define the $n_x$-by-$(n_x+1)$ matrix $S$, which averages nodal values to obtain zone-centered values, by

Equations (9.29-9.30) can be written in the form

where we have taken $\omega_j = w_j/2$ so that $\sum_{j=1}^{n_\mu}\omega_j = 1$, and

Usually equation (9.31) is not dealt with directly because in higher dimensions the angular flux vector $\psi$ is quite large. The desired quantity is usually the scalar flux $\phi$ (from which the angular flux can be computed if needed), which is a function only of position. Therefore, the angular flux variables $\psi_1, \ldots, \psi_{n_\mu}$ are eliminated from (9.31) using Gaussian elimination, and the resulting Schur complement system is solved for the scalar flux $\phi$:

To solve this equation, one does not actually form the Schur complement matrix $A_0 = I - \sum_{j=1}^{n_\mu}\omega_j S H_j^{-1}\Sigma_{s,j}$, which is a dense $n_x$-by-$n_x$ matrix. To apply this matrix to a given vector $v$, one steps through each value of $j$, multiplying $v$ by $\Sigma_{s,j}$, solving a triangular system with coefficient matrix $H_j$, multiplying the result by $S$, and subtracting the weighted outcome from the final vector, which has been initialized to $v$. In this way, only three vectors of length $n_x$ need be stored simultaneously.
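The matrix-free product with $A_0$ can be sketched as follows. This is a minimal illustration under simplifying assumptions that are ours rather than the book's: a uniform mesh, cell-wise constant cross sections, zero incoming flux at both ends (so only the homogeneous operator is applied), and the diamond-difference relation $\mu_j(\psi_{i+1,j} - \psi_{i,j})/\Delta x + \sigma_{t,i+1/2}(\psi_{i,j} + \psi_{i+1,j})/2 = q_{i+1/2}$ standing in for a row of $H_j$. Each triangular solve with $H_j$ is then a single sweep across the mesh in the direction of travel, and the averaging by $S$ is folded into that sweep. The helper names `sweep_cell_averages` and `apply_A0` are hypothetical.

```python
import math


def sweep_cell_averages(q, mu, dx, sigma_t):
    """One diamond-difference sweep for direction mu (mu != 0), zero incoming flux.

    Solves, cell by cell in the direction of travel,
        mu * (psi[i+1] - psi[i]) / dx + sigma_t[i] * (psi[i] + psi[i+1]) / 2 = q[i]
    and returns the cell averages (psi[i] + psi[i+1]) / 2, i.e. S H^{-1} q.
    """
    n = len(q)
    a = abs(mu) / dx
    cells = range(n) if mu > 0 else range(n - 1, -1, -1)
    avg = [0.0] * n
    psi_in = 0.0  # vacuum boundary: no flux enters at the upstream end
    for i in cells:
        psi_out = (q[i] - (0.5 * sigma_t[i] - a) * psi_in) / (0.5 * sigma_t[i] + a)
        avg[i] = 0.5 * (psi_in + psi_out)
        psi_in = psi_out
    return avg


def apply_A0(v, mu, omega, dx, sigma_t, sigma_s):
    """Matrix-free A0 v = v - sum_j omega_j S H_j^{-1} Sigma_s v; A0 is never formed."""
    result = list(v)                           # accumulator initialized to v
    q = [s * vi for s, vi in zip(sigma_s, v)]  # Sigma_s v (isotropic: same for every j)
    for mu_j, w_j in zip(mu, omega):
        avg = sweep_cell_averages(q, mu_j, dx, sigma_t)
        for i, a_i in enumerate(avg):
            result[i] -= w_j * a_i             # subtract the weighted outcome
    return result


def norm2(x):
    return math.sqrt(sum(t * t for t in x))


# 4-point Gauss-Legendre points (ascending) and omega_j = w_j / 2.
MU = [-0.8611363115940526, -0.3399810435848563, 0.3399810435848563, 0.8611363115940526]
OMEGA = [0.1739274225687269, 0.3260725774312731, 0.3260725774312731, 0.1739274225687269]

n_x = 40
v = [1.0 + 0.5 * math.sin(i) for i in range(n_x)]
Av = apply_A0(v, MU, OMEGA, 0.25, [1.0] * n_x, [0.9] * n_x)
```

For constant cross sections on a uniform mesh, $\Theta$ in Theorem 9.2.1 is a multiple of the identity, so one would expect $\|v - A_0 v\|_2 \le (\sigma_s/\sigma_t)\|v\|_2$; that is one of the checks worth running on such a sketch.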
One popular method for solving equation (9.32) is to use the simple iteration defined in section 2.1 without a preconditioner; that is, the preconditioner is $M = I$. In the neutron transport literature, this is known as source iteration. Given an initial guess $\phi^{(0)}$, for $k = 0, 1, \ldots$, set

Note that this unpreconditioned iteration for the Schur complement system (9.32) is equivalent to a preconditioned iteration for the original linear system (9.31), where the preconditioner is the block lower triangle of the matrix. That is, suppose $(\psi_1^{(0)}, \ldots, \psi_{n_\mu}^{(0)})^T$ is an arbitrary initial guess for the angular flux and $\phi^{(0)} = \sum_{j=1}^{n_\mu}\omega_j S\psi_j^{(0)}$. For $k = 0, 1, \ldots$, choose the $(k+1)$st iterate to satisfy

(Equivalently, if $A$ is the coefficient matrix in (9.31), $M$ is the block lower triangle of $A$, $b$ is the right-hand side vector in (9.31), and $u^{(k+1)}$ is the vector $(\psi_1^{(k+1)}, \ldots, \psi_{n_\mu}^{(k+1)}, \phi^{(k+1)})^T$, then $Mu^{(k+1)} = (M - A)u^{(k)} + b$, or $u^{(k+1)} = u^{(k)} + M^{-1}(b - Au^{(k)})$.) Then the scalar flux approximation at each step $k$ satisfies

which is identical to (9.33).


The coefficient matrix in (9.31) and the one in (9.32) are nonsymmetric. In general, they are not diagonally dominant, but in the special case where $e_{i+1/2,j} \le 0$ for all $i, j$, the matrix in (9.31) is weakly diagonally dominant and has positive diagonal elements (since the total cross section $\sigma_t(x)$ is nonnegative and $\mu_j$ is nonzero) and nonpositive off-diagonal elements (since $\sigma_s(x) \ge 0$). We will see later that this implies certain nice properties for the block Gauss-Seidel method (9.34), such as convergence and positivity of the solution. The condition $e_{i+1/2,j} \le 0$ is equivalent to

which means physically that the mesh width is no more than two mean free paths of the particles being simulated. It is often desirable, however, to use a coarser mesh.
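Even in the more general case when (9.35) is not satisfied, it turns out that the iteration (9.33) converges. Before proving this, however, let us return to the differential equation (9.24) and use a Fourier analysis argument to derive an estimate of the rate of convergence that might be expected from the linear system solver. Assume that $\sigma_s$ and $\sigma_t$ are constant and that the problem is defined on an infinite domain. If the iterative method (9.33) is applied directly to the steady-state version of the differential equation (9.24), then we can write

$$\mu\,\frac{\partial\psi^{(k+1)}}{\partial x}(x,\mu) + \sigma_t\,\psi^{(k+1)}(x,\mu) = \sigma_s\,\phi^{(k)}(x) + f(x), \tag{9.36}$$

$$\phi^{(k+1)}(x) = \frac{1}{2}\int_{-1}^{1}\psi^{(k+1)}(x,\mu')\,d\mu'. \tag{9.37}$$

Define $\Psi^{(k+1)} = \psi - \psi^{(k+1)}$ and $\Phi^{(k+1)} = \phi - \phi^{(k+1)}$, where $\psi$, $\phi$ are the true solution to the steady-state version of (9.24). Then equations (9.36-9.37) give

$$\mu\,\frac{\partial\Psi^{(k+1)}}{\partial x}(x,\mu) + \sigma_t\,\Psi^{(k+1)}(x,\mu) = \sigma_s\,\Phi^{(k)}(x), \tag{9.38}$$

$$\Phi^{(k+1)}(x) = \frac{1}{2}\int_{-1}^{1}\Psi^{(k+1)}(x,\mu')\,d\mu'. \tag{9.39}$$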

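Suppose $\Phi^{(k)}(x) = \exp(i\lambda x)$ and $\Psi^{(k+1)}(x,\mu) = g(\mu)\exp(i\lambda x)$. Introducing these expressions into equations (9.38-9.39), we find that

$$(\sigma_t + i\lambda\mu)\,g(\mu) = \sigma_s, \qquad \Phi^{(k+1)}(x) = \left(\frac{1}{2}\int_{-1}^{1}\frac{\sigma_s}{\sigma_t + i\lambda\mu}\,d\mu\right)\exp(i\lambda x).$$

Thus the functions $\exp(i\lambda x)$ are eigenfunctions of this iteration, with corresponding eigenvalues

$$\frac{1}{2}\int_{-1}^{1}\frac{\sigma_s}{\sigma_t + i\lambda\mu}\,d\mu = \sigma_s\int_{0}^{1}\frac{\sigma_t}{\sigma_t^2 + \lambda^2\mu^2}\,d\mu = \frac{\sigma_s}{\lambda}\arctan\frac{\lambda}{\sigma_t} \qquad (\lambda \neq 0).$$

The largest eigenvalue, or spectral radius, corresponding to $\lambda = 0$, is $\sigma_s/\sigma_t$. Thus, we expect an asymptotic convergence rate of $\sigma_s/\sigma_t$.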
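The Fourier eigenvalues can be checked numerically. The sketch below (hypothetical function names; the midpoint rule and its resolution are our choices) evaluates $\frac{1}{2}\int_{-1}^{1}\sigma_s/(\sigma_t + i\lambda\mu)\,d\mu$ by quadrature, compares it with the closed form $(\sigma_s/\lambda)\arctan(\lambda/\sigma_t)$, and shows that the largest value, $\sigma_s/\sigma_t$, occurs for the constant mode $\lambda = 0$.

```python
import math


def fourier_factor_numeric(lam, sigma_t, sigma_s, n_mu=4000):
    """(1/2) * integral_{-1}^{1} sigma_s / (sigma_t + i*lam*mu) d(mu), midpoint rule."""
    h = 2.0 / n_mu
    total = 0j
    for k in range(n_mu):
        mu = -1.0 + (k + 0.5) * h
        total += sigma_s / complex(sigma_t, lam * mu)
    return 0.5 * h * total


def fourier_factor_exact(lam, sigma_t, sigma_s):
    """Closed form (sigma_s / lam) * arctan(lam / sigma_t); limit sigma_s / sigma_t at lam = 0."""
    if lam == 0.0:
        return sigma_s / sigma_t
    return sigma_s / lam * math.atan(lam / sigma_t)


# Smooth error modes (small lam) are damped least: the factor approaches sigma_s / sigma_t.
for lam in (0.0, 0.5, 2.0, 10.0):
    print(lam, fourier_factor_exact(lam, 1.0, 0.9))
```

The imaginary part of the integral vanishes by symmetry in $\mu$, which is why the eigenvalues are real.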
When the iteration (9.33) is applied to the linear system (9.32), we can actually prove a stronger result. The following theorem shows that in a certain norm, the factor $\sup_x \sigma_s(x)/\sigma_t(x)$ gives a bound on the error reduction achieved at each step. Unlike the above analysis, this theorem does not require that $\sigma_s$ and $\sigma_t$ be constant or that the problem be defined on an infinite domain.
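THEOREM 9.2.1 (Ashby et al. [3]). Assume $\sigma_t(x) \ge \sigma_s(x) \ge 0$ for all $x \in \mathcal{R} = (a, b)$, and assume also that $\sigma_t(x) \ge c > 0$ on $\mathcal{R}$. Then for each $j$,

where $\Theta = \operatorname{diag}\big(\sigma_t(x_{1/2})(\Delta x)_{1/2}, \ldots, \sigma_t(x_{n_x-1/2})(\Delta x)_{n_x-1/2}\big)$.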


Proof. First note that the $n_x$-by-$n_x$ matrix $SH_j^{-1}\Sigma_{s,j}$ is the same matrix obtained by taking the product of the upper left $n_x$-by-$n_x$ blocks of $S$, $H_j^{-1}$, and $\Sigma_{s,j}$ for $j \le n_\mu/2$, or the lower right $n_x$-by-$n_x$ blocks of $S$, $H_j^{-1}$, and $\Sigma_{s,j}$ for $j > n_\mu/2$. Accordingly, let $S_j$, $H_j^{-1}$, and $\Sigma_{s,j}$ denote these $n_x$-by-$n_x$ blocks. We will establish the bound (9.40) for $\|\Theta^{1/2}S_jH_j^{-1}\Sigma_{s,j}\Theta^{-1/2}\|$.
Note that $H_j$ can be written in the form

where

It can be seen that for each $j$, $G_j = 2(I - S_j)$. Dropping the subscript $j$ for convenience, we can write

Multiplying by $\Theta^{1/2}$ on the left and by $\Theta^{1/2}$ on the right gives

and it follows that

The matrix norm on the right-hand side in (9.41) is equal to the inverse of the square root of the smallest eigenvalue of

The third term in this sum is positive definite, since

is nonsingular and $|\mu_j| > 0$. It will follow that the smallest eigenvalue of the matrix in (9.42) is strictly greater than 1 if the second term,

can be shown to be positive semidefinite. Directly computing this matrix, we find that

which has $n_x - 1$ eigenvalues equal to 0 and the remaining eigenvalue equal to $2n_x$. Hence this matrix and the one in (9.43) are positive semidefinite. It follows that the matrix norm on the right-hand side in (9.41) is strictly less than 1, and from this the desired result is obtained.
COROLLARY 9.2.1. Under the assumptions of Theorem 9.2.1, the iteration (9.33) converges to the solution $\phi$ of (9.32), and if $e^{(k)} = \phi - \phi^{(k)}$ denotes the error at step $k$, then

where $\Theta$ is defined in the theorem and $\gamma$ is defined in (9.41).


Proof. From (9.33) we have

and taking norms on both sides and recalling that the weights $\omega_j$ are nonnegative and sum to 1, we find

Since $\gamma < 1$ and since the inequality in (9.44) is strict, with the amount by which the actual reduction factor differs from $\gamma$ being independent of $k$, it follows that the iteration (9.33) converges to the solution of (9.32).
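To see the corollary at work, the following sketch runs source iteration with diamond-difference sweeps on a small slab and records how much each step changes the scalar flux. Successive changes $\phi^{(k+1)} - \phi^{(k)}$ obey the same recurrence as the errors, so their norms should shrink by at least $\gamma = \sup_x \sigma_s/\sigma_t$ per step. The assumptions are ours: vacuum boundaries at both ends, a uniform mesh, a 4-point angular rule, made-up slab data, and a uniform $\sigma_t$, so that $\Theta$ is a multiple of the identity and the $\Theta$-weighted norm reduces to a scaled 2-norm. Function names are hypothetical.

```python
import math

# 4-point Gauss-Legendre points (ascending) and omega_j = w_j / 2 (they sum to 1).
MU = [-0.8611363115940526, -0.3399810435848563, 0.3399810435848563, 0.8611363115940526]
OMEGA = [0.1739274225687269, 0.3260725774312731, 0.3260725774312731, 0.1739274225687269]


def sweep_cell_averages(q, mu, dx, sigma_t):
    """Diamond-difference sweep with zero incoming flux; returns cell averages S H^{-1} q."""
    n = len(q)
    a = abs(mu) / dx
    cells = range(n) if mu > 0 else range(n - 1, -1, -1)
    avg = [0.0] * n
    psi_in = 0.0
    for i in cells:
        psi_out = (q[i] - (0.5 * sigma_t[i] - a) * psi_in) / (0.5 * sigma_t[i] + a)
        avg[i] = 0.5 * (psi_in + psi_out)
        psi_in = psi_out
    return avg


def source_iteration(f, sigma_t, sigma_s, dx, tol=1e-10, max_iter=5000):
    """phi^(k+1) = sum_j omega_j S H_j^{-1} (Sigma_s phi^(k) + f), starting from phi^(0) = 0.

    Returns the last iterate and the 2-norms of successive changes.
    """
    n = len(f)
    phi = [0.0] * n
    changes = []
    for _ in range(max_iter):
        q = [s * p + fi for s, p, fi in zip(sigma_s, phi, f)]
        new = [0.0] * n
        for mu_j, w_j in zip(MU, OMEGA):
            for i, a_i in enumerate(sweep_cell_averages(q, mu_j, dx, sigma_t)):
                new[i] += w_j * a_i
        changes.append(math.sqrt(sum((x - y) ** 2 for x, y in zip(new, phi))))
        phi = new
        if changes[-1] < tol:
            break
    return phi, changes


n = 60
sigma_t = [1.0] * n
sigma_s = [0.95 if i < n // 2 else 0.5 for i in range(n)]  # made-up slab data
f = [1.0 if i < n // 4 else 0.0 for i in range(n)]
phi, changes = source_iteration(f, sigma_t, sigma_s, dx=0.25)
gamma = max(s / t for s, t in zip(sigma_s, sigma_t))
ratios = [c1 / c0 for c0, c1 in zip(changes, changes[1:])]
print(len(changes), max(ratios), gamma)
```

Raising $\sigma_s/\sigma_t$ toward 1 in this sketch quickly drives up the iteration count, which is the behavior that motivates the accelerations discussed next.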
For $\gamma \ll 1$, Corollary 9.2.1 shows that the simple source iteration (9.33) converges rapidly, but for $\gamma \approx 1$, convergence may be slow. In Part I of this book, we discussed many ways to accelerate the simple iteration method, such as Orthomin(1), QMR, BiCGSTAB, or full GMRES. Figure 9.2 shows the convergence of simple iteration, Orthomin(1), and full GMRES applied to two test problems. QMR and BiCGSTAB were also tested on these problems, and each required only slightly more iterations than full GMRES, but at twice the cost in terms of matrix-vector multiplications. The vertical axis is the $\infty$-norm of the error in the approximate solution. The exact solution to the linear system was computed directly for comparison. Here we used a uniform mesh spacing $\Delta x = 0.25$ ($n_x = 120$) and eight angles, but the convergence rate was not very sensitive to these mesh parameters.
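Orthomin(1), the cheapest of the accelerations compared in Figure 9.2, changes simple iteration only in its step length: instead of $x_{k+1} = x_k + r_k$ it takes $x_{k+1} = x_k + \alpha_k r_k$, with $\alpha_k = \langle r_k, Ar_k\rangle / \langle Ar_k, Ar_k\rangle$ chosen to minimize the 2-norm of the next residual. A minimal sketch on a small made-up nonsymmetric matrix of the form $I - K$ (not the transport matrix used in the experiments); the function name is hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def minimal_residual_iteration(A, b, x0, orthomin=True, tol=1e-12, max_iter=5000):
    """Simple iteration (alpha = 1) or Orthomin(1) (residual-minimizing alpha).

    Returns the final iterate and the history of residual 2-norms.
    """
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    norms = [math.sqrt(dot(r, r))]
    for _ in range(max_iter):
        if norms[-1] < tol:
            break
        Ar = matvec(A, r)
        alpha = dot(r, Ar) / dot(Ar, Ar) if orthomin else 1.0
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]
        norms.append(math.sqrt(dot(r, r)))
    return x, norms


A = [[1.0, -0.6, -0.2],
     [-0.1, 1.0, -0.6],
     [-0.05, -0.1, 1.0]]
b = [1.0, 2.0, 3.0]
x_si, n_si = minimal_residual_iteration(A, b, [0.0] * 3, orthomin=False)
x_om, n_om = minimal_residual_iteration(A, b, [0.0] * 3, orthomin=True)
print(len(n_si) - 1, len(n_om) - 1)  # iteration counts
```

Because $\alpha_k$ minimizes the residual norm along $r_k$, each Orthomin(1) step can do no worse than the corresponding simple-iteration step from the same point, at the cost of two extra inner products.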
The first problem, taken from [92], is a model shielding problem, with cross sections corresponding to water and iron in different regions, as illustrated below.

Region   x range (cm)    $\sigma_t$ (cm$^{-1}$)   $\sigma_s$ (cm$^{-1}$)   $f$
water    0 < x < 12      3.3333                   3.3136                   1
water    12 < x < 15     3.3333                   3.3136                   0
iron     15 < x < 21     1.3333                   1.1077                   0
water    21 < x < 30     3.3333                   3.3136                   0

FIG. 9.2. Error curves for (a) $\gamma = 0.994$ and (b) $\gamma = 0.497$. Simple iteration (solid), Orthomin(1) (dashed), full GMRES (dotted), and DSA-preconditioned simple iteration (dash-dot).

The slab thicknesses are in cm and the cross sections are in cm$^{-1}$. There is a vacuum boundary condition at the right end ($\psi_{n_x,j} = 0$, $j \le n_\mu/2$) and a reflecting boundary condition at the left ($\psi_{0,n_\mu-j+1} = \psi_{0,j}$, $j \le n_\mu/2$). A uniform source $f = 1$ is placed in the first (leftmost) region. In the second test problem, we simply replaced $\sigma_s$ in each region by half its value: $\sigma_s = 1.6568$ in the first, second, and fourth regions; $\sigma_s = 0.55385$ in the third.
Also shown in Figure 9.2 is the convergence of the simple iteration method with a preconditioner designed specifically for the transport equation known as diffusion synthetic acceleration (DSA). In the first problem, where $\gamma = 0.994$, it is clear that the unpreconditioned simple iteration (9.33) is unacceptably slow to converge. The convergence rate is improved significantly by Orthomin(1), with little extra work and storage per iteration, and it is improved even more by full GMRES but at the cost of extra work and storage. The most effective method for solving this problem, however, is the DSA-preconditioned simple iteration.
For the second problem, the reduction in the number of iterations is less dramatic. (Note the different horizontal scales in the two graphs.) Unpreconditioned simple iteration converges fairly rapidly, Orthomin(1) reduces the number of iterations by about a factor of 2, and further accelerations such as full GMRES and DSA can bring about only a modest reduction in the number of iteration steps. If their cost per iteration is significantly greater than that of simple iteration, these methods will not be cost effective.
For the time-dependent problem (9.28), the time derivative term essentially adds to the total cross section $\sigma_t$. That is, suppose (9.28) is solved using centered differences in time. The equations for $\psi_j^{\ell+1}$ at time $t_{\ell+1} = t_\ell + \Delta t$ become

The matrix equation for the flux $\psi^{\ell+1}$ in terms of $f$ and $\psi^{\ell}$ is like that in (9.31), except that the entries $d_{i+1/2,j}$ and $e_{i+1/2,j}$ of $H_j$ are each increased by $1/(v\Delta t)$. One would obtain the same coefficient matrix for the steady-state problem if $\sigma_t$ were replaced by $\sigma_t + 2/(v\Delta t)$. Thus, for time-dependent problems the convergence rate of iteration (9.33) at time step $\ell+1$ is governed by the quantity

$$\sup_x \frac{\sigma_s(x)}{\sigma_t(x) + 2/(v\Delta t)}.$$

In many cases, this quantity is bounded well away from 1, even if $\sigma_s/\sigma_t$ is not.
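The effect of the time step on this factor is easy to quantify. A small sketch using the cross sections of the shielding problem above and a hypothetical value $v\Delta t = 1$ cm (our assumption; the text gives no time step), with $\sigma_t$ replaced by $\sigma_t + 2/(v\Delta t)$ as described:

```python
def time_step_factor(sigma_s, sigma_t, v, dt):
    """sup over regions of sigma_s / (sigma_t + 2 / (v * dt)); dt -> infinity gives steady state."""
    shift = 2.0 / (v * dt)
    return max(ss / (st + shift) for ss, st in zip(sigma_s, sigma_t))


# Cross sections (cm^-1) of the water/iron shielding problem described above.
SIGMA_T = [3.3333, 3.3333, 1.3333, 3.3333]
SIGMA_S = [3.3136, 3.3136, 1.1077, 3.3136]

steady = max(ss / st for ss, st in zip(SIGMA_S, SIGMA_T))       # about 0.994
transient = time_step_factor(SIGMA_S, SIGMA_T, v=1.0, dt=1.0)  # about 0.62 (v*dt = 1 cm assumed)
print(steady, transient)
```

Even a moderate time step pulls the factor from about 0.994 down to about 0.62, so source iteration alone may be adequate for such time-dependent problems.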
For steady-state problems with $\gamma \approx 1$, it is clear from Figure 9.2(a) that the DSA preconditioner is extremely effective in terms of reducing the number of iterations. At each iteration a linear system corresponding to the steady-state diffusion equation must be solved. Since this is a book on iterative methods and not specifically on the transport equation, we will not give a complete account of diffusion synthetic acceleration. For a discussion and analysis, see [91, 3]. The basic idea, however, is that when $\sigma_s/\sigma_t \approx 1$, the scalar flux $\phi$ approximately satisfies a diffusion equation and therefore the diffusion operator is an effective preconditioner for the linear system. In one dimension, the diffusion operator is represented by a tridiagonal matrix, which is easy to solve, but in higher dimensions, the diffusion equation itself may require an iterative solution technique. The advantage of solving the diffusion equation is that it is independent of angle. An iteration for the diffusion equation requires about $1/n_\mu$ times as much work as an iteration for equation (9.32), so a number of inner iterations on the preconditioner may be acceptable in order to reduce the number of outer iterations. Of course, the diffusion operator could be used as a preconditioner for other iterative methods as well; that is, DSA could be further accelerated by replacing the simple iteration strategy with, say, GMRES.
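In one dimension the diffusion solve inside DSA is a tridiagonal system, which Gaussian elimination without pivoting (the Thomas algorithm) handles in $O(n_x)$ work. The sketch below shows that solver. The matrix in the usage lines, a cell-centered discretization of $-(D\phi')' + \sigma_a\phi$ with the common diffusion-theory choices $D = 1/(3\sigma_t)$, $\sigma_a = \sigma_t - \sigma_s$, uniform data, and zero boundary values, is our own assumption for illustration; it is not the DSA discretization analyzed in [91, 3]. Function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for T x = rhs, T tridiagonal; O(n) work, no pivoting.

    lower[i] multiplies x[i-1] in row i (lower[0] unused); upper[i] multiplies
    x[i+1] in row i (upper[-1] unused). T should be, e.g., diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified superdiagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def diffusion_matrix(n, dx, sigma_t, sigma_s):
    """Hypothetical model of -(D phi')' + sigma_a phi, zero boundary values, uniform data."""
    D = 1.0 / (3.0 * sigma_t)    # diffusion coefficient (assumed)
    sigma_a = sigma_t - sigma_s  # absorption cross section
    off = -D / dx ** 2
    lower = [0.0] + [off] * (n - 1)
    diag = [2.0 * D / dx ** 2 + sigma_a] * n
    upper = [off] * (n - 1) + [0.0]
    return lower, diag, upper


n = 50
lower, diag, upper = diffusion_matrix(n, 0.25, 1.0, 0.99)
rhs = [1.0] * n
x = solve_tridiagonal(lower, diag, upper, rhs)
```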

A number of different formulations and solution techniques for the transport equation have been developed. In [96, 97], for example, multigrid methods are applied to the transport equation. The development of accurate and efficient methods for solving the transport equation remains an area of active research.

Comments and Additional References.


Iterative solution of the transport equation requires at least two levels of nested iterations: an outer iteration over energy groups and an inner iteration for each group. If the DSA preconditioner is used, then a third level of iteration may be required to solve the diffusion equation. Since the ultimate goal is to solve the outermost linear system, one might consider accepting less accurate solutions to the inner linear systems, especially at early stages of the outer iteration, if this would lead to less total work in solving the outermost system.
The appropriate level of accuracy depends, of course, on the iterative
methods used. As might be guessed from the analysis of Chapter 4, the
CG method is especially sensitive to errors (rounding errors or otherwise),
so an outer CG iteration may require more accuracy from an inner iteration.
This might be a motivation for using a different outer iteration, such as the
Chebyshev method [61, 94]. (Of course, the transport equation, in the form
stated in this chapter, is nonsymmetric, so the CG method could not be used
anyway, unless it was applied to the normal equations.)
For discussions of accuracy requirements in inner and outer iterations, see
[59, 56].

Exercises.
9.1. Use the Taylor series to show that the approximation

is second-order accurate, provided that $a\,du/dx \in C^3$ and $a\,d^3u/dx^3 \in C^1$; that is, show that the absolute value of the difference between the right- and left-hand sides is bounded by
9.2. Let $u(x,y)$ be the solution to Poisson's equation $\nabla^2 u = f$ on the unit square with homogeneous Dirichlet boundary conditions: $u(x,0) = u(x,1) = u(0,y) = u(1,y) = 0$, and let $u$ be the vector of values $u(x_i,y_j)$ on a uniform grid of spacing $h$ in each direction. Let $\hat{u}$ be the solution to the linear system $A\hat{u} = f$, where $A$ represents the matrix defined in (9.10-9.11), with $h_x = h_y = h$, and $f$ is the vector of right-hand side values $f(x_i,y_j)$. Use the previous exercise and Corollary 9.1.2 to show that

$$h\,\|u - \hat{u}\|_2 \le C h^2 \tag{9.45}$$

for some constant $C$ independent of $h$. (Note that the ordinary Euclidean norm of the difference between $u$ and $\hat{u}$ is not $O(h^2)$ but only $O(h)$. The norm in (9.45) is more like the $L^2$ norm for functions:

$$\Big(\int_0^1\!\!\int_0^1 v(x,y)^2\,dx\,dy\Big)^{1/2} \approx \Big(h^2\sum_{i,j} v(x_i,y_j)^2\Big)^{1/2}.$$

This is a reasonable way to measure the error in a vector that approximates a function at $n$ points, since if the difference is equal to $\epsilon$ at each point, then the error norm in (9.45) is $\epsilon$, not $\sqrt{n}\,\epsilon$.)
9.3. Show that the eigenvectors in (9.18) are orthonormal.
9.4. Use Theorem 9.2.1 to show that if Orthomin(1) is applied to the scaled transport equation

where $\phi = \Theta^{-1/2}\tilde{\phi}$, then it will converge to the solution for any initial vector, and, at each step, the 2-norm of the residual (in the scaled equation) will be reduced by at least the factor $\gamma$ in (9.41).
9.5. A physicist has a code that solves the transport equation using source iteration (9.33). She decides to improve the approximation by replacing $\phi^{(k+1)}$ at each step with the linear combination $\alpha_{k+1}\phi^{(k+1)} + (1 - \alpha_{k+1})\phi^{(k)}$, where $\alpha_{k+1}$ is chosen to make the 2-norm of the residual as small as possible. Which of the methods described in this book is she using?
Chapter 10

Comparison of Preconditioners

We first briefly consider the classical iterative methods: Jacobi, Gauss-Seidel, and SOR. Then more general theory is developed for comparing preconditioners used with simple iteration or with the conjugate gradient or MINRES methods for symmetric positive definite problems. Most of the theorems in this chapter (and throughout the remainder of this book) apply only to real matrices, but this restriction will be apparent from the hypotheses of the theorem. The algorithms can be used for complex matrices as well.

10.1. Jacobi, Gauss-Seidel, SOR.


An equivalent way to describe Algorithm 1 of section 2.1 is as follows. Write $A$ in the form $A = M - N$ so that the linear system $Ax = b$ becomes

$$Mx = Nx + b. \tag{10.1}$$

Given an approximation $x_{k-1}$, obtain a new approximation $x_k$ by substituting $x_{k-1}$ into the right-hand side of (10.1) so that

$$Mx_k = Nx_{k-1} + b. \tag{10.2}$$

To see that (10.2) is equivalent to Algorithm 1, multiply (10.2) by $M^{-1}$ and substitute $M^{-1}N = I - M^{-1}A$ to obtain

$$x_k = x_{k-1} + M^{-1}(b - Ax_{k-1}).$$

The simple iteration algorithm was traditionally described by (10.2), and the decomposition $A = M - N$ was referred to as a matrix splitting. The terms "matrix splitting" and "preconditioner," when referring to the matrix $M$, are synonymous.
If $M$ is taken to be the diagonal of $A$, then the simple iteration procedure with this matrix splitting is called Jacobi's method. We assume here that the diagonal entries of $A$ are nonzero, so $M^{-1}$ is defined. It is sometimes useful to write the matrix equation (10.2) in element form to see exactly how the update to the approximate solution vector is accomplished. Using parentheses to denote components of vectors, Jacobi's method can be written in the form

$$x_k(i) = \frac{1}{a_{ii}}\Big(b(i) - \sum_{j \ne i} a_{ij}\,x_{k-1}(j)\Big), \qquad i = 1, \ldots, n.$$

Note that the new vector $x_k$ cannot overwrite $x_{k-1}$ in Jacobi's method until all of its entries have been computed.
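The element form of Jacobi's method, $x_k(i) = (b(i) - \sum_{j \ne i} a_{ij}x_{k-1}(j))/a_{ii}$, translates directly into code. A minimal sketch with dense list-of-lists storage and a hypothetical function name; note the separate output vector, since every component must be computed from the old iterate:

```python
def jacobi_step(A, b, x_old):
    """One Jacobi step in element form: every new component uses only x_old."""
    n = len(b)
    x_new = [0.0] * n  # separate storage: x_old is still needed for the later rows
    for i in range(n):
        s = sum(A[i][j] * x_old[j] for j in range(n) if j != i)
        x_new[i] = (b[i] - s) / A[i][i]
    return x_new


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
for _ in range(100):
    x = jacobi_step(A, b, x)
```

Each step is the same as $x_k = x_{k-1} + D^{-1}(b - Ax_{k-1})$ with $D$ the diagonal of $A$, i.e., simple iteration with $M = D$.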
If $M$ is taken to be the lower triangle of $A$, then the simple iteration procedure is called the Gauss-Seidel method. Equations (10.2) become

$$x_k(i) = \frac{1}{a_{ii}}\Big(b(i) - \sum_{j<i} a_{ij}\,x_k(j) - \sum_{j>i} a_{ij}\,x_{k-1}(j)\Big), \qquad i = 1, \ldots, n.$$

For the Gauss-Seidel method, the latest approximations to the components of $x$ are used in the update of subsequent components. It is convenient to overwrite the old components of $x_{k-1}$ with those of $x_k$ as soon as they are computed.
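By contrast, a Gauss-Seidel sweep can work in place: when row $i$ is processed, entries $j < i$ already hold $x_k(j)$ and entries $j > i$ still hold $x_{k-1}(j)$. A minimal sketch in the same style (hypothetical function name):

```python
def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep; x is overwritten component by component."""
    n = len(b)
    for i in range(n):
        # x[j] holds the new value for j < i and the old value for j > i.
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel_sweep(A, b, [0.0, 0.0, 0.0])  # first sweep from x_0 = 0
```

After each update, equation $i$ is satisfied exactly by the current partially updated vector; for the last row this is still true at the end of the sweep.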
The convergence rate of the Gauss-Seidel method often can be improved by introducing a relaxation parameter $\omega$. The SOR (successive overrelaxation) method is defined by

$$x_k(i) = (1 - \omega)\,x_{k-1}(i) + \frac{\omega}{a_{ii}}\Big(b(i) - \sum_{j<i} a_{ij}\,x_k(j) - \sum_{j>i} a_{ij}\,x_{k-1}(j)\Big).$$

In matrix form, if $A = D - L - U$, where $D$ is diagonal, $L$ is strictly lower triangular, and $U$ is strictly upper triangular, then $M = \omega^{-1}D - L$. The method should actually be called overrelaxation or underrelaxation, according to whether $\omega > 1$ or $\omega < 1$. When $\omega = 1$ the SOR method reduces to Gauss-Seidel. In the Gauss-Seidel method, each component $x_k(i)$ is chosen so that the $i$th equation is satisfied by the current partially updated approximate solution vector. For the SOR method, the $i$th component of the current residual vector is $(1 - \omega)a_{ii}(\hat{x}_k(i) - x_{k-1}(i))$, where $\hat{x}_k(i)$ is the value that would make the $i$th component of the residual zero.
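A sketch of one SOR sweep in the same style (hypothetical name): it forms the Gauss-Seidel value $\hat{x}_k(i)$ explicitly and then relaxes toward it, $x_k(i) = (1-\omega)x_{k-1}(i) + \omega\hat{x}_k(i)$. The usage lines check that $\omega = 1$ reproduces a Gauss-Seidel sweep and that the residual identity just stated holds for the last component updated.

```python
def sor_sweep(A, b, x, omega):
    """One SOR sweep, in place; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x_hat = (b[i] - s) / A[i][i]  # value that would zero the i-th residual
        x[i] = (1.0 - omega) * x[i] + omega * x_hat
    return x


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
omega = 1.5
x0 = [0.3, -0.2, 0.7]
x = sor_sweep(A, b, list(x0), omega)
# Residual of the last row versus (1 - omega) * a_nn * (x_hat - x_old).
x_hat = (b[2] - A[2][0] * x[0] - A[2][1] * x[1]) / A[2][2]
r_last = b[2] - sum(A[2][j] * x[j] for j in range(3))
```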
Block versions of the Jacobi, Gauss-Seidel, and SOR iterations are easily defined. (Here we mean block preconditioners, not blocks of iteration vectors as in section 7.4.) If $M$ is taken to be the block diagonal of $A$, that is, if $A$ is of the form

$$A = \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mm} \end{pmatrix},$$

where each diagonal block $A_{ii}$ is square and nonsingular, and

$$M = \begin{pmatrix} A_{11} & & \\ & \ddots & \\ & & A_{mm} \end{pmatrix},$$

then the simple iteration procedure with this matrix splitting is called the block Jacobi method. Similarly, for $M$ equal to the block lower triangle of $A$, we obtain the block Gauss-Seidel method; for $M$ of the form $\omega^{-1}D - L$, where $D$ is the block diagonal and $L$ is the strictly block lower triangular part of $A$, we obtain the block SOR method.
When $A$ is real symmetric or complex Hermitian, then symmetric or Hermitian versions of the Gauss-Seidel and SOR preconditioners can be defined. If one defines $M_1 = \omega^{-1}D - L$, as in the SOR method, and $M_2 = \omega^{-1}D - U$, and sets

$$M_1 x_{k-1/2} = N_1 x_{k-1} + b, \quad N_1 = M_1 - A, \qquad M_2 x_k = N_2 x_{k-1/2} + b, \quad N_2 = M_2 - A,$$

then the resulting iteration is known as the symmetric SOR or SSOR method. It is left as an exercise to show that the preconditioner $M$ in this case is

$$M = \frac{\omega}{2-\omega}\,(\omega^{-1}D - L)\,D^{-1}\,(\omega^{-1}D - U);$$

that is, if we eliminate $x_{k-1/2}$, then $x_k$ satisfies $Mx_k = Nx_{k-1} + b$, where $A = M - N$. The SSOR preconditioner is sometimes used with the CG algorithm for Hermitian positive definite problems.
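An SSOR step is a forward SOR sweep (splitting $M_1$) followed by a backward SOR sweep (splitting $M_2$). The sketch below (hypothetical names, dense storage) implements that, and the usage lines compare one step with $x_{k-1} + M^{-1}(b - Ax_{k-1})$ for $M = \frac{\omega}{2-\omega}(\omega^{-1}D - L)D^{-1}(\omega^{-1}D - U)$, computed with two triangular solves; the test matrix is a made-up symmetric positive definite example.

```python
def sor_update(A, b, x, i, omega):
    """Relax component i of x toward the value that zeros the i-th residual."""
    s = sum(A[i][j] * x[j] for j in range(len(b)) if j != i)
    x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]


def ssor_step(A, b, x, omega):
    """One SSOR step, in place: forward sweep, then backward sweep."""
    n = len(b)
    for i in range(n):            # M_1 x_{k-1/2} = N_1 x_{k-1} + b
        sor_update(A, b, x, i, omega)
    for i in reversed(range(n)):  # M_2 x_k = N_2 x_{k-1/2} + b
        sor_update(A, b, x, i, omega)
    return x


A = [[4.0, -1.0, 0.0, -1.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [-1.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 0.5, -1.0]
omega = 1.3
x0 = [0.1, 0.2, -0.3, 0.4]
x1 = ssor_step(A, b, list(x0), omega)

# Same step via M^{-1} r = (2 - omega)/omega * M_2^{-1} D M_1^{-1} r.
n = len(b)
r = [b[i] - sum(A[i][j] * x0[j] for j in range(n)) for i in range(n)]
y = [0.0] * n
for i in range(n):            # M_1 = D/omega + strictly lower part of A
    y[i] = (r[i] - sum(A[i][j] * y[j] for j in range(i))) / (A[i][i] / omega)
z = [A[i][i] * y[i] for i in range(n)]
w = [0.0] * n
for i in reversed(range(n)):  # M_2 = D/omega + strictly upper part of A
    w[i] = (z[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / (A[i][i] / omega)
x_formula = [x0[i] + (2.0 - omega) / omega * w[i] for i in range(n)]
```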

10.1.1. Analysis of SOR. A beautiful theory describing the convergence rate of the SOR iteration and the optimal value for $\omega$ was developed by Young [144]. We include here only the basics of that theory, for two reasons. First, it is described in many other places. In addition to [144], see, for instance, [77, 83].
Second, the upshot of the theory is an expression for the optimal value of $\omega$ and the spectral radius of the SOR iteration matrix in terms of the spectral radius of the Jacobi iteration matrix. In most practical applications, the spectral radius of the Jacobi matrix is not known, so computer programs have been developed to try to dynamically estimate the optimal value of $\omega$. The theory is most often applied to the model problem for Poisson's equation on a square, described in section 9.1.1, because here the eigenvalues are known. For this problem it can be shown that with the optimal value of $\omega$, the spectral radius of the SOR iteration matrix is $1 - O(h)$ instead of $1 - O(h^2)$ as it is for $\omega = 1$. This is a tremendous improvement; it means that the number of iterations required to achieve a fixed level of accuracy is reduced from $O(h^{-2})$ to $O(h^{-1})$. (Recall that the spectral radius is the same as the 2-norm for a Hermitian matrix. Hence it determines not only the asymptotic convergence rate but the amount by which the error is reduced at each step.)
One obtains the same level of improvement, however, with the unpreconditioned CG algorithm, and here there are no parameters to estimate. The development of the CG algorithm for Hermitian positive definite problems has made SOR theory less relevant. Therefore, we will concentrate most of our effort on finding preconditioners that lead to still further improvement on this $O(h^{-1})$ estimate.
Let $A$ be written in the form $A = D - L - U$, where $D$ is diagonal, $L$ is strictly lower triangular, and $U$ is strictly upper triangular. The asymptotic convergence rates of the Jacobi and SOR methods depend on the spectral radii of $G_J = I - D^{-1}A = D^{-1}(L + U)$ and $G_\omega = I - (\omega^{-1}D - L)^{-1}A = (D - \omega L)^{-1}[(1 - \omega)D + \omega U]$, respectively. Note that if we prescale $A$ by its diagonal so that $\hat{A} = D^{-1}A = I - D^{-1}L - D^{-1}U$, then the Jacobi and SOR iteration matrices do not change. For convenience, let us assume that $A$ has been prescaled by its diagonal and let $L$ and $U$ now denote the strictly lower and strictly upper triangular parts of the scaled matrix, $A = I - L - U$. Then the Jacobi and SOR iteration matrices are

$$G_J = L + U, \qquad G_\omega = (I - \omega L)^{-1}\big[(1 - \omega)I + \omega U\big].$$

We first note that for the SOR method, we need only consider values of $\omega$ in the open interval $(0, 2)$.
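THEOREM 10.1.1. For any $\omega \in \mathbb{C}$, we have

$$\rho(G_\omega) \ge |1 - \omega|.$$

Proof. We have

$$\det(G_\omega) = \det\big[(I - \omega L)^{-1}\big]\,\det\big[(1 - \omega)I + \omega U\big].$$

Since the matrices here are triangular, their determinants are equal to the product of their diagonal entries, so we have $\det(G_\omega) = (1 - \omega)^n$. The determinant of $G_\omega$ is also equal to the product of its eigenvalues, and it follows that at least one of the $n$ eigenvalues must have absolute value greater than or equal to $|1 - \omega|$.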
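The identity $\det(G_\omega) = (1 - \omega)^n$ behind Theorem 10.1.1 is easy to confirm numerically. The sketch below forms $G_\omega = (I - \omega L)^{-1}[(1 - \omega)I + \omega U]$ column by column by forward substitution for a small made-up matrix with unit diagonal, and evaluates its determinant by Gaussian elimination with partial pivoting. Function names are hypothetical.

```python
def sor_iteration_matrix(A, omega):
    """G_omega = (I - omega L)^{-1} [(1 - omega) I + omega U] for A = I - L - U.

    A must have unit diagonal; L and U are minus its strictly lower and upper parts.
    """
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Column `col` of (1 - omega) I + omega U, where U[i][col] = -A[i][col] for i < col.
        rhs = [(1.0 - omega) if i == col else (-omega * A[i][col] if i < col else 0.0)
               for i in range(n)]
        # Forward substitution with I - omega L, whose (i, j) entry is omega * A[i][j] for j < i.
        y = [0.0] * n
        for i in range(n):
            y[i] = rhs[i] - sum(omega * A[i][j] * y[j] for j in range(i))
        for i in range(n):
            G[i][col] = y[i]
    return G


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n = len(M)
    det = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            return 0.0
        if p != k:
            M[k], M[p] = M[p], M[k]
            det = -det
        det *= M[k][k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n):
                M[r][c] -= factor * M[k][c]
    return det


A = [[1.0, -0.2, 0.1, 0.0],
     [-0.3, 1.0, -0.2, 0.1],
     [0.1, -0.1, 1.0, -0.3],
     [0.0, 0.2, -0.4, 1.0]]
for omega in (0.5, 1.0, 1.5, 1.9):
    print(omega, determinant(sor_iteration_matrix(A, omega)), (1.0 - omega) ** 4)
```

Since $|\det G_\omega|^{1/n}$ is the geometric mean of the eigenvalue magnitudes, it can never exceed $\rho(G_\omega)$, which is exactly the argument of the proof.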
Theorem 10.1.1 holds for any matrix A (with nonzero diagonal entries).
By making additional assumptions about the matrix A, one can prove more
about the relation between the convergence rates of the Jacobi, Gauss-Seidel,
and SOR iterations. In the following theorems, we make what seems to be a
rather unusual assumption (10.9). We subsequently note that this assumption
can sometimes be verified just by considering the sparsity pattern of A.

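THEOREM 10.1.2. Suppose that the matrix $A = I - L - U$ has the following property: for any $c \in \mathbb{R}$,

for all $\gamma \in \mathbb{R}\setminus\{0\}$. Then the following properties hold:

(i) If $\mu$ is an eigenvalue of $G_J$, then $-\mu$ is an eigenvalue of $G_J$ with the same multiplicity.
(ii) If $\lambda = 0$ is an eigenvalue of $G_\omega$, then $\omega = 1$.
(iii) If $\lambda \ne 0$ is an eigenvalue of $G_\omega$ for some $\omega \in (0, 2)$, and $\mu$ satisfies

$$(\lambda + \omega - 1)^2 = \lambda\,\omega^2\mu^2, \tag{10.10}$$

then $\mu$ is an eigenvalue of $G_J$.
(iv) If $\mu$ is an eigenvalue of $G_J$ and $\lambda$ satisfies (10.10) for some $\omega \in (0, 2)$, then $\lambda$ is an eigenvalue of $G_\omega$.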
Proof. From property (10.9) with $\gamma = -1$, we have, for any number $\mu$,

Since the eigenvalues of $G_J$ are the numbers $\mu$ for which $\det(G_J - \mu I) = 0$ and their multiplicities are also determined by this characteristic polynomial, result (i) follows.
Since the matrix $I - \omega L$ is lower triangular with ones on the diagonal, its determinant is 1; for any number $\lambda$ we have

$$\det(\lambda I - G_\omega) = \det\big[(\lambda + \omega - 1)I - \lambda\omega L - \omega U\big]. \tag{10.11}$$

If $\lambda = 0$ is an eigenvalue of $G_\omega$, then (10.11) implies that $\det[(1 - \omega)I + \omega U] = 0$. Since this matrix is upper triangular with $(1 - \omega)$'s along the diagonal, we deduce that $\omega = 1$ and thus prove (ii).
For $\lambda \ne 0$, equation (10.11) implies that

Using property (10.9) with $\gamma = \lambda^{-1/2}$, we have

It follows that if $\lambda \ne 0$ is an eigenvalue of $G_\omega$ and if $\mu$ satisfies (10.10), then $\mu$ is an eigenvalue of $G_J$. Conversely, if $\mu$ is an eigenvalue of $G_J$ and $\lambda$ satisfies (10.10), then $\lambda$ is an eigenvalue of $G_\omega$. This proves (iii) and (iv).

COROLLARY 10.1.1. When the coefficient matrix $A$ satisfies (10.9), asymptotically the Gauss-Seidel iteration is twice as fast as the Jacobi iteration; that is, $\rho(G_1) = (\rho(G_J))^2$.
Proof. For $\omega = 1$, (10.10) becomes

$$\lambda^2 = \lambda\mu^2.$$

If all eigenvalues $\lambda$ of $G_1$ are 0, then part (iv) of Theorem 10.1.2 implies that all eigenvalues of $G_J$ are 0 as well. If there is a nonzero eigenvalue $\lambda$ of $G_1$, then part (iii) of Theorem 10.1.2 implies that there is an eigenvalue $\mu$ of $G_J$ such that $\mu = \lambda^{1/2}$. Hence $\rho(G_J)^2 \ge \rho(G_1)$. Part (iv) of Theorem 10.1.2 implies that there is no eigenvalue $\mu$ of $G_J$ such that $|\mu|^2 > \rho(G_1)$; if there were such an eigenvalue $\mu$, then $\lambda = \mu^2$ would be an eigenvalue of $G_1$, which is a contradiction. Hence $\rho(G_J)^2 = \rho(G_1)$.
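Corollary 10.1.1 can be observed empirically. The sketch below uses, as our own choice of test matrix, the scaled 1D Poisson matrix $\mathrm{tridiag}(-\frac12, 1, -\frac12)$, which has Property A and for which $\rho(G_J) = \cos(\pi/(n+1))$. It runs the error recurrences of Jacobi and Gauss-Seidel and estimates each spectral radius from the ratio of successive error norms after many steps. Function names are hypothetical.

```python
import math


def model_matrix(n):
    """Scaled 1D Poisson matrix tridiag(-1/2, 1, -1/2); it has Property A."""
    return [[1.0 if i == j else (-0.5 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]


def jacobi_error_step(A, e):
    """e <- G_J e: every component uses the old error vector."""
    n = len(e)
    return [-sum(A[i][j] * e[j] for j in range(n) if j != i) / A[i][i] for i in range(n)]


def gauss_seidel_error_step(A, e):
    """e <- G_1 e: components are overwritten as soon as they are computed."""
    e = list(e)
    n = len(e)
    for i in range(n):
        e[i] = -sum(A[i][j] * e[j] for j in range(n) if j != i) / A[i][i]
    return e


def observed_rate(step, A, steps):
    """Ratio ||e_{k+1}|| / ||e_k|| after `steps` iterations from a fixed starting error."""
    e = [1.0 + 0.1 * i for i in range(len(A))]
    for _ in range(steps):
        e = step(A, e)
    e_next = step(A, e)
    return math.sqrt(sum(t * t for t in e_next)) / math.sqrt(sum(t * t for t in e))


A = model_matrix(20)
rho_j = observed_rate(jacobi_error_step, A, 800)
rho_gs = observed_rate(gauss_seidel_error_step, A, 400)
print(rho_j, math.cos(math.pi / 21), rho_gs, rho_j ** 2)
```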
In some cases, for example when $A$ is Hermitian, the Jacobi iteration matrix $G_J$ has only real eigenvalues. The SOR iteration matrix is non-Hermitian and may well have complex eigenvalues, but one can prove the following theorem about the optimal value of $\omega$ for the SOR iteration and the corresponding optimal convergence rate.
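THEOREM 10.1.3. Suppose that $A$ satisfies (10.9), that $G_J$ has only real eigenvalues, and that $\beta = \rho(G_J) < 1$. Then the SOR iteration converges for every $\omega \in (0, 2)$, and the spectral radius of the SOR matrix is

$$\rho(G_\omega) = \begin{cases} \frac{1}{4}\Big[\omega\beta + \sqrt{\omega^2\beta^2 - 4(\omega - 1)}\Big]^2 & \text{for } 0 < \omega \le \omega_{\mathrm{opt}}, \\ \omega - 1 & \text{for } \omega_{\mathrm{opt}} \le \omega < 2, \end{cases} \tag{10.13}$$

where $\omega_{\mathrm{opt}}$, the optimal value of $\omega$, is

$$\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \beta^2}}. \tag{10.14}$$

For any other value of $\omega$, we have

$$\rho(G_{\omega_{\mathrm{opt}}}) < \rho(G_\omega). \tag{10.15}$$

Proof. Solving (10.10) for $\lambda$ gives

$$\lambda = \frac{1}{4}\Big[\omega\mu \pm \sqrt{\omega^2\mu^2 - 4(\omega - 1)}\Big]^2. \tag{10.16}$$

It follows from Theorem 10.1.2 that if $\mu$ is an eigenvalue of $G_J$, then both roots $\lambda$ are eigenvalues of $G_\omega$.
Since $\mu$ is real, the term inside the square root in (10.16) is negative if

$$\hat\omega < \omega < 2, \qquad \text{where } \hat\omega = \frac{2}{1 + \sqrt{1 - \mu^2}}, \tag{10.17}$$

and in this case

$$|\lambda| = \omega - 1.$$

In the remaining part of the range of $\omega$, both roots $\lambda$ are positive and the larger one is

$$\lambda = \frac{1}{4}\Big[\omega|\mu| + \sqrt{\omega^2\mu^2 - 4(\omega - 1)}\Big]^2. \tag{10.18}$$

Also, this value is greater than or equal to $\omega - 1$ for $\omega \in (0, \hat\omega]$, since in this range the two roots are real and their product is $(\omega - 1)^2$, so the larger one is at least $|\omega - 1|$.
It is easy to check that for any fixed $\omega \in (0, \hat\omega]$, expression (10.18) is a strictly increasing function of $|\mu|$. Likewise, $\hat\omega$ is a strictly increasing function of $|\mu|$, and $\hat\omega = \omega_{\mathrm{opt}}$ when $|\mu| = \beta$. It follows that an eigenvalue $\lambda$ of $G_\omega$ for which $|\lambda| = \rho(G_\omega)$ corresponds to an eigenvalue $\mu$ of $G_J$ for which $|\mu| = \beta$, because such an eigenvalue is greater than or equal to those corresponding to smaller values of $|\mu|$ if $\omega \in (0, \omega_{\mathrm{opt}}]$, and it is equal to the others if $\omega \in (\omega_{\mathrm{opt}}, 2)$. We thus deduce that (10.13) holds for $\omega_{\mathrm{opt}}$ given by (10.14). Since the expressions in (10.13) are less than 1 for all $\omega \in (0, 2)$, the SOR iteration converges. It can also be seen that for fixed $|\mu| = \beta$, expression (10.18) is a strictly decreasing function of $\omega$ for $\omega \in (0, \omega_{\mathrm{opt}}]$, thereby reaching its minimum at $\omega = \omega_{\mathrm{opt}}$. Inequality (10.15) is then proved.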
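The formulas of Theorem 10.1.3 are easy to tabulate, which is essentially what Figure 10.1 plots. A minimal sketch with hypothetical function names, implementing $\omega_{\mathrm{opt}} = 2/(1 + \sqrt{1 - \beta^2})$ and the two branches of $\rho(G_\omega)$; it is valid only under the theorem's hypotheses ((10.9) holds, $G_J$ has real eigenvalues, $\beta < 1$):

```python
import math


def omega_opt(beta):
    """Optimal SOR parameter 2 / (1 + sqrt(1 - beta^2)) for beta = rho(G_J) < 1."""
    return 2.0 / (1.0 + math.sqrt(1.0 - beta * beta))


def sor_spectral_radius(beta, omega):
    """rho(G_omega) from Theorem 10.1.3."""
    if omega <= omega_opt(beta):
        # Clip tiny negative values caused by rounding near omega_opt.
        disc = max(0.0, omega ** 2 * beta ** 2 - 4.0 * (omega - 1.0))
        return 0.25 * (omega * beta + math.sqrt(disc)) ** 2
    return omega - 1.0


beta = 0.99
w = omega_opt(beta)
for om in (1.0, w - 0.1, w, w + 0.1, 1.95):
    print(round(om, 4), sor_spectral_radius(beta, om))
```

For $\beta = 0.99$, overshooting $\omega_{\mathrm{opt}}$ by 0.1 gives a noticeably smaller spectral radius than undershooting it by 0.1, in line with the advice that follows.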
The expression in (10.13) for $\rho(G_\omega)$ is plotted in Figure 10.1 for different values of $\beta = \rho(G_J)$. It can be seen from the figure that if the optimal value $\omega_{\mathrm{opt}}$ is not known, then it is better to overestimate it than to underestimate it, especially for values of $\beta$ near 1. Some computer codes have been designed to estimate $\omega_{\mathrm{opt}}$ dynamically, but these will not be discussed here.
The condition (10.9) of Theorem 10.1.2 can sometimes be established just
by considering the sparsity pattern of A.
DEFINITION 10.1.1. A matrix $A$ of order $n$ has Property A if there exist two disjoint subsets $S_1$ and $S_2$ of $Z_n = \{1, \ldots, n\}$ such that $S_1 \cup S_2 = Z_n$ and such that if $a_{ij} \ne 0$ for some $i \ne j$, then either $i \in S_1$ and $j \in S_2$ or $i \in S_2$ and $j \in S_1$.

FIG. 10.1. Spectral radius of the SOR matrix for different values of $\omega$ and $\beta = \rho(G_J)$.

THEOREM 10.1.4. A matrix $A$ has Property A if and only if $A$ is a diagonal matrix or else there exists a permutation matrix $P$ such that $P^{-1}AP$ has the form

$$P^{-1}AP = \begin{pmatrix} D_1 & B \\ C & D_2 \end{pmatrix}, \tag{10.19}$$

where $D_1$ and $D_2$ are square diagonal matrices.

Proof. If $A$ has Property A, then if $S_1$ or $S_2$ is empty, $A$ is a diagonal matrix. Otherwise, order the rows and columns of $A$ with indices in $S_1$ first, followed by those with indices in $S_2$. From the definition of $S_1$ and $S_2$, it follows that the two diagonal blocks of order $\operatorname{card}(S_1)$ and $\operatorname{card}(S_2)$ will be diagonal matrices.
Conversely, if $A$ can be permuted into the form (10.19), then take $S_1$ to be the set of indices corresponding to the first diagonal block and $S_2$ to be those corresponding to the second diagonal block. Then $S_1$ and $S_2$ satisfy the properties required in the definition of Property A.
We state without proof the following theorem. For a proof see, e.g., [83].
THEOREM 10.1.5. If a matrix $A$ has Property A then there is a permutation matrix $P$ such that $P^{-1}AP$ satisfies (10.9).
The Poisson Equation. We saw an example earlier of a matrix with
Property A, namely, the matrix arising from a 5-point finite difference
approximation to Poisson's equation on a square. By numbering the nodes
of the grid in a red-black checkerboard fashion, we obtained a matrix of the
form (10.19). It turns out that even if the natural ordering of nodes is used,
the assumption (10.9) is satisfied for this matrix.
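Property A for the 5-point matrix can be checked mechanically. The sketch below builds the unscaled 5-point matrix on an $m \times m$ interior grid in the natural ordering, takes $S_1$ and $S_2$ to be the red and black points of the checkerboard ($i + j$ even or odd), checks Definition 10.1.1 (with 0-based indices), and then reorders the red points first to exhibit the form (10.19). Function names are hypothetical.

```python
def five_point_matrix(m):
    """Unscaled 5-point Laplacian on an m-by-m interior grid, natural ordering."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            row = i * m + j
            A[row][row] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < m and 0 <= jj < m:
                    A[row][ii * m + jj] = -1.0
    return A


def red_black_sets(m):
    """Checkerboard coloring: S1 = points with i + j even, S2 = points with i + j odd."""
    s1 = [i * m + j for i in range(m) for j in range(m) if (i + j) % 2 == 0]
    s2 = [i * m + j for i in range(m) for j in range(m) if (i + j) % 2 == 1]
    return s1, s2


def has_property_a(A, s1, s2):
    """Check Definition 10.1.1 for the given partition (s1, s2) of the indices."""
    n = len(A)
    if sorted(s1 + s2) != list(range(n)):
        return False  # not a partition of {0, ..., n-1}
    in_s1 = set(s1)
    return all(A[i][j] == 0.0 or (i in in_s1) != (j in in_s1)
               for i in range(n) for j in range(n) if i != j)


def permute(A, order):
    """Return P^{-1} A P for the permutation that lists the indices in `order`."""
    return [[A[i][j] for j in order] for i in order]


m = 5
A = five_point_matrix(m)
s1, s2 = red_black_sets(m)
B = permute(A, s1 + s2)  # red points first: both diagonal blocks are diagonal
```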
The eigenvalues of this matrix are known explicitly and are given in Theorem 9.1.2. If we assume that $h_x = h_y = h$ and scale the matrix to have ones on its diagonal, then these eigenvalues are

$$\sin^2\Big(\frac{i\pi h}{2}\Big) + \sin^2\Big(\frac{k\pi h}{2}\Big), \qquad i, k = 1, \ldots, m,$$

where $m = n_x = n_y$. The eigenvalues of the Jacobi iteration matrix $G_J$ are one minus these values, so we have

$$\rho(G_J) = 1 - 2\sin^2\Big(\frac{\pi h}{2}\Big) = 1 - \frac{\pi^2h^2}{2} + O(h^4), \tag{10.20}$$

where the last equality comes from setting $i = k = 1$ or $i = k = m$ to obtain the maximum absolute value and then using a Taylor expansion for $\sin(x)$.
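Knowing the value of $\rho(G_J)$, Theorem 10.1.3 tells us the optimal value of $\omega$ as well as the convergence rate of the SOR iteration for this and other values of $\omega$. It follows from Theorem 10.1.3 that

$$\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho(G_J)^2}} = \frac{2}{1 + \sin(\pi h)}$$

and, therefore,

$$\rho(G_{\omega_{\mathrm{opt}}}) = \omega_{\mathrm{opt}} - 1 = \frac{1 - \sin(\pi h)}{1 + \sin(\pi h)} = 1 - 2\pi h + O(h^2). \tag{10.21}$$

In contrast, for $\omega = 1$, Theorem 10.1.3 shows that the spectral radius of the Gauss-Seidel iteration matrix is

$$\rho(G_1) = \rho(G_J)^2 = 1 - \pi^2h^2 + O(h^4). \tag{10.22}$$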
Comparing (10.20-10.22) and ignoring higher order terms in $h$, it can be seen that while the asymptotic convergence rate of the Gauss-Seidel method is twice that of Jacobi's method, the difference between the Gauss-Seidel method and SOR with the optimal $\omega$ is much greater. Looking at the reduction in the log of the error for each method, we see that while the log of the error at consecutive steps differs by $O(h^2)$ for the Jacobi and Gauss-Seidel methods, it differs by $O(h)$ for SOR with the optimal $\omega$.
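The formulas (10.20)-(10.22) can be checked numerically on a small grid. The sketch below is a hypothetical NumPy check, assuming the natural (row-by-row) ordering, $h = 1/(m+1)$, and the splitting $A = D - L - U$ with $D$ diagonal and $-L$ strictly lower triangular; it forms the iteration matrices densely, which is only sensible for small $m$.

```python
import numpy as np

def model_poisson(m):
    """5-point Laplacian on an m-by-m interior grid (natural ordering),
    scaled to have ones on the diagonal and -1/4 for each neighbor."""
    S = np.eye(m, k=1) + np.eye(m, k=-1)      # 1D nearest-neighbor coupling
    I = np.eye(m)
    return np.eye(m * m) - 0.25 * (np.kron(I, S) + np.kron(S, I))

def spectral_radius(G):
    return np.max(np.abs(np.linalg.eigvals(G)))

def iteration_matrices(A, omega):
    """Jacobi, Gauss-Seidel and SOR iteration matrices for A = D - L - U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)                       # strictly lower part, sign flipped
    I = np.eye(A.shape[0])
    G_J = I - np.linalg.solve(D, A)
    G_GS = I - np.linalg.solve(D - L, A)
    G_SOR = I - omega * np.linalg.solve(D - omega * L, A)
    return G_J, G_GS, G_SOR

m = 12
h = 1.0 / (m + 1)
A = model_poisson(m)
omega_opt = 2.0 / (1.0 + np.sin(np.pi * h))
rho_J, rho_GS, rho_SOR = (spectral_radius(G)
                          for G in iteration_matrices(A, omega_opt))

print(rho_J, np.cos(np.pi * h))               # compare with (10.20)
print(rho_SOR, omega_opt - 1.0)               # compare with (10.21)
print(rho_GS, np.cos(np.pi * h) ** 2)         # compare with (10.22)
# Ratio of asymptotic rates -log(rho): Gauss-Seidel versus Jacobi
print(np.log(rho_GS) / np.log(rho_J))
```

The SOR iteration matrix at $\omega_{\mathrm{opt}}$ has a nontrivial Jordan block, so its computed spectral radius agrees with $\omega_{\mathrm{opt}} - 1$ only to roughly the square root of machine precision.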
Figure 10.2 shows a plot of the convergence of these three methods as well as the unpreconditioned CG algorithm for $h = 1/51$. A random solution was chosen and the corresponding right-hand side was computed. The 2-norm of the error is plotted. While the SOR method is a great improvement over the Jacobi and Gauss-Seidel iterations, we see that even for this moderate value of $h$ all of the methods require many iterations to obtain a good approximate solution. As already noted, the CG iteration is more appropriate than simple iteration for symmetric positive definite problems such as this, and the remaining chapters of this book will discuss preconditioners designed to further enhance the convergence rate.
156 Iterative Methods for Solving Linear Systems

FIG. 10.2. Convergence of iterative methods for the model problem, $h = 1/51$. Jacobi (dotted), Gauss-Seidel (dashed), SOR with optimal $\omega$ (solid), unpreconditioned CG (dash-dot).

10.2. The Perron-Frobenius Theorem.


A powerful theory is available for comparing asymptotic convergence rates
of simple iteration methods when used with a class of splittings known as
"regular splittings." This theory is based on the work of Perron and Frobenius
on nonnegative matrices. The Perron-Frobenius theorem is an important tool
in many areas of applied linear algebra. We include here proofs of only parts of
that theory. For a more complete exposition, see [80], from which this material
was extracted.
Notation. We will use the notation $A \ge B$ ($A > B$) to mean that each entry of the real matrix $A$ is greater than or equal to (strictly greater than) the corresponding entry of $B$. The matrix with $(i,j)$-entry $|a_{ij}|$ will be denoted by $|A|$. The matrix $A$ is called positive (nonnegative) if $A > 0$ ($A \ge 0$).
Let A and B be n-by-n matrices and let v be an n-vector. It is left as an
exercise to show the following results:

10.2a. $|A^k| \le |A|^k$ for all $k = 1, 2, \ldots$.

10.2b. If $0 \le A \le B$, then $0 \le A^k \le B^k$ for all $k = 1, 2, \ldots$.

10.2c. If $A > 0$, then $A^k > 0$ for all $k = 1, 2, \ldots$.

10.2d. If $A > 0$ and $v \ge 0$ and $v$ is not the 0 vector, then $Av > 0$.

10.2e. If $A \ge 0$ and $v > 0$ and $Av \ge \alpha v$ for some $\alpha > 0$, then $A^k v \ge \alpha^k v$ for all $k = 1, 2, \ldots$.

THEOREM 10.2.1. Let A and B be n-by-n matrices. If $|A| \le B$, then

$$\rho(A) \le \rho(|A|) \le \rho(B).$$

Proof. It follows from Exercises 10.2a and 10.2b that for every $k = 1, 2, \ldots$, we have $|A^k| \le |A|^k \le B^k$, so the Frobenius norms of these matrices satisfy (recall that $\|A^k\|_F = \||A^k|\|_F$)

$$\|A^k\|_F^{1/k} \le \big\||A|^k\big\|_F^{1/k} \le \|B^k\|_F^{1/k}. \tag{10.23}$$

Since the spectral radius of a matrix $C$ is just $\lim_{k\to\infty} |||C^k|||^{1/k}$, where $|||\cdot|||$ is any matrix norm (Corollary 1.3.1), taking limits in (10.23) gives $\rho(A) \le \rho(|A|) \le \rho(B)$. □
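A quick numerical illustration of Theorem 10.2.1, and of the limit formula used in its proof, is sketched below. The random test matrices are our own choice (an assumption, not from the text); `B` is built so that $|A| \le B$ holds entrywise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))               # real matrix with mixed signs
B = np.abs(A) + rng.random((n, n))            # guarantees |A| <= B entrywise

def rho(C):
    return np.max(np.abs(np.linalg.eigvals(C)))

print(rho(A), rho(np.abs(A)), rho(B))        # nondecreasing, per Theorem 10.2.1

def gelfand_estimate(C, k):
    """||C^k||_F^(1/k), which tends to rho(C) as k grows (Corollary 1.3.1)."""
    return np.linalg.norm(np.linalg.matrix_power(C, k), "fro") ** (1.0 / k)

for k in (1, 10, 100):
    print(k, gelfand_estimate(B, k), rho(B))
```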
COROLLARY 10.2.1. Let A and B be n-by-n matrices. If $0 \le A \le B$, then $\rho(A) \le \rho(B)$.
COROLLARY 10.2.2. Let A and B be n-by-n matrices. If $0 \le A < B$, then $\rho(A) < \rho(B)$.
Proof. There is a number $\alpha > 1$ such that $0 \le A \le \alpha A \le B$. It follows from Corollary 10.2.1 that $\rho(B) \ge \alpha\rho(A)$, so if $\rho(A) \ne 0$, then $\rho(B) > \rho(A)$. If $\rho(A) = 0$, consider the matrix $C$ with $(1,1)$-entry equal to $b_{11} > 0$ and all other entries equal to zero. The spectral radius of this matrix is $b_{11}$, and we have $C = |C| \le B$, so $\rho(B) \ge b_{11} > 0$. □
In 1907, Perron proved important results for positive matrices. Some of
these results are contained in the following theorem.
THEOREM 10.2.2. Let A be an n-by-n matrix and suppose $A > 0$. Then $\rho(A) > 0$, $\rho(A)$ is an eigenvalue of $A$, and there is a positive vector $v$ such that $Av = \rho(A)v$.
Proof. It follows from Corollary 10.2.2 that $\rho(A) > 0$. By definition of the spectral radius, there is an eigenvalue $\lambda$ with $|\lambda| = \rho(A)$. Let $v$ be an associated nonzero eigenvector. We have

$$\rho(A)|v| = |\lambda v| = |Av| \le |A|\,|v| = A|v|,$$

so $y = A|v| - \rho(A)|v| \ge 0$. If $y$ is the 0 vector, then this implies that $\rho(A)$ is an eigenvalue of $A$ with the nonnegative eigenvector $|v|$. If $|v|$ had a zero component, then that component of $A|v| = \rho(A)|v|$ would have to be zero, and since each entry of $A$ is positive, this would imply that $v$ is the 0 vector (Exercise 10.2d), which is a contradiction. Thus, if $y$ is the 0 vector, Theorem 10.2.2 is proved.
If $y$ is not the 0 vector, then $Ay > 0$ (Exercise 10.2d); setting $z = A|v| > 0$, we have $0 < Ay = Az - \rho(A)z$, or $Az > \rho(A)z$. It follows that there is some number $\alpha > \rho(A)$ such that $Az \ge \alpha z$. From Exercise 10.2e, it follows that for every $k \ge 1$, $A^k z \ge \alpha^k z$. From this we conclude that $\|A^k\|_\infty^{1/k} \ge \alpha > \rho(A)$ for all $k$. But since $\lim_{k\to\infty} \|A^k\|_\infty^{1/k} = \rho(A)$, this leads to the contradiction $\rho(A) \ge \alpha > \rho(A)$. □
Theorem 10.2.2 is part of the Perron theorem, which also states that there is a unique eigenvalue $\lambda$ with modulus equal to $\rho(A)$ and that this eigenvalue is simple.
THEOREM 10.2.3 (Perron). If A is an n-by-n matrix and $A > 0$, then

(a) $\rho(A) > 0$;
(b) $\rho(A)$ is a simple eigenvalue of $A$;
(c) $\rho(A)$ is the unique eigenvalue of maximum modulus; that is, for any other eigenvalue $\lambda$ of $A$, $|\lambda| < \rho(A)$; and
(d) there is a vector $v$ with $v > 0$ such that $Av = \rho(A)v$.

The unique normalized eigenvector characterized in Theorem 10.2.3 is often called the Perron vector of $A$; $\rho(A)$ is often called the Perron root of $A$.
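Part (c) of Theorem 10.2.3 is what makes the power method converge for a positive matrix: the Perron root strictly dominates every other eigenvalue in modulus. The function below is a minimal sketch (its name, tolerance, and iteration cap are our own assumptions) that normalizes in the 1-norm so the iterates stay positive.

```python
import numpy as np

def perron_power(A, tol=1e-12, maxit=10_000):
    """Power method for a positive matrix A (all entries > 0).

    Returns (rho, v) with v > 0, sum(v) = 1 and A v ~= rho v.
    """
    n = A.shape[0]
    v = np.full(n, 1.0 / n)                  # positive starting vector
    rho = 0.0
    for _ in range(maxit):
        w = A @ v                            # stays positive (Exercise 10.2d)
        rho_new = w.sum()                    # since sum(v) = 1, this estimates rho
        w /= rho_new
        if np.linalg.norm(w - v, 1) < tol:
            return rho_new, w
        v, rho = w, rho_new
    return rho, v

rng = np.random.default_rng(1)
A = rng.random((5, 5)) + 0.1                 # strictly positive entries
rho, v = perron_power(A)
print(rho, np.max(np.abs(np.linalg.eigvals(A))))
print(np.all(v > 0), np.linalg.norm(A @ v - rho * v))
```

The convergence factor per step is $|\lambda_2|/\rho(A)$, where $\lambda_2$ is an eigenvalue of second-largest modulus, so the sketch can be slow when that ratio is close to 1.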
In many instances we will be concerned with nonnegative matrices that
are not necessarily positive, so it is desirable to extend the results of Perron
to this case. Some of the results can be extended just by taking suitable
limits, but, unfortunately, limit arguments are only partially applicable. The
results of Perron's theorem that generalize by taking limits are contained in
the following theorem.
THEOREM 10.2.4. If A is an n-by-n matrix and $A \ge 0$, then $\rho(A)$ is an eigenvalue of $A$ and there is a nonnegative vector $v \ge 0$, with $\|v\| = 1$, such that $Av = \rho(A)v$.
Proof. For any $\epsilon > 0$, define $A(\epsilon) = [a_{ij} + \epsilon] > 0$. Let $v(\epsilon) > 0$ with $\|v(\epsilon)\| = 1$ denote the Perron vector of $A(\epsilon)$ and $\rho(\epsilon)$ the Perron root. Since the set of vectors $v(\epsilon)$ is contained in the compact set $\{w : \|w\| = 1\}$, there is a monotone decreasing sequence $\epsilon_1 > \epsilon_2 > \cdots$ with $\lim_{k\to\infty}\epsilon_k = 0$ such that $\lim_{k\to\infty} v(\epsilon_k) = v$ exists and satisfies $\|v\| = 1$. Since $v(\epsilon_k) > 0$, it follows that $v \ge 0$.
By Theorem 10.2.1, the sequence of numbers $\{\rho(\epsilon_k)\}_{k=1,2,\ldots}$ is a monotone decreasing sequence, since $A \le A(\epsilon_{k+1}) \le A(\epsilon_k)$. Hence $\rho = \lim_{k\to\infty}\rho(\epsilon_k)$ exists and $\rho \ge \rho(A)$. But from the fact that

$$Av = \lim_{k\to\infty} A(\epsilon_k)v(\epsilon_k) = \lim_{k\to\infty} \rho(\epsilon_k)v(\epsilon_k) = \rho v$$

and the fact that $v$ is not the zero vector, it follows that $\rho$ is an eigenvalue of $A$ and so $\rho \le \rho(A)$. Hence it must be that $\rho = \rho(A)$. □
The parts of Theorem 10.2.3 that are not contained in Theorem 10.2.4 do
not carry over to all nonnegative matrices. They can, however, be extended
to irreducible nonnegative matrices, and this extension was carried out by
Frobenius.
DEFINITION 10.2.1. Let A be an n-by-n matrix. The graph of A is

$$G(A) = \{(i,j) : a_{ij} \ne 0\}.$$

The set $G(A)$ can be visualized as follows. For each integer $i = 1, \ldots, n$, draw a vertex, and for each pair $(i,j) \in G(A)$, draw a directed edge from vertex $i$ to vertex $j$. This is illustrated in Figure 10.3.

FIG. 10.3. Graph of a matrix.

DEFINITION 10.2.2. An n-by-n matrix A is called irreducible if every vertex in the graph of A is connected to every other vertex through a chain of edges. Otherwise, A is called reducible.
The matrix A is reducible if and only if there is an ordering of the indices such that A takes the form

$$A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \tag{10.25}$$

where $A_{11}$ and $A_{22}$ are square blocks of dimension greater than or equal to 1. To see this, first suppose that A is of the form (10.25) for some ordering of the indices. Let $I_1$ be the set of row numbers of the entries of $A_{11}$ and let $I_2$ be the set of row numbers of the entries of $A_{22}$. If $j \in I_2$ is connected to $i \in I_1$, then somewhere in the path from $j$ to $i$ there must be an edge connecting an element of $I_2$ to an element of $I_1$, but this would correspond to a nonzero entry in the (2,1) block of (10.25). Conversely, if A is reducible, then there must be indices $j$ and $i$ such that $j$ is not connected to $i$. Let $I_1 = \{k : k \text{ is connected to } i\}$ and let $I_2$ consist of the remaining indices. The sets $I_1$ and $I_2$ are nonempty, since $i \in I_1$ and $j \in I_2$. Enumerate first $I_1$, then $I_2$. If an entry in $I_2$ were connected to any entry in $I_1$, it would be connected to $i$, which is a contradiction. Therefore, the (2,1) block in the representation of A using this ordering would have to be 0, as in (10.25). The matrix on the left in Figure 10.3 is irreducible, while that on the right is reducible.
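The graph characterization gives a direct test for irreducibility: A is irreducible exactly when every vertex can reach every other vertex along directed edges of $G(A)$. A minimal sketch, assuming a dense NumPy array and running a breadth-first search from each vertex (cubic cost overall, adequate only for illustration); the function name is hypothetical.

```python
import numpy as np
from collections import deque

def is_irreducible(A):
    """True if every vertex of the graph G(A) reaches every other vertex."""
    n = A.shape[0]
    adj = [np.flatnonzero(A[i]) for i in range(n)]   # edge i -> j iff a_ij != 0
    for start in range(n):
        seen = np.zeros(n, dtype=bool)
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
        if not seen.all():
            return False
    return True

# Tridiagonal matrix: each vertex reaches its neighbors, hence everything.
T = np.diag([2.0] * 4) + np.diag([-1.0] * 3, 1) + np.diag([-1.0] * 3, -1)
# Block upper triangular as in (10.25): the (2,1) block is zero.
R = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [0.0, 0.0, 7.0]])
print(is_irreducible(T), is_irreducible(R))   # True False
```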
THEOREM 10.2.5 (Perron-Frobenius). Let A be an n-by-n real matrix and suppose that A is irreducible and nonnegative. Then
(a) $\rho(A) > 0$;
(b) $\rho(A)$ is a simple eigenvalue of $A$;
(c) if $A$ has exactly $k$ eigenvalues of maximum modulus $\rho(A)$, then these eigenvalues are the $k$th roots of unity times $\rho(A)$: $\lambda_j = e^{2\pi i j/k}\rho(A)$, $j = 0, 1, \ldots, k-1$; and
(d) there is a vector $v$ with $v > 0$ such that $Av = \rho(A)v$.

10.3. Comparison of Regular Splittings.


We now use the Perron-Frobenius theorem to compare "regular splittings"
when the coefficient matrix A is "inverse-positive." The main results of this
section (Theorem 10.3.1 and Corollaries) are due to Varga [135].
DEFINITION 10.3.1. For n-by-n real matrices A, M, and N, the splitting $A = M - N$ is a regular splitting if $M$ is nonsingular with $M^{-1} \ge 0$ and $M \ge A$.
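THEOREM 10.3.1. Let $A = M - N$ be a regular splitting of $A$, where $A^{-1} \ge 0$. Then

$$\rho(M^{-1}N) = \frac{\rho(A^{-1}N)}{1 + \rho(A^{-1}N)} < 1.$$

Proof. Since $M^{-1}A = I - M^{-1}N$ is nonsingular, it follows that $M^{-1}N$ cannot have an eigenvalue equal to 1. Since $M^{-1}N \ge 0$, this, combined with Theorem 10.2.4, shows that $\rho(M^{-1}N)$ cannot be 1. It also follows from Theorem 10.2.4 that there is a vector $v \ge 0$, $v \ne 0$, such that $M^{-1}Nv = \rho(M^{-1}N)v$. Now we can also write

$$A^{-1}N = (M - N)^{-1}N = (I - M^{-1}N)^{-1}M^{-1}N.$$

So

$$A^{-1}Nv = \frac{\rho(M^{-1}N)}{1 - \rho(M^{-1}N)}\,v. \tag{10.26}$$

If $\rho(M^{-1}N) > 1$, then this would imply that $A^{-1}Nv$ has negative components, which is impossible since $A^{-1} \ge 0$, $N \ge 0$, and $v \ge 0$. This proves that $\rho(M^{-1}N) < 1$. It also follows from (10.26) that $\rho(M^{-1}N)/(1 - \rho(M^{-1}N))$ is an eigenvalue of $A^{-1}N$, so we have

$$\rho(A^{-1}N) \ge \frac{\rho(M^{-1}N)}{1 - \rho(M^{-1}N)},$$

or equivalently, since $1 - \rho(M^{-1}N) > 0$,

$$\rho(M^{-1}N) \le \frac{\rho(A^{-1}N)}{1 + \rho(A^{-1}N)}. \tag{10.27}$$

Now, we also have $A^{-1}N \ge 0$, from which it follows by Theorem 10.2.4 that there is a vector $w \ge 0$, $w \ne 0$, such that $A^{-1}Nw = \rho(A^{-1}N)w$. Using the relation

$$M^{-1}N = (A + N)^{-1}N = (I + A^{-1}N)^{-1}A^{-1}N,$$

we can write

$$M^{-1}Nw = \frac{\rho(A^{-1}N)}{1 + \rho(A^{-1}N)}\,w,$$

so $\rho(A^{-1}N)/(1 + \rho(A^{-1}N))$ is an eigenvalue of $M^{-1}N$. It follows that

$$\rho(M^{-1}N) \ge \frac{\rho(A^{-1}N)}{1 + \rho(A^{-1}N)},$$

and combining this with (10.27), the theorem is proved. □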


From Theorem 10.3.1 and the fact that $x/(1 + x)$ is an increasing function of $x \ge 0$, the following corollary is obtained.
COROLLARY 10.3.1. Let $A = M_1 - N_1 = M_2 - N_2$ be two regular splittings of $A$, where $A^{-1} \ge 0$. If $N_1 \le N_2$, then

$$\rho(M_1^{-1}N_1) \le \rho(M_2^{-1}N_2) < 1.$$

With the slightly stronger assumption that $A^{-1} > 0$, the inequalities in Corollary 10.3.1 can be replaced by strict inequalities.
COROLLARY 10.3.2. Let $A = M_1 - N_1 = M_2 - N_2$ be two regular splittings of $A$, where $A^{-1} > 0$. If $N_1 \le N_2$ and neither $N_1$ nor $N_2 - N_1$ is the null matrix, then

$$0 < \rho(M_1^{-1}N_1) < \rho(M_2^{-1}N_2) < 1.$$
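Theorem 10.3.1 and Corollary 10.3.2 are easy to check on a small M-matrix. The sketch below uses a test setup of our own choosing (an assumption): the matrix tridiag(-1, 2, -1), whose inverse is positive, with the Jacobi splitting $M = \mathrm{diag}(A)$ and the Gauss-Seidel splitting $M$ = lower triangle of $A$. Both are regular, and the Gauss-Seidel $N$ is entrywise no larger than the Jacobi $N$.

```python
import numpy as np

def rho(C):
    return np.max(np.abs(np.linalg.eigvals(C)))

n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # M-matrix with A^{-1} > 0
Ainv = np.linalg.inv(A)

splittings = {
    "Jacobi": np.diag(np.diag(A)),        # M = diag(A)
    "Gauss-Seidel": np.tril(A),           # M = lower triangle of A
}
radii = {}
for name, M in splittings.items():
    N = M - A                             # A = M - N
    # regular splitting: M^{-1} >= 0 and N >= 0
    assert np.all(np.linalg.inv(M) >= -1e-14) and np.all(N >= 0)
    r_AN = rho(Ainv @ N)
    radii[name] = rho(np.linalg.solve(M, N))
    # Theorem 10.3.1: rho(M^{-1}N) = rho(A^{-1}N) / (1 + rho(A^{-1}N))
    print(name, radii[name], r_AN / (1 + r_AN))

# Corollary 10.3.2: N_GS <= N_J, so Gauss-Seidel has the smaller spectral radius
print(radii["Gauss-Seidel"] < radii["Jacobi"] < 1)
```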

It may not be easy to determine if the inverse of a coefficient matrix A is nonnegative or positive, which are the conditions required in Corollaries 10.3.1 and 10.3.2. In [81, pp. 114-115], a number of equivalent criteria are established. We state a few of these here.
DEFINITION 10.3.2. An n-by-n matrix A is called an M-matrix if
(i) $a_{ii} > 0$, $i = 1, \ldots, n$,
(ii) $a_{ij} \le 0$, $i, j = 1, \ldots, n$, $j \ne i$, and
(iii) $A$ is nonsingular and $A^{-1} \ge 0$.
The name "M-matrix" was introduced by Ostrowski in 1937 as an abbreviation
for "Minkowskische Determinante."
THEOREM 10.3.2 (see [81]). Let A be a real n-by-n matrix with nonpositive off-diagonal entries. The following statements are equivalent:

1. A is an M-matrix.

2. A is nonsingular and $A^{-1} \ge 0$. (Note that condition (i) in the definition of an M-matrix is not necessary. It is implied by the other two conditions.)

3. All eigenvalues of A have positive real part. (A matrix A with this property is called positive stable, whether or not its off-diagonal entries are nonpositive.)

4. Every real eigenvalue of A is positive.

5. All principal submatrices of A are M-matrices.

6. A can be factored in the form $A = LU$, where $L$ is lower triangular, $U$ is upper triangular, and all diagonal entries of each are positive.

7. The diagonal entries of A are positive, and $AD$ is strictly row diagonally dominant for some positive diagonal matrix $D$.

It was noted in section 9.2 that under assumption (9.35) the coefficient matrix (9.31) arising from the transport equation has positive diagonal entries and nonpositive off-diagonal entries. It was also noted that the matrix is weakly row diagonally dominant. It is only weakly diagonally dominant because the off-diagonal entries of the rows in the last block sum to 1. If we assume, however, that $\gamma = \sup_x \sigma_s(x)/\sigma_t(x) < 1$, then the other rows are strictly diagonally dominant. If the last block column is multiplied by a number greater than 1 but less than $\gamma^{-1}$, then the resulting matrix will be strictly row diagonally dominant. Thus this matrix satisfies criterion (7) of Theorem 10.3.2, and therefore it is an M-matrix. The block Gauss-Seidel splitting described in section 9.2 is a regular splitting, so by Theorem 10.3.1 iteration (9.34) converges. Additionally, if the initial error has all components of one sign, then the same holds for the error at each successive step, since the iteration matrix $I - M^{-1}A = M^{-1}N$ has nonnegative entries. This property is often important when subsequent computations with the approximate solution vector expect a nonnegative vector because the physical flux is nonnegative.
In the case of real symmetric matrices, criterion (3) (or (4)) of Theorem
10.3.2 implies that a positive definite matrix with nonpositive off-diagonal
entries is an M-matrix. We provide a proof of this part.
DEFINITION 10.3.3. A real matrix A is a Stieltjes matrix if A is symmetric
positive definite and the off-diagonal entries of A are nonpositive.
THEOREM 10.3.3. Any Stieltjes matrix is an M-matrix.
Proof. Let A be a Stieltjes matrix. The diagonal elements of A are positive because A is positive definite, so we need only verify that $A^{-1} \ge 0$. Write $A = D - C$, where $D = \mathrm{diag}(A)$ is positive and $C$ is nonnegative. Since A is positive definite, it is nonsingular, and $A^{-1} = [D(I - B)]^{-1} = (I - B)^{-1}D^{-1}$, where $B = D^{-1}C$. If $\rho(B) < 1$, then the inverse of $I - B$ is given by the Neumann series

$$(I - B)^{-1} = I + B + B^2 + \cdots,$$

and since $B \ge 0$ it would follow that $(I - B)^{-1} \ge 0$ and, hence, $A^{-1} \ge 0$. Thus, we need only show that $\rho(B) < 1$.
Suppose $\rho(B) \ge 1$. Since $B \ge 0$, it follows from Theorem 10.2.4 that $\rho(B)$ is an eigenvalue of $B$. But then $D^{-1}A = I - B$ must have a nonpositive eigenvalue, $1 - \rho(B)$. This matrix is similar to the symmetric positive definite matrix $D^{-1/2}AD^{-1/2}$, so we have a contradiction. Thus $\rho(B) < 1$. □
The matrix arising from the diffusion equation defined in (9.6-9.8) is a
Stieltjes matrix and, hence, an M-matrix.
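The proof of Theorem 10.3.3 can be traced numerically. The sketch below makes two assumptions of its own: it uses the 5-point model matrix on a small square grid (Dirichlet boundary, scaled by $h^2$, built with Kronecker products), and it truncates the Neumann series after an arbitrary number of terms. It checks that $\rho(B) < 1$ for $B = D^{-1}C$ and that $A^{-1}$ is entrywise nonnegative (here even positive, since the grid graph is connected).

```python
import numpy as np

m = 6
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A = np.kron(np.eye(m), T) + np.kron(T, np.eye(m))    # 5-point Laplacian (times h^2)

# Stieltjes: symmetric positive definite with nonpositive off-diagonal entries
off = A - np.diag(np.diag(A))
assert np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) > 0) and np.all(off <= 0)

D = np.diag(np.diag(A))
C = D - A                                  # A = D - C with C >= 0
B = np.linalg.solve(D, C)                  # B = D^{-1} C >= 0
rho_B = np.max(np.abs(np.linalg.eigvals(B)))
print(rho_B)                               # < 1, as the proof shows

# Neumann series (I - B)^{-1} = I + B + B^2 + ..., truncated after K terms
K = 500
S, P = np.eye(m * m), np.eye(m * m)
for _ in range(K):
    P = P @ B
    S += P
A_inv_series = S @ np.linalg.inv(D)        # A^{-1} = (I - B)^{-1} D^{-1}
print(np.abs(A_inv_series - np.linalg.inv(A)).max())
print(np.linalg.inv(A).min() > 0)
```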
It follows from Corollary 10.3.1 that if A is an M-matrix then the asymptotic convergence rate of the Gauss-Seidel iteration is at least as good as that of Jacobi's method. In this case, both methods employ regular splittings, and the lower triangle of A, used in the Gauss-Seidel iteration, is closer (elementwise) to A than the diagonal of A used in the Jacobi iteration. If the matrix A is also inverse positive, $A^{-1} > 0$, then Corollary 10.3.2 implies that the asymptotic convergence rate of the Gauss-Seidel iteration is strictly better than that of Jacobi's method. (A stronger relation was proved in Corollary 10.1.1, but this was only for matrices satisfying (10.9).) Moreover, among all diagonal matrices $M$ whose diagonal entries are greater than or equal to those of A, Corollary 10.3.2 implies that the Jacobi splitting $M = \mathrm{diag}(A)$ is the best. Similarly, when considering regular splittings in which the matrix $M$ is restricted to have a certain sparsity pattern (e.g., banded with a fixed bandwidth), Corollary 10.3.2 implies that the best choice of $M$, as far as asymptotic convergence rate of the simple iteration method is concerned, is to take the variable entries of $M$ to be equal to the corresponding entries of A.
Corollaries 10.3.1 and 10.3.2 confirm one's intuition, in the special case of regular splittings of inverse-nonnegative or inverse-positive matrices, that the closer the preconditioner $M$ is to the coefficient matrix A, the better the convergence of the preconditioned simple iteration (at least asymptotically). Of course, many regular splittings cannot be compared using these theorems because certain entries of one splitting are closer to those of A while different entries of the other are closer. Also, many of the best splittings are not regular splittings, so these theorems do not apply. The SOR splitting is not a regular splitting for an M-matrix if $\omega > 1$, since the diagonal entries $(1 - \omega)a_{ii}/\omega$ of its matrix $N$ are then negative.

10.4. Regular Splittings Used with the CG Algorithm.


For Hermitian positive definite systems, the A-norm of the error in the PCG algorithm (which is the $L^{-1}AL^{-H}$-norm of the error for the modified linear system (8.2)) and the 2-norm of $L^{-1}$ times the residual in the PMINRES algorithm can be bounded in terms of the square root of the condition number of the preconditioned matrix using (3.8) and (3.12), where $M = LL^H$. Hence, in measuring the effect of a preconditioner, we will be concerned not with the spectral radius of $I - M^{-1}A$ but with the condition number of $L^{-1}AL^{-H}$ (or, equivalently, with the ratio of largest to smallest eigenvalue of $M^{-1}A$).
With slight modifications, Corollaries 10.3.1 and 10.3.2 can also be used to compare condition numbers of PCG or PMINRES iteration matrices when $A$, $M_1$, and $M_2$ are real symmetric and positive definite. The modifications are needed because, unlike simple iteration, the PCG and PMINRES algorithms are insensitive to scalar multiples in the preconditioner; that is, the approximations generated by these algorithms with preconditioner $M$ are the same as those generated with preconditioner $cM$ for any $c > 0$ (Exercise 10.3).
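The scale invariance just mentioned (Exercise 10.3) is easy to observe numerically. Below is a minimal textbook-style PCG sketch (our own implementation with a fixed number of steps and a random SPD test matrix, not code from this book) run with preconditioners $M$ and $10M$; the iterates agree to rounding error.

```python
import numpy as np

def pcg(A, b, M, iters):
    """Preconditioned CG with a fixed number of steps; returns all iterates."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = np.linalg.solve(M, r)
    p = z.copy()
    rz = r @ z
    xs = [x.copy()]
    for _ in range(iters):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = np.linalg.solve(M, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
        xs.append(x.copy())
    return xs

rng = np.random.default_rng(2)
n = 30
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)               # symmetric positive definite test matrix
b = rng.standard_normal(n)
M = np.diag(np.diag(A))                   # Jacobi preconditioner

xs1 = pcg(A, b, M, 10)
xs2 = pcg(A, b, 10.0 * M, 10)             # same preconditioner scaled by c = 10
print(max(np.linalg.norm(u - v) for u, v in zip(xs1, xs2)))
```

Replacing $M$ by $cM$ divides each $z_k$ and search direction by $c$ and multiplies each step length by $c$, so $x_k$ is unchanged in exact arithmetic.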
THEOREM 10.4.1. Let $A$, $M_1$, and $M_2$ be symmetric, positive definite matrices satisfying the hypotheses of Corollary 10.3.1, and suppose that the largest eigenvalue of $M_2^{-1}A$ is greater than or equal to 1. Then the ratios of largest to smallest eigenvalues of $M_1^{-1}A$ and $M_2^{-1}A$ satisfy

$$\frac{\lambda_{\max}(M_1^{-1}A)}{\lambda_{\min}(M_1^{-1}A)} \le 2\,\frac{\lambda_{\max}(M_2^{-1}A)}{\lambda_{\min}(M_2^{-1}A)}.$$

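Proof. Since the elements of $M_1^{-1}N_1$ and $M_2^{-1}N_2$ are nonnegative, it follows from Theorem 10.2.4 that the spectral radius of each is equal to its (algebraically) largest eigenvalue (these eigenvalues are real, since $M_i^{-1}N_i$ is similar to the symmetric matrix $M_i^{-1/2}N_iM_i^{-1/2}$):

$$\rho(M_i^{-1}N_i) = \lambda_{\max}(M_i^{-1}N_i) = 1 - \lambda_{\min}(M_i^{-1}A), \qquad i = 1, 2.$$

The result from Corollary 10.3.1 implies that

$$1 - \lambda_{\min}(M_1^{-1}A) \le 1 - \lambda_{\min}(M_2^{-1}A) < 1, \qquad \lambda_{\max}(M_1^{-1}A) - 1 \le \rho(M_1^{-1}N_1) \le 1 - \lambda_{\min}(M_2^{-1}A),$$

where the second chain uses the fact that $1 - \lambda_{\max}(M_1^{-1}A)$ is an eigenvalue of $M_1^{-1}N_1$; or, equivalently,

$$\lambda_{\min}(M_1^{-1}A) \ge \lambda_{\min}(M_2^{-1}A) > 0, \qquad \lambda_{\max}(M_1^{-1}A) \le 2 - \lambda_{\min}(M_2^{-1}A).$$

Dividing the second inequality by the first gives

$$\frac{\lambda_{\max}(M_1^{-1}A)}{\lambda_{\min}(M_1^{-1}A)} \le \frac{2 - \lambda_{\min}(M_2^{-1}A)}{\lambda_{\min}(M_2^{-1}A)} = \frac{\lambda_{\max}(M_2^{-1}A)}{\lambda_{\min}(M_2^{-1}A)} \cdot \frac{2 - \lambda_{\min}(M_2^{-1}A)}{\lambda_{\max}(M_2^{-1}A)}.$$

Since, by assumption, $\lambda_{\max}(M_2^{-1}A) \ge 1$, and since $\rho(M_2^{-1}N_2) < 1$ implies that $\lambda_{\min}(M_2^{-1}A) > 0$, the second factor on the right-hand side is less than 2, and the theorem is proved. □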
THEOREM 10.4.2. The assumption in Theorem 10.4.1 that the largest eigenvalue of $M_2^{-1}A$ is greater than or equal to 1 is satisfied if $A$ and $M_2$ have at least one diagonal element in common.
Proof. If $A$ and $M_2$ have a diagonal element in common, then the symmetric matrix $N_2$ has a zero diagonal element. This implies that $M_2^{-1}N_2$ has a nonpositive eigenvalue since the smallest eigenvalue of this matrix satisfies

$$\lambda_{\min}(M_2^{-1}N_2) = \min_{y \ne 0}\frac{y^T N_2 y}{y^T M_2 y} \le \frac{\xi_j^T N_2 \xi_j}{\xi_j^T M_2 \xi_j} = 0$$

if $\xi_j$ is the vector with a 1 in the position of this zero diagonal element and 0's elsewhere. Therefore, $M_2^{-1}A = I - M_2^{-1}N_2$ has an eigenvalue greater than or equal to 1. □
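Theorems 10.4.1 and 10.4.2 can be illustrated with two symmetric regular splittings of the 2D model matrix (an illustrative choice of ours, not from the text): $M_2 = \mathrm{diag}(A)$, which shares every diagonal element with $A$, and $M_1$ = the block-diagonal (tridiagonal) part of $A$, which is elementwise closer to $A$, so $N_1 \le N_2$.

```python
import numpy as np

def eig_ratio(M, A):
    """lambda_max / lambda_min of M^{-1}A (real and positive for SPD M and A)."""
    lam = np.linalg.eigvals(np.linalg.solve(M, A)).real
    return lam.max() / lam.min(), lam.max()

m = 8
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
I = np.eye(m)
A = np.kron(I, T) + np.kron(T, I)          # block tridiagonal, blocks of order m

M1 = np.kron(I, T + 2 * I)                 # block-diagonal (tridiagonal) part of A
M2 = np.diag(np.diag(A))                   # diagonal part of A
N1, N2 = M1 - A, M2 - A
assert np.all(N1 >= 0) and np.all(N2 >= N1)   # N_1 <= N_2

k1, lmax1 = eig_ratio(M1, A)
k2, lmax2 = eig_ratio(M2, A)
print(k1, k2, lmax2 >= 1)                  # Theorem 10.4.1 guarantees k1 <= 2 * k2
```

For this particular pair the closer preconditioner $M_1$ roughly halves the eigenvalue ratio; the theorem by itself only guarantees that it is not worse by more than a factor of 2.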
Theorems 10.4.1 and 10.4.2 show that once a pair of regular splittings have been scaled properly for comparison (that is, $M_2$ has been multiplied by a constant, if necessary, so that $A$ and $M_2$ have at least one diagonal element in common), the one that is closer to $A$ elementwise gives a smaller condition number for the PCG or PMINRES iteration matrix (except possibly for a factor of 2). This means that the Chebyshev bound (3.8) on the error at each step will be smaller (or, at worst, only slightly larger) for the closer preconditioner. Other properties, however, such as tight clustering of most of the eigenvalues, also affect the convergence rate of PCG and PMINRES. Unfortunately, it would be difficult to provide general comparison theorems based on all of these factors, so the condition number is generally used for this purpose.

10.5. Optimal Diagonal and Block Diagonal Preconditioners.


Aside from regular splittings, about the only class of preconditioners among
which an optimal or near optimal preconditioner is known is the class of
diagonal or block-diagonal preconditioners. If "optimality" is defined in terms
of the symmetrically preconditioned matrix having a small condition number,
then the (block) diagonal of a Hermitian positive definite matrix A is close to
the best (block) diagonal preconditioner.
Recall the definition of Property A from section 10.1. A matrix with this
property is also said to be 2-cyclic. Moreover, we can make the following more
general definition.
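DEFINITION 10.5.1. A matrix A is block 2-cyclic if it can be permuted into the form

$$\begin{pmatrix} D_1 & B_1 \\ B_2 & D_2 \end{pmatrix},$$

where $D_1$ and $D_2$ are block diagonal matrices

$$D_i = \mathrm{diag}(D_{i,1}, \ldots, D_{i,m_i}), \qquad i = 1, 2,$$

whose diagonal blocks $D_{i,j}$ are square matrices of order $n_{i,j}$.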
Forsythe and Straus [52] showed that for a Hermitian positive definite matrix A in 2-cyclic form, the optimal diagonal preconditioner is $M = \mathrm{diag}(A)$. Eisenstat, Lewis, and Schultz [41] later generalized this to cover matrices in block 2-cyclic form with block diagonal preconditioners. They showed that if each block $D_{i,j}$ is the identity, then A is optimally scaled with respect to all block diagonal matrices with blocks of order $n_{i,j}$. The following slightly stronger result is due to Elsner [42].
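THEOREM 10.5.1 (Elsner). If a Hermitian positive definite matrix A has the form

$$A = \begin{pmatrix} I & C \\ C^H & I \end{pmatrix}, \tag{10.29}$$

then for any nonsingular D of the form

$$D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}, \tag{10.30}$$

where $D_1$ and $D_2$ have the same orders as the corresponding identity blocks of A, we have

$$\kappa(A) \le \kappa(D^H A D).$$

Proof. If $\lambda$ is an eigenvalue of A with an eigenvector whose first block is $v$ and whose second block is $w$ (which we will denote as $(v; w)$), then it follows from (10.29) that

$$v + Cw = \lambda v, \qquad C^H v + w = \lambda w.$$

From this we conclude that $(v; -w)$ is an eigenvector of A with eigenvalue $2 - \lambda$, since

$$A\begin{pmatrix} v \\ -w \end{pmatrix} = \begin{pmatrix} v - Cw \\ C^H v - w \end{pmatrix} = \begin{pmatrix} (2-\lambda)v \\ -(2-\lambda)w \end{pmatrix}.$$

It follows that if $\lambda_n$ is the largest eigenvalue of A, then $\lambda_1 = 2 - \lambda_n$ is the smallest, and $\kappa(A) = \lambda_n/(2 - \lambda_n)$.
Let

$$S = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}.$$

Thus we have $SAS = 2I - A$, so $SA^{-1}SA = (2I - A)^{-1}A$ has eigenvalues $\lambda_j/(2 - \lambda_j)$, the largest of which is $\lambda_n/(2 - \lambda_n) = \kappa(A)$. For any nonsingular matrix D, we can therefore write

$$\kappa(A) = \rho(SA^{-1}SA) = \rho(D^{-1}SA^{-1}SAD) \le \|D^{-1}SA^{-1}SAD\|. \tag{10.31}$$

Now, if D is of the form (10.30), then S and D commute. Also, $\|S\| = 1$, so we have

$$\|D^{-1}SA^{-1}SAD\| = \|S\,D^{-1}A^{-1}D^{-H}\,S\,D^HAD\| \le \|(D^HAD)^{-1}\|\,\|D^HAD\| = \kappa(D^HAD). \tag{10.32}$$

Combining (10.31) and (10.32) gives the desired result. □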


Suppose A is not of the form (10.29) but can be permuted into that form, say, $A = P^T\hat{A}P$, where P is a permutation matrix and $\hat{A}$ is of the form (10.29). Then for any block-diagonal matrix D of the form (10.30), we can write

$$\kappa(D^HAD) = \kappa(D^HP^T\hat{A}PD) = \kappa\big((PD^HP^T)\,\hat{A}\,(PDP^T)\big) = \kappa(\tilde{D}^H\hat{A}\tilde{D}), \qquad \tilde{D} = PDP^T.$$

If the permutation is such that $PDP^T$ is a block-diagonal matrix of the form (10.30), then A, like $\hat{A}$, is optimally scaled among such block-diagonal matrices; if $\kappa(D^HAD)$ were less than $\kappa(A)$ for some D of the form (10.30), then $\kappa(\tilde{D}^H\hat{A}\tilde{D})$ would be less than $\kappa(\hat{A}) = \kappa(A)$, where $\tilde{D} = PDP^T$, which is a contradiction. In particular, if A has Property A then the optimal diagonal preconditioner is $M = \mathrm{diag}(A)$, since if D is diagonal then $PDP^T$ is diagonal for any permutation matrix P. If A is written in a block form, where the blocks can be permuted into block 2-cyclic form (without permuting entries from one block to another), then the optimal block diagonal preconditioner (with the same size blocks) is $M = \mathrm{block\,diag}(A)$.
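Theorem 10.5.1 can be spot-checked numerically. The sketch builds a real symmetric positive definite matrix of the form (10.29), taking $\|C\|_2 = 0.9 < 1$ to guarantee definiteness (the block sizes, seed, and scaling are arbitrary assumptions), and compares $\kappa(A)$ with $\kappa(D^TAD)$ for randomly drawn block-diagonal $D$ of the form (10.30). Random sampling only probes the theorem; it does not prove it.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 4, 3                                  # orders of the two identity blocks
C = rng.standard_normal((p, q))
C *= 0.9 / np.linalg.norm(C, 2)              # ||C||_2 = 0.9 < 1, so A is SPD
A = np.block([[np.eye(p), C], [C.T, np.eye(q)]])

def kappa(H):
    """2-norm condition number; equals lambda_max/lambda_min for SPD H."""
    return np.linalg.cond(H)

lam = np.linalg.eigvalsh(A)                  # ascending order
print(lam + lam[::-1])                       # eigenvalues pair up as lambda, 2 - lambda
print(kappa(A))                              # (1 + 0.9) / (1 - 0.9) = 19

worst = np.inf
for _ in range(1000):
    D1 = rng.standard_normal((p, p))         # arbitrary (almost surely nonsingular) blocks
    D2 = rng.standard_normal((q, q))
    D = np.block([[D1, np.zeros((p, q))], [np.zeros((q, p)), D2]])
    worst = min(worst, kappa(D.T @ A @ D))
print(worst >= kappa(A))                     # no block scaling beats D = I
```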
Theorem 10.5.1 implies that for Hermitian positive definite block 2-cyclic matrices, the block diagonal of the matrix is the best block-diagonal preconditioner (in terms of minimizing the condition number of the preconditioned matrix). For arbitrary Hermitian positive definite matrices, the block diagonal of the matrix is almost optimal. The following theorem of van der Sluis [131] deals with ordinary diagonal preconditioners, while the next theorem, due to Demmel [31] (and stated here without proof), deals with the block case.
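THEOREM 10.5.2 (van der Sluis). If a Hermitian positive definite matrix A has all diagonal elements equal, then

$$\kappa(A) \le m \min_{D \in \mathcal{D}} \kappa(DAD),$$

where $\mathcal{D} = \{\text{positive definite diagonal matrices}\}$ and $m$ is the maximum number of nonzeros in any row of A.
Proof. Write $A = U^HU$, where U is upper triangular. Since A has equal diagonal elements, say, 1, each column of U has norm 1. Also, each off-diagonal entry of A has absolute value less than or equal to 1, since $|a_{ij}| = |u_i^Hu_j| \le \|u_i\|\,\|u_j\| \le 1$, where $u_i$ and $u_j$ are the $i$th and $j$th columns of U. Additionally, it follows from Gerschgorin's theorem that

$$\lambda_{\max}(A) \le \max_i \sum_j |a_{ij}| \le m. \tag{10.33}$$

For any nonsingular matrix D we can write

$$\kappa(D^HAD) = \|UD\|^2\,\|(UD)^{-1}\|^2 \ge \frac{\|UD\|^2\,\|U^{-1}\|^2}{\|D\|^2},$$

since $\|U^{-1}\| = \|D(UD)^{-1}\| \le \|D\|\,\|(UD)^{-1}\|$. Now $\|U^{-1}\|^2 = 1/\lambda_{\min}(A) \ge \kappa(A)/m$ by (10.33), so we have

$$\kappa(D^HAD) \ge \frac{\|UD\|^2}{\|D\|^2}\cdot\frac{\kappa(A)}{m}. \tag{10.34}$$

(Note that we have not yet made any assumption about the matrix D. The result holds for any nonsingular matrix D such that $\|UD\| \ge \|D\|$.)
Now assume that D is a positive definite diagonal matrix with largest entry $d_{jj}$. Let $\xi_j$ be the $j$th unit vector. Then

$$\|UD\| \ge \|UD\xi_j\| = d_{jj}\|u_j\| = d_{jj} = \|D\|. \tag{10.35}$$

Combining (10.34) and (10.35) gives the desired result. □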
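The bound of Theorem 10.5.2 can be probed by random trials. In the sketch below (an illustrative setup of ours, not from the text) $A$ is the matrix tridiag(-1, 2, -1) scaled to unit diagonal, so $m = 3$; we sample positive diagonal matrices $D$ and check $\kappa(A) \le m\,\kappa(DAD)$ on every draw. Sampling only probes the minimum over $\mathcal{D}$; it does not compute it.

```python
import numpy as np

n = 20
A = np.eye(n) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))   # unit diagonal, SPD
m = int(np.max(np.count_nonzero(A, axis=1)))               # max nonzeros per row: 3

def kappa(H):
    return np.linalg.cond(H)               # lambda_max / lambda_min for SPD H

rng = np.random.default_rng(4)
ratios = []
for _ in range(500):
    d = np.exp(rng.uniform(-3, 3, n))      # positive diagonal entries
    DAD = d[:, None] * A * d[None, :]      # D A D without forming D
    ratios.append(kappa(A) / kappa(DAD))
print(m, max(ratios))                      # Theorem 10.5.2: never exceeds m
```

Because this tridiagonal $A$ also has Property A and unit diagonal, Theorem 10.5.1 says the ratio in fact never exceeds 1 here; the factor $m$ matters for matrices that are not 2-cyclic.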


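THEOREM 10.5.3 (Demmel). If a Hermitian positive definite matrix A has all diagonal blocks equal to the identity, say

$$A = \begin{pmatrix} I & A_{12} & \cdots & A_{1m} \\ A_{12}^H & I & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1m}^H & A_{2m}^H & \cdots & I \end{pmatrix},$$

with the $i$th identity block of order $n_i$, then

$$\kappa(A) \le m \min_{D \in \mathcal{D}_B} \kappa(D^HAD),$$

where $\mathcal{D}_B = \{\text{nonsingular block-diagonal matrices with blocks of order } n_1, \ldots, n_m\}$, and $m$ is the number of diagonal blocks in A.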
As an example, consider the matrix defined in (9.6-9.8) arising from a 5-point finite difference approximation to the diffusion equation. This matrix is block tridiagonal with $n_y$ diagonal blocks, each of order $n_x$. Of all block diagonal preconditioners $D$ with blocks of order $n_x$, the optimal one for minimizing the condition number of the symmetrically preconditioned matrix $D^{-1/2}AD^{-1/2}$, or the ratio of largest to smallest eigenvalue of $D^{-1}A$, is the block-diagonal part of A,

$$D = \mathrm{block\,diag}(A),$$

whose $n_y$ diagonal blocks are the (tridiagonal) diagonal blocks of A.
It follows from Theorem 10.5.3 that this matrix $D$ is within a factor of $n_y$ of being optimal, but it follows from Theorem 10.5.1 that $D$ is actually optimal because the blocks of A can be permuted into block 2-cyclic form.
These theorems on block-diagonal preconditioners establish just what one might expect: the best (or almost best) block-diagonal preconditioner M has all of its block-diagonal elements equal to the corresponding elements of A. Unfortunately, such results do not hold for matrices M with other sparsity patterns. For example, suppose one considers tridiagonal preconditioners M for the matrix in (9.6-9.8) or even for the simpler 5-point approximation to the negative Laplacian. The tridiagonal part of this matrix is (omitting the factor $1/h^2$)

$$\begin{pmatrix} T & & \\ & \ddots & \\ & & T \end{pmatrix}, \qquad T = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 4 \end{pmatrix},$$

with $n_y$ diagonal blocks T, each of order $n_x$.
The block-diagonal part of A is a tridiagonal matrix, and it is the optimal block-diagonal preconditioner for A, but it is not the optimal tridiagonal preconditioner. By replacing the zeros in A between the diagonal blocks with certain nonzero entries, one can obtain a better preconditioner. (To obtain as much as a factor of 2 improvement in the condition number, however, at least some of the replacement entries must be negative, since otherwise this would be a regular splitting and Theorem 10.4.1 would apply.) The optimal tridiagonal preconditioner for the 5-point Laplacian is not known analytically.
Based on the results of this section, it is reasonable to say that one should always (well, almost always) use at least the diagonal of a positive definite matrix as a preconditioner with the CG or MINRES algorithm. Sometimes matrices that arise in practice have diagonal entries that vary over many orders of magnitude. For example, a finite difference or finite element matrix arising from the diffusion equation (9.1-9.2) will have widely varying diagonal entries if the diffusion coefficient $a(x, y)$ varies over orders of magnitude. The eigenvalues of the matrix will likewise vary over orders of magnitude, although a simple diagonal scaling would greatly reduce the condition number. For such problems it is extremely important to scale the matrix by its diagonal or, equivalently, to use a diagonal preconditioner (or some more sophisticated preconditioner that implicitly incorporates diagonal scaling). The extra work required for diagonal preconditioning is minimal. Of course, for the model problem for Poisson's equation, the diagonal of the matrix is a multiple of the identity, so unpreconditioned CG and diagonally scaled CG are identical. The arguments for diagonal scaling might not apply if the unscaled matrix has special properties apart from the condition number that make it especially amenable to solution by CG or MINRES.
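The effect described above is easy to reproduce. The sketch assumes a one-dimensional stand-in for the diffusion problem (a hypothetical simplification of (9.1-9.2)) whose coefficient jumps by six orders of magnitude halfway across the interval, and compares the condition number of $A$ with that of $D^{-1/2}AD^{-1/2}$, $D = \mathrm{diag}(A)$.

```python
import numpy as np

n = 50
h = 1.0 / (n + 1)
# Coefficient at the n + 1 cell midpoints: 1 on the left half, 1e6 on the right
a = np.where(np.arange(n + 1) < (n + 1) // 2, 1.0, 1e6)

# 3-point discretization of -(a u')' on (0, 1) with u(0) = u(1) = 0
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

d = 1.0 / np.sqrt(np.diag(A))
A_scaled = d[:, None] * A * d[None, :]       # D^{-1/2} A D^{-1/2}, D = diag(A)

cond_A, cond_scaled = np.linalg.cond(A), np.linalg.cond(A_scaled)
print(f"cond(A)               = {cond_A:.2e}")
print(f"cond(D^-1/2 A D^-1/2) = {cond_scaled:.2e}")
```

Since this tridiagonal matrix has Property A, Theorem 10.5.1 guarantees that the diagonally scaled matrix has the smallest condition number among all diagonal scalings, including no scaling at all.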

Exercises.
10.1. Show that the SSOR preconditioner is of the form (10.6).

10.2. Prove the results in (10.2a-e).


10.3. Show that the iterates $x_k$ generated by the PCG algorithm with preconditioner $M$ are the same as those generated with preconditioner $cM$ for any $c > 0$.

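10.4. The multigroup transport equation can be written in the form

$$\Omega\cdot\nabla\psi_g(r,\Omega) + \sigma_g(r,\Omega)\,\psi_g(r,\Omega) - \sum_{g'=1}^{G}\int \sigma_{g,g'}(r,\Omega\cdot\Omega')\,\psi_{g'}(r,\Omega')\,d\Omega' = f_g(r,\Omega), \qquad g = 1, \ldots, G,$$

where $\psi_g(r,\Omega)$ is the unknown flux associated with energy group $g$ and $\sigma_g(r,\Omega)$, $\sigma_{g,g'}(r,\Omega\cdot\Omega')$, and $f_g(r,\Omega)$ are known cross section and source terms. (Appropriate boundary conditions are also given.) A standard method for solving this set of equations is to move the terms of the sum corresponding to different energy groups to the right-hand side and solve the resulting set of equations for $\psi_1, \ldots, \psi_G$ in increasing order of index, using the most recently updated quantities on the right-hand side; that is,

$$\Omega\cdot\nabla\psi_g^{(k+1)} + \sigma_g\psi_g^{(k+1)} - \int\sigma_{g,g}\,\psi_g^{(k+1)}\,d\Omega' = f_g + \sum_{g'<g}\int\sigma_{g,g'}\,\psi_{g'}^{(k+1)}\,d\Omega' + \sum_{g'>g}\int\sigma_{g,g'}\,\psi_{g'}^{(k)}\,d\Omega'.$$

Identify this procedure with one of the preconditioned iterative methods described in this chapter. How might it be accelerated?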
Chapter 11

Incomplete Decompositions

A number of matrix decompositions were described in section 1.3. These include the LU or Cholesky decomposition as well as the QR factorization. Each of these can be used to solve a linear system $Ax = b$. If the matrix A is sparse, however, the triangular factors L and U are usually much less sparse; the same is true of the unitary and upper triangular factors Q and R. For large sparse matrices, such as those arising from the discretization of partial differential equations, it is usually impractical to compute and work with these factors.
Instead, one might obtain an approximate factorization, say, $A \approx LU$, where L and U are sparse lower and upper triangular matrices, respectively. The product $M = LU$ then could be used as a preconditioner in an iterative method for solving $Ax = b$. In this chapter we discuss a number of such incomplete factorizations.

11.1. Incomplete Cholesky Decomposition.


Any Hermitian positive definite matrix A can be factored in the form $A = LL^H$, where L is a lower triangular matrix. This is called the Cholesky factorization. If A is a sparse matrix, however, such as the 5-point approximation to the diffusion equation defined in (9.6-9.8), then the lower triangular factor L is usually much less sparse than A. In this case, the entire band "fills in" during Gaussian elimination, and L has nonzeros throughout a band of width $n_x$ below the main diagonal. The amount of work to compute L is $O(n_x^2 \cdot n_x n_y) = O(n^2)$ if $n_x = n_y$ and $n = n_x n_y$. The work required to solve a linear system with coefficient matrix L is $O(n_x \cdot n_x n_y)$, or $O(n^{3/2})$.
One might obtain an approximate factorization of A by restricting the lower triangular matrix L to have a given sparsity pattern, say, the sparsity pattern of the lower triangle of A. The nonzeros of L then could be chosen so that the product $LL^H$ would match A in the positions where A has nonzeros, although, of course, $LL^H$ could not match A everywhere. An approximate factorization of this form is called an incomplete Cholesky decomposition. The matrix $M = LL^H$ then can be used as a preconditioner in an iterative method such as the PCG algorithm. To solve a linear system $Mz = r$, one first solves the lower triangular system $Ly = r$ and then solves the upper triangular system $L^Hz = y$.
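A minimal dense sketch of the zero-fill incomplete Cholesky factorization just described (often called IC(0)) follows. It assumes a real symmetric positive definite A for which the factorization does not break down, such as the 5-point model matrix; existence for M-matrices is the subject of the results below. Entries of L outside the lower-triangular pattern of A are never created. The function name is hypothetical, and a practical code would use a sparse data structure.

```python
import numpy as np

def ichol0(A):
    """Zero-fill incomplete Cholesky: L has the sparsity pattern of tril(A)."""
    n = A.shape[0]
    L = np.tril(A).astype(float)
    pattern = L != 0
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] /= L[k, k]
        # Update the trailing submatrix, but only inside the allowed pattern
        for j in range(k + 1, n):
            if not pattern[j, k]:
                continue
            for i in range(j, n):
                if pattern[i, k] and pattern[i, j]:
                    L[i, j] -= L[i, k] * L[j, k]
    return L

m = 5
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A = np.kron(np.eye(m), T) + np.kron(T, np.eye(m))   # 5-point Laplacian (M-matrix)
L = ichol0(A)
M = L @ L.T
on_pattern = A != 0
print(np.abs(M - A)[on_pattern].max())    # rounding level: LL^T matches A on its pattern
print(np.abs(M - A).max())                # nonzero: the dropped fill-in elsewhere

# Applying the preconditioner: solve M z = r via L y = r, then L^T z = y
r = np.ones(m * m)
y = np.linalg.solve(L, r)
z = np.linalg.solve(L.T, y)
```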
The same idea can also be applied to non-Hermitian matrices to obtain
an approximate LU factorization. The product M = LU of the incomplete
LU factors then can be used as a preconditioner in a non-Hermitian matrix
iteration such as GMRES, QMR, or BiCGSTAB. The idea of generating such
approximate factorizations has been discussed by a number of people, the
first of whom was Varga [136]. The idea became popular when it was used
by Meijerink and van der Vorst [99] to generate preconditioners for the CG
method and related iterations. It has proved a very successful technique in a
range of applications and is now widely used in large physics codes. The main
results of this section are from [99].
We will show that the incomplete LU decomposition exists if the coefficient
matrix A is an M-matrix. This result was generalized by Manteuffel [95] to
cover H-matrices with positive diagonal elements. The matrix A = [a_ij] is
an H-matrix if its comparison matrix (the matrix with diagonal entries |a_ii|,
i = 1,...,n, and off-diagonal entries -|a_ij|, i, j = 1,...,n, j ≠ i) is an
M-matrix. Any strictly diagonally dominant matrix is an H-matrix, regardless of
the signs of its entries.
In fact, this decomposition often exists even when A is not an H-matrix. It
is frequently applied to problems in which the coefficient matrix is not an
H-matrix, and entries are modified, when necessary, to make the decomposition
stable [87, 95].
The proof will use two results about M-matrices, one due to Fan [47] and
one due to Varga [135].
LEMMA 11.1.1 (Fan). If A = [a_ij] is an M-matrix, then A^(1) = [a_ij^(1)] is an
M-matrix, where A^(1) is the matrix that arises by eliminating the first column
of A using the first row.
LEMMA 11.1.2 (Varga). If A = [a_ij] is an M-matrix and the elements of
B = [b_ij] satisfy

    a_ii ≤ b_ii,   a_ij ≤ b_ij ≤ 0   for i ≠ j,

then B is also an M-matrix.
Proof. Write B = D_B(I - G), where D_B = diag(B) and G = I - D_B^{-1}B ≥ 0. We have

    B^{-1} = (I + G + G^2 + ...) D_B^{-1}   if ρ(G) < 1,

so it will follow that B^{-1} ≥ 0 and, therefore, that B is an M-matrix. To
see that ρ(G) < 1, note that if A is written in the form A = M - N, where
M = diag(A), then this is a regular splitting, so we have ρ(M^{-1}N) < 1. From
the assumptions on B, however, it follows that 0 ≤ G ≤ M^{-1}N, so from the
Perron-Frobenius theorem we have ρ(G) ≤ ρ(M^{-1}N) < 1.
Incomplete Decompositions 173

Lemma 11.1.2 also could be derived from (7) in Theorem 10.3.3.


Let P be a subset of the indices {(i, j) : j ≠ i, i, j = 1,...,n}. The indices
in the set P will be the ones forced to be 0 in our incomplete LU factorization.
The following theorem not only establishes the existence of the incomplete LU
factorization but also shows how to compute it.
THEOREM 11.1.1 (Meijerink and van der Vorst). If A = [a_ij] is an n-by-n
M-matrix, then for every subset P of off-diagonal indices there exists a lower
triangular matrix L = [l_ij] with unit diagonal and an upper triangular matrix
U = [u_ij] such that A = LU - R, where

    l_ij = 0 if (i, j) ∈ P,   u_ij = 0 if (i, j) ∈ P,   r_ij = 0 if (i, j) ∉ P.

The factors L and U are unique, and the splitting A = LU - R is a regular
splitting.
Proof. The proof proceeds by construction through n - 1 stages analogous
to the stages of Gaussian elimination. At the kth stage, first replace the entries
in the current coefficient matrix with indices (k, j) and (i, k) ∈ P by 0. Then
perform a Gaussian elimination step in the usual way: eliminate the entries in
rows k + 1 through n of column k by adding appropriate multiples of row k to
rows k + 1 through n. To make this precise, define the matrices

    A^(k) = [a_ij^(k)],   Ã^(k-1) = [ã_ij^(k-1)],   L^(k),   R^(k),   k = 1,...,n - 1,

by the relations

    A^(0) = A,   Ã^(k-1) = A^(k-1) + R^(k),   A^(k) = L^(k) Ã^(k-1),

where R^(k) is zero except in positions (k, j) ∈ P and in positions (i, k) ∈ P,
where r_kj^(k) = -a_kj^(k-1) and r_ik^(k) = -a_ik^(k-1). The lower triangular matrix
L^(k) is the identity, except for the kth column, which is

    (0, ..., 0, 1, -ã_{k+1,k}^(k-1)/ã_kk^(k-1), ..., -ã_nk^(k-1)/ã_kk^(k-1))^T.

From this it is easily seen that A^(k) is the matrix that arises from Ã^(k-1) by
eliminating elements in the kth column using row k, while Ã^(k-1) is obtained
from A^(k-1) by replacing entries in row or column k whose indices are in P by
0.
Now, A^(0) = A is an M-matrix, so R^(1) ≥ 0. From Lemma 11.1.2 it follows
that Ã^(0) is an M-matrix and, therefore, L^(1) ≥ 0. From Lemma 11.1.1 it
follows that A^(1) is an M-matrix. Continuing the argument in this fashion, we
can prove that A^(k) and Ã^(k-1) are M-matrices and L^(k) ≥ 0 and R^(k) ≥ 0 for
k = 1,...,n - 1. From the definitions it follows immediately that

    A^(k) = L^(k) A^(k-1) + L^(k) R^(k),   k = 1,...,n - 1.
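By combining these equations, and noting that L^(j) R^(k) = R^(k) for j < k
(row j of R^(k) is zero), we have

    A^(n-1) = L^(n-1) ··· L^(1) (A + R^(1) + ··· + R^(n-1)).

Let us now define U = A^(n-1), L = (∏_{j=1}^{n-1} L^(n-j))^{-1}, and R = Σ_{i=1}^{n-1} R^(i).
Then LU = A + R, (LU)^{-1} = (A^(n-1))^{-1} L^(n-1) ··· L^(1) ≥ 0, and R ≥ 0, so the
splitting A = LU - R is regular. The uniqueness of the factors L and U follows
from equating the elements of A and LU for (i, j) ∉ P and from the fact that L
has a unit diagonal.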
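The construction in the proof is also an algorithm. Below is a dense, unoptimized Python transcription of it, a sketch under our own assumptions (0-based indices, P given as a set of (i, j) pairs, function names ours): at stage k it zeroes the entries of row k and column k whose indices lie in P and then performs one elimination step. For the 5-point Laplacian on a 3-by-3 grid with P equal to the zero positions of A, one can check the conclusions of the theorem: R = LU - A vanishes outside P, is entrywise nonnegative, and is not identically zero.

```python
def laplacian_5pt(m):
    """Dense 5-point Laplacian (times h^2) on an m-by-m grid, natural ordering."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for p in range(m):
        for q in range(m):
            i = p * m + q
            A[i][i] = 4.0
            if q > 0:
                A[i][i - 1] = A[i - 1][i] = -1.0
            if p > 0:
                A[i][i - m] = A[i - m][i] = -1.0
    return A


def ilu_with_pattern(A, P):
    """Return (L, U) with A = L U - R and l_ij = u_ij = 0 for (i, j) in P.

    Indices are 0-based; A is assumed to be an M-matrix, so no pivoting is done.
    """
    n = len(A)
    W = [row[:] for row in A]          # current coefficient matrix
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        # Drop step: entries of row k and column k whose indices are in P become 0.
        for j in range(k + 1, n):
            if (k, j) in P:
                W[k][j] = 0.0
            if (j, k) in P:
                W[j][k] = 0.0
        # Elimination step: clear column k below the diagonal using row k.
        for i in range(k + 1, n):
            if W[i][k] != 0.0:
                mult = W[i][k] / W[k][k]
                L[i][k] = mult
                W[i][k] = 0.0
                for j in range(k + 1, n):
                    W[i][j] -= mult * W[k][j]
    return L, W                        # W now holds the upper triangular U


def remainder(A, L, U):
    """R = L U - A, so that A = L U - R."""
    n = len(A)
    return [[sum(L[i][k] * U[k][j] for k in range(n)) - A[i][j] for j in range(n)]
            for i in range(n)]


# Usage: P = positions where A is zero, i.e., keep the sparsity pattern of A.
A = laplacian_5pt(3)
n = len(A)
P = {(i, j) for i in range(n) for j in range(n) if i != j and A[i][j] == 0.0}
L, U = ilu_with_pattern(A, P)
R = remainder(A, L, U)
```

The dropped fill entries are exactly what accumulates in R; for this grid the first one appears at position (1, 3) with value 1/4.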
COROLLARY 11.1.1 (Meijerink and van der Vorst). If A is a symmetric
M-matrix, then for each subset P of the off-diagonal indices with the property
that (i, j) ∈ P implies (j, i) ∈ P, there exists a unique lower triangular matrix
L with l_ij = 0 if (i, j) ∈ P such that A = LL^T - R, where r_ij = 0 if (i, j) ∉ P.
The splitting A = LL^T - R is a regular splitting.
When A has the sparsity pattern of the 5-point approximation to the
diffusion equation (9.6-9.8), the incomplete Cholesky decomposition that
forces L to have the same sparsity pattern as the lower triangle of A is especially
simple. It is convenient to write the incomplete decomposition in the form
LDL^T, where D is a diagonal matrix. Let a denote the main diagonal of A,
b the first lower diagonal, and c the (m + 1)st lower diagonal, where m = n_x.
Let ã denote the main diagonal of L, b̃ the first lower diagonal, and c̃ the
(m + 1)st lower diagonal; let d denote the main diagonal of D. Then we have

    b̃ = b,   c̃ = c,   ã_i = a_i - b_{i-1}^2/ã_{i-1} - c_{i-m}^2/ã_{i-m},   d_i = 1/ã_i,   i = 1,...,n,

where elements not defined should be replaced by zeros. The product M = LDL^T
has an ith row whose nonzero entries are

    column:  i-m       i-m+1      i-1       i     i+1    i+m-1   i+m
    entry:   c_{i-m}   r_{i-m+1}  b_{i-1}   a_i   b_i    r_i     c_i

where r_i = b_{i-1}c_{i-1}/ã_{i-1}. Usually the off-diagonal entries b_{i-1} and c_{i-1}
are significantly smaller in absolute value than ã_{i-1} (for the model problem,
b_{i-1}c_{i-1}/a_i = 1/4) and are also significantly smaller in absolute value than a_i.
Thus, one expects the remainder matrix R in the splitting A = M - R to be
small in comparison to A or M.
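For the model problem (a_i = 4, b_i = c_i = -1, with b_{i-1} = 0 when node i starts a grid row) the recurrence ã_i = a_i - b_{i-1}^2/ã_{i-1} - c_{i-m}^2/ã_{i-m} takes only a few lines. The Python sketch below is our own illustration under those assumptions: it forms M = (D̃ + L_A) D̃^{-1} (D̃ + L_A)^T densely for a small grid, where D̃ = diag(ã) and L_A is the strict lower triangle of A, so one can confirm that M agrees with A on its diagonal and sparsity pattern and that the extra entry in position (i, i + m - 1) is b_{i-1}c_{i-1}/ã_{i-1}. Far from the first row and column, ã_i settles at the fixed point of ã = 4 - 2/ã, namely 2 + √2.

```python
import math


def ic0_model_diagonal(m):
    """Pivots ã_i of IC(0) for the 5-point Laplacian (times h^2), natural ordering."""
    n = m * m
    at = [0.0] * n
    for i in range(n):
        s = 4.0
        if i % m != 0:      # b_{i-1} = -1 unless node i starts a grid row
            s -= 1.0 / at[i - 1]
        if i >= m:          # c_{i-m} = -1 unless node i is in the first row
            s -= 1.0 / at[i - m]
        at[i] = s
    return at


def ldlt_model(m, at):
    """Dense M = F diag(ã)^{-1} F^T, where F = diag(ã) + strict lower part of A."""
    n = m * m
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = at[i]
        if i % m != 0:
            F[i][i - 1] = -1.0
        if i >= m:
            F[i][i - m] = -1.0
    return [[sum(F[i][k] * F[j][k] / at[k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Usage: a 5-by-5 grid for the dense check, a 40-by-40 grid for the limit.
m = 5
at = ic0_model_diagonal(m)
M = ldlt_model(m, at)
big = ic0_model_diagonal(40)
interior = big[39 * 40 + 20]           # far from the first row and column
limit = 2.0 + math.sqrt(2.0)
```

In the interior, the fill entries r_i therefore approach 1/(2 + √2), roughly 0.29.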
Although the incomplete Cholesky decomposition is a regular splitting, it
cannot be compared to preconditioners such as the diagonal of A or the lower
triangle of A (using Corollary 10.3.1 or Theorem 10.4.1), because some entries
of the incomplete Cholesky preconditioner M = LDL^T are closer to those
of A than are the corresponding entries of diag(A) or lower triangle(A), but
some entries are further away. Numerical evidence suggests, however, that
the incomplete Cholesky preconditioner used with the CG algorithm often
requires significantly fewer iterations than a simple diagonal preconditioner.
Of course, each iteration requires somewhat more work, and backsolving
with the incomplete Cholesky factors is not an easily parallelizable operation.


Consequently, there have been a number of experiments suggesting that on
vector or parallel computers it may be faster just to use M = diag(A) as a
preconditioner.
Other sparsity patterns can be used for the incomplete Cholesky factors.
For example, while the previously described preconditioner is often referred
to as IC(0) since the factor L has no diagonals that are not already in A,
Meijerink and van der Vorst suggest the preconditioner IC(3), where the set
P of zero off-diagonal indices is

With this preconditioner, L has three extra nonzero diagonals (the second,
(m - 2)nd, and (m - 1)st subdiagonals), and again the entries are chosen so
that LDL^T matches A in positions not in P.
The effectiveness of the incomplete Cholesky decomposition as a precon-
ditioner depends on the ordering of equations and unknowns. For example,
with the red-black ordering of nodes for the model problem, the matrix A
takes the form (9.9), where D_1 and D_2 are diagonal and B, which represents
the coupling between red and black points, is also sparse. With this ordering,
backsolving with the IC(0) factor is more parallelizable than for the natural
ordering since L takes the form

One can solve a linear system with coefficient matrix L by first determining
the red components of the solution in parallel, then applying the matrix B^T
to these components, and then solving for the black components in parallel.
Unfortunately, however, the incomplete Cholesky preconditioner obtained with
this ordering is significantly less effective in reducing the number of CG
iterations required than that obtained with the natural ordering.

11.2. Modified Incomplete Cholesky Decomposition.


While the incomplete Cholesky preconditioner may significantly reduce the
number of iterations required by the PCG algorithm, we will see in this
section that for second-order elliptic differential equations the number of
iterations is still O(h^{-1}), as it is for the unpreconditioned CG algorithm; that
is, the condition number of the preconditioned matrix is O(h^{-2}). Only the
constant has been improved. A slight modification of the incomplete Cholesky
decomposition, however, can lead to an O(h^{-1}) condition number. Such a
modification was developed by Dupont, Kendall, and Rachford [37] and later
by Gustafsson [74]. Also, see [6]. The main results of this section are from
[74].
Consider the matrix A = A_h in (9.6-9.8) arising from the 5-point
approximation to the steady-state diffusion equation, or, more generally,
consider any matrix A_h obtained from a finite difference or finite element
approximation with mesh size h for the second-order self-adjoint elliptic
differential equation

    ℒu ≡ -∂/∂x(α_1 ∂u/∂x) - ∂/∂y(α_2 ∂u/∂y) = f,   (11.1)

defined on a region Ω ⊂ R^2, with appropriate boundary conditions on ∂Ω.
Assume α_i = α_i(x, y) ≥ α > 0, i = 1, 2.
Such a matrix usually has several special properties. First, it contains
only local couplings in the sense that if a_ij ≠ 0, then the distance from node
i to node j is bounded by a constant (independent of h) times h. We will
write this as O(h). Second, since each element of a matrix-vector product Av
approximates ℒv(x, y), where v(x, y) is the function represented by the vector
v, and since ℒ acting on a constant function v yields 0, the row sums of A are
zero, except possibly at points that couple to the boundary of Ω. Assume that
A is scaled so that the nonzero entries of A are of size O(1). The dimension n
of A is O(h^{-2}). The 5-point Laplacian (multiplied by h^2) is a typical example:

    a_ii = 4,   a_ij = -1 if node j is one of the (at most four) nearest neighbors of node i,   a_ij = 0 otherwise.   (11.2)
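The zero row sum property is easy to confirm for the 5-point Laplacian. A minimal Python sketch, assuming Dirichlet conditions (couplings to boundary nodes are simply dropped, since boundary values are known) and natural ordering; `laplacian_rows` and `nonzero_row_sums` are our own helpers.

```python
def laplacian_rows(m):
    """Rows of the 5-point Laplacian (times h^2) on an m-by-m interior grid.

    Row i is a dict {column: value}; couplings to boundary nodes are omitted.
    """
    rows = []
    for p in range(m):
        for q in range(m):
            row = {p * m + q: 4.0}
            for dp, dq in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                pp, qq = p + dp, q + dq
                if 0 <= pp < m and 0 <= qq < m:
                    row[pp * m + qq] = -1.0
            rows.append(row)
    return rows


def nonzero_row_sums(m):
    """Map node index -> row sum, for the rows whose sum is not zero."""
    sums = {i: sum(row.values()) for i, row in enumerate(laplacian_rows(m))}
    return {i: s for i, s in sums.items() if s != 0.0}


m = 6
boundary_rows = nonzero_row_sums(m)
# Only the 4m - 4 nodes next to the boundary have nonzero row sums:
# 1 along an edge, 2 at a corner.
```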

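If A = M - R is a splitting of A, then the largest and smallest eigenvalues
of the preconditioned matrix M^{-1}A are

    λ_max(M^{-1}A) = max_{v≠0} (Av, v)/(Mv, v),   λ_min(M^{-1}A) = min_{v≠0} (Av, v)/(Mv, v),

and can be written in the form

    (Av, v)/(Mv, v) = 1/(1 + (Rv, v)/(Av, v)).   (11.3)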
Suppose the vector v represents a function v(x, y) in C_0^1(Ω), the space of
continuously differentiable functions with value 0 on the boundary of Ω. By
an elementary summation by parts, we can write (for symmetric A)

    (Av, v) = -Σ_{i<j} a_ij (v_i - v_j)^2 + Σ_i (Σ_j a_ij) v_i^2.   (11.4)
Because of the zero row sum property of A, we have Σ_j a_ij v_i^2 = 0 unless node
i is coupled to the boundary of Ω, and this happens only if the distance from
node i to the boundary is O(h). Since v(x, y) ∈ C_0^1(Ω), it follows that at such
points |v_i| is bounded by O(h). Consequently, since the nonzero entries of A
are of order O(1), the second sum in (11.4) is bounded in magnitude by the
number of nodes i that couple to ∂Ω times O(h^2). In most cases this will be
O(h).
Because of the local property of A, it follows that for nodes i and j such
that a_ij is nonzero, the distance between nodes i and j is O(h) and, therefore,
|v_i - v_j| is bounded by O(h). The first sum in (11.4) therefore satisfies

    |Σ_{i<j} a_ij (v_i - v_j)^2| ≤ O(1),

since there are O(h^{-2}) terms, each of size O(h^2).
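The summation by parts used here is the identity (Av, v) = -Σ_{i<j} a_ij (v_i - v_j)^2 + Σ_i (Σ_j a_ij) v_i^2, valid for any real symmetric A. As a quick sanity check, the short Python sketch below (our own, standard library only) evaluates both sides for a random symmetric matrix and vector.

```python
import random


def quadratic_form(A, v):
    """(Av, v) for a dense real matrix A."""
    n = len(v)
    return sum(A[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def summation_by_parts(A, v):
    """Return (first, second): first = -sum_{i<j} a_ij (v_i - v_j)^2 and
    second = sum_i (row sum i) v_i^2; for symmetric A they add up to (Av, v)."""
    n = len(v)
    first = -sum(A[i][j] * (v[i] - v[j]) ** 2
                 for i in range(n) for j in range(i + 1, n))
    second = sum(sum(A[i]) * v[i] ** 2 for i in range(n))
    return first, second


random.seed(1)
n = 6
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        A[i][j] = A[j][i] = random.uniform(-1.0, 1.0)
v = [random.uniform(-1.0, 1.0) for _ in range(n)]
first, second = summation_by_parts(A, v)
```

Only the first sum involves differences v_i - v_j; the second disappears whenever every row sum is zero, which is why the boundary rows alone control it.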


For the remainder matrix R, we can also write

    (Rv, v) = -Σ_{i<j} r_ij (v_i - v_j)^2 + Σ_i (Σ_j r_ij) v_i^2.   (11.5)

Suppose that the remainder matrix also has the property that nonzero entries
r_ij correspond only to nodes i and j that are separated by no more than
O(h), and suppose also that the nonzero entries of R are of size O(1) (but
are perhaps smaller than the nonzero entries of A). This is the case for the
incomplete Cholesky decomposition where, for the 5-point Laplacian, r_ij is
nonzero only if j = i + m - 1 or j = i - m + 1. These positions correspond to
two diagonally adjacent neighbors of node i, whose distance from node i is √2 h.

Then, by the same argument as used for A, the first sum in (11.5) is bounded
in absolute value by O(1).
The bound on the second sum in (11.4), however, depended on the zero
row sum property of A. If this property is not shared by R (and it is not for
the incomplete Cholesky decomposition or for any regular splitting, since the
entries of R are all nonnegative), then this second sum could be much larger.
It is bounded by the number of nonzero entries of R in rows corresponding to
nodes away from the boundary, which is typically O(h^{-2}), times the nonzero
values of r_ij, which are of size O(1), times the value of the function v(x, y)
away from the boundary, which is O(1). Hence the second sum in (11.5) may
be as large as O(h^{-2}). For vectors v representing a C_0^1 function, the ratio
(Rv, v)/(Av, v) in (11.3) is then of size O(h^{-2}), so if (Rv, v) is positive (as it
is for a regular splitting if v > 0), then the ratio (Av, v)/(Mv, v) in (11.3) is
of size O(h^2). In contrast, if we consider the first unit vector ξ_1, for example,
then (Aξ_1, ξ_1)/(Mξ_1, ξ_1) = O(1). It follows that the condition number of the
preconditioned matrix is at least O(h^{-2}), which is the same order as κ(A).
We therefore seek a preconditioner M = LL^T such that A = M - R and
|(Rv, v)| ≤ O(h^{-1}) for v(x, y) ∈ C_0^1(Ω), in order to have a chance of producing
a preconditioned matrix with condition number O(h^{-1}) instead of O(h^{-2}).
Suppose A is written in the form A = M - R, where

and where R is negative semidefinite (that is, (Rv, v) ≤ 0 for all v), Σ_j r_ij = 0
for all i, and E is a positive definite diagonal matrix. Assume also that R has nonzero
entries only in positions (i, j) corresponding to nodes i and j that are within
O(h) of each other. Our choice of the matrix E depends on the boundary
conditions. For Dirichlet problems, which will be dealt with here, we choose
E = ηh^2 diag(A), where η > 0 is a parameter. For Neumann and mixed
problems, similar results can be proved if some elements of E, corresponding
to points on the part of the boundary with Neumann conditions, are taken to
be of order O(h).
From (11.5), it can be seen that R in (11.6) satisfies |(Rv, v)| ≤ O(1)
when v(x, y) ∈ C_0^1(Ω), since the first sum in (11.5) is of size O(1). Since the
row sums of R are all zero and the nonzero entries of E are of size O(h^2), we
have

so the necessary condition |(Rv, v)| ≤ O(h^{-1}) is certainly satisfied. The
following theorem gives a sufficient condition to obtain a preconditioned matrix
with condition number O(h^{-1}).
THEOREM 11.2.1 (Gustafsson). Let A = M - R, where R is of the form
(11.6), R is negative semidefinite and has zero row sums and only local
couplings, and E is a positive definite diagonal matrix with diagonal entries of
size O(h^2). Then a sufficient condition to obtain λ_max(M^{-1}A)/λ_min(M^{-1}A) =
O(h^{-1}) is

where c > 0 is independent of h.


Proof. There exist constants c_1 and c_2, independent of h, such that
c_1 h^2 ≤ (Av, v)/(v, v) ≤ c_2. Since the entries of E are of order h^2, it follows
that 0 ≤ (Ev, v)/(Av, v) ≤ c_3 for some constant c_3. From (11.3) and the fact
that E is positive definite and R is negative semidefinite, we can write
The rightmost expression here, and hence λ_max(M^{-1}A)/λ_min(M^{-1}A), is of
order O(h^{-1}) if R satisfies (11.7).
When A is an M-matrix arising from discretization of (11.1), a simple
modification of the incomplete Cholesky idea, known as modified incomplete
Cholesky decomposition (MIC) [37, 74], yields a preconditioner M such that
λ_max(M^{-1}A)/λ_min(M^{-1}A) = O(h^{-1}). Let L be a lower triangular matrix with
zeros in positions corresponding to indices in some set P. Choose the nonzero
entries of L so that M = LL^T matches A in positions outside of P except for
the main diagonal. Setting E = ηh^2 diag(A), also force R = LL^T - (A + E)
to have zero row sums. It can be shown, similar to the unmodified incomplete
Cholesky case, that this decomposition exists for a general M-matrix A and
that the off-diagonal elements of R are nonnegative while the diagonal elements
are negative. As for ordinary incomplete Cholesky decomposition, a popular
choice for the set P is the set of positions in which A has zeros, so L has the
same sparsity pattern as the lower triangle of A.
When A has the sparsity pattern of the 5-point approximation (9.6-9.8),
this can be accomplished as follows. Again, it is convenient to write the
modified incomplete Cholesky decomposition in the form LDL^T, where D
is a diagonal matrix. Let a denote the main diagonal of A, b the first lower
diagonal, and c the (m + 1)st lower diagonal. Let ã denote the main diagonal
of L, b̃ the first lower diagonal, and c̃ the (m + 1)st lower diagonal; let d denote
the main diagonal of D. Then we have b̃ = b, c̃ = c, and for i = 1,...,n,

    r_i = b_{i-1} c_{i-1} / ã_{i-1},
    ã_i = (1 + ηh^2) a_i - b_{i-1}^2/ã_{i-1} - c_{i-m}^2/ã_{i-m} - r_i - r_{i-m+1},   d_i = 1/ã_i,   (11.8-11.9)

where elements not defined should be replaced by zeros. The matrix R in
(11.6) satisfies

    r_{i,i+m-1} = r_{i+m-1,i} = r_i,   r_ii = -(r_i + r_{i-m+1}),

and all other elements of R are zero.
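For the model problem these formulas can be checked numerically. The Python sketch below assumes a_i = 4, b = c = -1, natural ordering, and E = ηh^2 diag(A), and uses the MIC(0) recurrence in the equivalent form ã_i = (1 + ηh^2)a_i - b_{i-1}(b_{i-1} + c_{i-1})/ã_{i-1} - c_{i-m}(b_{i-m} + c_{i-m})/ã_{i-m}, with undefined entries taken as zero. It forms M = LDL^T densely for a small grid, with L = diag(ã) + strict lower part of A and D = diag(ã)^{-1}, and checks that R = M - (A + E) has zero row sums, nonnegative off-diagonal entries, and nonpositive diagonal entries. The names are ours; this is an illustration, not an efficient implementation.

```python
def mic0_model_diagonal(m, eta):
    """Pivots ã_i of MIC(0) for the 5-point Laplacian (times h^2), natural ordering."""
    h = 1.0 / (m + 1)
    n = m * m
    at = [0.0] * n
    for i in range(n):
        s = 4.0 * (1.0 + eta * h * h)                    # (1 + eta h^2) a_i
        if i % m != 0:                                   # b_{i-1} = -1 exists
            c_prev = -1.0 if i - 1 + m < n else 0.0      # c_{i-1}
            s -= (-1.0) * (-1.0 + c_prev) / at[i - 1]
        if i >= m:                                       # c_{i-m} = -1 exists
            b_up = -1.0 if (i + 1) % m != 0 else 0.0     # b_{i-m}
            s -= (-1.0) * (b_up - 1.0) / at[i - m]
        at[i] = s
    return at


def model_matrix(m):
    """Dense 5-point Laplacian (times h^2), natural ordering."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i % m != 0:
            A[i][i - 1] = A[i - 1][i] = -1.0
        if i >= m:
            A[i][i - m] = A[i - m][i] = -1.0
    return A


def mic0_remainder(m, eta):
    """R = L D L^T - (A + E) for the MIC(0) factor described above."""
    A = model_matrix(m)
    at = mic0_model_diagonal(m, eta)
    n, h = m * m, 1.0 / (m + 1)
    F = [[A[i][j] if j < i else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        F[i][i] = at[i]
    R = []
    for i in range(n):
        row = []
        for j in range(n):
            mij = sum(F[i][k] * F[j][k] / at[k] for k in range(n))
            e = eta * h * h * A[i][i] if i == j else 0.0
            row.append(mij - A[i][j] - e)
        R.append(row)
    return R


R = mic0_remainder(5, 0.01)
```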


It can be shown that for smooth coefficients α_1(x, y) and α_2(x, y), the
above procedure yields a preconditioner M for which the preconditioned matrix
L^{-1}AL^{-T} has condition number O(h^{-1}). For simplicity, we will show this
only for the case when α_1(x, y) = α_2(x, y) = 1 and A is the standard 5-point
Laplacian (11.2). The technique of proof is similar in the more general case.
LEMMA 11.2.1 (Gustafsson). Let r_i, i = 1,...,n - m, be the elements
defined by (11.8-11.9) for the 5-point Laplacian matrix A. Then

where c > 0 is independent of h.


Proof. We first show that
For the model problem, the recurrence equations (11.8-11.9) can be written in
the form

For we have Assume that


Then we have

(In fact, for n sufficiently large, the elements ξ_j approach a constant value γ
satisfying

The value is

Since we obtain
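A quick numerical illustration of why the MIC(0) pivots matter, based on our own reading of the recurrence rather than on the exact quantities used in the proof: in the interior of the grid, the MIC(0) recurrence for the model problem reads ã_i = 4(1 + ηh^2) - 2/ã_{i-1} - 2/ã_{i-m}, whose fixed point γ solves γ^2 - 4(1 + ηh^2)γ + 4 = 0. The larger root is γ = 2(1 + ηh^2) + 2√((1 + ηh^2)^2 - 1), so γ - 2 ≈ 2√(2η) h is of order h (for IC(0) the corresponding limit is 2 + √2). Because the update is increasing in both earlier pivots and boundary rows subtract less, every pivot stays at least γ; the Python sketch below checks both facts for a few grid sizes.

```python
import math


def mic0_model_diagonal(m, eta):
    """Pivots ã_i of MIC(0) for the 5-point Laplacian (times h^2), natural ordering."""
    h = 1.0 / (m + 1)
    n = m * m
    at = [0.0] * n
    for i in range(n):
        s = 4.0 * (1.0 + eta * h * h)
        if i % m != 0:                                   # b_{i-1} = -1 exists
            s -= (2.0 if i - 1 + m < n else 1.0) / at[i - 1]
        if i >= m:                                       # c_{i-m} = -1 exists
            s -= (2.0 if (i + 1) % m != 0 else 1.0) / at[i - m]
        at[i] = s
    return at


def mic0_fixed_point(eta, h):
    """Larger root of gamma^2 - 4 (1 + eta h^2) gamma + 4 = 0."""
    t = 1.0 + eta * h * h
    return 2.0 * t + 2.0 * math.sqrt(t * t - 1.0)


eta = 0.01
for m in (15, 31, 63):
    h = 1.0 / (m + 1)
    gamma = mic0_fixed_point(eta, h)
    at = mic0_model_diagonal(m, eta)
    print(f"h = 1/{m + 1}: min pivot - gamma = {min(at) - gamma:.2e}, "
          f"(gamma - 2)/h = {(gamma - 2.0) / h:.4f}")
```

The pivots never drop to 2, the value at which the recurrence would lose its O(h) margin.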

THEOREM 11.2.2 (Gustafsson). Let M = LL^T, where the nonzero elements
of L are defined by (11.8-11.9), and A is the 5-point Laplacian matrix. Then
λ_max(M^{-1}A)/λ_min(M^{-1}A) = O(h^{-1}).
Proof. For the model problem, using expression (11.4), we can write

for any vector v. An analogous expression for (Rv,v) shows, since the row
sums of R are all zero,

Since -R is a symmetric weakly diagonally dominant matrix with nonnegative
diagonal elements and nonpositive off-diagonal elements, it follows that R is
negative semidefinite. From Lemma 11.2.1 it follows that

Using the inequality (1/2)(a - b)^2 ≤ (a - c)^2 + (c - b)^2, which holds for any
real numbers a, b, and c, inequality (11.11) can be written in the form
FIG. 11.1. Convergence of iterative methods for the model problem, h = 1/51.
Unpreconditioned CG (dash-dot), ICCG (dashed), MICCG (solid).

where the right-hand side can also be expressed as (1 + ch) Σ_i r_{i+1}[(v_{i+1} - v_i)^2
+ (v_i - v_{i+m})^2]. Since r_{i+1} is nonzero only when b_i and c_i are nonzero, we
combine this with inequality (11.10) and obtain

The desired result then follows from Lemma 11.2.1.


For sufficiently small values of h, it is clear that MIC(0) gives a better
condition number for the preconditioned system than does IC(0). In fact, even
for coarse grids, the MIC(0) preconditioner, with a small parameter η, gives
a significantly better condition number than IC(0) for the model problem.
Figure 11.1 shows the convergence of unpreconditioned CG, ICCG(0), and
MICCG(0) for the model problem with h = 1/51. The quantity plotted is
the 2-norm of the error divided by the 2-norm of the true solution, which was
set to a random vector. A zero initial guess was used. The parameter η in
the MIC preconditioner was set to .01, although the convergence behavior is
not very sensitive to this parameter. For this problem the condition numbers
of the iteration matrices are as follows: unpreconditioned, 1053; IC(0), 94;
MIC(0), 15. Although the bound (3.8) on the A-norm of the error in terms of
the square root of the condition number may be an overestimate of the actual
A-norm of the error, one does find that as the mesh is refined, the number of
unpreconditioned CG and ICCG iterations tends to grow like O(h^{-1}), while
the number of MICCG iterations grows like O(h^{-1/2}).
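An experiment of this kind is easy to reproduce on a smaller grid. The Python sketch below is our own illustration, not the code behind Figure 11.1: it runs CG with no preconditioner, with IC(0), and with MIC(0) (η = 0.01) on the model problem scaled by h^2, in the natural ordering, and counts iterations until the relative residual norm falls below a tolerance (a different stopping quantity from the error norm plotted in the figure). With a_i = 4 and b = c = -1, IC(0) uses ã_i = 4 - b_{i-1}^2/ã_{i-1} - c_{i-m}^2/ã_{i-m} and MIC(0) uses ã_i = 4(1 + ηh^2) - b_{i-1}(b_{i-1} + c_{i-1})/ã_{i-1} - c_{i-m}(b_{i-m} + c_{i-m})/ã_{i-m}; in both cases M = (D̃ + L_A) D̃^{-1} (D̃ + L_A^T), where D̃ = diag(ã) and L_A is the strict lower triangle of A.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(m, x):
    """y = A x for the 5-point Laplacian (times h^2), natural ordering."""
    n = m * m
    y = [4.0 * xi for xi in x]
    for i in range(n):
        if i % m != 0:
            y[i] -= x[i - 1]
        if (i + 1) % m != 0:
            y[i] -= x[i + 1]
        if i >= m:
            y[i] -= x[i - m]
        if i + m < n:
            y[i] -= x[i + m]
    return y


def pivots(m, modified, eta=0.01):
    """Pivots ã of the IC(0) (modified=False) or MIC(0) (modified=True) factor."""
    n, h = m * m, 1.0 / (m + 1)
    at = [0.0] * n
    for i in range(n):
        s = 4.0 * (1.0 + eta * h * h) if modified else 4.0
        if i % m != 0:                # b_{i-1} = -1
            w = 2.0 if modified and i - 1 + m < n else 1.0    # b(b + c) or b^2
            s -= w / at[i - 1]
        if i >= m:                    # c_{i-m} = -1
            w = 2.0 if modified and (i + 1) % m != 0 else 1.0  # c(b + c) or c^2
            s -= w / at[i - m]
        at[i] = s
    return at


def apply_prec(m, at, r):
    """z = M^{-1} r for M = (D~ + L_A) D~^{-1} (D~ + L_A^T), D~ = diag(ã)."""
    n = m * m
    u = [0.0] * n
    for i in range(n):                # forward solve (D~ + L_A) u = r
        s = r[i]
        if i % m != 0:
            s += u[i - 1]
        if i >= m:
            s += u[i - m]
        u[i] = s / at[i]
    z = [0.0] * n
    for i in reversed(range(n)):      # backward solve (D~ + L_A^T) z = D~ u
        s = at[i] * u[i]
        if (i + 1) % m != 0:
            s += z[i + 1]
        if i + m < n:
            s += z[i + m]
        z[i] = s / at[i]
    return z


def pcg_iterations(m, b, prec=None, tol=1e-8, maxit=1000):
    """Number of CG steps (x_0 = 0) until ||r|| <= tol * ||b||."""
    x = [0.0] * (m * m)
    r = b[:]
    z = prec(r) if prec else r[:]
    p = z[:]
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    for k in range(1, maxit + 1):
        q = matvec(m, p)
        alpha = rz / dot(p, q)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * qk for rk, qk in zip(r, q)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return k
        z = prec(r) if prec else r[:]
        rz_new = dot(r, z)
        p = [zk + (rz_new / rz) * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return maxit


# Usage: m = 30 interior points per direction (h = 1/31), random right-hand side.
m = 30
random.seed(0)
b = [random.uniform(-1.0, 1.0) for _ in range(m * m)]
ic, mic = pivots(m, False), pivots(m, True)
counts = {
    "none": pcg_iterations(m, b),
    "IC(0)": pcg_iterations(m, b, lambda r: apply_prec(m, ic, r)),
    "MIC(0)": pcg_iterations(m, b, lambda r: apply_prec(m, mic, r)),
}
print(counts)
```

Repeating the run for several m is one way to observe the growth rates quoted above; exact counts depend on the right-hand side and the tolerance.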
When ICCG and MICCG are applied in practice to problems other than the
model problem, it has sometimes been observed that ICCG actually converges
faster, despite a significantly larger condition number. This might be accounted
for by a smaller sharp error bound (3.6) for ICCG, but the reason appears to be
that rounding errors have a greater effect on the convergence rate of MICCG,
because of more large, well-separated eigenvalues. For a discussion, see [133].

Comments and Additional References.


Sometimes the set P of zero entries in the (modified) incomplete Cholesky or
incomplete LU decomposition is not set ahead of time, but, instead, entries
are discarded only if their absolute values lie below some threshold. See, for
example, [100]. Further analysis of incomplete factorizations can be found in
a number of places, including [6, 7, 8, 13, 14, 106].
In addition to incomplete LU decompositions, incomplete QR factoriza-
tions have been developed and used as preconditioners [116]. In order to make
better use of parallelism, sparse approximate inverses have also been proposed
as preconditioners. See, for instance, [15, 16, 17, 88].
The analysis given here for modified incomplete Cholesky decomposition
applied to the model problem and the earlier analysis of the SOR method
for the model problem were not so easy. The two methods required very
different proof techniques, and similar analysis for other preconditioners would
require still different arguments. If one changes the model problem slightly,
however, by replacing the Dirichlet boundary conditions by periodic boundary
conditions, then the analysis of these and other preconditioners becomes much
easier. The reason is that the resulting coefficient matrix and preconditioners
all have the same Fourier modes as eigenvectors. Knowing the eigenvalues of
the coefficient matrix and the preconditioner, it then becomes relatively easy
to identify the largest and smallest ratios, which are the extreme eigenvalues
of the preconditioned matrix. It has been observed numerically and argued
heuristically that the results obtained for the periodic problem are very similar
to those for the model problem with Dirichlet boundary conditions. For an
excellent discussion, see [24].
Chapter 12

Multigrid and Domain Decomposition Methods

Chapters 10 and 11 dealt with preconditioners designed for general classes


of matrices. The origin of the problem was not a factor in defining the
preconditioner, although analysis of the preconditioner was sometimes limited
to problems arising from certain types of partial differential equations. In
this chapter we deal with preconditioners designed specifically for problems
arising from partial differential equations. The methods are intended for
use with broad classes of problems and are not restricted to one particular
equation. Attempts have been made to extend some of these ideas to general
linear systems, as in algebraic multigrid methods, but the extensions are not
immediate.
Because the methods are designed for partial differential equation prob-
lems, their analysis may require detailed knowledge of the properties of finite
difference and finite element approximations, while we are assuming just a ba-
sic familiarity with these ideas. For this reason, we restrict our analysis to the
model problem, where properties shared by more general finite element and
finite difference approximations can be verified directly, and we indicate how
the analysis can be extended.

12.1. Multigrid Methods.


Multigrid methods were not originally described as a combination of an
iteration scheme and a preconditioner, but it is clear that they can be viewed
in this way. The first multigrid methods used simple iteration, so we will start
with that approach. It will be apparent that the same preconditioners can be
used with any of the other Krylov subspace methods described in Part I of this
book.
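Recall the simple iteration scheme

    x_k = x_{k-1} + M^{-1}(b - A x_{k-1}).   (12.1)

The error e_k = A^{-1}b - x_k is given by

    e_k = (I - M^{-1}A) e_{k-1} = (I - M^{-1}A)^k e_0,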
so the norm of the error satisfies

    ||e_k|| ≤ ||I - M^{-1}A|| · ||e_{k-1}||.   (12.2)

The error is reduced quickly if ||I - M^{-1}A|| << 1.
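As a small illustration of (12.1), the Python sketch below (our own example) runs simple iteration with M = diag(A), that is, Jacobi's method, on the tridiagonal matrix with 4 on the diagonal and -1 off the diagonal, for which ||I - M^{-1}A||_∞ = 1/2. It records the ∞-norm of every error, so one can confirm that each step reduces the error by at least that factor.

```python
def simple_iteration(A, b, apply_minv, x0, steps):
    """Run x_k = x_{k-1} + M^{-1}(b - A x_{k-1}) and return every iterate."""
    n = len(b)
    x = x0[:]
    iterates = [x[:]]
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        z = apply_minv(r)
        x = [x[i] + z[i] for i in range(n)]
        iterates.append(x[:])
    return iterates


n = 10
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]


def jacobi(r):
    """M^{-1} r with M = diag(A)."""
    return [r[i] / A[i][i] for i in range(n)]


iterates = simple_iteration(A, b, jacobi, [0.0] * n, 30)
errors = [max(abs(t - xi) for t, xi in zip(x_true, x)) for x in iterates]
```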


Most multigrid methods can be written in the general form (12.1), where
the iterates x_k represent quantities generated after a coarse grid correction
cycle and a given number of relaxation sweeps. That is, given an approximation
x_{k-1} to the solution, the multigrid algorithm generates a new approximation
x_{k-1,0} via a formula of the form

    x_{k-1,0} = x_{k-1} + C(b - A x_{k-1}),   (12.3)

where the matrix C represents a coarse grid approximation to A^{-1}. The
method then generates a certain number, say ℓ, of new approximations, x_{k-1,j},
j = 1,...,ℓ, by performing relaxation sweeps of the form

    x_{k-1,j} = x_{k-1,j-1} + G(b - A x_{k-1,j-1}),   (12.4)

where the matrix G also represents an approximation to A^{-1}. If we denote by
x_k the quantity x_{k-1,ℓ}, then we find

    e_{k-1,0} = (I - CA) e_{k-1},   e_{k-1,j} = (I - GA) e_{k-1,j-1},   j = 1,...,ℓ,

and the error e_k satisfies

    e_k = (I - GA)^ℓ (I - CA) e_{k-1}.   (12.5)

Thus, for multigrid methods, the matrix I - M^{-1}A in (12.2) is of the special
form (I - GA)^ℓ (I - CA) for certain matrices C and G.
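To see the error propagation e_k = (I - GA)^ℓ(I - CA)e_{k-1} at work, the Python sketch below uses a one-dimensional example of our own rather than the book's model problem. It takes A = tridiag(-1, 2, -1) of order n = 2n̄ + 1, builds C = P(P^T A P)^{-1}P^T with P the linear interpolation from n̄ coarse points (coarse point j sits at fine point 2j + 1, 0-based), uses G = γI with γ = 1/3 and ℓ = 2, and applies the cycle directly to an error vector. Each cycle should not increase the A-norm of the error, and after a few cycles the error should be much smaller.

```python
import random


def poisson_1d(n):
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve_dense(M, rhs):
    """Gaussian elimination without pivoting (adequate for SPD matrices)."""
    n = len(rhs)
    T = [M[i][:] + [rhs[i]] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def interpolation(nc):
    """n-by-nc linear interpolation, n = 2 nc + 1."""
    n = 2 * nc + 1
    P = [[0.0] * nc for _ in range(n)]
    for j in range(nc):
        P[2 * j][j] = 0.5
        P[2 * j + 1][j] = 1.0
        P[2 * j + 2][j] = 0.5
    return P


def two_grid_error(A, P, e, gamma=1.0 / 3.0, sweeps=2):
    """One cycle: e <- (I - gamma A)^sweeps (I - C A) e, C = P (P^T A P)^{-1} P^T."""
    n, nc = len(A), len(P[0])
    AP = [matvec(A, [P[i][j] for i in range(n)]) for j in range(nc)]  # columns of AP
    Ac = [[sum(P[i][k] * AP[j][i] for i in range(n)) for j in range(nc)]
          for k in range(nc)]
    Ae = matvec(A, e)
    y = solve_dense(Ac, [sum(P[i][k] * Ae[i] for i in range(n)) for k in range(nc)])
    e = [e[i] - sum(P[i][k] * y[k] for k in range(nc)) for i in range(n)]
    for _ in range(sweeps):
        Ae = matvec(A, e)
        e = [ei - gamma * aei for ei, aei in zip(e, Ae)]
    return e


def a_norm(A, e):
    return sum(ei * aei for ei, aei in zip(e, matvec(A, e))) ** 0.5


random.seed(0)
nc = 15
A = poisson_1d(2 * nc + 1)
P = interpolation(nc)
e = [random.uniform(-1.0, 1.0) for _ in range(len(A))]
norms = [a_norm(A, e)]
for _ in range(8):
    e = two_grid_error(A, P, e)
    norms.append(a_norm(A, e))
```

In this one-dimensional setting the coarse correction makes the error vanish at the coarse points, and the relaxation sweeps then damp what remains; that division of labor is what the analysis below quantifies in general.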

12.1.1. Aggregation Methods. We first analyze iterations of the form
(12.3-12.4) in a general setting, where the matrix C involves the inverse of a
smaller matrix. Such methods are sometimes called aggregation methods.
While multigrid and aggregation methods can be applied to non-Hermitian
and indefinite problems, the analysis here will be restricted to Hermitian
positive definite problems. We will estimate the rate at which the A-norm
of the error, ||e_k||_A = (e_k^H A e_k)^{1/2}, is reduced. Taking norms on each side in
equation (12.5), we find that

    ||e_k||_A ≤ ||(I - GA)^ℓ (I - CA)||_A ||e_{k-1}||_A.

The quantity ||(I - GA)^ℓ (I - CA)||_A is called the contraction number of the
method and will be denoted by σ. In terms of the 2-norm, σ is given by

    σ = ||(I - A^{1/2}GA^{1/2})^ℓ (I - A^{1/2}CA^{1/2})||.   (12.6)

A simple bound for σ is

    σ ≤ ||I - A^{1/2}GA^{1/2}||^ℓ ||I - A^{1/2}CA^{1/2}||.   (12.7)
The methods to be considered use matrices C and G, for which
||I - A^{1/2}CA^{1/2}|| = 1 and ||I - A^{1/2}GA^{1/2}|| < 1. Hence the methods are
convergent whenever A is Hermitian and positive definite, and, moreover, they
reduce the A-norm of the error at each step.
Inequality (12.7), however, is too crude an estimate to provide much
useful information about the rate of convergence. In fact, the methods to
be considered use matrices C and G, which are designed to complement each
other in such a way that the norm of the matrix product in (12.6) is much less
than the product of the norms in (12.7). Instead of inequality (12.7), we use
the definition of the matrix norm to estimate σ by

    σ ≤ ||I - A^{1/2}CA^{1/2}|| · max { ||(I - A^{1/2}GA^{1/2})^ℓ y|| : y in the range of I - A^{1/2}CA^{1/2}, ||y|| = 1 }.   (12.8)

If the range of I - A^{1/2}CA^{1/2} is a restricted set of vectors on which
I - A^{1/2}GA^{1/2} is highly contractive, then the bound in (12.8) may be much
smaller than that in (12.7).
We now define the form of the matrix C in iteration (12.3). Suppose A is
an n-by-n matrix and n̄ < n. Let I_{n̄}^n be an arbitrary n-by-n̄ matrix of rank n̄,
and define an n̄-by-n matrix I_n^{n̄} by

    I_n^{n̄} = (I_{n̄}^n)^H.   (12.9)

Define an n̄-by-n̄ matrix Ā by

    Ā = I_n^{n̄} A I_{n̄}^n,   (12.10)

and take C to be the matrix

    C = I_{n̄}^n Ā^{-1} I_n^{n̄}.   (12.11)

The following theorem shows that when C is defined in this way, the matrix
A^{1/2}CA^{1/2} is just the orthogonal projector from C^n to the n̄-dimensional
subspace A^{1/2} · range(I_{n̄}^n).
THEOREM 12.1.1. If C is defined by (12.9-12.11), then

where R(·) denotes the range and N(·) the null space of an operator. The
matrix A^{1/2}CA^{1/2} is an orthogonal projector from C^n to an n̄-dimensional
subspace and hence

    ||I - A^{1/2}CA^{1/2}|| = 1.   (12.13)

Proof. Using definitions (12.10) and (12.11), we find
This establishes (12.12). To establish (12.13), note that since I - A^{1/2}CA^{1/2}
is a Hermitian matrix, its norm is the absolute value of its largest eigenvalue.
If z is an eigenvector of this matrix with eigenvalue λ, then, using (12.12), we
can write

It follows that either λ = 0 or A^{1/2}z ∈ N(I_n^{n̄}). In the latter case, CA^{1/2}z is
zero and hence λ = 1. Since the eigenvalues of I - A^{1/2}CA^{1/2}, and hence of
A^{1/2}CA^{1/2}, are 0's and 1's, this establishes that A^{1/2}CA^{1/2} is an orthogonal
projector and that (12.13) holds.
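The theorem can also be checked numerically. Since A^{1/2}CA^{1/2} is Hermitian whenever C is, it is an orthogonal projector exactly when it is idempotent, and (A^{1/2}CA^{1/2})^2 = A^{1/2}CA^{1/2} is equivalent to CAC = C. The Python sketch below (our own, in real arithmetic, so H means transposition) builds C = P Ā^{-1} P^T with Ā = P^T A P from a random full-rank P for a small symmetric positive definite A, then checks that CAC = C and that the trace of CA, which equals the rank of the projector, is n̄.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve_dense(M, rhs):
    """Gaussian elimination without pivoting (adequate for SPD matrices)."""
    n = len(rhs)
    T = [M[i][:] + [rhs[i]] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def coarse_grid_operator(A, P):
    """C = P (P^T A P)^{-1} P^T, with Abar^{-1} P^T computed column by column."""
    PT = transpose(P)
    Abar = matmul(PT, matmul(A, P))
    cols = [solve_dense(Abar, [row[j] for row in PT]) for j in range(len(A))]
    return matmul(P, transpose(cols))


random.seed(2)
n, nbar = 7, 3
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
P = [[random.gauss(0.0, 1.0) for _ in range(nbar)] for _ in range(n)]
C = coarse_grid_operator(A, P)
CA = matmul(C, A)
CAC = matmul(CA, C)
trace_CA = sum(CA[i][i] for i in range(n))
```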
Applying the theorem, inequality (12.8) becomes

and since the null space of I_n^{n̄} is the orthogonal complement of the range of
I_{n̄}^n, this can be written as

Suppose the matrix G is given. Let d_1^2 ≥ ... ≥ d_n^2 denote the eigenvalues of
((I - A^{1/2}GA^{1/2})^ℓ)^H (I - A^{1/2}GA^{1/2})^ℓ, and let v_1,...,v_n denote the
corresponding orthonormal eigenvectors. For any vector y we can write
y = Σ_{i=1}^n (y, v_i) v_i and

    ||(I - A^{1/2}GA^{1/2})^ℓ y||^2 = Σ_{i=1}^n d_i^2 |(y, v_i)|^2.   (12.15)

Now, in general, we have

but with the additional constraint y ⊥ A^{1/2} · R(I_{n̄}^n), a smaller bound
may be attained. If I_{n̄}^n can be chosen so that v_1,...,v_{n̄} (the eigenvectors
corresponding to the n̄ largest eigenvalues) lie in the space A^{1/2} · R(I_{n̄}^n), then
y will have no components in the direction of these eigenvectors and expression
(12.15) can be replaced by

    ||(I - A^{1/2}GA^{1/2})^ℓ y||^2 = Σ_{i=n̄+1}^n d_i^2 |(y, v_i)|^2.

Under these ideal conditions (v_1,...,v_{n̄} ∈ A^{1/2} · R(I_{n̄}^n)), the bound (12.16) is
replaced by
As an example, suppose G is taken to be of the form

    G = γI,   (12.18)

where the constant γ is chosen in an optimal or near optimal way. Then
the eigenvectors v_1,...,v_n of ((I - A^{1/2}GA^{1/2})^ℓ)^H (I - A^{1/2}GA^{1/2})^ℓ are just
the eigenvectors of A, and the eigenvalues d_1^2,...,d_n^2 of this matrix are each of
the form (1 - γλ_i)^{2ℓ} for some i, where λ_1 ≤ ... ≤ λ_n are the eigenvalues of A.
In this case, the bound (12.16) becomes

    σ ≤ max_i |1 - γλ_i|^ℓ.

To minimize this bound, take γ = 2/(λ_n + λ_1), and then

    σ ≤ ((λ_n - λ_1)/(λ_n + λ_1))^ℓ.

This is the usual bound on the convergence rate for the method of steepest
descent.
On the other hand, suppose some of the eigenvectors of A, say, those
corresponding to the n̄ smallest eigenvalues of A, lie in the desired space
A^{1/2} · R(I_{n̄}^n). Then an improved bound like (12.17) holds, and this bound
becomes

    σ ≤ max_{i > n̄} |1 - γλ_i|^ℓ.   (12.19)

To minimize this bound, take γ = 2/(λ_n + λ_{n̄+1}), and (12.19) becomes

    σ ≤ ((λ_n - λ_{n̄+1})/(λ_n + λ_{n̄+1}))^ℓ.   (12.20)

Thus, the effective condition number of A is reduced from κ = λ_n/λ_1 to
κ̄ = λ_n/λ_{n̄+1}. If the latter ratio is much smaller, as is typically the case
when the matrix A approximates a self-adjoint elliptic differential operator,
then much faster convergence is achieved by using a partial step of the form
(12.3) than by iterating only with steps of the form (12.4).
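Both choices of γ come from the same elementary argument, sketched here in LaTeX (our own derivation, for G = γI and eigenvalues 0 < λ_lo ≤ ... ≤ λ_n of A, where λ_lo stands for λ_1 or λ_{n̄+1}): the function λ ↦ |1 - γλ| is convex, so over [λ_lo, λ_n] its maximum is attained at an endpoint, and that maximum is smallest when the two endpoint values balance.

```latex
\max_{\lambda_{\rm lo}\le\lambda\le\lambda_n} |1-\gamma\lambda|
  = \max\{\,|1-\gamma\lambda_{\rm lo}|,\ |1-\gamma\lambda_n|\,\},
\qquad
1-\gamma\lambda_{\rm lo} = -(1-\gamma\lambda_n)
\;\Longrightarrow\;
\gamma^{*} = \frac{2}{\lambda_{\rm lo}+\lambda_n},
\\[4pt]
|1-\gamma^{*}\lambda_{\rm lo}| = |1-\gamma^{*}\lambda_n|
  = \frac{\lambda_n-\lambda_{\rm lo}}{\lambda_n+\lambda_{\rm lo}}
  = \frac{\kappa-1}{\kappa+1},
\qquad \kappa = \frac{\lambda_n}{\lambda_{\rm lo}} .
```

Taking λ_lo = λ_1 gives the steepest descent rate, taking λ_lo = λ_{n̄+1} gives the rate governed by the effective condition number κ̄, and in both cases the bound is raised to the power ℓ for ℓ relaxation sweeps.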

12.1.2. Analysis of a Two-Grid Method for the Model Problem.


Recall that for the model problem -Δu = f in the unit square with Dirichlet
boundary conditions, the matrix A arising from a 5-point finite difference
approximation on a grid of m-by-m interior points with spacing h = 1/(m + 1)
has eigenvalues

    λ_{i,j} = (4/h^2)[sin^2(iπh/2) + sin^2(jπh/2)],   i, j = 1,...,m,   (12.21)
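and the (p, q)-components of the corresponding eigenvectors are

    v_{p,q}^{(i,j)} = 2h sin(ipπh) sin(jqπh),   p, q = 1,...,m,   (12.22)

as shown in Theorem 9.1.2. The eigenvalues are all positive and the smallest
and largest eigenvalues are given in Corollary 9.1.2:

    λ_min = λ_{1,1} = 8h^{-2} sin^2(πh/2),   (12.23)
    λ_max = λ_{m,m} = 8h^{-2} cos^2(πh/2).   (12.24)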
For ih or jh of size O(1), say, i > (m + 1)/4 or j > (m + 1)/4, we have
λ_{i,j} = O(h^{-2}), which is the same order as λ_max. Hence if the ((m + 1)/4)^2
smallest eigencomponents in the error could be annihilated by solving a smaller
problem, using a partial step of the form (12.3), then the ratio of the largest
to the smallest remaining eigenvalue would be O(1), independent of h. The
bound (12.20) on the convergence rate of iteration (12.3-12.4) with G = γI
would be a constant less than one and independent of h!
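The claim that the remaining eigenvalues are all of the same order as λ_max can be checked directly, assuming the model-problem eigenvalues λ_{i,j} = (4/h^2)[sin^2(iπh/2) + sin^2(jπh/2)] with h = 1/(m + 1). The Python sketch below (our own) computes, for several m, the ratio of λ_max to the smallest eigenvalue with i > (m + 1)/4 or j > (m + 1)/4, and compares it with κ(A) = λ_max/λ_min. The first ratio stays below 8/(4 sin^2(π/8)), about 13.7, while κ(A) grows like h^{-2}.

```python
import math


def model_eigenvalue(i, j, h):
    return 4.0 / h ** 2 * (math.sin(i * math.pi * h / 2) ** 2
                           + math.sin(j * math.pi * h / 2) ** 2)


def ratios(m):
    """Return (kappa(A), lambda_max / smallest eigenvalue outside the low block)."""
    h = 1.0 / (m + 1)
    lam_max = model_eigenvalue(m, m, h)
    lam_min = model_eigenvalue(1, 1, h)
    cut = (m + 1) / 4
    lam_rest = min(model_eigenvalue(i, j, h)
                   for i in range(1, m + 1) for j in range(1, m + 1)
                   if i > cut or j > cut)
    return lam_max / lam_min, lam_max / lam_rest


for m in (15, 31, 63, 127):
    kappa, reduced = ratios(m)
    print(f"m = {m:3d}: kappa(A) = {kappa:9.1f}, reduced ratio = {reduced:5.2f}")
```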
Note also that the eigenvectors corresponding to the smaller values of i
and j are "low frequency." That is, the sine functions do not go through many
periods as p and q range from 1 to m. Thus these eigenvectors could be
represented on a coarser grid. We now show how the annihilation of the small
eigencomponents can be accomplished, approximately, by solving the problem
on a coarser grid.
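The interpolation described next can be sketched as follows. This Python version is our own, assuming fine-grid indices p, q = 1,...,m with m = 2m̄ + 1, coarse node (p̄, q̄) located at fine node (2p̄, 2q̄), p as the horizontal index, and zero values on the boundary: fine nodes that coincide with coarse nodes copy the coarse value, nodes midway along a horizontal or vertical line average their two coarse neighbors, and the remaining nodes average their southwest and northeast coarse neighbors.

```python
def prolong(w, mc):
    """Linear interpolation from an mc-by-mc coarse grid to an m-by-m fine grid, m = 2 mc + 1.

    w[pc - 1][qc - 1] is the value at coarse node (pc, qc); the result
    v[p - 1][q - 1] is the value at fine node (p, q).
    """
    m = 2 * mc + 1

    def coarse(pc, qc):
        # Coarse values, with zero Dirichlet data outside 1..mc.
        if 1 <= pc <= mc and 1 <= qc <= mc:
            return w[pc - 1][qc - 1]
        return 0.0

    v = [[0.0] * m for _ in range(m)]
    for p in range(1, m + 1):
        for q in range(1, m + 1):
            if p % 2 == 0 and q % 2 == 0:        # coincides with a coarse node
                val = coarse(p // 2, q // 2)
            elif q % 2 == 0:                     # midway along a horizontal line
                val = 0.5 * (coarse((p - 1) // 2, q // 2) + coarse((p + 1) // 2, q // 2))
            elif p % 2 == 0:                     # midway along a vertical line
                val = 0.5 * (coarse(p // 2, (q - 1) // 2) + coarse(p // 2, (q + 1) // 2))
            else:                                # on a southwest-northeast diagonal
                val = 0.5 * (coarse((p - 1) // 2, (q - 1) // 2)
                             + coarse((p + 1) // 2, (q + 1) // 2))
            v[p - 1][q - 1] = val
    return v


# Usage: interpolating a coarse unit vector gives a piecewise linear "hat"
# with value 1 at its center and 1/2 at six of its eight fine neighbors.
mc = 3
delta = [[1.0 if (pc, qc) == (2, 2) else 0.0 for qc in range(1, mc + 1)]
         for pc in range(1, mc + 1)]
hat = prolong(delta, mc)
```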
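Assume that m + 1 is even and let m̄ = (m - 1)/2 be the number of interior
points in each direction of a coarser grid with spacing h̄ = 2h. Let n̄ = m̄^2.
Define the coarse-to-fine prolongation matrix I_{n̄}^n to be linear interpolation along
horizontal, vertical, and diagonal (southwest to northeast) lines. That is, if w
is a vector defined at the nodes (1, 1) through (m̄, m̄) of the coarse grid, define

    (I_{n̄}^n w)_{2p̄, 2q̄}     = w_{p̄,q̄},
    (I_{n̄}^n w)_{2p̄+1, 2q̄}   = (w_{p̄,q̄} + w_{p̄+1,q̄})/2,
    (I_{n̄}^n w)_{2p̄, 2q̄+1}   = (w_{p̄,q̄} + w_{p̄,q̄+1})/2,
    (I_{n̄}^n w)_{2p̄+1, 2q̄+1} = (w_{p̄,q̄} + w_{p̄+1,q̄+1})/2,   (12.25)

where the first index is the horizontal one and w_{p̄,q̄} is taken to be 0 when p̄ or q̄ equals 0 or m̄ + 1.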
THEOREM 12.1.2. Let A be the 5-point Laplacian matrix so that λ_{i,j} and
v^{(i,j)} satisfy (12.21-12.22), and let I_{n̄}^n be defined by (12.25). Let
v^{(1)},...,v^{(s)} denote the eigenvectors corresponding to the s smallest eigenvalues,
λ_1 ≤ ... ≤ λ_s. If v is any vector in span[v^{(1)},...,v^{(s)}], with ||v|| = 1, then v can
be written in the form

for some
Proof. First suppose that v = v^'^ is an eigenvector. Then for any re-vector
•u;, we have

where $\hat w = \lambda_{i,j}^{1/2} w$. Since, from (12.24), the norm of $A^{1/2}$ is bounded by $2\sqrt{2}\,h^{-1}$, we have

Let $w^{(i,j)}$ match $v^{(i,j)}$ at the nodes of the coarse grid, so that

for $p, q = 1, \ldots, \tilde m$. Then from formula (12.22) for $v^{(i,j)}$ it follows that

Note that if $z_{p,q}$ is defined by (12.28) for all points p and q, then the vectors $z^{(i,j)}$ are orthonormal (Exercise 12.1), as are the vectors $v^{(i,j)}$ (Exercise 9.3). If $z^{(i,j)}$ is defined to be 0 at the other grid points, then it can be checked (Exercise 12.1) that

Summing over p and q and using the formula $1 - \cos x = 2\sin^2(x/2)$, we have
190 Iterative Methods for Solving Linear Systems

From (12.21) it follows that

Making these substitutions and using the facts that $\|v^{(i,j)}\|^2 = 1$ and $\|z^{(i,j)}\|^2 \le 1/4$, we can write

or

Combining this with (12.27) gives

Now let $V_s$ be the matrix whose columns are the eigenvectors $v^{(i,j)}$ corresponding to the s smallest eigenvalues, and let $v = V_s \xi$ be an arbitrary vector in the span of the first s eigenvectors, with $\|v\| = \|\xi\| = 1$. Consider approximating v by the vector $A^{1/2} I_{\tilde h}^{h} (W_s \Lambda_s^{-1/2} \xi)$, where $W_s$ has columns $w^{(i,j)}$ corresponding to the s smallest eigenvalues and $\Lambda_s$ is the diagonal matrix of these eigenvalues. The difference $\delta = v - A^{1/2} I_{\tilde h}^{h} (W_s \Lambda_s^{-1/2} \xi)$ is given by

where the columns of $\Delta_s$ are the vectors $\delta^{(i,j)} = v^{(i,j)} - I_{\tilde h}^{h} w^{(i,j)}$. From (12.28), the vector $\Delta_s \Lambda_s^{-1/2} \xi$ can be written in the form d + e, where
Multigrid and Domain Decomposition Methods 191

$$d = \begin{cases} 0 & \text{at even-even, odd-even, and even-odd grid points}, \\ Z^{oo}\xi & \text{at odd-odd grid points}, \end{cases}$$
where $V^{oe}$, $V^{eo}$, $V^{oo}$, and $Z^{oo}$ consist of the rows of $V_s$ or $Z_s$ (the matrix whose columns are the vectors $z^{(i,j)}$ defined in (12.28) for indices (i, j) corresponding to the s smallest eigenvalues) corresponding to the odd-even, even-odd, and odd-odd grid points, and

Each of the matrices $V^{oe}$, $V^{eo}$, $V^{oo}$, and $Z^{oo}$ has norm less than or equal to 1, because it is a submatrix of a matrix with orthonormal columns. Since $\|\xi\| = 1$, it follows that

Using (12.29-12.31), we have

and from (12.33) and (12.24), it follows that

The constant $2 + \sqrt{6}$ in (12.26) is not the best possible. This is because the piecewise linear interpolant of $v^{(i,j)}$ used in the theorem is not the best possible coarse grid approximation to $v^{(i,j)}$.
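Theorem 12.1.2 rests on one fact: linear interpolation of the coarse-grid samples of a smooth eigenvector reproduces that eigenvector closely, while the same procedure fails for oscillatory eigenvectors. The Python/NumPy sketch below illustrates this. It uses one possible reading of the prolongation described before the theorem, since (12.25) itself is not reproduced here. Under that reading, coarse node (r, s) sits at fine node (2r, 2s), boundary values are zero, and the weights are 1/2 along horizontal, vertical, and southwest-northeast lines. These indexing conventions are assumptions. The sketch samples the normalized eigenvector $v^{(i,j)}_{p,q} = 2h\sin(i\pi ph)\sin(j\pi qh)$ at the coarse nodes, interpolates back, and measures $\|v^{(i,j)} - I w^{(i,j)}\|$.

```python
import numpy as np

def prolongation(m):
    """Hypothetical coarse-to-fine linear interpolation along horizontal, vertical and
    southwest-northeast diagonal lines.  Fine grid: m-by-m interior nodes (m + 1 even);
    coarse node (r, s), r, s = 1..(m-1)/2, sits at fine node (2r, 2s); boundary values 0."""
    assert (m + 1) % 2 == 0
    mc = (m - 1) // 2
    P = np.zeros((m * m, mc * mc))
    for r in range(1, mc + 1):
        for s in range(1, mc + 1):
            p, q, col = 2 * r, 2 * s, (r - 1) * mc + (s - 1)
            P[(p - 1) * m + (q - 1), col] = 1.0                 # coincident node
            for dp, dq in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]:
                P[(p + dp - 1) * m + (q + dq - 1), col] = 0.5   # midpoints of grid lines
    return P

def eigenvector(m, i, j):
    """Normalized v^{(i,j)}_{p,q} = 2h sin(i pi p h) sin(j pi q h), row-major in (p, q)."""
    h = 1.0 / (m + 1)
    k = np.arange(1, m + 1)
    return (2 * h * np.outer(np.sin(i * np.pi * k * h), np.sin(j * np.pi * k * h))).ravel()

def interpolation_error(m, i, j):
    """||v - I w||, where w is v^{(i,j)} sampled at the coarse-grid nodes."""
    v = eigenvector(m, i, j)
    w = v.reshape(m, m)[1::2, 1::2].ravel()    # fine nodes p, q = 2, 4, ..., m - 1
    return np.linalg.norm(v - prolongation(m) @ w)

if __name__ == "__main__":
    for (i, j) in [(1, 1), (2, 3), (4, 4), (8, 8), (15, 15)]:
        print((i, j), interpolation_error(31, i, j))
```

For m = 31, the error is below 0.02 for (i, j) = (1, 1) and above 0.4 for (15, 15). The error grows with the frequency, which matches the $h\lambda_s^{1/2}$-type dependence in the theorem.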
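COROLLARY 12.1.1. If $\|y\| = 1$ and y is orthogonal to $A^{1/2} I_{\tilde h}^{h} w$ for every $\tilde n$-vector w, then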

Proof. The left-hand side of inequality (12.34) is the square of the norm of the vector $v = \sum_{\ell=1}^{s} (y, v^{(\ell)})\, v^{(\ell)}$, and according to Theorem 12.1.2 this vector satisfies

for some $\tilde n$-vector w. The condition $y \perp A^{1/2} I_{\tilde h}^{h} w$ and $\|y\| = 1$ implies

Since $(y, v) = (v, v) = \|v\|^2$, the desired result (12.34) follows.



We now use Theorem 12.1.2 and Corollary 12.1.1 to bound the quantity on the right-hand side of (12.14), again assuming that $G = \gamma I$. In this case, inequality (12.14) can be written in the form

Taking we can write

where $\kappa' = \lambda_n / \lambda_{s+1}$. Applying Corollary 12.1.1 (and using the fact that a function of the form $x + \left((\kappa' - 1)/(\kappa' + 1)\right)^{2\ell}(1 - x)$ is an increasing function of x for 0 < x < 1), this becomes

provided that $c^2 h^2 \lambda_s < 1$.


Now, from (12.24) we know that $\lambda_n$ is bounded by $a h^{-2}$, where a = 8. We use the symbolic constants c and a instead of their actual values because similar results hold for more general finite element and finite difference equations. For those equations, the constants c and a are independent of h but are not necessarily the same as for the model problem. Let $\beta > 0$ be any number less than or equal to a and such that

Choose s so that $\lambda_s$ is the largest eigenvalue of A less than or equal to $\beta h^{-2}$:

Then expression (12.35) becomes

We thus obtain a bound on $\sigma^2$ that is strictly less than one and independent of h. For example, choosing $\beta = 1/(2c^2)$ gives

For the model problem, this establishes $\kappa' < 16(2 + \sqrt{6})^2$ and, for $\ell = 1$, $\sigma < 0.997$. This is a large overestimate of the actual contraction number for the two-grid method, but it does establish convergence at a rate that is independent of h. To obtain a better estimate of $\sigma$, it would be necessary to derive a sharper bound on the constant c in Theorem 12.1.2.
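The value of $\kappa'$ quoted above can be checked with a short calculation. This calculation reads c as the constant $2 + \sqrt{6}$ of (12.26); that reading is an inference from the text, not a statement it makes. By the choice of s, $\lambda_{s+1} > \beta h^{-2}$, while $\lambda_n \le a h^{-2}$ with a = 8. Taking $\beta = 1/(2c^2)$ then gives:

```latex
\kappa' = \frac{\lambda_n}{\lambda_{s+1}} < \frac{a h^{-2}}{\beta h^{-2}} = \frac{a}{\beta}
        = 2ac^2 = 16\,(2+\sqrt{6})^2 = 160 + 64\sqrt{6} \approx 316.8,
\qquad
c^2 h^2 \lambda_s \le c^2 h^2 \cdot \beta h^{-2} = c^2 \beta = \tfrac{1}{2} < 1 .
```

This choice therefore satisfies the proviso $c^2 h^2 \lambda_s < 1$ automatically. The requirement $\beta \le a$ also holds, because $1/(2c^2) < 8$.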

12.1.3. Extension to More General Finite Element Equations.


The key to the analysis of section 12.1.2 was Theorem 12.1.2, showing that
vectors in the span of eigenvectors associated with small eigenvalues of A
can be well approximated on a coarser grid. This is true in general for
standard finite element matrices and often for finite difference matrices. It is
a consequence of the fact that the (functions represented by the) eigenvectors
corresponding to smaller eigenvalues on both the fine and coarse grids provide
good approximations to eigenfunctions of the elliptic differential operator, and
hence they also approximate each other. For the analogues of Theorem 12.1.2
and Corollary 12.1.1 in a more general setting, see [64]. Similar results can be
found in [104].

12.1.4. Multigrid Methods. The two-grid algorithm described in sections 12.1.1-12.1.2 is not practical in most cases, because it requires solving a linear system on a grid of spacing 2h, and this is usually still too large a problem to solve directly. The algorithm could be applied recursively: the solution to the problem on the coarser grid could be obtained by projecting the right-hand side onto a still coarser grid, solving a linear system there, interpolating the solution back to the finer grid, performing relaxation steps there, and repeating this cycle until convergence. If the grid-2h problem is solved very accurately, however, this method would also be time consuming. A number of cycles might be required to solve the problems on coarser grids before ever returning to the fine grid, where the solution is actually needed.
Instead, coarser grid problems can be "solved" very inaccurately by performing just one relaxation sweep on each grid until the coarsest level is reached, at which point the problem is solved directly. Let grid levels 0, 1, ..., J be defined with maximum mesh spacings $h_0 < h_1 < \cdots < h_J$, and let $A^{(j)}$ denote the coefficient matrix for the problem at level j. The linear system on the finest grid is Au = f, where $A = A^{(0)}$. The multigrid V-cycle consists of the following steps.

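Given an initial guess $u_0$, compute $r_0 = r_0^{(0)} = f - Au_0$. For $k = 1, 2, \ldots$,
  For $j = 1, \ldots, J-1$,
    Project $r_{k-1}^{(j-1)}$ onto grid level j; that is, set $f^{(j)} = I_{j-1}^{j} r_{k-1}^{(j-1)}$, where $I_{j-1}^{j}$ is the restriction matrix from grid level j-1 to grid level j.
    Perform a relaxation sweep (with zero initial guess) on grid level j; that is, solve $G^{(j)} d_{k-1}^{(j)} = f^{(j)}$ and compute $r_{k-1}^{(j)} = f^{(j)} - A^{(j)} d_{k-1}^{(j)}$.
  endfor
  Project $r_{k-1}^{(J-1)}$ onto grid level J by setting $f^{(J)} = I_{J-1}^{J} r_{k-1}^{(J-1)}$, and solve on the coarsest grid $A^{(J)} d_{k-1}^{(J)} = f^{(J)}$.
  For $j = J-1, \ldots, 1$,
    Interpolate $d_{k-1}^{(j+1)}$ to grid level j and add it to $d_{k-1}^{(j)}$; that is, replace $d_{k-1}^{(j)} \leftarrow d_{k-1}^{(j)} + I_{j+1}^{j} d_{k-1}^{(j+1)}$, where $I_{j+1}^{j}$ is the prolongation matrix from grid level j+1 to grid level j.
    Perform a relaxation sweep with initial guess $d_{k-1}^{(j)}$ on grid level j; that is, set $d_{k-1}^{(j)} \leftarrow d_{k-1}^{(j)} + (G^{(j)})^{-1}(f^{(j)} - A^{(j)} d_{k-1}^{(j)})$.
  endfor
  Interpolate $d_{k-1}^{(1)}$ to grid level 0 and replace $u_{k-1} \leftarrow u_{k-1} + I_1^0 d_{k-1}^{(1)}$.
  Perform a relaxation sweep with initial guess $u_{k-1}$ on grid level 0; that is, set $u_k = u_{k-1} + G^{-1}(f - Au_{k-1})$. Compute the new residual $r_k = r_k^{(0)} = f - Au_k$.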
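The V-cycle steps above can be sketched in Python as follows. Several choices here are assumptions of the sketch, not prescriptions of the text:

- The coarse matrices are formed as $A^{(j+1)} = P_j^T A^{(j)} P_j$.
- The restriction is the transpose of the prolongation $P_j$. $P_j$ is linear interpolation along horizontal, vertical, and southwest-northeast lines, with coarse node (r, s) at fine node (2r, 2s).
- The relaxation is lexicographic Gauss-Seidel ($G^{(j)}$ is the lower triangle of $A^{(j)}$), not the red-black variant used for Figure 12.1.

Dense matrices keep the code short; a practical code would use stencils or sparse storage.

```python
import numpy as np
from scipy.linalg import solve_triangular

def laplacian_2d(m):
    """5-point Laplacian (scaled by h^-2) on the m-by-m interior grid, h = 1/(m+1)."""
    h = 1.0 / (m + 1)
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    return (np.kron(T, np.eye(m)) + np.kron(np.eye(m), T)) / h**2

def prolongation(m):
    """Linear interpolation from the (m-1)/2-square coarse grid (zero boundary values)."""
    mc = (m - 1) // 2
    P = np.zeros((m * m, mc * mc))
    for r in range(1, mc + 1):
        for s in range(1, mc + 1):
            p, q, col = 2 * r, 2 * s, (r - 1) * mc + (s - 1)
            P[(p - 1) * m + (q - 1), col] = 1.0
            for dp, dq in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]:
                P[(p + dp - 1) * m + (q + dq - 1), col] = 0.5
    return P

def build_levels(m, m_coarsest=3):
    """Level 0 is the finest grid; coarse matrices are P^T A P (an assumption)."""
    A, P = [laplacian_2d(m)], []
    while m > m_coarsest:
        P.append(prolongation(m))
        A.append(P[-1].T @ A[-1] @ P[-1])
        m = (m - 1) // 2
    return A, P

def relax(A, x, f):
    """One sweep x <- x + G^{-1}(f - A x), G = lower triangle of A (Gauss-Seidel)."""
    return x + solve_triangular(np.tril(A), f - A @ x, lower=True)

def v_cycle(A, P, u, f):
    """One V-cycle as in the steps above; P[j] maps level j+1 to level j."""
    J = len(A) - 1
    fs, ds = [None] * (J + 1), [None] * (J + 1)
    r = f - A[0] @ u
    for j in range(1, J):                  # down: restrict, one sweep from zero
        fs[j] = P[j - 1].T @ r
        ds[j] = relax(A[j], np.zeros_like(fs[j]), fs[j])
        r = fs[j] - A[j] @ ds[j]
    fs[J] = P[J - 1].T @ r                 # coarsest grid: direct solve
    ds[J] = np.linalg.solve(A[J], fs[J])
    for j in range(J - 1, 0, -1):          # up: interpolate, add, one sweep
        ds[j] = relax(A[j], ds[j] + P[j] @ ds[j + 1], fs[j])
    u = u + P[0] @ ds[1]                   # finest grid: correct, then one sweep
    return relax(A[0], u, f)

if __name__ == "__main__":
    m = 31                                 # h = 1/32; coarsest grid h = 1/4
    A, P = build_levels(m)
    f, u = np.ones(m * m), np.zeros(m * m)
    for k in range(1, 11):
        u = v_cycle(A, P, u, f)
        print(k, np.linalg.norm(f - A[0] @ u) / np.linalg.norm(f))
```

The residual should drop by a roughly constant factor per cycle. Because the coarse matrices are Galerkin products, changing the scaling of the restriction would not change the computed correction.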

This iteration is called a V-cycle because it consists of going down through the grids from fine to coarse, performing a relaxation sweep on each grid, then coming back up from coarse to fine, and again performing a relaxation sweep at each level, as pictured below. (Sometimes an initial relaxation sweep on the fine grid is performed before projecting the residual onto the next coarser grid.) Other patterns of visiting the grids are also possible. In the W-cycle, for instance, one uses two V-cycles at each of the coarser levels, resulting in a pattern like that shown below for four grid levels. In the full multigrid V-cycle, one starts on the coarsest grid, goes up one level and then back down, up two levels and then back down, etc., until the finest level is reached. This provides the initial guess for the standard V-cycle, which is then performed.

[Figure: patterns of grid visits for the V-cycle, the W-cycle, and the full multigrid V-cycle]



The restriction and prolongation matrices $I_j^{j+1}$ and $I_{j+1}^{j}$, as well as the relaxation scheme with matrix G, can be tuned to the particular problem. For the model problem, the linear interpolation matrix $I_{j+1}^{j}$ is appropriate, although it is not the only choice, and it is reasonable to define the restriction matrix $I_j^{j+1}$ to be $(I_{j+1}^{j})^T$, as in section 12.1.1. The damped Jacobi relaxation scheme described in section 12.1.1 is convenient for analysis, but other relaxation schemes may perform better in practice. The red-black Gauss-Seidel relaxation method is often used. (That is, if nodes are ordered so that the matrix A has the form (9.9), then G is taken to be the lower triangle of A.)
Figure 12.1 shows the convergence of the multigrid V-cycle with red-black Gauss-Seidel relaxation for the model problem, for grid sizes h = 1/64 and h = 1/128. The coarsest grid, on which the problem was solved directly, was of size h = 1/4. Also shown in Figure 12.1 is the convergence curve for MICCG(0). The work per iteration (or per cycle for multigrid) for these two algorithms is similar. During a multigrid V-cycle, a (red-black) Gauss-Seidel relaxation step is performed once on the fine grid and twice on each of the coarser grids. Since the number of points on each coarser level grid is about 1/4 that of the finer grid, this is the equivalent of about 1 and 2/3 Gauss-Seidel sweeps on the finest grid. After the fine grid relaxation is complete, a new residual must be computed, requiring an additional matrix-vector multiplication on the fine grid. In the MICCG(0) algorithm, backsolving with the L and $L^T$ factors of the MIC decomposition is twice the work of backsolving with a single lower triangular matrix in the Gauss-Seidel method, but only one matrix-vector multiplication is performed at each step. The CG algorithm also requires some inner products that are not present in the multigrid algorithm, but the multigrid method requires prolongation and restriction operations that roughly balance the work for the inner products. The exact operation count is implementation dependent, but for the implementation used here (which was designed for a general 5-point matrix, not just the Laplacian), the operation count per cycle/iteration was about 41n for multigrid and about 30n for MICCG(0).
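The "1 and 2/3 sweeps" figure follows from a geometric series. The fine grid gets one sweep and each coarser grid gets two, and each coarser grid has about one quarter as many points as the next finer one. With finitely many levels the sum is slightly smaller than this, so the cost in fine-grid sweeps is at most:

```latex
1 + 2\left(\frac{1}{4} + \frac{1}{16} + \cdots\right)
  = 1 + 2\cdot\frac{1/4}{1 - 1/4}
  = 1 + \frac{2}{3}
  = \frac{5}{3}.
```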
It is clear from Figure 12.1 that for the model problem, the multigrid method is by far the most efficient of the iterative methods we have discussed. Moreover, the multigrid method demonstrated here is not the best. The number of cycles needed to achieve an error of size $10^{-6}$ can be reduced even further, from 9 down to about 5, by using the W-cycle or the full multigrid V-cycle with more accurate restriction and prolongation operators. The work per cycle is somewhat greater, but the reduction in the number of cycles more than makes up for the slight increase in cycle time. (See Exercise 12.2.)
FIG. 12.1. Convergence of the multigrid V-cycle (solid) and MICCG(0) (dashed) for the model problem.

The multigrid method described here works well for a variety of problems, including nonsymmetric differential equations such as $-\Delta u + c u_x = f$, as well as for the model problem. Note, however, that the performance of ICCG and MICCG is not greatly changed if the model problem is replaced by the diffusion equation (9.1) with a highly varying diffusion coefficient, but this is not the case for the multigrid method. The multigrid algorithm used for the model problem will still converge at a rate that is independent of h if applied to the diffusion equation, but the convergence rate will be greatly affected by the variation in the diffusion coefficient. For problems with discontinuous diffusion coefficients, linear interpolation, as used here, is not really appropriate. It should be replaced by a form of interpolation that takes account of the discontinuities [1].
For this reason, instead of thinking of the multigrid method, one should
view the multigrid approach as a framework for developing iterative methods
(that is, preconditioners to be used with simple iteration or other Krylov
subspace methods). Sometimes, based on known properties of the differential
equation, one can identify appropriate prolongation, restriction, and relaxation
matrices that will result in a multigrid method whose convergence rate is not
only independent of h but is much better than that of other methods for
realistic mesh sizes. One should look for both a relaxation method that damps
high frequencies very rapidly and restriction and prolongation matrices having
the property that the low frequency components of the error are greatly reduced
when the residual is projected onto a coarser grid, a problem is solved on that
grid, and the solution is interpolated to the finer grid and added to the previous
approximation. Such multigrid methods have been developed for a wide variety
of physical problems. This is not always possible, however. For problems that
are barely resolved on the grid of interest, it may be unclear how the problem
should even be defined on coarser level grids, and one cannot expect to gain much information from a "solution" on such a grid.

12.1.5. Multigrid as a Preconditioner for Krylov Subspace Methods. Some multigrid aficionados will argue that if one has used the proper
restriction, prolongation, and relaxation operators, then the multigrid algo-
rithm will require so few cycles (one or two full multigrid V-cycles to reach
the level of truncation error) that it is almost pointless to try to accelerate it
with CG-like methods. This may be true, but unfortunately such restriction,
prolongation, and relaxation schemes are not always known. In such cases,
CG, GMRES, QMR, or BiCGSTAB acceleration may help.
Equivalently, one can consider multigrid as a preconditioner for one of these
Krylov subspace methods. To solve an equation Mz = r with the multigrid V-
cycle preconditioner M as coefficient matrix, one simply performs one multigrid
V-cycle with right-hand side r and initial guess zero.
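A minimal sketch of multigrid-preconditioned CG for the model problem is given below, in the spirit of the damped-Jacobi variant cited from [4]. The specific choices are assumptions of this sketch. CG needs a symmetric positive definite M, but the V-cycle of section 12.1.4, with only a post-smoothing sweep on the finest grid, is not symmetric. The V-cycle here therefore performs one damped Jacobi sweep ($\omega = 0.8$) both before and after the coarse-grid correction on every level. It also uses Galerkin coarse matrices and takes restriction to be the transpose of prolongation. Applying $M^{-1}$ to r is exactly one such V-cycle with right-hand side r and zero initial guess.

```python
import numpy as np

def laplacian_2d(m):
    h = 1.0 / (m + 1)
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    return (np.kron(T, np.eye(m)) + np.kron(np.eye(m), T)) / h**2

def prolongation(m):
    """Linear interpolation; coarse node (r, s) at fine node (2r, 2s), zero boundary."""
    mc = (m - 1) // 2
    P = np.zeros((m * m, mc * mc))
    for r in range(1, mc + 1):
        for s in range(1, mc + 1):
            p, q, col = 2 * r, 2 * s, (r - 1) * mc + (s - 1)
            P[(p - 1) * m + (q - 1), col] = 1.0
            for dp, dq in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]:
                P[(p + dp - 1) * m + (q + dq - 1), col] = 0.5
    return P

def build_levels(m, m_coarsest=3):
    A, P = [laplacian_2d(m)], []
    while m > m_coarsest:
        P.append(prolongation(m))
        A.append(P[-1].T @ A[-1] @ P[-1])          # Galerkin coarse matrix (assumption)
        m = (m - 1) // 2
    return A, P

def apply_vcycle(A, P, r, j=0, omega=0.8):
    """z = M^{-1} r: one symmetric V-cycle for A[j] z = r from z = 0."""
    if j == len(A) - 1:
        return np.linalg.solve(A[j], r)            # coarsest grid: direct solve
    Dinv = omega / np.diag(A[j])
    z = Dinv * r                                   # damped Jacobi sweep from z = 0
    z = z + P[j] @ apply_vcycle(A, P, P[j].T @ (r - A[j] @ z), j + 1, omega)
    return z + Dinv * (r - A[j] @ z)               # damped Jacobi sweep after correction

def pcg(A, b, apply_Minv, tol=1e-8, maxiter=200):
    """Preconditioned CG from a zero initial guess; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x, maxiter

if __name__ == "__main__":
    for m in (15, 31):
        A, P = build_levels(m)
        x, its = pcg(A[0], np.ones(m * m), lambda r: apply_vcycle(A, P, r))
        print(f"h = 1/{m + 1}: {its} PCG iterations")
```

The iteration count should stay essentially unchanged when h is halved. That is the behavior one hopes to inherit from the multigrid preconditioner.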
For some interesting examples using multigrid as a preconditioner for
GMRES and BiCGSTAB, see [108]. The use of multigrid (with damped Jacobi
relaxation) as a preconditioner for the CG algorithm for solving diffusion-like
equations is described in [4].

12.2. Basic Ideas of Domain Decomposition Methods.


Simulation problems often involve complicated structures such as airplanes
and automobiles. Limitations on computer time and storage may prevent the
modeling of the entire structure at once, so instead a piece of the problem
is studied, e.g., an airplane wing. If different parts of the problem could be
solved independently and then the results somehow pieced together to give
the solution to the entire problem, then a loosely coupled array of parallel
processors could be used for the task. This is one of the motivations for
domain decomposition methods. Even if the domain of the problem is not
so complicated, one might be able to break the domain into pieces on which
the problem is more easily solved, e.g., rectangles on which a fast Poisson
solver could be used or subdomains more suitable for multigrid methods.
If the solutions of the subproblems can be combined in a clever way to
solve the overall problem, then this may provide a faster and more parallel
solution method than applying a standard iterative method directly to the
large problem. We will see that this solution approach is equivalent to using
a preconditioner that involves solving on subdomains. The clever way of
combining the solutions from subdomains is usually a CG-like iterative method.
Domain decomposition methods fall roughly into two classes—those using
overlapping domains, such as the additive and multiplicative Schwarz methods,
and those using nonoverlapping domains, which are sometimes called substruc-
turing methods. If one takes a more general view of the term "subdomain,"
then the subdomains need not be contiguous parts of the physical domain at
all but may be parts of the solution space, such as components that can be

represented on a coarser grid and those that cannot. With this interpretation,
multigrid methods fall under the heading of domain decomposition methods.
In this chapter, we describe some basic domain decomposition methods but
give little of the convergence theory. For further discussion, see [123] or [77].

12.2.1. Alternating Schwarz Method. Let $\mathcal{L}$ be a differential operator defined on a domain $\Omega$, and suppose we wish to solve the boundary value problem

$$\mathcal{L}u = f \ \text{ in } \Omega, \qquad u = g \ \text{ on } \partial\Omega. \qquad (12.36)$$

The domain $\Omega$ is an open set in the plane or in 3-space, and $\partial\Omega$ denotes the boundary of $\Omega$. We denote the closure of $\Omega$ by $\bar\Omega = \Omega \cup \partial\Omega$. We have chosen Dirichlet boundary conditions (u = g on $\partial\Omega$), but Neumann or Robin boundary conditions could be specified as well.
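The domain $\Omega$ might be divided into two overlapping pieces, $\Omega_1$ and $\Omega_2$, such that $\Omega = \Omega_1 \cup \Omega_2$, as pictured in Figure 12.2. Let $\Gamma_1$ and $\Gamma_2$ denote the parts of the boundaries of $\Omega_1$ and $\Omega_2$, respectively, that are not part of the boundary of $\Omega$. To solve this problem, one might guess the solution on $\Gamma_1$ and solve the problem

$$\mathcal{L}u_1 = f \ \text{ in } \Omega_1, \qquad u_1 = g \ \text{ on } \partial\Omega_1 \setminus \Gamma_1, \qquad u_1 = g_1 \ \text{ on } \Gamma_1, \qquad (12.37)$$

where $g_1$ is the initial guess for the solution on $\Gamma_1$. Letting $g_2$ be the value of $u_1$ on $\Gamma_2$, one then solves

$$\mathcal{L}u_2 = f \ \text{ in } \Omega_2, \qquad u_2 = g \ \text{ on } \partial\Omega_2 \setminus \Gamma_2, \qquad u_2 = g_2 \ \text{ on } \Gamma_2.$$

If the computed solutions $u_1$ and $u_2$ are the same in the region where they overlap, then the solution to problem (12.36) is

$$u = \begin{cases} u_1 & \text{in } \Omega_1, \\ u_2 & \text{in } \Omega_2 \setminus \Omega_1. \end{cases}$$

If the values of $u_1$ and $u_2$ differ in the overlap region, then the process can be repeated, replacing $g_1$ by the value of $u_2$ on $\Gamma_1$, re-solving problem (12.37), and so on. This idea was introduced by Schwarz in 1870 [120], not as a computational technique, but to establish the existence of solutions to elliptic problems on regions where analytic solutions were not known. When used as a computational technique it is called the alternating Schwarz method.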
FIG. 12.2. Decomposition of domain into two overlapping pieces.

A slight variation of the alternating Schwarz method, known as the multiplicative Schwarz method, is more often used in computations. Let the problem (12.36) be discretized using a standard finite difference or finite element method. Assume that the overlap region is wide enough that nodes in $\Omega_1 \setminus \Omega_2$ do not couple to nodes in $\Omega_2 \setminus \Omega_1$, and vice versa. Assume also that the boundaries $\Gamma_1$ and $\Gamma_2$ are grid lines. If nodes in $\Omega_1 \setminus \Omega_2$ are numbered first, followed by nodes in $\Omega_1 \cap \Omega_2$, and then by nodes in $\Omega_2 \setminus \Omega_1$, then the discretized problem can be written in the form

where the right-hand side vector f includes contributions from the boundary term u = g on $\partial\Omega$.
Starting with an initial guess $u^{(0)}$, the multiplicative Schwarz method for the discretized system generates approximations $u^{(k)}$, k = 1, 2, ..., satisfying (12.40) below. (The initial guess actually need only be defined on $\Gamma_1$ for a standard 5-point discretization or, more generally, on the points in $\Omega_2 \setminus \Omega_1$ that couple to points in $\Omega_1 \cap \Omega_2$.)

The first equation corresponds to solving the problem on $\Omega_1$, using boundary data obtained from $u^{(k-1)}$. The second equation corresponds to solving the problem on $\Omega_2$, using boundary data obtained from $u^{(k)}$.

Note that this is somewhat like a block Gauss-Seidel method for (12.39),
since one solves the first block equation using old data on the right-hand side
and the second block equation using updated data on the right-hand side, but
in this case the blocks overlap:

Let $E_i^T$, i = 1, 2, be the rectangular matrix that takes a vector defined on all of $\Omega$ and restricts it to $\Omega_i$:

The matrix $E_i$ takes a vector defined on $\Omega_i$ and extends it with zeros to the rest of $\Omega$. The matrices on the left in (12.40) are of the form $E_i^T A E_i$, where A is the coefficient matrix in (12.39). Using this notation, iteration (12.40) can be written in the equivalent form

Writing this as two half-steps and extending the equations to the entire
domain, the iteration becomes

Defining $B_i = E_i (E_i^T A E_i)^{-1} E_i^T$, these two half-steps can be combined to give

This is the simple iteration method described in section 2.1 with preconditioner $M^{-1} = B_1 + B_2 - B_2 A B_1$.
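Combining the two half-steps gives $u^{(k)} = u^{(k-1)} + (B_1 + B_2 - B_2 A B_1)(f - Au^{(k-1)})$. Both this identity and the convergence of the multiplicative Schwarz iteration can be checked on a one-dimensional stand-in for (12.39). In the Python/NumPy sketch below, the 3-point Poisson matrix and the particular overlapping index sets are hypothetical choices made for illustration. The overlap is wide enough that $\Omega_1 \setminus \Omega_2$ and $\Omega_2 \setminus \Omega_1$ do not couple. The matrices $B_i$ are formed densely only to keep the example short.

```python
import numpy as np

def poisson_1d(n):
    """tridiag(-1, 2, -1) / h^2 with h = 1/(n+1): a 1D stand-in for (12.39)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def extension(n, idx):
    """E_i: extends a vector defined on the nodes idx by zeros to all n nodes."""
    E = np.zeros((n, len(idx)))
    E[idx, np.arange(len(idx))] = 1.0
    return E

def local_solver(A, E):
    """B_i = E_i (E_i^T A E_i)^{-1} E_i^T, formed densely for illustration."""
    return E @ np.linalg.solve(E.T @ A @ E, E.T)

n = 63
A = poisson_1d(n)
# Hypothetical overlapping split: Omega_1 = nodes 0..39, Omega_2 = nodes 24..62.
E1, E2 = extension(n, np.arange(0, 40)), extension(n, np.arange(24, n))
B1, B2 = local_solver(A, E1), local_solver(A, E2)

def multiplicative_schwarz_step(u, f):
    """Solve on Omega_1 with old data, then on Omega_2 with the updated data."""
    u_half = u + B1 @ (f - A @ u)
    return u_half + B2 @ (f - A @ u_half)

# Preconditioner obtained by combining the two half-steps.
Minv = B1 + B2 - B2 @ A @ B1

if __name__ == "__main__":
    f, u = np.ones(n), np.zeros(n)
    u_exact = np.linalg.solve(A, f)
    for k in range(1, 11):
        u = multiplicative_schwarz_step(u, f)
        print(k, np.linalg.norm(u - u_exact) / np.linalg.norm(u_exact))
```

For this 1D problem, each pair of subdomain solves should reduce the error by a fixed factor that depends on the width of the overlap.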
One could also consider solving (12.39) using an overlapping block Jacobi-type method, that is, using data from the previous iterate in the right-hand sides of both equations in (12.40). This leads to the set of equations

where $u_i^{(k)} = E_i^T u^{(k)}$. These equations set the value of $u^{(k)}$ in the overlap region in two different ways. For the multiplicative Schwarz method, we used the second equation to define the value of $u^{(k)}$ in the overlap region; for this variant it is customary to take $u^{(k)}$ in $\Omega_1 \cap \Omega_2$ to be the sum of the two values defined by these equations. This leads to the additive Schwarz method:

In this case, the preconditioner $M^{-1} = B_1 + B_2$ is Hermitian if A is Hermitian, so the simple iteration (12.42) can be replaced by the CG or MINRES algorithm. In fact, some form of acceleration or damping factor must be used with iteration (12.42) to ensure convergence. To solve the preconditioning equation Mz = r, one simply solves a problem on each of the two subdomains independently, using boundary data from the previous iterate, and adds the results.
We will not prove any convergence results for the additive and multiplicative Schwarz preconditioners. Note, however, that for the model problem and many other elliptic differential equations using, say, GMRES acceleration, these preconditioners have the following properties (see [123]):
• The number of iterations (to reduce the initial residual norm by a fixed
factor) is independent of the mesh size, provided that the overlap region
is kept fixed. (Remember, however, that with only two subdomains the
solution time on each subdomain grows as the mesh size is decreased!)
• The overlap region can be quite small without greatly affecting the
convergence rate.
• The number of iterations for the multiplicative variant is about half
that for the additive algorithm. (This is similar to the relation between
ordinary Gauss-Seidel and Jacobi iterations (10.20, 10.22).)
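The sketch below illustrates the additive Schwarz preconditioner used with CG on a 1D model problem. Both the problem and the two-subdomain split, with a 13-node overlap, are hypothetical choices. The sketch applies $B_i = E_i (E_i^T A E_i)^{-1} E_i^T$ by indexing, since $E_i^T A E_i$ is just the submatrix of A on the subdomain nodes. It checks that the resulting $M^{-1}$ is symmetric, so that CG applies, and runs preconditioned CG. Plain NumPy is an assumption here, and the local inverses are formed explicitly only for brevity.

```python
import numpy as np

def poisson_1d(n):
    """tridiag(-1, 2, -1) / h^2 with h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def additive_schwarz(A, subdomains):
    """Return r -> (B_1 + ... + B_J) r, where B_i = E_i (E_i^T A E_i)^{-1} E_i^T."""
    local = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]
    def apply(r):
        z = np.zeros_like(r)
        for idx, A_ii_inv in local:        # independent subdomain solves ...
            z[idx] += A_ii_inv @ r[idx]    # ... whose results are added
        return z
    return apply

def pcg(A, b, apply_Minv, tol=1e-8, maxiter=300):
    """Preconditioned CG from a zero initial guess; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x, maxiter

if __name__ == "__main__":
    n = 127
    A = poisson_1d(n)
    # Hypothetical split: Omega_1 = nodes 0..69, Omega_2 = nodes 57..126.
    Minv = additive_schwarz(A, [np.arange(0, 70), np.arange(57, n)])
    x, its = pcg(A, np.ones(n), Minv)
    print("PCG iterations with additive Schwarz:", its)
```

With exact subdomain solves in 1D, $M^{-1}A$ differs from the identity by a matrix of low rank (the overlap size plus two). CG therefore terminates after only a handful of iterations here.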

12.2.2. Many Subdomains and the Use of Coarse Grids. The multiplicative and additive Schwarz methods are easily extended to multiple subdomains. We will concentrate on the additive variant because it provides greater potential for parallelism. If the region $\Omega$ is divided into J overlapping subregions $\Omega_1, \ldots, \Omega_J$, then the additive Schwarz preconditioner is

$$M^{-1} = \sum_{i=1}^{J} B_i, \qquad (12.43)$$

where $B_i = E_i (E_i^T A E_i)^{-1} E_i^T$, as in the previous section. To apply $M^{-1}$ to a vector r, one solves a problem on each subdomain with right-hand side $E_i^T r$ and adds the results. These subdomain solves can be carried out in parallel, so it is desirable to have a large number of subdomains.
Consider the Krylov space generated by f and $M^{-1}A$:

(For a Hermitian positive definite problem we could equally well consider the Krylov space generated by $L^{-1} f$ and $L^{-1} A L^{-H}$, where $M = LL^H$.) Suppose f has nonzero components in only one of the subdomains, say, $\Omega_1$. Since a standard finite difference or finite element matrix A contains only local couplings, the vector Af will be nonzero only in subdomains that overlap with $\Omega_1$ (or, perhaps, subdomains that are separated from $\Omega_1$ by just a few mesh widths). Only for these subdomains will the right-hand side $E_i^T A f$ of the subdomain problem be nonzero, and hence only there will $M^{-1} A f$ be nonzero. Denote this set of subdomains by $S_1$. The next Krylov vector $(M^{-1}A)^2 f$ will then be nonzero only in subdomains that overlap with (or are separated by just a few mesh widths from) subdomains in $S_1$, and so on. The number of Krylov space vectors must therefore reach the length of the shortest path from $\Omega_1$ to the most distant subregion, say, $\Omega_J$, before any of the Krylov vectors has nonzero components in that subregion. Yet the solution u(x) of the differential equation and the vector u satisfying the discretized problem Au = f may well have nonzero (and not particularly small) components in all of the subdomains. Hence any Krylov space method for solving Au = f with a zero initial guess and the additive Schwarz preconditioner will require at least this shortest-path-length number of iterations to converge (that is, to satisfy a reasonable error tolerance). As the number of subdomains increases, the shortest path between the most distant subregions also increases, so the number of iterations required by, say, the GMRES method with the additive Schwarz preconditioner will also increase. The reason is that there is no mechanism for global communication among the subdomains.
An interesting cure for this problem was proposed by Dryja and Widlund [36]. In addition to the subdomain solves, solve the problem on a coarse grid whose elements are the subregions of the original grid. Denote this problem by $A_C u_C = f_C$, and let $I_C$ denote an appropriate type of interpolation (say, linear interpolation) from the coarse to the fine grid. Then the preconditioner $M^{-1}$ in (12.43) is replaced by

It turns out that this small amount of global communication is sufficient


to eliminate the dependence of the number of iterations on the number of
subdomains. For this two-level method, the number of iterations is independent
of both the mesh width h and the subdomain size H, assuming that the size
of the overlap region is O(H).
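The effect of the coarse problem can be seen in a small 1D experiment. Since (12.44) is not reproduced here, this sketch assumes the usual two-level form $M^{-1} = I_C A_C^{-1} I_C^T + \sum_i B_i$ with $A_C = I_C^T A I_C$. It also assumes a hypothetical decomposition: subdomains of width H = 1/J, each widened by H/4 on both sides, and piecewise linear hat functions centred at the subdomain boundaries as the coarse basis. The one-level and two-level PCG iteration counts are compared as J grows.

```python
import numpy as np

def poisson_1d(n):
    """tridiag(-1, 2, -1) / h^2 on (0, 1) with h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def schwarz(A, subdomains, I_C=None):
    """Additive Schwarz sum_i B_i r, plus I_C A_C^{-1} I_C^T r when I_C is given."""
    local = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]
    AC_inv = None if I_C is None else np.linalg.inv(I_C.T @ A @ I_C)
    def apply(r):
        z = np.zeros_like(r)
        for idx, A_ii_inv in local:
            z[idx] += A_ii_inv @ r[idx]
        if AC_inv is not None:
            z += I_C @ (AC_inv @ (I_C.T @ r))      # global (coarse-grid) term
        return z
    return apply

def pcg(A, b, apply_Minv, tol=1e-8, maxiter=300):
    """Preconditioned CG from a zero initial guess; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x, maxiter

def decomposition(n_sub, per_sub=16, overlap=4):
    """Hypothetical setup: n_sub subdomains of width H = 1/n_sub, widened by `overlap`
    fine nodes on each side; hat functions at x = H, ..., (n_sub - 1)H as coarse basis."""
    n = n_sub * per_sub - 1
    x = np.arange(1, n + 1) / (n + 1)
    subs = [np.arange(max(i * per_sub - overlap, 0),
                      min((i + 1) * per_sub - 1 + overlap, n)) for i in range(n_sub)]
    H = 1.0 / n_sub
    centres = H * np.arange(1, n_sub)
    I_C = np.maximum(0.0, 1.0 - np.abs(x[:, None] - centres[None, :]) / H)
    return poisson_1d(n), subs, I_C

if __name__ == "__main__":
    for n_sub in (4, 8, 16, 32):
        A, subs, I_C = decomposition(n_sub)
        b = np.ones(A.shape[0])
        one = pcg(A, b, schwarz(A, subs))[1]
        two = pcg(A, b, schwarz(A, subs, I_C))[1]
        print(f"{n_sub:2d} subdomains: one-level {one}, two-level {two}")
```

The one-level counts should grow with the number of subdomains, while the two-level counts should stay roughly flat. This matches the behavior described above.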
A two-level method such as this, however, must still require more than O(n) work if both the subdomain and coarse grid solvers require more than O(p) work, where p is the number of points in the subdomain or coarse grid. With just a few large subdomains, the subdomain solves will be too expensive, and with many small subdomains, the coarse grid solve will be too expensive. For this reason, multilevel algorithms were developed to solve

the large subproblems recursively by another application of the two-level


preconditioner. The multilevel domain decomposition methods bear much
resemblance to (and are sometimes identical with) standard multigrid methods
described in section 12.1.

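12.2.3. Nonoverlapping Subdomains. Many finite element codes use a decomposition of the domain into nonoverlapping subregions to define an ordering of the unknowns for use with Gaussian elimination. Suppose the domain $\Omega$ is divided into two pieces, $\Omega_1$ and $\Omega_2$, with interface $\Gamma$. Number first the points in the interior of $\Omega_1$ (which do not couple to points in $\Omega_2$), then the points in the interior of $\Omega_2$ (which do not couple to points in $\Omega_1$), and then the points on the interface $\Gamma$. The linear system then takes the form

$$\begin{pmatrix} A_{11} & 0 & A_{1\Gamma} \\ 0 & A_{22} & A_{2\Gamma} \\ A_{\Gamma 1} & A_{\Gamma 2} & A_{\Gamma\Gamma} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_\Gamma \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_\Gamma \end{pmatrix}. \qquad (12.45)$$

If the matrices $A_{11}$ and $A_{22}$ can be inverted easily, then the variables $u_1$ and $u_2$ can be eliminated using Gaussian elimination, leaving a much smaller Schur complement problem to be solved on the interface $\Gamma$:

$$\left(A_{\Gamma\Gamma} - A_{\Gamma 1} A_{11}^{-1} A_{1\Gamma} - A_{\Gamma 2} A_{22}^{-1} A_{2\Gamma}\right) u_\Gamma = f_\Gamma - A_{\Gamma 1} A_{11}^{-1} f_1 - A_{\Gamma 2} A_{22}^{-1} f_2. \qquad (12.46)$$

Once $u_\Gamma$ is known, it can be substituted into (12.45) to obtain $u_1$ and $u_2$.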


The coefficient matrix in (12.46) is small but dense and very expensive to
form. It can be applied to a vector, however, by performing a few sparse matrix-vector multiplications and solving on the subdomains. Hence an iterative
method might be applied to (12.46). This is the idea of iterative substructuring
methods. It is an idea that we have already seen in the solution of the transport
equation in section 9.2. The source iteration described there can be thought
of as an angular domain decomposition method using iterative substructuring
(although the source iteration method was actually developed before these
terms came into widespread use).
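Iterative substructuring can be sketched in a few lines of Python/NumPy. The sketch splits the 2D model problem into two subdomains along the middle grid line; this split is a hypothetical choice, as is the block naming $A_{11}, A_{22}, A_{\Gamma\Gamma}, A_{\Gamma 1}, \ldots$ used in the code. It applies the interface Schur complement to a vector using only subdomain solves, never forming the matrix. It then solves the interface system with unpreconditioned CG and back-substitutes for the interior unknowns. Dense solves stand in for the "easy" subdomain inversions.

```python
import numpy as np

def laplacian_2d(m):
    """5-point Laplacian (scaled by h^-2) on the m-by-m interior grid, h = 1/(m+1)."""
    h = 1.0 / (m + 1)
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    return (np.kron(T, np.eye(m)) + np.kron(np.eye(m), T)) / h**2

def split_by_middle_line(m):
    """Interior of Omega_1, interior of Omega_2, and interface Gamma (middle grid line)."""
    grid = np.arange(m * m).reshape(m, m)    # node (p, q) -> index p*m + q (0-based)
    mid = m // 2
    return grid[:mid].ravel(), grid[mid + 1:].ravel(), grid[mid].ravel()

def cg(matvec, b, tol=1e-10, maxiter=500):
    """Unpreconditioned CG for a symmetric positive definite operator given as matvec."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(maxiter):
        if np.sqrt(rr) <= tol * np.linalg.norm(b):
            break
        Ap = matvec(p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr, rr_old = r @ r, rr
        p = r + (rr / rr_old) * p
    return x

def substructuring_solve(A, f, i1, i2, ig):
    """Eliminate the subdomain interiors, solve the interface Schur complement
    system with CG (matrix never formed), then back-substitute."""
    def blk(rows, cols):
        return A[np.ix_(rows, cols)]
    def S_matvec(x):                          # S x via two subdomain solves
        y = blk(ig, ig) @ x
        for ii in (i1, i2):
            y -= blk(ig, ii) @ np.linalg.solve(blk(ii, ii), blk(ii, ig) @ x)
        return y
    g = f[ig].copy()                          # interface right-hand side
    for ii in (i1, i2):
        g -= blk(ig, ii) @ np.linalg.solve(blk(ii, ii), f[ii])
    u = np.zeros_like(f)
    u[ig] = cg(S_matvec, g)
    for ii in (i1, i2):                       # back-substitution on each subdomain
        u[ii] = np.linalg.solve(blk(ii, ii), f[ii] - blk(ii, ig) @ u[ig])
    return u

if __name__ == "__main__":
    m = 15
    A = laplacian_2d(m)
    f = np.ones(m * m)
    u = substructuring_solve(A, f, *split_by_middle_line(m))
    print("max difference from direct solve:", np.max(np.abs(u - np.linalg.solve(A, f))))
```

Each application of the Schur complement costs two subdomain solves and a few sparse products. This is why, in practice, preconditioning the interface system (rather than forming it) is the central issue.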
As we saw in section 9.2, the simple iterative substructuring method for
that problem was equivalent to a block Gauss-Seidel iteration for the original
linear system. While the transport equation is not elliptic, if the same idea
were applied to a linear system arising from an elliptic differential equation,
the convergence rate would not be independent of the mesh size. To obtain
a convergence rate that is independent of, or only weakly dependent on, the
mesh size, a preconditioner is needed for the system (12.46). Since the actual
matrix in (12.46) is never formed, the standard Jacobi, Gauss-Seidel, SOR,
and incomplete Cholesky-type preconditioners cannot be used.
Many preconditioners for (12.46) have been proposed, and a survey of these
preconditioners is beyond the scope of this book. For a discussion of several
such interface preconditioners, see [123].

Comments and Additional References.


Multigrid methods were first introduced by Fedorenko [49], and an early
analysis was given by Bakhvalov [9]. A seminal paper by Brandt demonstrated
the effectiveness of these techniques for a variety of problems [19]. An excellent
introduction to multigrid methods, without much formal analysis, is given
in a short book by Briggs [21], and additional information can be found in
[18, 77, 78, 84, 98, 141, 146].

Exercises.
12.1. Show that the vectors $z^{(i,j)}$, $i, j = 1, \ldots, m$, with components

defined on an m-by-m grid with spacing h = 1/(m + 1), form an orthonormal set. Show that if $\tilde z_{p,q}$ is equal to $z_{p,q}$ when p and q are both odd and 0 otherwise, then

where $v^{(i,j)}$ is defined in (12.22).


12.2. A work unit is often defined as the number of operations needed to
perform one Gauss-Seidel sweep on the finest grid in a multigrid method.
For a two-dimensional problem, approximately how many work units are
required by (a) a multigrid V-cycle with a presmoothing step (that is,
with a relaxation sweep performed on the fine grid at the beginning of
the cycle as well as at the end), (b) a multigrid W-cycle, and (c) a
full multigrid V-cycle? (You can ignore the work for restrictions and
prolongations.) How do your answers change for a three-dimensional
problem, assuming that each coarser grid still has mesh spacing equal to
twice that of the next finer grid?
12.3. What is the preconditioner M in the iteration (12.5) if $\ell = 1$? Compute the two-grid preconditioner M described in section 12.1.2 for a small model problem. Is it a regular splitting? Are the entries of M close to those of A? (It is unlikely that one would choose this matrix as a
preconditioner for A if one looked only at the entries of A and did not
consider the origin of the problem!)
12.4. In the two-level additive Schwarz method, suppose that each subdomain and coarse grid solve requires $O(p^{3/2})$ work, where p is the number of points in the subdomain or coarse grid. (This is the cost of backsolving with a banded triangular factor for a matrix of order p with bandwidth $p^{1/2}$.) Assume that the number of points on the coarse grid is approximately equal to the number of subdomains. What is the optimal number of subdomains to minimize the total work in applying the preconditioner (12.44), and how much total work is required?
References

[1] R. Alcouffe, A. Brandt, J. Dendy, and J. Painter, The multigrid method for diffusion equations with strongly discontinuous coefficients, SIAM J. Sci. Statist. Comput., 2 (1981), pp. 430-454.
[2] M. Arioli and C. Fassino, Roundoff error analysis of algorithms based on Krylov
subspace methods, BIT, 36 (1996), pp. 189-205.
[3] S. F. Ashby, P. N. Brown, M. R. Dorr, and A. C. Hindmarsh, A linear algebraic analysis of diffusion synthetic acceleration for the Boltzmann transport equation, SIAM J. Numer. Anal., 32 (1995), pp. 179-214.
[4] S. F. Ashby, R. D. Falgout, S. G. Smith, and T. W. Fogwell, Multigrid
preconditioned conjugate gradients for the numerical simulation of groundwater flow
on the Cray T3D, American Nuclear Society Proceedings, Portland, OR, 1995.
[5] S. F. Ashby, T. A. Manteuffel, and P. E. Saylor, A taxonomy for conjugate
gradient methods, SIAM J. Numer. Anal., 27 (1990), pp. 1542-1568.
[6] O. Axelsson, A generalized SSOR method, BIT, 13 (1972), pp. 442-467.
[7] O. Axelsson, Bounds of eigenvalues of preconditioned matrices, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 847-862.
[8] O. Axelsson and H. Lu, On eigenvalue estimates for block incomplete factorization
methods, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1074-1085.
[9] N. S. Bakhvalov, On the convergence of a relaxation method with natural
constraints on the elliptic operator, U.S.S.R. Comput. Math. and Math. Phys., 6
(1966), pp. 101-135.
[10] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution
of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA,
1995.
[11] T. Barth and T. Manteuffel, Variable metric conjugate gradient methods, in
PCG '94: Matrix Analysis and Parallel Computing, M. Natori and T. Nodera, eds.,
Yokohama, 1994.
[12] T. Barth and T. Manteuffel, Conjugate gradient algorithms using multiple
recursions, in Linear and Nonlinear Conjugate Gradient-Related Methods, L. Adams
and J. L. Nazareth, eds., SIAM, Philadelphia, PA, 1996.
[13] R. Beauwens, Approximate factorizations with S/P consistently ordered M-
factors, BIT, 29 (1989), pp. 658-681.
[14] R. Beauwens, Modified incomplete factorization strategies, in Preconditioned
Conjugate Gradient Methods, O. Axelsson and L. Kolotilina, eds., Lecture Notes
in Mathematics 1457, Springer-Verlag, Berlin, New York, 1990, pp. 1-16.
[15] M. W. Benson and P. O. Frederickson, Iterative solution of large sparse systems
arising in certain multidimensional approximation problems, Utilitas Math., 22
(1982), pp. 127-140.
[16] M. Benzi, C. D. Meyer, and M. Tuma, A sparse approximate inverse preconditioner
for the conjugate gradient method, SIAM J. Sci. Comput., 17 (1996), pp. 1135-1149.
[17] M. Benzi and M. Tuma, A sparse approximate inverse preconditioner for
nonsymmetric linear systems, SIAM J. Sci. Comput., to appear.
[18] J. H. Bramble, Multigrid Methods, Longman Scientific and Technical, Harlow,
U.K., 1993.
[19] A. Brandt, Multilevel adaptive solutions to boundary value problems, Math.
Comp., 31 (1977), pp. 333-390.
[20] C. Brezinski, M. Redivo Zaglia, and H. Sadok, Avoiding breakdown and near-
breakdown in Lanczos type algorithms, Numer. Algorithms, 1 (1991), pp. 199-206.
[21] W. L. Briggs, A Multigrid Tutorial, SIAM, Philadelphia, PA, 1987.
[22] P. N. Brown, A theoretical comparison of the Arnoldi and GMRES algorithms,
SIAM J. Sci. Statist. Comput., 12 (1991), pp. 58-78.
[23] P. N. Brown and A. C. Hindmarsh, Matrix-free methods for stiff systems of
ODE's, SIAM J. Numer. Anal., 23 (1986), pp. 610-638.
[24] T. F. Chan and H. C. Elman, Fourier analysis of iterative methods for elliptic
boundary value problems, SIAM Rev., 31 (1989), pp. 20-49.
[25] R. Chandra, Conjugate Gradient Methods for Partial Differential Equations,
Ph.D. dissertation, Yale University, New Haven, CT, 1978.
[26] P. Concus and G. H. Golub, A generalized conjugate gradient method for
nonsymmetric systems of linear equations, in Computing Methods in Applied
Sciences and Engineering, R. Glowinski and J. L. Lions, eds., Lecture Notes in
Economics and Mathematical Systems 134, Springer-Verlag, Berlin, New York, 1976,
pp. 56-65.
[27] P. Concus, G. H. Golub, and D. P. O'Leary, A generalized conjugate gradient
method for the numerical solution of elliptic partial differential equations, in Sparse
Matrix Computations, J. R. Bunch and D. J. Rose, eds., Academic Press, New York,
1976.
[28] J. Cullum, Iterative methods for solving Ax = b, GMRES/FOM versus
QMR/BiCG, Adv. Comput. Math., 6 (1996), pp. 1-24.
[29] J. Cullum and A. Greenbaum, Relations between Galerkin and norm-minimizing
iterative methods for solving linear systems, SIAM J. Matrix Anal. Appl., 17 (1996),
pp. 223-247.
[30] J. Cullum and R. Willoughby, Lanczos Algorithms for Large Symmetric Eigen-
value Computations, Vol. I. Theory, Birkhauser Boston, Cambridge, MA, 1985.
[31] J. Demmel, The condition number of equivalence transformations that block
diagonalize matrix pencils, SIAM J. Numer. Anal., 20 (1983), pp. 599-610.
[32] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users'
Guide, SIAM, Philadelphia, PA, 1979.
[33] J. Drkosova, A. Greenbaum, M. Rozloznik, and Z. Strakos, Numerical stability
of the GMRES method, BIT, 35 (1995), pp. 309-330.
[34] V. Druskin, A. Greenbaum, and L. Knizhnerman, Using nonorthogonal Lanczos
vectors in the computation of matrix functions, SIAM J. Sci. Comput., to appear.
[35] V. Druskin and L. Knizhnerman, Error bounds in the simple Lanczos procedure
for computing functions of symmetric matrices and eigenvalues, Comput. Math.
Math. Phys., 31 (1991), pp. 20-30.
[36] M. Dryja and O. B. Widlund, Some domain decomposition algorithms for elliptic
problems, in Iterative Methods for Large Linear Systems, L. Hayes and D. Kincaid,
eds., Academic Press, San Diego, CA, 1989, pp. 273-291.
[37] T. Dupont, R. P. Kendall, and H. H. Rachford, Jr., An approximate factorization
procedure for solving self-adjoint elliptic difference equations, SIAM J. Numer. Anal.,
5 (1968), pp. 559-573.
[38] M. Eiermann, Fields of values and iterative methods, Linear Algebra Appl., 180
(1993), pp. 167-197.
[39] M. Eiermann, Fields of values and iterative methods, talk presented at Oberwol-
fach meeting on Iterative Methods and Scientific Computing, Oberwolfach, Germany,
April, 1997, to appear.
[40] S. Eisenstat, H. Elman, and M. Schultz, Variational iterative methods for
nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20 (1983), pp. 345-
357.
[41] S. Eisenstat, J. Lewis, and M. Schultz, Optimal block diagonal scaling of block
2-cyclic matrices, Linear Algebra Appl., 44 (1982), pp. 181-186.
[42] L. Elsner, A note on optimal block scaling of matrices, Numer. Math., 44 (1984),
pp. 127-128.
[43] M. Engeli, T. Ginsburg, H. Rutishauser, and E. Stiefel, Refined Iterative Methods
for Computation of the Solution and the Eigenvalues of Self-adjoint Boundary Value
Problems, Birkhauser-Verlag, Basel, Switzerland, 1959.
[44] V. Faber, W. Joubert, M. Knill, and T. Manteuffel, Minimal residual method
stronger than polynomial preconditioning, SIAM J. Matrix Anal. Appl., 17 (1996),
pp. 707-729.
[45] V. Faber and T. Manteuffel, Necessary and sufficient conditions for the existence
of a conjugate gradient method, SIAM J. Numer. Anal., 21 (1984), pp. 352-362.
[46] V. Faber and T. Manteuffel, Orthogonal error methods, SIAM J. Numer. Anal.,
24 (1987), pp. 170-187.
[47] K. Fan, Note on M-matrices, Quart. J. Math. Oxford Ser., 11 (1960), pp. 43-49.
[48] J. Favard, Sur les polynômes de Tchebicheff, C. R. Acad. Sci. Paris, 200 (1935),
pp. 2052-2053.
[49] R. P. Fedorenko, The speed of convergence of one iterative process, U.S.S.R.
Comput. Math. and Math. Phys., 1 (1961), pp. 1092-1096.
[50] B. Fischer, Polynomial Based Iteration Methods for Symmetric Linear Systems,
Wiley-Teubner, Leipzig, 1996.
[51] R. Fletcher, Conjugate gradient methods for indefinite systems, in Proc. Dundee
Biennial Conference on Numerical Analysis. G. A. Watson, ed., Springer-Verlag,
Berlin, New York, 1975.
[52] G. E. Forsythe and E. G. Strauss, On best conditioned matrices, Proc. Amer.
Math. Soc., 6 (1955), pp. 340-345.
[53] R. W. Freund, A transpose-free quasi-minimal residual algorithm for non-
Hermitian linear systems, SIAM J. Sci. Comput., 14 (1993), pp. 470-482.
[54] R. W. Freund and N. M. Nachtigal, QMR: A quasi-minimal residual method for
non-Hermitian linear systems, Numer. Math., 60 (1991), pp. 315-339.
[55] R. Freund and S. Ruscheweyh, On a class of Chebyshev approximation problems
which arise in connection with a conjugate gradient type method, Numer. Math., 48
(1986), pp. 525-542.
[56] E. Giladi, G. H. Golub, and J. B. Keller, Inner and outer iterations for the
Chebyshev algorithm, SCCM-95-10 (1995), Stanford University, Palo Alto. To appear
in SIAM J. Numer. Anal.
[57] G. H. Golub and G. Meurant, Matrices, moments, and quadratures II or how to
compute the norm of the error in iterative methods, BIT, to appear.
[58] G. H. Golub and D. P. O'Leary, Some history of the conjugate gradient and
Lanczos algorithms: 1948-1976, SIAM Rev., 31 (1989), pp. 50-102.
[59] G. H. Golub and M. L. Overton, The convergence of inexact Chebyshev and
Richardson iterative methods for solving linear systems, Numer. Math., 53 (1988),
pp. 571-593.
[60] G. H. Golub and Z. Strakos, Estimates in Quadratic Formulas, Numer. Algo-
rithms, 8 (1994), pp. 241-268.
[61] G. H. Golub and R. S. Varga, Chebyshev semi-iterative methods, successive
overrelaxation iterative methods, and second-order Richardson iterative methods,
parts I and II, Numer. Math., 3 (1961), pp. 147-168.
[62] J. F. Grcar, Analyses of the Lanczos Algorithm and of the Approximation Problem
in Richardson's Method, Ph.D. dissertation, University of Illinois, Urbana, IL, 1981.
[63] A. Greenbaum, Comparison of splittings used with the conjugate gradient algo-
rithm, Numer. Math., 33 (1979), pp. 181-194.
[64] A. Greenbaum, Analysis of a multigrid method as an iterative technique for
solving linear systems, SIAM J. Numer. Anal., 21 (1984), pp. 473-485.
[65] A. Greenbaum, Behavior of slightly perturbed Lanczos and conjugate gradient
recurrences, Linear Algebra Appl., 113 (1989), pp. 7-63.
[66] A. Greenbaum, Estimating the attainable accuracy of recursively computed
residual methods, SIAM J. Matrix Anal. Appl., to appear.
[67] A. Greenbaum, On the role of the left starting vector in the two-sided Lanczos
algorithm, in Proc. Dundee Biennial Conference on Numerical Analysis, 1997, to
appear.
[68] A. Greenbaum and L. Gurvits, Max-min properties of matrix factor norms, SIAM
J. Sci. Comput., 15 (1994), pp. 348-358.
[69] A. Greenbaum, V. Ptak, and Z. Strakos, Any nonincreasing convergence curve is
possible for GMRES, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 465-469.
[70] A. Greenbaum, M. Rozloznik, and Z. Strakos, Numerical behavior of the MGS
GMRES implementation, BIT, 37 (1997), pp. 706-719.
[71] A. Greenbaum and Z. Strakos, Predicting the behavior of finite precision Lanczos
and conjugate gradient computations, SIAM J. Matrix Anal. Appl., 13 (1992),
pp. 121-137.
[72] A. Greenbaum and Z. Strakos, Matrices that generate the same Krylov residual
spaces, in Recent Advances in Iterative Methods, G. Golub, A. Greenbaum, and M.
Luskin, eds., Springer-Verlag, Berlin, New York, 1994, pp. 95-118.
[73] L. Greengard and V. Rokhlin, A Fast Algorithm for Particle Simulations, J.
Comput. Phys., 73 (1987), pp. 325-348.
[74] I. Gustafsson, A class of 1st order factorization methods, BIT, 18 (1978), pp. 142-
156.
[75] M. H. Gutknecht, Changing the norm in conjugate gradient-type algorithms,
SIAM J. Numer. Anal., 30 (1993), pp. 40-56.
[76] M. H. Gutknecht, Solving linear systems with the Lanczos process, Acta Numer-
ica, 6 (1997), pp. 271-398.
[77] W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations, Springer-
Verlag, Berlin, New York, 1994.
[78] W. Hackbusch and U. Trottenberg, Multigrid Methods, Springer-Verlag, Berlin,
New York, 1982.
[79] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear
systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409-435.


[80] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press,
London, U.K., 1985.
[81] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University
Press, London, U.K., 1991.
[82] S. A. Hutchinson, J. N. Shadid, and R. S. Tuminaro, Aztec User's Guide,
SAND95-1559, Sandia National Laboratories, Albuquerque, NM, 1995.
[83] A. Iserles, A First Course in the Numerical Analysis of Differential Equations,
Cambridge University Press, London, U.K., 1996.
[84] D. Jespersen, Multigrid methods for partial differential equations, in Studies
in Numerical Analysis, Studies in Mathematics 24, Mathematical Association of
America, 1984.
[85] W. D. Joubert, A robust GMRES-based adaptive polynomial preconditioning
algorithm for nonsymmetric linear systems, SIAM J. Sci. Comput., 15 (1994),
pp. 427-439.
[86] W. D. Joubert and D. M. Young, Necessary and sufficient conditions for the
simplification of generalized conjugate-gradient algorithms, Linear Algebra Appl.,
88/89 (1987), pp. 449-485.
[87] D. Kershaw, The incomplete Cholesky conjugate gradient method for the iterative
solution of systems of linear equations, J. Comput. Phys., 26 (1978), pp. 43-65.
[88] L. Yu. Kolotilina and A. Yu. Yeremin, Factorized sparse approximate inverse
preconditioning I. Theory, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 45-58.
[89] C. Lanczos, An iteration method for the solution of the eigenvalue problem of
linear differential and integral operators, J. Res. Nat. Bur. Standards, 45 (1950),
pp. 255-282.
[90] C. Lanczos, Solutions of linear equations by minimized iterations, J. Res. Nat.
Bur. Standards, 49 (1952), pp. 33-53.
[91] E. W. Larsen, Unconditionally Stable Diffusion-Synthetic Acceleration Methods
for the Slab Geometry Discrete Ordinates Equations. Part I: Theory, Nuclear Sci.
Engrg., 82 (1982), pp. 47-63.
[92] D. R. McCoy and E. W. Larsen, Unconditionally Stable Diffusion-Synthetic
Acceleration Methods for the Slab Geometry Discrete Ordinates Equations. Part II:
Numerical Results, Nuclear Sci. Engrg., 82 (1982), pp. 64-70.
[93] E. E. Lewis and W. F. Miller, Computational Methods of Neutron Transport,
John Wiley & Sons, New York, 1984.
[94] T. A. Manteuffel, The Tchebychev iteration for nonsymmetric linear systems,
Numer. Math., 28 (1977), pp. 307-327.
[95] T. A. Manteuffel, An incomplete factorization technique for positive definite linear
systems, Math. Comp., 34 (1980), pp. 473-497.
[96] T. Manteuffel, S. McCormick, J. Morel, S. Oliveira, and G. Yang, A fast multigrid
algorithm for isotropic transport problems I: Pure scattering, SIAM J. Sci. Comput.,
16 (1995), pp. 601-635.
[97] T. Manteuffel, S. McCormick, J. Morel, and G. Yang, A fast multigrid algorithm
for isotropic transport problems II: With absorption, SIAM J. Sci. Comput., 17 (1996),
pp. 1449-1475.
[98] S. McCormick, Multigrid Methods, SIAM, Philadelphia, PA, 1987.
[99] J. A. Meijerink and H. A. van der Vorst, An iterative solution method for linear
systems of which the coefficient matrix is a symmetric M-matrix, Math. Comp., 31
(1977), pp. 148-162.
[100] N. Munksgaard, Solving Sparse Symmetric Sets of Linear Equations by Precon-
ditioned Conjugate Gradients, ACM Trans. Math. Software, 6 (1980), pp. 206-219.
[101] N. Nachtigal, A look-ahead variant of the Lanczos algorithm and its application
to the quasi-minimal residual method for non-Hermitian linear systems, Ph.D.
dissertation, Massachusetts Institute of Technology, Cambridge, MA, 1991.
[102] N. M. Nachtigal, S. Reddy, and L. N. Trefethen, How fast are nonsymmetric
matrix iterations?, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 778-795.
[103] N. M. Nachtigal, L. Reichel, and L. N. Trefethen, A hybrid GMRES algorithm for
nonsymmetric linear systems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 796-825.
[104] R. A. Nicolaides, On the L2 convergence of an algorithm for solving finite element
equations, Math. Comp., 31 (1977), pp. 892-906.
[105] A. A. Nikishin and A. Yu. Yeremin, Variable block CG algorithms for solving large
sparse symmetric positive definite linear systems on parallel computers, I: General
iterative scheme, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1135-1153.
[106] Y. Notay, Upper eigenvalue bounds and related modified incomplete factorization
strategies, in Iterative Methods in Linear Algebra, R. Beauwens and P. de Groen,
eds., North-Holland, Amsterdam, 1991, pp. 551-562.
[107] D. P. O'Leary, The block conjugate gradient algorithm and related methods,
Linear Algebra Appl., 29 (1980), pp. 293-322.
[108] C. W. Oosterlee and T. Washio, An evaluation of parallel multigrid as a solver
and a preconditioner for singular perturbed problems, Part I. The standard grid
sequence, SIAM J. Sci. Comput., to appear.
[109] C. C. Paige, Error Analysis of the Lanczos Algorithm for Tridiagonalizing a
Symmetric Matrix, J. Inst. Math. Appl., 18 (1976), pp. 341-349.
[110] C. C. Paige, Accuracy and Effectiveness of the Lanczos Algorithm for the
Symmetric Eigenproblem, Linear Algebra Appl., 34 (1980), pp. 235-258.
[111] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear
equations, SIAM J. Numer. Anal., 11 (1974), pp. 197-209.
[112] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood
Cliffs, NJ, 1980.
[113] B. N. Parlett, D. R. Taylor, and Z. A. Liu, A look-ahead Lanczos algorithm for
unsymmetric matrices, Math. Comp., 44 (1985), pp. 105-124.
[114] C. Pearcy, An elementary proof of the power inequality for the numerical radius,
Michigan Math. J., 13 (1966), pp. 289-291.
[115] J. K. Reid, On the method of conjugate gradients for the solution of large sparse
linear systems, in Large Sparse Sets of Linear Equations, J. K. Reid, ed., Academic
Press, New York, 1971.
[116] Y. Saad, Preconditioning techniques for nonsymmetric and indefinite linear
systems, J. Comput. Appl. Math., 24 (1988), pp. 89-105.
[117] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Pub. Co., Boston,
MA, 1996.
[118] Y. Saad and A. Malevsky, PSPARSLIB: A portable library of distributed memory
sparse iterative solvers, in Proc. Parallel Computing Technologies (PaCT-95), 3rd
International Conference, V. E. Malyshkin, et al., ed., St. Petersburg, 1995.
[119] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm
for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986),
pp. 856-869.
[120] H. A. Schwarz, Gesammelte Mathematische Abhandlungen, Vol. 2, Springer,
Berlin, 1890, pp. 133-143 (first published in Vierteljahrsschrift Naturforsch. Ges.
Zurich, 15 (1870), pp. 272-286).
[121] H. D. Simon, The Lanczos algorithm with partial reorthogonalization, Math.
Comp., 42 (1984), pp. 115-136.
[122] G. L. G. Sleijpen, H. A. Van der Vorst, and J. Modersitzki, The Main Effects
of Rounding Errors in Krylov Solvers for Symmetric Linear Systems, Preprint 1006,
Universiteit Utrecht, The Netherlands, 1997.
[123] B. Smith, P. Bjorstad, and W. Gropp, Domain Decomposition. Parallel Multilevel
Methods for Elliptic Partial Differential Equations, Cambridge University Press,
London, U.K., 1996.
[124] P. Sonneveld, CGS, a fast Lanczos-type solver for nonsymmetric linear systems,
SIAM J. Sci. Statist. Comput., 10 (1989), pp. 36-52.
[125] G. Strang and G. J. Fix, An Analysis of the Finite Element Method, Prentice-
Hall, Englewood Cliffs, NJ, 1973.
[126] Z. Strakos, On the real convergence rate of the conjugate gradient method, Linear
Algebra Appl., 154/156 (1991), pp. 535-549.
[127] K. C. Toh, GMRES vs. ideal GMRES, SIAM J. Matrix Anal. Appl., 18 (1997),
pp. 30-36.
[128] C. H. Tong, A Comparative Study of Preconditioned Lanczos Methods for
Nonsymmetric Linear Systems, Sandia report SAND91-8240, 1992.
[129] L. N. Trefethen, Approximation theory and numerical linear algebra, in Algo-
rithms for Approximation II, J. Mason and M. Cox, eds., Chapman and Hall, London,
U.K., 1990.
[130] R. Underwood, An Iterative Block Lanczos Method for the Solution of Large
Sparse Symmetric Eigenproblems, Technical report STAN-CS-75-496, Computer
Science Department, Stanford University, Stanford, CA, 1975.
[131] A. van der Sluis, Condition numbers and equilibration matrices, Numer. Math.,
14 (1969), pp. 14-23.
[132] A. van der Sluis and H. A. van der Vorst, The rate of convergence of conjugate
gradients, Numer. Math., 48 (1986), pp. 543-560.
[133] H. A. van der Vorst, The convergence behavior of preconditioned CG and CG-S
in the presence of rounding errors, in Preconditioned Conjugate Gradient Methods,
O. Axelsson and L. Kolotilina, eds., Lecture Notes in Mathematics 1457, Springer-
Verlag, Berlin, New York, 1990.
[134] H. A. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of
Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Comput., 13
(1992), pp. 631-644.
[135] R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ,
1962.
[136] R. S. Varga, Factorization and normalized iterative methods, in Boundary
Problems in Differential Equations, R. E. Langer, ed., 1960, pp. 121-142.
[137] P. Vinsome, Orthomin, an iterative method for solving sparse sets of simultane-
ous linear equations, in Proc. 4th Symposium on Numerical Simulation of Reservoir
Performance, Society of Petroleum Engineers, 1976, pp. 149-159.
[138] V. V. Voevodin, The problem of a non-self adjoint generalization of the conjugate
gradient method has been closed, U.S.S.R. Comput. Math. and Math. Phys., 23
(1983), pp. 143-144.
[139] H. F. Walker, Implementation of the GMRES method using Householder trans-
formations, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 152-163.
[140] R. Weiss, Convergence Behavior of Generalized Conjugate Gradient Methods,
Ph.D. dissertation, University of Karlsruhe, Karlsruhe, Germany, 1990.
[141] P. Wesseling, An Introduction to Multigrid Methods, Wiley, Chichester, U.K.,
1992.
[142] O. Widlund, A Lanczos method for a class of nonsymmetric systems of linear
equations, SIAM J. Numer. Anal., 15 (1978), pp. 801-812.
[143] H. Wozniakowski, Roundoff error analysis of a new class of conjugate gradient
algorithms, Linear Algebra Appl., 29 (1980), pp. 507-529.
[144] D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, New
York, 1971.
[145] D. M. Young and K. C. Jea, Generalized conjugate gradient acceleration of
nonsymmetrizable iterative methods, Linear Algebra Appl., 34 (1980), pp. 159-194.
[146] H. Yserentant, Old and new convergence proofs for multigrid methods, Acta
Numerica, 2 (1993), pp. 285-326.
Index

Adjoint, see Matrix, Adjoint
Aggregation methods, 184-187
Approximation theory
  Complex, 55
  Real, 4, 5, 51
Arnoldi algorithm, 38, 41, 61, 94

B-adjoint, see Matrix, B-adjoint
B-normal matrix, 100-102
Backward error, 110
Barth, T., 102
BiCG, 5, 77-80, 88-91, 95
  Error estimation, 108
  Finite precision arithmetic
    Accuracy, 109-111
  Norm in which optimal, 102
  Preconditioners for, 120
  QMR, relation to, 84-88
  When to use, 92
BiCGSTAB, 5, 90-91, 94
  Algorithm 6, 91
  Preconditioners for, 120
    Incomplete LU, 172
    Multigrid, 197
  Transport equation, 142-144
  When to use, 92-94
Biconjugate gradient algorithm, see BiCG
Biconjugate vectors, 95, 102, 103
Biorthogonal vectors, 77-80, 91, 102, 103
Block 2-cyclic matrix, 165-167
Block CG, 113-114, 116
  Algorithm 8, 114
Block Gauss-Seidel iteration, see Gauss-Seidel iteration, Block
Block-TST matrix, 130-134

Cauchy integral, 57
Cauchy interlace theorem, 18, 68
Cauchy matrix, 1
CG, 3-5, 25, 33-37, 46
  Algorithm 2, 35
  Block, see Block CG
  Error bounds, 49-54, 59
  Error estimation, 108-109
  Finite precision arithmetic
    Accuracy, 109-110
    Convergence rate, 4, 61-75
    ICCG vs. MICCG, 75, 181
  Lanczos, relation to, 41-47, 107
  MINRES, relation to, 94
  Model problem, 150, 155
  Normal equations, 105
  Preconditioned, 120-121
    Algorithm 2P, 121
  Preconditioners for, 119, 147, 169
    Additive Schwarz, 201
    Diagonal, 169, 174, 181
    ICCG, 171, 174, 181, 195
    MICCG, 181, 195-196
    Multigrid, 197
    Regular splittings, 163-165
  Recurrence formula, 98
  When to use, 92, 96
CGNE, 105-107
  Algorithm 7, 105
  Finite precision arithmetic
    Accuracy, 109-112
  When to use, 92-96
CGNR, 105-107, 116
  Algorithm 7, 105
  Finite precision arithmetic
    Accuracy, 109-112
CGS, 5, 88-90, 94
  Finite precision arithmetic
    Accuracy, 109-111
  Preconditioners for, 120
  When to use, 92-96
Chandra, R., 46
Characteristic polynomial, 75
Chebyshev error bound, 51-52, 75, 165, 181
Chebyshev iteration, 92
Chebyshev polynomial, 51-53, 57, 59, 68, 70, 108, 119
Cholesky decomposition, 15, 119, 171
  Incomplete, see Incomplete Cholesky decomposition
  Modified incomplete, see Modified incomplete Cholesky decomposition
Companion matrix, 56
Comparison matrix, 172
Condition number, 109-113
  Estimation of, 107-109
  Preconditioned matrix, 163-169, 175-182
Conjugate gradient algorithm, see CG
Conjugate gradient method, s-term, 99-101, 103
Conjugate gradient squared algorithm, see CGS
Conjugate residual method, 46
Cullum, J., 74

Demmel, J., 167, 168
Departure from normality, 16
Diagonalizable matrix, 13, 54
Diagonally dominant matrix, 18, 128, 129, 139
Diamond differencing, see Finite differences
Diffusion equation, 125-129, 162, 168, 169, 171, 174, 175, 196
  Discontinuous diffusion coefficient, 196
Diffusion synthetic acceleration (DSA), 123, 143-144
Discrete ordinates, see Finite differences
Domain decomposition methods, 6, 183, 197-203
  Additive Schwarz, 197, 200-203
  Alternating Schwarz, 198
  Iterative substructuring, 203
  Multilevel methods, 202
  Multiplicative Schwarz, 197-201
  Nonoverlapping domains, 197, 203
  Overlapping domains, 197
  Parallelization, 115, 197, 201
  Substructuring, 197
  Two-level method, 202
Dryja, M., 202
Dupont, T., 175

Eiermann, M., 32
Eigenvalues, 16-22
  Error bounds involving, 4-6, 50-56
Eisenstat, S., 46, 165
Elman, H., 46
Elsner, L., 165
Engeli, M., 74
Error bound
  Nonsharp, 6, 31, 50-55, 58
  Sharp, 4, 5, 26, 49-51, 55, 116, 182

Faber and Manteuffel result, 5, 97-102
Faber polynomial, 32, 57
Faber, V., 5, 58, 97-101
Fan, K., 172
Fast Fourier transform, 134
Fast multipole method, 1
Fast Poisson solver, 134
Favard's theorem, 73
Favard, J., 73
Field of values, 16-22, 30-33, 46
  generalized, 22, 47
  conical, 22
Finite differences, 125-127, 135, 169, 183, 193, 198, 202
  5-point approximation, 129, 154, 168-169, 171, 174, 175, 179, 187
  9-point approximation, 129
  Diamond differencing, 136
  Discrete ordinates, 135
Finite elements, 129, 169, 176, 183, 193, 198, 202, 203
Finite precision arithmetic
  Attainable accuracy, 109-113
  CG and MINRES
    Convergence rate, 4, 61-75
  Lanczos algorithm, 61-75
Floating point operations, 92-94, 115
FOM, 94
Forsythe, G. E., 165
Fourier analysis, 139-140, 182
Freund, R. W., 94
Full orthogonalization method, see FOM

Gauss quadrature, 135-136
Gauss-Seidel iteration, 26, 46, 147-155, 201
  Block, 134, 149, 203
    Transport equation, 138-139
  Convergence rate, 163
  Overlapping block, 200
  Red-black, 195-196
Gaussian elimination, 138
  Operation count, 1, 171
Generalized minimal residual algorithm, see GMRES
Gerschgorin's theorem, 17-18, 128
Givens rotations, 15, 39-41, 82, 84-85
GMRES, 5-6, 25, 37-41, 46, 77
  Algorithm 3, 41
  Error bounds, 49, 54-58
  Error estimation, 108
  Finite precision arithmetic, 61
  FOM, relation to, 94
  Hybrid, 5
  Preconditioners for, 120
    Additive Schwarz, 201, 202
    Incomplete LU, 172
    Multigrid, 197
    Multiplicative Schwarz, 201
  QMR, comparison with, 81, 94-95, 97
  Restarted, 5, 41, 47, 77
  Transport equation, 142-144
  When to use, 92, 95-96
Gram-Schmidt method, 8, 77
Graph of a matrix, 158
Greenbaum, A., 68
Gustafsson, I., 175, 178-180

H-matrix, 172
Hermitian matrix, 3-5
Hermitian part, 19, 31, 46
Hestenes, M. R., 46, 74
Horn, R. A., 22
Householder reflections, 15, 81-82, 84-85

Incomplete Cholesky decomposition, 123, 171-178, 182
  Parallelizing, 175
Incomplete LU decomposition, 15, 172-174, 182
Incomplete QR decomposition, 182
Induced norm, see Norms, Matrix, Induced
Inner products, 7-8, 97-102
  Parallelization, 115
Irreducible matrix, 129, 158-160

Jacobi iteration, 25, 46, 147-155, 201
  Block, 149
  Convergence rate, 163
  Damped, 195, 197
  Overlapping block, 200
Jacobian, 88, 123
Jea, K. C., 46
Johnson, C. R., 22
Jordan form, 13-14, 21, 29, 46

Krylov space
  Approximation properties, 2
    Hermitian, 3-5
    Non-Hermitian, 3, 5-6
  Definition of, 2
  Improvement of, 2

Lanczos algorithm, 41-42, 46, 94, 107, 116
  Finite precision arithmetic, 61-75
  Preconditioned, 121
  Two-sided, 77-88, 103
    Look-ahead, 77, 79
    Regular termination, 79
    Serious breakdown, 79
Lanczos, C., 46
Least squares, 15, 39-41, 43, 61
Lewis, J., 165
Local couplings, 176-179
LU decomposition, 15, 45, 46, 80, 171
  Incomplete, see Incomplete LU decomposition

M-matrix, 122, 161-163, 172-174, 179
Manteuffel, T., 5, 97-102
Manteuffel, T. A., 172
Matrix
  Adjoint, 99
  B-adjoint, 100
  B-normal, 100-102
  Block-TST, 130-134
  Cauchy, 1
  Companion, 56
  Diagonalizable, 13, 54
  Diagonally dominant, 18, 128, 129, 139
  Graph, see Graph of a matrix
  Hermitian, 3-5
  Irreducible, 129, 158-160
  Non-Hermitian, 3, 5-6, 125
  Nondiagonalizable, 46
  Nonnegative, 156, 158-160
  Nonnormal, 6, 28, 49, 55-58
  Normal, 5, 13, 19, 28, 47, 49, 54, 58, 96, 100-102
  Orthogonal, 8
  Positive, 157-158
  Positive definite, 18, 58, 122, 125
  Reducible, 159
  Skew-Hermitian, 96, 103
  Sparse, 1, 171, 179
  TST, 130-133
  Unitary, 8, 81, 84
  Unitary shift, 95
Matrix completion problem, 66-70
Matrix exponential, 74
Matrix Splittings, see Preconditioners
Matrix-vector multiplication, 92-95, 123
  Operation count, 1, 92
  Parallelization, 115
Maximum principle
  Discrete, 129
Meijerink, J. A., 172-175
Minimal polynomial, 95, 99-101
Minimal residual algorithm, see MINRES
Minimax polynomial, 51, 70
MINRES, 3-5, 25, 35, 37-38, 46
  Algorithm 4, 44
  CG, relation to, 94
  Error bounds, 49-54, 59
  Finite precision arithmetic
    Accuracy, 109
    Convergence rate, 4, 61-75
  Lanczos, relation to, 41-46, 107
  Preconditioned, 121-122
    Algorithm 4P, 122
  Preconditioners for, 119, 147
    Additive Schwarz, 201
    Diagonal, 169
    Regular splittings, 163-165
  Recurrence formula, 98
  When to use, 92, 96
Model problem, 134, 149, 175, 179-183, 187-192, 195-196, 201
Modified Gram-Schmidt method, 8, 15, 38, 61
Modified incomplete Cholesky decomposition, 123, 124, 175-182
Multigrid methods, 6, 26, 92, 124, 134, 183-197
  Algebraic, 183
  Coarse grid correction, 184
  Contraction number, 184, 192
  Full multigrid V-cycle, 194, 195, 197
  Prolongation matrix, 188, 195-197
  Relaxation sweep, 184, 195-197
  Restriction matrix, 195-197
  Transport equation, 145
  Two-grid method, 187-192
  V-cycle, 193-196
  W-cycle, 194, 195

Nachtigal, N. M., 94
Natural ordering, 127-129, 154
Non-Hermitian Lanczos algorithm, see Lanczos algorithm, Two-sided
Non-Hermitian matrix, 3, 5-6, 125
Nondiagonalizable matrix, 46
Nonnegative matrix, 156, 158-160
Nonnormal matrix, 6, 28, 49, 55-58
Normal equations, 92, 105-107
Normal matrix, 5, 13, 19, 28, 47, 49, 54, 58, 96, 100-102
Norms
  Matrix, 9-12
    Frobenius, 9, 157
    Induced, 9-11, 26, 107
  Vector, 7-8, 12, 97-102, 107-109
Numerical radius, 18, 21-22, 32
Numerical range, see Field of values

Ordering
  Gaussian elimination, 203
  Incomplete Cholesky decomposition, 175
  Natural, 127-129, 154
  Red-black, 129, 154, 175
Orthodir, 37-38, 46
  Recurrence formula, 97
Orthogonal matrix, 8
Orthogonal polynomials, 71-75
Orthogonal projector, 185
Orthogonal vectors, 8-9
Orthomin, 25, 29-37, 46, 77, 90
  Recurrence formula, 97-98
  Transport equation, 142-144
Ostrowski, 161

Paige, C. C., 45, 46, 64, 67, 68, 75
Parallel processors, 3, 115-116, 175
Perron root, 158
Perron vector, 158
Perron-Frobenius theorem, 156-160, 172
Poisson's equation, 129-134, 149, 154-155
Positive definite matrix, 18, 58, 122, 125
Positive matrix, 157-158
Power inequality, 21, 56
Preconditioned algorithms, 119-124
  Algorithm 2P, 121
  Algorithm 4P, 122
  Finite precision arithmetic
    Accuracy, 112-113
  Implementation of, 2
  Lanczos, 121
Preconditioners
  Additive Schwarz, 201
  Comparison theorems, 123, 147, 160-169
  Definition of, 2
  Derivation of, 6
  Domain decomposition, 6, 123
    Parallelization, 115
  DSA, 123, 143-144
  Fast Poisson solver, 134
  Gauss-Seidel, 122, 148, 163
    Block, 149, 162
  Hermitian indefinite, 3, 29
  Incomplete Cholesky, 123
  Interface, 203
  Jacobi, 122, 147, 163
    Block, 148
  Modified incomplete Cholesky, 123, 124
  Multigrid, 6, 26, 92, 123, 124, 134, 183, 197
  Multiplicative Schwarz, 201
  Optimal, 123, 165-169
  Positive definite, 3, 29
  Properties of, 25, 119-120
  SOR, 123, 148, 163
    Block, 149
  Sparse approximate inverse, 182
    Parallelization, 115
  SSOR, 149, 169
  Symmetric Gauss-Seidel, 149
  Types of, 122-123
Preconditioning
  Left, 119
  Right, 119
  Symmetric, 119
Principal vectors, 13
Projections, 8-9, 30, 33, 35
Property A, 153-154, 165
Pseudo-eigenvalues, 57-58

QMR, 5, 80-83, 88, 94
  Algorithm 5, 83
  BiCG, relation to, 80, 84-88
  Error estimation, 108
  GMRES, comparison with, 94-95, 97
  Norm in which optimal, 102, 103
  Preconditioners for, 120
    Incomplete LU, 172
    Multigrid, 197
  Quasi-residual, 85-88
  Transport equation, 142-144
  Transpose-free, 92
  When to use, 92-94
QR decomposition, 15-16, 39-41, 82, 171
  Incomplete, see Incomplete QR decomposition
Quasi-minimal residual algorithm, see QMR

Red-black ordering, 129, 154, 175
Reducible matrix, 159
Regular splittings, 156, 160-165, 173-174, 177
Resolvent, 57
Rounding errors, see Finite precision arithmetic

Saad, Y., 46
Saunders, M. A., 45, 46
Schultz, M. H., 46, 165
Schur complement, 138, 203
Schur form, 11, 14-16
Schwarz, H. A., 198
Simple iteration, 25-29, 46, 77, 147-149, 183-184, 200
  Algorithm 1, 26
  Convergence rate, 28, 46, 156
  Preconditioners for, 119
  Transport equation, see Source iteration
  When to use, 92
Singular value decomposition, 16
Skew-Hermitian matrix, 96, 103
Sonneveld, P., 94
SOR, 26, 46, 147-155, 182
  Block, 149
  Optimal ω, 149-150, 152-153, 155
Source iteration, 138-144
Sparse matrix, 1, 171, 179
Spectral radius, 11-12, 27-29, 157-160
  Gauss-Seidel, 150-155
  Jacobi, 149-155
  SOR, 149-155
  SSOR, 149
Steepest descent, 25, 29-31, 33, 74, 187
Stiefel, E., 46, 74
Stieltjes matrix, 162
Strauss, E. G., 165
Successive overrelaxation, see SOR
Symmetric SOR, see SSOR
SYMMLQ, 46

Templates, 7
Toeplitz-Hausdorff theorem, 18
Toh, K. C., 58, 59
Transport equation, 6, 123, 125, 134-145, 162, 203
  Multigroup, 169
Trefethen, L. N., 57
TST-matrix, 130-133
Two-sided Lanczos algorithm, see Lanczos algorithm, Two-sided

Unitary matrix, 8, 81, 84
Unitary shift matrix, 95

Van der Sluis, A., 167
Van der Vorst, H. A., 94, 172-175
Varga, R. S., 46, 160, 172
Vinsome, P., 46

Widlund, O., 202
Willoughby, R., 74
Wozniakowski, H., 74

Young, D. M., 46, 149

Zero row sum property, 176-179
