Matrix Norms
Tom Lyche
Centre of Mathematics for Applications,
Department of Informatics,
University of Oslo
September 28, 2009
Matrix Norms
We consider matrix norms on $(\mathbb{C}^{m,n}, \mathbb{C})$. All results hold for $(\mathbb{R}^{m,n}, \mathbb{R})$ as well.
Definition (Matrix Norms)
A function $\|\cdot\|\colon \mathbb{C}^{m,n} \to \mathbb{R}$ is called a matrix norm on $\mathbb{C}^{m,n}$ if for all $A, B \in \mathbb{C}^{m,n}$ and all $\alpha \in \mathbb{C}$:
1. $\|A\| \ge 0$, with equality if and only if $A = 0$. (positivity)
2. $\|\alpha A\| = |\alpha|\,\|A\|$. (homogeneity)
3. $\|A + B\| \le \|A\| + \|B\|$. (subadditivity)
A matrix norm is simply a vector norm on the finite-dimensional vector space $(\mathbb{C}^{m,n}, \mathbb{C})$ of $m \times n$ matrices.
Equivalent norms
Adapting some general results on vector norms to matrix norms gives:
Theorem
1. All matrix norms are equivalent. Thus, if $\|\cdot\|$ and $\|\cdot\|'$ are two matrix norms on $\mathbb{C}^{m,n}$, then there are positive constants $\mu$ and $M$ such that $\mu\|A\| \le \|A\|' \le M\|A\|$ holds for all $A \in \mathbb{C}^{m,n}$.
2. A matrix norm is a continuous function $\|\cdot\|\colon \mathbb{C}^{m,n} \to \mathbb{R}$.
Examples
▶ From any vector norm $\|\cdot\|_V$ on $\mathbb{C}^{mn}$ we can define a matrix norm on $\mathbb{C}^{m,n}$ by $\|A\| := \|\mathrm{vec}(A)\|_V$, where $\mathrm{vec}(A) \in \mathbb{C}^{mn}$ is the vector obtained by stacking the columns of $A$ on top of each other.
▶ Three such norms, corresponding to $p = 1, 2, \infty$:
$$\|A\|_S := \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|, \qquad p = 1, \text{ Sum norm},$$
$$\|A\|_F := \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}, \qquad p = 2, \text{ Frobenius norm},$$
$$\|A\|_M := \max_{i,j} |a_{ij}|, \qquad p = \infty, \text{ Max norm}. \tag{1}$$
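All three norms in (1) are elementwise and easy to evaluate. A minimal numerical sketch, assuming NumPy is available:

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])

sum_norm = np.abs(A).sum()                   # ||A||_S: sum of all |a_ij|
frobenius = np.sqrt((np.abs(A) ** 2).sum())  # ||A||_F
max_norm = np.abs(A).max()                   # ||A||_M: largest |a_ij|

# ||A||_F agrees with NumPy's built-in Frobenius norm.
assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))
print(sum_norm, frobenius, max_norm)         # 10.0  5.477...  4.0
```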
The Frobenius Matrix Norm 1.
▶ $\|A^H\|_F^2 = \sum_{j=1}^{n}\sum_{i=1}^{m} |a_{ij}|^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2 = \|A\|_F^2.$
The Frobenius Matrix Norm 2.
▶ $\|A\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{m} \|a_{i\cdot}\|_2^2$ (sum of squared row norms),
▶ $\|A\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2 = \sum_{j=1}^{n}\sum_{i=1}^{m} |a_{ij}|^2 = \sum_{j=1}^{n} \|a_{\cdot j}\|_2^2$ (sum of squared column norms).
Unitary Invariance.
▶ If $A \in \mathbb{C}^{m,n}$ and $U \in \mathbb{C}^{m,m}$, $V \in \mathbb{C}^{n,n}$ are unitary, then by 2.,
$\|UA\|_F^2 = \sum_{j=1}^{n} \|Ua_{\cdot j}\|_2^2 = \sum_{j=1}^{n} \|a_{\cdot j}\|_2^2 = \|A\|_F^2$,
▶ and by 1., $\|AV\|_F = \|V^H A^H\|_F = \|A^H\|_F = \|A\|_F$.
Submultiplicativity
▶ Suppose $A$, $B$ are rectangular matrices so that the product $AB$ is defined. By the Cauchy-Schwarz inequality,
$$\|AB\|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{k} |a_{i\cdot}^T b_{\cdot j}|^2 \le \sum_{i=1}^{n}\sum_{j=1}^{k} \|a_{i\cdot}\|_2^2\,\|b_{\cdot j}\|_2^2 = \|A\|_F^2\,\|B\|_F^2.$$
Subordinance
▶ $\|Ax\|_2 \le \|A\|_F\,\|x\|_2$ for all $x \in \mathbb{C}^n$.
▶ Since $\|v\|_F = \|v\|_2$ for a vector, this follows from submultiplicativity.
Explicit Expression
▶ Let $A \in \mathbb{C}^{m,n}$ have singular values $\sigma_1, \ldots, \sigma_n$ and SVD $A = U\Sigma V^H$. Then, by unitary invariance (3.),
$$\|A\|_F = \|U^H A V\|_F = \|\Sigma\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2}.$$
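A minimal check of this identity, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

sigma = np.linalg.svd(A, compute_uv=False)   # singular values of A
# ||A||_F equals the Euclidean norm of the vector of singular values.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((sigma ** 2).sum()))
```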
Consistency
▶ A matrix norm is called consistent on $\mathbb{C}^{n,n}$ if
4. $\|AB\| \le \|A\|\,\|B\|$ (submultiplicativity)
holds for all $A, B \in \mathbb{C}^{n,n}$.
▶ A matrix norm is consistent if it is defined on $\mathbb{C}^{m,n}$ for all $m, n \in \mathbb{N}$, and 4. holds for all matrices $A$, $B$ for which the product $AB$ is defined.
▶ The Frobenius norm is consistent.
▶ The Sum norm is consistent.
▶ The Max norm is not consistent.
▶ The norm $\|A\| := \sqrt{mn}\,\|A\|_M$, $A \in \mathbb{C}^{m,n}$, is consistent.
Subordinate Matrix Norm
Definition
▶ Suppose $m, n \in \mathbb{N}$ are given.
▶ Let $\|\cdot\|_\alpha$ on $\mathbb{C}^m$ and $\|\cdot\|_\beta$ on $\mathbb{C}^n$ be vector norms, and let $\|\cdot\|$ be a matrix norm on $\mathbb{C}^{m,n}$.
▶ We say that the matrix norm $\|\cdot\|$ is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ if $\|Ax\|_\alpha \le \|A\|\,\|x\|_\beta$ for all $A \in \mathbb{C}^{m,n}$ and all $x \in \mathbb{C}^n$.
▶ If $\|\cdot\|_\alpha = \|\cdot\|_\beta$ we say that $\|\cdot\|$ is subordinate to $\|\cdot\|_\alpha$.
▶ The Frobenius norm is subordinate to the Euclidean vector norm.
▶ The Sum norm is subordinate to the $l_1$-norm.
▶ $\|Ax\|_\infty \le \|A\|_M\,\|x\|_1$.
Operator Norm
Definition
Suppose $m, n \in \mathbb{N}$ are given and let $\|\cdot\|_\alpha$ be a vector norm on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ a vector norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m,n}$ we define
$$\|A\| := \|A\|_{\alpha,\beta} := \max_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \tag{2}$$
We call this the $(\alpha, \beta)$ operator norm, the $(\alpha, \beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.
Observations
▶ $\|A\|_{\alpha,\beta} = \max_{x \notin \ker(A)} \frac{\|Ax\|_\beta}{\|x\|_\alpha} = \max_{\|x\|_\alpha = 1} \|Ax\|_\beta.$
▶ $\|Ax\|_\beta \le \|A\|\,\|x\|_\alpha.$
▶ $\|A\|_{\alpha,\beta} = \|Ax^*\|_\beta$ for some $x^* \in \mathbb{C}^n$ with $\|x^*\|_\alpha = 1$.
▶ The operator norm is a matrix norm on $\mathbb{C}^{m,n}$.
▶ The Sum norm and the Frobenius norm are not $\alpha$ operator norms for any $\alpha$.
Operator norm Properties
▶ The operator norm is a matrix norm on $\mathbb{C}^{m,n}$.
▶ The operator norm is consistent if the vector norm $\|\cdot\|_\alpha$ is defined for all $m \in \mathbb{N}$ and $\|\cdot\|_\beta = \|\cdot\|_\alpha$.
Proof
In 2. and 3. below we take the max over the unit sphere $S_\alpha = \{x : \|x\|_\alpha = 1\}$.
1. Nonnegativity is obvious. If $\|A\| = 0$ then $\|Ay\|_\beta = 0$ for each $y \in \mathbb{C}^n$. In particular, each column $Ae_j$ of $A$ is zero. Hence $A = 0$.
2. $\|cA\| = \max_x \|cAx\|_\beta = \max_x |c|\,\|Ax\|_\beta = |c|\,\|A\|.$
3. $\|A + B\| = \max_x \|(A + B)x\|_\beta \le \max_x \|Ax\|_\beta + \max_x \|Bx\|_\beta = \|A\| + \|B\|.$
4. $\|AB\| = \max_{Bx \ne 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha} = \max_{Bx \ne 0} \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha}\,\frac{\|Bx\|_\alpha}{\|x\|_\alpha} \le \max_{y \ne 0} \frac{\|Ay\|_\alpha}{\|y\|_\alpha}\; \max_{x \ne 0} \frac{\|Bx\|_\alpha}{\|x\|_\alpha} = \|A\|\,\|B\|.$
The p matrix norm
▶ The operator norms $\|\cdot\|_p$ defined from the $p$-vector norms are of special interest.
▶ Recall
$\|x\|_p := \big(\sum_{j=1}^{n} |x_j|^p\big)^{1/p}$, $p \ge 1$, and $\|x\|_\infty := \max_{1 \le j \le n} |x_j|$.
▶ Used quite frequently for $p = 1, 2, \infty$.
▶ We define for any $1 \le p \le \infty$
$$\|A\|_p := \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|y\|_p = 1} \|Ay\|_p. \tag{3}$$
▶ The $p$-norms are consistent matrix norms which are subordinate to the $p$-vector norm.
Explicit expressions
Theorem
For $A \in \mathbb{C}^{m,n}$ we have
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{k=1}^{m} |a_{k,j}| \quad \text{(max column sum)},$$
$$\|A\|_2 = \sigma_1 \quad \text{(largest singular value of } A\text{)},$$
$$\|A\|_\infty = \max_{1 \le k \le m} \sum_{j=1}^{n} |a_{k,j}| \quad \text{(max row sum)}. \tag{4}$$
The quantity $\|A\|_2$ is called the two-norm or the spectral norm of $A$. The explicit expression follows from the minmax theorem for singular values.
Examples
For $A := \frac{1}{15}\begin{bmatrix} 14 & 4 & 16 \\ 2 & 22 & 13 \end{bmatrix}$ we find
▶ $\|A\|_1 = \frac{29}{15}$,
▶ $\|A\|_2 = 2$,
▶ $\|A\|_\infty = \frac{37}{15}$,
▶ $\|A\|_F = \sqrt{5}$.
For $A := \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ we find
▶ $\|A\|_1 = 6$,
▶ $\|A\|_2 \approx 5.4650$,
▶ $\|A\|_\infty = 7$,
▶ $\|A\|_F = \sqrt{30} \approx 5.4772$.
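These values are easy to confirm numerically; a minimal check with NumPy, whose matrix `norm` implements exactly the expressions in (4):

```python
import numpy as np

A = np.array([[14, 4, 16], [2, 22, 13]]) / 15
print(np.linalg.norm(A, 1))       # 29/15: max column sum
print(np.linalg.norm(A, 2))       # 2: largest singular value
print(np.linalg.norm(A, np.inf))  # 37/15: max row sum
print(np.linalg.norm(A, 'fro'))   # sqrt(5)

B = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.norm(B, 2))       # ~5.4650
```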
The 2 norm
Theorem
Suppose $A \in \mathbb{C}^{n,n}$ has singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Then
$$\|A\|_2 = \sigma_1 \quad \text{and} \quad \|A^{-1}\|_2 = \frac{1}{\sigma_n}, \tag{5}$$
$$\|A\|_2 = \lambda_1 \quad \text{and} \quad \|A^{-1}\|_2 = \frac{1}{\lambda_n}, \quad \text{if } A \text{ is symmetric positive definite}, \tag{6}$$
$$\|A\|_2 = |\lambda_1| \quad \text{and} \quad \|A^{-1}\|_2 = \frac{1}{|\lambda_n|}, \quad \text{if } A \text{ is normal}. \tag{7}$$
For the norms of $A^{-1}$ we assume of course that $A$ is nonsingular.
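A small sanity check of (5) and (7), assuming NumPy; the normal matrix used here is a random real symmetric one (generically nonsingular for this seed):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                        # symmetric, hence normal

sigma = np.linalg.svd(A, compute_uv=False)   # descending
lam = np.sort(np.abs(np.linalg.eigvals(A)))  # ascending |lambda|

assert np.isclose(np.linalg.norm(A, 2), sigma[0])                    # (5): ||A||_2 = sigma_1
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1 / sigma[-1])  # (5): ||A^-1||_2 = 1/sigma_n
assert np.isclose(sigma[0], lam[-1])                                 # (7): sigma_1 = |lambda_1| for normal A
```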
Unitary Transformations
Definition
A matrix norm $\|\cdot\|$ on $\mathbb{C}^{m,n}$ is called unitary invariant if $\|UAV\| = \|A\|$ for any $A \in \mathbb{C}^{m,n}$ and any unitary matrices $U \in \mathbb{C}^{m,m}$ and $V \in \mathbb{C}^{n,n}$.
If $U$ and $V$ are unitary then $U(A + E)V = UAV + F$, where $\|F\| = \|E\|$.
Theorem
The Frobenius norm and the spectral norm are unitary invariant. Moreover $\|A^H\|_F = \|A\|_F$ and $\|A^H\|_2 = \|A\|_2$.
Proof 2 norm
▶ $\|UA\|_2 = \max_{\|x\|_2 = 1} \|UAx\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \|A\|_2.$
▶ $\|A^H\|_2 = \|A\|_2$ (same singular values).
▶ $\|AV\|_2 = \|(AV)^H\|_2 = \|V^H A^H\|_2 = \|A^H\|_2 = \|A\|_2.$
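A quick numerical illustration of the theorem, assuming NumPy; the unitary (here real orthogonal) factors are obtained from QR decompositions of random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal U
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # orthogonal V

for ord_ in (2, 'fro'):  # spectral and Frobenius norms are unitary invariant
    assert np.isclose(np.linalg.norm(U @ A @ V, ord_), np.linalg.norm(A, ord_))
```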
Perturbation of linear systems
▶ Consider the system
$$x_1 + x_2 = 20,$$
$$x_1 + (1 - 10^{-16})x_2 = 20 - 10^{-15}.$$
▶ The exact solution is $x_1 = x_2 = 10$.
▶ Suppose we replace the second equation by $x_1 + (1 + 10^{-16})x_2 = 20 - 10^{-15}$;
▶ the exact solution changes to $x_1 = 30$, $x_2 = -10$.
▶ A small change in one of the coefficients, from $1 - 10^{-16}$ to $1 + 10^{-16}$, changed the exact solution by a large amount.
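This arithmetic can be reproduced exactly with Python's fractions module (in float64 the coefficient $1 - 10^{-16}$ would be rounded away); a minimal sketch, where the helper `solve2x2` is mine, not from the notes:

```python
from fractions import Fraction as F

def solve2x2(a, b, c, d, e, f):
    # Solve [a b; c d] [x1; x2] = [e; f] exactly by Cramer's rule.
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

eps = F(1, 10**16)     # 1e-16, exactly
rhs2 = 20 - 10 * eps   # 20 - 1e-15, exactly

print(solve2x2(F(1), F(1), F(1), 1 - eps, F(20), rhs2))  # (10, 10)
print(solve2x2(F(1), F(1), F(1), 1 + eps, F(20), rhs2))  # (30, -10)
```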
Ill Conditioning
▶ A mathematical problem in which the solution is very sensitive to changes in the data is called ill-conditioned or sometimes ill-posed.
▶ Such problems are difficult to solve on a computer.
▶ If at all possible, the mathematical model should be changed to obtain a more well-conditioned or properly-posed problem.
Perturbations
▶ We consider what effect a small change (perturbation) in the data $A$, $b$ has on the solution $x$ of a linear system $Ax = b$.
▶ Suppose $y$ solves $(A + E)y = b + e$, where $E$ is a (small) $n \times n$ matrix and $e$ a (small) vector.
▶ How large can $y - x$ be?
▶ To measure this we use vector and matrix norms.
Conditions on the norms
▶ $\|\cdot\|$ will denote a vector norm on $\mathbb{C}^n$ and also a submultiplicative matrix norm on $\mathbb{C}^{n,n}$ which in addition is subordinate to the vector norm.
▶ Thus for any $A, B \in \mathbb{C}^{n,n}$ and any $x \in \mathbb{C}^n$ we have $\|AB\| \le \|A\|\,\|B\|$ and $\|Ax\| \le \|A\|\,\|x\|$.
▶ This is satisfied if the matrix norm is the operator norm corresponding to the given vector norm, or the Frobenius norm.
Absolute and relative error
▶ The difference $\|y - x\|$ measures the absolute error in $y$ as an approximation to $x$.
▶ $\|y - x\|/\|x\|$ or $\|y - x\|/\|y\|$ is a measure of the relative error.
Perturbation in the right hand side
Theorem
Suppose $A \in \mathbb{C}^{n,n}$ is invertible, $b, e \in \mathbb{C}^n$, $b \ne 0$, $Ax = b$ and $Ay = b + e$. Then
$$\frac{1}{K(A)}\frac{\|e\|}{\|b\|} \le \frac{\|y - x\|}{\|x\|} \le K(A)\frac{\|e\|}{\|b\|}, \qquad K(A) = \|A\|\,\|A^{-1}\|. \tag{8}$$
▶ Proof: from $y - x = A^{-1}e$ we get $\|y - x\| \le \|A^{-1}\|\,\|e\|$, and $\|b\| = \|Ax\| \le \|A\|\,\|x\|$; combining the two gives the upper bound. The lower bound follows similarly from $e = A(y - x)$ and $x = A^{-1}b$.
▶ Consider (8). $\|e\|/\|b\|$ is a measure of the size of the perturbation $e$ relative to the size of $b$. In the worst case, $\|y - x\|/\|x\|$ can be $K(A) = \|A\|\,\|A^{-1}\|$ times as large as $\|e\|/\|b\|$.
Condition number
▶ $K(A)$ is called the condition number with respect to inversion of a matrix, or just the condition number, if it is clear from the context that we are talking about solving linear systems.
▶ The condition number depends on the matrix $A$ and on the norm used. If $K(A)$ is large, $A$ is called ill-conditioned (with respect to inversion).
▶ If $K(A)$ is small, $A$ is called well-conditioned (with respect to inversion).
Condition number properties
▶ Since $\|A\|\,\|A^{-1}\| \ge \|AA^{-1}\| = \|I\| \ge 1$ we always have $K(A) \ge 1$.
▶ Since all matrix norms are equivalent, the dependence of $K(A)$ on the norm chosen is less important than the dependence on $A$.
▶ Usually one chooses the spectral norm when discussing properties of the condition number, and the $l_1$ and $l_\infty$ norms when one wishes to compute or estimate it. (A numerical sketch follows.)
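NumPy exposes these condition numbers directly via `np.linalg.cond`. A small sketch; the Hilbert matrix is a standard ill-conditioned example and is my choice here, not from the notes:

```python
import numpy as np

n = 8
# Hilbert matrix H_ij = 1/(i+j+1), a classic ill-conditioned matrix.
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

print(np.linalg.cond(H, 2))       # K_2(H): spectral norm, ~1.5e10 for n=8
print(np.linalg.cond(H, 1))       # K_1(H): l1 operator norm
print(np.linalg.cond(H, np.inf))  # K_inf(H): l_inf operator norm
```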
The 2-norm
▶ Suppose $A$ has singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0$ and eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$ if $A$ is square.
▶ $K_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n}$.
▶ $K_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{|\lambda_1|}{|\lambda_n|}$ if $A$ is normal.
▶ $K_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{\lambda_1}{\lambda_n}$ if $A$ is positive definite.
▶ It follows that $A$ is ill-conditioned with respect to inversion if and only if $\sigma_1/\sigma_n$ is large, or $|\lambda_1|/|\lambda_n|$ is large when $A$ is normal.
The residual
Suppose we have computed an approximate solution $y$ to $Ax = b$. The vector $r(y) := Ay - b$ is called the residual vector, or just the residual. We can bound $x - y$ in terms of $r(y)$.
Theorem
Suppose $A \in \mathbb{C}^{n,n}$, $b \in \mathbb{C}^n$, $A$ is nonsingular and $b \ne 0$. Let $r(y) = Ay - b$ for each $y \in \mathbb{C}^n$. If $Ax = b$ then
$$\frac{1}{K(A)}\frac{\|r(y)\|}{\|b\|} \le \frac{\|y - x\|}{\|x\|} \le K(A)\frac{\|r(y)\|}{\|b\|}. \tag{9}$$
Discussion
▶ If $A$ is well-conditioned, (9) says that $\|y - x\|/\|x\| \approx \|r(y)\|/\|b\|$.
▶ In other words, the accuracy in $y$ is about the same order of magnitude as the residual, as long as $\|b\| \approx 1$.
▶ If $A$ is ill-conditioned, anything can happen:
▶ the solution can be inaccurate even if the residual is small;
▶ we can have an accurate solution even if the residual is large.
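A sketch of the first failure mode, assuming NumPy. The matrix below is built from its SVD to have $K_2(A) = 10^{12}$, and the "solution" is perturbed along the right singular vector belonging to the smallest singular value, so the error is large but the residual tiny:

```python
import numpy as np

rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([1.0, 1e-2, 1e-6, 1e-12]) @ V.T   # K_2(A) = 1e12

x = np.ones(4)
b = A @ x
y = x + V[:, -1]   # unit-size error along the smallest singular direction

print(np.linalg.norm(y - x) / np.linalg.norm(x))      # 0.5: 50% relative error
print(np.linalg.norm(A @ y - b) / np.linalg.norm(b))  # ~1e-12: tiny relative residual
```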
The inverse of A + E
Theorem
Suppose $A \in \mathbb{C}^{n,n}$ is nonsingular and let $\|\cdot\|$ be a consistent matrix norm on $\mathbb{C}^{n,n}$. If $E \in \mathbb{C}^{n,n}$ is so small that $r := \|A^{-1}E\| < 1$ then $A + E$ is nonsingular and
$$\|(A + E)^{-1}\| \le \frac{\|A^{-1}\|}{1 - r}. \tag{10}$$
If $r < 1/2$ then
$$\frac{\|(A + E)^{-1} - A^{-1}\|}{\|A^{-1}\|} \le 2K(A)\frac{\|E\|}{\|A\|}. \tag{11}$$
Proof
▶ We use that if $B \in \mathbb{C}^{n,n}$ and $\|B\| < 1$ then $I - B$ is nonsingular and $\|(I - B)^{-1}\| \le \frac{1}{1 - \|B\|}$.
▶ Since $r < 1$, the matrix $I - B := I + A^{-1}E$ is nonsingular.
▶ Since $(I - B)^{-1}A^{-1}(A + E) = I$ we see that $A + E$ is nonsingular with inverse $(I - B)^{-1}A^{-1}$.
▶ Hence $\|(A + E)^{-1}\| \le \|(I - B)^{-1}\|\,\|A^{-1}\|$, and (10) follows.
▶ From the identity $(A + E)^{-1} - A^{-1} = -A^{-1}E(A + E)^{-1}$ we obtain
$\|(A + E)^{-1} - A^{-1}\| \le \|A^{-1}\|\,\|E\|\,\|(A + E)^{-1}\| \le K(A)\frac{\|E\|}{\|A\|}\frac{\|A^{-1}\|}{1 - r}.$
▶ Dividing by $\|A^{-1}\|$ and using $r < 1/2$, so that $1/(1 - r) < 2$, proves (11).
Perturbation in A
Theorem
Suppose $A, E \in \mathbb{C}^{n,n}$, $b \in \mathbb{C}^n$ with $A$ invertible and $b \ne 0$. If $r := \|A^{-1}E\| < 1/2$ for some operator norm then $A + E$ is invertible. If $Ax = b$ and $(A + E)y = b$ then
$$\frac{\|y - x\|}{\|y\|} \le \|A^{-1}E\| \le K(A)\frac{\|E\|}{\|A\|}, \tag{12}$$
$$\frac{\|y - x\|}{\|x\|} \le 2K(A)\frac{\|E\|}{\|A\|}. \tag{13}$$
Proof
▶ $A + E$ is invertible by the previous theorem.
▶ (12) follows easily by taking norms in the equation $x - y = A^{-1}Ey$ and dividing by $\|y\|$.
▶ From the identity $y - x = \big((A + E)^{-1} - A^{-1}\big)Ax$ we obtain $\|y - x\| \le \|(A + E)^{-1} - A^{-1}\|\,\|A\|\,\|x\|$, and (13) follows from (11).
Finding the rank of a matrix
▶ Gauss-Jordan elimination cannot be used to determine rank numerically.
▶ Use the singular value decomposition instead (see the sketch below):
▶ in floating point arithmetic one will normally find $\sigma_n > 0$ even for a rank-deficient matrix;
▶ determine the minimal $r$ so that $\sigma_{r+1}, \ldots, \sigma_n$ are "close" to the round-off unit;
▶ use this $r$ as an estimate for the rank.
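A minimal numerical-rank sketch, assuming NumPy. The helper `numerical_rank` and its tolerance (relative to $\sigma_1$ and the machine epsilon) are one common convention, not a prescription from the notes:

```python
import numpy as np

def numerical_rank(A, tol=None):
    """Estimate rank(A) as the number of singular values above a threshold."""
    sigma = np.linalg.svd(A, compute_uv=False)
    if tol is None:
        tol = max(A.shape) * np.finfo(A.dtype).eps * sigma[0]
    return int((sigma > tol).sum())

# A rank-2 matrix built from two rank-1 outer products.
u, v = np.arange(1.0, 5.0), np.ones(4)
A = np.outer(u, v) + np.outer(v, u)
print(numerical_rank(A))   # 2, even though computed sigma_3, sigma_4 may be > 0
```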
Convergence in $\mathbb{R}^{m,n}$ and $\mathbb{C}^{m,n}$
▶ Consider an infinite sequence of matrices $\{A_k\} = A_0, A_1, A_2, \ldots$ in $\mathbb{C}^{m,n}$.
▶ $\{A_k\}$ is said to converge to the limit $A$ in $\mathbb{C}^{m,n}$ if each element sequence $\{A_k(ij)\}_k$ converges to the corresponding element $A(ij)$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
▶ $\{A_k\}$ is a Cauchy sequence if for each $\epsilon > 0$ there is an integer $N \in \mathbb{N}$ such that for each $k, l \ge N$ and all $i, j$ we have $|A_k(ij) - A_l(ij)| \le \epsilon$.
▶ $\{A_k\}$ is bounded if there is a constant $M$ such that $|A_k(ij)| \le M$ for all $i, j, k$.
More on Convergence
▶ By stacking the columns of each $A_k$ into a vector in $\mathbb{C}^{mn}$ we obtain the following from the corresponding results for vectors:
▶ A sequence $\{A_k\}$ in $\mathbb{C}^{m,n}$ converges to a matrix $A \in \mathbb{C}^{m,n}$ if and only if $\lim_{k\to\infty} \|A_k - A\| = 0$ for any matrix norm $\|\cdot\|$.
▶ A sequence $\{A_k\}$ in $\mathbb{C}^{m,n}$ is convergent if and only if it is a Cauchy sequence.
▶ Every bounded sequence $\{A_k\}$ in $\mathbb{C}^{m,n}$ has a convergent subsequence.
The Spectral Radius
▶ $\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|$.
▶ For any consistent matrix norm $\|\cdot\|$ on $\mathbb{C}^{n,n}$ and any $A \in \mathbb{C}^{n,n}$ we have $\rho(A) \le \|A\|$.
▶ Proof: Let $(\lambda, x)$ be an eigenpair for $A$ and set $X := [x, \ldots, x] \in \mathbb{C}^{n,n}$.
▶ Then $\lambda X = AX$, which implies $|\lambda|\,\|X\| = \|\lambda X\| = \|AX\| \le \|A\|\,\|X\|$.
▶ Since $\|X\| \ne 0$ we obtain $|\lambda| \le \|A\|$.
A Special Norm
Theorem
Let $A \in \mathbb{C}^{n,n}$ and $\epsilon > 0$ be given. There is a consistent matrix norm $\|\cdot\|'$ on $\mathbb{C}^{n,n}$ such that $\rho(A) \le \|A\|' \le \rho(A) + \epsilon$.
A Very Important Result
Theorem
For any $A \in \mathbb{C}^{n,n}$ we have
$$\lim_{k\to\infty} A^k = 0 \iff \rho(A) < 1.$$
▶ Proof:
▶ Suppose $\rho(A) < 1$.
▶ By the previous theorem there is a consistent matrix norm $\|\cdot\|$ on $\mathbb{C}^{n,n}$ such that $\|A\| < 1$.
▶ But then $\|A^k\| \le \|A\|^k \to 0$ as $k \to \infty$.
▶ Hence $A^k \to 0$.
▶ The converse is easier.
Convergence can be slow
▶ With
$$A = \begin{bmatrix} 0.99 & 1 & 0 \\ 0 & 0.99 & 1 \\ 0 & 0 & 0.99 \end{bmatrix}, \quad
A^{100} \approx \begin{bmatrix} 0.4 & 37 & 1849 \\ 0 & 0.4 & 37 \\ 0 & 0 & 0.4 \end{bmatrix}, \quad
A^{2000} \approx \begin{bmatrix} 10^{-9} & 4 \cdot 10^{-6} & 0.004 \\ 0 & 10^{-9} & 4 \cdot 10^{-6} \\ 0 & 0 & 10^{-9} \end{bmatrix}.$$
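Here $\rho(A) = 0.99 < 1$, so $A^k \to 0$, but the off-diagonal entries first grow enormously. This is easy to reproduce, assuming NumPy:

```python
import numpy as np

A = np.array([[0.99, 1, 0], [0, 0.99, 1], [0, 0, 0.99]])

for k in (100, 2000):
    Ak = np.linalg.matrix_power(A, k)
    print(k, Ak.max())   # largest entry: ~1848.6 for k=100, ~0.0038 for k=2000
```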
More limits
▶ For any submultiplicative matrix norm $\|\cdot\|$ on $\mathbb{C}^{n,n}$ and any $A \in \mathbb{C}^{n,n}$ we have
$$\lim_{k\to\infty} \|A^k\|^{1/k} = \rho(A). \tag{14}$$
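A closing numerical sketch of (14), assuming NumPy; the random matrix is rescaled so that $\rho(A) = 1/2$ exactly, which keeps the powers representable in float64:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
A /= 2 * np.abs(np.linalg.eigvals(A)).max()   # rescale so rho(A) = 0.5

for k in (10, 50, 200):
    # ||A^k||_2^(1/k) should approach rho(A) = 0.5 as k grows.
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k))
print("rho(A) =", np.abs(np.linalg.eigvals(A)).max())  # 0.5
```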