8.3 The Condition Number with Respect to Inversion
Consider the system of two linear equations

x₁ + x₂ = 20
x₁ + (1 − 10⁻¹⁶)x₂ = 20 − 10⁻¹⁵

whose exact solution is x₁ = x₂ = 10. If we replace the second equation by x₁ + (1 + 10⁻¹⁶)x₂ = 20 − 10⁻¹⁵, the exact solution changes to x₁ = 30, x₂ = −10. Here a small change in one of the coefficients, from 1 − 10⁻¹⁶ to 1 + 10⁻¹⁶, changed the exact solution by a large amount.
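This sensitivity can be reproduced in exact arithmetic. The following sketch (an illustration added here, not part of the original text) solves both 2×2 systems with Python's `fractions` module, so no floating-point rounding obscures the effect:

```python
from fractions import Fraction

def solve2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] [x1, x2]^T = [r1, r2]^T by Cramer's rule."""
    det = a * d - b * c
    return (r1 * d - b * r2) / det, (a * r2 - r1 * c) / det

eps = Fraction(1, 10**16)          # 10^(-16), represented exactly
rhs2 = 20 - Fraction(1, 10**15)    # 20 - 10^(-15)

# Original system: x1 + x2 = 20,  x1 + (1 - eps) x2 = 20 - 10^(-15)
print(solve2x2(1, 1, 1, 1 - eps, 20, rhs2))   # solution (10, 10)

# Perturbed system: second coefficient changed to 1 + eps
print(solve2x2(1, 1, 1, 1 + eps, 20, rhs2))   # solution (30, -10)
```

Exact rational arithmetic is essential here: in IEEE double precision the coefficients 1 − 10⁻¹⁶ and 1 + 10⁻¹⁶ are not both representable, so a floating-point solve could not distinguish the two systems.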
A mathematical problem in which the solution is very sensitive to changes in the data is called ill-conditioned. Such problems can be difficult to solve on a computer.
In this section we consider what effect a small change (perturbation) in the data A, b has on the inverse of A and on the solution x of a linear system Ax = b. To measure this we use vector and matrix norms. In this section ‖·‖ will denote both a vector norm on Cⁿ and a matrix norm on Cⁿˣⁿ. We assume that the matrix norm is consistent on Cⁿˣⁿ and subordinate to the vector norm. Thus, for any A, B ∈ Cⁿˣⁿ and any x ∈ Cⁿ we have

‖AB‖ ≤ ‖A‖‖B‖ and ‖Ax‖ ≤ ‖A‖‖x‖.

Recall that this holds if the matrix norm is the operator norm corresponding to the given vector norm. It also holds for the Frobenius matrix norm and the Euclidean vector norm; this follows from Lemma 7.1. We recall that if I ∈ Rⁿˣⁿ then ‖I‖ = 1 for an operator norm, while ‖I‖_F = √n.
8.3.1 Perturbation of the Right-Hand Side in a Linear System
Suppose x, y solve Ax = b and (A + E)y = b + e, respectively, where A, A + E ∈ Cⁿˣⁿ are nonsingular and b, e ∈ Cⁿ. How large can y − x be? The difference y − x measures the absolute error in y as an approximation to x, while ‖y − x‖/‖x‖ and ‖y − x‖/‖y‖ are measures for the relative error.
We consider first the simpler case of a perturbation in the right-hand sideb.
Theorem 8.7 (Perturbation in the Right-Hand Side) Suppose A ∈ Cⁿˣⁿ is nonsingular, b, e ∈ Cⁿ, b ≠ 0, and Ax = b, Ay = b + e. Then

(1/K(A)) ‖e‖/‖b‖ ≤ ‖y − x‖/‖x‖ ≤ K(A) ‖e‖/‖b‖,  (8.22)

where K(A) = ‖A‖‖A⁻¹‖ is the condition number of A.
Proof Subtracting Ax = b from Ay = b + e we have A(y − x) = e, or y − x = A⁻¹e. Combining ‖y − x‖ = ‖A⁻¹e‖ ≤ ‖A⁻¹‖‖e‖ and ‖b‖ = ‖Ax‖ ≤ ‖A‖‖x‖ we obtain the upper bound in (8.22). Combining ‖e‖ ≤ ‖A‖‖y − x‖ and ‖x‖ ≤ ‖A⁻¹‖‖b‖ we obtain the lower bound. □
Consider (8.22). The quotient ‖e‖/‖b‖ is a measure of the size of the perturbation e relative to the size of b. The upper bound says that ‖y − x‖/‖x‖ in the worst case can be K(A) times as large as ‖e‖/‖b‖.
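The two-sided bound (8.22) can be checked on a concrete example. The sketch below (my own 2×2 example with the ∞-norm; the matrix, right-hand side, and perturbation are made-up numbers) solves Ax = b and Ay = b + e exactly and verifies both inequalities:

```python
from fractions import Fraction

def solve2x2(A, rhs):
    """Solve a 2x2 system by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det]

def inv2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def inf_norm_vec(v):
    return max(abs(t) for t in v)

def inf_norm_mat(A):
    # infinity-norm of a matrix: maximum absolute row sum
    return max(abs(row[0]) + abs(row[1]) for row in A)

A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(2)]]
b = [Fraction(3), Fraction(5)]
e = [Fraction(1, 100), Fraction(-1, 100)]       # perturbation of b

x = solve2x2(A, b)                              # x = [1, 2]
y = solve2x2(A, [b[0] + e[0], b[1] + e[1]])

K = inf_norm_mat(A) * inf_norm_mat(inv2x2(A))   # K_inf(A) = 3 * 3 = 9
rel_err = inf_norm_vec([y[0] - x[0], y[1] - x[1]]) / inf_norm_vec(x)
rel_pert = inf_norm_vec(e) / inf_norm_vec(b)

assert rel_pert / K <= rel_err <= K * rel_pert  # both sides of (8.22)
```

With exact rational arithmetic the inequalities are verified without any rounding-error caveats; for this particular well-conditioned A the actual relative error sits comfortably between the two bounds.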
The bounds in (8.22) depend on K(A). This number is called the condition number with respect to inversion of a matrix, or just the condition number of A, if it is clear from the context that we are talking about inverting a matrix. The condition number depends on the matrix A and on the norm used. If K(A) is large, A is called ill-conditioned (with respect to inversion). If K(A) is small, A is called well-conditioned (with respect to inversion). We always have K(A) ≥ 1. Indeed, since ‖x‖ = ‖Ix‖ ≤ ‖I‖‖x‖ for any x, we have ‖I‖ ≥ 1 and therefore ‖A‖‖A⁻¹‖ ≥ ‖AA⁻¹‖ = ‖I‖ ≥ 1.
Since all matrix norms are equivalent, the dependence of K(A) on the norm chosen is less important than the dependence on A. Example 8.1 provided an illustration of this. See also Exercise 8.19. Sometimes one chooses the spectral norm when discussing properties of the condition number, and the 1-, ∞-, or Frobenius norm when one wishes to compute or estimate it.
Suppose we have computed an approximate solution y to Ax = b. The vector r(y) := Ay − b is called the residual vector, or just the residual. We can bound ‖x − y‖ in terms of ‖r‖.
Theorem 8.8 (Perturbation and Residual) Suppose A ∈ Cⁿˣⁿ, b ∈ Cⁿ, A is nonsingular and b ≠ 0. Let r(y) = Ay − b for y ∈ Cⁿ. If Ax = b then

(1/K(A)) ‖r(y)‖/‖b‖ ≤ ‖y − x‖/‖x‖ ≤ K(A) ‖r(y)‖/‖b‖.  (8.23)
Proof We simply take e = r(y) in Theorem 8.7. □
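Theorem 8.8 gives an a posteriori check: from a computed y one can bound the unknown relative error using only the residual, which is computable. A small illustration (hypothetical numbers of my own, ∞-norm; exact arithmetic via `fractions`):

```python
from fractions import Fraction as F

A = [[F(1), F(1)], [F(1), F(2)]]
b = [F(3), F(5)]
x = [F(1), F(2)]                  # exact solution of A x = b
y = [F(101, 100), F(199, 100)]    # an approximate solution

# Residual r(y) = A y - b
r = [A[0][0] * y[0] + A[0][1] * y[1] - b[0],
     A[1][0] * y[0] + A[1][1] * y[1] - b[1]]

norm = lambda v: max(abs(t) for t in v)   # infinity-norm of a vector
K = 9    # K_inf(A) = ||A||_inf * ||A^-1||_inf = 3 * 3 for this A

rel_err = norm([y[0] - x[0], y[1] - x[1]]) / norm(x)
rel_res = norm(r) / norm(b)

# Both sides of (8.23) hold
assert rel_res / K <= rel_err <= K * rel_res
```

Note that x appears here only to verify the bound; in practice x is unknown, and (8.23) is useful precisely because the middle quantity can be sandwiched using computable data.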
Consider next a perturbation in the coefficient matrix of a linear system. Suppose A, E ∈ Cⁿˣⁿ with A, A + E nonsingular. We would like to compare the solutions x and y of the systems Ax = b and (A + E)y = b.
Theorem 8.9 (Perturbation in Matrix) Suppose A, E ∈ Cⁿˣⁿ, b ∈ Cⁿ with A nonsingular and b ≠ 0. If r := ‖A⁻¹E‖ < 1 then A + E is nonsingular. If Ax = b and (A + E)y = b then

‖y − x‖/‖y‖ ≤ r ≤ K(A) ‖E‖/‖A‖,  (8.24)

‖y − x‖/‖x‖ ≤ r/(1 − r) ≤ (K(A)/(1 − r)) ‖E‖/‖A‖.  (8.25)
Proof We show that A + E singular implies r ≥ 1. Suppose A + E is singular. Then (A + E)x = 0 for some nonzero x ∈ Cⁿ. Multiplying by A⁻¹ it follows that (I + A⁻¹E)x = 0, and this implies that ‖x‖ = ‖A⁻¹Ex‖ ≤ r‖x‖. But then r ≥ 1.

Subtracting (A + E)y = b from Ax = b gives A(x − y) = Ey, or x − y = A⁻¹Ey. Taking norms and dividing by ‖y‖ proves (8.24). Solving x − y = A⁻¹Ey for y we obtain y = (I + A⁻¹E)⁻¹x. By Theorem 12.14 we have ‖y‖ ≤ ‖(I + A⁻¹E)⁻¹‖‖x‖ ≤ ‖x‖/(1 − r), and (8.24) implies ‖y − x‖ ≤ r‖y‖ ≤ (r/(1 − r))‖x‖ ≤ (K(A)/(1 − r))(‖E‖/‖A‖)‖x‖. Dividing by ‖x‖ gives (8.25). □
In Theorem 8.9 we gave bounds for the relative error in x as an approximation to y and the relative error in y as an approximation to x. The quotient ‖E‖/‖A‖ is a measure for the size of the perturbation E in A relative to the size of A. The condition number again plays a crucial role: ‖y − x‖/‖y‖ can be as large as K(A) times ‖E‖/‖A‖. It can be shown that the upper bound can be attained for any A and any b. In deriving the upper bound we used the inequality ‖A⁻¹Ey‖ ≤ ‖A⁻¹‖‖E‖‖y‖. For a more or less random perturbation E this is not a severe overestimate for ‖A⁻¹Ey‖. In the situation where E is due to round-off errors, (8.24) can give a fairly realistic estimate for ‖y − x‖/‖y‖.
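The chain of inequalities in (8.24) and (8.25) can likewise be verified on a small example. The sketch below (a made-up 2×2 matrix and perturbation of my own, ∞-norm, exact arithmetic) checks r = ‖A⁻¹E‖ < 1 and both bounds:

```python
from fractions import Fraction as F

def inv2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

norm_v = lambda v: max(abs(t) for t in v)                       # vector inf-norm
norm_m = lambda A: max(abs(A[i][0]) + abs(A[i][1]) for i in range(2))  # max row sum

A = [[F(1), F(1)], [F(1), F(2)]]
E = [[F(1, 100), F(0)], [F(0), F(-1, 100)]]     # perturbation of A
b = [F(3), F(5)]

Ainv = inv2x2(A)
r = norm_m(matmul(Ainv, E))                     # r = ||A^-1 E||_inf
assert r < 1                                    # hence A + E is nonsingular

x = matvec(Ainv, b)
B = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
y = matvec(inv2x2(B), b)

K = norm_m(A) * norm_m(Ainv)                    # K_inf(A) = 9
diff = norm_v([y[0] - x[0], y[1] - x[1]])
assert diff / norm_v(y) <= r <= K * norm_m(E) / norm_m(A)   # (8.24)
assert diff / norm_v(x) <= r / (1 - r)                      # (8.25)
```

For this diagonal E the rightmost bound in (8.24) happens to be attained with equality, illustrating the remark above that the bound is sharp.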
The following explicit expressions for the 2-norm condition number follow from Theorem 8.4.
Theorem 8.10 (Spectral Condition Number) Suppose A ∈ Cⁿˣⁿ is nonsingular with singular values σ₁ ≥ σ₂ ≥ ··· ≥ σₙ > 0 and eigenvalues |λ₁| ≥ |λ₂| ≥ ··· ≥ |λₙ| > 0. Then

K₂(A) = λ₁/λₙ, if A is positive definite,
K₂(A) = |λ₁|/|λₙ|, if A is normal,
K₂(A) = σ₁/σₙ, in general.  (8.26)
It follows that A is ill-conditioned with respect to inversion if and only if σ₁/σₙ is large, or λ₁/λₙ is large when A is positive definite.
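For a 2×2 symmetric matrix the quantities in (8.26) can be computed by hand, which makes a small consistency check possible. The sketch below (my own example; the eigenvalue formula for a symmetric 2×2 matrix is standard) computes K₂ of a positive definite matrix both from its eigenvalues and from the singular values, i.e. the square roots of the eigenvalues of AᵀA:

```python
import math

def eig_sym2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    m = (a + d) / 2
    s = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return m + s, m - s

# A = [[2, 1], [1, 2]] is symmetric positive definite
lam1, lam2 = eig_sym2x2(2, 1, 2)        # eigenvalues 3 and 1
K2_from_eigs = lam1 / lam2

# Singular values: sigma_i^2 are the eigenvalues of A^T A = [[5, 4], [4, 5]]
mu1, mu2 = eig_sym2x2(5, 4, 5)
K2_from_svals = math.sqrt(mu1) / math.sqrt(mu2)

# The first and third formulas in (8.26) agree for positive definite A
assert abs(K2_from_eigs - 3) < 1e-12
assert abs(K2_from_svals - K2_from_eigs) < 1e-12
```

The agreement is no accident: for a positive definite matrix the singular values coincide with the eigenvalues, so the three cases of (8.26) are consistent with one another.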
If A is well-conditioned, (8.23) says that ‖y − x‖/‖x‖ ≈ ‖r(y)‖/‖b‖. In other words, the accuracy in y is about the same order of magnitude as the residual as long as ‖b‖ ≈ 1. If A is ill-conditioned, anything can happen. We can for example have an accurate solution even if the residual is large.
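The last point is worth seeing concretely. In the sketch below (a constructed example of my own, not from the text) A = diag(1, 10⁻⁶) is ill-conditioned with K_∞(A) = 10⁶, and an approximation y with relative error only 10⁻³ nevertheless has a relative residual of 10³:

```python
# A = [[1, 0], [0, eps]] with eps = 10^-6: K_inf(A) = 1 * 10^6
eps = 1e-6
b = [0.0, eps]
x = [0.0, 1.0]           # exact solution of A x = b
y = [1e-3, 1.0]          # accurate approximation: relative error 10^-3

r = [1.0 * y[0] - b[0], eps * y[1] - b[1]]    # residual A y - b

norm = lambda v: max(abs(t) for t in v)       # infinity-norm
rel_err = norm([y[0] - x[0], y[1] - x[1]]) / norm(x)
rel_res = norm(r) / norm(b)

assert rel_err == 1e-3                        # small error ...
assert abs(rel_res - 1e3) < 1e-9              # ... despite a huge residual
```

Both quantities respect (8.23): the relative error 10⁻³ lies between 10³/K(A) = 10⁻³ and K(A)·10³, but with K(A) = 10⁶ that window is so wide that the residual alone says almost nothing about the accuracy.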
8.3.2 Perturbation of a Square Matrix
Suppose A is nonsingular and E a perturbation of A. We expect B := A + E to be nonsingular when E is small relative to A. But how small is small? It is also useful to have bounds on ‖B⁻¹‖ in terms of ‖A⁻¹‖ and on the difference B⁻¹ − A⁻¹. We consider the relative errors ‖B⁻¹‖/‖A⁻¹‖, ‖B⁻¹ − A⁻¹‖/‖B⁻¹‖ and ‖B⁻¹ − A⁻¹‖/‖A⁻¹‖.
Theorem 8.11 (Perturbation of Inverse Matrix) Suppose A ∈ Cⁿˣⁿ is nonsingular and let B := A + E ∈ Cⁿˣⁿ be nonsingular. For any consistent matrix norm we have

‖B⁻¹ − A⁻¹‖/‖B⁻¹‖ ≤ ‖A⁻¹E‖ ≤ K(A) ‖E‖/‖A‖,  (8.27)

where K(A) := ‖A‖‖A⁻¹‖. If r := ‖A⁻¹E‖ < 1 then B is nonsingular and

1/(1 + r) ≤ ‖B⁻¹‖/‖A⁻¹‖ ≤ 1/(1 − r).  (8.28)

We also have

‖B⁻¹ − A⁻¹‖/‖A⁻¹‖ ≤ r/(1 − r) ≤ (K(A)/(1 − r)) ‖E‖/‖A‖.  (8.29)

We can replace ‖A⁻¹E‖ by ‖EA⁻¹‖ everywhere.
Proof That B is nonsingular if r < 1 follows from Theorem 8.9. We have −E = A − B = A(B⁻¹ − A⁻¹)B = B(B⁻¹ − A⁻¹)A, so that

B⁻¹ − A⁻¹ = −A⁻¹EB⁻¹ = −B⁻¹EA⁻¹.  (8.30)

Therefore, if B is nonsingular then by (8.30)

‖B⁻¹ − A⁻¹‖ ≤ ‖A⁻¹E‖‖B⁻¹‖ ≤ K(A)(‖E‖/‖A‖)‖B⁻¹‖.

Dividing through by ‖B⁻¹‖ gives the upper bounds in (8.27). Next, (8.30) implies

‖B⁻¹‖ ≤ ‖A⁻¹‖ + ‖A⁻¹E‖‖B⁻¹‖ ≤ ‖A⁻¹‖ + r‖B⁻¹‖.

Solving for ‖B⁻¹‖ and dividing by ‖A⁻¹‖ we obtain the upper bound in (8.28). Similarly we obtain the lower bound in (8.28) from ‖A⁻¹‖ ≤ ‖B⁻¹‖ + r‖B⁻¹‖. The bound in (8.29) follows by multiplying (8.27) by ‖B⁻¹‖/‖A⁻¹‖ and using (8.28). That we can replace ‖A⁻¹E‖ by ‖EA⁻¹‖ everywhere follows from (8.30). □
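The bounds of Theorem 8.11 can be sanity-checked numerically. The sketch below (reusing a made-up 2×2 example of my own, ∞-norm, exact rational arithmetic) verifies that ‖B⁻¹‖/‖A⁻¹‖ lies in the interval given by (8.28) and that (8.29) holds:

```python
from fractions import Fraction as F

def inv2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

norm_m = lambda A: max(abs(A[i][0]) + abs(A[i][1]) for i in range(2))  # inf-norm

A = [[F(1), F(1)], [F(1), F(2)]]
E = [[F(1, 100), F(0)], [F(0), F(-1, 100)]]
B = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]

Ainv, Binv = inv2x2(A), inv2x2(B)
r = norm_m(matmul(Ainv, E))                 # r = ||A^-1 E||_inf
assert r < 1                                # so B = A + E is nonsingular

ratio = norm_m(Binv) / norm_m(Ainv)
assert 1 / (1 + r) <= ratio <= 1 / (1 - r)  # (8.28)

D = [[Binv[i][j] - Ainv[i][j] for j in range(2)] for i in range(2)]
assert norm_m(D) / norm_m(Ainv) <= r / (1 - r)   # (8.29)
```

Here r = 3/100, so (8.28) pins ‖B⁻¹‖/‖A⁻¹‖ to the narrow interval [100/103, 100/97]: a small relative perturbation of A changes the norm of the inverse only slightly.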