5.1 Inner Products, Orthogonality and Unitary Matrices
An inner product or scalar product in a vector space is a function mapping pairs of vectors into a scalar.
5.1.1 Real and Complex Inner Products
Definition 5.1 (Inner Product) An inner product in a complex vector space $V$ is a function $\langle \cdot,\cdot \rangle : V \times V \to \mathbb{C}$ satisfying for all $x, y, z \in V$ and all $a, b \in \mathbb{C}$ the following conditions:
1. $\langle x, x \rangle \ge 0$ with equality if and only if $x = 0$. (positivity)
2. $\langle x, y \rangle = \overline{\langle y, x \rangle}$. (skew symmetry)
3. $\langle ax + by, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle$. (linearity)

The pair $(V, \langle \cdot,\cdot \rangle)$ is called an inner product space.
Note the complex conjugate in 2. Since
$$\langle x, ay + bz \rangle = \overline{\langle ay + bz, x \rangle} = \overline{a \langle y, x \rangle + b \langle z, x \rangle} = \bar{a}\,\overline{\langle y, x \rangle} + \bar{b}\,\overline{\langle z, x \rangle}$$
we find
$$\langle x, ay + bz \rangle = \bar{a} \langle x, y \rangle + \bar{b} \langle x, z \rangle, \qquad \langle ax, ay \rangle = |a|^2 \langle x, y \rangle. \tag{5.1}$$
An inner product in a real vector space $V$ is a real valued function satisfying Properties 1, 2, 3 in Definition 5.1, where we can replace skew symmetry by symmetry
$$\langle x, y \rangle = \langle y, x \rangle \quad \text{(symmetry)}.$$
In the real case we have linearity in both variables since we can remove the complex conjugates in (5.1).
Recall that (cf. (1.10)) the standard inner product in $\mathbb{C}^n$ is given by
$$\langle x, y \rangle := y^* x = x^T \bar{y} = \sum_{j=1}^{n} x_j \bar{y}_j.$$
Note the complex conjugate on $y$. It is clearly an inner product in $\mathbb{C}^n$. The function
$$\|\cdot\| : V \to \mathbb{R}, \qquad x \mapsto \|x\| := \sqrt{\langle x, x \rangle} \tag{5.2}$$
is called the inner product norm.
The inner product norm for the standard inner product is the Euclidean norm $\|x\| = \|x\|_2 = \sqrt{x^* x}$.
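For readers who want to check this numerically, the following minimal numpy sketch evaluates the standard inner product $\langle x, y \rangle = y^* x$ and the induced norm (5.2); the helper name `inner` and the example vectors are our own.

```python
import numpy as np

# A minimal sketch of the standard inner product <x, y> := y* x = sum_j x_j * conj(y_j)
# and the induced Euclidean norm ||x|| = sqrt(<x, x>) from (5.2).
def inner(x, y):
    return np.vdot(y, x)   # np.vdot conjugates its first argument, so this is y* x

x = np.array([1 + 2j, 3 - 1j])
y = np.array([2 - 1j, 1j])

norm_x = np.sqrt(inner(x, x).real)            # inner(x, x) is real and nonnegative
print(np.isclose(norm_x, np.linalg.norm(x)))  # matches the Euclidean 2-norm: True
```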
The following inequality holds for any inner product.
Theorem 5.1 (Cauchy-Schwarz Inequality) For any $x, y$ in a real or complex inner product space
$$|\langle x, y \rangle| \le \|x\|\,\|y\|, \tag{5.3}$$
with equality if and only if $x$ and $y$ are linearly dependent.
Proof If $y = 0$ then $0x + y = 0$ and $x$ and $y$ are linearly dependent. Moreover the inequality holds with equality since $\langle x, y \rangle = \langle x, 0y \rangle = 0 \langle x, y \rangle = 0$ and $\|y\| = 0$.
So assume $y \ne 0$. Define
$$z := x - ay, \qquad a := \frac{\langle x, y \rangle}{\langle y, y \rangle}.$$
By linearity $\langle z, y \rangle = \langle x, y \rangle - a \langle y, y \rangle = 0$ so that by 2. and (5.1)
$$\langle ay, z \rangle + \langle z, ay \rangle = a \overline{\langle z, y \rangle} + \bar{a} \langle z, y \rangle = 0. \tag{5.4}$$
But then
$$\|x\|^2 = \langle x, x \rangle = \langle z + ay, z + ay \rangle \overset{(5.4)}{=} \langle z, z \rangle + \langle ay, ay \rangle \overset{(5.1)}{=} \|z\|^2 + |a|^2 \|y\|^2 \ge |a|^2 \|y\|^2 = \frac{|\langle x, y \rangle|^2}{\|y\|^2}.$$
Multiplying by $\|y\|^2$ gives (5.3). We have equality if and only if $z = 0$, which means that $x$ and $y$ are linearly dependent.
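The inequality, and the equality case for linearly dependent vectors, is easy to verify numerically; the sketch below uses the standard inner product with randomly chosen vectors of our own.

```python
import numpy as np

# A quick numerical check of the Cauchy-Schwarz inequality (5.3) for the
# standard inner product <x, y> = y* x; all vectors here are illustrative.
rng = np.random.default_rng(0)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

lhs = abs(np.vdot(y, x))                       # |<x, y>|
rhs = np.linalg.norm(x) * np.linalg.norm(y)    # ||x|| ||y||
print(lhs <= rhs + 1e-12)                      # True

# Equality holds when x and y are linearly dependent, e.g. y = (2 - 1j) * x.
y = (2 - 1j) * x
print(np.isclose(abs(np.vdot(y, x)), np.linalg.norm(x) * np.linalg.norm(y)))  # True
```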
Theorem 5.2 (Inner Product Norm) For all $x, y$ in an inner product space and all $a \in \mathbb{C}$ we have
1. $\|x\| \ge 0$ with equality if and only if $x = 0$. (positivity)
2. $\|ax\| = |a|\,\|x\|$. (homogeneity)
3. $\|x + y\| \le \|x\| + \|y\|$, (subadditivity)

where $\|x\| := \sqrt{\langle x, x \rangle}$.
In general a function $\|\cdot\| : \mathbb{C}^n \to \mathbb{R}$ that satisfies these three properties is called a vector norm. A class of vector norms called $p$-norms will be studied in Chap. 8.
Proof The first statement is an immediate consequence of positivity, while the second one follows from (5.1). Expanding $\|x + ay\|^2 = \langle x + ay, x + ay \rangle$ using (5.1) we obtain
$$\|x + ay\|^2 = \|x\|^2 + a \langle y, x \rangle + \bar{a} \langle x, y \rangle + |a|^2 \|y\|^2, \qquad a \in \mathbb{C},\ x, y \in V. \tag{5.5}$$
Now (5.5) with $a = 1$ and the Cauchy-Schwarz inequality implies
$$\|x + y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$
Taking square roots completes the proof.
In the real case the Cauchy-Schwarz inequality implies that $-1 \le \frac{\langle x, y \rangle}{\|x\|\,\|y\|} \le 1$ for nonzero $x$ and $y$, so there is a unique angle $\theta$ in $[0, \pi]$ such that
$$\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}. \tag{5.6}$$
This defines the angle between vectors in a real inner product space.
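A small numpy sketch of (5.6), with a clip to guard against rounding pushing the cosine slightly outside $[-1, 1]$; the function name and example vectors are our own.

```python
import numpy as np

# Angle between nonzero real vectors as defined by (5.6).
def angle(x, y):
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))  # clip guards against rounding just outside [-1, 1]

print(angle(np.array([1.0, 0.0]), np.array([0.0, 2.0])))  # pi/2: orthogonal vectors
print(angle(np.array([1.0, 1.0]), np.array([2.0, 2.0])))  # 0.0: parallel vectors
```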
5.1.2 Orthogonality
Definition 5.2 (Orthogonality) Two vectors $x, y$ in a real or complex inner product space are orthogonal or perpendicular, denoted as $x \perp y$, if $\langle x, y \rangle = 0$. The vectors are orthonormal if in addition $\|x\| = \|y\| = 1$.
From the definitions (5.6), (5.20) of the angle $\theta$ between two nonzero vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$ it follows that $x \perp y$ if and only if $\theta = \pi/2$.
Theorem 5.3 (Pythagoras) For a real or complex inner product space
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2, \quad \text{if } x \perp y. \tag{5.7}$$
Proof We set $a = 1$ in (5.5) and use the orthogonality.
Definition 5.3 (Orthogonal and Orthonormal Bases) A set of nonzero vectors $\{v_1, \dots, v_k\}$ in a subspace $S$ of a real or complex inner product space is an orthogonal basis for $S$ if it is a basis for $S$ and $\langle v_i, v_j \rangle = 0$ for $i \ne j$. It is an orthonormal basis for $S$ if it is a basis for $S$ and $\langle v_i, v_j \rangle = \delta_{ij}$ for all $i, j$.
A basis for a subspace of an inner product space can be turned into an orthogonal or orthonormal basis for the subspace by the following construction (Fig. 5.1).
Fig. 5.1 The construction of $v_1$ and $v_2$ in Gram-Schmidt: $v_1 := s_1$ and $v_2 := s_2 - c v_1$, where the constant $c$ is given by $c := \langle s_2, v_1 \rangle / \langle v_1, v_1 \rangle$.
Theorem 5.4 (Gram-Schmidt) Let $\{s_1, \dots, s_k\}$ be a basis for a real or complex inner product space $(S, \langle \cdot,\cdot \rangle)$. Define
$$v_1 := s_1, \qquad v_j := s_j - \sum_{i=1}^{j-1} \frac{\langle s_j, v_i \rangle}{\langle v_i, v_i \rangle} v_i, \quad j = 2, \dots, k. \tag{5.8}$$
Then $\{v_1, \dots, v_k\}$ is an orthogonal basis for $S$ and the normalized vectors
$$\{u_1, \dots, u_k\} := \left\{ \frac{v_1}{\|v_1\|}, \dots, \frac{v_k}{\|v_k\|} \right\}$$
form an orthonormal basis for $S$.
Proof To show that $\{v_1, \dots, v_k\}$ is an orthogonal basis for $S$ we use induction on $k$. Define subspaces $S_j := \operatorname{span}\{s_1, \dots, s_j\}$ for $j = 1, \dots, k$. Clearly $v_1 = s_1$ is an orthogonal basis for $S_1$. Suppose for some $j \ge 2$ that $v_1, \dots, v_{j-1}$ is an orthogonal basis for $S_{j-1}$ and let $v_j$ be given by (5.8) as a linear combination of $s_j$ and $v_1, \dots, v_{j-1}$. Now each of these $v_i$ is a linear combination of $s_1, \dots, s_i$, and we obtain $v_j = \sum_{i=1}^{j} a_i s_i$ for some $a_1, \dots, a_j$ with $a_j = 1$. Since $s_1, \dots, s_j$ are linearly independent and $a_j \ne 0$ we deduce that $v_j \ne 0$. By the induction hypothesis
$$\langle v_j, v_l \rangle = \langle s_j, v_l \rangle - \sum_{i=1}^{j-1} \frac{\langle s_j, v_i \rangle}{\langle v_i, v_i \rangle} \langle v_i, v_l \rangle = \langle s_j, v_l \rangle - \frac{\langle s_j, v_l \rangle}{\langle v_l, v_l \rangle} \langle v_l, v_l \rangle = 0$$
for $l = 1, \dots, j - 1$. Thus $v_1, \dots, v_j$ is an orthogonal basis for $S_j$.
If $\{v_1, \dots, v_k\}$ is an orthogonal basis for $S$ then clearly $\{u_1, \dots, u_k\}$ is an orthonormal basis for $S$.
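As a numerical illustration, here is a minimal numpy sketch of the classical Gram-Schmidt process (5.8) for the standard inner product; the function name `gram_schmidt` and the example basis are our own.

```python
import numpy as np

def gram_schmidt(S):
    """Classical Gram-Schmidt (5.8) for the standard inner product <x, y> = y* x.

    S is a list of linearly independent vectors s_1, ..., s_k; returns the
    orthogonal vectors v_1, ..., v_k and the orthonormal vectors u_1, ..., u_k.
    """
    V = []
    for s in S:
        v = s.astype(complex)
        for vi in V:
            v -= (np.vdot(vi, s) / np.vdot(vi, vi)) * vi   # subtract <s_j, v_i>/<v_i, v_i> v_i
        V.append(v)
    U = [v / np.linalg.norm(v) for v in V]
    return V, U

S = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
V, U = gram_schmidt(S)
print(np.allclose(np.vdot(V[0], V[1]), 0.0))              # v_1 and v_2 are orthogonal: True
print(np.allclose([np.linalg.norm(u) for u in U], 1.0))   # the u_i are unit vectors: True
```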
Sometimes we want to extend an orthogonal basis for a subspace to an orthogonal basis for a larger space.
Theorem 5.5 (Orthogonal Extension of Basis) Suppose $S \subset T$ are finite dimensional subspaces of a vector space $V$. An orthogonal basis for $S$ can always be extended to an orthogonal basis for $T$.
Proof Suppose $\dim S := k < \dim T = n$. Using Theorem 1.3 we first extend an orthogonal basis $s_1, \dots, s_k$ for $S$ to a basis $s_1, \dots, s_k, s_{k+1}, \dots, s_n$ for $T$, and then apply the Gram-Schmidt process to this basis, obtaining an orthogonal basis $v_1, \dots, v_n$ for $T$. This is an extension of the basis for $S$ since $v_i = s_i$ for $i = 1, \dots, k$. We show this by induction. Clearly $v_1 = s_1$. Suppose for some $2 \le r \le k$ that $v_j = s_j$ for $j = 1, \dots, r - 1$. Consider (5.8) for $j = r$. Since $\langle s_r, v_i \rangle = \langle s_r, s_i \rangle = 0$ for $i < r$ we obtain $v_r = s_r$.
Letting $S = \operatorname{span}(s_1, \dots, s_k)$ and $T$ be $\mathbb{R}^n$ or $\mathbb{C}^n$ we obtain
Corollary 5.1 (Extending Orthogonal Vectors to a Basis) For $1 \le k < n$ a set $\{s_1, \dots, s_k\}$ of nonzero orthogonal vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$ can be extended to an orthogonal basis for the whole space.
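One practical way to realize Corollary 5.1 numerically, sketched below with our own naming and tolerance: run the Gram-Schmidt step (5.8) on the unit vectors $e_1, \dots, e_n$ against the current orthogonal set and keep only the candidates that do not reduce to (numerically) zero.

```python
import numpy as np

def extend_to_orthogonal_basis(S, n, tol=1e-12):
    """Extend orthogonal nonzero vectors s_1, ..., s_k in R^n or C^n to an
    orthogonal basis for the whole space, in the spirit of Corollary 5.1."""
    V = [s.astype(complex) for s in S]
    for j in range(n):
        e = np.zeros(n, dtype=complex)
        e[j] = 1.0                           # candidate vector e_{j+1}
        v = e.copy()
        for vi in V:
            v -= (np.vdot(vi, e) / np.vdot(vi, vi)) * vi   # projection coefficients as in (5.8)
        if np.linalg.norm(v) > tol:          # keep only candidates independent of the current set
            V.append(v)
    return V

S = [np.array([1.0, 1.0, 0.0]), np.array([1.0, -1.0, 0.0])]   # orthogonal pair in R^3
V = extend_to_orthogonal_basis(S, 3)
print(len(V))                                                  # 3: a full orthogonal basis
print(np.allclose(np.vdot(V[0], V[2]), 0.0), np.allclose(np.vdot(V[1], V[2]), 0.0))  # True True
```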
5.1.3 Sum of Subspaces and Orthogonal Projections
Suppose $S$ and $T$ are subspaces of a real or complex vector space $V$ endowed with an inner product $\langle x, y \rangle$. We define
• Sum: $S + T := \{s + t : s \in S \text{ and } t \in T\}$,
• direct sum $S \oplus T$: a sum where $S \cap T = \{0\}$,
• orthogonal sum $S \oplus^{\perp} T$: a sum where $\langle s, t \rangle = 0$ for all $s \in S$ and $t \in T$.

We note that
• $S + T$ is a vector space, a subspace of $V$, which in this book will be $\mathbb{R}^n$ or $\mathbb{C}^n$ (cf. Example 1.2).
• Every $v \in S \oplus T$ can be decomposed uniquely in the form $v = s + t$, where $s \in S$ and $t \in T$. For if $v = s_1 + t_1 = s_2 + t_2$ for $s_1, s_2 \in S$ and $t_1, t_2 \in T$, then $0 = s_1 - s_2 + t_1 - t_2$ or $s_1 - s_2 = t_2 - t_1$. It follows that $s_1 - s_2$ and $t_2 - t_1$ belong to both $S$ and $T$ and hence to $S \cap T$. But then $s_1 - s_2 = t_2 - t_1 = 0$, so $s_1 = s_2$ and $t_2 = t_1$. By (1.8) in the introduction chapter we have
$$\dim(S \oplus T) = \dim(S) + \dim(T).$$
The subspaces $S$ and $T$ in a direct sum are called complementary subspaces.
• An orthogonal sum is a direct sum. For if $v \in S \cap T$ then $v$ is orthogonal to itself, $\langle v, v \rangle = 0$, which implies that $v = 0$. We often write $T := S^{\perp}$.
• Suppose $v = s_0 + t_0 \in S \oplus T$, where $s_0 \in S$ and $t_0 \in T$. The vector $s_0$ is called the oblique projection of $v$ into $S$ along $T$. Similarly, the vector $t_0$ is called the oblique projection of $v$ into $T$ along $S$. If $S \oplus^{\perp} T$ is an orthogonal sum then $s_0$ is called the orthogonal projection of $v$ into $S$. Similarly, $t_0$ is called the orthogonal projection of $v$ in $T = S^{\perp}$. The orthogonal projections are illustrated in Fig. 5.2.

Fig. 5.2 The orthogonal projections of $s + t$ into $S$ and $T$.
Theorem 5.6 (Orthogonal Projection) Let $S$ and $T$ be subspaces of a finite dimensional real or complex vector space $V$ with an inner product $\langle \cdot,\cdot \rangle$. The orthogonal projections $s_0$ of $v \in S \oplus^{\perp} T$ into $S$ and $t_0$ of $v \in S \oplus^{\perp} T$ into $T$ satisfy $v = s_0 + t_0$, and
$$\langle s_0, s \rangle = \langle v, s \rangle \ \text{ for all } s \in S, \qquad \langle t_0, t \rangle = \langle v, t \rangle \ \text{ for all } t \in T. \tag{5.9}$$
Moreover, if $\{v_1, \dots, v_k\}$ is an orthogonal basis for $S$ then
$$s_0 = \sum_{i=1}^{k} \frac{\langle v, v_i \rangle}{\langle v_i, v_i \rangle} v_i. \tag{5.10}$$
Proof We have $\langle s_0, s \rangle = \langle v - t_0, s \rangle = \langle v, s \rangle$, since $\langle t_0, s \rangle = 0$ for all $s \in S$, and (5.9) follows. If $s_0$ is given by (5.10) then for $j = 1, \dots, k$
$$\langle s_0, v_j \rangle = \sum_{i=1}^{k} \frac{\langle v, v_i \rangle}{\langle v_i, v_i \rangle} \langle v_i, v_j \rangle = \frac{\langle v, v_j \rangle}{\langle v_j, v_j \rangle} \langle v_j, v_j \rangle = \langle v, v_j \rangle.$$
By linearity (5.9) holds for all $s \in S$. By uniqueness it must be the orthogonal projection $s_0$ of $v \in S \oplus^{\perp} T$ into $S$. The proof for $t_0$ is similar.
Corollary 5.2 (Best Approximation) Let $S$ be a subspace of a finite dimensional real or complex vector space $V$ with an inner product $\langle \cdot,\cdot \rangle$ and corresponding norm $\|v\| := \sqrt{\langle v, v \rangle}$. If $s_0 \in S$ is the orthogonal projection of $v \in V$ then
$$\|v - s_0\| < \|v - s\|, \quad \text{for all } s \in S,\ s \ne s_0. \tag{5.11}$$
Proof Let $s_0 \ne s \in S$ and $0 \ne u := s_0 - s \in S$. It follows from (5.9) that $\langle v - s_0, u \rangle = 0$. By (5.7) (Pythagoras) we obtain
$$\|v - s\|^2 = \|v - s_0 + u\|^2 = \|v - s_0\|^2 + \|u\|^2 > \|v - s_0\|^2.$$
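The projection formula (5.10) and the best approximation property (5.11) can be illustrated with a few lines of numpy; the orthogonal basis and the vector $v$ below are our own example data.

```python
import numpy as np

# Orthogonal projection s0 of v into S = span{v1, v2} via (5.10),
# where {v1, v2} is an orthogonal basis for S.
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
v  = np.array([2.0, 3.0, 4.0])

s0 = sum((np.vdot(vi, v) / np.vdot(vi, vi)) * vi for vi in (v1, v2))
print(s0)   # [2. 3. 0.]: the component of v in S

# Best approximation (5.11): any other s in S is strictly farther from v.
s = 0.3 * v1 - 1.7 * v2                                   # an arbitrary element of S, s != s0
print(np.linalg.norm(v - s0) < np.linalg.norm(v - s))     # True
```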
5.1.4 Unitary and Orthogonal Matrices
In the rest of this chapter orthogonality is in terms of the standard inner product in $\mathbb{C}^n$ given by $\langle x, y \rangle := y^* x = \sum_{j=1}^{n} x_j \bar{y}_j$. For symmetric and Hermitian matrices we have the following characterization.
Lemma 5.1 Let $A \in \mathbb{C}^{n \times n}$ and $\langle x, y \rangle$ be the standard inner product in $\mathbb{C}^n$. Then
1. $A^T = A \iff \langle Ax, y \rangle = \langle x, \bar{A} y \rangle$ for all $x, y \in \mathbb{C}^n$.
2. $A^* = A \iff \langle Ax, y \rangle = \langle x, A y \rangle$ for all $x, y \in \mathbb{C}^n$.

Proof Suppose $A^T = A$ and $x, y \in \mathbb{C}^n$. Then
$$\langle x, \bar{A} y \rangle = (\bar{A} y)^* x = y^* \bar{A}^* x = y^* A^T x = y^* A x = \langle Ax, y \rangle.$$
For the converse we take $x = e_j$ and $y = e_i$ for some $i, j$ and obtain
$$e_i^T A e_j = \langle A e_j, e_i \rangle = \langle e_j, \bar{A} e_i \rangle = e_i^T A^T e_j.$$
Thus $A = A^T$ since they have the same $i, j$ element for all $i, j$. The proof of 2. is similar.
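Part 2 of Lemma 5.1 is easy to check numerically for a concrete Hermitian matrix; the matrix and vectors below are our own illustrative choices.

```python
import numpy as np

# For a Hermitian A (A* = A), <Ax, y> equals <x, Ay> in the standard inner product.
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B + B.conj().T                      # A is Hermitian by construction
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

lhs = np.vdot(y, A @ x)                 # <Ax, y> = y* (Ax)
rhs = np.vdot(A @ y, x)                 # <x, Ay> = (Ay)* x
print(np.isclose(lhs, rhs))             # True
```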
A square matrix $U \in \mathbb{C}^{n \times n}$ is unitary if $U^* U = I$. If $U$ is real then $U^T U = I$ and $U$ is called an orthogonal matrix. Unitary and orthogonal matrices have orthonormal columns.
If $U^* U = I$ the matrix $U$ is nonsingular, $U^{-1} = U^*$ and therefore $U U^* = U U^{-1} = I$ as well. Moreover, both the columns and rows of a unitary matrix of order $n$ form orthonormal bases for $\mathbb{C}^n$. We also note that the product of two unitary matrices is unitary. Indeed, if $U_1^* U_1 = I$ and $U_2^* U_2 = I$ then $(U_1 U_2)^* (U_1 U_2) = U_2^* U_1^* U_1 U_2 = I$.
Theorem 5.7 (Unitary Matrix) The matrix $U \in \mathbb{C}^{n \times n}$ is unitary if and only if $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{C}^n$. In particular, if $U$ is unitary then $\|Ux\|_2 = \|x\|_2$ for all $x \in \mathbb{C}^n$.
Proof If $U^* U = I$ and $x, y \in \mathbb{C}^n$ then
$$\langle Ux, Uy \rangle = (Uy)^* (Ux) = y^* U^* U x = y^* x = \langle x, y \rangle.$$
Conversely, if $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{C}^n$ then $U^* U = I$ since for $i, j = 1, \dots, n$
$$(U^* U)_{i,j} = e_i^* U^* U e_j = (U e_i)^* (U e_j) = \langle U e_j, U e_i \rangle = \langle e_j, e_i \rangle = e_i^* e_j,$$
so that $(U^* U)_{i,j} = \delta_{i,j}$ for all $i, j$. The last part of the theorem follows immediately by taking $y = x$:
$$\|Ux\|_2^2 = \langle Ux, Ux \rangle = \langle x, x \rangle = \|x\|_2^2.$$
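As a closing illustration, the sketch below builds a unitary matrix as the Q factor of a QR factorization (our own choice of construction) and checks $U^*U = I$, the norm preservation of Theorem 5.7, and that a product of unitary matrices is unitary.

```python
import numpy as np

# Build a unitary U from the QR factorization of a random complex matrix,
# then check U*U = I and the norm preservation ||Ux|| = ||x|| of Theorem 5.7.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(B)                          # the Q factor is unitary

print(np.allclose(U.conj().T @ U, np.eye(4)))   # U*U = I: True

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # norms agree: True

# The product of two unitary matrices is again unitary.
V, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
print(np.allclose((U @ V).conj().T @ (U @ V), np.eye(4)))      # True
```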