MATRIX ALGEBRA FOR ECONOMETRICS
MEI-YUAN CHEN
Department of Finance, National Chung Hsing University
July 2, 2003
The LaTeX source file is mat-alg1.tex.
Contents

1 Vectors
1.1 Vector Operations
1.2 Inner Products
1.3 Unit Vectors
1.4 Direction Cosines
1.5 Statistical Applications

2 Vector Spaces
2.1 The Dimension of a Vector Space
2.2 The Sum and Direct Sum of Vector Spaces
2.3 Orthogonal Basis Vectors
2.4 Orthogonal Projection of a Vector
2.5 Statistical Applications

3 Matrices
3.1 Matrix Operations
3.2 Matrix Scalar Functions
3.3 Matrix Rank
3.4 Matrix Inversion
3.5 Statistical Applications

4 Linear Transformations and Systems of Linear Equations
4.1 Systems of Linear Equations
4.2 Linear Transformations

5 Special Matrices
5.1 Symmetric Matrices
5.2 Quadratic Forms and Definite Matrices
5.3 Differentiation Involving Vectors and Matrices
5.4 Idempotent Matrices
5.5 Orthogonal Matrices
5.6 Projection Matrices
5.7 Partitioned Matrices
5.8 Statistical Applications

6 Eigenvalues and Eigenvectors
6.1 Eigenvalues and Eigenvectors
6.2 Some Basic Properties of Eigenvalues and Eigenvectors
6.3 Diagonalization
6.4 Orthogonal Diagonalization
represent the corresponding coordinates. In what follows, vectors are denoted by English and Greek letters in boldface. The representation of vectors on a geometric plane is as follows:
[Figure: a vector represented as an arrow in the plane.]
Consider vectors u, v, and w in Rn and scalars h and k. Two vectors u and v are said to be equal if they are the same componentwise, i.e., ui = vi, i = 1, . . . , n. The sum of u and v is the vector u + v whose ith component is ui + vi.
The Euclidean norm (or ℓ2-norm) of u is defined as

‖u‖ = (u · u)^{1/2} = (u1² + u2² + · · · + un²)^{1/2}.

Some other commonly used vector norms are as follows. The norm ‖u‖1 of u, called the sum norm or ℓ1-norm, is defined by

‖u‖1 = |u1| + |u2| + · · · + |un|,

and the max norm (or ℓ∞-norm) is

‖u‖∞ = max_{1≤i≤n} |ui|.
Note that for any norm and a scalar h, ‖hu‖ = |h| ‖u‖.
A vector is said to be a unit vector if it has norm one. Two vectors are said to be orthogonal if their inner product is zero (see also Section 1.4). For example, (1, 0, 0), (0, 1, 0), (0, 0, 1), and (0.267, 0.534, 0.802) are all unit vectors in R3, but only the first three vectors are mutually orthogonal. Orthogonal unit vectors are also known as orthonormal vectors. It is also easy to see that any non-zero vector can be normalized to unit length, i.e., for any u ≠ 0, u/‖u‖ has norm one.
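As a small numerical illustration (a sketch only; it assumes NumPy, which the notes themselves do not use, and the vector is made up), the norms defined above and the normalization u/‖u‖ can be computed as follows:

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])

    norm_1 = np.sum(np.abs(u))         # sum norm (l1-norm): |u1| + ... + |un|
    norm_2 = np.sqrt(np.sum(u ** 2))   # Euclidean norm (l2-norm)
    norm_inf = np.max(np.abs(u))       # max norm (l_inf-norm): max_i |ui|

    # Any non-zero vector can be normalized to unit length: u/||u|| has norm one.
    unit_u = u / norm_2
    print(norm_1, norm_2, norm_inf, np.linalg.norm(unit_u))  # the last value is 1.0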
Any n-dimensional vector can be represented as a linear combination of n orthonormal vectors:
u = (u1, u2, . . . , un)
= (u1, 0, . . . , 0) + (0, u2, 0, . . . , 0) + · · · + (0, . . . , 0, un)
= u1(1, 0, . . . , 0) + u2(0, 1, . . . , 0) + · · · + un(0, . . . , 0, 1).
Hence, orthonormal vectors can be viewed as orthogonal coordinate axes of unit length. We could, of course, change the coordinate system without affecting the vector. For example, we can express u = (1, 1/2) in terms of the two orthogonal vectors (2, 0) and (0, 3): u = (1/2)(2, 0) + (1/6)(0, 3).
Note that for any non-zero scalar h, hu has the direction cosines

cos θi = hui/‖hu‖ = ±(ui/‖u‖), i = 1, . . . , n.
That is, direction cosines are independent of vector magnitudes; only the sign of h (the direction) matters. Let u and v be two vectors in Rn with direction cosines ri and si, i = 1, . . . , n. Also let θ denote the angle between u and v. Then by the law of cosines,

cos θ = (‖u‖² + ‖v‖² − ‖u − v‖²) / (2‖u‖‖v‖).

Using the definition of direction cosines, the numerator can be expressed as
Theorem 1.1 Given two vectors u and v in Rn,

u · v = ‖u‖‖v‖ cos θ,

where θ is the angle between u and v.
An alternative proof of the above theorem (in R2) is as follows. Write u/‖u‖ = (cos α, sin α)′ and v/‖v‖ = (cos β, sin β)′. Then the inner product of these two vectors is

(u/‖u‖) · (v/‖v‖) = cos α cos β + sin α sin β = cos(β − α),

where, for β > α, β − α is the angle between u/‖u‖ and v/‖v‖.
When θ = 0 (or π), u and v lie on the same "line" and have the same (or opposite) direction. In this case, u and v are said to be linearly dependent (collinear) and u = hv for some scalar h. When θ = π/2, u and v are said to be orthogonal. Therefore, two non-zero vectors u and v are orthogonal if and only if u · v = 0.
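The relation u · v = ‖u‖‖v‖ cos θ and the orthogonality criterion can be checked numerically; the following sketch assumes NumPy and uses arbitrary vectors:

    import numpy as np

    u = np.array([1.0, 0.0, 0.0])
    v = np.array([1.0, 1.0, 0.0])

    cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    theta = np.arccos(cos_theta)       # pi/4 for these two vectors

    w = np.array([0.0, 0.0, 2.0])
    print(theta, u @ w == 0.0)         # u and w are orthogonal: inner product is zero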
As −1 ≤ cos θ ≤ 1, we immediately have from Theorem 1.1:
Theorem 1.2 (Cauchy-Schwarz Inequality) Given two vectors u and v,

|u · v| ≤ ‖u‖‖v‖;

the equality holds when u and v are linearly dependent.
By the Cauchy-Schwarz inequality,

‖u + v‖² = ‖u‖² + 2 u · v + ‖v‖² ≤ ‖u‖² + 2‖u‖‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)².

This proves:
Theorem 1.3 (Triangle Inequality) Given two vectors u and v,

‖u + v‖ ≤ ‖u‖ + ‖v‖;

the equality holds when u = hv and h > 0.
If u and v are orthogonal, we have

‖u + v‖² = ‖u‖² + ‖v‖²,

the generalized Pythagoras theorem.
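These results are easy to verify numerically; the sketch below (NumPy assumed, vectors made up) checks the Cauchy-Schwarz inequality, the triangle inequality, and the generalized Pythagoras theorem:

    import numpy as np

    norm = np.linalg.norm
    u, v = np.array([1.0, 2.0, 2.0]), np.array([3.0, -1.0, 0.5])

    assert abs(u @ v) <= norm(u) * norm(v)        # Cauchy-Schwarz inequality
    assert norm(u + v) <= norm(u) + norm(v)       # triangle inequality

    p, q = np.array([1.0, 0.0]), np.array([0.0, 3.0])   # orthogonal vectors
    assert np.isclose(norm(p + q) ** 2, norm(p) ** 2 + norm(q) ** 2)   # Pythagoras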
1.5 Statistical Applications
Given a random variable X with n observations x1, . . . , xn, different statistics can be used to summarize the information contained in this sample. An important statistic is the sample average of xi, which shows the "central tendency" of these observations:

x̄ := (1/n) Σ_{i=1}^{n} xi.

Another important statistic is the sample variance of xi, which measures the dispersion of these observations:

s²x := 1/(n − 1) Σ_{i=1}^{n} (xi − x̄)².
s²x is invariant with respect to scalar addition. Also note that the sample variance is divided by n − 1 rather than n. This is because the n deviations xi − x̄ sum to zero, so that only n − 1 of them can vary freely.
For two random variables X and Y with the vectors of observations x and y, their sample covariance characterizes the co-variation of these observations:

s_{x,y} := 1/(n − 1) Σ_{i=1}^{n} (xi − x̄)(yi − ȳ),

and their sample correlation coefficient is

r_{x,y} := Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / [Σ_{i=1}^{n} (xi − x̄)² Σ_{i=1}^{n} (yi − ȳ)²]^{1/2}.
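A short sketch of these sample statistics (NumPy assumed; the data values are made up purely for illustration):

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 7.0])
    y = np.array([2.0, 3.0, 5.0, 9.0])
    n = len(x)

    x_bar, y_bar = x.mean(), y.mean()
    s2_x = np.sum((x - x_bar) ** 2) / (n - 1)              # sample variance (n - 1 divisor)
    s_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)     # sample covariance
    r_xy = np.sum((x - x_bar) * (y - y_bar)) / np.sqrt(
        np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))   # sample correlation

    print(x_bar, s2_x, s_xy, r_xy)   # r_xy lies in [-1, 1] by the Cauchy-Schwarz inequality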
2 Vector Spaces

Definition 2.1 Let V be a non-empty set of vectors in Rn satisfying the following two conditions:

1 If u, v ∈ V, then u + v ∈ V;

2 If u ∈ V and h is any real scalar, then hu ∈ V.

Then V is called a vector space in n-dimensional space. Note that vectors in Rn with the standard operations of addition and multiplication must obey the properties mentioned in Section 1.1. For example, Rn and {0} are vector spaces, but the set of all vectors (a, b) with a ≥ 0 and b ≥ 0 is not a vector space. A set S is a subspace of a vector space V if S ⊂ V is closed under vector addition and scalar multiplication. For example, {0}, R3, lines through the origin, and planes through the origin are subspaces of R3; {0}, R2, and lines through the origin are subspaces of R2.
Definition 2.2 The vectors u1, . . . , uk in a vector space V are said to span V if every vector in V can be expressed as a linear combination of these vectors.

It is not difficult to see that, given k spanning vectors u1, . . . , uk, all linear combinations of these vectors form a subspace of V, denoted as span(u1, . . . , uk). In fact, this is the smallest subspace containing u1, . . . , uk. For example, let u1 and u2 be non-collinear vectors in Rn with initial points at the origin; then all linear combinations of u1 and u2 form a subspace which is a plane through the origin.
Let S = {u1, . . . , uk} be a set of non-zero vectors. Then S is said to be a linearly independent set if the only solution to the vector equation

a1u1 + a2u2 + · · · + akuk = 0

is a1 = a2 = · · · = ak = 0; if there are other solutions, S is said to be linearly dependent. Clearly, any one of k linearly dependent vectors can be written as a linear combination of the remaining k − 1 vectors. For example, the vectors u1 = (2, −1, 0, 3), u2 = (1, 2, 5, −1), and u3 = (7, −1, 5, 8) are linearly dependent because 3u1 + u2 − u3 = 0, and the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are clearly linearly independent. It is also easy to show the following result; see Exercise 2.3.
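The linear dependence in this example can be verified directly; the following sketch (NumPy assumed) also checks it by computing the rank of the matrix whose rows are u1, u2, u3:

    import numpy as np

    u1 = np.array([2.0, -1.0, 0.0, 3.0])
    u2 = np.array([1.0, 2.0, 5.0, -1.0])
    u3 = np.array([7.0, -1.0, 5.0, 8.0])

    print(3 * u1 + u2 - u3)                                # the zero vector
    print(np.linalg.matrix_rank(np.vstack([u1, u2, u3])))  # 2 < 3, so the set is linearly dependent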
Theorem 2.1 Let S = {u1, . . . , uk} ⊆ V.

(a) If there exists a subset of S which contains r ≤ k linearly dependent vectors, then S is also linearly dependent.

(b) If S is a linearly independent set, then any subset of r ≤ k vectors is also linearly independent.
Proof: (a) Let u1, . . . , ur be linearly dependent vectors; then there exist some nonzero ai such that a1u1 + · · · + arur = 0. Therefore, the system

a1u1 + · · · + arur + ar+1ur+1 + · · · + akuk = 0

has a nonzero solution even with ar+1 = · · · = ak = 0. □
A set of linearly independent vectors in V is a basis for V if these vectors span V. A nonzero vector space V is finite dimensional if its basis contains a finite number of spanning vectors; otherwise, V is infinite dimensional. The dimension of a finite dimensional vector space V is the number of vectors in a basis for V. Note that {0} is a vector space with dimension zero. As examples we note that {(1, 0), (0, 1)} and {(3, 7), (5, 5)} are two bases for R2. If the dimension of a vector space is known, the result below shows that a set of vectors is a basis if it is either a spanning set or a linearly independent set.
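For instance, that {(3, 7), (5, 5)} is a basis for R2 can be verified numerically; the sketch below (NumPy assumed) checks linear independence, which by the result below is enough, and recovers the unique coordinates of an arbitrary vector with respect to this basis:

    import numpy as np

    B = np.array([[3.0, 5.0],
                  [7.0, 5.0]])         # columns are the candidate basis vectors
    print(np.linalg.matrix_rank(B))    # 2, so the columns are linearly independent

    w = np.array([1.0, 2.0])
    coords = np.linalg.solve(B, w)     # w = coords[0]*(3, 7) + coords[1]*(5, 5)
    print(coords)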
Theorem 2.2 Let V be a k-dimensional vector space and S = {u1, . . . , uk}. Then S is a basis for V provided that either S spans V or S is a set of linearly independent vectors.

Proof: If S spans V but S is not a basis, then the vectors in S are linearly dependent. We thus have a subset of S that spans V but contains r < k linearly independent vectors. It follows that the dimension of V should be r, contradicting the original hypothesis. Conversely, if S is a linearly independent set but not a basis, then S does not span V. Thus, there must exist r > k linearly independent vectors spanning V. This again contradicts the hypothesis that V is k-dimensional. □
If S = {u1, . . . , ur} is a set of linearly independent vectors in a k-dimensional vector space V such that r < k, then S is not a basis for V. We can find a vector ur+1 which is linearly independent of the vectors in S. By enlarging S to S′ = {u1, . . . , ur+1} and repeating this step k − r times, we obtain a set of k linearly independent vectors. It follows from Theorem 2.2 that this set must be a basis for V. We have proved:

Theorem 2.3 Let V be a k-dimensional vector space and S = {u1, . . . , ur}, r ≤ k, be a set of linearly independent vectors. Then there exist k − r vectors ur+1, . . . , uk which are linearly independent of S such that {u1, . . . , ur, . . . , uk} form a basis for V.
Given two vector spaces U and V, the set of all vectors belonging to both U and V is called the intersection of U and V, denoted as U ∩ V, and the set of all vectors u + v, where u ∈ U and v ∈ V, is called the sum or union of U and V, denoted as U ∪ V. For example, if u = (2, 1, 0, 0, 0) ∈ R2 and v = (0, 0, −1, 3, 5) ∈ R3 (where R2 and R3 are identified with subspaces of R5 spanned by the corresponding coordinate vectors), then u + v = (2, 1, −1, 3, 5) ∈ R5. As another example, if u = (2, 1, 0, 0) ∈ R2 and v = (1, 0.5, −1, 0) ∈ R3, then u + v = (3, 1.5, −1, 0) ∈ R3.
Theorem 2.4 Given two vector spaces U and V, let dim(U) = m, dim(V) = n, dim(U ∩ V) = k, and dim(U ∪ V) = p, where dim(·) denotes the dimension of a vector space. Then p = m + n − k.

Proof: Let {w1, . . . , wk} be a basis of U ∩ V. By Theorem 2.3, it can be extended to a basis S(U) = {w1, . . . , wk, u1, . . . , um−k} of U and a basis S(V) = {w1, . . . , wk, v1, . . . , vn−k} of V, with the vs not in U ∩ V. Thus, the vectors in S(U) and S(V) form a spanning set for U ∪ V. The assertion follows if these vectors form a basis. Hence, it remains to show that these vectors are linearly independent. Consider an arbitrary linear combination:
a1w1 + · · · + akwk + b1u1 + · · · + bm−kum−k = −c1v1 − · · · − cn−kvn−k.
Clearly, the left-hand side is a vector in U, and hence the right-hand side is also in U. Note, however, that a linear combination of v1, . . . , vn−k should be a vector in V but not in U. Hence, the only possibility is that the right-hand side is the zero vector. This implies that the coefficients ci must be all zeros because v1, . . . , vn−k are linearly independent. Consequently, all ai and bi must also be all zeros. □

When U ∩ V = {0}, U ∪ V is called the direct sum of U and V, denoted as U ⊕ V. It follows from Theorem 2.4 that the dimension of the direct sum of U and V is
dim(U ⊕ V) = dim(U) + dim(V).
If w ∈ U ⊕ V, then w = u1 + v1 for some u1 ∈ U and v1 ∈ V. If one can also write w = u2 + v2, where u1 ≠ u2 ∈ U and v1 ≠ v2 ∈ V, then u1 − u2 = v2 − v1 is a non-zero vector belonging to both U and V, which is not possible by the definition of direct sum. This shows:
Theorem 2.5 Any vector w ∈ U ⊕ V can be written uniquely as w = u + v, where u ∈ U and v ∈ V.
More generally, given vector spaces Vi, i = 1, . . . , n, such that Vi ∩ Vj = {0} for all i ≠ j, we have

dim(V1 ⊕ V2 ⊕ · · · ⊕ Vn) = dim(V1) + dim(V2) + · · · + dim(Vn).
That is, the dimension of the direct sum is simply the sum of the individual dimensions. Theorem 2.5 thus implies that any vector w ∈ (V1 ⊕ · · · ⊕ Vn) can be written uniquely as w = v1 + · · · + vn, where vi ∈ Vi, i = 1, . . . , n.
2.3 Orthogonal Basis Vectors
A set of vectors is an orthogonal set if all vectors in this set are mutually orthogonal; an orthogonal set of unit vectors is an orthonormal set. If a vector v is orthogonal to u1, . . . , uk, then v must be orthogonal to any linear combination of u1, . . . , uk, and hence to the space spanned by these vectors.
A k-dimensional space can contain at most k mutually orthogonal non-zero vectors. Given a k-dimensional space V, consider now an arbitrary linear combination of k mutually orthogonal vectors in V:

a1u1 + a2u2 + · · · + akuk = 0.
Taking inner products with ui, i = 1, . . . , k, we obtain ai(ui · ui) = 0 and hence ai = 0 for each i. That is, mutually orthogonal non-zero vectors are linearly independent, so that k such vectors form an orthogonal basis for V.

Given a subspace S of a vector space V, the orthogonal complement of S is defined as

S⊥ := {v ∈ V : v · s = 0 for every s ∈ S}.
Thus, S and S⊥ are orthogonal, and S ∩ S⊥ = {0} so that S ∪ S⊥ = S ⊕ S⊥. Clearly, S ⊕ S⊥ is a subspace of V. Suppose that V is n-dimensional and S is r-dimensional with r < n. Let {u1, . . . , un} be an orthonormal basis of V with {u1, . . . , ur} being an orthonormal basis of S. If v ∈ S⊥, it can be written as

v = a1u1 + · · · + arur + ar+1ur+1 + · · · + anun.

As v · ui = ai, we have a1 = · · · = ar = 0, while ar+1, . . . , an are unrestricted. Hence, any vector in S⊥ can be expressed as a linear combination of the orthonormal vectors ur+1, . . . , un. It follows that S⊥ is (n − r)-dimensional and

dim(S ⊕ S⊥) = dim(S) + dim(S⊥) = r + (n − r) = n.
That is, dim(S ⊕ S⊥) = dim(V). This proves the following important result.

Theorem 2.6 Let V be a vector space and S its subspace. Then, V = S ⊕ S⊥.
The corollary below follows from Theorems 2.5 and 2.6.

Corollary 2.7 Given the vector space V, any v ∈ V can be uniquely expressed as v = s + e, where s is in a subspace S and e is in S⊥.
2.4 Orthogonal Projection of a Vector

Given two vectors u and v, let u = s + e. It turns out that s can be chosen as a scalar multiple of v such that e is orthogonal to v. To see this, write s = hv; then e = u − hv is orthogonal to v when v · (u − hv) = 0, i.e., when h = (u · v)/(v · v), so that

s = [(u · v)/(v · v)] v,   e = u − [(u · v)/(v · v)] v.
That is, u can always be decomposed into two orthogonal components s and e, where s is known as the orthogonal projection of u on v (or on the space spanned by v) and e is the component of u orthogonal to v. For example, consider u = (a, b) and v = (1, 0). Then u = (a, 0) + (0, b), where (a, 0) is the orthogonal projection of u on (1, 0) and (0, b) is orthogonal to (1, 0).
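The decomposition u = s + e is easy to compute; the sketch below (NumPy assumed, vectors made up) forms s = [(u · v)/(v · v)]v and checks that e = u − s is orthogonal to v:

    import numpy as np

    u = np.array([2.0, 1.0, 3.0])
    v = np.array([1.0, 1.0, 0.0])

    s = (u @ v) / (v @ v) * v             # orthogonal projection of u on v
    e = u - s                             # component of u orthogonal to v
    print(s, e, np.isclose(e @ v, 0.0))   # e . v = 0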
More generally, Corollary 2.7 shows that a vector u in an n-dimensional space V can be uniquely decomposed into two orthogonal components: the orthogonal projection of u onto an r-dimensional subspace S and the component in its orthogonal complement. If S = span(v1, . . . , vr), where v1, . . . , vr are mutually orthogonal, we can write the orthogonal projection onto S as

û = Σ_{i=1}^{r} [(u · vi)/(vi · vi)] vi.

û is the "best approximation" of u in the sense that the distance between u and its orthogonal projection onto S is less than the distance between u and any other vector in S.
Theorem 2.8 Given a vector space V with a subspace S, let s be the orthogonal projection of u ∈ V onto S. Then, for any v ∈ S,

‖u − s‖ ≤ ‖u − v‖;

the inequality becomes an equality if and only if v = s.
As discussed in Section 2.3, a set of linearly independent vectors u1, . . . , uk can be transformed to an orthogonal basis. The Gram-Schmidt orthogonalization procedure does this by sequentially performing orthogonal projection of each ui on the previously orthogonalized vectors. Specifically,

v1 = u1,
v2 = u2 − [(u2 · v1)/(v1 · v1)] v1,
. . .
vk = uk − [(uk · v1)/(v1 · v1)] v1 − · · · − [(uk · vk−1)/(vk−1 · vk−1)] vk−1.
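A minimal sketch of this procedure (NumPy assumed; the input vectors are arbitrary and taken to be linearly independent):

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthogonal basis for span(vectors)."""
        basis = []
        for u in vectors:
            v = u.astype(float)
            for w in basis:
                v = v - (u @ w) / (w @ w) * w   # subtract the projection of u on w
            basis.append(v)
        return basis

    vs = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                       np.array([1.0, 0.0, 1.0]),
                       np.array([0.0, 1.0, 1.0])])
    print(round(vs[0] @ vs[1], 10), round(vs[0] @ vs[2], 10), round(vs[1] @ vs[2], 10))  # all zero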
2.5 Statistical Applications

Suppose we want to find the straight line, y = αℓ + βx, where ℓ denotes the n × 1 vector of ones, that "best" fits the data points (yi, xi), i = 1, . . . , n. In the light of Theorem 2.8, this objective can be achieved by computing the orthogonal projection of y onto the space spanned by ℓ and x (or of y∗ onto the space spanned by x∗). We first write y = ŷ + e = αℓ + βx + e, where e is orthogonal to ℓ and x, and hence to ŷ. To find the unknown α and β, note that

y · ℓ = α(ℓ · ℓ) + β(x · ℓ),
y · x = α(ℓ · x) + β(x · x).

Solving these two linear equations yields the least-squares estimates of α and β.
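The sketch below (NumPy assumed, data made up) solves these two normal equations for α and β and confirms that the residual vector e is orthogonal to both ℓ and x:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    ell = np.ones_like(x)                    # the vector of ones, denoted l in the text

    # Normal equations:  y.l = alpha (l.l) + beta (x.l)
    #                    y.x = alpha (l.x) + beta (x.x)
    A = np.array([[ell @ ell, x @ ell],
                  [ell @ x,   x @ x]])
    b = np.array([y @ ell, y @ x])
    alpha, beta = np.linalg.solve(A, b)

    y_hat = alpha * ell + beta * x           # orthogonal projection of y on span(l, x)
    e = y - y_hat
    print(alpha, beta, e @ ell, e @ x)       # the last two values are (numerically) zero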
3 Matrices

An n × k matrix A is a rectangular array of real numbers arranged in n rows and k columns, with the (i, j)th element aij. Sometimes, aij is written as (A)ij. The ith row of A is ai· = (ai1, ai2, . . . , aik), and the jth column of A is a·j = (a1j, a2j, . . . , anj)′.
Note that given two n × 1 vectors u and v, the inner product of u and v is defined as u′v and the "outer" product of u and v is defined as uv′, which is an n × n matrix. Some specific matrices are defined as follows.
Definition 3.1 (a) A becomes a scalar when n = k = 1; A reduces to a row (column) vector when n = 1 (k = 1).

(b) A is a square matrix if n = k.

(c) A is a zero matrix if aij = 0 for all i = 1, . . . , n and j = 1, . . . , k.

(d) A is a diagonal matrix when n = k, aij = 0 for all i ≠ j, and aii ≠ 0 for some i, i.e., A = diag(a11, . . . , ann).
(e) A lower triangular matrix is a square matrix whose elements above the main diagonal are all zero, i.e., aij = 0 for all i < j.
(f) A symmetric matrix is a square matrix such that aij = aji for all i, j; that is, A = A′ if A is symmetric.
Relationships between two matrices are defined as follows. Let A = (aij) and B = (bij) be two n1 × k1 and n2 × k2 matrices, respectively, and let h be a scalar.
Definition 3.2 (a) When n1 = n2 and k1 = k2, A and B are equal if aij = bij for all i, j.
(b) The sum of A and B is the matrix A + B = C with cij = aij + bij for all i, j, given n1 = n2 and k1 = k2, which is the conformability condition for matrix addition.

(c) The scalar multiplication of A is the matrix hA = C with cij = haij for all i, j.

(d) The product of A and B is the n1 × k2 matrix AB = C with cij = ai· · b·j = Σ_{s=1}^{k1} ais bsj for all i, j, given k1 = n2, which is the conformability condition for matrix multiplication. That is, cij is the inner product of the ith row of A and the jth column of B; hence the number of columns of A must be the same as the number of rows of B. The matrix C is also called the premultiplication of the matrix B by the matrix A or the postmultiplication of the matrix A by B.
Let A, B, and C denote matrices and h and k denote scalars. Matrix addition and multiplication have the following properties:

1 A + B = B + A, since (A + B)ij = aij + bij;
Let A be an n × k matrix. The transpose of A, denoted as A′, is the k × n matrix whose ith column is the ith row of A. That is, (A′)ij = (A)ji. Clearly, a symmetric matrix is such that A = A′. Matrix transposition has the following properties:
1 (αA)′ = αA′ for a scalar α;
Let A be n × k and B be m × r. The Kronecker product of A and B is the nm × kr matrix

A ⊗ B = [aij B], i = 1, . . . , n, j = 1, . . . , k,

i.e., the block matrix whose (i, j)th block is aij B.
It should be clear that the Kronecker product is not commutative: A ⊗ B ≠ B ⊗ A. Let A and B be two n × k matrices; the Hadamard product (direct product) of A and B is the matrix A ⊙ B = C with cij = aij bij.
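The three products just defined can be compared directly; the following sketch assumes NumPy and uses small made-up matrices:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    B = np.array([[0.0, 1.0],
                  [1.0, 0.0]])

    print(A @ B)             # matrix product: c_ij = sum_s a_is b_sj
    print(np.kron(A, B))     # Kronecker product: block matrix with (i, j)th block a_ij * B
    print(A * B)             # Hadamard product: c_ij = a_ij * b_ij
    print(np.allclose(np.kron(A, B), np.kron(B, A)))   # False: the Kronecker product is not commutative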
Theorem 3.1 Let A, B, and C be n × m matrices. Then
Theorem 3.2 Let A and B be n × m matrices. Then

rank(A ⊙ B) ≤ rank(A) rank(B).

Proof: See Schott (1997), page 266.
Theorem 3.3 Let A and B be n × m matrices. Then

(a) A ⊙ B is nonnegative definite if A and B are nonnegative definite,

(b) A ⊙ B is positive definite if A and B are positive definite.

Proof: See Schott (1997), page 269.
3.2 Matrix Scalar Functions
Scalar functions of a matrix summarize various characteristics of the matrix elements. An important scalar function is the determinant function. Formally, the determinant of an n × n square matrix A, denoted as det(A), is the sum of all signed elementary products from A. That is,

det(A) = Σ (−1)^{f(i_1,...,i_n)} a_{1i_1} a_{2i_2} · · · a_{ni_n} = Σ (−1)^{f(i_1,...,i_n)} a_{i_1 1} a_{i_2 2} · · · a_{i_n n},

where the summation is taken over all permutations (i1, . . . , in) of the set of integers (1, . . . , n) and the function f(i1, . . . , in) equals the number of transpositions necessary to change (i1, . . . , in) to (1, . . . , n). For example,
Note that the determinant involves all products of n elements of the matrix A such that exactly one element is selected from each row and each column of A.
An alternative expression for det(A) can be given in terms of the cofactors of A. Given a square matrix A, the minor of entry aij, denoted as Mij, is the determinant of the submatrix that remains after the ith row and jth column are deleted from A. The number (−1)^{i+j} Mij is called the cofactor of entry aij, denoted as Cij. The determinant can be computed as follows; we omit the proof.
Theorem 3.4 Let A be an n × n matrix. Then for each 1 ≤ i ≤ n and 1 ≤ j ≤ n,

det(A) = ai1Ci1 + ai2Ci2 + · · · + ainCin.

This method is known as the cofactor expansion along the ith row. Similarly, the cofactor expansion along the jth column is

det(A) = a1jC1j + a2jC2j + · · · + anjCnj.
In particular, when n = 2, det(A) = a11a22 − a12a21, a well-known formula for a 2 × 2 matrix. Clearly, if A contains a zero row or column, its determinant is zero. Also note that if the cofactors of a row or column are matched with the elements from a different row or column, the resulting sum is zero.
It is not difficult to verify that the determinant of a diagonal or triangular matrix is the product of all elements on the main diagonal, i.e., det(A) = ∏_{i} aii. Let A be an n × n matrix and h a scalar. The determinant has the following properties:
1 det(A) = det(A′);
2 If A∗ is obtained by multiplying a row (column) of A by h, then det(A∗) = h det(A);
3 If A∗ is obtained by multiplying each element of A by h, then det(A∗) = hⁿ det(A);
4 If A∗ is obtained by interchanging two rows (columns) of A, then det(A∗) = −det(A).
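As an illustration of the cofactor expansion in Theorem 3.4, the following recursive sketch (NumPy assumed) expands along the first row and agrees with np.linalg.det; it is meant only as a teaching aid, not an efficient algorithm:

    import numpy as np

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 1 and column j+1
            total += (-1) ** j * A[0, j] * det_cofactor(minor)      # (-1)**j is the sign of cofactor C_{1,j+1}
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 5.0, 6.0]])
    print(det_cofactor(A), np.linalg.det(A))   # both equal -10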
...3.2 Matrix Scalar Functions
Scalar functions of a matrix summarize various characteristics of matrix elements Animportant scalar function is the determinant function Formally, the... n = 2, det(A) = a11a22− a12a21 a well known formula for a × 2matrix Clearly, if A contains a zero row or column, its determinant is zero Also notethat... terms of the cofactors of A.Given a square matrix A, the minor of entry aij, denoted as Mij is the determinant ofthe submatrix that remains after the ith row and