When a matrix is described, height is given first, then width:
an m x n matrix is m high and n wide; an n x m matrix is n high and m wide. After struggling for years to remember which goes first, we hit on a mnemonic: first take the elevator, then walk down the hall.
How would you add the matrices
$$\begin{bmatrix} 1 & 2 & 5 \\ 0 & 2 & 3 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}?$$
You can't: matrices can be added only if they have the same height and same width.
Probably no other area of mathematics has been applied in such numerous and diverse contexts as the theory of matrices. In mechanics, electromagnetics, statistics, economics, operations research, the social sciences, and so on, the list of applications seems endless. By and large this is due to the utility of matrix structure and methodology in conceptualizing sometimes complicated relationships and in the orderly processing of otherwise tedious algebraic calculations and numerical manipulations. -James Cochran, Applied Mathematics: Principles, Techniques, and Applications

The other central actor in linear algebra is the matrix.
Definition 1.2.1 (Matrix). An m × n matrix is a rectangular array of entries, m high and n wide. We denote by Mat(m, n) the set of m × n matrices.
We use capital letters to denote matrices. Usually our matrices will be arrays of numbers, real or complex, but matrices can be arrays of polynomials, or of more general functions; a matrix can even be an array of other matrices. A vector $\mathbf{v} \in \mathbb{R}^m$ is an m × 1 matrix; a number is a 1 × 1 matrix.
Addition of matrices and multiplication by a scalar work in the obvious way:
Example 1.2.2 (Addition of matrices; multiplication by scalars).
$$\begin{bmatrix} 1 & 0 \\ -3 & 2 \end{bmatrix} + \begin{bmatrix} 0 & 3 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ -2 & 1 \end{bmatrix} \quad\text{and}\quad 2\begin{bmatrix} 1 & -3 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & -6 \\ 0 & 4 \end{bmatrix}.$$
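For readers who like to experiment, entrywise addition and scalar multiplication are easy to check numerically. The following is a minimal sketch in Python with NumPy (not part of the text); the entries are arbitrary illustrations.

    import numpy as np

    # Matrices are added entry by entry; a scalar multiplies every entry.
    A = np.array([[1, 0],
                  [-3, 2]])
    B = np.array([[0, 3],
                  [1, -1]])

    print(A + B)    # [[ 1  3]
                    #  [-2  1]]
    print(2 * A)    # [[ 2  0]
                    #  [-6  4]]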
So far, it's not clear why we should bother with matrices. What do we gain by talking about a 2 × 2 matrix rather than the corresponding point of $\mathbb{R}^4$? The answer is that the matrix format allows us to perform
"When Werner Heisenberg dis- covered 'matrix' mechanics in 1925, he didn't know what a ma- trix was (Max Born had to tell him), and neither Heisenberg nor Born knew what to make of the appearance of matrices in the con- text of the atom."-Manfred R.
Schroeder, "Number Theory and the Real World," Mathematical Intelligencer, Vol. 7, No. 4.
The entry $a_{i,j}$ of a matrix A is the entry at the intersection of the ith row and jth column; it is the entry you find by taking the elevator to the ith floor (row) and walking down the corridor to the jth room.
One student objected that here we speak of the ith row and jth column, but in Example 1.2.6 we write that "the ith column of A is $A\mathbf{e}_i$." Do not assume that i has to be a row and j a column. "The ith column of A is $A\mathbf{e}_i$" is just a convenient way to say "The first column of A is $A\mathbf{e}_1$, the second column is $A\mathbf{e}_2$, etc." The thing to remember is that the first index of $a_{i,j}$ refers to the row, and the second to the column; thus $a_{3,2}$ corresponds to the entry in the third row, second column.
Another reader pointed out that sometimes we write "n x m matrix" and sometimes "m x n".
Again, which is n and which is m doesn't matter, and we could just as well write s × v. The point is that the first letter refers to height and the second to width.
another operation: matrix multiplication. We will see in Section 1.3 that every linear transformation corresponds to multiplication by a matrix, and (Theorem 1.3.10) that composition of linear transformations corresponds to multiplying together the corresponding matrices. This is one reason matrix multiplication is a natural and important operation; other important applications of matrix multiplication are found in probability theory and graph theory.
Matrix multiplication is best learned by example. The simplest way to compute AB is to write B above and to the right of A. Then the product AB fits in the space to the right of A and below B, the (i,j)th entry of AB being the intersection of the ith row of A and the jth column of B, as shown in Example 1.2.3. Note that for AB to exist, the width of A must equal the height of B. The resulting matrix then has the height of A and the width of B.
Example 1.2.3 (Matrix multiplication). The first entry of the first row of AB is obtained by multiplying, one by one, the entries of the first row of A by those of the first column of B, and adding these products together: in equation 1.2.1, (2 x 1) + (-1 x 3) = -1. The second entry of the first row is obtained by multiplying the first row of A by the second column of B: (2 x 4) + (-1 x 0) = 8. After multiplying the first row of A by all the columns of B, the process is repeated with the second row of A:
(3 x 1) + (2 x 3) = 9, and so on:
$$
\begin{array}{cc}
 & B = \begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix} \\[4pt]
A = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix} & AB = \begin{bmatrix} -1 & 8 & -6 \\ 9 & 12 & -2 \end{bmatrix}
\end{array}
\qquad (1.2.1)
$$

Now consider four new matrices: A and B, both 2 × 2; C, which is 2 × 3; and D, which is 3 × 2. For these,
what are the products AB, AC, and CD? Check your answers in the footnote below.¹ Now compute BA. What do you notice? What if you try to compute CA?²
Next we state the formal definition of the process we've just described.
If the indices bother you, refer to Figure 1.2.1.
¹ $AB = \begin{bmatrix} 0 & 1 \\ 0 & 5 \end{bmatrix}$; $AC = \begin{bmatrix} 1 & -1 & 1 \\ 5 & -2 & -1 \end{bmatrix}$; $CD = \begin{bmatrix} 0 & -1 \\ 0 & -1 \end{bmatrix}$.
² Matrix multiplication is not commutative; $BA \neq AB$.
Although the product AC exists, CA does not.
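These observations can also be experienced numerically. The sketch below, in Python with NumPy, uses arbitrarily chosen matrices (not the A, B, C, D of the exercise above): the two orders of multiplication generally disagree, and a product whose inner dimensions do not match simply fails.

    import numpy as np

    A = np.array([[1, 2], [0, 1]])        # 2 x 2 (illustrative entries)
    B = np.array([[0, 1], [3, 0]])        # 2 x 2
    C = np.array([[1, 0, 2], [0, 1, 1]])  # 2 x 3

    print(A @ B)            # generally different from ...
    print(B @ A)            # ... this product
    print(A @ C)            # 2 x 3: width of A equals height of C
    try:
        C @ A               # (2 x 3)(2 x 2): inner dimensions don't match
    except ValueError as err:
        print("CA is not defined:", err)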
Definition 1.2.4: Note that the summation is over the inner index k of $a_{i,k}b_{k,j}$.
Definition 1.2.4 says nothing new, but it provides some practice moving between the concrete (multiplying two particular matrices) and the symbolic (expressing this operation so that it applies to any two matrices of appropriate dimensions, even if the entries are complex numbers or functions, not real numbers). In linear algebra one is constantly moving from one form of representation to another. For example, as we have seen, a point b in $\mathbb{R}^n$ can be considered as a single entity, b, or as the ordered list of its coordinates; a matrix A can be thought of as a single entity, A, or as the rectangular array of its entries.
Definition 1.2.4 (Matrix multiplication). If A is an m × n matrix whose (i, j)th entry is $a_{i,j}$, and B is an n × p matrix whose (i, j)th entry is $b_{i,j}$, then C = AB is the m × p matrix with entries
$$c_{i,j} = \sum_{k=1}^{n} a_{i,k}\,b_{k,j}. \qquad (1.2.2)$$
FIGURE 1.2.1. The entry $c_{i,j}$ of the matrix C = AB is the sum of the products of the entries $a_{i,k}$ of the matrix A and the corresponding entries $b_{k,j}$ of the matrix B. The entries $a_{i,k}$ are all in the ith row of A; the first index i is constant, and the second index k varies. The entries $b_{k,j}$ are all in the jth column of B; the first index k varies, and the second index j is constant. Since the width of A equals the height of B, the entries of A and those of B can be paired up exactly.
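Definition 1.2.4 translates directly into a triple loop over i, j, and the inner index k. The following sketch (illustrative only; the function name matmul is ours) implements the formula and checks it against NumPy's built-in product on random matrices.

    import numpy as np

    def matmul(A, B):
        """Multiply per Definition 1.2.4: c[i, j] = sum over k of a[i, k] * b[k, j]."""
        m, n = A.shape
        n2, p = B.shape
        assert n == n2, "width of A must equal height of B"
        C = np.zeros((m, p))
        for i in range(m):          # ith row of A
            for j in range(p):      # jth column of B
                for k in range(n):  # inner index k
                    C[i, j] += A[i, k] * B[k, j]
        return C

    A = np.random.rand(3, 4)
    B = np.random.rand(4, 2)
    print(np.allclose(matmul(A, B), A @ B))   # True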
Remark. Often matrix multiplication is written in a row: [A][B] = [AB]. The format shown in Example 1.2.3 avoids confusion: the product of the ith row of A and the jth column of B lies at the intersection of that row and column. It also avoids recopying matrices when doing repeated multiplications. For example, to multiply A times B times C times D we write
$$
\begin{array}{cccc}
 & [\,B\,] & [\,C\,] & [\,D\,] \\
[\,A\,] & [\,AB\,] & [\,(AB)C\,] & [\,(ABC)D\,]
\end{array}
\qquad (1.2.3)
$$

Noncommutativity of matrix multiplication
As we saw earlier, matrix multiplication is not commutative. It may well be possible to multiply A by B but not B by A. Even if both matrices have
FIGURE 1.2.2. The ith column of the product AB depends on all the entries of A but only the ith column of B.

FIGURE 1.2.3. The jth row of the product AB depends on all the entries of B but only the jth row of A.
the same number of rows and columns, AB will usually not equal BA, as shown in Example 1.2.5.
Example 1.2.5 (Matrix multiplication is not commutative).
$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{is not equal to}\quad \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (1.2.4)$$
Multiplying a matrix by a standard basis vector
Multiplying a matrix A by the standard basis vector $\mathbf{e}_i$ selects out the ith column of A, as shown in the following example. We will use this fact often.
Example 1.2.6 (The ith column of A is $A\mathbf{e}_i$). Below, we show that the second column of a 2 × 2 matrix A is $A\mathbf{e}_2$:
$$A\mathbf{e}_2 = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} a_{1,2} \\ a_{2,2} \end{bmatrix}, \quad\text{the second column of } A. \quad \triangle$$
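Numerically, the same fact reads A @ e_i == A[:, i]. A small sketch with arbitrary entries (remember that NumPy numbers rows and columns from 0):

    import numpy as np

    A = np.array([[2, -1, 5],
                  [3,  2, 0]])
    e2 = np.array([0, 1, 0])          # second standard basis vector of R^3

    print(A @ e2)                     # [-1  2] : the second column of A
    print(np.array_equal(A @ e2, A[:, 1]))   # True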
Similarly, the ith column of AB is $A\mathbf{b}_i$, where $\mathbf{b}_i$ is the ith column of B, as shown in Example 1.2.7 and represented in Figure 1.2.2. The jth row of AB is the product of the jth row of A and the matrix B, as shown in Example 1.2.8 and Figure 1.2.3.
Example 1.2.7 (The ith column of AB is $A\mathbf{b}_i$). The second column of the product AB is the product of A and the second column of B:
$$A\mathbf{b}_2 = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \end{bmatrix}, \quad\text{the second column of}\quad AB = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 8 & -6 \\ 9 & 12 & -2 \end{bmatrix}. \qquad (1.2.5)$$
FIGURE 1.2.4.
Arthur Cayley (1821-1895) introduced matrices in 1858. Cayley worked as a lawyer until 1863, when he was appointed professor at Cambridge. As professor, he "had to manage on a salary only a fraction of that which he had earned as a skilled lawyer. However, Cayley was very happy to have the chance to devote himself entirely to mathematics." - From a biographical sketch by J. J. O'Connor and E. F. Robertson. For more, see the MacTutor History of Mathematics archive at http://www-history.mcs.st-and.ac.uk/history/

In his 1858 article on matrices, Cayley stated that matrix multiplication is associative but gave no proof. The impression one gets is that he played around with matrices (mostly 2 × 2 and 3 × 3) to get some feeling for how they behave, without worrying about rigor. Concerning another matrix result, Theorem 4.8.27 (the Cayley-Hamilton theorem), he verified it for 2 × 2 and 3 × 3 matrices, and stopped there.
Example 1.2.8. The second row of the product AB is the product of the second row of A and the matrix B:
$$\begin{bmatrix} 3 & 2 \end{bmatrix}\begin{bmatrix} 1 & 4 & -2 \\ 3 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 9 & 12 & -2 \end{bmatrix}, \quad\text{the second row of}\quad AB = \begin{bmatrix} -1 & 8 & -6 \\ 9 & 12 & -2 \end{bmatrix}. \qquad (1.2.6)$$
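Both facts, columns and rows of a product, can be checked directly. A short sketch using the A and B of equation 1.2.1 above (any matrices of compatible sizes would do):

    import numpy as np

    A = np.array([[2, -1],
                  [3,  2]])
    B = np.array([[1, 4, -2],
                  [3, 0,  2]])

    AB = A @ B
    print(np.array_equal(AB[:, 1], A @ B[:, 1]))   # 2nd column of AB is A b_2
    print(np.array_equal(AB[1, :], A[1, :] @ B))   # 2nd row of AB is (2nd row of A) B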
Matrix multiplication is associative
When multiplying the matrices A, B, and C, we could set up the repeated multiplication as we did in equation 1.2.3, which corresponds to the product (AB)C. We can use another format to get the product A(BC):
$$
\begin{array}{ccc}
 & [\,B\,] & [\,C\,] \\
[\,A\,] & [\,AB\,] & [\,(AB)C\,]
\end{array}
\qquad\text{or}\qquad
\begin{array}{cc}
 & [\,C\,] \\
[\,B\,] & [\,BC\,] \\
[\,A\,] & [\,A(BC)\,]
\end{array}
\qquad (1.2.7)
$$
Is (AB)C the same as A(BC)? In Section 1.3 we see that associativity of matrix multiplication follows from Theorem 1.3.10. Here we give a computational proof.
Proposition 1.2.9 (Matrix multiplication is associative). If A is an n x m matrix, B an m x p matrix, and C a p x q matrix, so that (AB)C and A(BC) are both defined, then they are equal:
(AB)C = A(BC). 1.2.8
Proof. Figure 1.2.5 shows that the (i, j)th entry of both A(BC) and (AB)C depends only on the ith row of A and the jth column of C (but on all the entries of B). Without loss of generality we can assume that A is a line matrix and C is a column matrix (n = q = 1), so that both (AB)C and A(BC) are numbers. Now apply associativity of multiplication of numbers:
$$(AB)C \;=\; \sum_{l=1}^{p}\underbrace{\Bigl(\sum_{k=1}^{m} a_k b_{k,l}\Bigr)}_{l\text{th entry of }AB} c_l \;=\; \sum_{l=1}^{p}\sum_{k=1}^{m} a_k b_{k,l} c_l \;=\; \sum_{k=1}^{m} a_k \underbrace{\Bigl(\sum_{l=1}^{p} b_{k,l} c_l\Bigr)}_{k\text{th entry of }BC} \;=\; A(BC). \qquad (1.2.9)\ \square$$
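Associativity can also be checked numerically, up to floating-point round-off. A minimal sketch with randomly chosen matrices of compatible sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((2, 3))
    B = rng.random((3, 4))
    C = rng.random((4, 5))

    print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True, up to round-off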
Exercise 1.2.24 asks you to show that matrix multiplication is distributive over addition:
A(B + C) = AB + AC and (B + C)A = BA + CA.
Not all operations are associa- tive. For example, the operation that takes two matrices A, B and gives AB - BA is not associative.
The cross product, discussed in Section 1.4, is also not associative.
The main diagonal is also called the diagonal. The diagonal from bottom left to top right is the antidiagonal.
The columns of the identity matrix $I_n$ are, of course, the standard basis vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$.
FIGURE 1.2.5. LEFT: This way of writing the matrices corresponds to calculating (AB)C. The ith row of AB depends on the ith row of A and the entire matrix B. RIGHT: This format corresponds to calculating A(BC). The jth column of BC depends on the jth column of C and the entire matrix B.
The identity matrix
The identity matrix I plays the same role in matrix multiplication as the number 1 in multiplication of numbers: IA = A = AI.
Definition 1.2.10 (Identity matrix). The identity matrix $I_n$ is the n × n matrix with 1's along the main diagonal (the diagonal from top left to bottom right) and 0's elsewhere.
Fo<example, 12 = [ ~ ~] and /3 = [ ~
If A is an n × m matrix, then IA = AI = A, or, more precisely,
$$I_n A = A\,I_m = A, \qquad (1.2.10)$$
since if n ≠ m one must change the size of the identity matrix to match the size of A. When the context is clear, we will omit the index.
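A quick numerical illustration of equation 1.2.10, using NumPy's np.eye(n) for the identity matrix $I_n$; the matrix A below is an arbitrary 2 × 3 example.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])          # a 2 x 3 matrix

    I2, I3 = np.eye(2), np.eye(3)
    print(np.array_equal(I2 @ A, A))   # True: I_2 A = A
    print(np.array_equal(A @ I3, A))   # True: A I_3 = A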
Matrix inverses
The inverse $A^{-1}$ of a matrix A plays the same role in matrix multiplication as the inverse 1/a for the number a.
The only number with no inverse is 0, but many nonzero matrices do not have inverses. In addition, the noncommutativity of matrix multiplication makes the definition more complicated.
Definition 1.2.11 (Left and right inverses of matrices). Let A be a matrix. If there is a matrix B such that BA = I, then B is a left inverse of A. If there is a matrix C such that AC = I, then C is a right inverse of A.
It is possible for a nonsquare matrix to have lots of left inverses and no right inverse, or lots of right inverses and no left inverse, as explored in Exercise 1.2.23.
We will see in Section 2.2 (discussion following Corollary 2.2.7) that only square matrices can have a two-sided inverse (i.e., an inverse). Furthermore, if a square matrix has a left inverse, then that left inverse is necessarily also a right inverse; if it has a right inverse, that right inverse is a left inverse.
While we can write the inverse of a number x either as $x^{-1}$ or as 1/x, giving $x \cdot x^{-1} = x(1/x) = 1$, the inverse of a matrix A is only written $A^{-1}$. We cannot divide by a matrix. If for two matrices A and B you were to write A/B, it would be unclear whether this meant $B^{-1}A$ or $AB^{-1}$.
Proposition 1.2.15: We are indebted to Robert Terrell for the mnemonic, "socks on, shoes on; shoes off, socks off." To undo a process, you undo first the last thing you did: $(f \circ g)^{-1} = g^{-1} \circ f^{-1}$.
Example 1.2.12 (A matrix with neither right nor left inverse). The matrix $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ does not have a right or a left inverse. To see this, assume it has a right inverse. Then there exists a matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ such that
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (1.2.11)$$
But that product is $\begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}$, with 0 in the bottom right corner, not the required 1. A similar computation shows that there is no left inverse. △
Definition 1.2.13 (Invertible matrix). An invertible matrix is a matrix that has both a left inverse and a right inverse.
Associativity of matrix multiplication gives the following result:
Proposition and Definition 1.2.14 (Matrix inverse). If a matrix A has both a left and a right inverse, then it has only one left inverse and one right inverse, and they are identical; such a matrix is called the inverse of A and is denoted $A^{-1}$.
Proof. If a matrix A has a right inverse B, then AB = I. If it has a left inverse C, then CA = I. So
$$C(AB) = CI = C \quad\text{and}\quad (CA)B = IB = B, \quad\text{so}\quad C = B.\ \square \qquad (1.2.12)$$
We discuss how to find inverses of matrices in Section 2.3. A formula exists for 2 × 2 matrices: the inverse of
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{is}\quad A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, \qquad (1.2.13)$$
as Exercise 1.2.12 asks you to confirm. The formula for the inverse of a 3 × 3 matrix is given in Exercise 1.4.20.
Notice that a 2 × 2 matrix is invertible if ad − bc ≠ 0. The converse is also true: if ad − bc = 0, the matrix is not invertible, as you are asked to show in Exercise 1.2.13.
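Formula 1.2.13 is short enough to implement directly. In the sketch below, the helper name inverse_2x2 is ours; it refuses to divide when ad - bc = 0 and is checked against the definition of an inverse.

    import numpy as np

    def inverse_2x2(A):
        """Inverse of a 2 x 2 matrix via formula 1.2.13; fails if ad - bc = 0."""
        a, b = A[0]
        c, d = A[1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix is not invertible")
        return np.array([[d, -b], [-c, a]]) / det

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    print(inverse_2x2(A))
    print(np.allclose(A @ inverse_2x2(A), np.eye(2)))   # True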
Associativity of matrix multiplication is also used to prove the following:
Proposition 1.2.15 (Inverse of product of matrices). If A and B are invertible matrices, then AB is invertible, and the inverse is given by
$$(AB)^{-1} = B^{-1}A^{-1}. \qquad (1.2.14)$$
Proof. The computation
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I \qquad (1.2.15)$$
If $\mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$, its transpose is $\mathbf{v}^{\top} = \begin{bmatrix} 1 & 0 & 2 \end{bmatrix}$.
$\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$
A symmetric matrix. If A is any matrix, then $A^{\top}A$ is symmetric, as Exercise 1.2.16 asks you to show.
$\begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 3 \\ -2 & -3 & 0 \end{bmatrix}$
An antisymmetric matrix. Symmetric and antisymmetric matrices are necessarily square.
$\begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}$
An upper triangular matrix
and a similar one for $(B^{-1}A^{-1})(AB)$ prove the result. □ Where does this use associativity? Check the footnote below.³
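Proposition 1.2.15 can be checked numerically as well; here np.linalg.inv computes the inverses, and the random matrices are nudged to be safely invertible.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.random((3, 3)) + 3 * np.eye(3)   # diagonally dominant, hence invertible
    B = rng.random((3, 3)) + 3 * np.eye(3)

    lhs = np.linalg.inv(A @ B)
    rhs = np.linalg.inv(B) @ np.linalg.inv(A)
    print(np.allclose(lhs, rhs))             # True: (AB)^-1 = B^-1 A^-1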
The transpose
Do not confuse a matrix with its transpose, and never write a vector horizontally. A vector written horizontally is its transpose; confusion between a vector (or matrix) and its transpose leads to endless difficulties with the order in which things should be multiplied, as you can see from Theorem 1.2.17.
Definition 1.2.16 (Transpose). The transpose $A^{\top}$ of a matrix A is formed by interchanging the rows and columns of A, reading the rows from left to right, and columns from top to bottom.
For example, if $A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}$, then $A^{\top} = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix}$.
The transpose of a single row of a matrix is a vector.
Theorem 1.2.17 (Transpose of product). The transpose of a product is the product of the transposes in reverse order:
$$(AB)^{\top} = B^{\top}A^{\top}. \qquad (1.2.16)$$
The proof is straightforward and is left as Exercise 1.2.14.
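Equation 1.2.16 is easy to confirm numerically; the matrix sizes below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.random((2, 3))
    B = rng.random((3, 4))

    print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T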
Special kinds of matrices
Definition 1.2.18 (Symmetric and antisymmetric matrices). A symmetric matrix is equal to its transpose. An antisymmetric matrix is equal to minus its transpose.
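In NumPy terms these definitions read A == A.T and A == -A.T. The sketch below also checks the margin claim that $A^{\top}A$ is symmetric for any matrix A; all entries are arbitrary illustrations.

    import numpy as np

    S = np.array([[1, 2], [2, 5]])       # symmetric: S equals its transpose
    K = np.array([[0, 3], [-3, 0]])      # antisymmetric: K equals minus its transpose
    print(np.array_equal(S, S.T), np.array_equal(K, -K.T))   # True True

    A = np.array([[1, 2, 0],
                  [4, 1, 7]])            # any matrix
    P = A.T @ A
    print(np.array_equal(P, P.T))        # True: A^T A is symmetric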
Definition 1.2.19 (Triangular matrix). An upper triangular matrix is a square matrix with nonzero entries only on or above the main diagonal. A lower triangular matrix is a square matrix with nonzero entries only on or below the main diagonal.
³ Associativity is used for the first two equalities below: the first is (AB)C = A(BC) with $C = B^{-1}A^{-1}$, and the second is D(EF) = (DE)F with $D = B$, $E = B^{-1}$, $F = A^{-1}$:
$$(AB)(B^{-1}A^{-1}) = A\bigl(B(B^{-1}A^{-1})\bigr) = A\bigl((BB^{-1})A^{-1}\bigr) = A(IA^{-1}) = I.$$
$\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}$
A diagonal matrix
A situation like this, with each outcome depending only on the one just before it, is called a Markov chain.
This kind of approach is useful in determining efficient storage.
How should a lumber yard store wood, to minimize time lost digging out a particular plank from under others? How should the operating system of a computer store data most efficiently?
Sometimes easy access isn't the goal. In Zola's novel Au Bonheur des Dames, the story of the growth of the first big department store in Paris, the hero places goods in the most inconvenient arrangement possible, forcing customers to pass through parts of the store where they otherwise wouldn't set foot, and which are mined with temptations for impulse shopping.
Exercise 1.3 asks you to show that if A and B are upper triangular n x n matrices, then so is AB. Exercise 2.4 asks you to show that a triangular matrix is invertible if and only if its diagonal entries are all nonzero.
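A quick numerical check of the claim in Exercise 1.3, with arbitrarily chosen entries: the product of two upper triangular matrices is again upper triangular.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [0, 4, 5],
                  [0, 0, 6]])
    B = np.array([[7, 8, 9],
                  [0, 1, 2],
                  [0, 0, 3]])

    AB = A @ B
    print(AB)
    # Entries strictly below the diagonal stay zero:
    print(np.allclose(AB, np.triu(AB)))   # True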
Definition 1.2.20 (Diagonal matrix). A diagonal matrix is a square matrix with nonzero entries (if any) only on the main diagonal.
What happens if you square the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$? If you cube it?⁴

Applications: probabilities and graphs
From the perspective of this book, matrices are most important because they represent linear transformations, discussed in the next section. But matrix multiplication has other important applications. Two good exam- ples are probability theory and graph theory.
Example 1.2.21 (Matrices and probabilities). Suppose you have three reference books on a shelf: a thesaurus, a French dictionary, and an English dictionary. Each time you consult one of these books, you put it back on the shelf at the far left. When you need a reference, we denote by P1 the probability that it will be the thesaurus, by P2 the probability that it will be the French dictionary, by P3 the probability it will be the English dictionary.
There are six possible arrangements on the shelf: (1 2 3), (1 3 2), and so on. We can write the following 6 × 6 transition matrix T, indicating the probability of going from one arrangement to another:
              (1 2 3)  (1 3 2)  (2 1 3)  (2 3 1)  (3 1 2)  (3 2 1)
    (1 2 3)     P1       0        P2       0        P3       0
    (1 3 2)     0        P1       P2       0        P3       0
    (2 1 3)     P1       0        P2       0        0        P3
    (2 3 1)     P1       0        0        P2       0        P3
    (3 1 2)     0        P1       0        P2       P3       0
    (3 2 1)     0        P1       0        P2       0        P3
The move from (2 1 3) to (3 2 1) has probability P3, since if you start with the order (2 1 3) (French dictionary, thesaurus, English dictionary), consult the English dictionary, and put it back at the far left, you will then have the order (3 2 1). So the entry at the 3rd row, 6th column is P3. The move from (2 1 3) to (3 1 2) has probability 0, since moving the English dictionary to the far left does not change the relative order of the other two books.
So the entry at the 3rd row, 5th column is 0.
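The transition matrix becomes concrete once numbers are chosen for P1, P2, P3; the values below are assumptions for illustration only, since the text keeps them general. The last lines carry out the multiplication by the row vector [0 0 0 1 0 0] described next.

    import numpy as np

    p1, p2, p3 = 0.5, 0.3, 0.2   # assumed values; the text leaves P1, P2, P3 general

    # Rows and columns are ordered (123), (132), (213), (231), (312), (321),
    # matching the table above.
    T = np.array([
        [p1, 0,  p2, 0,  p3, 0 ],
        [0,  p1, p2, 0,  p3, 0 ],
        [p1, 0,  p2, 0,  0,  p3],
        [p1, 0,  0,  p2, 0,  p3],
        [0,  p1, 0,  p2, p3, 0 ],
        [0,  p1, 0,  p2, 0,  p3],
    ])

    start = np.array([0, 0, 0, 1, 0, 0])   # certainly in arrangement (2 3 1)
    print(start @ T)        # distribution over arrangements after one consultation
    print(start @ T @ T)    # ... after two consultations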
Now say you start with the fourth arrangement, (2 3 1). Multiplying the line matrix [0 0 0 1 0 0] (probability 1 for the fourth choice, 0 for the others)
⁴ $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$; $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^3 = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$. We will see in Section 2.7 how supremely important this observation is.