In Theorem 1.8.1, we considered writing f and g as f⃗ and g⃗, since some of the computations only make sense for vectors: the dot product f·g in part 7, for example. We did not do so partly to avoid heavy notation. In practice, you can compute without worrying about the distinction.
We discussed in Remark 1.5.6 the importance of limiting the domain to an open subset.
In part 4 (and in part 7, on the next page) the expression f, g : U → R^m is shorthand for f : U → R^m and g : U → R^m.
This section gives rules that allow you to differentiate any function that is given by a formula. Some are grouped in Theorem 1.8.1; the chain rule is discussed in Theorem 1.8.3.
Theorem 1.8.1 (Rules for computing derivatives). Let U ⊂ R^n be open.
1. If f : U → R^m is a constant function, then f is differentiable, and its derivative is [0] (it is the zero linear transformation R^n → R^m, represented by the m × n matrix filled with 0's).
2. If f : R^n → R^m is linear, then it is differentiable everywhere, and its derivative at all points a is f, i.e., [Df(a)]v = f(v).
3. If f_1, ..., f_m : U → R are m scalar-valued functions differentiable at a, then the vector-valued mapping f = ( f_1 ; ... ; f_m ) : U → R^m is differentiable at a, with derivative

[Df(a)]v = ( [Df_1(a)]v ; ... ; [Df_m(a)]v ).     1.8.1

Conversely, if f is differentiable at a, each f_i is differentiable at a, and [Df_i(a)] = [D_1 f_i(a), ..., D_n f_i(a)].
4. If f, g : U → R^m are differentiable at a, then so is f + g, and

[D(f + g)(a)] = [Df(a)] + [Dg(a)].     1.8.2
FIGURE 1.8.1. Gottfried Leibniz (1646–1716).

Parts 5 and 7 of Theorem 1.8.1 are versions of Leibniz's rule.
Leibniz (or Leibnitz) was both philosopher and mathematician.
He wrote the first book on calculus and invented the notation for derivatives and integrals still used today. Contention between Leibniz and Newton over who should get credit for calculus contributed to a schism between English and continental mathematics that lasted into the 20th century.
Equation 1.8.3: Note that the terms on the right belong to the indicated spaces, and therefore the whole expression makes sense; it is the sum of two vectors in R^m, each of which is the product of a vector in R^m and a number. Note that [Df(a)]v is the product of a line matrix and a vector, hence it is a number.
Proof, part 2 of Theorem 1.8.1: We used the variable v in the statement of the theorem, because we were thinking of it as a random vector. In the proof the variable naturally is an increment, so we use h, which we tend to use for increments.
138 Chapter 1. Vectors, matrices, and derivatives
5. If f : U → R and g : U → R^m are differentiable at a, then so is fg, and the derivative is given by

[D(fg)(a)]v = f(a) [Dg(a)]v + ([Df(a)]v) g(a),     1.8.3

where f(a) ∈ R, [Dg(a)]v ∈ R^m, [Df(a)]v ∈ R, and g(a) ∈ R^m.
6. If f : U → R and g : U → R^m are differentiable at a, and f(a) ≠ 0, then so is g/f, and the derivative is given by

[D(g/f)(a)]v = [Dg(a)]v / f(a) - ([Df(a)]v) g(a) / (f(a))^2.     1.8.4
7. If f, g : U → R^m are both differentiable at a, then so is the dot product f·g : U → R, and (as in one dimension)

[D(f·g)(a)]v = ([Df(a)]v)·g(a) + f(a)·([Dg(a)]v).     1.8.5
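Rules 4–7 are easy to sanity-check numerically. The sketch below tests the dot-product rule of part 7 with central differences; the particular functions f, g : R^2 → R^3, the point a, and the direction v are arbitrary choices, not taken from the text.

```python
# Finite-difference check of part 7 of Theorem 1.8.1 (dot-product rule).
# The functions f, g, the point a, and the direction v are arbitrary choices.
import numpy as np

def f(x):  # f : R^2 -> R^3
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

def g(x):  # g : R^2 -> R^3
    return np.array([np.exp(x[0]), x[1]**3, x[0] + x[1]])

def jacobian(h, x, eps=1e-6):
    """Central-difference approximation of the Jacobian matrix [Dh(x)]."""
    n = len(x)
    cols = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        cols.append((h(x + e) - h(x - e)) / (2 * eps))
    return np.column_stack(cols)

a = np.array([0.7, -0.3])
v = np.array([1.0, 2.0])

# [D(f.g)(a)]v, computed directly on the scalar function x -> f(x).g(x) ...
lhs = jacobian(lambda x: np.array([f(x) @ g(x)]), a) @ v
# ... and via equation 1.8.5: ([Df(a)]v).g(a) + f(a).([Dg(a)]v)
rhs = (jacobian(f, a) @ v) @ g(a) + f(a) @ (jacobian(g, a) @ v)
```

Both sides agree up to finite-difference error; the same helper can be reused to test parts 4–6.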
We saw in Corollary 1.5.31 that polynomials are continuous, and rational functions are continuous wherever the denominator does not vanish. It follows from Theorem 1.8.1 that they are also differentiable.
Corollary 1.8.2 (Differentiability of polynomials and rational functions).
1. Any polynomial function R^n → R is differentiable on all of R^n.
2. Any rational function is differentiable on the subset of R^n where the denominator does not vanish.
Proof of Theorem 1.8.1. Proving most parts of Theorem 1.8.1 is straightforward; parts 5 and 6 are a bit tricky.
1. If f is a constant function, then f(a + h) = f(a), so the derivative [Df(a)] is the zero linear transformation:

lim_{h→0} (1/|h|) ( f(a+h) - f(a) - [0]h ) = lim_{h→0} (1/|h|) 0 = 0,     1.8.6

where [0] is a zero matrix; the term [0]h plays the role of [Df(a)]h in the definition of the derivative.
2. Since f is linear, f(a + h) = f(a) + f(h), so

lim_{h→0} (1/|h|) ( f(a+h) - f(a) - f(h) ) = 0.     1.8.7

It follows that [Df(a)] = f, i.e., for every h ∈ R^n we have [Df(a)]h = f(h).

3. The assumption that f is differentiable can be written
lim_{h→0} (1/|h|) [ ( f_1(a+h) ; ... ; f_m(a+h) ) - ( f_1(a) ; ... ; f_m(a) ) - ( [Df_1(a)]h ; ... ; [Df_m(a)]h ) ] = ( 0 ; ... ; 0 ).     1.8.8
In the second line of equation 1.8.11 we added and subtracted f(a)g(a+h); to go from the second equality to the third, we added and subtracted g(a+h)[Df(a)]h.
How did we know to do this? We began with the fact that f and g are differentiable; that's really all we had to work with, so clearly we needed to use that information.
Thus we wanted to end up with something of the form
f(a+h) - f(a) - [Df(a)]h   and   g(a+h) - g(a) - [Dg(a)]h.
So we looked for quantities we could add and subtract in order to arrive at something that includes those quantities. (The ones we chose to add and subtract are not the only possibilities.)
The assumption that f_1, ..., f_m are differentiable can be written

lim_{h→0} (1/|h|) ( f_i(a+h) - f_i(a) - [Df_i(a)]h ) = 0   for i = 1, ..., m.     1.8.9

The expressions on the left sides are equal by Proposition 1.5.25.
4. Functions are added point by point, so we can separate out f and g:

(f + g)(a+h) - (f + g)(a) - ([Df(a)] + [Dg(a)])h
   = ( f(a+h) - f(a) - [Df(a)]h ) + ( g(a+h) - g(a) - [Dg(a)]h ).     1.8.10

Now divide by |h| and take the limit as |h| → 0. The right side gives 0 + 0 = 0, so the left side does too.
5. We need to show both that fg is differentiable and that its derivative is given by equation 1.8.3. By part 3, we may assume that m = 1 (i.e., that g = g is scalar valued). Then
The quantity subtracted in the first line is [D(fg)(a)]h as given by the theorem. We compute

lim_{h→0} (1/|h|) ( (fg)(a+h) - (fg)(a) - f(a)([Dg(a)]h) - ([Df(a)]h)g(a) )

= lim_{h→0} (1/|h|) ( f(a+h)g(a+h) - f(a)g(a+h) + f(a)g(a+h) - f(a)g(a)
      - ([Df(a)]h)g(a) - ([Dg(a)]h)f(a) )

= lim_{h→0} (1/|h|) ( (f(a+h) - f(a))g(a+h) + f(a)(g(a+h) - g(a))
      - ([Df(a)]h)g(a) - ([Dg(a)]h)f(a) )     1.8.11

= lim_{h→0} ( (f(a+h) - f(a) - [Df(a)]h) / |h| ) g(a+h)
  + lim_{h→0} ( g(a+h) - g(a) ) ([Df(a)]h) / |h|
  + lim_{h→0} f(a) ( (g(a+h) - g(a) - [Dg(a)]h) / |h| )

= 0 + 0 + 0 = 0.
By Theorem 1.5.29, part 5, the first limit is 0: by the definition of the derivative of f, the first factor has limit 0, and the second factor g(a + h) is bounded in a neighborhood of h = 0; see the margin.
We again use Theorem 1.5.29, part 5, to show that the second limit is 0: g(a+h) - g(a) goes to 0 as h → 0 (since g is differentiable, hence continuous, at a), and the second factor ([Df(a)]h)/|h| is bounded, since

|[Df(a)]h| / |h| ≤ |[Df(a)]| |h| / |h| = |[Df(a)]|     (by Proposition 1.4.11).     1.8.12

The third limit is 0 by the definition of the derivative of g (and the fact that f(a) is constant).
6. Applying part 5 to the function 1/f, we see that it is enough to show that

[D(1/f)(a)]v = - [Df(a)]v / (f(a))^2.     1.8.13

This is seen as follows:

(1/|h|) ( 1/f(a+h) - 1/f(a) + ([Df(a)]h)/(f(a))^2 )
  = (1/|h|) ( (f(a) - f(a+h)) / (f(a+h)f(a)) + ([Df(a)]h)/(f(a))^2 )
  = (1/|h|) ( (f(a) - f(a+h) + [Df(a)]h) / (f(a))^2 )
      - (1/|h|) ( (f(a) - f(a+h))/(f(a))^2 - (f(a) - f(a+h))/(f(a+h)f(a)) )
  = (1/(f(a))^2) ( (f(a) - f(a+h) + [Df(a)]h) / |h| )
      - ( (f(a) - f(a+h)) / |h| ) (1/f(a)) ( 1/f(a) - 1/f(a+h) ).
Equation 1.8.15: The second equality uses part 4: f·g is the sum of the f_i g_i, so the derivative of the sum is the sum of the derivatives.
The third equality uses part 5.
Exercise 1.8.6 sketches a more conceptual proof of parts 5 and 7.
As h → 0, the first term on the last line goes to 0 by the definition of the derivative of f; in the second term, (f(a) - f(a+h))/|h| is bounded (see the margin) and 1/f(a) - 1/f(a+h) goes to 0, since f is continuous at a. So the whole limit is 0.
To see why we label one term in the last line as bounded, note that since f is differentiable,

lim_{h→0} ( (f(a+h) - f(a)) / |h| - ([Df(a)]h) / |h| ) = 0,     1.8.14

and the second term is bounded (see equation 1.8.12). If the first term in the parentheses were unbounded, then the sum would be unbounded, so the limit could not be zero.
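The quotient rule just established can also be checked numerically. This is a sketch, with arbitrary choices of f, g, a, and v (f chosen to be nonvanishing); it compares the two sides of equation 1.8.4 using central differences.

```python
# Finite-difference check of part 6 of Theorem 1.8.1 (quotient rule, eq. 1.8.4).
# f, g, a, v are arbitrary choices; f is chosen so that f(a) != 0.
import numpy as np

def f(x):  # f : R^2 -> R, nonvanishing
    return 1.0 + x[0]**2 + x[1]**2

def g(x):  # g : R^2 -> R^2
    return np.array([np.sin(x[0]), x[0] * x[1]])

def jac(h, x, eps=1e-6):
    """Central-difference Jacobian; works for scalar- or vector-valued h."""
    cols = []
    for j in range(len(x)):
        e = np.zeros(len(x))
        e[j] = eps
        cols.append((np.atleast_1d(h(x + e)) - np.atleast_1d(h(x - e))) / (2 * eps))
    return np.column_stack(cols)

a = np.array([0.4, -0.9])
v = np.array([2.0, 1.0])

lhs = jac(lambda x: g(x) / f(x), a) @ v                          # [D(g/f)(a)]v
rhs = (jac(g, a) @ v) / f(a) - (jac(f, a) @ v) * g(a) / f(a)**2  # eq. 1.8.4
```

The two vectors agree to within finite-difference error.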
7. We do not need to prove that f·g is differentiable, since it is the sum of products of differentiable functions, thus differentiable by parts 4 and 5.
So we just need to compute the Jacobian matrix:
[D(f·g)(a)]h = [ D( Σ_{i=1}^m f_i g_i )(a) ] h          (definition of dot product)

  = Σ_{i=1}^m [D(f_i g_i)(a)]h                          (by part 4)

  = Σ_{i=1}^m ( ([Df_i(a)]h) g_i(a) + f_i(a)([Dg_i(a)]h) )     (by part 5)     1.8.15

  = ([Df(a)]h)·g(a) + f(a)·([Dg(a)]h).                  (definition of dot product)  □
Some physicists claim that the chain rule is the most important theorem in all mathematics.
For the composition f ∘ g to be well defined, the codomain of g and the domain of f must be the same (i.e., V in the theorem). In terms of matrix multiplication, this is equivalent to saying that the width of [Df(g(a))] must equal the height of [Dg(a)] for the multiplication to be possible.
One motivation for discussing the relationship between matrix multiplication and composition of linear transformations at the beginning of this chapter was to have these tools available now. In coordinates, and using matrix multiplication, the chain rule states that
D_j(f ∘ g)_i(a) = Σ_{k=1}^m D_k f_i(g(a)) D_j g_k(a).
As a statement, this form of the chain rule is a disaster: it turns a fundamental, transparent statement into a messy formula, the proof of which seems to be a computational miracle.
One rule for differentiation is so fundamental that it deserves a subsection of its own: the chain rule, which states that the derivative of a composition is the composition of the derivatives. It is proved in Appendix A4.
Theorem 1.8.3 (Chain rule). Let U ⊂ R^n, V ⊂ R^m be open sets, let g : U → V and f : V → R^p be mappings, and let a be a point of U. If g is differentiable at a and f is differentiable at g(a), then the composition f ∘ g is differentiable at a, and its derivative is given by

[D(f ∘ g)(a)] = [Df(g(a))] ∘ [Dg(a)].     1.8.16
The chain rule is illustrated by Figure 1.8.2.
FIGURE 1.8.2. The function g maps a point a ∈ U to a point g(a) ∈ V. The function f maps the point g(a) = b to the point f(b). The derivative of g maps the vector v to [Dg(a)]v. The derivative of f ∘ g maps v to [Df(b)][Dg(a)]v, which, of course, is identical to [Df(g(a))][Dg(a)]v.
In practice, when using the chain rule, most often we represent linear transformations by their matrices, and we compute the right side of equation 1.8.16 by multiplying the matrices together:

[D(f ∘ g)(a)] = [Df(g(a))][Dg(a)].     1.8.17

Remark. Note that the equality (f ∘ g)(a) = f(g(a)) does not mean you can stick a D in front of each and claim that [D(f ∘ g)(a)] equals [Df(g(a))]; this is wrong. Remember that [D(f ∘ g)(a)] is the derivative of f ∘ g evaluated at the point a; [Df(g(a))] is the derivative of f evaluated at the point g(a). △
Example 1.8.4 (The derivative of a composition). Define g : R → R^3 and f : R^3 → R by

f(x; y; z) = x^2 + y^2 + z^2;   g(t) = ( t ; t^2 ; t^3 ).     1.8.18

Note that the dimensions are right: the composition f ∘ g goes from R to R, so its derivative is a number.
The derivative of f is the 1 × 3 matrix [Df(x; y; z)] = [2x, 2y, 2z]; evaluated at g(t) it is [2t, 2t^2, 2t^3]. The derivative of g at t is the column [Dg(t)] = ( 1 ; 2t ; 3t^2 ), so

[D(f ∘ g)(t)] = [Df(g(t))][Dg(t)] = [2t, 2t^2, 2t^3] ( 1 ; 2t ; 3t^2 ) = 2t + 4t^3 + 6t^5.

In this case it is also easy to compute the derivative of the composition directly: since (f ∘ g)(t) = t^2 + t^4 + t^6, we have

[D(f ∘ g)(t)] = (t^2 + t^4 + t^6)' = 2t + 4t^3 + 6t^5.  △     1.8.19
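The agreement in Example 1.8.4 between the chain-rule computation and the direct one can be confirmed numerically; the following sketch compares a central-difference derivative of f ∘ g with the formula 2t + 4t^3 + 6t^5 at an arbitrarily chosen t.

```python
# Numerical check of Example 1.8.4: f(x,y,z) = x^2 + y^2 + z^2, g(t) = (t, t^2, t^3).
import numpy as np

f = lambda p: p[0]**2 + p[1]**2 + p[2]**2
g = lambda t: np.array([t, t**2, t**3])

t, eps = 1.3, 1e-6
numeric = (f(g(t + eps)) - f(g(t - eps))) / (2 * eps)  # central difference
formula = 2*t + 4*t**3 + 6*t**5                        # [D(f o g)(t)] from the text
```

The two values differ only by the truncation error of the central difference.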
Example 1.8.5 (Composition of a function with itself). Let f : R^3 → R^3 be given by

f(x; y; z) = ( xy - 3 ; z^2 ; y ).

What is the derivative of f ∘ f, evaluated at ( 1 ; 1 ; 1 )? We need to compute

[D(f ∘ f)(1; 1; 1)] = [Df(f(1; 1; 1))] [Df(1; 1; 1)].     1.8.20

We use [Df] twice, evaluating it at ( 1 ; 1 ; 1 ) and at f( 1 ; 1 ; 1 ) = ( -2 ; 1 ; 1 ).

Since

[Df(x; y; z)] = [ y  x  0
                  0  0  2z
                  0  1  0 ],

equation 1.8.20 gives

[D(f ∘ f)(1; 1; 1)] = [ 1  -2  0     [ 1  1  0     [ 1  1  -4
                        0   0  2  ·    0  0  2   =   0  2   0
                        0   1  0 ]     0  1  0 ]     0  0   2 ].     1.8.21

Now compute the derivative of the composition directly and check in the footnote below.^24  △
Example 1.8.6 (Composition of linear transformations). Here is a case where it is easier to think of the derivative as a linear transformation
^24 Since (f ∘ f)(x; y; z) = f( xy - 3 ; z^2 ; y ) = ( xyz^2 - 3z^2 - 3 ; y^2 ; z^2 ), the derivative of the composition at (x; y; z) is

[ yz^2  xz^2  2xyz - 6z
  0     2y    0
  0     0     2z ];

at ( 1 ; 1 ; 1 ), it is

[ 1  1  -4
  0  2   0
  0  0   2 ].
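The matrix multiplication above can be replayed in code. The sketch below takes f(x, y, z) = (xy − 3, z^2, y), multiplies the two hand-computed Jacobians, and compares the product with a finite-difference Jacobian of f ∘ f at (1, 1, 1); the finite-difference helper is an added convenience, not part of the example.

```python
# Numerical check of the composition f o f with f(x,y,z) = (xy - 3, z^2, y).
import numpy as np

def f(p):
    x, y, z = p
    return np.array([x*y - 3, z**2, y])

def Df(p):  # Jacobian of f, computed by hand
    x, y, z = p
    return np.array([[y,   x,   0.0],
                     [0.0, 0.0, 2*z],
                     [0.0, 1.0, 0.0]])

a = np.array([1.0, 1.0, 1.0])
chain = Df(f(a)) @ Df(a)  # [Df(f(a))][Df(a)], the chain-rule product

# Central-difference Jacobian of f o f at a, column by column.
eps = 1e-6
numeric = np.column_stack([(f(f(a + e)) - f(f(a - e))) / (2 * eps)
                           for e in eps * np.eye(3)])
```

Both computations give the same 3 × 3 matrix.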
Example 1.8.6: Equation 1.7.46 says that the derivative of the "squaring function" f is

[Df(A)]H = AH + HA.

Equation 1.7.52 says that the derivative of the inverse function for matrices is

[Df(A)]H = -A^{-1}HA^{-1}.

In the second line of equation 1.8.22, g(A) = A^{-1} plays the role of A, and -A^{-1}HA^{-1} plays the role of H.

Notice the interesting way this result is related to the one-variable computation: if f(x) = x^{-2}, then f'(x) = -2x^{-3}. Notice also how much easier this computation is, using the chain rule, than the proof of Proposition 1.7.18, without the chain rule.
Exercise 1.32 asks you to compute the derivatives of the maps A ↦ A^{-3} and A ↦ A^{-n}.
than as a matrix, and it is easier to think of the chain rule as concerning a composition of linear transformations rather than a product of matrices.
If A and H are n × n matrices, and f(A) = A^2 and g(A) = A^{-1}, then (f ∘ g)(A) = A^{-2}. Below we compute the derivative of f ∘ g:

[D(f ∘ g)(A)]H = [Df(g(A))] ([Dg(A)]H)                 (chain rule)
              = [Df(A^{-1})] ( -A^{-1}HA^{-1} )        (eq. 1.7.52; -A^{-1}HA^{-1} is the new increment H for Df)
              = A^{-1}( -A^{-1}HA^{-1} ) + ( -A^{-1}HA^{-1} )A^{-1}     (eq. 1.7.46)
              = -A^{-2}HA^{-1} - A^{-1}HA^{-2}.     1.8.22
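Equation 1.8.22 can be sanity-checked with finite differences: for a concrete invertible A and increment H (arbitrary choices below), the difference quotient of A ↦ A^{-2} should match A^{-1}(−A^{-1}HA^{-1}) + (−A^{-1}HA^{-1})A^{-1}.

```python
# Numerical check of equation 1.8.22: derivative of A -> A^{-2} in direction H.
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 4.0]])   # invertible (det = 25); an arbitrary choice
H = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])   # arbitrary increment

inv = np.linalg.inv
sq_inv = lambda M: inv(M) @ inv(M)   # the map (f o g)(A) = A^{-2}

eps = 1e-6
numeric = (sq_inv(A + eps*H) - sq_inv(A - eps*H)) / (2 * eps)  # directional derivative
Ai = inv(A)
formula = Ai @ (-Ai @ H @ Ai) + (-Ai @ H @ Ai) @ Ai            # last lines of eq. 1.8.22
```

The matrices `numeric` and `formula` agree entrywise up to finite-difference error.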
EXERCISES FOR SECTION 1.8
1.8.1 Let f : R^3 → R, f⃗ : R^2 → R^3, g : R^2 → R, g⃗ : R^3 → R^2 be differentiable functions, the arrows marking the vector-valued ones. Which of the following compositions make sense? For each that does, what are the dimensions of its derivative?

a. f ∘ g⃗     d. f⃗ ∘ g⃗
b. g ∘ f⃗     e. f ∘ f⃗
c. g⃗ ∘ f⃗     f. f⃗ ∘ f
1.8.2 a. Given the functions f(x; y; z) = x^2 + y^2 + 2z^2 and g(t) = (…), what is [D(g ∘ f)(1; 1; 1)]?

b. Let f⃗(x; y; z) = (…) and g(a; b) = a^2 + b^2. What is the derivative of g ∘ f⃗ at (…)?
1.8.3 Is f(x; y) = sin(e^{xy}) differentiable at ( 0 ; 0 )?
1.8.4 a. What compositions can you form using the following functions?
i. f(x; y; z) = x^2 + y^2     ii. g(a; b) = 2a + b^2     iii. f⃗(t) = (…)     iv. g⃗(x; y) = ( … ; ln y )
b. Compute these compositions.
c. Compute their derivatives, both by using the chain rule and directly from the compositions.
1.8.5 The following "proof" of part 5 of Theorem 1.8.1 is correct as far as it goes, but it is not a complete proof. Why not?
Exercise 1.8.10, hint for part b: What is the "partial derivative of f with respect to the polar angle θ"?
"Proof": By part 3 of Theorem 1.8.1, we may assume that m = 1 (i.e., that g = g is scalar valued). Then
[D(fg)(a)]h = [ (D_1(fg))(a), ..., (D_n(fg))(a) ] h          (Jacobian matrix of fg)

  = [ f(a)(D_1 g)(a) + (D_1 f)(a)g(a), ..., f(a)(D_n g)(a) + (D_n f)(a)g(a) ] h     (in one variable, (fg)' = fg' + f'g)

  = f(a) [ (D_1 g)(a), ..., (D_n g)(a) ] h + [ (D_1 f)(a), ..., (D_n f)(a) ] h g(a)     (Jacobian matrices of g and of f)

  = f(a)([Dg(a)]h) + ([Df(a)]h)g(a).
1.8.6 a. Prove the rule for differentiating dot products (part 7 of Theorem 1.8.1) directly from the definition of the derivative.
b. Let U ⊂ R^3 be open. Show by a similar argument that if f, g : U → R^3 are both differentiable at a, then so is the cross product f × g : U → R^3. Find the formula for this derivative.
1.8.7 Consider the function (…). What is the derivative of the function t ↦ f(γ(t))?
1.8.8 True or false? Justify your answer. If f : R^2 → R^2 is a differentiable function with

f( 0 ; 0 ) = (…) and [Df( 0 ; 0 )] = (…),

there is no differentiable mapping g : R^2 → R^2 with (…).
1.8.9 Let φ : R → R be any differentiable function. Show that the function

f(x; y) = y φ(x^2 - y^2)

satisfies the equation

(1/x) D_1 f(x; y) + (1/y) D_2 f(x; y) = (1/y^2) f(x; y).
1.8.10 a. Show that if a function f : R^2 → R can be written φ(x^2 + y^2) for some differentiable function φ : R → R, then it satisfies

x D_2 f - y D_1 f = 0.

*b. Show the converse: every function satisfying x D_2 f - y D_1 f = 0 can be written φ(x^2 + y^2) for some function φ : R → R.
1.8.11 Show that if f(x; y) = φ(x/y) for some differentiable function φ : R → R, then

x D_1 f + y D_2 f = 0.
1.8.12 True or false? Explain your answers.
a. If f : R^2 → R^2 is differentiable and [Df(0)] is not invertible, then there is no differentiable function g : R^2 → R^2 such that

(g ∘ f)(x) = x.
b. Differentiable functions have continuous partial derivatives.
1.8.13 Let U ⊂ Mat(n, n) be the set of matrices A such that A + A^2 is invertible. Compute the derivative of the map F : U → Mat(n, n) given by F(A) = (A + A^2)^{-1}.