The first-order approximation

In first-year calculus, we think of a function $f$ as being differentiable at a point $a$ if it has a good tangent line approximation:

\[ f(x) \approx \ell(x) \quad \text{when $x$ is near $a$,} \]

where $y = \ell(x) = mx + b$ is the tangent line to the graph of $f$ at the point $(a, f(a))$. See Figure 4.1.

Figure 4.1: Approximating a differentiable function $f(x)$ by its tangent line $\ell(x)$

In fact, the tangent line is given by:

\[ \ell(x) = f(a) + f'(a)(x - a), \tag{4.1} \]

so $m = f'(a)$ and $b = f(a) - f'(a) \cdot a$.

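By definition, $f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$, assuming the limit exists. This can be rewritten as

\[ 0 = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} - f'(a) \right) = \lim_{x \to a} \frac{f(x) - f(a) - f'(a)(x - a)}{x - a}, \]

or, using equation (4.1), as:

\[ \lim_{x \to a} \frac{f(x) - \ell(x)}{x - a} = 0. \tag{4.2} \]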

In particular, when $x$ is near $a$, $f(x) - \ell(x)$ must be much, much smaller than $x - a$ in order for $\frac{f(x) - \ell(x)}{x - a}$ to be near 0. Note that $f(x) - \ell(x)$ is the error in using the tangent line to approximate $f$, so, in order for $f$ to be differentiable at $a$, not only must this error go to 0 as $x$ approaches $a$, it must go to 0 much faster than $x - a$. This is the principle we generalize in moving to functions of more than one variable.
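The claim that the tangent-line error shrinks faster than $x - a$ can be checked numerically. The sketch below is only an illustration: the choice $f(x) = x^3$ with $a = 1$, and the helper names `tangent_line` and `error_ratio`, are assumptions made for this demonstration and do not come from the text.

```python
# Illustrative choice (an assumption, not from the text): f(x) = x**3 at a = 1.
def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


def tangent_line(a):
    """Return the function l(x) = f(a) + f'(a)(x - a) from equation (4.1)."""
    return lambda x: f(a) + f_prime(a) * (x - a)


def error_ratio(a, h):
    """Return (f(x) - l(x)) / (x - a) at x = a + h, the quotient in equation (4.2)."""
    x = a + h
    return (f(x) - tangent_line(a)(x)) / (x - a)


if __name__ == "__main__":
    a = 1.0
    for h in [0.1, 0.01, 0.001]:
        error = f(a + h) - tangent_line(a)(a + h)
        # The error itself shrinks, and so does the error divided by x - a.
        print(f"h = {h:<6} error = {error:.3e}  ratio = {error_ratio(a, h):.3e}")
```

For this particular $f$, the error at $x = a + h$ works out to $3h^2 + h^3$, so the ratio behaves like $3h$ and tends to 0 with $h$, as equation (4.2) requires.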

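Definition. A function $\ell : \mathbb{R}^n \to \mathbb{R}^m$ is called an affine function if it has the form $\ell(\mathbf{x}) = T(\mathbf{x}) + \mathbf{b}$, where:

• $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation and

• $\mathbf{b}$ is a vector in $\mathbb{R}^m$.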

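Let $U$ be an open set in $\mathbb{R}^n$ and $f : U \to \mathbb{R}$ a real-valued function. Provisionally, we'll say that $f$ is differentiable at a point $\mathbf{a}$ of $U$ if there exists a real-valued affine function $\ell : \mathbb{R}^n \to \mathbb{R}$, $\ell(\mathbf{x}) = T(\mathbf{x}) + b$, such that:

\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - \ell(\mathbf{x})}{\|\mathbf{x} - \mathbf{a}\|} = 0. \tag{4.3} \]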

This is the generalization of equation (4.2).

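We can pin down the best choice of $\ell$ much more precisely. For instance, as $\ell$ is meant to approximate $f$ near $\mathbf{a}$, it should be true that $f(\mathbf{a}) = \ell(\mathbf{a})$, that is, $f(\mathbf{a}) = T(\mathbf{a}) + b$. Thus $b = f(\mathbf{a}) - T(\mathbf{a})$ and $\ell(\mathbf{x}) = T(\mathbf{x}) + f(\mathbf{a}) - T(\mathbf{a})$. Using the fact that $T$ is linear, this becomes:

\[ \ell(\mathbf{x}) = f(\mathbf{a}) + T(\mathbf{x} - \mathbf{a}). \]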

This happens to have the same form as the tangent line (4.1), which seems like a good sign.

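Substituting into equation (4.3), we say, still provisionally, that $f$ is differentiable at $\mathbf{a}$ if there exists a linear transformation $T : \mathbb{R}^n \to \mathbb{R}$ such that:

\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - f(\mathbf{a}) - T(\mathbf{x} - \mathbf{a})}{\|\mathbf{x} - \mathbf{a}\|} = 0. \tag{4.4} \]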

There is really only one choice for $T$ as well. For, as a linear transformation, $T$ is represented by a matrix, in this case, a 1 by $n$ matrix $A = \begin{bmatrix} m_1 & m_2 & \cdots & m_n \end{bmatrix}$, where $m_j = T(\mathbf{e}_j)$ for $j = 1, 2, \dots, n$. (See Proposition 1.7.) We shall determine the values of the entries $m_j$.

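Suppose that $\mathbf{x}$ approaches $\mathbf{a}$ in the $x_1$-direction, i.e., let $\mathbf{x} = \mathbf{a} + h\mathbf{e}_1$ and let $h$ go to 0. Then $\mathbf{x} - \mathbf{a} = h\mathbf{e}_1$ and $\|\mathbf{x} - \mathbf{a}\| = |h|$. Assume for a moment that $h > 0$. In order for equation (4.4) to hold, it must be true that:

\[ \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_1) - f(\mathbf{a}) - T(h\mathbf{e}_1)}{h} = 0; \]

or, using the linearity of $T$:

\[ \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_1) - f(\mathbf{a}) - hT(\mathbf{e}_1)}{h} = 0. \tag{4.5} \]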

If $h < 0$, then the denominator in equation (4.4) is $|h| = -h$. The minus sign can be factored out and canceled, so equation (4.5) still holds. This can be solved for $T(\mathbf{e}_1)$:

\[ m_1 = T(\mathbf{e}_1) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_1) - f(\mathbf{a})}{h} = \lim_{h \to 0} \frac{f(a_1 + h, a_2, \dots, a_n) - f(a_1, a_2, \dots, a_n)}{h}. \]

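This limit of a difference quotient is the definition of the ordinary one-variable derivative where $x_2, \dots, x_n$ are held fixed at $x_2 = a_2, \dots, x_n = a_n$, and only $x_1$ varies. It is called the partial derivative $\frac{\partial f}{\partial x_1}(\mathbf{a})$.

Similarly, by approaching $\mathbf{a}$ in the $x_2, \dots, x_n$ directions, we find that $m_2 = \frac{\partial f}{\partial x_2}(\mathbf{a}), \dots, m_n = \frac{\partial f}{\partial x_n}(\mathbf{a})$ and hence that

\[ A = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{a}) & \frac{\partial f}{\partial x_2}(\mathbf{a}) & \cdots & \frac{\partial f}{\partial x_n}(\mathbf{a}) \end{bmatrix}. \]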

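Definition. For a real-valued function $f : U \to \mathbb{R}$ defined on an open set $U$ in $\mathbb{R}^n$ and a point $\mathbf{a}$ in $U$:

1. If $j = 1, 2, \dots, n$, the partial derivative of $f$ at $\mathbf{a}$ with respect to $x_j$ is defined by:

\[ \frac{\partial f}{\partial x_j}(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{e}_j) - f(\mathbf{a})}{h}. \]

Note that $\mathbf{a} + h\mathbf{e}_j = (a_1, \dots, a_j + h, \dots, a_n)$, so $\mathbf{a} + h\mathbf{e}_j$ and $\mathbf{a}$ differ only in the $j$th coordinate. Thus the partial derivative is defined by the one-variable difference quotient for the derivative with variable $x_j$.

Other common notations for the partial derivative are $f_{x_j}(\mathbf{a})$ and $(D_j f)(\mathbf{a})$.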

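2. $Df(\mathbf{a})$ is defined to be the 1 by $n$ matrix

\[ Df(\mathbf{a}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{a}) & \frac{\partial f}{\partial x_2}(\mathbf{a}) & \cdots & \frac{\partial f}{\partial x_n}(\mathbf{a}) \end{bmatrix}. \]

The corresponding vector $\nabla f(\mathbf{a}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{a}), \frac{\partial f}{\partial x_2}(\mathbf{a}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{a}) \right)$ in $\mathbb{R}^n$ is called the gradient of $f$. (The intent of the notation might be clearer if these objects were denoted by $(Df)(\mathbf{a})$ and $(\nabla f)(\mathbf{a})$, respectively, but the extra parentheses are usually omitted.)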

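Example 4.1. Let $f(x, y) = x^3 + 2x^2y - 3y^2$, and let $\mathbf{a} = (2, 1)$. To find $\frac{\partial f}{\partial x}(\mathbf{a})$, we fix $y = 1$ and differentiate with respect to $x$. Since $f(x, 1) = x^3 + 2x^2 - 3$, this gives $\frac{\partial f}{\partial x}(x, 1) = 3x^2 + 4x$ and then $\frac{\partial f}{\partial x}(2, 1) = 12 + 8 = 20$.

Similarly, for the partial derivative at $\mathbf{a}$ with respect to $y$, $f(2, y) = 8 + 8y - 3y^2$, so $\frac{\partial f}{\partial y}(2, y) = 8 - 6y$ and $\frac{\partial f}{\partial y}(2, 1) = 8 - 6 = 2$.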

In practice, this is not how partial derivatives are usually calculated. Instead, one works directly with the general formula for $f$. For instance, to find the partial derivative with respect to $x$, differentiate with respect to $x$ thinking of all other variables (in this case, only $y$) as constant: $\frac{\partial f}{\partial x} = 3x^2 + 4xy - 0 = 3x^2 + 4xy$, so $\frac{\partial f}{\partial x}(2, 1) = 12 + 8 = 20$ as before. Likewise, $\frac{\partial f}{\partial y} = 2x^2 - 6y$, and $\frac{\partial f}{\partial y}(2, 1) = 8 - 6 = 2$.

In any case, $Df(\mathbf{a}) = \begin{bmatrix} 20 & 2 \end{bmatrix}$, and $\nabla f(\mathbf{a}) = (20, 2)$.
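The values in Example 4.1 can be sanity-checked by evaluating the difference quotient from the definition with a small fixed step instead of taking the limit. This is a rough numerical sketch, not a method from the text: the helper names `partial` and `gradient` and the step size `h = 1e-6` are assumptions chosen for illustration.

```python
def f(x, y):
    """The function from Example 4.1."""
    return x ** 3 + 2 * x ** 2 * y - 3 * y ** 2


def partial(func, point, j, h=1e-6):
    """Approximate the partial derivative of func at point in coordinate j (0-based).

    Uses the difference quotient (func(a + h e_j) - func(a)) / h with a small
    fixed h (an assumed step size), so the result is only approximate.
    """
    shifted = list(point)
    shifted[j] += h  # a + h e_j differs from a only in coordinate j
    return (func(*shifted) - func(*point)) / h


def gradient(func, point, h=1e-6):
    """Approximate the gradient as the list of all partial derivatives."""
    return [partial(func, point, j, h) for j in range(len(point))]


if __name__ == "__main__":
    # Should be close to the exact values 20 and 2 found above.
    print(gradient(f, (2.0, 1.0)))
```

Because the step is finite, the printed values agree with 20 and 2 only up to a small discretization error.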

We now have all the ingredients needed to state the formal definition of differentiability, as motivated by equation (4.4).

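Definition. Let $U$ be an open set in $\mathbb{R}^n$, $f : U \to \mathbb{R}$ a real-valued function, and $\mathbf{a}$ a point of $U$. Then $f$ is said to be differentiable at $\mathbf{a}$ if:

\[ \lim_{\mathbf{x} \to \mathbf{a}} \frac{f(\mathbf{x}) - f(\mathbf{a}) - Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})}{\|\mathbf{x} - \mathbf{a}\|} = 0. \]

When this happens, the matrix $Df(\mathbf{a})$ is called the derivative of $f$ at $\mathbf{a}$. It is also known as the Jacobian matrix. The affine function $\ell(\mathbf{x}) = f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$ is called the first-order affine approximation of $f$ at $\mathbf{a}$. (Note the resemblance to the one-variable tangent line approximation (4.1).)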

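The product in the numerator of the definition is the matrix product:

\[ Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{a}) & \frac{\partial f}{\partial x_2}(\mathbf{a}) & \cdots & \frac{\partial f}{\partial x_n}(\mathbf{a}) \end{bmatrix} \begin{bmatrix} x_1 - a_1 \\ x_2 - a_2 \\ \vdots \\ x_n - a_n \end{bmatrix}. \]

It could also be written as a dot product: $\nabla f(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a})$.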

The definition generalizes equation (4.2) for one variable in that it says that $f$ is differentiable at $\mathbf{a}$ if the error in using the first-order approximation $\ell(\mathbf{x})$ goes to 0 faster than $\|\mathbf{x} - \mathbf{a}\|$ as $\mathbf{x}$ approaches $\mathbf{a}$. In particular, the first-order approximation is a good approximation. This is worth emphasizing: the existence of a good first-order approximation is often the most important way to think of what it means for a function to be differentiable. Note that the derivative $Df(\mathbf{a})$ itself is no longer just a number the way it is in first-year calculus, but rather a matrix or, even better, the linear part of a good affine approximation of $f$ near $\mathbf{a}$.

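Example 4.2. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be $f(x, y) = x^2 + y^2$, and let $\mathbf{a} = (1, 2)$. Is $f$ differentiable at $\mathbf{a}$?

We calculate the various elements that go into the definition. First, $\frac{\partial f}{\partial x} = 2x$ and $\frac{\partial f}{\partial y} = 2y$, so $Df(x, y) = \begin{bmatrix} 2x & 2y \end{bmatrix}$ and $Df(\mathbf{a}) = Df(1, 2) = \begin{bmatrix} 2 & 4 \end{bmatrix}$. Also, $f(\mathbf{a}) = 5$, and $\mathbf{x} - \mathbf{a} = \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}$. Thus we need to check if:

\[ \lim_{(x,y) \to (1,2)} \frac{x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}}{\sqrt{(x - 1)^2 + (y - 2)^2}} = 0. \]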

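Let's simplify the numerator, completing the square at an appropriate stage:

\[ \begin{aligned}
x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}
&= x^2 + y^2 - 5 - \bigl( 2(x - 1) + 4(y - 2) \bigr) \\
&= x^2 + y^2 - 5 - (2x - 2 + 4y - 8) \\
&= x^2 - 2x + y^2 - 4y + 5 \\
&= (x^2 - 2x + 1) - 1 + (y^2 - 4y + 4) - 4 + 5 \\
&= (x - 1)^2 + (y - 2)^2.
\end{aligned} \]

Therefore: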

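\[ \begin{aligned}
\lim_{(x,y) \to (1,2)} \frac{x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}}{\sqrt{(x - 1)^2 + (y - 2)^2}}
&= \lim_{(x,y) \to (1,2)} \frac{(x - 1)^2 + (y - 2)^2}{\sqrt{(x - 1)^2 + (y - 2)^2}} \\
&= \lim_{(x,y) \to (1,2)} \sqrt{(x - 1)^2 + (y - 2)^2}.
\end{aligned} \]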

The function in this last expression is continuous, as it is a composition of sums and products of continuous pieces. Thus the value of the limit is simply the value of the function at the point $(1, 2)$. In other words:

\[ \lim_{(x,y) \to (1,2)} \frac{x^2 + y^2 - 5 - \begin{bmatrix} 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix}}{\sqrt{(x - 1)^2 + (y - 2)^2}} = \sqrt{(1 - 1)^2 + (2 - 2)^2} = 0. \]

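This shows that the answer is: yes, $f$ is differentiable at $\mathbf{a} = (1, 2)$.

In addition, the preceding calculations show that the first-order approximation of $f(x, y) = x^2 + y^2$ at $\mathbf{a} = (1, 2)$ is:

\begin{align}
\ell(x, y) &= f(\mathbf{a}) + Df(\mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}) \tag{4.6} \\
&= 5 + \begin{bmatrix} 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix} = 2x + 4y - 5. \tag{4.7}
\end{align}

For instance, $(x, y) = (1.05, 1.95)$ is near $\mathbf{a}$, and $f(1.05, 1.95) = 1.05^2 + 1.95^2 = 4.905$ while $\ell(1.05, 1.95) = 2(1.05) + 4(1.95) - 5 = 4.9$. The values are pretty close.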

The same sort of reasoning can be used to show that, more generally, the function $f(x, y) = x^2 + y^2$ is differentiable at every point of $\mathbb{R}^2$. This is Exercise 1.16.
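The computations in Example 4.2 can also be checked numerically. The following sketch evaluates $f$, the first-order approximation from equation (4.7), and the quotient from the definition of differentiability at points approaching $(1, 2)$. The helper names and the approach direction $(1, -1)$ are assumptions chosen for illustration; any other direction would serve equally well.

```python
import math

A = (1.0, 2.0)  # the point a of Example 4.2


def f(x, y):
    return x ** 2 + y ** 2


def ell(x, y):
    """First-order approximation 2x + 4y - 5 from equation (4.7)."""
    return 2 * x + 4 * y - 5


def error_ratio(x, y):
    """(f(x) - l(x)) / ||x - a||, the quotient in the definition of differentiability."""
    return (f(x, y) - ell(x, y)) / math.hypot(x - A[0], y - A[1])


if __name__ == "__main__":
    # Approximately 4.905 and 4.9, as computed in the example.
    print(f(1.05, 1.95), ell(1.05, 1.95))
    for t in [0.1, 0.01, 0.001]:
        # Approach a along the (arbitrarily chosen) direction (1, -1).
        x, y = A[0] + t, A[1] - t
        print(t, error_ratio(x, y), math.hypot(x - A[0], y - A[1]))
```

The last two printed columns agree up to rounding, which matches the simplification in the example: the quotient reduces to the distance from $(x, y)$ to $(1, 2)$ and so tends to 0.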

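Example 4.3. Let

\[ f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases} \]

Its graph is shown in Figure 4.2. Is $f$ differentiable at $\mathbf{a} = (0, 0)$?

Figure 4.2: The graph $z = \frac{xy}{x^2 + y^2}$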

Using the definition requires knowing the partial derivatives at $(0, 0)$. We could apply the quotient rule to the formula that defines $f$, but the result would be valid only when $(x, y) \neq (0, 0)$, which is precisely what we don't need. So instead we go back to the basic idea behind partial derivatives, namely, that they are one-variable derivatives where all variables but one are held constant. For example, to find $\frac{\partial f}{\partial x}(0, 0)$, fix $y = 0$ in $f(x, y)$ and look at the resulting function of $x$. By the formula for $f$, $f(x, 0) = \frac{x \cdot 0}{x^2 + 0^2} = 0$ if $x \neq 0$. This expression also holds when $x = 0$: by definition, $f(0, 0) = 0$. Thus $f(x, 0) = 0$ for all $x$. Differentiating this with respect to $x$ gives $\frac{\partial f}{\partial x}(x, 0) = \frac{d}{dx}(0) = 0$, whence $\frac{\partial f}{\partial x}(0, 0) = 0$. A symmetric calculation results in $\frac{\partial f}{\partial y}(0, 0) = 0$. Hence $Df(0, 0) = \begin{bmatrix} 0 & 0 \end{bmatrix}$.

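Moreover, $\mathbf{x} - \mathbf{a} = \begin{bmatrix} x - 0 \\ y - 0 \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$, and $\|\mathbf{x} - \mathbf{a}\| = \sqrt{x^2 + y^2}$. Thus, according to the definition of differentiability, we need to check if:

\[ \lim_{(x,y) \to (0,0)} \frac{\frac{xy}{x^2 + y^2} - 0 - \begin{bmatrix} 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix}}{\sqrt{x^2 + y^2}} = 0, \]

i.e., if $\lim_{(x,y) \to (0,0)} \frac{xy}{(x^2 + y^2)^{3/2}} = 0$.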

For this, suppose we approach the origin along the line $y = x$. Then we are looking at the behavior of $\frac{x \cdot x}{(x^2 + x^2)^{3/2}} = \frac{x^2}{2^{3/2}|x|^3} = \frac{1}{2^{3/2}|x|}$. This blows up as $x$ goes to 0. It certainly doesn't approach 0. Hence there's no way that $\lim_{(x,y) \to (0,0)} \frac{xy}{(x^2 + y^2)^{3/2}}$ can equal 0 (in fact, the limit does not exist), so $f$ is not differentiable at $(0, 0)$.

This example shows that a function of more than one variable can fail to be differentiable at a point even though all of its partial derivatives exist there. This differs from the one-variable case.
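The behavior in Example 4.3 is easy to observe numerically. The sketch below checks that the difference quotient defining the partial derivative with respect to $x$ at the origin is 0, and then evaluates the quotient from the definition of differentiability along the line $y = x$. The helper names `partial_x_at_origin` and `error_ratio` are assumptions introduced for this illustration.

```python
import math


def f(x, y):
    """The function of Example 4.3."""
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x ** 2 + y ** 2)


def partial_x_at_origin(h):
    """Difference quotient (f(h, 0) - f(0, 0)) / h for the partial derivative in x."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h


def error_ratio(x, y):
    """(f(x) - f(0) - Df(0) . x) / ||x||, using f(0, 0) = 0 and Df(0, 0) = [0 0].

    Only meaningful for (x, y) != (0, 0).
    """
    return f(x, y) / math.hypot(x, y)


if __name__ == "__main__":
    # The difference quotient is 0 for every nonzero h, so the partial derivative is 0.
    print([partial_x_at_origin(h) for h in [0.1, 0.01, 0.001]])
    for t in [0.1, 0.01, 0.001]:
        # Along y = x the quotient equals 1 / (2**1.5 * |t|), which grows without bound.
        print(t, error_ratio(t, t), 1 / (2 ** 1.5 * abs(t)))
```

The quotient grows as the points approach the origin, so it cannot tend to 0, even though both difference quotients for the partial derivatives vanish identically.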
