Linearization and Multivariate Taylor Series

Part of the document Mathematics For Machine Learning.pdf (pages 171 - 176)



Figure 5.12 Linear approximation of a function. The original function $f$ is linearized at $x_0 = -2$ using a first-order Taylor series expansion. [Plot of $f(x)$ against $x \in [-4, 4]$, showing $f(x)$, the point $f(x_0)$, and the line $f(x_0) + f'(x_0)(x - x_0)$.]

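If $f(x, y)$ is a twice (continuously) differentiable function, then

$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}, \tag{5.146}$$

i.e., the order of differentiation does not matter, and the corresponding Hessian matrix

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2ex] \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \tag{5.147}$$

is symmetric. The Hessian is denoted as $\nabla^2_{x,y} f(x, y)$. Generally, for $x \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$, the Hessian is an $n \times n$ matrix. The Hessian measures the curvature of the function locally around $(x, y)$.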

Remark (Hessian of a Vector Field). If $f : \mathbb{R}^n \to \mathbb{R}^m$ is a vector field, the Hessian is an $(m \times n \times n)$-tensor. ♦

5.8 Linearization and Multivariate Taylor Series

The gradient $\nabla f$ of a function $f$ is often used for a locally linear approximation of $f$ around $x_0$:

$$f(x) \approx f(x_0) + (\nabla_x f)(x_0)(x - x_0). \tag{5.148}$$

Here $(\nabla_x f)(x_0)$ is the gradient of $f$ with respect to $x$, evaluated at $x_0$. Figure 5.12 illustrates the linear approximation of a function $f$ at an input $x_0$. The original function is approximated by a straight line. This approximation is locally accurate, but the farther we move away from $x_0$, the worse the approximation gets. Equation (5.148) is a special case of a multivariate Taylor series expansion of $f$ at $x_0$, where we consider only the first two terms. We discuss the more general case in the following, which will allow for better approximations.
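The linearization in (5.148) can be tried numerically in the one-dimensional case. The following is a minimal sketch, not taken from the text: the function `f`, its derivative `df`, and the helper `linearize` are hypothetical choices made for illustration, and only the expansion point $x_0 = -2$ follows Figure 5.12.

```python
import math

def f(x):
    # Hypothetical example function (an assumption; the text does not specify f).
    return math.sin(x) + math.cos(x)

def df(x):
    # Analytic derivative of the example function above.
    return math.cos(x) - math.sin(x)

def linearize(f, df, x0):
    """Return the first-order Taylor approximation of f around x0."""
    f0, g0 = f(x0), df(x0)
    return lambda x: f0 + g0 * (x - x0)

x0 = -2.0
f_lin = linearize(f, df, x0)

# The absolute error is zero at x0 and grows as we move away from it.
steps = (0.0, 0.1, 0.5, 1.0)
errors = [abs(f(x0 + h) - f_lin(x0 + h)) for h in steps]
for h, err in zip(steps, errors):
    print(f"h = {h:3.1f}  |f - f_lin| = {err:.6f}")
```

The printed errors illustrate the statement that the approximation is accurate near $x_0$ and degrades with distance.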

Figure 5.13 Visualizing outer products. Outer products of vectors increase the dimensionality of the array by 1 per term. (a) The outer product of two vectors results in a matrix; (b) the outer product of three vectors yields a third-order tensor.

(a) Given a vector $\delta \in \mathbb{R}^4$, we obtain the outer product $\delta^2 := \delta \otimes \delta = \delta\delta^\top \in \mathbb{R}^{4 \times 4}$ as a matrix.

(b) An outer product $\delta^3 := \delta \otimes \delta \otimes \delta \in \mathbb{R}^{4 \times 4 \times 4}$ results in a third-order tensor ("three-dimensional matrix"), i.e., an array with three indexes.

Definition 5.7 (Multivariate Taylor Series). We consider a function

$$f : \mathbb{R}^D \to \mathbb{R} \tag{5.149}$$

$$x \mapsto f(x), \quad x \in \mathbb{R}^D, \tag{5.150}$$

that is smooth at $x_0$. When we define the difference vector $\delta := x - x_0$, the multivariate Taylor series of $f$ at $x_0$ is defined as

$$f(x) = \sum_{k=0}^{\infty} \frac{D_x^k f(x_0)}{k!} \delta^k, \tag{5.151}$$

where $D_x^k f(x_0)$ is the $k$-th (total) derivative of $f$ with respect to $x$, evaluated at $x_0$.

Definition 5.8 (Taylor Polynomial). The Taylor polynomial of degree $n$ of $f$ at $x_0$ contains the first $n + 1$ components of the series in (5.151) and is defined as

$$T_n(x) = \sum_{k=0}^{n} \frac{D_x^k f(x_0)}{k!} \delta^k. \tag{5.152}$$
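For $D = 1$, the Taylor polynomial in (5.152) reduces to the familiar scalar sum, which makes the effect of the degree $n$ easy to observe. The sketch below is illustrative only: the choice $f = \exp$ and the helper name `taylor_polynomial_exp` are assumptions, not part of the text. It uses the fact that every derivative of $\exp$ at $x_0$ equals $\exp(x_0)$.

```python
import math

def taylor_polynomial_exp(x, x0, n):
    """Degree-n Taylor polynomial of exp at x0, evaluated at x.

    For f = exp, every derivative satisfies D^k f(x0) = exp(x0),
    so each term is exp(x0) * delta**k / k!.
    """
    delta = x - x0
    return sum(math.exp(x0) * delta**k / math.factorial(k) for k in range(n + 1))

x0, x = 0.0, 1.0
# Approximation error for degrees n = 0, ..., 5.
approx_errors = [abs(math.exp(x) - taylor_polynomial_exp(x, x0, n)) for n in range(6)]
for n, err in enumerate(approx_errors):
    print(f"n = {n}  error = {err:.6f}")
```

In this example the error shrinks with each added term, consistent with higher-degree polynomials giving better approximations.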

In (5.151) and (5.152), we used the slightly sloppy notation of $\delta^k$, which is not defined for vectors $x \in \mathbb{R}^D$, $D > 1$, and $k > 1$. Note that both $D_x^k f$ and $\delta^k$ are $k$-th order tensors, i.e., $k$-dimensional arrays. (A vector can be implemented as a one-dimensional array, a matrix as a two-dimensional array.) The $k$-th order tensor $\delta^k \in \mathbb{R}^{D \times D \times \cdots \times D}$ ($k$ times) is obtained as a $k$-fold outer product, denoted by $\otimes$, of the vector $\delta \in \mathbb{R}^D$. For example,

$$\delta^2 := \delta \otimes \delta = \delta\delta^\top, \quad \delta^2[i, j] = \delta[i]\delta[j] \tag{5.153}$$

$$\delta^3 := \delta \otimes \delta \otimes \delta, \quad \delta^3[i, j, k] = \delta[i]\delta[j]\delta[k]. \tag{5.154}$$

Figure 5.13 visualizes two such outer products. In general, we obtain the terms

$$D_x^k f(x_0)\delta^k = \sum_{i_1=1}^{D} \cdots \sum_{i_k=1}^{D} D_x^k f(x_0)[i_1, \ldots, i_k]\, \delta[i_1] \cdots \delta[i_k] \tag{5.155}$$

in the Taylor series, where $D_x^k f(x_0)\delta^k$ contains $k$-th order polynomials.
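The outer products in (5.153) and (5.154) can be built directly as arrays. This is a small sketch using NumPy's `np.einsum`; the values in `delta` are arbitrary illustrative numbers, and array indices are zero-based, unlike the one-based indices in the equations.

```python
import numpy as np

# Illustrative difference vector with D = 4 (values are arbitrary).
delta = np.array([1.0, 2.0, 3.0, 4.0])

# Second-order tensor: delta2[i, j] = delta[i] * delta[j]
delta2 = np.einsum('i,j->ij', delta, delta)

# Third-order tensor: delta3[i, j, k] = delta[i] * delta[j] * delta[k]
delta3 = np.einsum('i,j,k->ijk', delta, delta, delta)

print(delta2.shape)  # (4, 4)
print(delta3.shape)  # (4, 4, 4)

# The second-order case agrees with the matrix form delta delta^T.
print(np.allclose(delta2, np.outer(delta, delta)))  # True
```

Each additional factor adds one axis to the array, matching the description in Figure 5.13.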

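Now that we defined the Taylor series for vector fields, let us explicitly write down the first terms $D_x^k f(x_0)\delta^k$ of the Taylor series expansion for $k = 0, \ldots, 3$ and $\delta := x - x_0$:

$$k = 0: \quad D_x^0 f(x_0)\delta^0 = f(x_0) \in \mathbb{R} \tag{5.156}$$

$$k = 1: \quad D_x^1 f(x_0)\delta^1 = \underbrace{\nabla_x f(x_0)}_{1 \times D}\, \underbrace{\delta}_{D \times 1} = \sum_{i=1}^{D} \nabla_x f(x_0)[i]\,\delta[i] \in \mathbb{R} \tag{5.157}$$

$$k = 2: \quad D_x^2 f(x_0)\delta^2 = \operatorname{tr}\big(\underbrace{H(x_0)}_{D \times D}\, \underbrace{\delta}_{D \times 1}\, \underbrace{\delta^\top}_{1 \times D}\big) = \delta^\top H(x_0)\delta \tag{5.158}$$

$$= \sum_{i=1}^{D} \sum_{j=1}^{D} H[i, j]\,\delta[i]\delta[j] \in \mathbb{R} \tag{5.159}$$

$$k = 3: \quad D_x^3 f(x_0)\delta^3 = \sum_{i=1}^{D} \sum_{j=1}^{D} \sum_{k=1}^{D} D_x^3 f(x_0)[i, j, k]\,\delta[i]\delta[j]\delta[k] \in \mathbb{R} \tag{5.160}$$

Here, $H(x_0)$ is the Hessian of $f$ evaluated at $x_0$. In NumPy, the terms for $k = 1, 2, 3$ can be computed as np.einsum('i,i', Df1, d), np.einsum('ij,i,j', Df2, d, d), and np.einsum('ijk,i,j,k', Df3, d, d, d), respectively.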
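The three `np.einsum` contractions given for (5.157) to (5.160) can be checked against the closed forms and the explicit sums. The sketch below uses random stand-in arrays rather than derivatives of an actual function; the names `Df1`, `Df2`, `Df3`, and `d` follow the margin notes, while the comparison variables are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
d = rng.standard_normal(D)            # difference vector delta
Df1 = rng.standard_normal(D)          # stand-in for the gradient
A = rng.standard_normal((D, D))
Df2 = A + A.T                         # symmetric stand-in for the Hessian
Df3 = rng.standard_normal((D, D, D))  # stand-in for the third derivative

# Contractions as in (5.157), (5.159), and (5.160).
term1 = np.einsum('i,i', Df1, d)
term2 = np.einsum('ij,i,j', Df2, d, d)
term3 = np.einsum('ijk,i,j,k', Df3, d, d, d)

# Closed forms from (5.158) and the explicit triple sum from (5.160).
term2_trace = np.trace(Df2 @ np.outer(d, d))
term2_quad = d @ Df2 @ d
term3_loops = sum(Df3[i, j, k] * d[i] * d[j] * d[k]
                  for i in range(D) for j in range(D) for k in range(D))

print(np.isclose(term2, term2_trace), np.isclose(term2, term2_quad))
print(np.isclose(term3, term3_loops))
```

Each term is a scalar, in line with the $\in \mathbb{R}$ annotations in the equations.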

Example 5.15 (Taylor Series Expansion of a Function with Two Variables)

Consider the function

$$f(x, y) = x^2 + 2xy + y^3. \tag{5.161}$$

We want to compute the Taylor series expansion of $f$ at $(x_0, y_0) = (1, 2)$. Before we start, let us discuss what to expect: The function in (5.161) is a polynomial of degree 3. We are looking for a Taylor series expansion, which itself is a linear combination of polynomials. Therefore, we do not expect the Taylor series expansion to contain terms of fourth or higher order to express a third-order polynomial. This means that it should be sufficient to determine the first four terms of (5.151) for an exact alternative representation of (5.161).

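To determine the Taylor series expansion, we start with the constant term and the first-order derivatives, which are given by

$$f(1, 2) = 13 \tag{5.162}$$

$$\frac{\partial f}{\partial x} = 2x + 2y \implies \frac{\partial f}{\partial x}(1, 2) = 6 \tag{5.163}$$

$$\frac{\partial f}{\partial y} = 2x + 3y^2 \implies \frac{\partial f}{\partial y}(1, 2) = 14. \tag{5.164}$$

Therefore, we obtain

$$D^1_{x,y} f(1, 2) = \nabla_{x,y} f(1, 2) = \begin{bmatrix} \dfrac{\partial f}{\partial x}(1, 2) & \dfrac{\partial f}{\partial y}(1, 2) \end{bmatrix} = \begin{bmatrix} 6 & 14 \end{bmatrix} \in \mathbb{R}^{1 \times 2} \tag{5.165}$$

such that

$$\frac{D^1_{x,y} f(1, 2)}{1!} \delta = \begin{bmatrix} 6 & 14 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix} = 6(x - 1) + 14(y - 2). \tag{5.166}$$

Note that $D^1_{x,y} f(1, 2)\delta$ contains only linear terms, i.e., first-order polynomials.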

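The second-order partial derivatives are given by

$$\frac{\partial^2 f}{\partial x^2} = 2 \implies \frac{\partial^2 f}{\partial x^2}(1, 2) = 2 \tag{5.167}$$

$$\frac{\partial^2 f}{\partial y^2} = 6y \implies \frac{\partial^2 f}{\partial y^2}(1, 2) = 12 \tag{5.168}$$

$$\frac{\partial^2 f}{\partial y \partial x} = 2 \implies \frac{\partial^2 f}{\partial y \partial x}(1, 2) = 2 \tag{5.169}$$

$$\frac{\partial^2 f}{\partial x \partial y} = 2 \implies \frac{\partial^2 f}{\partial x \partial y}(1, 2) = 2. \tag{5.170}$$

When we collect the second-order partial derivatives, we obtain the Hessian

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2ex] \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 2 & 6y \end{bmatrix}, \tag{5.171}$$

such that

$$H(1, 2) = \begin{bmatrix} 2 & 2 \\ 2 & 12 \end{bmatrix} \in \mathbb{R}^{2 \times 2}. \tag{5.172}$$

Therefore, the next term of the Taylor series expansion is given by

$$\frac{D^2_{x,y} f(1, 2)}{2!} \delta^2 = \frac{1}{2} \delta^\top H(1, 2)\delta \tag{5.173a}$$

$$= \frac{1}{2} \begin{bmatrix} x - 1 & y - 2 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 2 & 12 \end{bmatrix} \begin{bmatrix} x - 1 \\ y - 2 \end{bmatrix} \tag{5.173b}$$

$$= (x - 1)^2 + 2(x - 1)(y - 2) + 6(y - 2)^2. \tag{5.173c}$$

Here, $D^2_{x,y} f(1, 2)\delta^2$ contains only quadratic terms, i.e., second-order polynomials.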
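The gradient and Hessian values at $(1, 2)$ can be sanity-checked with finite differences. The following is a rough numerical sketch, not part of the original example: the helpers `gradient_fd` and `hessian_fd` and the step sizes `h` are assumptions chosen for illustration, and the results are only approximate.

```python
def f(x, y):
    # The function from (5.161).
    return x**2 + 2 * x * y + y**3

def gradient_fd(f, x, y, h=1e-5):
    """Central-difference approximation of (df/dx, df/dy)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

def hessian_fd(f, x, y, h=1e-4):
    """Central-difference approximation of the 2x2 Hessian."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

grad = gradient_fd(f, 1.0, 2.0)
H = hessian_fd(f, 1.0, 2.0)
print(grad)  # approximately (6, 14), cf. (5.165)
print(H)     # approximately [[2, 2], [2, 12]], cf. (5.172)
```

The approximations should agree with (5.165) and (5.172) up to small discretization and rounding errors.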

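The third-order derivatives are obtained as

$$D^3_{x,y} f = \begin{bmatrix} \dfrac{\partial H}{\partial x} & \dfrac{\partial H}{\partial y} \end{bmatrix} \in \mathbb{R}^{2 \times 2 \times 2}, \tag{5.174}$$

$$D^3_{x,y} f[:, :, 1] = \frac{\partial H}{\partial x} = \begin{bmatrix} \dfrac{\partial^3 f}{\partial x^3} & \dfrac{\partial^3 f}{\partial x^2 \partial y} \\[2ex] \dfrac{\partial^3 f}{\partial x \partial y \partial x} & \dfrac{\partial^3 f}{\partial x \partial y^2} \end{bmatrix}, \tag{5.175}$$

$$D^3_{x,y} f[:, :, 2] = \frac{\partial H}{\partial y} = \begin{bmatrix} \dfrac{\partial^3 f}{\partial y \partial x^2} & \dfrac{\partial^3 f}{\partial y \partial x \partial y} \\[2ex] \dfrac{\partial^3 f}{\partial y^2 \partial x} & \dfrac{\partial^3 f}{\partial y^3} \end{bmatrix}. \tag{5.176}$$

Since most second-order partial derivatives in the Hessian in (5.171) are constant, the only nonzero third-order partial derivative is

$$\frac{\partial^3 f}{\partial y^3} = 6 \implies \frac{\partial^3 f}{\partial y^3}(1, 2) = 6. \tag{5.177}$$

Higher-order derivatives and the mixed derivatives of degree 3 (e.g., $\frac{\partial^3 f}{\partial x^2 \partial y}$) vanish, such that

$$D^3_{x,y} f[:, :, 1] = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad D^3_{x,y} f[:, :, 2] = \begin{bmatrix} 0 & 0 \\ 0 & 6 \end{bmatrix} \tag{5.178}$$

and

$$\frac{D^3_{x,y} f(1, 2)}{3!} \delta^3 = (y - 2)^3, \tag{5.179}$$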

which collects all cubic terms of the Taylor series. Overall, the (exact) Taylor series expansion of $f$ at $(x_0, y_0) = (1, 2)$ is

$$f(x) = f(1, 2) + D^1_{x,y} f(1, 2)\delta + \frac{D^2_{x,y} f(1, 2)}{2!} \delta^2 + \frac{D^3_{x,y} f(1, 2)}{3!} \delta^3 \tag{5.180a}$$

$$\begin{aligned} = f(1, 2) &+ \frac{\partial f(1, 2)}{\partial x}(x - 1) + \frac{\partial f(1, 2)}{\partial y}(y - 2) \\ &+ \frac{1}{2!} \left( \frac{\partial^2 f(1, 2)}{\partial x^2}(x - 1)^2 + \frac{\partial^2 f(1, 2)}{\partial y^2}(y - 2)^2 + 2 \frac{\partial^2 f(1, 2)}{\partial x \partial y}(x - 1)(y - 2) \right) \\ &+ \frac{1}{6} \frac{\partial^3 f(1, 2)}{\partial y^3}(y - 2)^3 \end{aligned} \tag{5.180b}$$

$$\begin{aligned} = 13 &+ 6(x - 1) + 14(y - 2) \\ &+ (x - 1)^2 + 6(y - 2)^2 + 2(x - 1)(y - 2) + (y - 2)^3. \end{aligned} \tag{5.180c}$$

In this case, we obtained an exact Taylor series expansion of the polynomial in (5.161), i.e., the polynomial in (5.180c) is identical to the original polynomial in (5.161). In this particular example, this result is not surprising since the original function was a third-order polynomial, which we expressed through a linear combination of constant terms, first-order, second-order, and third-order polynomials in (5.180c).
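The claim that the expansion reproduces the original polynomial exactly can be checked numerically. The sketch below assembles the degree-3 Taylor polynomial from the derivative values computed in the example, using the `np.einsum` contractions; the function name `taylor3`, the random test points, and the variable names are illustrative assumptions. Note that NumPy indices are zero-based, so the entry $D^3_{x,y} f[2, 2, 2]$ of the text is `Df3[1, 1, 1]` here.

```python
import numpy as np

def f(x, y):
    # The function from (5.161).
    return x**2 + 2 * x * y + y**3

x0 = np.array([1.0, 2.0])

# Derivative tensors at (1, 2), copied from the worked example.
Df0 = 13.0
Df1 = np.array([6.0, 14.0])
Df2 = np.array([[2.0, 2.0], [2.0, 12.0]])
Df3 = np.zeros((2, 2, 2))
Df3[1, 1, 1] = 6.0  # only the third derivative with respect to y is nonzero

def taylor3(x):
    """Degree-3 Taylor polynomial of f at x0, following (5.152)."""
    d = x - x0
    return (Df0
            + np.einsum('i,i', Df1, d)
            + np.einsum('ij,i,j', Df2, d, d) / 2.0
            + np.einsum('ijk,i,j,k', Df3, d, d, d) / 6.0)

# Compare the polynomial with f at a few random points.
rng = np.random.default_rng(0)
points = rng.uniform(-3.0, 3.0, size=(5, 2))
max_abs_diff = max(abs(taylor3(p) - f(p[0], p[1])) for p in points)
print(max_abs_diff)  # tiny: only floating-point rounding remains
```

Because $f$ is a third-order polynomial, the difference stays at the level of rounding error even far from $(1, 2)$, unlike the first-order linearization.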
