
8.3 PCA-Based Collaborative Filtering

8.3.1 The Problem and Its Statistical Rationale

In what follows, we shall introduce the factorization problem underlying PCA-based CF along with a rather intuitive geometric rationale for the procedure.

Subsequently, we shall provide a statistical interpretation of the approach. The latter is rather technical and may safely be skipped by a less mathematically inclined reader.

Before plunging into the matter, we need to stipulate some basic mathematical concepts. We assume that the reader brings along basic knowledge of linear algebra at the level of an undergraduate introductory class.

The fundamental notion is that of a linear submanifold. Informally, a linear submanifold of $\mathbb{R}^{n_p}$ is a shifted subspace. Specifically, it is a set

$$M := b + X = \{\, b + x \mid x \in X \,\},$$

where $X$ denotes a subspace of $\mathbb{R}^{n_p}$ of dimension $d$. Given a basis $x_1, x_2, \ldots, x_d$ of $X$, a vector $x \in \mathbb{R}^{n_p}$ lies in $M$ if and only if

$$x = b + y_1 x_1 + \cdots + y_d x_d$$


for some real coefficients $y_1, y_2, \ldots, y_d$. In matrix notation, this corresponds to

$$x = b + Xy = [X, b]\,[y^T, 1]^T,$$

where $X := [x_1, x_2, \ldots, x_d]$ and $y := [y_1, y_2, \ldots, y_d]^T$. A linear manifold is thus completely characterized by the matrix $[X, b]$.
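As a quick aside not contained in the book, the following NumPy sketch constructs a point on such a manifold via both parametrizations, $x = b + Xy$ and $x = [X, b]\,[y^T, 1]^T$; the dimensions and numerical values are arbitrary illustrative choices.

```python
import numpy as np

# Arbitrary illustrative choices: a 2-dimensional manifold in R^3.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # basis vectors x_1, x_2 as columns
b = np.array([2.0, -1.0, 0.5])    # offset of the manifold M = b + span(X)
y = np.array([0.3, -0.7])         # coefficients y_1, y_2

x1 = b + X @ y                                          # x = b + X y
x2 = np.hstack([X, b[:, None]]) @ np.append(y, 1.0)     # x = [X, b][y^T, 1]^T
assert np.allclose(x1, x2)
```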

We endow $\mathbb{R}^{n_p}$ with the canonical inner product $\langle \cdot, \cdot \rangle$, which is defined as

$$\langle p, q \rangle := \sum_{i=1}^{n_p} p_i q_i, \quad p, q \in \mathbb{R}^{n_p},$$

and which induces the norm

$$\|x\| := \sqrt{\langle x, x \rangle}, \quad x \in \mathbb{R}^{n_p}.$$

Introducing a norm, which in turn induces a metric, gives rise to a criterion for assessing the quality of an approximation to a given vector.

In particular, the problem of finding the best approximation to $v \in \mathbb{R}^{n_p}$,

$$\min_{m \in M} \|m - v\|,$$

has a unique optimizer given by

$$\hat{m} := b + P_X (v - b),$$

where $P_X \in \mathbb{R}^{n_p \times n_p}$ denotes the orthogonal projector onto $X$. This projector is given by

$$P_X = X X^+, \qquad (8.2)$$

where

$$X^+ := \left( X^T X \right)^{-1} X^T \qquad (8.3)$$

denotes the Moore-Penrose pseudo-inverse of $X$, which has already been used in (6.12). If it holds that

$$X^T X = I, \qquad (8.4)$$

the orthogonal projector (8.2) simplifies to

$$P_X = X X^T. \qquad (8.5)$$

As we have already seen in Chap. 6, the orthogonal projector plays a crucial part in many subspace decompositions and will be frequently encountered in what follows.
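As an illustration of (8.2) and (8.3), here is a minimal NumPy sketch, with arbitrary random data, of the best approximation $\hat{m} = b + P_X(v - b)$; it also checks that the residual is orthogonal to the subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, d = 5, 2
X = rng.standard_normal((n_p, d))   # basis of the subspace (full column rank)
b = rng.standard_normal(n_p)        # offset of the manifold M = b + span(X)
v = rng.standard_normal(n_p)        # vector to be approximated

# Orthogonal projector P_X = X X^+ onto span(X), cf. (8.2)-(8.3).
P = X @ np.linalg.pinv(X)

# Best approximation to v within M.
m_hat = b + P @ (v - b)

# The residual v - m_hat is orthogonal to the subspace spanned by X.
assert np.allclose(X.T @ (v - m_hat), 0.0)
```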

Geometrically, Principal Component Analysis is a linear dimensionality reduction tool. Given a set of high-dimensional data, the goal is to project the data orthogonally onto a linear manifold with a prescribed dimension, which is illustrated by Fig. 8.1.

This manifold is chosen such that the mean-squared error resulting from the projection is minimal among all possible choices. Mathematically, the problem may be stated as follows:

$$\min_{X \in \mathbb{R}^{n_p \times d},\; b \in \mathbb{R}^{n_p},\; y_1, \ldots, y_{n_s} \in \mathbb{R}^d} \;\sum_{j=1}^{n_s} \left\| a_j - X y_j - b \right\|^2, \qquad (8.6)$$

where $a_1, \ldots, a_{n_s} \in \mathbb{R}^{n_p}$ denote the given data. A straightforward argument reveals that $b$ is always given by the centroid of the data, i.e., $\hat{b} := \frac{1}{n_s} \sum_{j=1}^{n_s} a_j$. Hence, assuming without loss of generality that the data are mean-centered (which may always be achieved by replacing our data by $a_j - \hat{b}$, $j = 1, \ldots, n_s$), the translation $b$ may always be taken to be $0$. We may thus restrict ourselves to the problem of finding the best approximating subspace to a set of mean-centered data:

$$\min_{X \in \mathbb{R}^{n_p \times d},\; y_1, \ldots, y_{n_s} \in \mathbb{R}^d} \;\sum_{j=1}^{n_s} \left\| a_j - X y_j \right\|^2. \qquad (8.7)$$
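A brief NumPy sketch of this centering step (arbitrary random data, purely for illustration): since the optimal offset in (8.6) is the centroid, subtracting the column mean reduces (8.6) to the subspace problem (8.7).

```python
import numpy as np

rng = np.random.default_rng(1)
n_p, n_s = 4, 10
A = rng.standard_normal((n_p, n_s))     # data a_1, ..., a_{n_s} as columns

# Optimal offset of (8.6): the centroid of the data.
b_hat = A.mean(axis=1, keepdims=True)

# Mean-centered data a_j - b_hat, to which the subspace problem (8.7) applies.
A_centered = A - b_hat
assert np.allclose(A_centered.mean(axis=1), 0.0)
```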

The Frobenius norm is defined as

$$\|A\|_F^2 := \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2, \quad A \in \mathbb{R}^{m \times n}.$$

Summarizing our data and intrinsic variables in matrices, $A := [a_1, \ldots, a_{n_s}]$, $Y := [y_1, \ldots, y_{n_s}]$, we may cast (8.7) equivalently as the matrix factorization problem

$$\min_{X \in \mathbb{R}^{n_p \times d},\; Y \in \mathbb{R}^{d \times n_s}} \|A - XY\|_F^2. \qquad (8.8)$$

Recalling the general framework stipulated in (8.1), (8.8) may be stated in terms of the former by assigning $f(E, F) := \|E - F\|_F^2$, $C_1 := \mathbb{R}^{n_p \times d}$, $C_2 := \mathbb{R}^{d \times n_s}$.
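For concreteness, a minimal NumPy sketch (arbitrary random data and factor sizes) of evaluating the objective of (8.8), i.e., the squared Frobenius norm of the residual $A - XY$:

```python
import numpy as np

rng = np.random.default_rng(2)
n_p, n_s, d = 4, 10, 2
A = rng.standard_normal((n_p, n_s))   # mean-centered data matrix

# Candidate factors X in R^{n_p x d} and Y in R^{d x n_s}.
X = rng.standard_normal((n_p, d))
Y = rng.standard_normal((d, n_s))

# Objective of (8.8): squared Frobenius norm of the residual A - XY.
objective = np.linalg.norm(A - X @ Y, ord="fro") ** 2
print(objective)
```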

Fig. 8.1 The best approximating one-dimensional subspace (solid line) to a set of data residing in $\mathbb{R}^3$. Projections are indicated by dotted lines


Readers familiar with optimization theory will notice that the objective of (8.8), although convex in each of the decision variables $X$ and $Y$, is non-convex. Therefore, a global solution by means of optimization algorithms is hard, if not impossible. As foreshadowed in the introduction, however, (8.8) is equivalent to a well-studied algebraic problem, namely, that of a spectral decomposition.

Proposition 8.1 Let $A \in \mathbb{R}^{n \times n}$ be symmetric, i.e., $A = A^T$. Then there is a unique real sequence $\lambda_1 \geq \cdots \geq \lambda_n$ such that

$$A = U \Lambda U^T, \qquad (8.9)$$

where $\Lambda_{ij} := \delta_{ij} \lambda_i$ and $U$ is unitary (i.e., $U^T U = U U^T = I$). The values $\lambda_i$ are called eigenvalues or spectrum of $A$, the corresponding columns of $U$ eigenvectors, and the factorization (8.9) the eigenvalue or spectral decomposition of $A$.

(Proofs as well as more detailed renditions of this result may be found in any textbook on linear algebra. See, e.g., Chap. 1 of [HJ85].)
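A small NumPy sketch of Proposition 8.1 with an arbitrary symmetric test matrix (note that numpy.linalg.eigh returns the eigenvalues in ascending rather than descending order):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
A = B + B.T                       # an arbitrary symmetric matrix

# Spectral decomposition A = U Lambda U^T, cf. (8.9).
lam, U = np.linalg.eigh(A)        # eigenvalues in ascending order

assert np.allclose(U @ np.diag(lam) @ U.T, A)   # reconstruction
assert np.allclose(U.T @ U, np.eye(5))          # U^T U = I
```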

Given a matrix $A \in \mathbb{R}^{m \times n}$, the corresponding Gram matrix is defined as

$$G := A^T A.$$

Since

$$x^T G x = x^T A^T A x = \|Ax\|^2 \geq 0,$$

$G$ is positive semidefinite, which, as is well known in linear algebra, implies that its spectrum is nonnegative. The same holds for the covariance matrix

$$C := A A^T.$$

Moreover, both the Gram as well as the covariance matrix are symmetric.
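A quick numerical check of these facts with an arbitrary random matrix (illustrative only): both spectra are nonnegative up to floating-point round-off.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 8))

G = A.T @ A    # Gram matrix
C = A @ A.T    # covariance matrix

# Both are symmetric and positive semidefinite, hence their spectra are >= 0.
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)
assert np.all(np.linalg.eigvalsh(C) >= -1e-10)
```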

Now let a spectral decomposition of $G$ be given by

$$G = V \Lambda V^T, \qquad (8.10)$$

and define

$$Z := A V.$$

Then we obtain

$$Z^T Z = V^T A^T A V = \Lambda.$$

Since $\Lambda$ is a diagonal matrix of the eigenvalues of $G$, we may write

$$Z = U S, \quad \text{i.e.,} \quad A V = U S \qquad (8.11)$$

for some unitary $U \in \mathbb{R}^{m \times m}$ and $S \in \mathbb{R}^{m \times n}$ defined as

$$S_{ij} := \delta_{ij} s_j, \qquad (8.12)$$

where

$$s_j := \sqrt{\lambda_j}, \quad j = 1, \ldots, n,$$

are the singular values of $A$. We have thus derived the well-known singular value decomposition (SVD).
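The derivation can be retraced numerically. The following NumPy sketch (with an arbitrary random matrix of full column rank) builds the SVD from the spectral decomposition of the Gram matrix as in (8.10)–(8.12) and compares the singular values with NumPy's own SVD routine.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 6, 4
A = rng.standard_normal((m, n))       # full column rank with probability 1

# Spectral decomposition of the Gram matrix G = A^T A, cf. (8.10),
# with eigenpairs sorted in descending order.
lam, V = np.linalg.eigh(A.T @ A)
idx = np.argsort(lam)[::-1]
lam, V = lam[idx], V[:, idx]

# Singular values s_j = sqrt(lambda_j) and Z = A V = U S, cf. (8.11)-(8.12).
s = np.sqrt(np.clip(lam, 0.0, None))
U = (A @ V) / s                       # thin U; valid as long as all s_j > 0

# A = U S V^T, and the singular values agree with numpy.linalg.svd.
assert np.allclose(U @ np.diag(s) @ V.T, A)
assert np.allclose(np.linalg.svd(A, compute_uv=False), s)
```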

Proposition 8.2 (cf. Lemma 7.3.1 in [HJ85]) Let $A \in \mathbb{R}^{m \times n}$. Then there is a unique sequence $s_1 \geq \cdots \geq s_m$ such that

$$A = U S V^T, \qquad (8.13)$$

where $S$ is as defined in (8.12), for some unitary matrices $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$. The values $s_j$ are referred to as the singular values of $A$, the columns of $U$ as the left, and those of $V$ as the right singular vectors of $A$.

Example 8.3 For our Example 8.1 of a web shop with matrix

$$A = \begin{pmatrix} 0 & 1 & 10 & 5 \\ 1 & 5 & 1 & 1 \\ 0 & 5 & 0 & 1 \end{pmatrix},$$

we approximately obtain the following SVD:

$$S = \begin{pmatrix} 11.49 & 0 & 0 & 0 \\ 0 & 6.88 & 0 & 0 \\ 0 & 0 & 0.77 & 0 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 0.3 & 0.1 \\ 0.2 & 0.7 & 0.7 \\ 0.1 & 0.7 & 0.7 \end{pmatrix},$$

$$V = \begin{pmatrix} 0.02 & 0.1 & 0.91 & 0.39 \\ 0.24 & 0.96 & 0.07 & 0.09 \\ 0.84 & 0.24 & 0.02 & 0.06 \\ 0.45 & 0.02 & 0.41 & 0.94 \end{pmatrix}.$$

■
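The decomposition can be reproduced, for instance, with NumPy's SVD routine. Note that the signs of the singular vectors depend on the implementation's conventions, so individual entries of $U$ and $V$ may differ from the rounded values printed above, whereas the singular values should agree:

```python
import numpy as np

A = np.array([[0.0, 1.0, 10.0, 5.0],
              [1.0, 5.0,  1.0, 1.0],
              [0.0, 5.0,  0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print(np.round(s, 2))   # approximately [11.49  6.88  0.77]
```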

Labeling the left and right singular vectors as $U =: [u_1, \ldots, u_m]$, $V =: [v_1, \ldots, v_n]$, we obtain a polyadic representation

$$A = \sum_{j=1}^{r} s_j u_j v_j^T,$$

where $r := \max\{ k \mid s_k > 0 \}$. It may easily be verified that $r = \operatorname{rank} A$, i.e., it equals the minimum number of terms in a polyadic representation of $A$ or, equivalently, the dimension of the range of $A$. It is thus also obvious that $\{u_1, \ldots, u_r\}$ is an orthogonal basis for the range of $A$, and $\{u_{r+1}, \ldots, u_m\}$ for the orthogonal complement thereof. Likewise, $\{v_{r+1}, \ldots, v_n\}$ span the null space of $A$, and the remaining right singular vectors its orthogonal complement.

With regard to the solution of (8.8), the decisive tool is the truncated singular value decomposition. It can be shown that the rank-$k$ matrices

$$A_k := \sum_{j=1}^{k} s_j u_j v_j^T = [u_1, \ldots, u_k] \left( \delta_{ij} s_j \right) [v_1, \ldots, v_k]^T =: U_k S_k V_k^T, \quad k = 1, \ldots, r, \qquad (8.14)$$

provide optimal rank-$k$ approximations to our matrix $A$ in terms of $\|\cdot\|_F$. In fact, the subsequent stronger result obtains.

Theorem 8.1 (cf. Theorem 7.4.51 and Example 7.4.52 in [HJ85]) Let $\|\cdot\|$ be a unitarily invariant norm, i.e., $\|A\| = \|Q A T\|$ for any $A$ and unitary $Q$, $T$. Then we have

$$\min_{B \in \mathbb{R}^{m \times n},\; \operatorname{rank} B = k} \|A - B\| = \|A - A_k\|.$$

In particular, this holds for $\|\cdot\|_F$ and $\|\cdot\|_2$, i.e., the matrix norm induced by the Euclidean norm.

As an immediate consequence, this insight provides us with a solution of the approximation problem at hand.

Corollary 8.1 An optimal solution of (8.8) is given by $X := U_d$ and $Y := S_d V_d^T$. The (truncated) SVD is illustrated by Fig. 8.2.
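A minimal NumPy sketch of Corollary 8.1; the helper name rank_d_factors and the random test data are ours, purely for illustration.

```python
import numpy as np

def rank_d_factors(A, d):
    """Optimal factors of (8.8) via the truncated SVD (Corollary 8.1):
    X := U_d and Y := S_d V_d^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :d], np.diag(s[:d]) @ Vt[:d, :]

# Usage with arbitrary random data.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 8))
X, Y = rank_d_factors(A, d=2)
print(np.linalg.norm(A - X @ Y, ord="fro") ** 2)   # minimal rank-2 error
```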

Example 8.4 For our web shop Example 8.1, we obtain a rank-1 approximation immediately from our SVD:

$$X = \begin{pmatrix} 1 \\ 0.2 \\ 0.1 \end{pmatrix}, \quad Y = (11.49) \begin{pmatrix} 0.02 \\ 0.24 \\ 0.84 \\ 0.45 \end{pmatrix}^T = \begin{pmatrix} 0.23 & 2.76 & 9.65 & 5.17 \end{pmatrix},$$

which coincides with the factorization in Example 8.2.

Fig. 8.2 Illustration of the singular value decomposition of a matrix $A$. The dotted lines indicate the borders of the submatrices corresponding to a truncated SVD under the assumption that $A$ is rank deficient. Please note that all entries of $S$ other than those indicated by the diagonal solid line are zero

We obtain for a rank-2 approximation

$$X = \begin{pmatrix} 1 & 0.3 \\ 0.2 & 0.7 \\ 0.1 & 0.7 \end{pmatrix}, \quad Y = \begin{pmatrix} 11.49 & 0 \\ 0 & 6.88 \end{pmatrix} \begin{pmatrix} 0.02 & 0.1 \\ 0.24 & 0.96 \\ 0.84 & 0.24 \\ 0.45 & 0.02 \end{pmatrix}^T = \begin{pmatrix} 0.23 & 2.76 & 9.65 & 5.17 \\ 0.69 & 6.6 & 1.65 & 0.14 \end{pmatrix}$$

and

$$A_2 = XY = \begin{pmatrix} 0.02 & 0.78 & 10.14 & 5.13 \\ 0.53 & 5.17 & 0.78 & 1.13 \\ 0.51 & 4.9 & 0.19 & 0.62 \end{pmatrix}$$

already provides a fairly good approximation to $A$. ■

This terrific result tells us that the computation of a solution of (8.8) may be reduced to computing a truncated singular value decomposition of $A$. It should be clear from the above derivation that this problem, in turn, may be essentially solved by computing a truncated spectral decomposition of the Gram matrix $A^T A$. Calculation of spectral decompositions of symmetric and positive definite matrices, fortunately, is a well-understood domain of numerical linear algebra. In particular, this problem may be solved within polynomially bounded time in terms of dimensionality and desired accuracy.
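As an illustrative sketch of this reduction (arbitrary dimensions and random data; scipy.sparse.linalg.eigsh is just one possible choice of truncated eigensolver), one may compute only the $d$ leading eigenpairs of the Gram matrix and assemble the factors of (8.8) from them:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(7)
n_p, n_s, d = 50, 400, 5
A = rng.standard_normal((n_p, n_s))        # mean-centered data matrix

# Truncated spectral decomposition of the Gram matrix G = A^T A:
# only the d largest eigenpairs are computed.
lam, V_d = eigsh(A.T @ A, k=d, which="LM")
order = np.argsort(lam)[::-1]
lam, V_d = lam[order], V_d[:, order]

# Recover the truncated SVD factors: s_j = sqrt(lambda_j), U_d = A V_d S_d^{-1},
# and hence X := U_d, Y := S_d V_d^T as in Corollary 8.1.
s_d = np.sqrt(lam)
U_d = (A @ V_d) / s_d
X, Y = U_d, s_d[:, None] * V_d.T
print(np.linalg.norm(A - X @ Y, ord="fro") ** 2)   # optimal rank-d error
```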

The bad news is that the complexity of state-of-the-art solvers increases with the number of columns of A. Hence, these methods are not suitable for realtime computation.
