Statistics in Geophysics: Principal Component Analysis
Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Multivariate data
Let x = (x_1, ..., x_p)^T be a p-dimensional random vector with population mean µ and population covariance matrix Σ. Suppose that a sample of n realizations of x is available.
These np measurements x_ij (i = 1, ..., n; j = 1, ..., p) can be collected in a data matrix
X = (x_(1), ..., x_(n))^T = (x_1, ..., x_p) ∈ R^{n×p},
with x_(i)^T = (x_i1, ..., x_ip) being the i-th observation vector (i = 1, ..., n) and x_j = (x_1j, ..., x_nj)^T being the vector of the n measurements on the j-th variable (j = 1, ..., p).
Thus, with Z denoting the column-wise standardized data matrix, Z^T Z/(n − 1) is the sample correlation matrix.
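As a quick numerical check (the course uses R; this NumPy sketch with simulated data is only illustrative), standardizing the columns of a data matrix and forming Z^T Z/(n − 1) reproduces the sample correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # n = 100 observations, p = 3 variables

# Standardize column-wise: subtract the mean, divide by the sample std (ddof=1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Z^T Z / (n - 1) reproduces the sample correlation matrix
R = Z.T @ Z / (X.shape[0] - 1)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
```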
Eigendecomposition of the sample covariance matrix
Let S_X be positive semi-definite with rank(S_X) = r (r ≤ p).
The eigenvalue decomposition (or spectral decomposition) of S_X is
S_X = E Ω E^T,
where Ω ∈ R^{r×r} is a diagonal matrix with the eigenvalues ω_1 ≥ ω_2 ≥ ... ≥ ω_r > 0 of S_X on its main diagonal and E ∈ R^{p×r} is a column-wise orthonormal matrix whose columns e_1, ..., e_r are the unit-norm eigenvectors corresponding to ω_1, ..., ω_r.
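A minimal NumPy sketch of this decomposition (simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))

# Sample covariance matrix S_X (divisor n - 1)
S = np.cov(X, rowvar=False)

# Spectral decomposition S_X = E Omega E^T; sort eigenvalues in decreasing order
omega, E = np.linalg.eigh(S)
order = np.argsort(omega)[::-1]
omega, E = omega[order], E[:, order]

# Check the reconstruction and the column-wise orthonormality of E
print(np.allclose(E @ np.diag(omega) @ E.T, S))   # True
print(np.allclose(E.T @ E, np.eye(S.shape[0])))   # True
```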
The aim of principal component analysis I
Principal component analysis (PCA) provides a computationally efficient way of projecting the p-dimensional data cloud orthogonally onto a k-dimensional subspace.
The aim of PCA is to derive k (≪ p) uncorrelated linear combinations of the p-dimensional observation vectors x_(1), ..., x_(n), called the sample principal components (PCs), which retain most of the total variation present in the data. This is achieved by taking those k components that successively have maximum variance.
The aim of principal component analysis II
PCA looks for r vectors e_j ∈ R^{p×1} (j = 1, ..., r) which maximize e_j^T S_X e_j subject to e_j^T e_j = 1 for j = 1, ..., r and e_i^T e_j = 0 for i = 1, ..., j − 1 (j ≥ 2).
It turns out that y_j = X e_j is the j-th sample PC with zero mean and variance ω_j, where e_j is an eigenvector of S_X corresponding to its j-th largest eigenvalue ω_j (j = 1, ..., r).
The total variance of the r PCs will equal the total variance of the original variables, so that Σ_{j=1}^{r} ω_j = trace(S_X).
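These properties can be verified numerically; the following NumPy sketch (simulated data, illustrative only) computes the PCs from the eigenvectors of S_X and checks that their variances are the eigenvalues and that the total variance equals trace(S_X):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
Xc = X - X.mean(axis=0)               # column-centre the data matrix

S = np.cov(Xc, rowvar=False)
omega, E = np.linalg.eigh(S)
order = np.argsort(omega)[::-1]
omega, E = omega[order], E[:, order]

Y = Xc @ E                            # sample PCs y_j = X e_j

# Each PC has zero mean and variance omega_j; total variance is preserved
print(np.allclose(Y.mean(axis=0), 0))            # True
print(np.allclose(Y.var(axis=0, ddof=1), omega)) # True
print(np.isclose(omega.sum(), np.trace(S)))      # True
```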
Singular value decomposition of the data matrix I
The sample PCs can also be found using the singular value decomposition (SVD) of X.
Expressing X with rank r (r ≤ min{n, p}) by its SVD gives
X = V D E^T,
where V ∈ R^{n×r} and E ∈ R^{p×r} are column-wise orthonormal matrices and D ∈ R^{r×r} is a diagonal matrix with the singular values of X sorted in decreasing order, σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0, on its main diagonal.
Singular value decomposition of the data matrix II
The matrix E is composed of coefficients or loadings, and the matrix of component scores Y ∈ R^{n×r} is given by Y = VD.
Since it holds that E^T E = I_r and Y^T Y/(n − 1) = D^2/(n − 1), the loadings are orthogonal and the sample PCs are uncorrelated.
The variance of the j-th sample PC is σ_j^2/(n − 1), which is equal to the j-th largest eigenvalue, ω_j, of S_X (j = 1, ..., r).
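A short NumPy sketch (simulated data, illustrative only) confirming that the squared singular values of the centered data matrix, divided by n − 1, coincide with the eigenvalues of S_X:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
Xc = X - X.mean(axis=0)               # PCA assumes a centred data matrix

# Thin SVD: Xc = V D E^T with singular values in decreasing order
V, d, Et = np.linalg.svd(Xc, full_matrices=False)

Y = V * d                             # component scores Y = V D

# sigma_j^2 / (n - 1) equals the j-th largest eigenvalue of S_X
omega = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
print(np.allclose(d**2 / (Xc.shape[0] - 1), omega))   # True
```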
Singular value decomposition of the data matrix III
In practice, the leading k components with k ≪ r usually account for a substantial proportion
(ω_1 + ... + ω_k) / trace(S_X)
of the total variance in the data, and the sum in the SVD of X is therefore truncated after the first k terms.
If so, PCA comes down to finding a matrix Y = (y_1, ..., y_k) ∈ R^{n×k} of component scores of the n samples on the k components and a matrix E = (e_1, ..., e_k) ∈ R^{p×k} of coefficients whose k-th column is the vector of loadings for the k-th component.
Least squares property of the SVD
PCA can be defined as the minimization of ||X − Y E^T||_F^2, where ||B||_F = √trace(B^T B) denotes the Frobenius norm of B.
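The least squares property can be illustrated numerically: for a rank-k truncation, the squared Frobenius norm of the residual equals the sum of the discarded squared singular values (the Eckart–Young theorem). A NumPy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
Xc = X - X.mean(axis=0)

V, d, Et = np.linalg.svd(Xc, full_matrices=False)

k = 2
Y = V[:, :k] * d[:k]                  # scores on the first k components
E = Et[:k].T                          # corresponding loadings

# Squared Frobenius norm of the residual equals the sum of the discarded
# squared singular values
resid = np.linalg.norm(Xc - Y @ E.T, 'fro')**2
print(np.isclose(resid, np.sum(d[k:]**2)))   # True
```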
When variables are measured on different scales or on a common scale with widely differing ranges, the data are often standardized prior to PCA.
The sample PCs are then obtained from an eigenvalue decomposition of the sample correlation matrix. These components are not equal to those derived from S_X.
Choosing the number of components I
(i) Retain the first k components which explain a large proportion of the total variation, say 70–80%.
(ii) If the correlation matrix is analyzed, retain only those components with eigenvalues greater than 1 (or 0.7).
(iii) Examine a scree plot. This is a plot of the eigenvalues versus the component number. The idea is to look for an "elbow", which corresponds to the point after which the eigenvalues decrease more slowly.
(iv) Consider whether the component has a sensible and useful interpretation.
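Rule (i) is easy to apply programmatically. A NumPy sketch with simulated data that has an (assumed) two-dimensional dominant structure:

```python
import numpy as np

rng = np.random.default_rng(5)
# Simulated data: a dominant two-dimensional structure plus small noise
B = rng.normal(size=(80, 2)) @ rng.normal(size=(2, 6))
X = B + 0.1 * rng.normal(size=(80, 6))

# Eigenvalues of the sample covariance matrix, in decreasing order
omega = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
cumprop = np.cumsum(omega) / omega.sum()

# Rule (i): keep the smallest k whose cumulative proportion exceeds, say, 80%
k = int(np.argmax(cumprop >= 0.8)) + 1
print(k, np.round(cumprop, 3))
```

A scree plot for rule (iii) would simply plot `omega` against the component number.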
Choosing the number of components II
Interpretation I
Correlations and covariances of variables and components
The covariance of variable i with component j is given by
Cov(x_i, y_j) = ω_j e_ji.
The correlation of variable i with component j is therefore
r_{x_i, y_j} = √ω_j e_ji / s_i,
where s_i is the standard deviation of variable i.
If the components are extracted from the correlation matrix, then
r_{x_i, y_j} = √ω_j e_ji.
Interpretation II
Rescaling principal components
The coefficients e_j can be rescaled so that coefficients for the most important components are larger than those for less important components.
These rescaled coefficients are calculated as
e*_j = √ω_j e_j,
for which e*_j^T e*_j = ω_j, rather than unity.
When the correlation matrix is analyzed, this rescaling leads to coefficients that are the correlations between the components and the original variables.
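This relationship can be checked numerically; the following NumPy sketch (simulated data, illustrative only) extracts components from the correlation matrix and verifies that the rescaled loadings equal the variable–component correlations:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.5 * X[:, 0]              # introduce some correlation

# PCA of the sample correlation matrix
R = np.corrcoef(X, rowvar=False)
omega, E = np.linalg.eigh(R)
order = np.argsort(omega)[::-1]
omega, E = omega[order], E[:, order]

# Rescaled loadings e*_j = sqrt(omega_j) e_j ...
E_star = E * np.sqrt(omega)

# ... equal the correlations between variables and component scores
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Y = Z @ E
corr = np.array([[np.corrcoef(X[:, i], Y[:, j])[0, 1] for j in range(3)]
                 for i in range(3)])
print(np.allclose(corr, E_star))      # True
```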
Rotation I
Rotation can be performed either in an orthogonal or an oblique (non-orthogonal) fashion.
Several analytic orthogonal and oblique rotation criteria exist in the literature.
Rotation II
To aid interpretation, all rotation criteria are designed to make the coefficients as simple as possible in some sense, with most loadings made to have values either "close to zero" or "far from zero", and with as few as possible of the coefficients taking intermediate values.
After rotation, either one or both of the properties possessed by PCA, that is, orthogonality of the loadings and uncorrelatedness of the component scores, is lost.
PCA in the open-source software R
Function princomp() in the stats package: eigendecomposition of the covariance or correlation matrix. Alternative: use the function eigen() directly.
Function prcomp() in the stats package: SVD of the (centered and possibly scaled) data matrix. Alternative: use the function svd() directly.
Description of the data
For 41 cities in the United States, seven variables were recorded.
We shall examine how PCA can be used to explore various aspects of the data.
Files: chap3usair.dat and pcausair.R
Description of the data
Source: National Center for Environmental Prediction/National Center for Atmospheric Research.
Winter monthly sea level pressures over the Northern Hemisphere.
Figure: Spatial map representations of the two leading PCs for winter sea level pressure data (left: North Atlantic Oscillation; right: North Pacific Oscillation). The loadings have been multiplied by 100.