24 Other Applications
In this chapter, we consider some further applications of independent component analysis (ICA), including analysis of financial time series and audio signal separation.
24.1 FINANCIAL APPLICATIONS
24.1.1 Finding hidden factors in financial data
It is tempting to try ICA on financial data. There are many situations in which parallel financial time series are available, such as currency exchange rates or daily returns of stocks, that may have some common underlying factors. ICA might reveal some driving mechanisms that otherwise remain hidden.
In a study of a stock portfolio [22], it was found that ICA is a complementary tool to principal component analysis (PCA), allowing the underlying structure of the data to be more readily observed. If one could find the maximally independent mixtures of the original stocks, i.e., portfolios, this might help in minimizing the risk in the investment strategy.
In [245], we applied ICA to a different problem: the cashflow of several stores belonging to the same retail chain, trying to find the fundamental factors common to all stores that affect the cashflow. Thus, the effect of the factors specific to any particular store, i.e., the effect of the managerial actions taken at the individual store and in its local environment, could be analyzed.
In this case, the mixtures in the ICA model are parallel financial time series $x_i(t)$, with $i$ indexing the individual time series, $i = 1, \ldots, m$, and $t$ denoting discrete time.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
The ICA model then reads
$$x_i(t) = \sum_j a_{ij} s_j(t)$$
for each time series $x_i(t)$. Thus the effect of each time-varying underlying factor or independent component $s_j(t)$ on the measured time series is approximately linear.
The assumption of having some underlying independent components in this specific application may not be unrealistic. For example, factors like seasonal variations due to holidays and annual variations, and factors having a sudden effect on the purchasing power of the customers, like price changes of various commodities, can be expected to have an effect on all the retail stores, and such factors can be assumed to be roughly independent of each other. Yet, depending on the policy and skills of the individual manager, e.g., advertising efforts, the effects of the factors on the cash flow of specific retail outlets are slightly different. By ICA, it is possible to isolate both the underlying factors and the effect weights, thus also making it possible to group the stores on the basis of their managerial policies using only the cash flow time series data.
The data consisted of the weekly cash flow in 40 stores that belong to the same retail chain, covering a time span of 140 weeks. Some examples of the original data $x_i(t)$ are shown in Fig. 24.1. The weeks of a year are shown on the horizontal axis, starting from the first week in January. Thus, for example, the heightened Christmas sales are visible in each time series before and during week 51 in both of the full years shown.
The data were first prewhitened using PCA. The original 40-dimensional signal vectors were projected to the subspace spanned by four principal components, and the variances were normalized to 1. Thus the dimension of the signal space was strongly decreased from 40. A problem in this kind of real-world application is that there is no prior knowledge on the number of independent components. Sometimes the eigenvalue spectrum of the data covariance matrix can be used, as shown in Chapter 6, but in this case the eigenvalues decreased rather smoothly without indicating any clear signal subspace dimension. Then the only way is to try different dimensions.
If the independent components that are found using different dimensions for the whitened data are the same or very similar, we can trust that they are not just artifacts produced by the compression, but truly indicate some underlying factors in the data. Using the FastICA algorithm, four independent components (ICs) $s_j(t)$, $j = 1, \ldots, 4$, were estimated. As depicted in Fig. 24.2, the FastICA algorithm has found several clearly different fundamental factors hidden in the original data.
The factors have different interpretations. The topmost factor follows the sudden changes that are caused by holidays etc.; the most prominent example is Christmas time. The factor in the bottom row, on the other hand, reflects the slower seasonal variation, with the effect of the summer holidays clearly visible. The factor in the third row could represent a still slower variation, something resembling a trend. The last factor, in the second row, is different from the others; it might be that this factor follows mostly the relative competitive position of the retail chain with respect to its competitors, but other interpretations are also possible.
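The preprocessing and estimation steps described above (mean removal, PCA whitening to a four-dimensional subspace, then a FastICA-type fixed-point iteration) can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are synthetic stand-ins for the 40 cashflow series, the sources are i.i.d. samples chosen for convenience, the tanh nonlinearity with symmetric decorrelation is one common FastICA variant and not necessarily the configuration used in [245], and all function names are hypothetical.

```python
import numpy as np

def pca_whiten(X, n_components):
    """Center X (signals in rows), project onto the leading principal
    components, and scale each projection to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    V = (eigvecs[:, order] / np.sqrt(eigvals[order])).T  # whitening matrix
    return V @ Xc, V

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fastica_symmetric(Z, n_iter=200, tol=1e-8, seed=0):
    """Fixed-point ICA iteration with tanh nonlinearity on whitened data Z."""
    rng = np.random.default_rng(seed)
    n, T = Z.shape
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Row-wise update: w <- E{z g(w^T z)} - E{g'(w^T z)} w
        W_new = G @ Z.T / T - np.diag((1.0 - G**2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W

# Synthetic stand-in data (assumption): 4 non-Gaussian sources, 40 mixtures.
rng = np.random.default_rng(0)
T = 2000
S = np.vstack([
    rng.laplace(size=T),
    rng.uniform(-1.0, 1.0, size=T),
    rng.choice([-1.0, 1.0], size=T),
    rng.exponential(size=T) - 1.0,
])
A_true = rng.standard_normal((40, 4))
X = A_true @ S + 0.01 * rng.standard_normal((40, T))

Z, V = pca_whiten(X, n_components=4)   # 40 dimensions -> 4, unit variances
W = fastica_symmetric(Z)
S_hat = W @ Z                          # estimated independent components
A_hat = np.linalg.pinv(W @ V)          # estimated mixing coefficients a_ij
```

The estimated components are determined only up to order and sign, so any comparison with known sources has to be made through absolute correlations.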
Fig. 24.1 Five samples of the 40 original cashflow time series (mean removed, normalized to unit standard deviation). Horizontal axis: time in weeks over 140 weeks. (Adapted from [245].)
If five ICs are estimated instead of four, then three of the found components stay virtually the same, while the fourth one separates into two new components. Using the found mixing coefficients $a_{ij}$, it is also possible to analyze the original time series and cluster them in groups. More details on the experiments and their interpretation can be found in [245].
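One way to check whether components found with different subspace dimensions are "the same or very similar" is to match them by absolute correlation, since ICA leaves order and sign undetermined. The sketch below is an assumption about how such a check might be done, not the procedure of [245]; the function name, the 0.9 threshold, and the synthetic data are all hypothetical.

```python
import numpy as np

def match_components(S_a, S_b):
    """For each row of S_a, return the index of the best-matching row of S_b
    and the absolute correlation of that match."""
    n_a = S_a.shape[0]
    corr = np.abs(np.corrcoef(S_a, S_b)[:n_a, n_a:])  # cross-correlation block
    best = corr.argmax(axis=1)
    return best, corr[np.arange(n_a), best]

# Synthetic illustration: a four-component result and a hypothetical
# five-component result in which three components reappear (permuted, one
# with flipped sign) and two components are new.
rng = np.random.default_rng(1)
S4 = rng.laplace(size=(4, 500))
extra = rng.laplace(size=(2, 500))
S5 = np.vstack([-S4[2], S4[0], extra[0], S4[1], extra[1]])

best, strength = match_components(S4, S5)
stable = strength > 0.9   # hypothetical threshold for "virtually the same"
```

Components flagged as stable across several dimensions are the ones that can more safely be read as underlying factors and not as artifacts of the compression.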
24.1.2 Time series prediction by ICA
As noted in Chapter 18, the ICA transformation tends to produce component signals, $s_j(t)$, that can be compressed with fewer bits than the original signals, $x_i(t)$. They are thus more structured and regular. This gives motivation to try to predict the signals $x_i(t)$ by first going to the ICA space, doing the prediction there, and then transforming back to the original time series, as suggested by [362]. The prediction can be done separately and with a different method for each component, depending on its time structure. Hence, some interaction from the user may be needed in the overall prediction procedure. Another possibility would be to formulate the ICA contrast function in the first place so that it includes the prediction errors; some work along these lines has been reported by [437].
In [289], we suggested the following basic procedure:
Fig. 24.2 Four independent components or fundamental factors found from the cashflow data. (Adapted from [245].)
1. After subtracting the mean of each time series and prewhitening (after which each time series has zero mean and unit variance), the independent components $s_j(t)$ and the mixing matrix $\mathbf{A}$ are estimated using the FastICA algorithm. The number of ICs can be variable.
2. For each component $s_j(t)$, a suitable nonlinear filtering is applied to reduce the effects of noise: smoothing for components that contain very low frequencies (trend, slow cyclical variations), and high-pass filtering for components containing high frequencies and/or sudden shocks. The nonlinear smoothing is done by applying smoothing functions $f_j$ on the source signals $s_j(t)$,
$$s_j^s(t) = f_j[s_j(t+r), \ldots, s_j(t), \ldots, s_j(t-k)]. \qquad (24.2)$$
3. Each smoothed independent component is predicted separately, for instance using some method of autoregressive (AR) modeling [455]. The prediction is done for a number of steps into the future. This is done by applying prediction functions, $g_j$, on the smoothed source signals, $s_j^s(t)$:
$$s_j^p(t+1) = g_j[s_j^s(t), s_j^s(t-1), \ldots, s_j^s(t-q)]. \qquad (24.3)$$
The next time steps are predicted by gliding the window of length $q$ over the measured and predicted values of the smoothed signal.
4. The predictions for each independent component are combined by weighing them with the mixing coefficients, $a_{ij}$, thus obtaining the predictions, $x_i^p(t)$, for the original time series, $x_i(t)$:
$$\mathbf{x}^p(t+1) = \mathbf{A}\,\mathbf{s}^p(t+1).$$
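Steps 2 to 4 of the procedure above can be sketched in a few lines, assuming the independent components and the mixing matrix from step 1 are already available. The choices below are illustrative assumptions and not the settings of [289]: a running median stands in for the nonlinear smoothing functions $f_j$, a least-squares AR model stands in for the prediction functions $g_j$, and the function names, window sizes, and synthetic signals are hypothetical.

```python
import numpy as np

def median_smooth(s, half_width):
    """Running median over s(t-k), ..., s(t+r) with k = r = half_width
    (a stand-in for the nonlinear smoothing f_j)."""
    padded = np.pad(s, half_width, mode="edge")
    return np.array([np.median(padded[t:t + 2 * half_width + 1])
                     for t in range(len(s))])

def fit_ar(s, n_lags):
    """Least-squares AR fit: s(t+1) ~ c[0] s(t) + ... + c[n_lags-1] s(t-n_lags+1)."""
    rows = [s[t - n_lags + 1:t + 1][::-1] for t in range(n_lags - 1, len(s) - 1)]
    coef, *_ = np.linalg.lstsq(np.array(rows), s[n_lags:], rcond=None)
    return coef

def predict_ar(s, coef, steps):
    """Predict `steps` future values by sliding the window over the
    measured and already predicted values."""
    history = list(s)
    for _ in range(steps):
        window = np.array(history[-len(coef):][::-1])  # most recent value first
        history.append(float(coef @ window))
    return np.array(history[len(s):])

def ica_predict(S, A, steps, n_lags=4, half_width=2):
    """Smooth and predict each component separately, then mix: x^p = A s^p."""
    S_pred = np.vstack([
        predict_ar(smoothed, fit_ar(smoothed, n_lags), steps)
        for smoothed in (median_smooth(s, half_width) for s in S)
    ])
    return A @ S_pred

# Synthetic illustration (assumption): two noisy cyclical components, three series.
rng = np.random.default_rng(0)
t = np.arange(200)
S = np.vstack([
    np.sin(2 * np.pi * t / 25) + 0.1 * rng.standard_normal(200),
    np.cos(2 * np.pi * t / 90) + 0.1 * rng.standard_normal(200),
])
A = rng.standard_normal((3, 2))
X_pred = ica_predict(S, A, steps=50)   # predictions for the zero-mean series
```

Because the means were subtracted in step 1, they would have to be added back to `X_pred` to obtain forecasts on the original scale. In practice the smoothing and the AR order would be tuned per component, which is the flexibility the procedure is meant to offer.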
Fig. 24.3 Prediction of real-world financial data: the upper figure represents the actual future outcome of one of the original mixtures and the lower one the forecast obtained using ICA prediction for an interval of 50 values.
To test the method, we applied our algorithm to a set of 10 foreign exchange rate time series. Again, we suppose that there are some independent factors that affect the time evolution of such time series. Economic indicators, interest rates, and psychological factors can be the underlying factors of exchange rates, as they are closely tied to the evolution of the currencies. Even without prediction, some of the ICs may be useful in analyzing the impact of different external phenomena on the foreign exchange rates [22].
The results were promising, as the ICA prediction performed better than direct prediction. Figure 24.3 shows an example of prediction using our method. The upper figure represents one of the original time series (mixtures) and the lower one the forecast obtained using ICA prediction for a future interval of 50 time steps. The algorithm seemed to predict the turning points especially well. Table 24.1 compares the errors obtained by applying classic AR prediction to the original time series directly with those of our method outlined above. The right-most column shows the magnitude of the errors when no smoothing is applied to the currencies. While ICA and AR prediction are linear techniques, the smoothing was nonlinear. Using nonlinear smoothing, optimized for each independent component time series separately, the prediction of the ICs is performed more accurately, and the results also differ from the direct prediction of the original time series. The noise in the time series is strongly reduced, allowing a better prediction of the underlying factors. The model is flexible and allows various smoothing tolerances and different orders in the classic AR prediction method for each independent component.
In reality, especially in real-world time series analysis, the data are distorted by delays, noise, and nonlinearities. Some of these could be handled by extensions of the basic ICA algorithms, as reported in Part III of this book.
Table 24.1 [...] used. The amount of smoothing in classic AR prediction was varied.
Smoothing in AR prediction    2     0.5   0.1   0.08  0.06  0.05  0
Errors, ICA prediction        2.3   2.3   2.3   2.3   2.3   2.3   2.3
Errors, AR prediction         9.7   9.1   4.7   3.9   3.4   3.1   4.2
24.2 AUDIO SEPARATION
One of the original motivations for ICA research was the cocktail-party problem, as reviewed in the beginning of Chapter 7. The idea is that there are $n$ sound sources recorded by a number of microphones, and we want to separate just one of the sources. In fact, often there is just one interesting signal, for example, a person speaking to the microphone, and all the other sources can be considered as noise; in this case, we have a problem of noise canceling. A typical example of a situation where we want to separate noise (or interference) from a speech signal is a person talking to a mobile phone in a noisy car.
If there is just one microphone, one can attempt to cancel the noise by ordinary noise canceling methods: linear filtering, or perhaps more sophisticated techniques like wavelet and sparse code shrinkage (Section 15.6). Such noise canceling can be rather unsatisfactory, however. It works only if the noise has spectral characteristics that are clearly different from those of the speech signal. One might wish to remove the noise more effectively by collecting more data using several microphones. Since in real-life situations the positions of the microphones with respect to the sources can be rather arbitrary, the mixing process is not known, and it has to be estimated blindly. In this case, we find the ICA model, and the problem is one of blind source separation.
Blind separation of audio signals is, however, much more difficult than one might expect. This is because the basic ICA model is a very crude approximation of the real mixing process. In fact, here we encounter almost all the complications that we have discussed in Part III:
The mixing is not instantaneous. Audio signals propagate rather slowly, and thus they arrive at the microphones at different times. Moreover, there are echoes, especially if the recording is made in a room. Thus the problem is more adequately modeled by a convolutive version of the ICA model (Chapter 19). The situation is thus much more complicated than with the separation of magnetoencephalographic (MEG) signals, which propagate fast, or with feature extraction, where no time delays are possible even in theory. In fact, even the basic convolutive ICA model may not be enough because the time delays may be fractional and may not be adequately modeled as integer multiples of the time interval between two samples.
Typically, the recordings are made with two microphones only. However, the number of source signals is probably much larger than 2 in most cases, since the noise sources may not form just one well-defined source. Thus we have the problem of overcomplete bases (Chapter 16).
The nonstationarity of the mixing is another important problem. The mixing matrix may change rather quickly, due to changes in the constellation of the speaker and the microphones. For example, one of these may be moving with respect to the other, or the speaker may simply turn his head. This implies that the mixing matrix must be reestimated quickly in a limited time frame, which also means a limited number of data. Adaptive estimation methods may alleviate this problem somewhat, but this is still a serious problem due to the convolutive nature of the mixing. In the convolutive mixing, the number of parameters can be very large: For example, the convolution may be modeled by filters of the length of 1000 time points, which effectively multiplies the number of parameters in the model by 1000. Since the number of data points should grow with the number of parameters to obtain satisfactory estimates, it may be next to impossible to estimate the model with the small number of data points that one has time to collect before the mixing matrix has changed too much.
Noise may be considerable. There may be strong sensor noise, which means that we should use the noisy ICA model (Chapter 15). The noise complicates the estimation of the ICA model quite considerably, even in the basic case where noise is assumed gaussian. On the other hand, the effect of overcomplete bases could be modeled as noise as well. This noise may not be very gaussian, however, making the problem even more difficult.
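To make the first and third complications above concrete, the following sketch mixes sources through finite impulse response filters and counts the resulting model parameters. It is only an illustration under assumptions: the filters are random and exponentially decaying, the filter length of 50 is arbitrary, and the function names are hypothetical. It does not attempt any separation.

```python
import numpy as np

def convolutive_mix(S, H):
    """Mix sources S (n_sources x T) with filters H (n_mics x n_sources x L):
    x_i(t) = sum_j sum_k H[i, j, k] * s_j(t - k)."""
    n_mics, n_sources, _ = H.shape
    T = S.shape[1]
    X = np.zeros((n_mics, T))
    for i in range(n_mics):
        for j in range(n_sources):
            X[i] += np.convolve(S[j], H[i, j])[:T]  # keep the first T samples
    return X

def n_mixing_parameters(n_mics, n_sources, filter_len):
    """Number of free coefficients in the mixing model."""
    return n_mics * n_sources * filter_len

rng = np.random.default_rng(0)
n_sources, n_mics, T, filter_len = 2, 2, 1000, 50
S = rng.laplace(size=(n_sources, T))

# Random decaying filters as a crude stand-in for delays and echoes.
decay = np.exp(-0.1 * np.arange(filter_len))
H = rng.standard_normal((n_mics, n_sources, filter_len)) * decay
X = convolutive_mix(S, H)

# With filters of length 1 the model reduces to the basic ICA model x = A s.
A = H[:, :, 0]
X_instantaneous = convolutive_mix(S, H[:, :, :1])

# Two microphones and two sources: 4 parameters in the instantaneous model,
# 4000 when each filter has 1000 taps.
params_basic = n_mixing_parameters(2, 2, 1)
params_convolutive = n_mixing_parameters(2, 2, 1000)
```

The comparison of `params_basic` and `params_convolutive` is the factor of 1000 mentioned in the text; the amount of data needed for reliable estimation grows accordingly, while the time available before the mixing changes does not.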
Due to these complications, it may be that the prior information, namely the independence and nongaussianity of the source signals, is not enough. To estimate the convolutive ICA model with a large number of parameters, and a rapidly changing mixing matrix, may require more information on the signals and the matrix. First, one may need to combine the assumption of nongaussianity with the different time-structure assumptions in Chapter 18. Speech signals have autocorrelations and nonstationarities, so this information could be used [267, 216]. Second, one may need to use some information on the mixing. For example, sparse priors (Section 20.1.3) could be used.
It is also possible that real-life speech separation requires sophisticated modeling of speech signals. Speech signals are highly structured, autocorrelations and nonstationarity being just the very simplest aspects of their time structure. Such approaches were proposed in [54, 15].
[...] estimation of the convolutive ICA model, was described in Chapter 19.
24.3 FURTHER APPLICATIONS
Among further applications, let us mention:
Text document analysis [219, 229, 251]
Radiocommunications [110, 77]
Rotating machine monitoring [475]
Seismic monitoring [161]
Reflection canceling [127]
Nuclear magnetic resonance spectroscopy [321]
Selective transmission, which is a dual problem of blind source separation. A set of independent source signals are adaptively premixed prior to a nondispersive physical mixing process so that each source can be independently monitored in the far field [117]
Further applications can be found in the proceedings of the ICA'99 and ICA2000 workshops [70, 348].