Djuric, P.M. & Kay, S.M. "Spectrum Estimation and Modeling." In Digital Signal Processing Handbook, ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
14.1 Introduction
14.2 Important Notions and Definitions
    Random Processes • Spectra of Deterministic Signals • Spectra of Random Processes
14.3 The Problem of Power Spectrum Estimation
14.4 Nonparametric Spectrum Estimation
    Periodogram • The Bartlett Method • The Welch Method • Blackman-Tukey Method • Minimum Variance Spectrum Estimator • Multiwindow Spectrum Estimator
14.5 Parametric Spectrum Estimation
    Spectrum Estimation Based on Autoregressive Models • Spectrum Estimation Based on Moving Average Models • Spectrum Estimation Based on Autoregressive Moving Average Models • Pisarenko Harmonic Decomposition Method • Multiple Signal Classification (MUSIC)
14.6 Recent Developments
References
14.1 Introduction
The main objective of spectrum estimation is the determination of the power spectral density (PSD) of a random process. The PSD is a function that plays a fundamental role in the analysis of stationary random processes in that it quantifies the distribution of the total power as a function of frequency. The estimation of the PSD is based on a set of observed data samples from the process. A necessary assumption is that the random process is at least wide-sense stationary, that is, its first and second order statistics do not change with time. The estimated PSD provides information about the structure of the random process, which can then be used for refined modeling, prediction, or filtering of the observed process.
Spectrum estimation has a long history, with beginnings in ancient times [17]. The first significant discoveries that laid the grounds for later developments, however, were made in the early years of the nineteenth century. They include one of the most important advances in the history of mathematics, Fourier's theory, according to which an arbitrary function can be represented by an infinite summation of sine and cosine functions. Later came the Sturm-Liouville spectral theory of differential equations, which was followed by the spectral representations in quantum and classical physics developed by John von Neumann and Norbert Wiener, respectively. The statistical theory of spectrum estimation started practically in 1949, when Tukey introduced a numerical method for computation of spectra from empirical data. A very important milestone for further development of the field was the reinvention of the fast Fourier transform (FFT) in 1965, which is an efficient algorithm for computation of the discrete Fourier transform. Shortly thereafter came the work of John Burg, who proposed a fundamentally new approach to spectrum estimation based on the principle of maximum entropy. In the past three decades his work has been followed up by many researchers, who have developed numerous new spectrum estimation procedures and applied them to various physical processes from diverse scientific fields. Today, spectrum estimation is a vital scientific discipline which plays a major role in many applied sciences such as radar, speech processing, underwater acoustics, biomedical signal processing, sonar, seismology, vibration analysis, control theory, and econometrics.
14.2 Important Notions and Definitions
14.2.1 Random Processes
The objects of interest of spectrum estimation are random processes. They represent time fluctuations of a certain quantity which cannot be fully described by deterministic functions. The voltage waveform of a speech signal, the bit stream of zeros and ones of a communication message, or the daily variations of the stock market index are examples of random processes. Formally, a random process is defined as a collection of random variables indexed by time. (The family of random variables may also be indexed by a different variable, for example space, but here we will consider only random time processes.) The index set is infinite and may be continuous or discrete. If the index set is continuous, the random process is known as a continuous-time random process, and if the set is discrete, it is known as a discrete-time random process. The speech waveform is an example of a continuous-time random process, and the sequence of zeros and ones of a communication message, a discrete-time one. We shall focus only on discrete-time processes where the index set is the set of integers.
A random process can be viewed as a collection of a possibly infinite number of functions, also called realizations. We shall denote the collection of realizations by $\{\tilde{x}[n]\}$ and an observed realization of it by $\{x[n]\}$. For fixed $n$, $\{\tilde{x}[n]\}$ represents a random variable, also denoted as $\tilde{x}[n]$, and $x[n]$ is the $n$-th sample of the realization $\{x[n]\}$. If the samples $x[n]$ are real, the random process is real, and if they are complex, the random process is complex. In the discussion to follow, we assume that $\{\tilde{x}[n]\}$ is a complex random process.
The random process $\{\tilde{x}[n]\}$ is fully described if, for any set of time indices $n_1, n_2, \ldots, n_m$, the joint probability density function of $\tilde{x}[n_1], \tilde{x}[n_2], \ldots$, and $\tilde{x}[n_m]$ is given. If the statistical properties of the process do not change with time, the random process is called stationary. This is always the case if, for any choice of random variables $\tilde{x}[n_1], \tilde{x}[n_2], \ldots$, and $\tilde{x}[n_m]$, their joint probability density function is identical to the joint probability density function of the random variables $\tilde{x}[n_1+k], \tilde{x}[n_2+k], \ldots$, and $\tilde{x}[n_m+k]$ for any $k$. Then we call the random process strictly stationary. For example, if the samples of the random process are independent and identically distributed random variables, it is straightforward to show that the process is strictly stationary. Strict stationarity, however, is a very severe requirement and is relaxed by introducing the concept of wide-sense stationarity. A random process is wide-sense stationary if the following two conditions are met:
$$E\left(\tilde{x}[n]\right) = \mu, \quad \text{for all } n \tag{14.1}$$
and
$$r[n, n+k] = E\left(\tilde{x}^*[n]\,\tilde{x}[n+k]\right) = r[k] \tag{14.2}$$
where $E(\cdot)$ is the expectation operator, $\tilde{x}^*[n]$ is the complex conjugate of $\tilde{x}[n]$, and $\{r[k]\}$ is the autocorrelation function of the process. Thus, if the process is wide-sense stationary, its mean value $\mu$ is constant over time, and the autocorrelation function depends only on the lag $k$ between the random variables. For example, if we consider the random process
$$\tilde{x}[n] = a\, e^{j(2\pi f_0 n + \tilde{\theta})} \tag{14.3}$$
where the amplitude $a$ and the frequency $f_0$ are constants, and the phase $\tilde{\theta}$ is a random variable that is uniformly distributed over the interval $(-\pi, \pi)$, one can show that
$$E\left(\tilde{x}[n]\right) = 0 \tag{14.4}$$
and
$$r[k] = a^2\, e^{j2\pi f_0 k} \tag{14.5}$$
Thus, Eq. (14.3) represents a wide-sense stationary random process.
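Wide-sense stationarity of this process is easy to check numerically. Below is a minimal Monte Carlo sketch in Python with NumPy (the parameter values are arbitrary choices) that estimates the ensemble mean and one autocorrelation value and compares them with Eqs. (14.4) and (14.5).

```python
# Monte Carlo check of Eqs. (14.4)-(14.5) for the random-phase exponential of
# Eq. (14.3). The parameter values (a, f0, N, trials) are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
a, f0, N, trials = 1.0, 0.1, 32, 50_000

theta = rng.uniform(-np.pi, np.pi, size=(trials, 1))  # one random phase per realization
n = np.arange(N)
x = a * np.exp(1j * (2 * np.pi * f0 * n + theta))     # trials x N array of realizations

print(np.abs(x.mean(axis=0)).max())                   # ensemble mean: close to 0 for every n
k = 5
r_k = (np.conj(x[:, 0]) * x[:, k]).mean()             # estimate of E{x*[n] x[n+k]} at n = 0
print(r_k, a**2 * np.exp(1j * 2 * np.pi * f0 * k))    # close to a^2 e^{j 2 pi f0 k}
```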
14.2.2 Spectra of Deterministic Signals
Before we define the concept of the spectrum of a random process, it will be useful to review the analogous concept for deterministic signals, which are signals whose future values can be exactly determined without any uncertainty. Besides their description in the time domain, deterministic signals have a very useful representation in terms of a superposition of sinusoids with various frequencies, which is given by the discrete-time Fourier transform (DTFT). If the observed signal is $\{g[n]\}$ and it is not periodic, its DTFT is the complex valued function $G(f)$ defined by
$$G(f) = \sum_{n=-\infty}^{\infty} g[n]\, e^{-j2\pi f n}$$
provided the sum converges, and the inverse transform is
$$g[n] = \int_{0}^{1} G(f)\, e^{j2\pi f n}\, df$$
which means that the signal $\{g[n]\}$ can be represented in terms of complex exponentials whose frequencies span the continuous interval $[0, 1)$.
The complex function $G(f)$ can alternatively be expressed as
$$G(f) = |G(f)|\, e^{j\phi(f)}$$
where $|G(f)|$ is called the amplitude spectrum of $\{g[n]\}$, and $\phi(f)$ the phase spectrum of $\{g[n]\}$.
For example, if the signal $\{g[n]\}$ is given by
$$g[n] = \delta[n - n_0]$$
then $G(f) = e^{-j2\pi f n_0}$, and the amplitude and phase spectra are
$$|G(f)| = 1, \qquad \phi(f) = -2\pi f n_0$$
The total energy of the signal satisfies Parseval's relation,
$$\sum_{n=-\infty}^{\infty} |g[n]|^2 = \int_{0}^{1} |G(f)|^2\, df \tag{14.15}$$
From Eq. (14.15), we deduce that $|G(f)|^2\, df$ is the contribution to the total energy of the signal from the frequency band $(f, f + df)$. Therefore, we say that $|G(f)|^2$ represents the energy density spectrum of the signal $\{g[n]\}$.
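As an illustration, the sketch below (the signal $g[n]$ is a hypothetical example of our choosing, not one from the text) evaluates the amplitude, phase, and energy density spectra on a dense frequency grid by sampling the DTFT with a zero-padded FFT, and verifies Eq. (14.15) numerically.

```python
# Amplitude, phase, and energy density spectra of a finite deterministic signal,
# with the DTFT sampled on a dense grid via a zero-padded FFT. The signal g[n]
# here is a hypothetical example (a truncated decaying exponential).
import numpy as np

g = 0.8 ** np.arange(5)            # example signal g[n], n = 0, ..., 4
Nfft = 1024                        # dense frequency grid over [0, 1)
f = np.arange(Nfft) / Nfft
G = np.fft.fft(g, Nfft)            # G(f_k) at f_k = k / Nfft

amplitude = np.abs(G)              # |G(f)|, the amplitude spectrum
phase = np.angle(G)                # phi(f), the phase spectrum
energy_density = amplitude ** 2    # |G(f)|^2, the energy density spectrum

# Numerical check of Eq. (14.15): the time-domain energy equals the integral of
# |G(f)|^2 over [0, 1), approximated here by the mean over the grid.
print(np.sum(np.abs(g) ** 2), energy_density.mean())
```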
When $\{g[n]\}$ is periodic with period $N$, that is,
$$g[n + N] = g[n]$$
for all $n$, where $N$ is the period of $\{g[n]\}$, we use the discrete Fourier transform (DFT) to express $\{g[n]\}$ in the frequency domain, that is,
$$G(f_k) = \sum_{n=0}^{N-1} g[n]\, e^{-j2\pi k n / N}, \quad f_k = \frac{k}{N}, \quad k \in \{0, 1, \cdots, N-1\}$$
The power of a periodic signal is distributed over the discrete frequencies $f_k = k/N$, with power density $P(f_k) = |G(f_k)|^2/N^2$. For example, for a periodic signal whose DFT coefficients all have unit magnitude, its PSD $P(f_k)$ is
$$P(f_k) = \frac{1}{N^2}, \quad f_k = \frac{k}{N}, \quad k \in \{0, 1, \cdots, N-1\} \tag{14.23}$$
Again, note that the PSD is defined for a discrete set of frequencies.
In summary, the spectra of deterministic aperiodic signals are energy densities defined on the continuous set of frequencies $C_f = [0, 1)$. On the other hand, the spectra of periodic signals are power densities defined on the discrete set of frequencies $D_f = \{0, 1/N, 2/N, \cdots, (N-1)/N\}$, where $N$ is the period of the signal.
14.2.3 Spectra of Random Processes
Suppose that we observe one realization of the random process $\{\tilde{x}[n]\}$, or $\{x[n]\}$. From the definition of the DTFT and the assumption of wide-sense stationarity of $\{\tilde{x}[n]\}$, it is obvious that we cannot use the DTFT to obtain $X(f)$ from $\{x[n]\}$, because Eq. (14.8) does not hold when we replace $g[n]$ by $x[n]$. And indeed, if $\{x[n]\}$ is a realization of a wide-sense stationary process, its energy is infinite. Its power, however, is finite, as was the case with the periodic signals. So if we observe $\{x[n]\}$ from $-N$ to $N$, denoted $\{x[n]\}_{-N}^{N}$, and assume that outside this interval the samples $x[n]$ are equal to zero, we can find its DTFT, $X_N(f)$, from
$$X_N(f) = \sum_{n=-N}^{N} x[n]\, e^{-j2\pi f n}$$
Then, according to Eq. (14.15), $|X_N(f)|^2\, df$ represents the energy of the truncated realization that is contributed by the components whose frequencies are between $f$ and $f + df$. The power due to these components is obtained by dividing this energy by the duration of the observation interval, $2N + 1$. Letting $N$ tend to infinity and averaging over the ensemble of realizations yields the power spectral density of the process,
$$P(f) = \lim_{N\to\infty} \frac{1}{2N+1}\, E\left( \left| \tilde{X}_N(f) \right|^2 \right) \tag{14.27}$$
where $\tilde{X}_N(f)$ is the DTFT of $\{\tilde{x}[n]\}_{-N}^{N}$. Clearly, $P(f)\, df$ is interpreted as the average contribution to the total power from the components of $\{\tilde{x}[n]\}$ whose frequencies are between $f$ and $f + df$.
There is a very important relationship between the PSD of a wide-sense stationary random process and its autocorrelation function. By Wold's theorem, which is the analogue of the Wiener-Khintchine theorem for continuous-time random processes, the PSD in Eq. (14.27) is the DTFT of the autocorrelation function of the process [15], that is,
$$P(f) = \sum_{k=-\infty}^{\infty} r[k]\, e^{-j2\pi f k} \tag{14.28}$$
where $r[k]$ is defined by Eq. (14.2).
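To connect Eqs. (14.5) and (14.28), the sketch below evaluates a truncated version of the sum in Eq. (14.28) for the autocorrelation of the random-phase exponential and shows that it concentrates its mass near $f_0$, as a line spectrum should (the truncation to finitely many lags is an approximation of our choosing).

```python
# Numerical illustration of Eq. (14.28): P(f) as the DTFT of r[k]. For
# r[k] = a^2 exp(j 2 pi f0 k) from Eq. (14.5), the truncated sum peaks at f0.
import numpy as np

a, f0, K = 1.0, 0.1, 256
k = np.arange(-K, K + 1)                          # lags -K, ..., K
r = a**2 * np.exp(1j * 2 * np.pi * f0 * k)        # autocorrelation sequence

f = np.arange(1024) / 1024                        # frequency grid over [0, 1)
P = (r[None, :] * np.exp(-1j * 2 * np.pi * f[:, None] * k)).sum(axis=1).real
print(f[np.argmax(P)])                            # approximately f0 = 0.1
```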
For all practical purposes, there are three different types of $P(f)$ [15]. If $P(f)$ is an absolutely continuous function of $f$, the random process has a purely continuous spectrum. If $P(f)$ is identically equal to zero for all $f$ except for frequencies $f = f_k$, $k = 1, 2, \ldots$, where it is infinite, the random process has a line spectrum. In this case, a useful representation of the spectrum is given by the Dirac $\delta$-functions,
$$P(f) = \sum_{k} P_k\, \delta(f - f_k)$$
where $P_k$ is the power associated with the $k$-th line component. Finally, the spectrum of a random process may be mixed if it is a combination of continuous and line spectra. Then $P(f)$ is a superposition of a continuous function of $f$ and $\delta$-functions.
14.3 The Problem of Power Spectrum Estimation
The problem of power spectrum estimation can be stated as follows: given a set of $N$ samples $\{x[0], x[1], \ldots, x[N-1]\}$ of a realization of the random process $\{\tilde{x}[n]\}$, denoted also by $\{x[n]\}_0^{N-1}$, estimate the PSD of the random process, $P(f)$. Obviously this task amounts to the estimation of a function, and it is distinct from the typical problem in elementary statistics where the goal is to estimate a finite set of parameters.
Spectrum estimation methods can be classified into two categories: nonparametric and parametric. The nonparametric approaches do not assume any specific parametric model for the PSD; they are based solely on the estimate of the autocorrelation sequence of the random process from the observed data. For the parametric approaches, on the other hand, we first postulate a model for the process of interest, where the model is described by a small number of parameters. Based on the model, the PSD of the process can be expressed in terms of the model parameters. Then the PSD estimate is obtained by substituting the estimated parameters of the model in the expression for the PSD. For example, if a random process $\{\tilde{x}[n]\}$ can be modeled by
$$\tilde{x}[n] = -a\, \tilde{x}[n-1] + \tilde{w}[n]$$
where $a$ is an unknown parameter and $\{\tilde{w}[n]\}$ is a zero-mean wide-sense stationary random process whose random variables are uncorrelated and have the same variance $\sigma^2$, it can be shown that the PSD of $\{\tilde{x}[n]\}$ is
$$P(f) = \frac{\sigma^2}{\left| 1 + a\, e^{-j2\pi f} \right|^2}$$
Thus, to find $P(f)$ it is sufficient to estimate $a$ and $\sigma^2$.
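A minimal sketch of this parametric recipe, assuming the first-order model above: the parameters are estimated from the data with the Yule-Walker relations and substituted into the closed-form PSD. The function name and the estimation details are illustrative, not from the text.

```python
# Parametric PSD estimate for the first-order model x[n] = -a x[n-1] + w[n]:
# estimate a and sigma^2 from the data, then substitute into the PSD formula.
import numpy as np

def ar1_psd(x, nfreq=512):
    x = np.asarray(x)
    r0 = np.mean(np.abs(x) ** 2)                   # autocorrelation estimate at lag 0
    r1 = np.mean(np.conj(x[:-1]) * x[1:])          # autocorrelation estimate at lag 1
    a_hat = -r1 / r0                               # Yule-Walker: r[1] = -a r[0]
    sigma2_hat = (r0 - np.abs(r1) ** 2 / r0).real  # from r[0] = |a|^2 r[0] + sigma^2
    f = np.arange(nfreq) / nfreq
    P = sigma2_hat / np.abs(1 + a_hat * np.exp(-1j * 2 * np.pi * f)) ** 2
    return f, P
```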
The performance of a PSD estimator is evaluated by several measures of goodness. One is the bias of the estimator, defined by
$$b(f) = E\left( \hat{P}(f) \right) - P(f)$$
where $\hat{P}(f)$ and $P(f)$ are the estimated and true PSD, respectively. If the bias $b(f)$ is identically equal to zero for all $f$, the estimator is said to be unbiased, which means that on average it yields the true PSD. Among the unbiased estimators, we search for the one that has minimal variability. The variability is measured by the variance of the estimator,
$$v(f) = E\left\{ \left( \hat{P}(f) - E\left( \hat{P}(f) \right) \right)^2 \right\}$$
The variability of a PSD estimator is also measured by the normalized variance [8],
$$\psi(f) = \frac{v(f)}{P^2(f)}$$
Finally, another important metric for comparison is the resolution of the PSD estimator. It corresponds to the ability of the estimator to provide the fine details of the PSD of the random process. For example, if the PSD of the random process has two peaks at frequencies $f_1$ and $f_2$, then the resolution of the estimator would be measured by the minimum separation of $f_1$ and $f_2$ for which the estimator still reproduces two peaks at $f_1$ and $f_2$.
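These measures can be evaluated empirically. The sketch below is a hedged example of our own: it measures $b(f)$ and $v(f)$ over many independent realizations of real white Gaussian noise, whose true PSD is flat and equal to $\sigma^2$, using the periodogram (introduced in Section 14.4.1) as the estimator under test.

```python
# Empirical bias b(f) and variance v(f) of a PSD estimator, measured over many
# independent realizations of real white Gaussian noise with P(f) = sigma^2.
import numpy as np

rng = np.random.default_rng(1)
sigma2, N, trials = 1.0, 256, 2000

est = np.empty((trials, N))
for i in range(trials):
    x = rng.normal(scale=np.sqrt(sigma2), size=N)
    est[i] = np.abs(np.fft.fft(x)) ** 2 / N        # periodogram at f_k = k/N

bias = est.mean(axis=0) - sigma2                   # b(f_k): close to 0 for all k
var = est.var(axis=0)                              # v(f_k): about sigma2^2 away from f = 0, 1/2
print(bias[1:5].round(2), var[1:5].round(2))
```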
14.4 Nonparametric Spectrum Estimation
When the method for PSD estimation is not based on any assumptions about the generation of the observed samples other than wide-sense stationarity, it is termed a nonparametric estimator. According to Eq. (14.28), $P(f)$ can be obtained by first estimating the autocorrelation sequence from the observed samples $x[0], x[1], \cdots, x[N-1]$, and then applying the DTFT to these estimates. One estimator of the autocorrelation is given by
$$\hat{r}[k] = \frac{1}{N} \sum_{n=0}^{N-1-k} x^*[n]\, x[n+k], \quad k = 0, 1, \ldots, N-1 \tag{14.36}$$
where the estimates for negative lags are obtained from $\hat{r}[-k] = \hat{r}^*[k]$, and those for $|k| \geq N$ are set equal to zero. This estimator, although biased, has been preferred over others. An important reason for favoring it is that it always yields nonnegative estimates of the PSD, which is not the case with the unbiased estimator.
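A direct implementation of Eq. (14.36) might look as follows (a sketch; the helper name autocorr_biased is ours):

```python
# Biased autocorrelation estimator of Eq. (14.36): divide by N rather than by
# N - k, and extend to negative lags via r_hat[-k] = r_hat*[k].
import numpy as np

def autocorr_biased(x):
    x = np.asarray(x)
    N = len(x)
    return np.array([np.sum(np.conj(x[:N - k]) * x[k:]) for k in range(N)]) / N

x = np.array([1.0, -0.5, 0.25, 0.1])
r = autocorr_biased(x)                              # r_hat[0], ..., r_hat[N-1]
r_full = np.concatenate([np.conj(r[:0:-1]), r])     # r_hat[-(N-1)], ..., r_hat[N-1]
```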
Many nonparametric estimators rely on using Eq. (14.36) and then transforming the obtained autocorrelation sequence to estimate the PSD. Other nonparametric methods, however, operate directly on the observed data.
14.4.1 Periodogram
The periodogram was introduced by Schuster in 1898 when he was searching for hidden periodicities while studying sunspot data [19]. To find the periodogram of the data $\{x[n]\}_0^{N-1}$, first we determine the autocorrelation sequence $\hat{r}[k]$ for $-(N-1) \leq k \leq N-1$ and then take the DTFT, i.e.,
$$\hat{P}_{\mathrm{PER}}(f) = \sum_{k=-(N-1)}^{N-1} \hat{r}[k]\, e^{-j2\pi f k} = \frac{1}{N} \left| \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi f n} \right|^2$$
Thus, the periodogram is proportional to the squared magnitude of the DTFT of the observed data. In practice, the periodogram is calculated by applying the FFT, which computes it at the discrete set of frequencies $D_f = \{f_k : f_k = k/N,\ k = 0, 1, 2, \cdots, N-1\}$. The periodogram is then expressed as
$$\hat{P}_{\mathrm{PER}}(f_k) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi k n / N} \right|^2, \quad f_k \in D_f$$
If the periodogram is desired on a finer grid of frequencies $D'_f = \{f_k : f_k = k/N',\ k = 0, 1, \cdots, N'-1\}$, where $N' > N$, the observed data are padded with $N' - N$ zeros, and
$$\hat{P}_{\mathrm{PER}}(f_k) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi k n / N'} \right|^2, \quad f_k \in D'_f \tag{14.42}$$
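In code, Eq. (14.42) is a single call to the FFT; a sketch with optional zero-padding (the function name is ours):

```python
# Periodogram via the FFT, Eq. (14.42). Passing nfft > N evaluates it on the
# finer grid D'_f by zero-padding; the normalization remains 1/N in either case.
import numpy as np

def periodogram(x, nfft=None):
    x = np.asarray(x)
    N = len(x)
    nfft = nfft or N
    X = np.fft.fft(x, nfft)             # np.fft.fft zero-pads x when nfft > N
    return np.arange(nfft) / nfft, np.abs(X) ** 2 / N
```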
A general property of good estimators is that they yield better estimates when the number of observed data samples increases. Theoretically, if the number of data samples tends to infinity, the estimates should converge to the true values of the estimated parameters. So, in the case of a PSD estimator, as we get more and more data samples, it is desirable that the estimated PSD tend to the true value of the PSD. In other words, if for a finite number of data samples the estimator is biased, the bias should tend to zero as $N \to \infty$, as should the variance of the estimate. If this is indeed the case, the estimator is called consistent. Although the periodogram is asymptotically unbiased, it can be shown that it is not a consistent estimator. For example, if $\{\tilde{x}[n]\}$ is real zero-mean white Gaussian noise, which is a process whose random variables are independent, Gaussian, and identically distributed with variance $\sigma^2$, the variance of $\hat{P}_{\mathrm{PER}}(f)$ is equal to $\sigma^4$ regardless of the length $N$ of the observed data sequence [12]. The performance of the periodogram does not improve as $N$ gets larger because as $N$ increases, so does the number of parameters that are estimated, $P(f_0), P(f_1), \ldots, P(f_{N-1})$. In general, for the variance of the periodogram, we can write [12]
$$\mathrm{var}\left( \hat{P}_{\mathrm{PER}}(f) \right) \simeq P^2(f)$$
where $P(f)$ is the true PSD.
Interesting insight can be gained if one writes the periodogram as follows:
$$\hat{P}_{\mathrm{PER}}(f) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi f n} \right|^2 = \frac{1}{N} \left| \sum_{n=-\infty}^{\infty} w_R[n]\, x[n]\, e^{-j2\pi f n} \right|^2 \tag{14.44}$$
where $w_R[n]$ is the rectangular window, equal to one for $0 \leq n \leq N-1$ and zero otherwise. From Eq. (14.44), the mean of the periodogram is found to be
$$E\left\{ \hat{P}_{\mathrm{PER}}(f) \right\} = \frac{1}{N} \int_{0}^{1} P(\alpha)\, \left| W_R(f - \alpha) \right|^2\, d\alpha$$
where $W_R(f)$ is the DTFT of the rectangular window. Hence, the mean value of the periodogram is a smeared version of the true PSD. Since the implementation of the periodogram as defined in Eq. (14.44) implies the use of a rectangular window, a question arises as to whether we could use a window of a different shape to reduce the variance of the periodogram. The answer is yes, and indeed many windows have been proposed which weight the data samples in the middle of the observed data more than those towards the ends of the observed data. Some frequently used alternatives to the rectangular window are the windows of Bartlett, Hanning, Hamming, and Blackman. The magnitude of the DTFT of a window provides two important characteristics about it: one is the width of the window's mainlobe, and the other is the strength of its sidelobes. A narrow mainlobe allows for better resolution, and low sidelobes improve the smoothing of the estimated spectrum. Unfortunately, the narrower the mainlobe, the higher the sidelobes, which is a typical trade-off in spectrum estimation. It turns out that the rectangular window allows for the best resolution but has the largest sidelobes.
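The mainlobe/sidelobe trade-off is easy to quantify numerically. The sketch below, using the window functions available in NumPy, locates the edge of the mainlobe and the peak sidelobe level for the windows just mentioned (the grid sizes are arbitrary choices).

```python
# Mainlobe width (location of the first minimum) and peak sidelobe level for
# several common data windows, from the magnitude of their zero-padded DTFTs.
import numpy as np

N, nfft = 64, 4096
windows = {
    "rectangular": np.ones(N),
    "Bartlett": np.bartlett(N),
    "Hanning": np.hanning(N),
    "Hamming": np.hamming(N),
    "Blackman": np.blackman(N),
}
for name, w in windows.items():
    W_db = 20 * np.log10(np.abs(np.fft.fft(w, nfft)) / np.sum(w) + 1e-12)
    k = 1
    while k + 1 < nfft // 2 and W_db[k + 1] < W_db[k]:
        k += 1                                   # walk down to the first local minimum
    sidelobe = W_db[k:nfft // 2].max()           # strongest sidelobe, in dB below the peak
    print(f"{name:12s} mainlobe edge at f = {k / nfft:.4f}, peak sidelobe = {sidelobe:6.1f} dB")
```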
14.4.2 The Bartlett Method
One approach to reduce the variance of the periodogram is to subdivide the observed data record into $K$ nonoverlapping segments, find the periodogram of each segment, and finally evaluate the average of the so-obtained periodograms. This spectrum estimator, also known as Bartlett's estimator, has a variance that is smaller than the variance of the periodogram.
Suppose that the number of data samples $N$ is equal to $KL$, where $K$ is the number of segments and $L$ is their length. If the $i$-th segment is denoted by $\{x_i[n]\}_0^{L-1}$, $i = 1, 2, \cdots, K$, where
$$x_i[n] = x[n + (i-1)L], \quad n \in \{0, 1, \cdots, L-1\} \tag{14.47}$$
and its periodogram by
$$\hat{P}_{\mathrm{PER}}^{(i)}(f) = \frac{1}{L} \left| \sum_{n=0}^{L-1} x_i[n]\, e^{-j2\pi f n} \right|^2 \tag{14.48}$$
then the Bartlett spectrum estimator is
$$\hat{P}_{\mathrm{B}}(f) = \frac{1}{K} \sum_{i=1}^{K} \hat{P}_{\mathrm{PER}}^{(i)}(f)$$
The variance of this estimator decreases by a factor of $K$ relative to the periodogram. Since the segments are $K$ times shorter than the whole data record, however, the estimator has a resolution $K$ times less than that of the periodogram. Thus, this estimator allows for a straightforward trading of resolution for variance.
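A compact implementation of the Bartlett estimator (a sketch; the names are ours):

```python
# Bartlett estimator: average the periodograms of K nonoverlapping length-L
# segments; variance drops roughly by K, resolution coarsens by the same factor.
import numpy as np

def bartlett_psd(x, L):
    x = np.asarray(x)
    K = len(x) // L                                     # number of complete segments
    segments = x[:K * L].reshape(K, L)                  # K x L array, Eq. (14.47)
    P = np.abs(np.fft.fft(segments, axis=1)) ** 2 / L   # per-segment periodograms
    return np.arange(L) / L, P.mean(axis=0)             # f_k = k/L and the estimate
```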
14.4.3 The Welch Method
The Welch method is another estimator that exploits the periodogram. It is based on the same idea as Bartlett's approach of splitting the data into segments and finding the average of their periodograms. The difference is that the segments are overlapped, where the overlaps are usually 50% or 75% large, and the data within a segment are windowed. Let the length of the segments be $L$, the $i$-th segment again be denoted by $\{x_i[n]\}_0^{L-1}$, and the offset of successive segments be $D$ samples. Then
$$N = L + (K - 1)D$$
where $N$ is the total number of observed samples and $K$ the total number of segments. Note that if there is no overlap, $K = N/L$, and if there is 50% overlap, $K = 2N/L - 1$. The $i$-th sequence is defined by
$$x_i[n] = x[n + (i-1)D], \quad n \in \{0, 1, \cdots, L-1\} \tag{14.51}$$
where $i = 1, 2, \cdots, K$, and its periodogram by
$$\hat{P}_{\mathrm{M}}^{(i)}(f) = \frac{1}{L} \left| \sum_{n=0}^{L-1} w[n]\, x_i[n]\, e^{-j2\pi f n} \right|^2$$
Here $\hat{P}_{\mathrm{M}}^{(i)}(f)$ is the modified periodogram of the data, because the samples $x[n]$ are weighted by a nonrectangular window $w[n]$. The Welch spectrum estimate is then given by
$$\hat{P}_{\mathrm{W}}(f) = \frac{1}{K} \sum_{i=1}^{K} \hat{P}_{\mathrm{M}}^{(i)}(f)$$
With the Welch method we can trade off variance and resolution in many more ways than with the Bartlett method. It can be shown that if the overlap is 50%, the variance of the Welch estimator is approximately 9/16 of the variance of the Bartlett estimator [8].
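A sketch of the Welch recipe follows (the names are ours). Library routines such as scipy.signal.welch implement the same idea but typically also normalize by the window power, so absolute scales may differ by a constant factor.

```python
# Welch estimator: average modified (windowed) periodograms of K segments of
# length L offset by D samples (D = L/2 gives 50% overlap).
import numpy as np

def welch_psd(x, L, D, window=np.hanning):
    x = np.asarray(x)
    w = window(L)
    K = (len(x) - L) // D + 1                      # number of segments that fit
    P = np.zeros(L)
    for i in range(K):
        segment = w * x[i * D:i * D + L]           # windowed i-th segment, Eq. (14.51)
        P += np.abs(np.fft.fft(segment)) ** 2 / L  # modified periodogram
    return np.arange(L) / L, P / K                 # average over the K segments
```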
14.4.4 The Blackman-Tukey Method

The autocorrelation estimates of Eq. (14.36) become increasingly unreliable as the lag $k$ approaches $N$, because they are computed from fewer and fewer terms. For example, $\hat{r}[N-1]$ has only the term $x^*[0]\, x[N-1]$, compared to the $N$ terms used in the computation of $\hat{r}[0]$. Therefore, the large variance of the periodogram can be ascribed to the large weight given to the poor autocorrelation estimates used in its evaluation.
Blackman and Tukey proposed to weight the autocorrelation sequence so that the autocorrelations with higher lags are weighted less [3]. Their estimator is given by
$$\hat{P}_{\mathrm{BT}}(f) = \sum_{k=-(M-1)}^{M-1} w[k]\, \hat{r}[k]\, e^{-j2\pi f k}$$
where $w[k]$ is a lag window with $w[0] = 1$, $w[k] = 0$ for $|k| \geq M$, and $M < N$.
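A sketch of the Blackman-Tukey estimator, reusing the autocorr_biased helper defined earlier in this section (the Bartlett lag window and grid size are arbitrary choices):

```python
# Blackman-Tukey estimator: apply a lag window of length 2M - 1 (M < N) to the
# biased autocorrelation estimates and take the DTFT on a frequency grid.
import numpy as np

def blackman_tukey_psd(x, M, nfft=1024, lag_window=np.bartlett):
    r = autocorr_biased(x)                                   # defined earlier in this section
    lags = np.concatenate([np.conj(r[M - 1:0:-1]), r[:M]])   # r_hat[-(M-1)], ..., r_hat[M-1]
    rw = lag_window(2 * M - 1) * lags                        # windowed autocorrelations
    k = np.arange(-(M - 1), M)
    f = np.arange(nfft) / nfft
    P = (rw[None, :] * np.exp(-1j * 2 * np.pi * f[:, None] * k)).sum(axis=1).real
    return f, P
```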