Tugnait, J.K. “Validation, Testing, and Noise Modeling.”
Digital Signal Processing Handbook.
Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
16 Validation, Testing, and Noise Modeling
Jitendra K Tugnait
Auburn University
16.1 Introduction
16.2 Gaussianity, Linearity, and Stationarity Tests
Gaussianity Tests • Linearity Tests • Stationarity Tests
16.3 Order Selection, Model Validation, and Confidence Intervals
Order Selection • Model Validation • Confidence Intervals
16.4 Noise Modeling
Generalized Gaussian Noise • Middleton Class A Noise • Stable Noise Distribution
16.5 Concluding Remarks
References
16.1 Introduction
Linear parametric models of stationary random processes, whether signal or noise, have been found to be useful in a wide variety of signal processing tasks such as signal detection, estimation, filtering, and classification, and in a wide variety of applications such as digital communications, automatic control, radar and sonar, and other engineering disciplines and sciences. A general representation of a linear discrete-time stationary signal $x(t)$ is given by
$$x(t) = \sum_{i=0}^{\infty} h(i)\,\epsilon(t-i), \qquad (16.1)$$
where $\{\epsilon(t)\}$ is a zero-mean, i.i.d. (independent and identically distributed) random sequence with finite variance, and $\{h(i), i \ge 0\}$ is the impulse response of the linear system such that $\sum_{i=-\infty}^{\infty} h^2(i) < \infty$. Much effort has been expended on developing approaches to linear model fitting given a single measurement record of the signal (or noisy signal). Parsimonious parametric models such as AR (autoregressive), MA (moving average), ARMA, or state-space models, as opposed to impulse response modeling, have been popular, together with the assumption of Gaussianity of the data.
Define
$$H(q) = \sum_{i=0}^{\infty} h(i)\, q^{-i}, \qquad (16.2)$$
where $q^{-1}$ is the backward shift operator (i.e., $q^{-1}x(t) = x(t-1)$, etc.). If $q$ is replaced with the complex variable $z$, then $H(z)$ is the Z-transform of $\{h(i)\}$, i.e., it is the system transfer function.
Using (16.2), (16.1) may be rewritten as
$$x(t) = H(q)\,\epsilon(t). \qquad (16.3)$$
Fitting linear models to the measurement record requires estimation of $H(q)$, or equivalently of $\{h(i)\}$ (without observing $\{\epsilon(t)\}$). Typically $H(q)$ is parameterized by a finite number of parameters, say by the parameter vector $\theta^{(M)}$ of dimension $M$. For instance, an AR model representation of order $M$ means that
$$H_{AR}(q;\theta^{(M)}) = \frac{1}{1 + \sum_{i=1}^{M} a_i q^{-i}}, \qquad \theta^{(M)} = (a_1, a_2, \ldots, a_M)^T. \qquad (16.4)$$
This reduces the number of estimated parameters from a “large” number to $M$.
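As a minimal sketch of (16.3) and (16.4), the Python code below computes the impulse response $h(i)$ of an AR model from its coefficients $a_1, \ldots, a_M$ and synthesizes $x(t) = H_{AR}(q;\theta^{(M)})\epsilon(t)$. The function names, the Gaussian choice for $\epsilon(t)$, and the burn-in length are illustrative assumptions, not part of the chapter.

```python
import random


def ar_impulse_response(a, n_terms):
    """First n_terms samples of h(i) for H_AR(q) = 1 / (1 + sum_i a_i q^{-i}).

    Recursion: h(0) = 1, h(n) = -sum_{i=1}^{M} a_i h(n - i), with h(n) = 0 for n < 0.
    """
    h = []
    for n in range(n_terms):
        value = 1.0 if n == 0 else 0.0
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                value -= a_i * h[n - i]
        h.append(value)
    return h


def simulate_ar(a, n_samples, sigma=1.0, burn_in=500, seed=0):
    """Generate x(t) = H_AR(q) eps(t) with i.i.d. Gaussian eps(t) (an assumption).

    Difference equation: x(t) = eps(t) - sum_i a_i x(t - i).
    The first burn_in samples are discarded so start-up transients die out.
    """
    rng = random.Random(seed)
    x = []
    for t in range(n_samples + burn_in):
        value = rng.gauss(0.0, sigma)
        for i, a_i in enumerate(a, start=1):
            if t - i >= 0:
                value -= a_i * x[t - i]
        x.append(value)
    return x[burn_in:]


# AR(1) with a_1 = -0.5, i.e. x(t) = 0.5 x(t-1) + eps(t), so h(i) = 0.5**i
print(ar_impulse_response([-0.5], 5))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

Because $H_{AR}$ is all-pole, the $M$ coefficients generate an infinitely long $\{h(i)\}$, which is the parameter reduction referred to above.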
In this section, several aspects of fitting models such as (16.1) to (16.3) to the given measurement record are considered. These aspects are (see also Fig. 16.1):
• Is the model of the type (16.1) appropriate to the given record? This requires testing for linearity and stationarity of the data.
• Linear Gaussian models have long been dominant both for signals and for noise processes. The assumption of Gaussianity allows implementation of statistically efficient parameter estimators such as maximum likelihood estimators. A Gaussian process is completely characterized by its second-order statistics (autocorrelation function or, equivalently, its power spectral density). Since the power spectrum of $\{x(t)\}$ of (16.1) is given by
$$S_{xx}(\omega) = \sigma_\epsilon^2 \left|H(e^{j\omega})\right|^2, \qquad \sigma_\epsilon^2 = E\{\epsilon^2(t)\}, \qquad (16.5)$$
one cannot determine the phase of $H(e^{j\omega})$ independently of $|H(e^{j\omega})|$. Determination of the true phase characteristic is crucial in several applications such as blind equalization of digital communications channels. Use of higher-order statistics allows one to uniquely identify nonminimum-phase parametric models. Higher-order cumulants of Gaussian processes vanish; hence, if the data are stationary Gaussian, a minimum-phase (or maximum-phase) model is the “best” that one can estimate. Therefore, another aspect considered in this section is testing for non-Gaussianity of the given record.
• If the data are Gaussian, one may fit models based solely upon the second-order statistics of the data; otherwise, use of higher-order statistics in addition to, or in lieu of, the second-order statistics is indicated, particularly if the phase of the linear system is crucial. In either case, one typically fits a model $H(q;\theta^{(M)})$ by estimating the $M$ unknown parameters through optimization of some cost function. In practice, the model order $M$ is unknown, and its choice has a significant impact on the quality of the fitted model. Another aspect of the model-fitting problem considered in this section is therefore that of order selection.
• Having fitted a model $H(q;\theta^{(M)})$, one would also like to know how good the estimated parameters are. Typically this is expressed in terms of error bounds or confidence intervals on the fitted parameters and on the corresponding model transfer function.
• Having fitted a model, a final step is that of model falsification: is the fitted model an appropriate representation of the underlying system? This is referred to variously as model validation, model verification, or model diagnostics.
• Finally, various models of univariate noise pdf (probability density function) are discussed to complete the discussion of model fitting.
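The phase ambiguity implied by (16.5) can be seen numerically. The sketch below compares a minimum-phase MA(1) system with its maximum-phase (time-reversed) counterpart; the coefficient values are illustrative assumptions. Both give the same power spectrum at every frequency, yet their phase responses differ, so second-order statistics alone cannot tell them apart.

```python
import cmath
import math


def freq_response(h, omega):
    """H(e^{j omega}) = sum_i h(i) e^{-j omega i} for a finite impulse response h."""
    return sum(h_i * cmath.exp(-1j * omega * i) for i, h_i in enumerate(h))


def power_spectrum(h, omega, sigma2=1.0):
    """S_xx(omega) = sigma^2 |H(e^{j omega})|^2, as in (16.5)."""
    return sigma2 * abs(freq_response(h, omega)) ** 2


h_min = [1.0, 0.5]   # H(z) = 1 + 0.5 z^-1: zero at z = -0.5 (minimum phase)
h_max = [0.5, 1.0]   # H(z) = 0.5 + z^-1: zero at z = -2 (maximum phase)
for omega in [0.0, 0.7, math.pi / 2, 2.5]:
    s_min, s_max = power_spectrum(h_min, omega), power_spectrum(h_max, omega)
    dphi = cmath.phase(freq_response(h_max, omega)) - cmath.phase(freq_response(h_min, omega))
    print(f"omega={omega:.2f}  S_min={s_min:.4f}  S_max={s_max:.4f}  phase gap={dphi:+.4f}")
```

Both spectra equal $1.25 + \cos\omega$; only higher-order statistics of a non-Gaussian $\epsilon(t)$ can separate the two models.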
FIGURE 16.1: Section outline (SOS: second-order statistics; HOS: higher-order statistics).
16.2 Gaussianity, Linearity, and Stationarity Tests
Given a zero-mean, stationary random sequence $\{x(t)\}$, its third-order cumulant function $C_{xxx}(i,k)$ is given by [12]
$$C_{xxx}(i,k) := E\{x(t+i)\,x(t+k)\,x(t)\}. \qquad (16.6)$$
Its bispectrum $B_{xxx}(\omega_1,\omega_2)$ is defined as [12]
$$B_{xxx}(\omega_1,\omega_2) = \sum_{i=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} C_{xxx}(i,k)\, e^{-j(\omega_1 i + \omega_2 k)}. \qquad (16.7)$$
Similarly, its fourth-order cumulant function $C_{xxxx}(i,k,l)$ is given by [12]
$$\begin{aligned} C_{xxxx}(i,k,l) := {} & E\{x(t)x(t+i)x(t+k)x(t+l)\} \\ & - E\{x(t)x(t+i)\}\,E\{x(t+k)x(t+l)\} \\ & - E\{x(t)x(t+k)\}\,E\{x(t+l)x(t+i)\} \\ & - E\{x(t)x(t+l)\}\,E\{x(t+k)x(t+i)\}. \end{aligned} \qquad (16.8)$$
Its trispectrum is defined as [12]
$$T_{xxxx}(\omega_1,\omega_2,\omega_3) := \sum_{i=-\infty}^{\infty}\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} C_{xxxx}(i,k,l)\, e^{-j(\omega_1 i + \omega_2 k + \omega_3 l)}. \qquad (16.9)$$
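If $\{x(t)\}$ obeys (16.1), then [12]
$$B_{xxx}(\omega_1,\omega_2) = \gamma_3\, H(e^{j\omega_1})\, H(e^{j\omega_2})\, H^*(e^{j(\omega_1+\omega_2)}) \qquad (16.10)$$
and
$$T_{xxxx}(\omega_1,\omega_2,\omega_3) = \gamma_4\, H(e^{j\omega_1})\, H(e^{j\omega_2})\, H(e^{j\omega_3})\, H^*(e^{j(\omega_1+\omega_2+\omega_3)}), \qquad (16.11)$$
where
$$\gamma_3 = C_{\epsilon\epsilon\epsilon}(0,0) \quad\text{and}\quad \gamma_4 = C_{\epsilon\epsilon\epsilon\epsilon}(0,0,0). \qquad (16.12)$$
For Gaussian processes, $B_{xxx}(\omega_1,\omega_2) \equiv 0$ and $T_{xxxx}(\omega_1,\omega_2,\omega_3) \equiv 0$; equivalently, $C_{xxx}(i,k) \equiv 0$ and $C_{xxxx}(i,k,l) \equiv 0$. This forms a basis for testing Gaussianity of a given measurement record.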
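The vanishing of higher-order cumulants for Gaussian data can be checked on sample records. The sketch below estimates $C_{xxx}(i,k)$ and $C_{xxxx}(i,k,l)$ by replacing the expectations in (16.6) and (16.8) with time averages after removing the sample mean; the restriction to nonnegative lags and the plain $1/\text{count}$ normalization are simplifying assumptions.

```python
import random


def _centered(x):
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def third_order_cumulant(x, i, k):
    """Sample estimate of C_xxx(i, k) = E{x(t+i) x(t+k) x(t)} (16.6), for lags i, k >= 0."""
    y = _centered(x)
    count = len(y) - max(i, k)
    return sum(y[t + i] * y[t + k] * y[t] for t in range(count)) / count


def fourth_order_cumulant(x, i, k, l):
    """Sample estimate of C_xxxx(i, k, l) following (16.8), for lags i, k, l >= 0."""
    y = _centered(x)
    count = len(y) - max(i, k, l)

    def moment2(a, b):
        # sample E{x(t+a) x(t+b)} over the same time indices as the fourth moment
        return sum(y[t + a] * y[t + b] for t in range(count)) / count

    m4 = sum(y[t] * y[t + i] * y[t + k] * y[t + l] for t in range(count)) / count
    return (m4
            - moment2(0, i) * moment2(k, l)
            - moment2(0, k) * moment2(l, i)
            - moment2(0, l) * moment2(k, i))


rng = random.Random(0)
gauss = [rng.gauss(0.0, 1.0) for _ in range(20000)]
expo = [rng.expovariate(1.0) for _ in range(20000)]
print(third_order_cumulant(gauss, 0, 0), fourth_order_cumulant(gauss, 0, 0, 0))
print(third_order_cumulant(expo, 0, 0), fourth_order_cumulant(expo, 0, 0, 0))
```

For the Gaussian record both estimates should be close to zero, while for unit-rate exponential noise the zero-lag third-order cumulant equals 2, so the corresponding estimate should come out near 2.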
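When $\{x(t)\}$ is linear (i.e., it obeys (16.1)), then using (16.5) and (16.10),
$$\frac{|B_{xxx}(\omega_1,\omega_2)|^2}{S_{xx}(\omega_1)\,S_{xx}(\omega_2)\,S_{xx}(\omega_1+\omega_2)} = \frac{\gamma_3^2}{\sigma_\epsilon^6} = \text{constant} \quad \forall\, \omega_1,\omega_2, \qquad (16.13)$$
and using (16.5) and (16.11),
$$\frac{|T_{xxxx}(\omega_1,\omega_2,\omega_3)|^2}{S_{xx}(\omega_1)\,S_{xx}(\omega_2)\,S_{xx}(\omega_3)\,S_{xx}(\omega_1+\omega_2+\omega_3)} = \frac{\gamma_4^2}{\sigma_\epsilon^8} = \text{constant} \quad \forall\, \omega_1,\omega_2,\omega_3. \qquad (16.14)$$
The above two relations form a basis for testing linearity of a given measurement record. How the tests are implemented depends upon the statistics of the estimators of the higher-order cumulant spectra as well as those of the power spectra of the given record.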
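A quick numerical check of (16.13): for an illustrative FIR system $h = (1, -0.8, 0.3)$ with $\gamma_3 = 2$ and $\sigma_\epsilon^2 = 1$ (values chosen only for illustration), evaluating (16.5) and (16.10) in closed form should give the same normalized squared bispectrum, $\gamma_3^2/\sigma_\epsilon^6 = 4$, at every bifrequency.

```python
import cmath


def freq_response(h, omega):
    """H(e^{j omega}) for a finite impulse response h."""
    return sum(h_i * cmath.exp(-1j * omega * i) for i, h_i in enumerate(h))


def bispectrum_linear(h, gamma3, w1, w2):
    """B_xxx(w1, w2) = gamma3 H(w1) H(w2) H*(w1 + w2), as in (16.10)."""
    return (gamma3 * freq_response(h, w1) * freq_response(h, w2)
            * freq_response(h, w1 + w2).conjugate())


def spectrum_linear(h, sigma2, w):
    """S_xx(w) = sigma^2 |H(w)|^2, as in (16.5)."""
    return sigma2 * abs(freq_response(h, w)) ** 2


def normalized_bispectrum(h, gamma3, sigma2, w1, w2):
    """Left side of (16.13)."""
    num = abs(bispectrum_linear(h, gamma3, w1, w2)) ** 2
    den = (spectrum_linear(h, sigma2, w1) * spectrum_linear(h, sigma2, w2)
           * spectrum_linear(h, sigma2, w1 + w2))
    return num / den


h = [1.0, -0.8, 0.3]
for w1, w2 in [(0.3, 0.2), (1.0, 0.5), (2.0, 0.9)]:
    print(normalized_bispectrum(h, 2.0, 1.0, w1, w2))  # ≈ 4.0 up to floating-point rounding
```

The $|H|^2$ factors cancel term by term, which is exactly why a nonconstant estimate of this ratio indicates nonlinearity.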
16.2.1 Gaussianity Tests
Suppose that the given zero-mean measurement record of length $N$ is denoted by $\{x(t), t = 1, 2, \ldots, N\}$, and that it is divided into $K$ nonoverlapping segments, each of size $N_B$ samples, so that $N = KN_B$. Let $X^{(i)}(\omega)$ denote the discrete Fourier transform (DFT) of the $i$th block $\{x(t + (i-1)N_B), 1 \le t \le N_B\}$ $(i = 1, 2, \ldots, K)$, given by
$$X^{(i)}(\omega_m) = \sum_{l=0}^{N_B-1} x(l + 1 + (i-1)N_B)\, \exp(-j\omega_m l), \qquad (16.15)$$
where
$$\omega_m = \frac{2\pi}{N_B}\, m, \qquad m = 0, 1, \ldots, N_B - 1. \qquad (16.16)$$
Denote the estimate of the bispectrum $B_{xxx}(\omega_m,\omega_n)$ at bifrequency $(\omega_m = \frac{2\pi}{N_B}m,\ \omega_n = \frac{2\pi}{N_B}n)$ as $\hat{B}_{xxx}(m,n)$, given by averaging over $K$ blocks:
$$\hat{B}_{xxx}(m,n) = \frac{1}{K}\sum_{i=1}^{K} \frac{1}{N_B}\, X^{(i)}(\omega_m)\, X^{(i)}(\omega_n) \left[X^{(i)}(\omega_m+\omega_n)\right]^*, \qquad (16.17)$$
where $X^*$ denotes the complex conjugate of $X$. A principal domain of $\hat{B}_{xxx}(m,n)$ is the triangular grid
$$D = \left\{(m,n) \;\middle|\; 0 \le m \le \frac{N_B}{2},\ 0 \le n \le m,\ 2m + n \le N_B\right\}. \qquad (16.18)$$
Values of $\hat{B}_{xxx}(m,n)$ outside $D$ can be inferred from those in $D$.
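A direct, unoptimized sketch of (16.15) to (16.18): it computes the block DFTs with an explicit sum (an FFT would normally be used), averages the bispectral products over blocks, and enumerates the principal domain. Frequency indices are reduced modulo $N_B$, so $X^{(i)}(\omega_m+\omega_n)$ is read from DFT bin $(m+n) \bmod N_B$; that indexing convention and the function names are assumptions of this sketch.

```python
import cmath
import math


def block_dfts(x, n_blocks, block_len):
    """DFTs X^(i)(w_m), m = 0..N_B-1, of K nonoverlapping blocks of a zero-mean record."""
    if n_blocks * block_len > len(x):
        raise ValueError("record shorter than K * N_B")
    dfts = []
    for i in range(n_blocks):
        block = x[i * block_len:(i + 1) * block_len]
        dfts.append([sum(block[l] * cmath.exp(-2j * math.pi * m * l / block_len)
                         for l in range(block_len))
                     for m in range(block_len)])
    return dfts


def bispectrum_estimate(dfts, block_len, m, n):
    """Block-averaged estimate B^hat_xxx(m, n) of (16.17); bins are taken modulo N_B."""
    total = 0j
    for X in dfts:
        total += X[m % block_len] * X[n % block_len] * X[(m + n) % block_len].conjugate()
    return total / (block_len * len(dfts))


def principal_domain(block_len):
    """Grid points (m, n) of the triangular principal domain D in (16.18)."""
    return [(m, n)
            for m in range(block_len // 2 + 1)
            for n in range(m + 1)
            if 2 * m + n <= block_len]


impulse = [1.0] + [0.0] * 7
dfts = block_dfts(impulse, n_blocks=1, block_len=8)
print(bispectrum_estimate(dfts, 8, 2, 1))   # every DFT bin of an impulse is 1, so this is 1/8
print(len(principal_domain(8)))              # 10 points for N_B = 8
```

The modulo indexing also lets fine-grid neighbors with negative offsets be evaluated without special cases.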
FIGURE 16.2: Coarse and fine grids in the principal domain.
Select a coarse frequency grid $(m,n)$ in the principal domain $D$ as follows. Let $d$ denote the distance between two adjacent coarse frequency pairs such that $d = 2r + 1$ with $r$ a positive integer. Set $n_0 = 2 + r$ and $n = n_0, n_0 + d, \ldots, n_0 + (L_n - 1)d$, where $L_n = \left\lfloor \frac{\lfloor N_B/3 \rfloor - 1}{d} \right\rfloor$. For a given $n$, set $m_{0,n} = \left\lfloor \frac{N_B - n}{2} \right\rfloor - r$ and $m = m_n = m_{0,n}, m_{0,n} - d, \ldots, m_{0,n} - (L_{m,n} - 1)d$, where $L_{m,n} = \left\lfloor \frac{m_{0,n} - (n + r + 1)}{d} \right\rfloor + 1$. Let $P$ denote the number of points on the coarse frequency grid as defined above, so that $P = \sum_{n=1}^{L_n} L_{m,n}$. Suppose that $(m,n)$ is a coarse point; then select a fine grid of points $(m_{mi}, n_{nk})$ consisting of
$$m_{mi} = m + i,\ |i| \le r, \qquad n_{nk} = n + k,\ |k| \le r, \qquad (16.19)$$
for some integer $r > 0$ such that $(2r+1)^2 > P$; see also Fig. 16.2. Order the $L\ (= (2r+1)^2)$ estimates $\hat{B}_{xxx}(m_{mi}, n_{nk})$ on the fine grid around the bifrequency pair $(m,n)$ into an $L$-vector, which after relabeling may be denoted as $\nu_{ml}$, $l = 1, 2, \ldots, L$, $m = 1, 2, \ldots, P$, where $m$ indexes the coarse grid and $l$ indexes the fine grid. Define $P$-vectors
$$\Psi_i = (\nu_{1i}, \nu_{2i}, \ldots, \nu_{Pi})^T \quad (i = 1, 2, \ldots, L). \qquad (16.20)$$
Consider the estimates
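$$\mathbf{M} = \frac{1}{L}\sum_{i=1}^{L}\Psi_i \quad\text{and}\quad \boldsymbol{\Sigma} = \frac{1}{L}\sum_{i=1}^{L}(\Psi_i - \mathbf{M})(\Psi_i - \mathbf{M})^H. \qquad (16.21)$$
Define
$$F_G = \frac{2(L-P)}{2P}\, \mathbf{M}^H \boldsymbol{\Sigma}^{-1} \mathbf{M}. \qquad (16.22)$$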
If $\{x(t)\}$ is Gaussian, then $F_G$ is distributed as a central $F$ (Fisher) with $(2P, 2(L-P))$ degrees of freedom. A statistical test for testing Gaussianity of $\{x(t)\}$ is to declare it to be a non-Gaussian sequence if $F_G > T_\alpha$, where $T_\alpha$ is selected to achieve a fixed probability of false alarm $\alpha$ $(= \Pr\{F_G > T_\alpha\}$ with $F_G$ distributed as a central $F$ with $(2P, 2(L-P))$ degrees of freedom$)$. If $F_G \le T_\alpha$, then either $\{x(t)\}$ is Gaussian or it has zero bispectrum.
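The coarse-grid construction and the statistic (16.22) can be sketched in pure Python as below. `coarse_grid` follows the index formulas above literally; `fine_grid_vectors` stacks fine-grid estimates into the $P$-vectors $\Psi_l$, ordering the offsets $(i,k)$ lexicographically (the text leaves the ordering open, so this is an assumption), and `estimate` is a hypothetical callable returning $\hat{B}_{xxx}(m,n)$. The small elimination solver only keeps the example dependency-free; the threshold $T_\alpha$ would come from an $F(2P, 2(L-P))$ quantile routine and is not computed here.

```python
def coarse_grid(block_len, r):
    """Coarse bifrequency points (m, n) in D, following the construction in the text."""
    d = 2 * r + 1
    n0 = 2 + r
    l_n = (block_len // 3 - 1) // d
    points = []
    for j in range(l_n):
        n = n0 + j * d
        m0 = (block_len - n) // 2 - r
        l_mn = (m0 - (n + r + 1)) // d + 1
        points.extend((m0 - q * d, n) for q in range(max(l_mn, 0)))
    return points


def fine_grid_vectors(estimate, coarse_points, r):
    """Return the L = (2r+1)^2 P-vectors Psi_l; estimate(m, n) supplies B^hat(m, n)."""
    offsets = [(i, k) for i in range(-r, r + 1) for k in range(-r, r + 1)]
    return [[estimate(m + i, n + k) for (m, n) in coarse_points] for (i, k) in offsets]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (real or complex)."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in reversed(range(n)):
        s = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def gaussianity_statistic(vectors):
    """F_G of (16.22) from the L complex P-vectors Psi_1, ..., Psi_L of (16.20)."""
    L, P = len(vectors), len(vectors[0])
    mean = [sum(v[p] for v in vectors) / L for p in range(P)]
    cov = [[sum((v[p] - mean[p]) * (v[q] - mean[q]).conjugate() for v in vectors) / L
            for q in range(P)] for p in range(P)]
    z = solve(cov, mean)                                          # Sigma^{-1} M
    quad = sum(mean[p].conjugate() * z[p] for p in range(P)).real  # M^H Sigma^{-1} M
    return 2 * (L - P) / (2 * P) * quad
```

For example, `coarse_grid(64, 1)` yields 30 points, which violates $(2r+1)^2 > P$, whereas `coarse_grid(64, 2)` yields 10 points against $L = 25$.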
The above test is patterned after [3]. It treats the bispectral estimates on the “fine” bifrequency grid as a “data set” from a multivariable Gaussian distribution with unknown covariance matrix. Hinich [4] has simplified the test of [3] by using the known asymptotic expression for the covariance matrix involved, and his test is based upon $\chi^2$ distributions. Notice that $F_G \le T_\alpha$ does not necessarily imply that $\{x(t)\}$ is Gaussian; it may result from the fact that $\{x(t)\}$ is non-Gaussian with zero bispectrum. Therefore, a next logical step would be to test for vanishing trispectrum of the record. This has been done in [14] using the approach of [4]; extensions of [3] are too complicated. Computationally simpler tests using the “integrated polyspectrum” of the data have been proposed in [6]. The integrated polyspectrum (bispectrum or trispectrum) is computed as a cross-power spectrum, and it is zero for Gaussian processes. Alternatively, one may test if $C_{xxx}(i,k) \equiv 0$ and $C_{xxxx}(i,k,l) \equiv 0$. This has been done in [8].
Other tests that do not rely on higher-order cumulant spectra of the record may be found in [13].
16.2.2 Linearity Tests
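Denote the estimate of the power spectral density $S_{xx}(\omega_m)$ of $\{x(t)\}$ at frequency $\omega_m = \frac{2\pi}{N_B}m$ as $\hat{S}_{xx}(m)$, given by
$$\hat{S}_{xx}(m) = \frac{1}{K}\sum_{i=1}^{K}\frac{1}{N_B}\, X^{(i)}(\omega_m)\left[X^{(i)}(\omega_m)\right]^*. \qquad (16.23)$$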
Consider
$$\hat{\gamma}_x(m,n) = \frac{|\hat{B}_{xxx}(m,n)|^2}{\hat{S}_{xx}(m)\,\hat{S}_{xx}(n)\,\hat{S}_{xx}(m+n)}. \qquad (16.24)$$
It turns out that $\hat{\gamma}_x(m,n)$ is a consistent estimator of the left side of (16.13), and it is asymptotically distributed as a Gaussian random variable, independent at distinct bifrequencies in the interior of $D$. These properties have been used by Subba Rao and Gabr [3] to design a test of linearity. Construct a coarse grid and a fine grid of bifrequencies in $D$ as before. Order the $L$ estimates $\hat{\gamma}_x(m_{mi}, n_{nk})$ on the fine grid around the bifrequency pair $(m,n)$ into an $L$-vector, which after relabeling may be denoted as $\beta_{ml}$, $l = 1, 2, \ldots, L$, $m = 1, 2, \ldots, P$, where $m$ indexes the coarse grid and $l$ indexes the fine grid. Define $P$-vectors
$$\Psi_i = (\beta_{1i}, \beta_{2i}, \ldots, \beta_{Pi})^T, \quad (i = 1, 2, \ldots, L). \qquad (16.25)$$
Consider the estimates
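$$\mathbf{M} = \frac{1}{L}\sum_{i=1}^{L}\Psi_i \quad\text{and}\quad \boldsymbol{\Sigma} = \frac{1}{L}\sum_{i=1}^{L}(\Psi_i - \mathbf{M})(\Psi_i - \mathbf{M})^T. \qquad (16.26)$$
Define a $(P-1)\times P$ matrix $\mathbf{B}$ whose $ij$th element $B_{ij}$ is given by $B_{ij} = 1$ if $i = j$; $B_{ij} = -1$ if $j = i+1$; $B_{ij} = 0$ otherwise. Define
$$F_L = \frac{L-P+1}{P-1}\,(\mathbf{B}\mathbf{M})^T\left(\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^T\right)^{-1}\mathbf{B}\mathbf{M}. \qquad (16.27)$$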
If $\{x(t)\}$ is linear, then $F_L$ is distributed as a central $F$ with $(P-1, L-P+1)$ degrees of freedom. A statistical test for testing linearity of $\{x(t)\}$ is to declare it to be a nonlinear sequence if $F_L > T_\alpha$, where $T_\alpha$ is selected to achieve a fixed probability of false alarm $\alpha$ $(= \Pr\{F_L > T_\alpha\}$ with $F_L$ distributed as a central $F$ with $(P-1, L-P+1)$ degrees of freedom$)$. If $F_L \le T_\alpha$, then either $\{x(t)\}$ is linear or it has zero bispectrum.
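Given the $L$ real $P$-vectors of (16.25), the statistic (16.27) can be sketched as follows. The matrix $\mathbf{B}$ is never formed explicitly: $(\mathbf{B}\mathbf{u})_i = u_i - u_{i+1}$, so $\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^T$ follows from second differences of $\boldsymbol{\Sigma}$. Under linearity all coarse-grid means are equal, so $\mathbf{B}\mathbf{M}$ should be near zero. The elimination solver is a dependency-free stand-in for a library routine, and the function names are assumptions of this sketch.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in reversed(range(n)):
        s = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def linearity_statistic(vectors):
    """F_L of (16.27) from the L real P-vectors Psi_1, ..., Psi_L of (16.25)."""
    L, P = len(vectors), len(vectors[0])
    mean = [sum(v[p] for v in vectors) / L for p in range(P)]
    cov = [[sum((v[p] - mean[p]) * (v[q] - mean[q]) for v in vectors) / L
            for q in range(P)] for p in range(P)]
    bm = [mean[i] - mean[i + 1] for i in range(P - 1)]             # B M
    bsb = [[cov[i][j] - cov[i][j + 1] - cov[i + 1][j] + cov[i + 1][j + 1]
            for j in range(P - 1)] for i in range(P - 1)]          # B Sigma B^T
    z = solve(bsb, bm)
    quad = sum(bm[i] * z[i] for i in range(P - 1))
    return (L - P + 1) / (P - 1) * quad
```

With equal component means the statistic is zero; unequal means inflate it, which is how the test flags a nonconstant normalized bispectrum.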
The above test is patterned after [3]. Hinich [4] has “simplified” the test of [3]. Notice that $F_L \le T_\alpha$ does not necessarily imply that $\{x(t)\}$ is linear; it may result from the fact that $\{x(t)\}$ is non-Gaussian with zero bispectrum. Therefore, a next logical step would be to test if (16.14) holds true. This has been done in [14] using the approach of [4]; extensions of [3] are too complicated. The approaches of [3] and [4] will fail if the data are noisy. A modification of [3] for the case when additive Gaussian noise is present is given in [7]. Finally, other tests that do not rely on higher-order cumulant spectra of the record may be found in [13].
16.2.3 Stationarity Tests
Various methods exist for testing whether a given measurement record may be regarded as a sample sequence of a stationary random sequence. A crude yet effective way to test for stationarity is to divide the record into several (at least two) nonoverlapping segments and then test for equivalency (or compatibility) of certain statistical properties (mean, mean-square value, power spectrum, etc.) computed from these segments. More sophisticated tests that do not require a priori segmentation of the record are also available.
Consider a record of length $N$ divided into two nonoverlapping segments, each of length $N/2$. Let $KN_B = N/2$ and use estimators such as (16.23) to obtain the estimator $\hat{S}^{(l)}_{xx}(m)$ of the power spectrum $S^{(l)}_{xx}(\omega_m)$ of the $l$th segment $(l = 1, 2)$, where $\omega_m$ is given by (16.16). Consider the test statistic
$$Y = \sqrt{\frac{K}{N_B - 2}}\; \sum_{m=1}^{N_B/2 - 1}\left[\ln \hat{S}^{(1)}_{xx}(m) - \ln \hat{S}^{(2)}_{xx}(m)\right]. \qquad (16.28)$$
Then, asymptotically, $Y$ is distributed as a zero-mean, unit-variance Gaussian random variable if $\{x(t)\}$ is stationary. Therefore, if $|Y| > T_\alpha$, then $\{x(t)\}$ is declared to be nonstationary, where the threshold $T_\alpha$ is chosen to achieve a false-alarm probability of $\alpha$ $(= \Pr\{|Y| > T_\alpha\}$ with $Y$ distributed as zero-mean, unit-variance Gaussian$)$. If $|Y| \le T_\alpha$, then $\{x(t)\}$ is declared to be stationary. Notice that similar tests based upon higher-order cumulant spectra can also be devised.
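The stationarity statistic (16.28) can be sketched directly: each half of the record is split into $K$ blocks of length $N_B$, the averaged periodogram (16.23) is computed per half, and the log-spectral differences are summed over $m = 1, \ldots, N_B/2 - 1$. The explicit DFT sum and the Gaussian test records are illustrative assumptions; `NormalDist` from the Python standard library supplies the two-sided threshold.

```python
import cmath
import math
import random
from statistics import NormalDist


def averaged_periodogram(x, n_blocks, block_len):
    """S^hat_xx(m), m = 0..N_B-1: periodogram averaged over K nonoverlapping blocks (16.23)."""
    est = [0.0] * block_len
    for i in range(n_blocks):
        block = x[i * block_len:(i + 1) * block_len]
        for m in range(block_len):
            X = sum(block[l] * cmath.exp(-2j * math.pi * m * l / block_len)
                    for l in range(block_len))
            est[m] += abs(X) ** 2 / block_len
    return [e / n_blocks for e in est]


def stationarity_statistic(x, n_blocks, block_len):
    """Y of (16.28), comparing the log-spectra of the two halves (K * N_B samples each)."""
    half = n_blocks * block_len
    if len(x) < 2 * half:
        raise ValueError("record shorter than 2 * K * N_B")
    s1 = averaged_periodogram(x[:half], n_blocks, block_len)
    s2 = averaged_periodogram(x[half:2 * half], n_blocks, block_len)
    total = sum(math.log(s1[m]) - math.log(s2[m]) for m in range(1, block_len // 2))
    return math.sqrt(n_blocks / (block_len - 2)) * total


rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(512)]
nonstationary = stationary[:256] + [3.0 * v for v in stationary[256:]]
threshold = NormalDist().inv_cdf(1 - 0.05 / 2)      # T_alpha for alpha = 0.05
for name, record in [("stationary", stationary), ("nonstationary", nonstationary)]:
    y = stationarity_statistic(record, n_blocks=8, block_len=32)
    print(name, round(y, 2), "reject" if abs(y) > threshold else "accept")
```

The DC bin $m = 0$ is excluded by the summation range, so a constant mean does not affect $Y$; scaling the second half by 3 shifts every log-spectral difference by $-\ln 9$, which drives $|Y|$ far past $T_\alpha \approx 1.96$.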
The above test is patterned after [10]. More sophisticated tests involving two model comparisons as above, but without prior segmentation of the record, are available in [11] and references therein. A test utilizing the evolutionary power spectrum may be found in [9].
16.3 Order Selection, Model Validation, and Confidence Intervals
As noted earlier, one typically fits a model $H(q;\theta^{(M)})$ to the given data by estimating the $M$ unknown parameters through optimization of some cost function. A fundamental difficulty here is the choice of $M$. There are two basic philosophical approaches to this problem: one consists of an iterative process of model fitting and diagnostic checking (model validation), and the other utilizes a more “objective” approach of optimizing a cost with respect to $M$ (in addition to $\theta^{(M)}$).
16.3.1 Order Selection
Let $f_{\theta^{(M)}}(X)$ denote the probability density function of $X = [x(1), x(2), \ldots, x(N)]^T$ parameterized by the parameter vector $\theta^{(M)}$ of dimension $M$. A popular approach to model order selection in the context of linear Gaussian models is to compute the Akaike information criterion (AIC)
$$\mathrm{AIC}(M) = -2\ln f_{\hat{\theta}^{(M)}}(X) + 2M, \qquad (16.29)$$
where $\hat{\theta}^{(M)}$ maximizes $f_{\theta^{(M)}}(X)$ given the measurement record $X$. Let $\bar{M}$ denote an upper bound on the true model order. Then the minimum AIC estimate (MAICE), the selected model order, is given by the minimizer of $\mathrm{AIC}(M)$ over $M = 1, 2, \ldots, \bar{M}$. Clearly one needs to solve the problem of maximization of $\ln f_{\theta^{(M)}}(X)$ with respect to $\theta^{(M)}$ for each value of $M = 1, 2, \ldots, \bar{M}$. The second term on the right side of (16.29) penalizes overparametrization.
Rissanen’s minimum description length (MDL) criterion is given by
$$\mathrm{MDL}(M) = -2\ln f_{\hat{\theta}^{(M)}}(X) + M\ln N. \qquad (16.30)$$
It is known that if $\{x(t)\}$ is a Gaussian AR process, then AIC is an inconsistent estimator of the model order whereas MDL is consistent; i.e., MDL picks the correct model order with probability one as the data length tends to infinity, whereas there is a nonzero probability that AIC will not. Several other variations of these criteria exist [15].
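Under the extra assumption that the Gaussian log-likelihood of an AR($M$) fit is summarized, up to constants, by $N\ln\hat\sigma^2_M$ (with $\hat\sigma^2_M$ the prediction-error variance), (16.29) and (16.30) can be evaluated for all orders at once with the Levinson-Durbin recursion on sample autocorrelations (Yule-Walker fits). This is a common approximation, not the exact maximum-likelihood fit described above; the AR(2) coefficients in the usage example are illustrative.

```python
import math
import random


def sample_autocorrelation(x, max_lag):
    """Biased estimates r(tau) = N^{-1} sum_t x(t + tau) x(t), tau = 0..max_lag."""
    n = len(x)
    return [sum(x[t + tau] * x[t] for t in range(n - tau)) / n for tau in range(max_lag + 1)]


def levinson_error_variances(r, max_order):
    """Prediction-error variances of Yule-Walker AR(M) fits, M = 0..max_order.

    Coefficient convention matches (16.4): x(t) + sum_i a_i x(t - i) = eps(t).
    """
    a, err, variances = [], r[0], [r[0]]
    for order in range(1, max_order + 1):
        acc = r[order] + sum(a[i] * r[order - 1 - i] for i in range(order - 1))
        k = -acc / err                                   # reflection coefficient
        a = [a[i] + k * a[order - 2 - i] for i in range(order - 1)] + [k]
        err *= 1.0 - k * k
        variances.append(err)
    return variances


def select_order(x, max_order):
    """(AIC order, MDL order) over M = 1..max_order.

    Assumption: -2 ln f(X) is approximated by N ln sigma2_M (Gaussian AR, constants dropped).
    """
    n = len(x)
    var = levinson_error_variances(sample_autocorrelation(x, max_order), max_order)
    orders = range(1, max_order + 1)
    aic = {m: n * math.log(var[m]) + 2 * m for m in orders}
    mdl = {m: n * math.log(var[m]) + m * math.log(n) for m in orders}
    return min(aic, key=aic.get), min(mdl, key=mdl.get)


# AR(2) example: a_1 = -1.5, a_2 = 0.75 in the convention of (16.4)
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2500):
    x.append(1.5 * x[-1] - 0.75 * x[-2] + rng.gauss(0.0, 1.0))
x = x[500:]                      # discard the start-up transient
print(select_order(x, 8))
```

Because $\ln N > 2$ here, MDL penalizes each parameter more heavily than AIC, so the MDL choice can never exceed the AIC choice; for this strongly correlated record both should typically return 2, although AIC may occasionally overfit, which reflects its inconsistency.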
Although the derivation of these order selection criteria is based upon the Gaussian distribution, they have frequently been used with success for non-Gaussian processes, provided attention is confined to the use of second-order statistics of the data. They may fail if one fits models using higher-order statistics.
16.3.2 Model Validation
Model validation involves testing to see if the fitted model is an appropriate representation of the underlying (true) system. It involves devising appropriate statistical tools to test the validity of the assumptions made in obtaining the fitted model. It is also known as model falsification, model verification, or diagnostic checking. It can also be used as a tool for model order selection. It is an essential part of any model fitting methodology.
Suppose that $\{x(t)\}$ obeys (16.1), and that the fitted model corresponding to the estimated parameter $\hat{\theta}^{(M)}$ is $H(q;\hat{\theta}^{(M)})$. Assuming that the true model $H(q)$ is invertible, in the ideal case one should get $\epsilon(t) = H^{-1}(q)x(t)$, where $\{\epsilon(t)\}$ is zero-mean, i.i.d. (or at least white when using second-order statistics). Hence, if the fitted model $H(q;\hat{\theta}^{(M)})$ is a valid description of the underlying true system, one expects $\epsilon'(t) = H^{-1}(q;\hat{\theta}^{(M)})x(t)$ to be zero-mean, i.i.d. One of the diagnostic checks then is to test for whiteness or independence of the inverse-filtered data (or the residuals or linear innovations, in case second-order statistics are used). If the fitted model is unable to “adequately” capture the underlying true system, one expects $\{\epsilon'(t)\}$ to deviate from an i.i.d. distribution. This is one of the most widely used and useful diagnostic checks for model validation.
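A test for second-order whiteness of $\{\epsilon'(t)\}$ is as follows [15]. Construct the estimates of the covariance function as
$$\hat{r}(\tau) = N^{-1}\sum_{t=1}^{N-\tau}\epsilon'(t+\tau)\,\epsilon'(t) \quad (\tau \ge 0). \qquad (16.31)$$
Consider the test statistic
$$R = \frac{N}{\hat{r}^2(0)}\sum_{i=1}^{m}\hat{r}^2(i), \qquad (16.32)$$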
where $m$ is some a priori choice of the maximum lag for whiteness testing. If $\{\epsilon'(t)\}$ is zero-mean white, then $R$ is asymptotically distributed as $\chi^2(m)$ ($\chi^2$ with $m$ degrees of freedom). A statistical test for testing whiteness of $\{\epsilon'(t)\}$ is to declare it to be a nonwhite sequence (hence invalidate the model) if $R > T_\alpha$, where $T_\alpha$ is selected to achieve a fixed probability of false alarm $\alpha$ $(= \Pr\{R > T_\alpha\}$ with $R$ distributed as $\chi^2(m))$. If $R \le T_\alpha$, then $\{\epsilon'(t)\}$ is declared second-order white, and hence the model is validated.
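A sketch of the whiteness check (16.31) and (16.32), with a small helper for the $\chi^2(m)$ tail probability so that the rule $R > T_\alpha$ can be applied as a p-value comparison. The helper uses the standard series for the regularized incomplete gamma function and is an illustrative stand-in for a statistics library; the residuals are assumed zero-mean, and the white and MA(1) test residuals are made up for illustration.

```python
import math
import random


def whiteness_statistic(res, max_lag):
    """R of (16.32), built from the covariance estimates r^hat(tau) of (16.31)."""
    n = len(res)
    r = [sum(res[t + tau] * res[t] for t in range(n - tau)) / n for tau in range(max_lag + 1)]
    return n / r[0] ** 2 * sum(r[tau] ** 2 for tau in range(1, max_lag + 1))


def chi2_sf(x, dof):
    """Pr{chi2(dof) > x} via the regularized lower incomplete gamma series.

    Adequate for moderate x; very large x (roughly x > 1400) can overflow the series.
    """
    if x <= 0:
        return 1.0
    s, z = dof / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 1
    while term > 1e-15 * total:
        term *= z / (s + k)
        total += term
        k += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
colored = [white[t] + 0.9 * white[t - 1] for t in range(1, len(white))]   # MA(1) residuals
for name, res in [("white", white), ("colored", colored)]:
    r_stat = whiteness_statistic(res, max_lag=10)
    print(name, round(r_stat, 1), "p-value", chi2_sf(r_stat, 10))
```

Declaring nonwhiteness when the p-value falls below $\alpha$ is equivalent to $R > T_\alpha$; for $m = 10$ and $\alpha = 0.05$, $T_\alpha \approx 18.31$.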
The above procedure only tests for second-order whiteness. In order to test for higher-order whiteness, one needs to examine either the higher-order cumulant functions or the higher-order cumulant spectra (or the integrated polyspectra) of the inverse-filtered data. A statistical test using the bispectrum is available in [5]. It is particularly useful if the model fitting is carried out using higher-order statistics. If $\{\epsilon'(t)\}$ is third-order white, then its bispectrum is a constant for all bifrequencies.
Let $\hat{B}_{\epsilon'\epsilon'\epsilon'}(m,n)$ denote the estimate of the bispectrum $B_{\epsilon'\epsilon'\epsilon'}(\omega_m,\omega_n)$ mimicking (16.17). Construct a coarse grid and a fine grid of bifrequencies in $D$ as before. Order the $L$ estimates $\hat{B}_{\epsilon'\epsilon'\epsilon'}(m_{mi}, n_{nk})$ on the fine grid around the bifrequency pair $(m,n)$ into an $L$-vector, which after relabeling may be denoted as $\mu_{ml}$, $l = 1, 2, \ldots, L$, $m = 1, 2, \ldots, P$, where $m$ indexes the coarse grid and $l$ indexes the fine grid. Define $P$-vectors
$$\tilde{\Psi}_i = (\mu_{1i}, \mu_{2i}, \ldots, \mu_{Pi})^T, \quad (i = 1, 2, \ldots, L). \qquad (16.33)$$
Consider the estimates
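$$\tilde{\mathbf{M}} = \frac{1}{L}\sum_{i=1}^{L}\tilde{\Psi}_i \quad\text{and}\quad \tilde{\boldsymbol{\Sigma}} = \frac{1}{L}\sum_{i=1}^{L}(\tilde{\Psi}_i - \tilde{\mathbf{M}})(\tilde{\Psi}_i - \tilde{\mathbf{M}})^H. \qquad (16.34)$$
Define a $(P-1)\times P$ matrix $\mathbf{B}$ whose $ij$th element $B_{ij}$ is given by $B_{ij} = 1$ if $i = j$; $B_{ij} = -1$ if $j = i+1$; $B_{ij} = 0$ otherwise. Define
$$F_W = \frac{2(L-P+1)}{2P-2}\,(\mathbf{B}\tilde{\mathbf{M}})^H\left(\mathbf{B}\tilde{\boldsymbol{\Sigma}}\mathbf{B}^T\right)^{-1}\mathbf{B}\tilde{\mathbf{M}}. \qquad (16.35)$$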
If $\{\epsilon'(t)\}$ is third-order white, then $F_W$ is distributed as a central $F$ with $(2P-2, 2(L-P+1))$ degrees of freedom. A statistical test for testing third-order whiteness of $\{\epsilon'(t)\}$ is to declare it to be a nonwhite sequence if $F_W > T_\alpha$, where $T_\alpha$ is selected to achieve a fixed probability of false alarm $\alpha$ $(= \Pr\{F_W > T_\alpha\}$ with $F_W$ distributed as a central $F$ with $(2P-2, 2(L-P+1))$ degrees of freedom$)$. If $F_W \le T_\alpha$, then either $\{\epsilon'(t)\}$ is third-order white or it has zero bispectrum.
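Given the $L$ complex $P$-vectors of (16.33), (16.35) can be sketched much like a Hotelling-type test on differenced means: since $\mathbf{B}$ is real, $\mathbf{B}\tilde{\boldsymbol{\Sigma}}\mathbf{B}^T$ is obtained from second differences of $\tilde{\boldsymbol{\Sigma}}$, and a constant bispectrum makes $\mathbf{B}\tilde{\mathbf{M}}$ vanish. The elimination solver and the function names are assumptions of this dependency-free sketch.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (complex entries allowed)."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0j] * n
    for row in reversed(range(n)):
        s = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def third_order_whiteness_statistic(vectors):
    """F_W of (16.35) from the L complex P-vectors of (16.33)."""
    L, P = len(vectors), len(vectors[0])
    mean = [sum(v[p] for v in vectors) / L for p in range(P)]
    cov = [[sum((v[p] - mean[p]) * (v[q] - mean[q]).conjugate() for v in vectors) / L
            for q in range(P)] for p in range(P)]
    bm = [mean[i] - mean[i + 1] for i in range(P - 1)]             # B M~
    bsb = [[cov[i][j] - cov[i][j + 1] - cov[i + 1][j] + cov[i + 1][j + 1]
            for j in range(P - 1)] for i in range(P - 1)]          # B Sigma~ B^T
    z = solve(bsb, bm)
    quad = sum(bm[i].conjugate() * z[i] for i in range(P - 1)).real
    return 2 * (L - P + 1) / (2 * P - 2) * quad
```

A flat (constant) bispectral mean gives $F_W = 0$, while differing means across coarse points push $F_W$ above the $F(2P-2, 2(L-P+1))$ threshold.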
The above model validation test can be used for model order selection. Fix an upper bound on the model orders. For every admissible model order, fit a linear model and test its validity. From among the validated models, select the “smallest” order as the correct order. It is easy to see that this procedure will work only so long as the various candidate orders are nested. Further details may be found in [5] and [15].
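The order-selection-by-validation procedure can be sketched generically. Stopping at the first validated order gives the same answer as validating every candidate and then choosing the smallest, while avoiding unnecessary fits. The `fit` and `is_valid` callables are hypothetical placeholders for whatever estimator and validation test (for example, (16.32) or (16.35)) are in use.

```python
def smallest_validated_order(fit, is_valid, max_order):
    """Return (order, model) for the smallest order in 1..max_order whose fitted model is validated.

    fit(order) -> model and is_valid(model) -> bool are hypothetical caller-supplied hooks,
    e.g. an AR fit followed by a residual whiteness test. Returns (None, None) if no order passes.
    """
    for order in range(1, max_order + 1):      # candidate orders are nested: 1 < 2 < ... < max_order
        model = fit(order)
        if is_valid(model):
            return order, model                # first validated order is the smallest one
    return None, None


# Toy usage: pretend that only models of order >= 3 pass validation.
print(smallest_validated_order(lambda m: {"order": m}, lambda model: model["order"] >= 3, 6))
# (3, {'order': 3})
```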
16.3.3 Confidence Intervals
Having settled upon a model order estimate $M$, let $\hat{\theta}_N^{(M)}$ be the parameter estimator obtained by minimizing a cost function $V_N(\theta^{(M)})$, given a record of length $N$, such that $V_\infty(\theta) := \lim_{N\to\infty} V_N(\theta)$ exists. For instance, using the notation of the section on order selection, one may take $V_N(\theta^{(M)}) = -N^{-1}\ln f_{\theta^{(M)}}(X)$. How reliable are these estimates? An assessment of this is provided by confidence intervals.
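Under some general technical conditions, it usually follows that asymptotically (i.e., for large $N$), $\sqrt{N}\,(\hat{\theta}_N^{(M)} - \theta_0)$ is distributed as a Gaussian random vector with zero mean and covariance matrix $\mathbf{P}$, where $\theta_0$ denotes the true value of $\theta^{(M)}$. A general expression for $\mathbf{P}$ is given by [15]
$$\mathbf{P} = \left[V_\infty''(\theta_0)\right]^{-1}\mathbf{P}_\infty\left[V_\infty''(\theta_0)\right]^{-1}, \qquad (16.36)$$
where
$$\mathbf{P}_\infty = \lim_{N\to\infty} E\left\{N\, V_N'^{\,T}(\theta_0)\, V_N'(\theta_0)\right\}, \qquad (16.37)$$
and $V'$ (a row vector) and $V''$ (a square matrix) denote the gradient and the Hessian, respectively, of $V$.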
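As a worked scalar instance of (16.36) and (16.37), consider an AR(1) model $x(t) + a_0 x(t-1) = \epsilon(t)$ with $|a_0| < 1$, and assume a least-squares cost $V_N$ (one admissible choice of cost; the derivation is a sketch, not taken from the chapter). The cross terms in $\mathbf{P}_\infty$ vanish because $\epsilon(t)$ is independent of $x(t-1)$ and of all earlier products $\epsilon(s)x(s-1)$, $s < t$.

```latex
% AR(1): x(t) + a_0 x(t-1) = \epsilon(t), |a_0| < 1; scalar theta = a; least-squares cost (assumption)
V_N(a) = \frac{1}{2N}\sum_{t=2}^{N}\left[x(t) + a\,x(t-1)\right]^2, \qquad
V_N'(a) = \frac{1}{N}\sum_{t=2}^{N}\left[x(t) + a\,x(t-1)\right]x(t-1), \qquad
V_N''(a) = \frac{1}{N}\sum_{t=2}^{N}x^2(t-1),

V_\infty''(a_0) = r_x(0) = \frac{\sigma_\epsilon^2}{1 - a_0^2}, \qquad
\mathbf{P}_\infty = \lim_{N\to\infty}\frac{1}{N}\,E\Big\{\Big[\sum_{t=2}^{N}\epsilon(t)\,x(t-1)\Big]^2\Big\}
                  = \sigma_\epsilon^2\, r_x(0),

\mathbf{P} = \frac{\sigma_\epsilon^2\, r_x(0)}{r_x^2(0)} = \frac{\sigma_\epsilon^2}{r_x(0)} = 1 - a_0^2 .
```

For large $N$, the standard error of $\hat{a}_N$ is therefore roughly $\sqrt{(1 - a_0^2)/N}$, and with $M = 1$ the quantity $N(\hat{a}_N - a_0)^2/(1 - a_0^2)$ plays the role of $\eta_N$ below.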
The above result can be used to evaluate the reliability of the parameter estimator. It follows from the above results that
$$\eta_N = N\left(\hat{\theta}_N^{(M)} - \theta_0\right)^T \mathbf{P}^{-1}\left(\hat{\theta}_N^{(M)} - \theta_0\right) \qquad (16.38)$$
is asymptotically $\chi^2(M)$. Define $\chi^2_\alpha(M)$ via $\Pr\{y > \chi^2_\alpha(M)\} = \alpha$, where $y$ is distributed as $\chi^2(M)$. For instance, $\chi^2_{0.05}(4) = 9.49$, so that for $M = 4$, $\Pr\{\eta_N > 9.49\} = 0.05$. The ellipsoid $\eta_N \le \chi^2_\alpha(M)$ then defines