Stoica, P.; Viberg, M.; Wong, M. & Wu, Q. "A Unified Instrumental Variable Approach to Direction Finding in Colored Noise Fields." In Digital Signal Processing Handbook, ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
A Unified Instrumental Variable Approach to Direction Finding in Colored Noise Fields
P. Stoica, Uppsala University
M. Viberg, Chalmers University of Technology
M. Wong, McMaster University
Q. Wu, CELWAVE
64.1 Introduction
64.2 Problem Formulation
64.3 The IV-SSF Approach
64.4 The Optimal IV-SSF Method
64.5 Algorithm Summary
64.6 Numerical Examples
64.7 Concluding Remarks
References
Appendix A: Introduction to IV Methods
The main goal herein is to describe and analyze, in a unifying manner, the spatial and temporal IV-SSF approaches recently proposed for array signal processing in colored noise fields. (The acronym IV-SSF stands for "Instrumental Variable - Signal Subspace Fitting".) Despite the generality of the approach taken herein, our analysis technique is simpler than those used in previous, more specialized publications. We derive a general, optimally weighted (optimal, for short) IV-SSF direction estimator and show that this estimator encompasses the UNCLE estimator of Wong and Wu, which is a spatial IV-SSF method, and the temporal IV-SSF estimator of Viberg, Stoica and Ottersten. The latter two estimators have seemingly different forms (among others, the first of them makes use of four weights, whereas the second one uses three weights "only"), and hence their asymptotic equivalence shown in this paper comes as a surprising unifying result. We hope that the present paper, along with the original works aforementioned, will stimulate interest in the IV-SSF approach to array signal processing, which is sufficiently flexible to handle colored noise fields, coherent signals, and indeed also situations where only some of the sensors in the array are calibrated.
1 This work was supported in part by the Swedish Research Council for Engineering Sciences (TFR).
64.1 Introduction
Most parametric methods for Direction-Of-Arrival (DOA) estimation require knowledge of the spatial (sensor-to-sensor) color of the background noise. If this information is unavailable, a serious degradation of the quality of the estimates can result, particularly at low Signal-to-Noise Ratio (SNR) [1,2,3]. A number of methods have been proposed over the recent years to alleviate the sensitivity to the noise color. If a parametric model of the covariance matrix of the noise is available, the parameters of the noise model can be estimated along with those of the interesting signals [4,5,6,7]. Such an approach is expected to perform well in situations where the noise can be accurately modeled with relatively few parameters. An alternative approach, which does not require a precise model of the noise, is based on the principle of Instrumental Variables (IV). See [8,9] for thorough treatments of IV methods (IVM) in the context of identification of linear time-invariant dynamical systems. A brief introduction is given in the appendix of this chapter. Computationally simple IVMs for array signal processing appeared in [10,11]. These methods perform poorly in difficult scenarios involving closely spaced DOAs and correlated signals.
More recently, the combined Instrumental Variable Signal Subspace Fitting (IV-SSF) technique has been proposed as a promising alternative for array signal processing in spatially colored noise fields [12,13,14,15]. The IV-SSF approach has a number of appealing advantages over other DOA estimation methods. These advantages include:
• IV-SSF can handle noises with arbitrary spatial correlation, under minor restrictions on the signals or the array. In addition, estimation of a noise model is avoided, which leads to statistical robustness and computational simplicity.
• The IV-SSF approach is applicable to both non-coherent and coherent signal scenarios.
• The spatial IV-SSF technique can make use of the information contained in the output of a completely uncalibrated subarray under certain weak conditions, which other methods cannot.
Depending on the type of "instrumental variables" used, two classes of IV methods have appeared in the literature:
1. Spatial IVM, for which the instrumental variables are derived from the output of a (possibly uncalibrated) subarray, the noise of which is uncorrelated with the noise in the main calibrated subarray under consideration (see [12,13]).
2. Temporal IVM, which obtains instrumental variables from the delayed versions of the array output, under the assumption that the temporal-correlation length of the noise field is shorter than that of the signals (see [11,14]).
The previous literature on IV-SSF has treated and analyzed the above two classes of spatial and temporal methods separately, ignoring their common basis. In this contribution, we reveal the common roots of these two classes of DOA estimation methods and study them under the same umbrella. Additionally, we establish the statistical properties of a general (either spatial or temporal) weighted IV-SSF method and present the optimal weights that minimize the variance of the DOA estimation errors. In particular, we point out that the optimal four-weight spatial IV-SSF of [12,13] (called UNCLE there, and arrived at by using canonical correlation decomposition ideas) and the optimal three-weight temporal IV-SSF of [14] are asymptotically equivalent when used under the same conditions. This asymptotic equivalence property, which is a main result of the present section, is believed to be important as it shows the close ties that exist between two seemingly different DOA estimators.
This section is organized as follows. In Section 64.2 the data model and technical assumptions are introduced. Next, in Section 64.3 the IV-SSF method is presented in a fairly general setting. In Section 64.4, the statistical performance of the method is presented along with the optimal choices of certain user-specified quantities. The data requirements and the optimal IV-SSF (UNCLE) algorithm are summarized in Section 64.5. The anxious reader may wish to jump directly to this point to investigate the usefulness of the algorithm in a specific application. In Section 64.6, some numerical examples and computer simulations are presented to illustrate the performance. The conclusions are given in Section 64.7. In the appendix we give a brief introduction to IV methods. The reader who is not familiar with IV might be helped by reading the appendix before the rest of the paper. Background material on the subspace-based approach to DOA estimation can be found in Chapter 62 of this Handbook.
64.2 Problem Formulation
Consider a scenario in which n narrowband plane waves, generated by point sources, impinge on an array comprising m calibrated sensors. Assume, for simplicity, that the n sources and the array are situated in the same plane. Let a(θ) denote the complex array response to a unit-amplitude signal with DOA parameter equal to θ. Under these assumptions, the output of the array, y(t) ∈ C^{m×1}, can be described by the following well-known equation [16,17]:

y(t) = A x(t) + e(t) (64.1)

where x(t) ∈ C^{n×1} denotes the signal vector, e(t) ∈ C^{m×1} is a noise term, and

A = [a(θ_1) · · · a(θ_n)] ∈ C^{m×n}. (64.2)

Hereafter, θ_k denotes the kth DOA parameter.
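To make the model concrete, here is a minimal NumPy sketch of Eq. (64.1). The chapter keeps a(θ) general; the half-wavelength uniform linear array response, the DOA values, and the noise-shaping matrix below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 8, 2, 200                          # sensors, sources, snapshots
doas = np.deg2rad([-5.0, 10.0])              # hypothetical true DOAs

def a(theta, m):
    """Steering vector of a half-wavelength ULA (illustrative choice of a(theta))."""
    return np.exp(-1j * np.pi * np.arange(m) * np.sin(theta))

A = np.column_stack([a(th, m) for th in doas])                      # Eq. (64.2)
# zero-mean circular Gaussian signals (assumption A1)
x = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
# spatially colored, temporally white noise with unknown covariance Q (assumption A2)
Lsh = 0.4 * rng.standard_normal((m, m))
e = Lsh @ (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
Y = A @ x + e                                 # Eq. (64.1), all N snapshots at once
print(Y.shape)                                # prints (8, 200)
```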
The following assumptions on the quantities in the array equation, (64.1), are considered to hold throughout this section:
A1. The signal vector x(t) is a normally distributed random variable with zero mean and a possibly singular covariance. The signals may be temporally correlated; in fact, the temporal IV-SSF approach relies on the assumption that the signals exhibit some form of temporal correlation (see below for details).
A2. The noise e(t) is a random vector that is temporally white, uncorrelated with the signals, and circularly symmetric normally distributed with zero mean and unknown covariance matrix² Q > O,

E[e(t)e*(s)] = Q δ_{t,s} ; E[e(t)e^T(s)] = O. (64.3)

A3. The manifold vectors {a(θ)}, corresponding to any set of m different values of θ, are linearly independent.
Note that assumption A1 above allows for coherent signals, and that in A2 the noise field is allowed to be arbitrarily spatially correlated with an unknown covariance matrix. Assumption A3 is a well-known condition that, under a weak restriction on m, guarantees DOA parameter identifiability in the case Q is known (to within a multiplicative constant) [18]. When Q is completely unknown, DOA identifiability can only be achieved if further assumptions are made on the scenario under consideration. The following assumption is typical of the IV-SSF approach:

² Henceforth, the superscript "∗" denotes the conjugate transpose, whereas the transpose is designated by a superscript "T". The notation A ≥ B, for two Hermitian matrices A and B, is used to mean that (A − B) is a nonnegative definite matrix. Also, O denotes a zero matrix of suitable dimension.
A4. There exists a vector z(t) ∈ C^{m̄×1}, which is normally distributed and satisfies

E[z(t)e*(s)] = O for t ≤ s (64.4)
E[z(t)e^T(s)] = O for all t, s. (64.5)

Furthermore, denote

Γ = E[z(t)x*(t)] (64.6)
n̄ = rank(Γ) ≤ m̄. (64.7)

It is assumed that no row of Γ is identically zero and that the inequality

n̄ > 2n − m (64.8)

holds (note that a rank-one Γ matrix can satisfy the condition (64.8) if m is large enough, and hence the condition in question is rather weak). Owing to its (partial) uncorrelatedness with {e(t)}, the vector {z(t)} can be used to eliminate the noise from the array output equation (64.1), and for this reason {z(t)} is called an IV vector. Below, we briefly describe three possible ways to derive an IV vector from the available data measured with an array of sensors (for more details on this aspect, the reader should consult [12,13,14]).
EXAMPLE 64.1: Spatial IV
Assume that the n signals, which impinge on the main (sub)array under consideration, are also received by another (sub)array that is sufficiently distanced from the main one so that the noise vectors in the two subarrays are uncorrelated with one another. Then z(t) can be made from the outputs of the sensors in the second subarray (note that those sensors need not be calibrated) [12,13,15].
EXAMPLE 64.2: Temporal IV
When a second subarray, as described above, is not available but the signals are temporally correlated, one can obtain an IV vector by delaying the output vector: z(t) = [y^T(t−1) y^T(t−2) · · ·]^T. Clearly, such a vector z(t) satisfies (64.4) and (64.5), and it also satisfies (64.8) under weak conditions on the signal temporal correlation. This construction of an IV vector can be readily extended to cases where e(t) is temporally correlated, provided that the signal temporal correlation length is longer than that corresponding to the noise [11,14].
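The delayed-output construction above can be sketched as follows; the delay set, function name, and toy data are our own illustrative choices.

```python
import numpy as np

def temporal_iv(y, delays=(1, 2)):
    """Stack delayed array outputs into an IV vector:
    z(t) = [y^T(t-1)  y^T(t-2) ...]^T, so that E[z(t)e*(s)] = O for t <= s
    when the noise is temporally white (assumption A2)."""
    m, N = y.shape
    d = max(delays)
    # valid snapshots: t = d, ..., N-1 (0-based); pair z(t) with y(t)
    z = np.vstack([y[:, d - k : N - k] for k in delays])
    return z, y[:, d:]

y = np.arange(12.0).reshape(2, 6)            # toy 2-sensor, 6-snapshot output
z, y_trim = temporal_iv(y, delays=(1, 2))
print(z.shape, y_trim.shape)                  # prints (4, 4) (2, 4)
```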
In a sense, the above examples are both special cases of the following more general situation:
EXAMPLE 64.3: Reference Signal
In many systems a reference or pilot signal z(t) (scalar or vector) is available [19,20]. If the reference signal is sufficiently correlated with all signals of interest (in the sense of (64.8)) and uncorrelated with the noise, it can be used as an IV. Note that all signals that are not correlated with the reference will be treated as noise. Reference signals are commonly available in communication applications, for example a PN-code in spread spectrum communication [20] or a training signal used for synchronization and/or equalizer training [21]. A closely related possibility is utilization of cyclostationarity (or self-coherence), a property that is exhibited by many man-made signals. The reference signal(s) can then consist, for example, of sinusoids of different frequencies [22,23]. In these techniques, the data is usually pre-processed by computing the auto-covariance function (or a higher-order statistic) before correlating with the reference signal.
The problem considered in this section concerns the estimation of the DOA vector

θ = [θ_1 · · · θ_n]^T (64.9)

given N snapshots of the array output and of the IV vector, {y(t), z(t)}_{t=1}^N. The number of signals, n, and the rank of the covariance matrix Γ, n̄, are assumed to be given (for the estimation of these integer-valued parameters by means of IV/SSF-based methods, we refer to [24,25]).
64.3 The IV-SSF Approach
Let

R̂ = Ŵ_L [ (1/N) Σ_{t=1}^N z(t)y*(t) ] Ŵ_R    (m̄ × m) (64.10)

where Ŵ_L and Ŵ_R are two nonsingular Hermitian weighting matrices which are possibly data-dependent (as indicated by the fact that they are roofed). Under the assumptions made, as N → ∞, R̂ converges to the matrix

R = W_L Γ A* W_R (64.11)

where W_L and W_R are the limiting weighting matrices (assumed to be bounded and nonsingular).
Owing to assumptions A2 and A3,

rank(R) = n̄. (64.12)

Hence, the Singular Value Decomposition (SVD) [26] of R can be written as

R = [U ?] [ Λ O ; O O ] [S ?]* = U Λ S* (64.13)

where U*U = S*S = I, Λ ∈ R^{n̄×n̄} is diagonal and nonsingular, and where the question marks stand for blocks that are of no importance for the present discussion.
The following key equality is obtained by comparing the two expressions for R in Eqs. (64.11) and (64.13) above:

S = W_R A C (64.14)

where C = Γ* W_L U Λ^{-1} ∈ C^{n×n̄} has full column rank. For a given S, the true DOA vector can be obtained as the unique solution to Eq. (64.14) under the parameter identifiability condition (64.8) (see, e.g., [18]). In the more realistic case when S is unknown, one can make use of Eq. (64.14) to estimate the DOA vector in the following steps.
The IV step — Compute the pre- and post-weighted sample covariance matrix R̂ in Eq. (64.10), along with its SVD:

R̂ = [Û ?] [ Λ̂ O ; O ? ] [Ŝ ?]* (64.15)

where Λ̂ contains the n̄ largest singular values. Note that Û, Λ̂, and Ŝ are consistent estimates of U, Λ, and S in the SVD of R.
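The IV step can be sketched as follows; the function and variable names are ours, not the chapter's, and the weights here are placeholders (the optimal choices come in Section 64.4).

```python
import numpy as np

def iv_step(z, y, W_L, W_R, n_bar):
    """IV step: form the weighted sample cross-covariance of Eq. (64.10)
    and extract the rank-n_bar pieces of its SVD, Eq. (64.15)."""
    N = y.shape[1]
    R_hat = W_L @ (z @ y.conj().T / N) @ W_R      # (m_bar x m)
    U, s, Vh = np.linalg.svd(R_hat)
    U_hat = U[:, :n_bar]                          # principal left singular vectors
    Lam_hat = np.diag(s[:n_bar])                  # n_bar largest singular values
    S_hat = Vh[:n_bar, :].conj().T                # principal right singular vectors
    return R_hat, U_hat, Lam_hat, S_hat

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 50)) + 1j * rng.standard_normal((3, 50))
y = rng.standard_normal((2, 50)) + 1j * rng.standard_normal((2, 50))
R_hat, U_hat, Lam_hat, S_hat = iv_step(z, y, np.eye(3), np.eye(2), n_bar=2)
# here rank(R_hat) <= 2, so the truncated SVD reproduces R_hat exactly
print(np.allclose(U_hat @ Lam_hat @ S_hat.conj().T, R_hat))   # prints True
```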
The SSF step — Compute the DOA estimate as the minimizing argument of the following signal subspace fitting criterion:

min_θ { min_C [vec(Ŝ − Ŵ_R A C)]* V̂ [vec(Ŝ − Ŵ_R A C)] } (64.16)

where V̂ is a positive definite weighting matrix, and "vec" is the vectorization operator³. Alternatively, one can estimate the DOA instead by minimizing the following criterion:

min_θ { [vec(B* Ŵ_R^{-1} Ŝ)]* Ŵ [vec(B* Ŵ_R^{-1} Ŝ)] } (64.17)

where Ŵ is a positive definite weight, and B ∈ C^{m×(m−n)} is a matrix whose columns form a basis of the null-space of A* (hence, B*A = O and rank(B) = m − n). The alternative fitting criterion above is obtained from the simple observation that Eq. (64.14) along with the definition of B imply that

B* W_R^{-1} S = O. (64.18)

It can be shown [27] that the classes of DOA estimates derived from Eqs. (64.16) and (64.17), respectively, are asymptotically equivalent. More exactly, for any V̂ in Eq. (64.16) one can choose Ŵ in Eq. (64.17) so that the DOA estimates obtained by minimizing Eq. (64.16) and, respectively, Eq. (64.17) have the same asymptotic distribution, and vice-versa.
In view of the previous result, in an asymptotic analysis it suffices to consider only one of the two criteria above. In the following, we focus on Eq. (64.17). Compared with Eq. (64.16), the criterion (64.17) has the advantage that it depends on the DOA only. On the other hand, for a general array there is no known closed-form parameterization of B in terms of θ. However, as shown in the following, this is no drawback because the optimally weighted criterion (which is the one to be used in applications) is an explicit function of θ.
64.4 The Optimal IV-SSF Method
In what follows, we deal with the essential problem of choosing the weights Ŵ, Ŵ_R, and Ŵ_L in the IV-SSF criterion (64.17) so as to maximize the DOA estimation accuracy. First, we optimize the accuracy with respect to Ŵ, and then with respect to Ŵ_R and Ŵ_L.
Optimal Selection of Ŵ

Define

g(θ) = vec(B* Ŵ_R^{-1} Ŝ) (64.19)

and observe that the criterion function in Eq. (64.17) can be written as

g*(θ) Ŵ g(θ). (64.20)

In [27] it is shown that g(θ) (evaluated at the true DOA vector) has, asymptotically in N, a circularly symmetric normal distribution with zero mean and the following covariance:

G(θ) = (1/N) [(W_L U Λ^{-1})* R_z (W_L U Λ^{-1})]^T ⊗ [B* R_y B] (64.21)

where ⊗ denotes the Kronecker matrix product [28], and where, for a stationary signal s(t), we use the notation

R_s = E[s(t)s*(t)]. (64.22)

Then, it follows from the ABC (Asymptotically Best Consistent) theory of parameter estimation⁴ that the minimum variance estimate, in the class of estimates under discussion, is given by the minimizing argument of the criterion in Eq. (64.20) with Ŵ = Ĝ^{-1}(θ), that is

f(θ) = g*(θ) Ĝ^{-1}(θ) g(θ) (64.23)

where

Ĝ(θ) = (1/N) [(Ŵ_L Û Λ̂^{-1})* R̂_z (Ŵ_L Û Λ̂^{-1})]^T ⊗ [B* R̂_y B] (64.24)

and where R̂_z and R̂_y are the usual sample estimates of R_z and R_y. Furthermore, it is easily shown that the minimum variance estimate, obtained by minimizing Eq. (64.23), is asymptotically normally distributed with mean equal to the true parameter vector and the following covariance matrix:

H = {2 Re[J* G^{-1}(θ) J]}^{-1} (64.25)

where

J = lim_{N→∞} ∂g(θ)/∂θ^T. (64.26)

³ If x_k is the kth column of a matrix X, then vec(X) = [x_1^T x_2^T · · ·]^T.
The following more explicit formula for H is derived in [27]:

H = {2N Re[(D* R_y^{-1/2} Π⊥_{R_y^{-1/2} A} R_y^{-1/2} D) ⊙ Ω^T]}^{-1} (64.27)

where ⊙ denotes the Hadamard-Schur matrix product (elementwise multiplication) and

Ω = C [(W_L U Λ^{-1})* R_z (W_L U Λ^{-1})]^{-1} C*. (64.28)

Furthermore, the notation Y^{-1/2} is used for a Hermitian (for notational convenience) square root of the inverse of a positive definite matrix Y; the matrix D is made from the direction vector derivatives,

D = [d_1 · · · d_n]; d_k = ∂a(θ_k)/∂θ_k;

and, for a full column-rank matrix X, Π⊥_X defines the orthogonal projection onto the nullspace of X* as

Π⊥_X = I − Π_X ; Π_X = X(X*X)^{-1}X*. (64.29)

To summarize, for fixed Ŵ_R and Ŵ_L, the statistically optimal selection of Ŵ leads to DOA estimates with an asymptotic normal distribution with mean equal to the true DOA vector and covariance matrix given by Eq. (64.27).

⁴ For details on the ABC theory, which is an extension of the classical BLUE (Best Linear Unbiased Estimation) / Markov theory of linear regression to a class of nonlinear regressions with asymptotically vanishing residuals, the reader is referred to [9,29].
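The projector of Eq. (64.29) is straightforward to compute; a small sketch (function names are ours):

```python
import numpy as np

def proj(X):
    """Orthogonal projector onto the column space of X: Pi_X = X (X*X)^{-1} X*."""
    return X @ np.linalg.solve(X.conj().T @ X, X.conj().T)

def proj_perp(X):
    """Projector onto the null space of X*: I - Pi_X, Eq. (64.29)."""
    return np.eye(X.shape[0]) - proj(X)

X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
P = proj_perp(X)
# P annihilates the columns of X and is idempotent
print(np.allclose(P @ X, 0.0), np.allclose(P @ P, P))   # prints True True
```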
Optimal Selection of Ŵ_R and Ŵ_L

The optimal weights Ŵ_R and Ŵ_L are, by definition, those that minimize the limiting covariance matrix H of the DOA estimation errors. In the expression (64.27) of H, only Ω depends on W_R and W_L (the dependence on W_R is implicit, via U). Since the matrix Γ has rank n̄, it can be factorized as follows:

Γ = Γ_1 Γ_2* (64.30)

where both Γ_1 ∈ C^{m̄×n̄} and Γ_2 ∈ C^{n×n̄} have full column rank. Insertion of Eq. (64.30) into the equality W_L Γ A* W_R = U Λ S* yields the following equation, after a simple manipulation,

U = W_L Γ_1 T (64.31)

where T = Γ_2* A* W_R S Λ^{-1} ∈ C^{n̄×n̄} is a nonsingular transformation matrix. By using Eq. (64.31) in Eq. (64.28), we obtain:

Ω = Γ_2 (Γ_1* W_L^2 Γ_1)(Γ_1* W_L^2 R_z W_L^2 Γ_1)^{-1}(Γ_1* W_L^2 Γ_1) Γ_2*. (64.32)

Observe that Ω does not actually depend on W_R. Hence, Ŵ_R can be arbitrarily selected, as any nonsingular Hermitian matrix, without affecting the asymptotics of the DOA parameter estimates!
Concerning the choice of Ŵ_L, it is easily verified that

Ω ≤ Ω|_{W_L = R_z^{-1/2}} = Γ_2 (Γ_1* R_z^{-1} Γ_1) Γ_2* = Γ* R_z^{-1} Γ. (64.33)

Indeed,

Γ* R_z^{-1} Γ − Ω = Γ_2 [Γ_1* R_z^{-1} Γ_1 − (Γ_1* W_L^2 Γ_1)(Γ_1* W_L^2 R_z W_L^2 Γ_1)^{-1}(Γ_1* W_L^2 Γ_1)] Γ_2* = Γ* R_z^{-1/2} Π⊥_{R_z^{1/2} W_L^2 Γ_1} R_z^{-1/2} Γ (64.34)

which is obviously a nonnegative definite matrix. Hence, W_L = R_z^{-1/2} maximizes Ω. Then, it follows from the expression of the matrix H and the properties of the Hadamard-Schur product that this same choice of W_L minimizes H. The conclusion is that the optimal weight Ŵ_L, which yields the best limiting accuracy, is

Ŵ_L = R̂_z^{-1/2}. (64.35)

The (minimum) covariance matrix H_o, corresponding to the above choice, is given by

H_o = {2N Re[(D* R_y^{-1/2} Π⊥_{R_y^{-1/2} A} R_y^{-1/2} D) ⊙ (Γ* R_z^{-1} Γ)^T]}^{-1}. (64.36)
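A Hermitian square root of the inverse, such as R̂_z^{-1/2} in Eq. (64.35), can be computed from an eigendecomposition; a minimal sketch (the 2×2 test matrix is arbitrary):

```python
import numpy as np

def inv_sqrtm(Y):
    """Hermitian square root of the inverse of a positive definite Y,
    i.e., a matrix W with W = W* and W Y W = I."""
    w, V = np.linalg.eigh(Y)                  # Y = V diag(w) V*, w > 0
    return (V * (1.0 / np.sqrt(w))) @ V.conj().T

Rz = np.array([[2.0, 0.5], [0.5, 1.0]])
W_L = inv_sqrtm(Rz)                           # candidate optimal weight, Eq. (64.35)
print(np.allclose(W_L @ Rz @ W_L, np.eye(2)))   # prints True
```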
Remark: It is worth noting that H_o monotonically decreases as m̄ (the dimension of z(t)) increases. The proof of this claim is similar to the proof of the corresponding result in [9], Complement C8.5. Hence, as could be intuitively expected, one should use all available instruments (spatial and/or temporal) to obtain maximal theoretical accuracy. However, practice has shown that too large a dimension of the IV vector may in fact decrease the empirically observed accuracy. This phenomenon can be explained by the fact that increasing m̄ means that a longer data set is necessary for the asymptotic results to be valid.
Optimal IV-SSF Criteria

Fortunately, the criterion (64.23)-(64.24) can be expressed in a functional form that depends on the indeterminate θ in an explicit way (recall that, for most cases, the dependence of B in Eq. (64.23) on θ is not available in explicit form). By using the following readily verified equality [28],

[vec(X)]* (A^T ⊗ B) [vec(Y)] = tr(X* B Y A) (64.37)

which holds for any conformable matrices A, X, B, and Y, one can write Eq. (64.23) as:⁵

f(θ) = tr{[(Ŵ_L Û Λ̂^{-1})* R̂_z (Ŵ_L Û Λ̂^{-1})]^{-1} Ŝ* Ŵ_R^{-1} B (B* R̂_y B)^{-1} B* Ŵ_R^{-1} Ŝ}. (64.38)

However, observe that

B (B* R̂_y B)^{-1} B* = R̂_y^{-1/2} Π_{R̂_y^{1/2} B} R̂_y^{-1/2} = R̂_y^{-1/2} Π⊥_{R̂_y^{-1/2} A} R̂_y^{-1/2}. (64.39)

Inserting Eq. (64.39) into Eq. (64.38) yields:

f(θ) = tr[Λ̂ (Û* Ŵ_L R̂_z Ŵ_L Û)^{-1} Λ̂ Ŝ* Ŵ_R^{-1} R̂_y^{-1/2} Π⊥_{R̂_y^{-1/2} A} R̂_y^{-1/2} Ŵ_R^{-1} Ŝ] (64.40)

which is an explicit function of θ. Insertion of the optimal choice of W_L into Eq. (64.40) leads to a further simplification of the criterion, as seen below.
Owing to the arbitrariness in the choice of Ŵ_R, there exists an infinite class of optimal IV-SSF criteria. In what follows, we consider two members of this class.
Let

Ŵ_R = R̂_y^{-1/2}. (64.41)

Insertion of Eq. (64.41), along with Eq. (64.35), into Eq. (64.40) yields the following criterion function:

f_WW(θ) = tr[Π⊥_{R̂_y^{-1/2} A} S̃ Λ̃^2 S̃*] (64.42)

where S̃ and Λ̃ are made from the principal right singular vectors and singular values of the matrix

R̃ = R̂_z^{-1/2} R̂_zy R̂_y^{-1/2} (64.43)

(with R̂_zy defined in an obvious way). The function (64.42) is the UNCLE (spatial IV-SSF) criterion of Wong and Wu [12,13].
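As an end-to-end illustration, the sketch below evaluates f_WW(θ) of Eqs. (64.42)-(64.43) on a DOA grid for a single-source scenario (n = n̄ = 1) with a spatial IV as in Example 64.1. The ULA response, the uncalibrated-subarray model, and all numerical values are our own assumptions, not the chapter's.

```python
import numpy as np

rng = np.random.default_rng(1)
m, m_bar, N = 6, 3, 500
theta0 = np.deg2rad(12.0)                      # hypothetical true DOA

def a(theta):
    """Half-wavelength ULA response (illustrative)."""
    return np.exp(-1j * np.pi * np.arange(m) * np.sin(theta))

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def inv_sqrtm(Y):
    w, V = np.linalg.eigh(Y)
    return (V * (1.0 / np.sqrt(w.real))) @ V.conj().T

s = crandn(N)                                              # source signal
y = np.outer(a(theta0), s) + 0.5 * crandn(m, N)            # main calibrated array
z = np.outer(crandn(m_bar), s) + 0.5 * crandn(m_bar, N)    # uncalibrated subarray IV

Ry = y @ y.conj().T / N
Rz = z @ z.conj().T / N
Rzy = z @ y.conj().T / N
Ry_m12 = inv_sqrtm(Ry)
R_tilde = inv_sqrtm(Rz) @ Rzy @ Ry_m12          # Eq. (64.43)
_, sv, Vh = np.linalg.svd(R_tilde)
S_t = Vh[0, :].conj()                           # principal right singular vector
M = (sv[0] ** 2) * np.outer(S_t, S_t.conj())    # S~ Lam~^2 S~*

def f_ww(theta):                                # Eq. (64.42) for n = 1
    b = Ry_m12 @ a(theta)
    P_perp = np.eye(m) - np.outer(b, b.conj()) / (b.conj() @ b)
    return np.trace(P_perp @ M).real

grid = np.deg2rad(np.arange(-30.0, 30.0, 0.05))
theta_hat = grid[np.argmin([f_ww(th) for th in grid])]
print(np.rad2deg(theta_hat))                    # close to 12 for this realization
```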
Next, choose Ŵ_R as

Ŵ_R = I. (64.44)

The corresponding criterion function is

f_VSO(θ) = tr[Π⊥_{R̂_y^{-1/2} A} R̂_y^{-1/2} S̄ Λ̄^2 S̄* R̂_y^{-1/2}] (64.45)

where S̄ and Λ̄ are made from the principal singular pairs of

R̄ = R̂_z^{-1/2} R̂_zy. (64.46)

The function (64.45) above is recognized as the optimal (temporal) IV-SSF criterion of Viberg et al. [14].
An important consequence of the previous discussion is that the DOA estimation methods of [12,13] and [14], respectively, which were derived in seemingly unrelated contexts and by means of somewhat different approaches, are in fact asymptotically equivalent when used under the same conditions. These two methods have very similar computational burdens, which can be seen by comparing Eqs. (64.42) and (64.43) with Eqs. (64.45) and (64.46). Also, their finite-sample properties appear to be rather similar, as demonstrated in the simulation examples. Numerical algorithms for the minimization of the type of criterion function associated with the optimal IV-SSF methods are discussed in [17]. Some suggestions are also given in the summary below.
5 To within a multiplicative constant.
...as could be intuitively expected, one should use all available instruments (spatial and/or temporal)
to obtain maximal theoretical accuracy However, practice has shown that too large a. ..
Trang 10which holds for any conformable matricesA, X, B, and Y , one can write Eq (64. 23) as:5
f... IV vector may in fact decrease the empirically observed accuracy This phenomenon can be explained by the fact that increasing ¯m means that a longer data set is necessary for the asymptotic