Volume 2007, Article ID 79821, 23 pages
doi:10.1155/2007/79821
Research Article
Application of the Evidence Procedure to
the Estimation of Wireless Channels
Dmitriy Shutin,1 Gernot Kubin,1 and Bernard H. Fleury2,3
1 Signal Processing and Speech Communication Laboratory, Graz University of Technology, 8010 Graz, Austria
2 Institute of Electronic Systems, Aalborg University, Fredrik Bajers Vej 7A, 9220 Aalborg, Denmark
3 Forschungszentrum Telekommunikation Wien (ftw.), Donau City Strasse 1, 1220 Wien, Austria
Received 5 November 2006; Accepted 8 March 2007
Recommended by Sven Nordholm
We address the application of the Bayesian evidence procedure to the estimation of wireless channels. The proposed scheme is based on relevance vector machines (RVM) originally proposed by M. Tipping. RVMs allow the estimation of the channel parameters as well as the assessment of the number of multipath components constituting the channel within the Bayesian framework, by locally maximizing the evidence integral. We show that, in the case of channel sounding using pulse-compression techniques, it is possible to cast the channel model as a general linear model, thus allowing RVM methods to be applied. We extend the original RVM algorithm to the multiple-observation/multiple-sensor scenario by proposing a new graphical model to represent multipath components. Through the analysis of the evidence procedure we develop a thresholding algorithm that is used in estimating the number of components. We also discuss the relationship of the evidence procedure to the standard minimum description length (MDL) criterion. We show that the maximum of the evidence corresponds to the minimum of the MDL criterion. The applicability of the proposed scheme is demonstrated with synthetic as well as real-world channel measurements, and a performance increase over the conventional MDL criterion applied to maximum-likelihood estimates of the channel parameters is observed.

Copyright © 2007 Dmitriy Shutin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
Deep understanding of wireless channels is an essential prerequisite to satisfy the ever-growing demand for fast information access over wireless systems. A wireless channel contains explicitly or implicitly all the information about the propagation environment. To ensure reliable communication, the transceiver should be constantly aware of the channel state. In order to make this task feasible, accurate channel models, which reproduce the channel behavior in a realistic manner, are required. However, efficient joint estimation of the channel parameters, for example, the number of multipath components (model order), their relative delays, Doppler frequencies, directions of the impinging wavefronts, and polarizations, is a particularly difficult task. It often leads to analytically intractable and computationally very expensive optimization procedures. The problem is often relaxed by assuming that the number of multipath components is fixed, which simplifies optimization in many cases [1, 2]. However, both underspecifying and overspecifying the model order leads to significant performance degradation: residual intersymbol interference impairs the performance of the decoder in the former case, while additive noise is injected in the channel equalizer in the latter: the excessive components amount only to the random fluctuations of the background noise. To amend this situation, empirical methods like cross-validation can be employed (see, e.g., [3]). Cross-validation selects the optimal model by measuring its performance over a validation data set and selecting the one that performs best. In case of practical multipath channels, such data sets are often unavailable due to the time variability of the channel impulse responses. Alternatively, one can employ model selection schemes in the spirit of Ockham's razor principle: simple models (in terms of the number of parameters involved) are preferred over more complex ones. Examples are the Akaike information criterion (AIC) and the minimum description length (MDL) [4, 5]. In this paper, we show how the Ockham principle can be effectively used to perform estimation of the channel parameters coupled with estimating the model order, that is, the number of wavefronts.
Consider a certain class of parametric models (hypotheses) H_i defined as the collection of prior distributions p(w_i | H_i) for the model parameters w_i. Given the measurement data Z and a family of conditional distributions p(Z | w_i, H_i), our goal is to infer the hypothesis H and the corresponding parameters w that maximize the posterior

(w, H) = arg max_{w_i, H_i} p(w_i, H_i | Z). (1)
The key to solving (1) lies in inferring the corresponding parameters w_i and H_i from the data Z, which is often a nontrivial task. As far as the Bayesian methodology is concerned, there are two ways this inference problem can be solved [6, Section 5]. In the joint estimation method, the posterior p(w_i, H_i | Z) is maximized jointly with respect to the quantities of interest w_i and H_i. This often leads to computationally intractable optimization algorithms. Alternatively, one can rewrite the posterior p(w_i, H_i | Z) as

p(w_i, H_i | Z) = p(w_i | Z, H_i) p(H_i | Z) (2)

and maximize each term on the right-hand side sequentially from right to left. This approach is known as the marginal estimation method. Marginal estimation methods (MEM) are well exemplified by expectation-maximization (EM) algorithms and are used in many different signal processing applications (see [2, 3, 7]). MEMs are usually easier to compute; however, they are prone to land in a local rather than a global optimum. We recognize the first factor on the right-hand side of (2) as a parameter posterior, while the other one is a posterior for different model hypotheses. It is the maximization of p(H_i | Z) that guides our model selection decision.
Then, the data analysis consists of two steps [8, Chapter 28]. In the first stage we fit each model to the data by computing the parameter posterior

p(w_i | Z, H_i) = p(Z | w_i, H_i) p(w_i | H_i) / p(Z | H_i), (3)

and in the second stage we rank the hypotheses according to their posterior

p(H_i | Z) ∝ p(Z | H_i) p(H_i) ≡ Evidence × Hypothesis Prior. (4)
In the second stage, p(H_i) measures our subjective prior over different hypotheses before the data is observed. In many cases it is reasonable to assign equal probabilities to different hypotheses, thus reducing the hypothesis selection to selecting the model with the highest evidence p(Z | H_i). The evidence can be expressed as the following integral:

p(Z | H_i) = ∫ p(Z | w_i, H_i) p(w_i | H_i) dw_i. (5)

Maximizing (5) with respect to the model hypotheses is known as evidence maximization or the evidence procedure (EP) [13, 14].
Equations (3), (4), and (5) form the theoretical framework for our joint model and parameter estimation. The estimation algorithm is based on relevance vector machines. Relevance vector machines (RVM), originally proposed by Tipping [15], are an example of the marginal estimation method that, for a set of hypotheses H_i, iteratively approximates (1) by alternating between the model selection, that is, maximizing (5) with respect to H_i, and inferring the corresponding model parameters from the maximization of (3). RVMs have been initially proposed to find sparse solutions to general linear problems. However, they can be quite effectively adapted to the estimation of the impulse response of wireless channels, thus resulting in an effective channel parameter estimation and model selection scheme within the Bayesian framework.

The material presented in the paper is organized as follows: Section 2 introduces the signal model of the wireless channel and the used notation; Section 3 explains the framework of the EP in the context of wireless channels. In Section 4 we explain how model selection is implemented within the presented framework and discuss the relationship between the EP and the MDL criterion for model selection. Finally, Section 5 presents some application results illustrating the performance of the RVM-based estimator in synthetic as well as in actual wireless environments.
2 CHANNEL ESTIMATION USING PULSE-COMPRESSION TECHNIQUE
Channel estimation usually consists of two steps: (1) sending a specific sounding sequence s(t) through the channel and observing the response y(t) at the other end, and (2) estimating the channel parameters from the matched-filtered received signal z(t) (Figure 1). It is common to represent the multipath channel response as the sum of delayed and weighted Dirac impulses, with each impulse representing one individual multipath component (see, e.g., [16, Section 5]). Such special structure of the channel impulse response implies that the filtered signal z(t) should have a sparse structure. Unfortunately, this sparse structure is often obscured by additive noise and temporal dispersion due to the finite bandwidth of the transmitter and receiver hardware. This
Figure 2: Sounding sequence s(t).
motivates the application of algorithms capable of recovering this sparse structure from the measurement data.
Let us consider the equivalent baseband channel sounding scheme shown in Figure 1. The sounding signal s(t) (Figure 2) consists of periodically repeated burst waveforms u(t); each burst has duration T_u ≤ T_f and is formed as u(t) = Σ_{m=0}^{M−1} b_m p(t − mT_p). The sequence b_0 ⋯ b_{M−1} is the known sounding sequence consisting of M chips, and p(t) is the shaping pulse of duration T_p, MT_p = T_u. Furthermore, we assume that the receiver (Rx) is equipped with a planar antenna array consisting of P sensors located at positions s_1, …, s_P ∈ R² with respect to an arbitrary reference point. Let us now assume that the maximum absolute Doppler frequency of the impinging waves is much smaller than the inverse of a single burst duration 1/T_u. This low Doppler frequency assumption is equivalent to assuming that, within a single observation window equivalent to the period of the sounding sequence, we can safely neglect the influence of the Doppler shifts.
The received signal vector y(t) ∈ C^{P×1} for a single burst is then given as

y(t) = Σ_{l=1}^{L} a_l e^{j2πν_l t} c(φ_l) u(t − τ_l) + η(t). (6)

Here, a_l, τ_l, and ν_l are, respectively, the complex gain, the delay, and the Doppler shift of the lth multipath component. The P-dimensional complex vector c(φ_l) is the steering vector of the array. Provided the coupling between the elements can be neglected, its components are given as c_p(φ_l) = f_p(φ_l) exp(j2πλ^{−1}⟨e(φ_l), s_p⟩), with λ, e(φ_l), and f_p(φ_l) denoting the wavelength, the unit vector in R² pointing in the direction of the incoming wavefront determined by the azimuth φ_l, and the complex electric field pattern of the pth sensor, respectively. The additive term η(t) ∈ C^{P×1} is a vector-valued complex white Gaussian noise process, that is, the components of η(t) are independent complex Gaussian processes with double-sided spectral density N_0.
The receiver front-end consists of a matched filter (MF) matched to the transmitted sequence u(t). Under the low Doppler frequency assumption the term e^{j2πν_l t} stays time-invariant within a single burst duration, that is, equal to a complex constant that can be incorporated in the complex gain a_l. The signal z(t) at the output of the MF is then given as

z(t) = Σ_{l=1}^{L} a_l c(φ_l) R_uu(t − τ_l) + ξ(t), (7)

where R_uu(t) is the autocorrelation function of the burst waveform u(t) and ξ(t) = ∫ η(t′) u*(t′ − t) dt′ is a spatially white P-dimensional vector with each element being a zero-mean wide-sense stationary (WSS) Gaussian noise process with autocorrelation function

E[ξ_p(t + τ) ξ_p(t)*] = N_0 R_uu(τ). (8)

Thus, z(t) is a superposition of scaled and delayed kernel functions R_uu(t − τ_l), weighted across sensors as given by the components of c(φ_l) and observed in the presence of the colored noise ξ(t).
In practice, however, the output of the MF is sampled with the sampling period T_s ≤ T_p, resulting in P N-tuples of the MF output, where N is the number of MF output samples. By collecting the output of each sensor into a vector, we can rewrite (7) in a vector form:

z_p = K w_p + ξ_p, p = 1, …, P, (9)

where we have defined

z_p = [z_p(0), z_p(T_s), …, z_p((N − 1)T_s)]^T, w_p = [w_{p,1}, …, w_{p,L}]^T, with w_{p,l} = a_l c_p(φ_l). (10)

The additive noise vectors ξ_p, p = 1, …, P, possess the following properties that will be exploited later:

E[ξ_p] = 0, E[ξ_m ξ_k^H] = 0, for m ≠ k, (11)

E[ξ_p ξ_p^H] = Σ = N_0 Λ = β^{−1} Λ, (12)

where [Λ]_{m,n} = R_uu((m − n)T_s). The matrix K, also called the design matrix, accumulates the shifted and sampled versions of the kernel function R_uu(t). It is constructed as K = [r_1, …, r_L], with r_l = [R_uu(−τ_l), R_uu(T_s − τ_l), …, R_uu((N − 1)T_s − τ_l)]^T.
In general, the channel estimation problem is posed as follows: given the measured sampled signals z_p, p = 1, …, P, determine the order L of the model and estimate optimally (with respect to some quality criterion) all multipath parameters a_l, τ_l, and φ_l, for l = 1, …, L. In this contribution, we restrict ourselves to the estimation of the model order L along with the vector w_p, rather than of the constituting parameters τ_l, φ_l, and a_l. We will also quantize, although arbitrarily fine,2 the search space for the multipath delays τ_l. Thus, we

2 There is actually a limit beyond which it makes no sense to make the search grid finer, since it will not decrease the variance of the estimates, which is lower-bounded by the Cramér-Rao bound [2].
do not try to estimate the path delays with infinite resolution, but rather fix the delay values to be located on a grid with a given mesh determining the quantization error. The size of the delay search space L_0 and the resulting quantized delays T = {T_1, …, T_{L_0}} form the initial model hypothesis H_0, which manifests itself in the L_0 columns of the design matrix K. This allows us to formulate the channel estimation problem as a standard linear problem to which the RVM algorithm can be applied; a minimal construction of K is sketched below.
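For illustration only, the following minimal NumPy sketch assembles the design matrix K of (9) from a sampled kernel and a quantized delay grid. The triangular autocorrelation used here and all parameter values are hypothetical placeholders, not the sounding waveform of the measurements.

    import numpy as np

    def build_design_matrix(Ruu, N, Ts, delays):
        # K = [r_1, ..., r_L0], r_l = [Ruu(-T_l), Ruu(Ts - T_l), ..., Ruu((N-1)Ts - T_l)]^T
        t = np.arange(N) * Ts                  # sampling instants of the MF output
        return np.stack([Ruu(t - Tl) for Tl in delays], axis=1)

    # Illustrative use with a hypothetical triangular (chip-level) autocorrelation:
    Tp = 1e-8                                  # chip duration (placeholder)
    Ruu = lambda t: np.maximum(0.0, 1.0 - np.abs(t) / Tp)
    K = build_design_matrix(Ruu, N=256, Ts=Tp, delays=np.arange(64) * Tp)   # N x L0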
As can be seen, our idea lies in finding the closest approximation of the continuous-time model (7) with the discrete-time equivalent (9). By incorporating the model selection in the analysis, we also strive to find the most compact representation (in terms of the number of components), while preserving good approximation quality. Thus, our goal is to estimate the channel parameters w_p as well as to determine how many multipath components L ≤ L_0 are present in the measured impulse response. The application of the RVM framework to solve this problem follows in the next section.
3 EVIDENCE MAXIMIZATION, RELEVANCE VECTOR
MACHINES, AND WIRELESS CHANNELS
We begin our analysis following the steps outlined in Section 1. In order to ease the algorithm description we first assume that P = 1, that is, only a single sensor is used. Extensions to the case P > 1 are carried out later in Section 3.2. To simplify the notation we also drop the subscript index p in what follows.
From (9) it follows that the observation vector z is a linear combination of the vectors from the column-space of K, weighted according to the parameters w and embedded in the correlated noise ξ. In order to correctly assess the order of the model, it is imperative to take the noise process into account. It follows from (12) that the covariance matrix of the noise is proportional to the unknown spectral height N_0, which should therefore be estimated from the data. Thus, the model hypotheses H_i should include the term N_0. In the following analysis we assume that β = N_0^{−1} is Gamma-distributed [15], with the corresponding probability density function (pdf) given as

p(β | κ, υ) = (κ^υ / Γ(υ)) β^{υ−1} exp(−κβ), (13)
with parameters κ and υ predefined so that (13) accurately reflects our a priori information about N_0. In the absence of any a priori knowledge one can make use of a noninformative (i.e., flat in the logarithmic domain) prior by fixing the parameters to small values κ = υ = 10^{−4} [15]. Furthermore, to steer the model selection mechanism, we introduce an extra parameter (hyperparameter) α_l, l = 1, …, L_0, for each column in K. This parameter measures the contribution or relevance of the corresponding weight w_l in explaining the data z from the likelihood p(z | w_i, H_i). This is achieved by specifying the prior p(w | α) for the model weights:

p(w | α) = Π_{l=1}^{L_0} (α_l / π) exp(−α_l |w_l|²). (14)
High values of α_l will render the contribution of the corresponding column in the matrix K "irrelevant," since the weight w_l is likely to have a very small value (hence they are termed relevance hyperparameters). This will enable us to prune the model by setting the corresponding weight w_l to zero, thus effectively removing the corresponding column from the matrix and the corresponding delay T_l from the delay search space T. We also see that α_l^{−1} is nothing other than the prior variance of the model weight w_l. Also note that the prior (14) implicitly assumes statistical independence of the multipath contributions.
To complete the Bayesian framework, we also specify the prior over the hyperparameters. Similarly to the noise contribution, we assume the hyperparameters α_l to be Gamma-distributed with the corresponding pdf

p(α_l | ζ, ε) = (ζ^ε / Γ(ε)) α_l^{ε−1} exp(−ζ α_l), (15)

where, in the absence of any prior knowledge, the parameters ζ and ε are again fixed to small values so that (15) approaches a noninformative prior.

Now, let us define the hypothesis H_i more formally. Let P(S) be the power set consisting of all possible subsets of basis vector indices S = {1, …, L_0}, and i → P(i) the indexing of P(S) such that P(0) = S. Then for each index value i the hypothesis H_i is the set H_i = {β; α_j, j ∈ P(i)}. Clearly, the initial hypothesis H_0 = {β; α_j, j ∈ S} includes all possible potential basis functions.
Now we are ready to outline the learning algorithm that estimates the model parameters w, β, and hyperparameters α from the measurement data z. The joint posterior of all unknowns factors as

p(w, α, β | z) = p(w | z, α, β) p(α, β | z). (16)

The explicit dependence on the hypothesis index i has been dropped to simplify the notation. We recognize that the first term p(w | z, α, β) in (16) is the weight posterior and the other one p(α, β | z) is the hypothesis posterior. From this point we can start with the Bayesian two-step analysis as has been indicated before.
Assuming the parameters α and β are known, estimation of the model parameters consists of finding values w that maximize p(w | z, α, β). Using Bayes' rule we can rewrite this posterior as

p(w | z, α, β) ∝ p(z | w, α, β) p(w | α, β). (17)

Consider the Bayesian graphical model [17] in Figure 3. This graph captures the relationship between the different variables involved in (16). It is a useful tool to represent the dependencies among the variables involved in the analysis.
It immediately follows from the structure of the graph in Figure 3 that p(z | w, α, β) = p(z | w, β) and p(w | α, β) = p(w | α), that is, z and α are conditionally independent given w and β, and w and β are conditionally independent given α. Thus, (17) is equivalent to

p(w | z, α, β) ∝ p(z | w, β) p(w | α), (18)

where the second factor on the right-hand side is given in (14). The first term is the likelihood of w and β given the data. From (9) it follows that

p(z | w, β) = (β^N / (π^N det Λ)) exp(−(z − Kw)^H βΛ^{−1} (z − Kw)). (19)
Since both right-hand factors in (18) are Gaussian densities, p(w | z, α, β) is also a Gaussian density, with the covariance matrix Φ and mean μ given as

Φ = (A + βK^H Λ^{−1} K)^{−1}, (20)

μ = βΦK^H Λ^{−1} z. (21)
The matrix A = diag(α) is a diagonal matrix that contains the evidence parameters α_l on its main diagonal. Clearly, μ is a maximum a posteriori (MAP) estimate of the parameter vector w under the hypothesis H_i, with Φ being the covariance matrix of the resulting estimates. This completes the model fitting step.
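As an illustration, a minimal NumPy sketch of the model-fitting step (20)-(21); it assumes the design matrix K, the noise correlation matrix Λ (Lam), the measurement z, and current values of α and β are given, and is a direct transcription of the formulas rather than an optimized implementation.

    import numpy as np

    def weight_posterior(K, Lam, alpha, beta, z):
        # Posterior covariance (20) and MAP mean (21) of the weights w
        KH_Linv = K.conj().T @ np.linalg.inv(Lam)                  # K^H Lambda^{-1}
        Phi = np.linalg.inv(np.diag(alpha) + beta * KH_Linv @ K)   # (20)
        mu = beta * Phi @ KH_Linv @ z                              # (21)
        return Phi, mu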
Our next step is to find the parameters α and β that maximize the hypothesis posterior p(α, β | z) in (16). This density function can be represented as p(α, β | z) ∝ p(z | α, β) p(α, β), where p(z | α, β) is the evidence term and p(α, β) is the prior over the hypothesis parameters. As mentioned earlier, it is quite reasonable to choose noninformative priors, since we would like to give all possible hypotheses H_i an equal chance of being valid. This can be achieved by letting the parameters of the hyperpriors (13) and (15) tend to zero. It can be concluded (see derivations in the appendix) that the maximum of the evidence p(z | α, β) coincides with the maximum of p(z | α, β)p(α, β) when ζ = ε = κ = υ = 0, which effectively results in the noninformative hyperpriors for α and β.
This formulation of prior distributions is related to automatic relevance determination (ARD) [14, 18]. As a consequence of this assumption, the maximization of the model posterior is equivalent to the maximization of the evidence, which is known as the evidence procedure [13].
The evidence term p(z | α, β) can be expressed as

p(z | α, β) = ∫ p(z | w, β) p(w | α) dw = (1 / (π^N det(β^{−1}Λ + KA^{−1}K^H))) exp(−z^H (β^{−1}Λ + KA^{−1}K^H)^{−1} z). (22)

Maximization of (22) with respect to the hyperparameters α and β is a type-II maximum likelihood method [19]. To ease the optimization, several terms in (22) can be expressed as a function of the weight posterior parameters μ and Φ as given by (20) and (21). Then, by taking the derivatives of the logarithm of (22) with respect to α and β and by setting them to zero, we obtain its maximizing values as (see also the appendix)

α_l = 1 / (|μ_l|² + Φ_{ll}), (23)

β = N / ((z − Kμ)^H Λ^{−1} (z − Kμ) + tr(ΦK^H Λ^{−1} K)). (24)

In (23), μ_l and Φ_{ll} denote the lth element of, respectively, the vector μ and the main diagonal of the matrix Φ. Unlike the maximizing values obtained in the original RVM paper [15, equation (18)], (24) is derived for the extended, more general case of colored additive noise ξ with the corresponding covariance matrix β^{−1}Λ arising due to the MF processing at the receiver. Clearly, if the noise is assumed to be white, expressions (23) and (24) coincide with those derived in [15]. Also note that α and β are dependent, as can be seen from (23) and (24).
Thus, for a particular hypothesis H_i the learning algorithm proceeds by repeated application of (20) and (21), alternated with the update of the corresponding evidence parameters α_i and β from (23) and (24), as depicted in Figure 4, until some suitable convergence criterion has been satisfied. Provided a good initialization of α_i^{[0]} and β^{[0]} is chosen,3 the scheme in Figure 4 converges after j iterations to the stationary point of the system of coupled equations (20), (21), (23), and (24). Then, the maximization (1) is performed by selecting the hypothesis that results in the highest posterior (2). In practice, however, we will observe that during the reestimation some of the hyperparameters α_l diverge or, in fact, become numerically indistinguishable from infinity given the computer accuracy.4 The divergence of some of the hyperparameters enables us to approximate (1) by performing an

3 Later in Section 5 we consider several rules for initializing the hyperparameters.

4 In the finite sample size case, however, this will only happen in the high SNR regime. Otherwise, α_l will take large but still finite values. In Section 4.1 we elaborate more on the conditions that lead to convergence/divergence of this learning scheme.
Figure 4: Iterative learning of the parameters; the superscript [j] denotes the iteration index.
on-line model selection: starting from the initial hypothesis H_0, we prune the hyperparameters that become larger than a certain threshold as the iterations proceed by setting them to infinity. In turn, this sets the corresponding coefficient w_l to zero, thus "switching off" the lth column in the kernel matrix K and removing the delay T_l from the search space T. This effectively implements the model selection by creating smaller hypotheses H_i ⊂ H_0 (with fewer basis functions) without performing an exhaustive search over all the possibilities. The choice of the threshold will be discussed in Section 4. A sketch of the resulting iteration with pruning is given below.
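The following schematic sketch summarizes the single-channel procedure of Figure 4, using the EM-type updates (23)-(24) reproduced above; the threshold alpha_th, the initialization, and the convergence test are illustrative choices, not prescriptions from the original algorithm.

    import numpy as np

    def rvm_channel(K, Lam, z, alpha_th=1e6, max_iter=200, tol=1e-6):
        # Single-channel evidence procedure with on-line basis pruning (sketch)
        N, L0 = K.shape
        keep = np.arange(L0)              # surviving columns of K (current hypothesis)
        alpha = np.ones(L0)               # initial hyperparameters alpha^[0]
        beta = 1.0 / np.var(z)            # crude initial noise precision beta^[0]
        Lam_inv = np.linalg.inv(Lam)
        for _ in range(max_iter):
            Kk = K[:, keep]
            KH_Linv = Kk.conj().T @ Lam_inv
            Phi = np.linalg.inv(np.diag(alpha[keep]) + beta * KH_Linv @ Kk)  # (20)
            mu = beta * Phi @ KH_Linv @ z                                    # (21)
            alpha_new = 1.0 / (np.abs(mu) ** 2 + np.real(np.diag(Phi)))      # (23)
            r = z - Kk @ mu
            beta = N / np.real(r.conj() @ Lam_inv @ r
                               + np.trace(Phi @ KH_Linv @ Kk))               # (24)
            converged = np.allclose(alpha_new, alpha[keep], rtol=tol)
            alpha[keep] = alpha_new
            keep = keep[alpha[keep] < alpha_th]   # prune "irrelevant" components
            if converged or keep.size == 0:
                break
        return keep, alpha, beta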
3.2 Extensions to multiple channel observations
In this subsection we extend the above analysis to multiple channel observations or multiple antenna systems. When detecting multipath components, any additional channel measurement (either in time, by observing several periods of the sounding sequence u(t), or in space, by using a multiple-sensor antenna) can be used to increase the detection quality. Of course, it is important to make sure that the multipath components are time-invariant within the observation interval. The basic idea of how to incorporate several channel observations is quite simple: in the original formulation each hyperparameter α_l was used to control a single weight w_l and thus the single component. Having several channel observations, a single hyperparameter α_l now controls the weights representing the contribution of the same physical multipath component, but present in the different channel observations.
Usage of a single parameter in this case expresses the channel coherence property in the Bayesian framework. The corresponding graphical model that illustrates this idea for a single hyperparameter α_l is depicted in Figure 5. It is interesting to note that similar ideas, though in a totally different context, were adapted to train neural networks by allowing a single hyperparameter to control a group of weights [18]. Note that it is also possible to introduce an individual hyperparameter α_{p,l} for each weight w_{p,l}, but this eventually decouples the problem into P separate one-dimensional problems and, as the result, any dependency between the consecutive channels is ignored.
Now, let us return to (9). It can be seen that the weights w_p capture the structure induced by multiple antennas. However, for the moment we ignore this structure and treat the components of w_p as a wide-sense stationary (WSS) process over the individual channels, p = 1, …, P. We will also allow each sensor to have a different MF. This might not necessarily be the case for wireless channel sounding, but thus a more general situation can be considered. Different matched filters result in different design matrices K_p, and thus different noise covariance matrices Σ_p, p = 1, …, P. We will however require that the variance of the input noise remains the same and equals N_0 = β^{−1} for all channels, so that Σ_p = N_0 Λ_p, and the noise components are statistically independent among the channels. Then, by defining

z = [z_1^T, …, z_P^T]^T, ξ = [ξ_1^T, …, ξ_P^T]^T, w = [w_1^T, …, w_P^T]^T, K = diag(K_1, …, K_P), (25)

we can rewrite (9) compactly as

z = Kw + ξ. (26)

The sharing of a single hyperparameter α_l among the weights w_{1,l}, …, w_{P,l} representing the lth multipath component is reflected in the structure of the matrix A. This will have a corresponding effect on the hyperparameter reestimation algorithm. From the structural equivalence of (9) and (26) we can easily infer that (20) and (21) are modified as follows:

Φ_p = (A + βK_p^H Λ_p^{−1} K_p)^{−1}, (27)

μ_p = βΦ_p K_p^H Λ_p^{−1} z_p, p = 1, …, P. (28)
The expressions for the hyperparameter updates become a bit more complicated, but are still straightforward to compute. It is shown in the appendix that

α_l = P / Σ_{p=1}^{P} (|μ_{p,l}|² + Φ_{p,ll}), (29)

β = NP / Σ_{p=1}^{P} ((z_p − K_p μ_p)^H Λ_p^{−1} (z_p − K_p μ_p) + tr(Φ_p K_p^H Λ_p^{−1} K_p)), (30)
where μ_{p,l} is the lth element of the MAP estimate of the parameter vector w_p given by (28), and Φ_{p,ll} is the lth element on the main diagonal of Φ_p from (27). Comparing the latter expressions with those developed for the single-channel case, we observe that (29) and (30) use multiple channels to improve the estimates of the noise spectral height and the channel weight hyperparameters. They also offer more insight into the physical meaning of the hyperparameters α. On the one hand, the hyperparameters are used to regularize the matrix inversion (27), needed to obtain the MAP estimates of the parameters w_{p,l} and their corresponding variances. On the other hand, they act as the inverse of the second noncentral moments of the coefficients w_{p,l}, as can be seen from (29). A compact sketch of one multichannel iteration is given below.
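A minimal sketch of one multichannel iteration, assuming for simplicity that all P channels share the same design matrix K and noise matrix Λ (the general case uses per-channel K_p and Λ_p); Z holds the observations z_p as columns. All variable names are placeholders mirroring the symbols in the text.

    import numpy as np

    def multichannel_update(K, Lam, Z, alpha, beta):
        # Per-channel posteriors (27)-(28) and shared hyperparameter updates (29)-(30)
        N, P = Z.shape
        Lam_inv = np.linalg.inv(Lam)
        KH_Linv = K.conj().T @ Lam_inv
        Phi = np.linalg.inv(np.diag(alpha) + beta * KH_Linv @ K)  # (27), same for all p here
        Mu = beta * Phi @ KH_Linv @ Z                             # (28), column p is mu_p
        # (29): a single alpha_l is shared by the weights w_{1,l}, ..., w_{P,l}
        alpha_new = P / (np.sum(np.abs(Mu) ** 2, axis=1) + P * np.real(np.diag(Phi)))
        # (30): noise precision pooled over all P channels
        R = Z - K @ Mu
        beta_new = N * P / np.real(np.einsum('np,nm,mp->', R.conj(), Lam_inv, R)
                                   + P * np.trace(Phi @ KH_Linv @ K))
        return Phi, Mu, alpha_new, beta_new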
4 MODEL SELECTION AND BASIS PRUNING
The ability to select the best model to represent the measured data is an important feature of the proposed scheme, and thus it is paramount to consider in more detail how the model selection is effectively achieved. In Section 3.1 we have briefly mentioned that during the learning phase many of the hyperparameters α_l tend to large values, meaning that the corresponding weights w_l will cluster around zero according to the prior (14). This will allow us to set these coefficients to zero, thus effectively pruning the corresponding basis function from the design matrix. However, the question how large a hyperparameter has to grow in order to prune its corresponding basis function has not yet been discussed. In the original RVM paper [15], the author suggests using a threshold α_th to prune the model. The empirical evidence collected by the author suggests setting the threshold to "a sufficiently large number" (e.g., α_th = 10^{12}). However, our theoretical analysis presented in the following section will show that such high thresholds are only meaningful in very high SNR regimes, or if the number of channel observations P is sufficiently large. In more general, and often more realistic, scenarios such high thresholds are absolutely impractical. Thus, there is a need to study the model selection problem in the context of the presented approach more rigorously.
Below, we present two methods for implementing model selection within the proposed algorithm. The first method relies on the statistical properties of the hyperparameters α_l when the update equations (27), (28), (29), and (30) converge to a stationary point. The second method exploits the relationship that we will establish between the proposed scheme and the minimum description length principle [4, 8, 20, 21], thus linking the EP to this classical model selection approach.
4.1 Statistical analysis of the hyperparameters
in the stationary point
The decision to keep or to prune a basis function from the design matrix is based purely on the value of the corresponding hyperparameter α_l. In the following we analyze the convergence properties of the iterative learning scheme depicted in Figure 4 using expressions (27), (28), (29), and (30), and the resulting distribution of the hyperparameters once convergence is achieved.

We start our analysis of the evidence parameters α_l by making some simplifications to make the derivations tractable.

(ii) The same MF is used to process each of the P sensor output signals, that is, K_p = K and Σ_p = Σ = β^{−1}Λ, p = 1, …, P.
(iii) The noise covariance matrix Σ is known, and B = Σ^{−1}.
(iv) We assume the presence of a single multipath component, that is, L = 1, with known delay τ. Thus, the design matrix is given as K = [r(τ)], where r(τ) = [R_uu(−τ), R_uu(T_s − τ), …, R_uu((N − 1)T_s − τ)]^T is the associated basis function.
(v) The hyperparameter associated with this component is denoted as α.
Our goal is to consider the steady-state solution α_∞ for the hyperparameter α in this simplified scenario.5 In this case (27) and (28) simplify to

Φ_p = (α + r^H B r)^{−1}, (31)

μ_p = (α + r^H B r)^{−1} r^H B z_p, p = 1, …, P, (32)

where we write r = r(τ) for brevity.

5 Recall that α^{−1} is the prior variance of the corresponding parameter w. This constrains α to be nonnegative.
A fixed-point analysis of (32) reveals that (29) converges to the stationary point

α_∞ = (r^H B r)² / ((1/P) Σ_{p=1}^{P} |r^H B z_p|² − r^H B r) (33)

if and only if the denominator of (33) is positive:

(1/P) Σ_{p=1}^{P} |r^H B z_p|² − r^H B r > 0. (34)

Otherwise, the iterative learning scheme depicted in Figure 4 diverges, that is, α_∞ = ∞. This can be inferred by interpreting (29) as a nonlinear dynamic system that, at iteration j, maps α^{[j−1]} into the updated value α^{[j]}. The nonlinear mapping is given by the right-hand side of (29), where the quantities Φ_p and μ_p depend on the values of the hyperparameters at iteration j − 1. In Figure 6 we show several iterations of this mapping that illustrate how the solution trajectories evolve. If condition (34) is satisfied, the sequence of solutions {α^{[j]}} converges to a stationary point (Figure 6(a)) given by (33). Otherwise, {α^{[j]}} diverges (Figure 6(b)). Thus, (33) is a stationary point only provided the condition (34) is satisfied:

α_∞ = (r^H B r)² / ((1/P) Σ_{p=1}^{P} |r^H B z_p|² − r^H B r) if (34) holds, and α_∞ = ∞ otherwise. (35)

Practically, this means that for a given measurement z_p and known noise matrix B, we can immediately decide whether a given basis function r(τ) should be included in the basis by simply checking if (34) is satisfied or not. A minimal sketch of this test is given below.
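A minimal sketch of the inclusion test (34); r is the candidate basis vector r(τ), B the known inverse noise covariance, and Z the P channel observations stacked as columns (all assumed given).

    import numpy as np

    def passes_test_34(r, B, Z):
        # Keep r(tau) iff (1/P) sum_p |r^H B z_p|^2  >  r^H B r
        q2 = np.mean(np.abs(r.conj() @ B @ Z) ** 2)
        s = np.real(r.conj() @ B @ r)
        return q2 > s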
A similar analysis is performed in [22], where the behavior of the likelihood function with respect to a single parameter is studied. The obtained convergence results coincide with ours when P = 1. Expression (34) is, however, more general and accounts for multiple channel observations and colored noise. In [22] the authors also suggest that testing (34) for a given basis function r(τ) is sufficient to find a sparse representation and that no further pruning is necessary. In other words, each basis function in the design matrix K is subject to the test (34) and, if the test fails, that is, (34) does not hold for the basis function under test, the basis function is pruned. In the case of wireless channels, however, we have experimentally observed that even in simulated high-SNR scenarios such pruning results in a significantly overestimated number of multipath components. Moreover, it can be inferred from (34) that, as the SNR increases, the number of functions pruned with this approach decreases, resulting in less and less sparse representations. This motivates us to perform a more detailed analysis of (35).
Let us slightly modify the assumptions we made earlier. We now assume that the multipath delay τ is unknown. The design matrix is constructed similarly, but this time K = [r_l], where

r_l = [R_uu(−T_l), R_uu(T_s − T_l), …, R_uu((N − 1)T_s − T_l)]^T (36)

is the basis function associated with the delay T_l ∈ T used in our discrete-time model. Under these assumptions the input signal z_p is nothing else but the basis function r(τ) scaled
Figure 6: Evolution of the two representative solution trajectories for two cases: (a) {α^{[j]}} converges, (b) {α^{[j]}} diverges.
and embedded in the additive complex zero-mean Gaussian noise with covariance matrix Σ, that is,

z_p = w_p r(τ) + ξ_p, p = 1, …, P. (37)

Let us further assume that w_p ∈ C, p = 1, …, P, are unknown but fixed complex scaling factors. In further derivations we assume, unless explicitly stated otherwise, that the condition (34) is satisfied for the basis r_l. By plugging (37) into (33) and rearranging the result with respect to α_∞^{−1}, we obtain

α_∞^{−1} = (1/P) Σ_{p=1}^{P} |r_l^H B z_p|² / (r_l^H B r_l)² − 1/(r_l^H B r_l). (38)
Now, we consider two scenarios. In the first scenario τ = T_l ∈ T, that is, the discrete-time model matches the observed signal. Although unrealistic, this allows us to study the properties of α_∞^{−1} more closely. In the second scenario, we study what happens if the discrete-time model does not match the measured signal perfectly. This case helps us to define how the model selection rules have to be adjusted to consider possible misalignment of the path component delays in the model.

In the first scenario, (37) becomes

z_p = w_p r_l + ξ_p, p = 1, …, P, (39)

where the only random quantity is the additive noise term ξ_p.
This allows us to study the statistical properties of the finite-SNR stationary point (38). On the other hand, in the absence of noise, that is, in the infinite SNR case, the corresponding hyperparameter α_∞^{−1} reduces to the average power of the multipath component.6 Substituting (39) into (38) shows that α_∞^{−1} consists of the sum of two contributions, α_∞^{−1} = α_n^{−1} + α_s^{−1}, with

α_n^{−1} = r_l^H B (Σ_{p=1}^{P} ξ_p ξ_p^H) B r_l / (P (r_l^H B r_l)²) − 1/(r_l^H B r_l), (40)

α_s^{−1} = (1/P) Σ_{p=1}^{P} (|w_p|² + 2 Re{w_p ξ_p^H B r_l} / (r_l^H B r_l)). (41)

6 Actually, the second term in the resulting expression vanishes in a perfectly noise-free case, and then α_∞^{−1} = (1/P) Σ_p |w_p|².
We first consider α_s^{−1}. The first term on the right-hand side of (41) is a deterministic quantity that equals the average power of the multipath component. The second one, on the other hand, is random. The product Re{w_p ξ_p^H B r_l} in (41) is recognized as the cross-correlation between the additive noise term and the basis function r_l. It is Gaussian distributed with expectation and variance given as

E[Re{w_p ξ_p^H B r_l}] = 0, (42)

V[Re{w_p ξ_p^H B r_l}] = |w_p|² r_l^H B r_l / 2, (43)

so that this random term averages out as the number of channel observations P grows.
Now, let us consider the term α_n^{−1}. In (40) the only random element is Σ_{p=1}^{P} ξ_p ξ_p^H. This random matrix is known to have a complex Wishart distribution [23, 24] with the scale matrix Σ and P degrees of freedom. Let us denote

c_p = ξ_p^H B r_l / (r_l^H B r_l), σ_c² = E[|c_p|²] = 1/(r_l^H B r_l), (44)

x = (1/P) Σ_{p=1}^{P} |c_p|², (45)

so that α_n^{−1} = x − σ_c². The quantity x is Gamma-distributed with the pdf

p_x(x) = (P^P x^{P−1} / (Γ(P) σ_c^{2P})) e^{−Px/σ_c²}. (46)

The mean and the variance of x are easily computed to be

E[x] = σ_c², V[x] = σ_c⁴/P. (47)
The pdf of α_n^{−1} = x − σ_c² then follows as

p(α_n^{−1}) = (P^P (α_n^{−1} + σ_c²)^{P−1} / (Γ(P) σ_c^{2P})) e^{−P(α_n^{−1} + σ_c²)/σ_c²}, (48)

which is equivalent to (46), but shifted so as to correspond to a zero-mean distribution. However, it is known that only positive values of α_n^{−1} occur in practice. The probability mass of the negative part of (48) equals the probability that the condition (34) is not satisfied, in which case the resulting α_∞ eventually diverges to infinity and the component is pruned. Taking this into account, the pdf of the positive values of α_n^{−1} is obtained by truncating (48) to α_n^{−1} ≥ 0 and renormalizing:

p(α_n^{−1} | α_n^{−1} ≥ 0) = p(α_n^{−1}) / ∫_0^∞ p(a) da, α_n^{−1} ≥ 0. (49)
A closer look at (49) shows that as P increases the variance of the Gamma distribution decreases, with α_n^{−1} concentrating at zero. In the limiting case as P → ∞, (49) converges to a Dirac delta function localized at zero, that is, α_n = ∞. This allows natural pruning of the corresponding basis function. This situation is equivalent to averaging out the noise as the number of channel observations grows. Practically, however, P remains finite and (49) retains a certain finite variance.
The pruning problem can now be approached from the perspective of classical detection theory. To prune a basis function, we have to decide if the corresponding value of α^{−1} has been generated by the noise distribution (49), that is, the null hypothesis, or by the pdf of α_s^{−1} + α_n^{−1}, that is, the alternative hypothesis. The problem might be somewhat relaxed by taking the assumption that α_s^{−1} and α_n^{−1} are statistically independent. However, proving the plausibility of this assumption is difficult. Even if we were successful in finding the analytical expression for the pdf of the alternative hypothesis, such a model selection approach is hampered by our inability to evaluate (43), since the gains w_p are not known a priori. However, we can still use (49) to select a threshold.
Recall that the presented algorithm allows us to learn (estimate) the noise spectral height N_0 = β^{−1} from the measurements. Assuming that we know β and, as a consequence, the whole matrix B, then, for any basis function r_l in the design matrix K and the corresponding hyperparameter α_l, we can decide with a priori specified probability ρ whether α_l is generated by the distribution (49). Indeed, let α_th^{−1} be the ρ-quantile of (49) such that the probability P(α^{−1} ≤ α_th^{−1}) = ρ. Since (49) is known exactly, we can easily compute α_th^{−1} and prune all the basis functions for which α_l^{−1} ≤ α_th^{−1}; a sketch of this computation is given below.
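Under the forms of (46)-(49) reproduced above, the threshold is a quantile of a shifted, truncated Gamma distribution, which a standard statistics library evaluates directly; a sketch (the probability ρ = 0.99 is an arbitrary example value):

    import numpy as np
    from scipy.stats import gamma

    def pruning_threshold(r, B, P, rho=0.99):
        # rho-quantile alpha_th^{-1} of the truncated noise pdf (49)
        sigma_c2 = 1.0 / np.real(r.conj() @ B @ r)   # sigma_c^2 = 1/(r^H B r)
        x_dist = gamma(a=P, scale=sigma_c2 / P)      # (46): mean sigma_c^2, variance sigma_c^4/P
        F0 = x_dist.cdf(sigma_c2)                    # mass of the negative part of (48)
        # Invert the cdf of x - sigma_c^2 truncated to nonnegative values, as in (49)
        return x_dist.ppf(F0 + rho * (1.0 - F0)) - sigma_c2

    # Components with alpha_l^{-1} <= pruning_threshold(r_l, B, P) are pruned.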
The analysis performed above relies on the knowledge that the true multipath delay τ belongs to T. Unfortunately, this is often unrealistic, and the model mismatch τ ∉ T must be considered. To be able to study how the model mismatch influences the value of the hyperparameters, we have to make a few more assumptions. Let us for simplicity select the model delay T_l to be a multiple of the chip period T_p. We will also need to assume a certain shape of the correlation function R_uu(t). It is convenient to assume that the main lobe of R_uu(t) can be approximated by a raised cosine function with period 2T_p. This approximation makes sense if the sounding pulse p(t) defined in Section 2 is a square root raised cosine pulse. Clearly, this approximation can also be applied for other shapes of the main lobe, but the analysis of the quality of such an approximation remains outside the scope of this paper.
Just as in the previous case, we can split the expression (38) into the multipath component contribution α_s^{−1}, which now reads

α_s^{−1} = γ(τ)² (1/P) Σ_{p=1}^{P} |w_p|² + γ(τ) (2/P) Σ_{p=1}^{P} Re{w_p ξ_p^H B r_l} / (r_l^H B r_l), (52)

with the normalized correlation γ(τ) = r_l^H B r(τ) / (r_l^H B r_l), and the same noise contribution α_n^{−1} defined in (40). It can be seen that γ(τ) makes (52) differ from (41), and as such it is the key to the analysis of the model mismatch. Note that this function is bounded as |γ(τ)| ≤ 1, with equality following only if τ = T_l. Note also that in our case for |τ − T_l| < T_p the correlation γ(τ) is strictly positive.
Due to the properties of the sounding sequence u(t), the magnitude of R_uu(t) for |t| > T_p is sufficiently small and in our analysis of model mismatch can safely be assumed to be zero. Furthermore, if r_l is chosen to coincide with a multiple of the sampling period, T_l = lT_s, then it follows from (12) that the product r_l^H B = r_l^H Σ^{−1} = β e_l^H is a vector with all elements being zero except the lth element, which is equal to β. As a result, γ(τ) takes a form identical to that of the correlation function R_uu(t), normalized by R_uu(0). For |τ − T_l| ≥ T_p, γ(τ) can be assumed to be zero, and it makes sense to analyze (52) only when |τ − T_l| < T_p. In Figure 7 we plot the correlation functions R_uu(t) and γ(τ) for this case.
Since the true value of τ is unknown, we assume this parameter to be random, uniformly distributed in the interval [T_l − T_p, T_l + T_p]. This in turn induces corresponding distributions for the random variables γ(τ) and γ(τ)², which enter, respectively, the second and first terms on the right-hand side of (52).

It can be shown that in this case γ(τ) ∼ B(0.5, 0.5), where B(0.5, 0.5) is a Beta distribution [25] with both distribution parameters equal to 1/2. The corresponding pdf p_γ(x) is given in this case as

p_γ(x) = (1/B(0.5, 0.5)) x^{−1/2} (1 − x)^{−1/2}, (54)

where B(·, ·) is a Beta function [26] with B(0.5, 0.5) = π.
It is also straightforward to compute the pdf of the term γ(τ)² via the change of variables p_{γ²}(x) = p_γ(√x)/(2√x). The corresponding empirical and theoretical pdf's of γ(τ) and γ(τ)² are compared in Figure 8; a short simulation sketch is given below.
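The arcsine law for γ(τ) is easy to verify by simulation under the raised-cosine main-lobe approximation; a sketch in the spirit of Figure 8, with arbitrary sample size and binning:

    import numpy as np

    Tp = 1.0                                     # chip period (placeholder units)
    tau = np.random.uniform(-Tp, Tp, size=5000)  # tau - T_l ~ U[-Tp, Tp]
    g = 0.5 * (1.0 + np.cos(np.pi * tau / Tp))   # gamma(tau) under the approximation

    # Empirical histogram versus the Beta(0.5, 0.5) pdf (54)
    hist, edges = np.histogram(g, bins=50, range=(0.0, 1.0), density=True)
    x = 0.5 * (edges[:-1] + edges[1:])           # bin centers, strictly inside (0, 1)
    pdf = 1.0 / (np.pi * np.sqrt(x * (1.0 - x))) # p_gamma(x) from (54)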
Now we have to find out how this information can be utilized to design an appropriate threshold. In the case of a perfectly matched model the threshold is selected based on the noise distribution (49). In the case of a model mismatch, the term (52) measures the amount of the interference resulting from the model imperfection.
Indeed, if |τ − T_l| ≥ T_p, then the resulting γ(τ) = 0, and thus α_s^{−1} = 0. The corresponding evidence parameter α_∞^{−1} is then equal to the noise contribution α_n^{−1} only and will be pruned using the method we described for the matched model case.
Figure 8: Comparison between the empirical and theoretical pdf's of (a) γ(τ) and (b) γ(τ)² for the cosine approximation case. To compute the histogram, N = 5000 samples were used.
If, however, |τ − T_l| < T_p, then a certain fraction of α_s^{−1} will be added to the noise contribution α_n^{−1}, thus causing the interference. In order to be able to take this interference into account and adjust the threshold accordingly, we propose the following approach.
The amount of interference added is measured by the magnitude of α_s^{−1} in (52). It consists of two terms: the first one is the multipath power, scaled by the factor γ(τ)².