
Volume 2007, Article ID 34013, 12 pages

doi:10.1155/2007/34013

Research Article

Inverse Filtering for Speech Dereverberation Less Sensitive to Noise and Room Transfer Function Fluctuations

Takafumi Hikichi, Marc Delcroix, and Masato Miyoshi

Media Information Laboratory, NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho,

Soraku-gun, Kyoto 619-0237, Japan

Received 16 November 2006; Accepted 2 February 2007

Recommended by Liang-Gee Chen

Inverse filtering of room transfer functions (RTFs) is considered an attractive approach for speech dereverberation, provided that the time-invariance assumption for the RTFs holds. However, in a realistic environment, this assumption is not necessarily guaranteed, and the performance is degraded because the RTFs fluctuate over time and the inverse filter fails to remove the effect of the RTFs. The inverse filter may amplify a small fluctuation in the RTFs and may cause large distortions in the filter's output. Moreover, when interference noise is present at the microphones, the filter may also amplify the noise. This paper proposes a design strategy for the inverse filter that is less sensitive to such disturbances. We consider that reducing the filter energy is the key to making the filter less sensitive to the disturbances. Using this idea as a basis, we focus on the influence of three design parameters on the filter energy and the performance, namely, the regularization parameter, modeling delay, and filter length. By adjusting these three design parameters, we confirm that the performance can be improved in the presence of RTF fluctuations and interference noise.

Copyright © 2007 Takafumi Hikichi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 INTRODUCTION

Inverse filtering of room acoustics is useful in various applications such as sound reproduction, sound-field equalization, and speech dereverberation. Usually, room transfer functions (RTFs) are modeled as finite impulse response (FIR) filters, and inverse filters are designed to remove the effect of the RTFs. When the RTFs are known a priori or can be accurately estimated, this approach has been shown to achieve high inverse filtering performance [1–4]. However, in actual acoustic environments, there are disturbances that affect the inverse filtering performance. One cause of these disturbances is the fluctuation in the RTFs resulting from changes in such factors as source position and temperature [5–9]. As a result, an inverse filter correctly designed for one condition may not work well for another condition, and compensation or adaptation processing may become necessary.

The sensitivity issue with inverse filtering in relation to the movement of a sound source or microphone has been addressed in several papers. In [8, 9], the sensitivity of inverse filters is quantified in terms of the mean-squared error (MSE), defined as the power of the deviation of the equalized impulse response from the ideal impulse. This MSE is theoretically derived based on statistical room acoustics. These studies claim that the region in which the MSE is below −10 dB is restricted to a few tenths of a wavelength of a target signal, revealing a high sensitivity to small positional changes. That is, when an inverse filter designed for a certain location is applied to recover signals observed at another location, the performance easily degrades and the MSE becomes high.

Inverse filters are usually obtained by inverting the autocorrelation matrix of the RTFs. Accordingly, in order to realize stable inverse filtering, either regularization [10] or the truncated singular value decomposition method [11–13] has been applied. With the latter method, the small singular values of the autocorrelation matrix of the RTFs are treated as zeros. Both methods have been applied to a sound reproduction system and have been experimentally verified.

The purpose of this paper is to pursue ways of designing inverse filters that are less sensitive to RTF fluctuations and interference noise. When the RTFs fluctuate, the inverse filter may amplify the small fluctuation in the RTFs and may cause large distortions in the output signal of the inverse filter. Moreover, when the microphone signal contains noise, the inverse filter may also amplify the noise. We expect the filtered signal to be less degraded when the filter energy is small. Hence, we believe that reducing the filter energy is the key to making the filters less sensitive. To confirm this belief, we focus on the influence of three parameters used in the design of inverse filters: the regularization parameter, filter length, and modeling delay. By selecting proper parameter values, we expect to reduce the filter energy, and hence make the filter more robust to RTF variations and noise.

Figure 1: Single-source multimicrophone acoustic system. The source signal $s(n)$ from the speaker passes through the room sound field $H_1(z), \ldots, H_P(z)$ to give the microphone signals $x_1(n), \ldots, x_P(n)$. $H_i(z)$ represent room transfer functions.

The organization of this paper is as follows. The following section describes the acoustic system with a single source and multiple microphones considered in this paper. It then describes how inverse filters are calculated and analyzes the effect of the three design parameters on the filter energy. Section 3 reports experiments undertaken in the presence of noise. Section 4 describes experimental results for an inverse filter with RTF fluctuations caused by source position changes. Section 5 provides an analysis of the RTF fluctuations caused by source position changes. Section 6 concludes the paper.

2 PROBLEM FORMULATION

2.1 Acoustic system in consideration

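We consider an acoustic system with a single sound source and multiple microphones as shown in Figure 1. The source signal is represented as $s(n)$, where $n$ denotes a discrete time index, and the signals received by the microphones are represented as $x_i(n)$, $i = 1, \ldots, P$. Microphone signals $x_i(n)$ are given by

\[ x_i(n) = h_i(n) * s(n) + w_i(n) = \sum_{k=0}^{J} h_i(k)\, s(n-k) + w_i(n), \tag{1} \]

where $*$ denotes the convolution operation, $h_i(k)$, $k = 0, \ldots, J$, is the room impulse response between the source and the $i$th microphone, and $w_i(n)$ denotes noise. The RTFs are expressed as

\[ H_i(z) = \sum_{k=0}^{J} h_i(k)\, z^{-k}, \quad i = 1, \ldots, P. \tag{2} \]

We assume hereafter that these RTFs have no common zeros among all the channels.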

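Equation (1) can be expressed in matrix form as

\[ \mathbf{x}(n) = \mathbf{H}^T \mathbf{s}(n) + \mathbf{w}(n), \tag{3} \]

where

\[
\begin{aligned}
\mathbf{x}(n) &= \big[\mathbf{x}_1(n)^T, \ldots, \mathbf{x}_P(n)^T\big]^T, & \mathbf{x}_i(n) &= \big[x_i(n), \ldots, x_i(n-M+1)\big]^T, \\
\mathbf{w}(n) &= \big[\mathbf{w}_1(n)^T, \ldots, \mathbf{w}_P(n)^T\big]^T, & \mathbf{w}_i(n) &= \big[w_i(n), \ldots, w_i(n-M+1)\big]^T, \\
\mathbf{s}(n) &= \big[s(n), \ldots, s(n-M-J+1)\big]^T, & \mathbf{H} &= \big[\mathbf{H}_1, \ldots, \mathbf{H}_P\big],
\end{aligned}
\tag{4}
\]

\[
\mathbf{H}_i =
\underbrace{\begin{pmatrix}
h_i(0) & 0 & \cdots & 0 \\
h_i(1) & h_i(0) & \ddots & \vdots \\
\vdots & h_i(1) & \ddots & 0 \\
h_i(J) & \vdots & \ddots & h_i(0) \\
0 & h_i(J) & & h_i(1) \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_i(J)
\end{pmatrix}}_{M}.
\tag{5}
\]

Here, $\mathbf{H}_i$ has $M + J$ rows and $M$ columns, and $M$ is the inverse filter length for each channel. The objective of dereverberation is to recover the source signal $s(n)$ from the received signal $\mathbf{x}(n)$. This is achieved by filtering the received signal with the inverse filter of the room acoustic system $\mathbf{H}$.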
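The matrix form above can be checked numerically. The following is a minimal sketch, assuming NumPy and random stand-in impulse responses; the helper names `conv_matrix` and `stacked_rtf_matrix` are hypothetical and not taken from the paper. It builds $\mathbf{H}_i$ with $M + J$ rows and $M$ columns and confirms that $\mathbf{H}^T \mathbf{s}(n)$ reproduces the noise-free convolution in (1).

```python
import numpy as np

def conv_matrix(h, M):
    """(M + J) x M convolution matrix: column k holds h delayed by k samples."""
    J = len(h) - 1
    Hi = np.zeros((M + J, M))
    for k in range(M):
        Hi[k:k + J + 1, k] = h
    return Hi

def stacked_rtf_matrix(hs, M):
    """H = [H_1, ..., H_P], with shape (M + J) x (P * M)."""
    return np.hstack([conv_matrix(h, M) for h in hs])

rng = np.random.default_rng(0)
P, J, M = 2, 4, 4                        # toy sizes (assumption)
hs = [rng.standard_normal(J + 1) for _ in range(P)]
H = stacked_rtf_matrix(hs, M)

s = rng.standard_normal(50)              # stand-in source signal
n = 30
s_vec = s[n - np.arange(M + J)]          # [s(n), ..., s(n-M-J+1)]
x_vec = H.T @ s_vec                      # noise-free part of x(n)

# Reference: direct convolution, sampled at n, n-1, ..., n-M+1 per channel.
x_direct = np.concatenate(
    [np.convolve(h, s)[n - np.arange(M)] for h in hs]
)
```

Both vectors stack the $M$ most recent samples of each microphone signal, channel by channel.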

2.2 Inverse filter calculation

Generally, the inverse filter vector, denoted as $\mathbf{g}$, is calculated by minimizing the following cost function:

\[ \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2, \tag{6} \]

where $\|\mathbf{a}\|$ denotes the $l_2$-norm of vector $\mathbf{a}$,

\[
\mathbf{g} = \big[\mathbf{g}_1^T, \ldots, \mathbf{g}_P^T\big]^T \ (PM \text{ elements}), \quad
\mathbf{g}_i = \big[g_i(0), \ldots, g_i(M-1)\big]^T, \quad
\mathbf{v} = \big[\underbrace{0, \ldots, 0}_{d}, 1, 0, \ldots, 0\big]^T,
\tag{7}
\]

and $d$ is the modeling delay [14]. Here, the modeling delay can be selected arbitrarily. By applying this inverse filter $\mathbf{g}$ to the microphone signals, the filter's output signal is equivalent to the input signal delayed by $d$ taps. Hereafter, we consider that impulse responses $h_i(n)$ are normalized by their norm. When the RTF matrix $\mathbf{H}$ is given, such an inverse filter set can be calculated as

\[ \mathbf{g} = \mathbf{H}^{+} \mathbf{v}, \tag{8} \]

where $\mathbf{A}^{+}$ is the Moore-Penrose pseudoinverse of matrix $\mathbf{A}$ [15]. The inverse filter set is calculated based on the multiple-input/output inverse theorem (MINT) [1]. The filter set with minimum length is obtained by setting $M$ so that matrix $\mathbf{H}$ is square, which leads to $M = M_{\min} = J/(P-1)$. The filter length can also be set larger than $J/(P-1)$.
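As an illustration of (8), the sketch below computes a minimum-length inverse filter set with NumPy's pseudoinverse and checks that the equalized response $\mathbf{H}\mathbf{g}$ equals the delayed impulse $\mathbf{v}$. The impulse responses are random stand-ins and the sizes are toy values; both are assumptions made for the example, and `conv_matrix` is a hypothetical helper name.

```python
import numpy as np

def conv_matrix(h, M):
    """(M + J) x M convolution matrix: column k holds h delayed by k samples."""
    J = len(h) - 1
    Hi = np.zeros((M + J, M))
    for k in range(M):
        Hi[k:k + J + 1, k] = h
    return Hi

rng = np.random.default_rng(1)
P, J = 2, 4
M = J // (P - 1)                       # minimum length, so H is square
hs = [rng.standard_normal(J + 1) for _ in range(P)]
H = np.hstack([conv_matrix(h, M) for h in hs])

d = 2                                  # modeling delay (toy value)
v = np.zeros(M + J)
v[d] = 1.0

g = np.linalg.pinv(H) @ v              # g = H^+ v, as in (8)
equalized = H @ g                      # should be an impulse at index d
filter_energy = float(g @ g)
```

Random responses almost never share a common zero, so $\mathbf{H}$ is invertible here. For measured RTFs the conditioning can be much worse, which is what motivates the regularization discussed in Section 2.3.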

2.3 Inverse filters with disturbances

When noise is present at the microphones, distortion occurs in the output signal of the inverse filter. The larger the filter energy is, the larger the distortion can be. Thus, we introduce the filter energy into the cost function expressed in (6). By taking the filter energy into consideration, the cost function is modified as follows:

\[ \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2 + \delta \|\mathbf{g}\|^2, \tag{9} \]

where $\delta$ ($\geq 0$) is a scalar variable. This parameter determines how much weight to assign to the energy term, and thus determines a tradeoff between the filter's accuracy and the amount of distortion. The same formulation is applied as the one used in multichannel active noise control systems [14, 16]. We would like to derive a solution that minimizes this cost function. Equation (9) can be rewritten as

\[ \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2 + \delta \|\mathbf{g}\|^2 = \mathbf{g}^T \mathbf{H}^T \mathbf{H} \mathbf{g} - \mathbf{g}^T \mathbf{H}^T \mathbf{v} - \mathbf{v}^T \mathbf{H} \mathbf{g} + \mathbf{v}^T \mathbf{v} + \delta\, \mathbf{g}^T \mathbf{g}. \tag{10} \]

By taking derivatives with respect to $\mathbf{g}$ and setting them equal to zero, the following solution is derived:

\[ \mathbf{g}_r = \left(\mathbf{H}^T \mathbf{H} + \delta \mathbf{I}\right)^{-1} \mathbf{H}^T \mathbf{v}, \tag{11} \]

where $\mathbf{I}$ is an identity matrix. This solution has a similar form to that of Tikhonov regularization for ill-posed problems [11–13, 17]. We hereafter refer to $\delta$ as a regularization parameter, and $\mathbf{g}_r$ as an inverse filter vector with regularization.
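The step from the expanded cost function to the regularized solution can be spelled out as follows. This is a standard least-squares derivation written in the notation of this section; it is a sketch of the intermediate step, not a reproduction of the authors' working.

```latex
\begin{aligned}
\frac{\partial}{\partial \mathbf{g}}
\left( \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2 + \delta \|\mathbf{g}\|^2 \right)
&= 2\,\mathbf{H}^T \mathbf{H}\,\mathbf{g} - 2\,\mathbf{H}^T \mathbf{v} + 2\,\delta\,\mathbf{g} = \mathbf{0} \\
\Longrightarrow \quad
\left( \mathbf{H}^T \mathbf{H} + \delta \mathbf{I} \right) \mathbf{g} &= \mathbf{H}^T \mathbf{v} \\
\Longrightarrow \quad
\mathbf{g}_r &= \left( \mathbf{H}^T \mathbf{H} + \delta \mathbf{I} \right)^{-1} \mathbf{H}^T \mathbf{v}.
\end{aligned}
```

Because $\mathbf{H}^T \mathbf{H}$ is positive semidefinite, the matrix $\mathbf{H}^T \mathbf{H} + \delta \mathbf{I}$ is positive definite and hence invertible for any $\delta > 0$, even when $\mathbf{H}$ itself is rank deficient.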

Equation (11) is an optimum solution when the interference noise is white noise with small variance $\delta$, in which case the term $\delta \mathbf{I}$ corresponds to the correlation matrix of the noise. If colored noise is considered as a more general case, the term $\delta \mathbf{I}$ is replaced with the noise correlation matrix as

\[ \mathbf{g}_r = \left(\mathbf{H}^T \mathbf{H} + \mathbf{R}_n\right)^{-1} \mathbf{H}^T \mathbf{v}, \tag{12} \]

where $\mathbf{R}_n$ is the noise correlation matrix.

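Then, let us consider the situation where the RTFs fluctuate. Suppose the fluctuated RTFs are denoted as $\bar{\mathbf{H}} + \tilde{\mathbf{H}}$, where $\bar{\mathbf{H}}$ and $\tilde{\mathbf{H}}$ represent the mean RTF and the fluctuation from the mean RTF, respectively. In this case, we consider the ensemble mean of the total squared error,

\[
\begin{aligned}
E\big\{ \|(\bar{\mathbf{H}} + \tilde{\mathbf{H}})\mathbf{g} - \mathbf{v}\|^2 \big\}
&= E\big\{ (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g})^T (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g}) \big\} \\
&= (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})^T (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})
 + E\big\{ (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})^T \tilde{\mathbf{H}}\mathbf{g} + (\tilde{\mathbf{H}}\mathbf{g})^T (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v}) + \mathbf{g}^T \tilde{\mathbf{H}}^T \tilde{\mathbf{H}} \mathbf{g} \big\} \\
&= \mathbf{g}^T \bar{\mathbf{H}}^T \bar{\mathbf{H}} \mathbf{g} - \mathbf{g}^T \bar{\mathbf{H}}^T \mathbf{v} - \mathbf{v}^T \bar{\mathbf{H}} \mathbf{g} + \mathbf{v}^T \mathbf{v} + \mathbf{g}^T E\{\tilde{\mathbf{H}}^T \tilde{\mathbf{H}}\}\, \mathbf{g},
\end{aligned}
\tag{13}
\]

where $E\{\cdot\}$ represents the expectation operation. In this derivation, we assume that $E\{\tilde{\mathbf{H}}\}$ is a zero matrix. Then, the following filter minimizes the cost function expressed in (13):

\[ \mathbf{g}_r = \left(\bar{\mathbf{H}}^T \bar{\mathbf{H}} + \mathbf{R}_H\right)^{-1} \bar{\mathbf{H}}^T \mathbf{v}, \tag{14} \]

where $\mathbf{R}_H = E\{\tilde{\mathbf{H}}^T \tilde{\mathbf{H}}\}$. From the discussion above, we can treat the disturbances by using a filter of the following form:

\[ \mathbf{g}_r = \left(\hat{\mathbf{H}}^T \hat{\mathbf{H}} + \mathbf{R}\right)^{-1} \hat{\mathbf{H}}^T \mathbf{v}, \tag{15} \]

where $\hat{\mathbf{H}}$ is either $\mathbf{H}$ or the mean RTF $\bar{\mathbf{H}}$, and $\mathbf{R}$ is the correlation matrix of either the noise, $\mathbf{R}_n$, or the fluctuation, $\mathbf{R}_H$. If the fluctuation could be regarded as white noise, $\mathbf{R} = \delta \mathbf{I}$ could be applied to the inverse filter. In the following experiments, we investigate the performance of the inverse filter of the form

\[ \mathbf{g}_r = \left(\hat{\mathbf{H}}^T \hat{\mathbf{H}} + \delta \mathbf{I}\right)^{-1} \hat{\mathbf{H}}^T \mathbf{v}, \tag{16} \]

where

\[
\hat{\mathbf{H}} =
\begin{cases}
\mathbf{H} & \text{(noise case)}, \\
\bar{\mathbf{H}} & \text{(fluctuation case)}.
\end{cases}
\tag{17}
\]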
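The general form (15) and its white-noise special case (16) can be sketched in a few lines. The function name `regularized_inverse_filter` is hypothetical, and the matrix used here is a random stand-in rather than a measured RTF matrix; the example only illustrates how the filter energy and the equalization error move in opposite directions as $\delta$ grows.

```python
import numpy as np

def regularized_inverse_filter(H_hat, v, R):
    """Solve (H^T H + R) g = H^T v, i.e. the filter form in (15)."""
    A = H_hat.T @ H_hat + R
    return np.linalg.solve(A, H_hat.T @ v)

rng = np.random.default_rng(0)
size = 12
H_hat = rng.standard_normal((size, size))   # stand-in RTF matrix (assumption)
v = np.zeros(size)
v[3] = 1.0                                  # modeling delay d = 3

deltas = [1e-6, 1e-4, 1e-2, 1e-1]
energies, errors = [], []
for delta in deltas:
    # R = delta * I gives the white-noise form (16).
    g = regularized_inverse_filter(H_hat, v, delta * np.eye(size))
    energies.append(float(g @ g))                        # filter energy
    errors.append(float(np.sum((H_hat @ g - v) ** 2)))   # equalization error
```

Passing a full correlation matrix for `R` instead of `delta * np.eye(size)` gives the colored-noise form (12) or the fluctuation form (14).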

2.4 Influence of design parameters on filter energy

Regularization parameter $\delta$ increases the minimum eigenvalue of matrix $(\hat{\mathbf{H}}^T \hat{\mathbf{H}} + \delta \mathbf{I})$ in (16), and hence reduces the norm of the inverse filter. Increasing the regularization parameter is thus believed to reduce the sensitivity to RTF variations and noise. On the other hand, increasing this parameter reduces the accuracy of the inverse filter with respect to the true RTFs.

The effect of the filter length can be expected as follows. Equation (16) will give the minimum norm filter for a given length $M$. By increasing the filter length, we compare various filters with different lengths, and consequently expect that the filter with the smallest norm can be found.

A modeling delay $d$ is also used to make the inverse filter stable. When a nonzero modeling delay $d$ ($d \geq 1$) is used, we also expect the filter norm to be reduced because the causality constraint is relaxed. The filter may correspond to the minimum-norm solution that could be obtained in the frequency domain [18].

As described above, we can expect the regularization parameter, filter length, and modeling delay to be effective in reducing the filter energy.
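The effect of $\delta$ on the filter energy can be made explicit with a singular value decomposition. The following is a standard identity for Tikhonov-regularized least squares, written under the assumption that $\hat{\mathbf{H}} = \sum_i \sigma_i \mathbf{u}_i \mathbf{z}_i^T$ with singular values $\sigma_i$ and orthonormal singular vectors $\mathbf{u}_i$, $\mathbf{z}_i$ (symbols introduced here for illustration only).

```latex
\begin{aligned}
\mathbf{g}_r &= \sum_i \frac{\sigma_i}{\sigma_i^2 + \delta}\,
                \left(\mathbf{u}_i^T \mathbf{v}\right) \mathbf{z}_i, \\
\|\mathbf{g}_r\|^2 &= \sum_i \frac{\sigma_i^2}{\left(\sigma_i^2 + \delta\right)^2}
                      \left(\mathbf{u}_i^T \mathbf{v}\right)^2, \\
\|\hat{\mathbf{H}}\mathbf{g}_r - \mathbf{v}\|^2 &=
   \sum_i \left(\frac{\delta}{\sigma_i^2 + \delta}\right)^2
   \left(\mathbf{u}_i^T \mathbf{v}\right)^2
   + \|\mathbf{v}_\perp\|^2 .
\end{aligned}
```

Here $\mathbf{v}_\perp$ is the part of $\mathbf{v}$ outside the range of $\hat{\mathbf{H}}$. Each term of the energy decreases as $\delta$ grows, while each term of the equalization error increases, which is the tradeoff described above. Small singular values $\sigma_i$ contribute terms of order $1/\sigma_i^2$ to the energy when $\delta = 0$, and these are the terms that regularization suppresses most strongly.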

Figure 2: Source and microphone arrangement. M1, M2, M3, and M4 denote the microphones. Room height: 250 cm; microphone height: 100 cm; loudspeaker height: 150 cm. The plan view carries the dimension labels 20 cm, 100 cm, and 445 cm.

3 EXPERIMENTS ON THE EFFECT OF NOISE

Experiments were performed to verify the effectiveness of our strategy in the presence of additive white noise.

3.1 Experimental setup

Figure 2 shows the arrangement of the source and the microphones used in the experiment. Four microphones are used ($P = 4$), and room impulse responses between the source and the microphones are simulated by using the image method [19]. The sampling frequency is set at 8 kHz. The impulse responses are truncated to 4000 samples ($J = 3999$), corresponding to $-60$ dB attenuation (the reverberation time of the room is 500 ms). Figure 3 shows an example of the impulse response and its frequency response.

We define the input and output SNRs as follows. For the $i$th microphone, the input SNR is defined as

\[ \mathrm{SNR}_{\mathrm{in}} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} y_i^2(n)}{\sum_{n=0}^{N} w_i^2(n)} \right), \tag{18} \]

where $y_i(n)$ is the reverberant signal without noise, and $w_i(n)$ is the noise. In the experiment, we adjust the input SNR by controlling the amplitude of the noise signal. The output SNR is defined as

\[ \mathrm{SNR}_{\mathrm{out}} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} \big(\mathbf{y}(n)^T \mathbf{g}_r\big)^2}{\sum_{n=0}^{N} \big(\mathbf{w}(n)^T \mathbf{g}_r\big)^2} \right), \tag{19} \]

where $\mathbf{y}(n) = \mathbf{H}^T \mathbf{s}(n)$ is the reverberant signal vector. This output SNR is obtained by filtering the reverberant and the noise signals separately and taking the power ratio of the output signals.
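The two SNR definitions translate directly into code. The sketch below is an illustration only: the function names are hypothetical, and the signals are white-noise stand-ins. It also shows one way to scale the noise so that a target input SNR is reached, which is how the text describes adjusting the input SNR.

```python
import numpy as np

def snr_db(signal, noise):
    """Power ratio of two signals in dB, as in (18)."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def scale_noise_to_snr(y, w, target_db):
    """Scale noise w so that snr_db(y, scaled w) equals target_db."""
    gain = np.sqrt(np.sum(y ** 2) / (np.sum(w ** 2) * 10.0 ** (target_db / 10.0)))
    return gain * w

def output_snr_db(Y, W, g):
    """Output SNR as in (19): rows of Y and W hold y(n)^T and w(n)^T,
    and the reverberant and noise parts are filtered separately."""
    return snr_db(Y @ g, W @ g)

rng = np.random.default_rng(0)
y = rng.standard_normal(1000)                  # stand-in reverberant signal
w = scale_noise_to_snr(y, rng.standard_normal(1000), 20.0)
snr_in = snr_db(y, w)

# Toy output-SNR example with random signal vectors and a random filter.
Y = rng.standard_normal((200, 6))
W = 0.1 * rng.standard_normal((200, 6))
g = rng.standard_normal(6)
snr_out = output_snr_db(Y, W, g)
```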

Figure 3: Waveform of a room impulse response $h_1(n)$ and its frequency characteristics. Horizontal axes: time (ms) for the waveform and frequency (Hz), from 0 to 4000 Hz, for the frequency response.

3.2 Evaluation criteria

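In order to avoid any dependency of the results on the source signal, we used uncorrelated white signals with a duration of 3 seconds for both the source signal and the noise rather than speech. The dereverberation performance is evaluated by using the signal-to-distortion ratio (SDR) defined as

\[ \mathrm{SDR} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} s^2(n)}{\sum_{n=0}^{N} \big(s(n) - \hat{s}(n)\big)^2} \right), \tag{20} \]

where $s(n)$ is the original source signal and $\hat{s}(n)$ is the output signal of the inverse filter, defined as $\hat{s}(n) = \mathbf{x}(n)^T \mathbf{g}_r$ and presumably aligned with $s(n)$ by compensating for the modeling delay $d$.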
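A small helper makes the SDR computation concrete. This is a sketch under one stated assumption: the filter output is delayed by $d$ samples relative to the source, so the delay is removed before the two signals are compared. The function name `sdr_db` and the test signals are illustrative.

```python
import numpy as np

def sdr_db(s, s_hat, d=0):
    """Signal-to-distortion ratio in dB after removing a known delay of d samples."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    n = len(s) - d
    ref = s[:n]                # s(0), ..., s(n-1)
    est = s_hat[d:d + n]       # estimate of the same samples, delay removed
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

rng = np.random.default_rng(0)
s = rng.standard_normal(2000)                         # white stand-in source
delayed = np.concatenate([np.zeros(5), s])[:2000]     # s delayed by 5 samples
noisy = delayed + 0.1 * rng.standard_normal(2000)     # add a small distortion
value = sdr_db(s, noisy, d=5)
```

With a distortion of one tenth the source amplitude, the result lands close to 20 dB, since the power ratio is roughly 100.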

3.3 Results

Figure 4 shows the filter energy with various modeling delays and regularization parameters when the minimum filter length $M = M_{\min} = 1333$ is used, as described in Section 2.2. The energy decreases with increases in both the modeling delay and the regularization parameter, and shows the minimum value when $\delta = 10^{-1}$ and $d = 500$.

Figure 4: Filter energy as a function of regularization parameter and modeling delay (filter length is fixed at $M = 1333$). Curves: $d = 0$, 100, 200, 300, 400, and 500.

Figure 5 shows the inverse filter calculated with $\delta = 10^{-6}$ and $\delta = 10^{-1}$ when the modeling delay is fixed at $d = 500$. We clearly observed that the filter energy was reduced by increasing the regularization parameter.

Figure 5: An example of inverse filter $g_1(n)$ calculated with $\delta = 10^{-6}$ (a) and $\delta = 10^{-1}$ (b) (modeling delay is fixed at $d = 500$).

Figure 6 shows the performance of the inverse filter with an input SNR of 20 dB. We observed that a proper regularization parameter value of $\delta = 10^{-2}$ gives the largest SDR for all the modeling delay values. This regularization parameter corresponds to the input SNR (20 dB). When the regularization parameter is smaller than $10^{-2}$, the performance decreased monotonically as the regularization parameter decreased, in line with the increase in the filter energy. Even though the filter norm decreases with $\delta = 10^{-1}$, the performance also deteriorated because the accuracy of the filter decreased and the deviation of the equalized response from the ideal one became large.

Figure 6: Performance as a function of regularization parameter and modeling delay with an SNR of 20 dB (filter length is fixed at $M = 1333$). Curves: $d = 0$, 100, 200, 300, 400, and 500.

In the second experiment, the modeling delay was fixed at $d = 500$, and the effect of filter length $M$ was investigated with various regularization parameters $\delta$. Figures 7 and 8 show the filter energy and corresponding performance in this case. In Figure 7, the energy decreases with increases in both the filter length and the regularization parameter, although the effect of the filter length is less significant when a large regularization parameter such as $\delta = 10^{-1}$ to $\delta = 10^{-2}$ is used. In Figure 8, the best performance was obtained with $\delta = 10^{-2}$ for all the filter lengths used in this experiment, which corresponds to the input SNR level. The performance was also improved by using the larger filter length.

Figure 7: Filter energy as a function of regularization parameter and filter length (modeling delay is fixed at $d = 500$). Curves: $M = M_{\min}$, $M_{\min} + 100$, $M_{\min} + 200$, $M_{\min} + 300$, $M_{\min} + 400$, and $M_{\min} + 500$.

Figure 8: Performance as a function of regularization parameter and filter length with an SNR of 20 dB (modeling delay is fixed at $d = 500$). Curves: $M = M_{\min}$, $M_{\min} + 100$, $M_{\min} + 200$, $M_{\min} + 300$, $M_{\min} + 400$, and $M_{\min} + 500$.

In the third experiment, we evaluated the performance for several SNR values by using modeling delay $d = 500$ and filter length $M = 1333$ (minimum case), or $M = 1333 + 500$ (lengthened case). Figure 9 shows the results obtained with input SNRs of 10, 20, 30, and 40 dB. As the input SNR increases, the regularization parameter that provides the best performance decreases. We observe that the best regularization parameter corresponds to the input SNR. We also observe that the performance evaluated with SDR is bounded by the input SNR level. In addition, when the input SNR is 20 dB, the output SNR defined in (19) is about 20 dB, indicating that the input noise is not amplified.

By using a proper delay and a larger filter length, the inverse filter's energy and equalization error can be reduced. Furthermore, an appropriate choice of the regularization parameter is effective for reducing the equalization error. In the next section, we investigate the applicability of this strategy to the RTF fluctuations.
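The observation that the best regularization parameter tracks the input SNR suggests a simple rule of thumb. The text only states one explicit pair ($\delta = 10^{-2}$ at 20 dB) and the trend that the best $\delta$ decreases as the SNR increases, so the mapping $\delta = 10^{-\mathrm{SNR}/10}$ below is an assumption consistent with those statements, not a formula given by the authors. It also presumes impulse responses normalized as in Section 2.2.

```python
def delta_from_snr_db(snr_db):
    """Hypothetical rule of thumb: noise-to-signal power ratio as delta."""
    return 10.0 ** (-snr_db / 10.0)

# Candidate regularization parameters for the SNR values used in Figure 9.
candidates = {snr: delta_from_snr_db(snr) for snr in (10, 20, 30, 40)}
```

In practice the input SNR would itself have to be estimated, so a value obtained this way is best treated as a starting point for a search over nearby values of $\delta$.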

4 EXPERIMENTS FOR RTF FLUCTUATIONS

Simulations are undertaken to investigate the effect of the RTF fluctuations on the inverse filter. Here, we consider the fluctuations caused by source position fluctuations in the horizontal plane for the sake of simplicity. The more general case of three-dimensional fluctuations is not investigated in this paper.

4.1 Experimental setup

We consider the same room as in the previous experiment, shown in Figure 2. We simulate the fluctuations in source position as follows. As shown in Figure 10, we consider $N$ equally spaced new positions placed on a circle of radius $r$ centered at the original position. As a model of fluctuation, we assume that the source is located at each of these $N$ positions with equal probability, and that the averaged RTF over these positions is obtained through either measurement or estimation. This averaged RTF is referred to as the "reference RTF," and is used to calculate inverse filters according to (16). In the following simulation, the number of source positions is fixed at $N = 8$.

4.2 Evaluation procedure

The performance of the inverse filter for fluctuations in the source position is evaluated as follows.

(1) An inverse filter set is calculated based on the reference RTFs according to (16).

(2) For each new source position $j$ ($j = 1, \ldots, 8$), equalization is achieved by filtering reverberant signals with the inverse filter set calculated in (1).

(3) SDR values are calculated for all of the dereverberated signals obtained in (2), and the SDR values are averaged over the 8 positions to obtain the overall performance measure.
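The three steps above can be sketched end to end. Everything specific in this example is an assumption made for illustration: the impulse responses are short random sequences rather than image-method simulations, the position-dependent fluctuation is modeled as a small random perturbation, and the helper names are hypothetical.

```python
import numpy as np

def conv_matrix(h, M):
    """(M + J) x M convolution matrix: column k holds h delayed by k samples."""
    J = len(h) - 1
    Hi = np.zeros((M + J, M))
    for k in range(M):
        Hi[k:k + J + 1, k] = h
    return Hi

def design_filter(hs, M, d, delta):
    """Regularized inverse filters of the form (16); returns a (P, M) array."""
    H = np.hstack([conv_matrix(h, M) for h in hs])
    v = np.zeros(H.shape[0])
    v[d] = 1.0
    g = np.linalg.solve(H.T @ H + delta * np.eye(H.shape[1]), H.T @ v)
    return g.reshape(len(hs), M)

def sdr_db(s, s_hat, d):
    """SDR in dB after removing the modeling delay d."""
    n = len(s) - d
    ref, est = s[:n], s_hat[d:d + n]
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

rng = np.random.default_rng(0)
P, J, M, d, N = 3, 8, 8, 4, 8                     # toy sizes (assumption)
base = rng.standard_normal((P, J + 1))
# Hypothetical fluctuation model: one perturbed response set per position.
positions = [base + 0.02 * rng.standard_normal((P, J + 1)) for _ in range(N)]

reference = np.mean(positions, axis=0)            # step (1): reference RTFs
g = design_filter(reference, M, d, delta=1e-2)

s = rng.standard_normal(4000)                     # white stand-in source
sdrs = []
for hs in positions:                              # step (2): equalize each position
    out = sum(np.convolve(np.convolve(h, s), gi) for h, gi in zip(hs, g))
    sdrs.append(sdr_db(s, out[:len(s)], d))
mean_sdr = float(np.mean(sdrs))                   # step (3): average over positions
```

The filter is designed once from the averaged responses and then applied unchanged at every position, which mirrors the evaluation procedure described in the list.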

4.3 Results

The influence of the design parameters on performance is evaluated in the same manner as in the previous experiment.

Figure 11 shows the performance of an inverse filter designed with various modeling delays $d$ and regularization parameters $\delta$ with radius $r = 1$ cm. This radius corresponds to one eighth of a wavelength of the center frequency of the signals in consideration. Conventional studies have shown considerable degradation in the performance for this displacement. In general, the performance shows a similar tendency to that obtained in the previous experiment. That is, the performance is inversely related to the filter energy, and improves with increases in the regularization parameter and modeling delay. We observed that the best performance was obtained at $\delta = 10^{-2}$ and $d = 500$. However, the performance is rather flat compared with that in Figure 6. For a change of source position of $r = 1$ cm, the best performance was 12 dB.

In the second experiment, the modeling delay was fixed at $d = 500$, and the effects of filter length $M$ and regularization parameter $\delta$ were investigated. Figure 12 shows the performance in this case. Here also, we observed that the performance is inversely related to the filter energy. Furthermore, the performance depends on the regularization parameter less than in the case of additive noise. In the case of additive noise, the noise correlation matrix $\mathbf{R}_n$ in (12) could be well approximated by $\delta \mathbf{I}$. By contrast, the correlation matrix of the fluctuation $\mathbf{R}_H$ in (14) could not be correctly approximated by $\delta \mathbf{I}$.

Figure 13 shows the performance for position variations of $r = 1$, 2, 3, and 4 cm. The modeling delay was set at $d = 500$, and the filter length was set at $M = 1333$ (minimum case) and $M = 1333 + 500$ (lengthened case). In both cases, when $r = 1$ cm, $\delta = 10^{-2}$ shows the maximum SDR value of around 12 dB. For $r = 2$, 3, and 4 cm, the best regularization parameter was $\delta = 10^{-1}$.

Figure 9: Performance as a function of regularization parameter for SNR values of 10, 20, 30, and 40 dB ($d = 500$). Filter length was set at $M = 1333$ (a) and $M = 1333 + 500$ (b), respectively.

Figure 10: Source positions considered in the experiment. Eight new positions, numbered 1 to 8, lie on a circle of radius $r$ cm centered at the original position.

Again, by using an appropriate delay and filter length, the inverse filter's energy could be reduced, and accordingly the inverse filtering performance could be improved. Furthermore, an appropriate choice of regularization parameter was effective. However, the effect of adjusting this regularization parameter is less obvious than with additive noise.

In the next section, we analyze the RTF fluctuations caused by position changes, and discuss the differences between the results for RTF fluctuations and additive noise.

5 DISCUSSION

5.1 Comparison between RTF fluctuations and noise

We compare the results for noise shown in Figure 9 with the results for RTF fluctuations shown in Figure 13. As shown in Figure 9, the dereverberation performance has a maximum point for a certain regularization parameter value, and this best value corresponds to the SNR value of the observed signals. For example, with SNR = 20 dB, the best value is $\delta = 10^{-2}$ and this gives a maximum SDR of 20 dB; that is, we obtained almost the same SDR level as the input SNR. When a smaller $\delta$ is used, such as $10^{-9}$, the filter energy becomes large, which results in a small SDR of 5 dB (minimum-length case) to 10 dB (lengthened filter case). By contrast, for RTF fluctuations of $r = 1$ cm (corresponding to one eighth of a wavelength of the center frequency of the signals in consideration) as shown in Figure 13, although the best value for the regularization parameter is almost the same, that is, $\delta = 10^{-2}$, the corresponding SDR was around 12 dB, and the curve was much broader than in Figure 9. That is, the performance does not depend greatly on $\delta$.

Figure 11: Performance as a function of the regularization parameter and modeling delay (filter length is fixed at $M = 1333$). Curves: $d = 0$, 100, 200, 300, 400, and 500.

Figure 12: Performance as a function of the regularization parameter and additional filter length (modeling delay is fixed at $d = 500$). Curves: $M = M_{\min}$, $M_{\min} + 100$, $M_{\min} + 200$, $M_{\min} + 300$, $M_{\min} + 400$, and $M_{\min} + 500$.

The cause of the difference between these two results is discussed here. We analyze the effect on the performance of using this filter in the fluctuation case, based on the fluctuation model described in Section 4.1. Let us denote the RTF matrix corresponding to each source position as $\mathbf{H}_j = \bar{\mathbf{H}} + \tilde{\mathbf{H}}_j$, where $\bar{\mathbf{H}}$ represents the reference RTF matrix averaged over the positions, and $\tilde{\mathbf{H}}_j$ represents the fluctuation between the reference RTF and the RTF for the $j$th new position. If the source switches back and forth among all the possible positions with equal probability, we can consider that the periods in which the source is located at each position are rearranged and put together. Then, the total error may be calculated as the average of the errors for all the positions as

\[ \frac{1}{N} \sum_{j=1}^{N} \|\mathbf{H}_j \mathbf{g} - \mathbf{v}\|^2 = \frac{1}{N} \sum_{j=1}^{N} \big\|(\bar{\mathbf{H}} + \tilde{\mathbf{H}}_j)\mathbf{g} - \mathbf{v}\big\|^2. \tag{21} \]

By considering a sufficiently large number $N$, we replace spatial averaging with an expectation,

\[ E\big\{ (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g})^T (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g}) \big\}. \tag{22} \]

This turns out to be (13).
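For a finite set of positions, the link between the averaged error and the expanded form in (13) holds exactly when the reference RTF is the sample mean, because the fluctuations then sum to zero and the cross terms vanish. The sketch below checks this identity numerically with random stand-in matrices; the sizes and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, rows, cols = 8, 10, 12
H_list = [rng.standard_normal((rows, cols)) for _ in range(N)]  # stand-ins for H_j

H_bar = sum(H_list) / N                        # reference (mean) RTF matrix
H_til = [Hj - H_bar for Hj in H_list]          # fluctuations; they sum to zero
R_H = sum(Ht.T @ Ht for Ht in H_til) / N       # sample estimate of E{H~^T H~}

g = rng.standard_normal(cols)                  # any filter vector
v = np.zeros(rows)
v[2] = 1.0

# Left side: average error over positions, as in (21).
lhs = np.mean([np.sum((Hj @ g - v) ** 2) for Hj in H_list])
# Right side: mean-RTF error plus the fluctuation term, as in (13).
rhs = np.sum((H_bar @ g - v) ** 2) + g @ R_H @ g
```

The two quantities agree to rounding error for any choice of `g`, which is why minimizing (13) with the sample correlation matrix minimizes the position-averaged error.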

Figure 13: Performance as a function of the regularization parameter for position variations of $r = 1$, 2, 3, and 4 cm ($d = 500$). Filter length was set at $M = 1333$ (a) and $M = 1333 + 500$ (b), respectively.

Let us compare the autocorrelation of an example RTF fluctuation with that of a random signal used in the experiment. Figure 14 shows these autocorrelations. There is a discrepancy between these two correlations. This may explain why the adjustment of the regularization parameter is of limited efficiency in the presence of RTF fluctuations.

Figure 14: Autocorrelation coefficients. (a) Autocorrelation trace of RTF fluctuations, $r = 1$ cm. (b) Autocorrelation trace of a random signal. Horizontal axis: time (samples).

Then, the inverse filter in (15) is used to compare the performance with $\hat{\mathbf{H}} = \bar{\mathbf{H}}$ and regularization matrices $\mathbf{R}$ defined as

(1) $\mathbf{R} = \delta \mathbf{I}$, $\delta = 10^{-2}$,

(2) $\mathbf{R} = E\{\tilde{\mathbf{H}}^T \tilde{\mathbf{H}}\} \approx (1/8) \sum_{j=1}^{8} \tilde{\mathbf{H}}_j^T \tilde{\mathbf{H}}_j$, $\tilde{\mathbf{H}}_j = \mathbf{H}_j - \bar{\mathbf{H}}$.

Table 1: Regularization performance. Regularization matrix $\mathbf{R}$: (1) $\delta \mathbf{I}$, $\delta = 10^{-2}$; (2) $E\{\tilde{\mathbf{H}}^T \tilde{\mathbf{H}}\} \approx (1/8) \sum_{j=1}^{8} \tilde{\mathbf{H}}_j^T \tilde{\mathbf{H}}_j$.

The performance of the inverse filter calculated with (15) is shown in Table 1. The performance with the correlation matrix in (2) is improved by 3.7 dB compared with the matrix in (1). This result shows the effect of incorporating the autocorrelation of the RTF fluctuations. If the time structure of the fluctuations could be obtained, for example by estimating the averaged autocorrelation of the fluctuation, more robust inverse filters could be obtained. Future work should include finding ways to estimate the time structure of such fluctuations.
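The comparison between the two regularization matrices can be reproduced in miniature. The following sketch uses random stand-in matrices and a hypothetical perturbation model, so the size of the gap it shows says nothing about the 3.7 dB figure reported in the text. It only illustrates that the filter built from the sample correlation matrix cannot do worse, in position-averaged error, than the filter built from $\delta \mathbf{I}$.

```python
import numpy as np

def filter_with_R(H, v, R):
    """Inverse filter of the form (15) for a given regularization matrix R."""
    return np.linalg.solve(H.T @ H + R, H.T @ v)

def mean_error(H_list, g, v):
    """Equalization error averaged over all positions."""
    return float(np.mean([np.sum((Hj @ g - v) ** 2) for Hj in H_list]))

rng = np.random.default_rng(1)
N, rows, cols = 8, 10, 12
H0 = rng.standard_normal((rows, cols))
# Hypothetical fluctuation model: random perturbations around H0.
H_list = [H0 + 0.1 * rng.standard_normal((rows, cols)) for _ in range(N)]

H_bar = sum(H_list) / N
R_H = sum((Hj - H_bar).T @ (Hj - H_bar) for Hj in H_list) / N
v = np.zeros(rows)
v[2] = 1.0

g_white = filter_with_R(H_bar, v, 1e-2 * np.eye(cols))   # case (1): R = delta * I
g_corr = filter_with_R(H_bar, v, R_H)                    # case (2): R = sample R_H
err_white = mean_error(H_list, g_white, v)
err_corr = mean_error(H_list, g_corr, v)
```

The guarantee follows because `g_corr` is the exact minimizer of the position-averaged error when `H_bar` is the sample mean. In a real system the fluctuation statistics would have to be estimated, which is the open problem the text points to.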

5.2 Results of speech dereverberation

Finally, the dereverberation performance is shown using speech signals. Figure 15 shows spectrograms of the (a) original, (b) reverberant, and (c), (d) dereverberated speech signals. The reference RTFs were used to calculate the inverse filter, and the RTFs corresponding to the 5th new position in Figure 10 were used to calculate the reverberant speech and for dereverberation. The source position change is 1 cm. The filter length was set at $M = 1333$, and the modeling delay was $d = 500$. The SDR of the reverberant speech is 1.8 dB. Figure 15(c) shows a spectrogram of the dereverberated speech signal filtered by the inverse filter with the regularization parameter $\delta = 10^{-9}$. Although the figure appears less reverberant than Figure 15(b), there is some degradation, and an SDR of 10.9 dB was obtained. Figure 15(d) shows a spectrogram of the dereverberated speech filtered by the inverse filter with $\delta = 10^{-2}$. When the proper regularization parameter was used, the SDR improved to 17 dB. This SDR value is 5 dB higher than that obtained using a white signal as shown in Figure 13. This difference comes from the fact that the distortion mainly occurs in the higher frequency range, where speech has low energy.

Figure 16(a) shows a spectrogram of noisy and reverberant speech. The SNR level at the microphone is 20 dB, and the SDR with respect to the source speech signal is 0.5 dB. Figure 16(b) shows a spectrogram of the dereverberated signal when $\delta = 10^{-9}$ is used. The SDR of the dereverberated speech signal is 5.1 dB. Although it appears less reverberant, the frequency components of the speech are buried in those of the noise. This is because the incoming noise was amplified by the filter. Figure 16(c) shows a spectrogram of the dereverberated signal when $\delta = 10^{-2}$ is used. When the proper regularization parameter was used, the noise became less noticeable, because the filter energy was small. As a result, an SDR of 15.9 dB was achieved while the output SNR was kept over 20 dB.

Figure 15: Spectrograms of speech signals. (a) Clean speech. (b) Reverberant speech (SDR = 1.8 dB). (c) Recovered speech with fluctuation ($\delta = 10^{-9}$, SDR = 10.9 dB). (d) Recovered speech with fluctuation ($\delta = 10^{-2}$, SDR = 17 dB). Axes: time (s) and frequency from 0 to 4000 Hz.

6 CONCLUSION

With a view to extending the applicability of inverse-filter-based dereverberation, this paper examined a design method for an inverse filter, in which the filter design parameters were adjusted to reduce the filter energy. The regularization parameter, modeling delay, and filter length were selected to improve the performance when the RTFs fluctuated and when slight interference noise was present in the microphone signals. Simulation results showed that the inverse filtering performance could be improved by properly adjusting the design parameters, which led to a reduction in the filter energy. Consequently, this approach was shown to be effective for both RTF fluctuation and interference noise.

We discussed the differences between the results we obtained for RTF fluctuations and white noise. We observed that the performance with the regularization parameter did not improve greatly with regard to the RTF fluctuations, while the performance for the white noise showed a clear peak corresponding to the input SNR level. This is because RTF fluctuations are not random, whereas the regularized inverse filter implicitly assumes that the fluctuation is random. To demonstrate this, we used the autocorrelation of the fluctuation to calculate the inverse filter. The simulation result revealed that the RTF fluctuation had time structures. Future work thus includes finding ways to incorporate the time structure of such fluctuations into the filter design process.

Systematic determination of the design parameters also remains as future work. Among the design parameters, a proper choice of the regularization parameter was important for the improvement in the performance, and the choice of the filter length and the modeling delay was less crucial than that of the regularization parameter. In the noisy case, the optimum regularization parameter that provides the best performance corresponds to the input SNR level, as shown in Figure 9. Thus, one way to determine the parameter is through the estimation of the input SNR [20]. For the RTF fluctuations, on the other hand, automatic determination of the parameter may not be simple. However, we observed from the results shown in Figure 13 that a relatively large value such as $\delta = 10^{-1}$ was effective in avoiding the degradation for small positional changes. Thus, using such a large value may be one solution for the RTF fluctuations.
