Volume 2007, Article ID 34013, 12 pages
doi:10.1155/2007/34013
Research Article
Inverse Filtering for Speech Dereverberation Less Sensitive to Noise and Room Transfer Function Fluctuations
Takafumi Hikichi, Marc Delcroix, and Masato Miyoshi
Media Information Laboratory, NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho,
Soraku-gun, Kyoto 619-0237, Japan
Received 16 November 2006; Accepted 2 February 2007
Recommended by Liang-Gee Chen
Inverse filtering of room transfer functions (RTFs) is considered an attractive approach for speech dereverberation given that the time invariance assumption of the used RTFs holds. However, in a realistic environment, this assumption is not necessarily guaranteed, and the performance is degraded because the RTFs fluctuate over time and the inverse filter fails to remove the effect of the RTFs. The inverse filter may amplify a small fluctuation in the RTFs and may cause large distortions in the filter’s output. Moreover, when interference noise is present at the microphones, the filter may also amplify the noise. This paper proposes a design strategy for the inverse filter that is less sensitive to such disturbances. We consider that reducing the filter energy is the key to making the filter less sensitive to the disturbances. Using this idea as a basis, we focus on the influence of three design parameters on the filter energy and the performance, namely, the regularization parameter, modeling delay, and filter length. By adjusting these three design parameters, we confirm that the performance can be improved in the presence of RTF fluctuations and interference noise.
Copyright © 2007 Takafumi Hikichi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
Inverse filtering of room acoustics is useful in various applications such as sound reproduction, sound-field equalization, and speech dereverberation. Usually, room transfer functions (RTFs) are modeled as finite impulse response (FIR) filters, and inverse filters are designed to remove the effect of the RTFs. When the RTFs are known a priori or are capable of being accurately estimated, this approach has been shown to achieve high inverse filtering performance [1–4]. However, in actual acoustic environments, there are disturbances that affect the inverse filtering performance. One cause of these disturbances is the fluctuation in the RTFs resulting from changes in such factors as source position and temperature [5–9]. As a result, an inverse filter correctly designed for one condition may not work well for another condition, and compensation or adaptation processing may become necessary.
The sensitivity issue with inverse filtering in relation to the movement of a sound source or microphone has been addressed in several papers. In [8, 9], the sensitivity of inverse filters is quantified in terms of the mean-squared error (MSE), defined as the power of the deviation of the equalized impulse response from the ideal impulse. This MSE is theoretically derived based on statistical room acoustics. These studies claim that the region in which the MSE is below −10 dB is restricted to a few tenths of a wavelength of a target signal, revealing a high sensitivity to small positional changes. That is, when an inverse filter designed for a certain location is applied to recover signals observed at another location, the performance easily degrades and the MSE becomes high.

Inverse filters are usually obtained by inverting the autocorrelation matrix of the RTFs. Accordingly, in order to realize stable inverse filtering, either regularization [10] or the truncated singular value decomposition method [11–13] has been applied. With the latter method, the small singular values of the autocorrelation matrix of the RTFs are treated as zeros. Both methods have been applied to a sound reproduction system, and have been experimentally verified.

The purpose of this paper is to pursue ways of designing inverse filters that are less sensitive to RTF fluctuations and interference noise. When the RTFs fluctuate, the inverse filter may amplify the small fluctuation in the RTFs and may cause large distortions in the output signal of the inverse filter. Moreover, when the microphone signal contains noise, the inverse filter may also amplify the noise. We expect the filtered signal to be less degraded when the filter energy is small. Hence, we believe that reducing the filter energy is the key to making the filters less sensitive. To confirm this belief, we focus on the influence of three parameters used in the design of inverse filters: the regularization parameter, filter length, and modeling delay. By selecting proper parameter values, we expect to reduce the filter energy, and hence make the filter more robust to RTF variations and noise.

Figure 1: Single-source multimicrophone acoustic system. $H_i(z)$ represent room transfer functions.
The organization of this paper is as follows. The following section describes the acoustic system with a single source and multiple microphones considered in this paper. It then describes how inverse filters are calculated and analyzes the effect of the three design parameters on the filter energy. Section 3 reports experiments undertaken in the presence of noise. Section 4 describes experimental results for an inverse filter with RTF fluctuations caused by source position changes. Section 5 provides an analysis of the RTF fluctuations caused by source position changes. Section 6 concludes the paper.
2 PROBLEM FORMULATION
2.1 Acoustic system in consideration
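We consider an acoustic system with a single sound source and multiple microphones as shown in Figure 1. The source signal is represented as $s(n)$, where $n$ denotes a discrete time index, and the signals received by the $P$ microphones are represented as $x_i(n)$, $i = 1, \ldots, P$. Microphone signals $x_i(n)$ are given by

$$x_i(n) = h_i(n) * s(n) + w_i(n) = \sum_{k=0}^{J} h_i(k)\, s(n-k) + w_i(n), \tag{1}$$

where $*$ denotes the convolution operation, $h_i(k)$, $k = 0, \ldots, J$, is the impulse response between the source and the $i$th microphone, and $w_i(n)$ denotes noise. The RTFs are expressed as

$$H_i(z) = \sum_{k=0}^{J} h_i(k)\, z^{-k}. \tag{2}$$

We assume hereafter that these RTFs have no common zeros among all the channels.

Equation (1) can be expressed in matrix form as

$$\mathbf{x}(n) = \mathbf{H}^T \mathbf{s}(n) + \mathbf{w}(n), \tag{3}$$

where

$$\mathbf{x}(n) = \begin{bmatrix} \mathbf{x}_1(n) \\ \vdots \\ \mathbf{x}_P(n) \end{bmatrix}, \quad \mathbf{x}_i(n) = \begin{bmatrix} x_i(n) \\ \vdots \\ x_i(n-M+1) \end{bmatrix}, \quad \mathbf{w}(n) = \begin{bmatrix} \mathbf{w}_1(n) \\ \vdots \\ \mathbf{w}_P(n) \end{bmatrix}, \quad \mathbf{w}_i(n) = \begin{bmatrix} w_i(n) \\ \vdots \\ w_i(n-M+1) \end{bmatrix}, \quad \mathbf{s}(n) = \begin{bmatrix} s(n) \\ \vdots \\ s(n-M-J+1) \end{bmatrix}, \tag{4}$$

$$\mathbf{H} = [\mathbf{H}_1, \ldots, \mathbf{H}_P], \quad \mathbf{H}_i = \underbrace{\begin{pmatrix} h_i(0) & & 0 \\ h_i(1) & \ddots & \\ \vdots & \ddots & h_i(0) \\ h_i(J) & & h_i(1) \\ & \ddots & \vdots \\ 0 & & h_i(J) \end{pmatrix}}_{M}. \tag{5}$$

Here, $\mathbf{H}_i$ is the $(M+J) \times M$ convolution matrix of the $i$th channel, and $M$ is the length of the inverse filter for each channel. The objective of dereverberation is to recover source signal $s(n)$ from the received signal $\mathbf{x}(n)$. This is achieved by filtering the received signal with the inverse filter of room acoustic system $\mathbf{H}$.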
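The matrix form can be checked numerically. The following is a minimal sketch, not taken from the paper, that builds one per-channel convolution matrix and confirms that its transpose applied to the source vector reproduces the noise-free microphone samples. The helper name `convolution_matrix`, the random impulse response, and the chosen sizes are illustrative assumptions.

```python
import numpy as np

def convolution_matrix(h, M):
    """Return the (M + J) x M matrix whose m-th column is h delayed by m taps."""
    J = len(h) - 1
    H = np.zeros((M + J, M))
    for m in range(M):
        H[m:m + J + 1, m] = h
    return H

rng = np.random.default_rng(0)
J, M = 4, 3                          # toy sizes (assumption)
h = rng.standard_normal(J + 1)       # hypothetical impulse response h_i(k)
s = rng.standard_normal(50)          # source signal s(n)
x = np.convolve(s, h)[:len(s)]       # noise-free x_i(n), zero initial state

n = 20                               # any index with n >= M + J - 1
s_vec = s[n::-1][:M + J]             # [s(n), s(n-1), ..., s(n-M-J+1)]
x_vec = x[n::-1][:M]                 # [x_i(n), ..., x_i(n-M+1)]
H = convolution_matrix(h, M)
matches = np.allclose(H.T @ s_vec, x_vec)
```

Stacking such matrices side by side for all channels gives the multichannel matrix used in the rest of the section.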
2.2 Inverse filter calculation
Generally, the inverse filter vector, denoted as $\mathbf{g}$, is calculated by minimizing the following cost function:

$$C = \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2, \tag{6}$$

where $\|\mathbf{a}\|$ denotes the $l_2$-norm of vector $\mathbf{a}$,

$$\mathbf{g} = [\mathbf{g}_1^T, \ldots, \mathbf{g}_P^T]^T \in \mathbb{R}^{PM}, \quad \mathbf{v} = [\underbrace{0, \ldots, 0}_{d}, 1, 0, \ldots, 0]^T, \tag{7}$$

$\mathbf{g}_i$ is the $M$-tap inverse filter for the $i$th channel, and $d$ is the modeling delay [14]. Here, the modeling delay can be selected arbitrarily. By applying this inverse filter $\mathbf{g}$ to the microphone signals, the filter’s output signal is equivalent to the input signal delayed by $d$ taps. Hereafter, we consider that impulse responses $h_i(n)$ are normalized by their norm. When RTF matrix $\mathbf{H}$ is given, such an inverse filter set can be calculated as

$$\mathbf{g} = \mathbf{H}^{+}\mathbf{v}, \tag{8}$$

where $\mathbf{A}^{+}$ is the Moore-Penrose pseudoinverse of matrix $\mathbf{A}$ [15]. The inverse filter set is calculated based on the multiple-input/output inverse theorem (MINT) [1]. The filter set with minimum length is obtained by setting $M$ so that matrix $\mathbf{H}$ is square, which leads to $M = M_{\min} = J/(P-1)$. The filter length can be set at $M > J/(P-1)$ as well.
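As an illustration of (8), the sketch below computes a minimum-length inverse filter set for a toy two-channel system and checks that the equalized response is the delayed impulse. The two 3-tap impulse responses are hypothetical values chosen so that they share no common zeros; they are not from the paper.

```python
import numpy as np

def convolution_matrix(h, M):
    """(M + J) x M convolution matrix of impulse response h."""
    J = len(h) - 1
    H = np.zeros((M + J, M))
    for m in range(M):
        H[m:m + J + 1, m] = h
    return H

# Two hypothetical channels (J = 2) without common zeros.
h1 = np.array([1.0, 0.5, 0.25])
h2 = np.array([1.0, -0.4, 0.3])
P, J = 2, 2
M = J // (P - 1)                     # M_min = J / (P - 1), so H is square
H = np.hstack([convolution_matrix(h, M) for h in (h1, h2)])

d = 1                                # modeling delay
v = np.zeros(M + J)
v[d] = 1.0
g = np.linalg.pinv(H) @ v            # Eq. (8)

g1, g2 = g[:M], g[M:]
# Sum of per-channel convolutions; this equals H @ g.
equalized = np.convolve(h1, g1) + np.convolve(h2, g2)
```

The same formula gives the value used later in the experiments: with $J = 3999$ and $P = 4$, $M_{\min} = 3999/3 = 1333$.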
2.3 Inverse filters with disturbances
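When noise is present at the microphones, distortion occurs in the output signal of the inverse filter. The larger the filter energy is, the larger the distortion can be. Thus, we introduce the filter energy into the cost function expressed in (6). By taking the filter energy into consideration, the cost function is modified as follows:

$$C = \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2 + \delta \|\mathbf{g}\|^2, \tag{9}$$

where $\delta\ (\geq 0)$ is a scalar variable. This parameter determines how much weight to assign to the energy term, and thus determines a tradeoff between the filter’s accuracy and the amount of distortion. The same formulation is applied as the one used in multichannel active noise control systems [14, 16]. We would like to derive a solution that minimizes this cost function. Equation (9) can be rewritten as

$$C = \mathbf{g}^T\mathbf{H}^T\mathbf{H}\mathbf{g} - \mathbf{g}^T\mathbf{H}^T\mathbf{v} - \mathbf{v}^T\mathbf{H}\mathbf{g} + \mathbf{v}^T\mathbf{v} + \delta\,\mathbf{g}^T\mathbf{g}. \tag{10}$$

By taking derivatives with respect to $\mathbf{g}$ and setting them equal to zero, the following solution is derived:

$$\mathbf{g}_r = (\mathbf{H}^T\mathbf{H} + \delta\mathbf{I})^{-1}\mathbf{H}^T\mathbf{v}, \tag{11}$$

where $\mathbf{I}$ is an identity matrix. This solution has a similar form to that of Tikhonov regularization for ill-posed problems [11–13, 17]. We hereafter refer to $\delta$ as a regularization parameter, and $\mathbf{g}_r$ as an inverse filter vector with regularization.

Equation (11) is an optimum solution when the interference noise is white noise with small variance $\delta$, and the term $\delta\mathbf{I}$ corresponds to the correlation matrix of the noise. If colored noise is considered as a more general case, the term $\delta\mathbf{I}$ is replaced with the noise correlation matrix as

$$\mathbf{g}_r = (\mathbf{H}^T\mathbf{H} + \mathbf{R}_n)^{-1}\mathbf{H}^T\mathbf{v}, \tag{12}$$

where $\mathbf{R}_n$ is the noise correlation matrix.

Then, let us consider the situation where RTFs fluctuate. Suppose the fluctuated RTFs are denoted as $\bar{\mathbf{H}} + \tilde{\mathbf{H}}$, where $\bar{\mathbf{H}}$ and $\tilde{\mathbf{H}}$ represent the mean RTF and the fluctuation from the mean RTF, respectively. In this case, we consider the ensemble mean of the total squared error,

$$\begin{aligned} C &= E\{(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g})^T(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g})\} \\ &= (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})^T(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v}) + E\{(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})^T\tilde{\mathbf{H}}\mathbf{g} + (\tilde{\mathbf{H}}\mathbf{g})^T(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v}) + \mathbf{g}^T\tilde{\mathbf{H}}^T\tilde{\mathbf{H}}\mathbf{g}\} \\ &= \mathbf{g}^T\bar{\mathbf{H}}^T\bar{\mathbf{H}}\mathbf{g} - \mathbf{g}^T\bar{\mathbf{H}}^T\mathbf{v} - \mathbf{v}^T\bar{\mathbf{H}}\mathbf{g} + \mathbf{v}^T\mathbf{v} + \mathbf{g}^T E\{\tilde{\mathbf{H}}^T\tilde{\mathbf{H}}\}\mathbf{g}, \end{aligned} \tag{13}$$

where $E\{\cdot\}$ represents the expectation operation. In this derivation, we assume $E\{\tilde{\mathbf{H}}\}$ is a zero matrix, so the cross terms vanish. Then, the following filter minimizes the cost function expressed in (13):

$$\mathbf{g}_r = (\bar{\mathbf{H}}^T\bar{\mathbf{H}} + \mathbf{R}_H)^{-1}\bar{\mathbf{H}}^T\mathbf{v}, \tag{14}$$

where $\mathbf{R}_H = E\{\tilde{\mathbf{H}}^T\tilde{\mathbf{H}}\}$. From the discussion above, we can treat both disturbances by using a filter of the following form:

$$\mathbf{g}_r = (\hat{\mathbf{H}}^T\hat{\mathbf{H}} + \mathbf{R})^{-1}\hat{\mathbf{H}}^T\mathbf{v}, \tag{15}$$

where $\hat{\mathbf{H}}$ is either $\mathbf{H}$ or the mean RTF $\bar{\mathbf{H}}$, and $\mathbf{R}$ is the correlation matrix of either the noise, $\mathbf{R}_n$, or the fluctuation, $\mathbf{R}_H$. If the fluctuation could be regarded as white noise, $\mathbf{R} = \delta\mathbf{I}$ could be applied to the inverse filter. In the following experiments, we investigate the performance of the inverse filter of the form

$$\mathbf{g}_r = (\hat{\mathbf{H}}^T\hat{\mathbf{H}} + \delta\mathbf{I})^{-1}\hat{\mathbf{H}}^T\mathbf{v}, \tag{16}$$

where

$$\hat{\mathbf{H}} = \begin{cases} \mathbf{H} & \text{(noise case)}, \\ \bar{\mathbf{H}} & \text{(fluctuation case)}. \end{cases} \tag{17}$$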
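The tradeoff controlled by the regularization parameter in (16) can be sketched numerically. This is a toy illustration under stated assumptions (random, norm-normalized impulse responses and small sizes), not the simulation setup of the paper. The function name `regularized_inverse_filter` is hypothetical.

```python
import numpy as np

def convolution_matrix(h, M):
    """(M + J) x M convolution matrix of impulse response h."""
    J = len(h) - 1
    H = np.zeros((M + J, M))
    for m in range(M):
        H[m:m + J + 1, m] = h
    return H

def regularized_inverse_filter(H, v, delta):
    """Solve (H^T H + delta I) g = H^T v, i.e. the filter of Eq. (16)."""
    A = H.T @ H + delta * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ v)

rng = np.random.default_rng(1)
P, J, M, d = 2, 8, 8, 4              # toy sizes (assumption)
channels = [rng.standard_normal(J + 1) for _ in range(P)]
channels = [h / np.linalg.norm(h) for h in channels]   # normalized by their norm
H = np.hstack([convolution_matrix(h, M) for h in channels])
v = np.zeros(M + J)
v[d] = 1.0

deltas = [1e-6, 1e-4, 1e-2, 1e-1]
filters = [regularized_inverse_filter(H, v, dl) for dl in deltas]
energies = [float(np.sum(g ** 2)) for g in filters]           # filter energy
errors = [float(np.sum((H @ g - v) ** 2)) for g in filters]   # equalization error
```

In this sketch the filter energy does not increase as the regularization parameter grows, while the equalization error does not decrease, which is the accuracy-versus-robustness tradeoff described in the text.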
2.4 Influence of design parameters on filter energy
Regularization parameter $\delta$ increases the minimum eigenvalue of matrix $(\hat{\mathbf{H}}^T\hat{\mathbf{H}} + \delta\mathbf{I})$ in (16), and hence reduces the norm of the inverse filter. Increasing the regularization parameter is thus believed to reduce the sensitivity to RTF variations and noise. On the other hand, increasing this parameter reduces the accuracy of the inverse filter with respect to the true RTFs.

The effect of the filter length can be expected as follows. Equation (16) will give the minimum norm filter for a given length $M$. By increasing the filter length, we compare various filters with different lengths, and consequently expect that the filter with the smallest norm can be found.

A modeling delay $d$ is also used to make the inverse filter stable. When a nonzero modeling delay $d$ ($d \geq 1$) is used, we also expect the filter norm to be reduced because the causality constraint is relaxed. The filter may correspond to the minimum-norm solution that could be obtained in the frequency domain [18].

As described above, we can expect the regularization parameter, filter length, and modeling delay to be effective in reducing the filter energy.
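The effect of the regularization parameter on the filter energy can be made explicit with a singular value decomposition. The following short derivation is an added illustration rather than part of the original analysis. It writes the RTF matrix in (16) as $\hat{\mathbf{H}}$, and the symbols $\sigma_i$, $\mathbf{u}_i$, and $\mathbf{q}_i$ (its singular values and left and right singular vectors) are introduced here as assumptions.

```latex
\hat{\mathbf{H}} = \sum_i \sigma_i \mathbf{u}_i \mathbf{q}_i^T
\;\Longrightarrow\;
\mathbf{g}_r
  = (\hat{\mathbf{H}}^T\hat{\mathbf{H}} + \delta\mathbf{I})^{-1}\hat{\mathbf{H}}^T\mathbf{v}
  = \sum_i \frac{\sigma_i}{\sigma_i^2 + \delta}\,(\mathbf{u}_i^T\mathbf{v})\,\mathbf{q}_i,
\qquad
\|\mathbf{g}_r\|^2
  = \sum_i \frac{\sigma_i^2}{(\sigma_i^2 + \delta)^2}\,(\mathbf{u}_i^T\mathbf{v})^2 .
```

Each term of the sum is non-increasing in $\delta$, so the filter energy cannot grow when $\delta$ is increased. A term with a small singular value contributes $(\mathbf{u}_i^T\mathbf{v})^2/\sigma_i^2$ when $\delta = 0$, but at most $(\mathbf{u}_i^T\mathbf{v})^2/(4\delta)$ when $\delta > 0$.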
Figure 2: Source and microphone arrangement. M1, M2, M3, and M4 denote the microphones. Room height: 250 cm; microphone height: 100 cm; loudspeaker height: 150 cm. Dimension labels in the drawing: 20 cm (three times), 100 cm (twice), and 445 cm.
3 EXPERIMENTS ON THE EFFECT OF NOISE
Experiments were performed to verify the effectiveness of our strategy in the presence of additive white noise.
3.1 Experimental setup
Figure 2 shows the arrangement of the source and the microphones used in the experiment. Four microphones are used ($P = 4$), and room impulse responses between the source and the microphones are simulated by using the image method [19]. The sampling frequency is set at 8 kHz. The impulse responses are truncated to 4000 samples ($J = 3999$), corresponding to −60 dB attenuation (the reverberation time of the room is 500 ms). Figure 3 shows an example of the impulse response and its frequency response.

We define the input and output SNRs as follows. For the $i$th microphone, the input SNR is defined as

$$\mathrm{SNR}_{\mathrm{in}} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} y_i^2(n)}{\sum_{n=0}^{N} w_i^2(n)} \right), \tag{18}$$

where $y_i(n)$ is the reverberant signal without noise, and $w_i(n)$ is the noise. In the experiment, we adjust the input SNR by controlling the amplitude of the noise signal. The output SNR is defined as

$$\mathrm{SNR}_{\mathrm{out}} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} \left(\mathbf{y}(n)^T \mathbf{g}_r\right)^2}{\sum_{n=0}^{N} \left(\mathbf{w}(n)^T \mathbf{g}_r\right)^2} \right), \tag{19}$$

where $\mathbf{y}(n) = \mathbf{H}^T\mathbf{s}(n)$ is the reverberant signal vector. This output SNR is obtained by filtering the reverberant and the noise signals separately and taking the power ratio of the output signals.
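The two SNR definitions can be sketched as follows. This is a simplified single-channel illustration with hypothetical signals and a hypothetical 3-tap filter. In the multichannel case the filtered channels would be summed before taking the power ratio.

```python
import numpy as np

def snr_db(signal, noise):
    """Power ratio of two signals in dB, as in Eqs. (18) and (19)."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def scale_noise_to_snr(signal, noise, target_db):
    """Scale the noise amplitude so that snr_db(signal, scaled) equals target_db."""
    ratio = np.sum(signal ** 2) / np.sum(noise ** 2)
    gain = np.sqrt(ratio / 10.0 ** (target_db / 10.0))
    return gain * noise

rng = np.random.default_rng(2)
y = rng.standard_normal(8000)                 # stand-in reverberant signal
w = scale_noise_to_snr(y, rng.standard_normal(8000), 20.0)
snr_in = snr_db(y, w)

g = np.array([0.5, -0.2, 0.1])                # hypothetical filter (assumption)
# Output SNR: filter signal and noise separately, then take the power ratio.
snr_out = snr_db(np.convolve(y, g), np.convolve(w, g))
```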
Figure 3: Waveform of a room impulse response $h_1(n)$ (horizontal axis: time in ms) and its frequency characteristics (horizontal axis: frequency in Hz, 0 to 4000 Hz).
3.2 Evaluation criteria
In order to avoid any dependency of the results on the source signal, we used uncorrelated white signals with a duration of 3 seconds for both source signal and noise rather than speech. The dereverberation performance is evaluated by using the signal-to-distortion ratio (SDR) defined as

$$\mathrm{SDR} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} s^2(n)}{\sum_{n=0}^{N} \left(s(n) - \hat{s}(n)\right)^2} \right), \tag{20}$$

where $s(n)$ is the original source signal and $\hat{s}(n)$ is the output signal of the inverse filter defined as $\hat{s}(n) = \mathbf{x}(n)^T\mathbf{g}_r$.
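A minimal sketch of an SDR computation is given below. It assumes that the filter output is aligned with the source by compensating the modeling delay before the comparison. That alignment step, the function name `sdr_db`, and the test signal are assumptions of this sketch rather than details stated in the text.

```python
import numpy as np

def sdr_db(s, s_hat, d=0):
    """Signal-to-distortion ratio in dB after removing a delay of d samples."""
    ref = s[:len(s) - d] if d else s
    est = s_hat[d:d + len(ref)]
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

rng = np.random.default_rng(5)
d = 500
s = rng.standard_normal(8000)                       # white source signal
delayed = np.concatenate([np.zeros(d), s])[:len(s)]
# Hypothetical filter output: delayed source plus a distortion 20 dB below it.
s_hat = delayed + 0.1 * rng.standard_normal(len(s))
value = sdr_db(s, s_hat, d)
```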
3.3 Results
Figure 4 shows the filter energy with various modeling delays and regularization parameters when the minimum filter length $M = M_{\min} = 1333$ is used, as described in Section 2.2. The energy decreases with increases in both the modeling delay and the regularization parameter, and shows the minimum value when $\delta = 10^{-1}$ and $d = 500$.

Figure 5 shows the inverse filter calculated with $\delta = 10^{-6}$ and $\delta = 10^{-1}$ when the modeling delay is fixed at $d = 500$. We clearly observed that the filter energy was reduced by increasing the regularization parameter.

Figure 6 shows the performance of the inverse filter with an input SNR of 20 dB. We observed that a proper regularization parameter value of $\delta = 10^{-2}$ gives the largest SDR for all the modeling delay values. This regularization parameter corresponds to the input SNR (20 dB, that is, a noise-to-signal power ratio of $10^{-2}$). When the regularization parameter is smaller than $10^{-2}$, the performance monotonically decreased as the regularization parameter decreased, according to the increase in the filter energy. Even though the filter norm decreases with $\delta = 10^{-1}$, the performance also deteriorated because the accuracy of the filter decreased and the deviation of the equalized response from the ideal one became large.

Figure 4: Filter energy as a function of regularization parameter and modeling delay ($d = 0, 100, 200, 300, 400, 500$; filter length is fixed at $M = 1333$).

Figure 5: An example of inverse filter $g_1(n)$ calculated with (a) $\delta = 10^{-6}$ and (b) $\delta = 10^{-1}$ (modeling delay is fixed at $d = 500$).
In the second experiment, the modeling delay was fixed at $d = 500$, and the effect of filter length $M$ was investigated with various regularization parameters $\delta$. Figures 7 and 8 show the filter energy and corresponding performance in this case. In Figure 7, the energy decreases with increases in both the filter length and the regularization parameter, although the effect of the filter length is less significant when a large regularization parameter such as $\delta = 10^{-1}$ to $\delta = 10^{-2}$ is used. In Figure 8, the best performance was obtained with $\delta = 10^{-2}$ for all the filter lengths used in this experiment, which corresponds to the input SNR level. The performance was also improved by using the larger filter length.

In the third experiment, we evaluated the performance for several SNR values by using modeling delay $d = 500$ and filter length $M = 1333$ (minimum case), or $M = 1333 + 500$ (lengthened case). Figure 9 shows the results obtained with input SNRs of 10, 20, 30, and 40 dB. As the input SNR increases, the regularization parameter that provides the best performance decreases. We observe that the best regularization parameter corresponds to the input SNR. We also observe that the performance evaluated with SDR is bounded by the input SNR level. In addition, when the input SNR is 20 dB, the output SNR defined in (19) is about 20 dB, indicating that the input noise is not amplified.

By using a proper delay and a larger filter length, the inverse filter’s energy and equalization error can be reduced. Furthermore, appropriate choice of the regularization parameter is effective for reducing the equalization error. In the next section, we investigate the applicability of this strategy to the RTF fluctuations.

Figure 6: Performance as a function of regularization parameter and modeling delay ($d = 0, 100, 200, 300, 400, 500$) with an SNR of 20 dB (filter length is fixed at $M = 1333$).

Figure 7: Filter energy as a function of regularization parameter and filter length ($M = M_{\min}$ to $M_{\min} + 500$ in steps of 100; modeling delay is fixed at $d = 500$).

Figure 8: Performance as a function of regularization parameter and filter length ($M = M_{\min}$ to $M_{\min} + 500$ in steps of 100) with an SNR of 20 dB (modeling delay is fixed at $d = 500$).
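The reported correspondence between the best regularization parameter and the input SNR suggests a simple rule of thumb, sketched below. It reads "corresponds to the input SNR" as the noise-to-signal power ratio, which matches the reported pair of 20 dB and $10^{-2}$. The function name and this reading are assumptions, not a procedure given in the paper.

```python
def delta_from_snr_db(snr_db):
    """Noise-to-signal power ratio for a given input SNR in dB (assumed rule)."""
    return 10.0 ** (-snr_db / 10.0)

# Candidate regularization parameters for the four SNR values used in Figure 9.
candidates = {snr: delta_from_snr_db(snr) for snr in (10, 20, 30, 40)}
```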
4 EXPERIMENTS FOR RTF FLUCTUATIONS
Simulations are undertaken to investigate the effect of the RTF fluctuations on the inverse filter. Here, we consider the fluctuations caused by source position fluctuations in the horizontal plane for the sake of simplicity. The more general case of three-dimensional fluctuations is not investigated in this paper.
4.1 Experimental setup
We consider the same room as in the previous experiment shown in Figure 2. We simulate the fluctuations in source position as follows. As shown in Figure 10, we consider $N$ equally spaced new positions placed on a circle of radius $r$ centered at the original position. As a model of fluctuation, we assume that the source is located at each of these $N$ positions with equal probability, and that the averaged RTF over these positions is obtained through either measurement or estimation. This averaged RTF is referred to as the “reference RTF,” and is used to calculate inverse filters according to (16). In the following simulation, the number of source positions is fixed to $N = 8$.
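The position model can be sketched in a few lines. The center coordinates and the starting angle of the first position are illustrative assumptions, since the text only specifies equally spaced positions on a circle of radius $r$.

```python
import numpy as np

def positions_on_circle(center_xy, r, N=8):
    """Return N equally spaced points on a circle of radius r around center_xy."""
    angles = 2.0 * np.pi * np.arange(N) / N     # first point at angle 0 (assumed)
    offsets = r * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.asarray(center_xy, dtype=float) + offsets

center = (100.0, 100.0)                         # hypothetical source position in cm
new_positions = positions_on_circle(center, r=1.0)
```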
4.2 Evaluation procedure
The performance of the inverse filter for fluctuations in the source position is evaluated as follows.
(1) An inverse filter set is calculated based on the reference RTFs according to (16).
(2) For each new source position $j$ ($j = 1, \ldots, 8$), equalization is achieved by filtering reverberant signals with the inverse filter set calculated in (1).
(3) SDR values are calculated for all of the dereverberated signals obtained in (2), and the SDR values are averaged over the 8 positions to obtain the overall performance measure.
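The three steps above can be sketched as follows. This toy version makes several assumptions that differ from the paper: the impulse responses are short random sequences with small random perturbations per position instead of image-method simulations, and the SDR is computed from the equalized response, which for a white, unit-variance source equals $-10\log_{10}\|\mathbf{H}_j\mathbf{g} - \mathbf{v}\|^2$ in expectation.

```python
import numpy as np

def convolution_matrix(h, M):
    """(M + J) x M convolution matrix of impulse response h."""
    J = len(h) - 1
    H = np.zeros((M + J, M))
    for m in range(M):
        H[m:m + J + 1, m] = h
    return H

def build_H(channels, M):
    """Stack the per-channel convolution matrices side by side."""
    return np.hstack([convolution_matrix(h, M) for h in channels])

rng = np.random.default_rng(3)
P, J, M, d, N, delta = 2, 16, 20, 8, 8, 1e-2    # toy sizes (assumption)
base = rng.standard_normal((P, J + 1)) * np.exp(-0.2 * np.arange(J + 1))
# Hypothetical fluctuation model: one perturbed RTF set per source position.
rtfs = [base + 0.05 * rng.standard_normal((P, J + 1)) for _ in range(N)]

# (1) Inverse filter from the reference (position-averaged) RTFs, Eq. (16).
H_ref = build_H(np.mean(rtfs, axis=0), M)
v = np.zeros(M + J)
v[d] = 1.0
g = np.linalg.solve(H_ref.T @ H_ref + delta * np.eye(P * M), H_ref.T @ v)

# (2) Equalize each position; (3) average the per-position SDR values in dB.
sdrs = [-10.0 * np.log10(np.sum((build_H(h, M) @ g - v) ** 2)) for h in rtfs]
mean_sdr = float(np.mean(sdrs))
```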
4.3 Results
The influence of the design parameters on performance is evaluated in the same manner as in the previous experiment.

Figure 11 shows the performance of an inverse filter designed with various modeling delays $d$ and regularization parameters $\delta$ with radius $r = 1$ cm. This radius corresponds to one eighth of a wavelength of the center frequency of signals in consideration. Conventional studies have shown considerable degradation in the performance for this displacement. In general, the performance shows a similar tendency to that obtained in the previous experiment. That is, the performance is inversely related to the filter energy, and improves with increases in the regularization parameter and modeling delay. We observed that the best performance was obtained at $\delta = 10^{-2}$ and $d = 500$. However, the performance is rather flat compared with that in Figure 6. For a change of source position of $r = 1$ cm, the best performance was 12 dB.

In the second experiment, the modeling delay was fixed at $d = 500$, and the effects of filter length $M$ and regularization parameter $\delta$ were investigated. Figure 12 shows the performance in this case. Here also, we observed that the performance is inversely related to the filter energy. Furthermore, the performance depends on the regularization parameter less than in the case of additive noise. In the case of additive noise, the noise correlation matrix $\mathbf{R}_n$ in (12) could be well approximated by $\delta\mathbf{I}$. In contrast, the correlation matrix of the fluctuation $\mathbf{R}_H$ in (14) could not be correctly approximated by $\delta\mathbf{I}$.

Figure 13 shows the performance for position variations of $r = 1$, 2, 3, and 4 cm. The modeling delay was set at $d = 500$, and the filter length was set at $M = 1333$ (minimum case) and $M = 1333 + 500$ (lengthened case). In both cases, when $r = 1$ cm, $\delta = 10^{-2}$ shows the maximum SDR value of around 12 dB. For $r = 2$, 3, and 4 cm, the best regularization parameter was $\delta = 10^{-1}$.
Figure 9: Performance as a function of regularization parameter for SNR values of 10, 20, 30, and 40 dB ($d = 500$). Filter length was set at (a) $M = 1333$ and (b) $M = 1333 + 500$.

Figure 10: Source positions considered in the experiment: eight new positions (numbered 1 to 8) on a circle of radius $r$ cm centered at the original position.
Again, by using an appropriate delay and filter length, the inverse filter’s energy could be reduced, and accordingly the inverse filtering performance could be improved. Furthermore, an appropriate choice of regularization parameter was effective. However, the effect of adjusting this regularization parameter is less obvious than with additive noise.

In the next section, we analyze the RTF fluctuations caused by position changes, and discuss the differences between the results for RTF fluctuations and additive noise.
5 DISCUSSION
5.1 Comparison between RTF fluctuations and noise
We compare the results for noise shown in Figure 9 and the results for RTF fluctuations shown in Figure 13. As shown in Figure 9, the dereverberation performance has a maximum point for a certain regularization parameter value, and this best value corresponds to the SNR value of the observed signals. For example, with SNR = 20 dB, the best value is $\delta = 10^{-2}$ and this gives a maximum SDR of 20 dB, that is, we obtained almost the same SDR level as the input SNR. When a smaller $\delta$ is used such as $10^{-9}$, the filter energy becomes large, and hence this results in a small SDR of 5 dB (minimum-length case) to 10 dB (lengthened filter case). By contrast, for RTF fluctuations of $r = 1$ cm (corresponding to one eighth of a wavelength of the center frequency of signals in consideration) as shown in Figure 13, although the best value for the regularization parameter is almost the same, that is, $\delta = 10^{-2}$, the corresponding SDR was around 12 dB, and the curve was much broader than in Figure 9. That is, the performance does not depend greatly on $\delta$.

Figure 11: Performance as a function of the regularization parameter and modeling delay ($d = 0, 100, 200, 300, 400, 500$; filter length is fixed at $M = 1333$).

Figure 12: Performance as a function of the regularization parameter and additional filter length ($M = M_{\min}$ to $M_{\min} + 500$ in steps of 100; modeling delay is fixed at $d = 500$).
The cause of the difference between these two results is discussed here. We analyze the effect of using this filter in the fluctuation case on the performance using the fluctuation model described in Section 4.1. Let us denote the RTF matrix corresponding to each source position as $\mathbf{H}_j = \bar{\mathbf{H}} + \tilde{\mathbf{H}}_j$, where $\bar{\mathbf{H}}$ represents the reference RTF matrix averaged over the positions, and $\tilde{\mathbf{H}}_j$ represents the fluctuation between the reference RTF and the RTF for the $j$th new position. If the source switches back and forth among all the possible positions with equal probability, we can consider that the periods in which the source is located at each position are rearranged and put together. Then, the total error may be calculated as the sum of errors for all the positions as

$$C = \frac{1}{N} \sum_{j=1}^{N} \|\mathbf{H}_j\mathbf{g} - \mathbf{v}\|^2 = \frac{1}{N} \sum_{j=1}^{N} \|(\bar{\mathbf{H}} + \tilde{\mathbf{H}}_j)\mathbf{g} - \mathbf{v}\|^2. \tag{21}$$

By considering a sufficiently large number $N$, we replace spatial averaging with an expectation,

$$C = E\{(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g})^T(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v} + \tilde{\mathbf{H}}\mathbf{g})\}. \tag{22}$$

This turns out to be (13).
Figure 13: Performance as a function of the regularization parameter for position variations of $r = 1$, 2, 3, and 4 cm ($d = 500$). Filter length was set at (a) $M = 1333$ and (b) $M = 1333 + 500$.
Let us examine the difference in performance between the two cases by comparing the autocorrelation of an example RTF fluctuation with that of a random signal used in the experiment. Figure 14 shows these autocorrelations. There is a discrepancy between these two correlations. This may explain why the adjustment of the regularization parameter is of limited efficiency in the presence of RTF fluctuations.

Figure 14: Autocorrelation coefficients (horizontal axis: time in samples). (a) Autocorrelation trace of RTF fluctuations, $r = 1$ cm. (b) Autocorrelation trace of a random signal.

Then, the inverse filter in (15) is used to compare the performance with $\hat{\mathbf{H}} = \bar{\mathbf{H}}$ and regularization matrices $\mathbf{R}$ defined as
(1) $\mathbf{R} = \delta\mathbf{I}$, $\delta = 10^{-2}$,
(2) $\mathbf{R} = E\{\tilde{\mathbf{H}}^T\tilde{\mathbf{H}}\} \approx (1/8)\sum_{j=1}^{8} \tilde{\mathbf{H}}_j^T\tilde{\mathbf{H}}_j$, $\tilde{\mathbf{H}}_j = \mathbf{H}_j - \bar{\mathbf{H}}$.

Table 1: Regularization performance.
Regularization matrix $\mathbf{R}$: (1) $\delta\mathbf{I}$, $\delta = 10^{-2}$; (2) $E\{\tilde{\mathbf{H}}^T\tilde{\mathbf{H}}\} \approx (1/8)\sum_{j=1}^{8} \tilde{\mathbf{H}}_j^T\tilde{\mathbf{H}}_j$.

The performance of the inverse filter calculated with (15) is shown in Table 1. The performance with the correlation matrix in (2) is improved by 3.7 dB compared with the matrix in (1). This result shows the effect of incorporating the autocorrelation of the RTF fluctuations. If the time structure of the fluctuations could be obtained, for example by estimating the averaged autocorrelation of the fluctuation, more robust inverse filters could be obtained. Future work should include finding ways to estimate such fluctuation’s time structure.
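The comparison between the two regularization matrices can be sketched on a toy problem. The example below estimates the fluctuation correlation matrix by averaging over positions and compares the position-averaged squared error of the two filters. The random perturbation model and the small sizes are assumptions, so the sketch illustrates the mechanism only and does not reproduce the 3.7 dB figure.

```python
import numpy as np

def convolution_matrix(h, M):
    """(M + J) x M convolution matrix of impulse response h."""
    J = len(h) - 1
    H = np.zeros((M + J, M))
    for m in range(M):
        H[m:m + J + 1, m] = h
    return H

def build_H(channels, M):
    """Stack the per-channel convolution matrices side by side."""
    return np.hstack([convolution_matrix(h, M) for h in channels])

def mean_cost(Hs, g, v):
    """Position-averaged squared error (1/N) sum_j ||H_j g - v||^2, Eq. (21)."""
    return float(np.mean([np.sum((H @ g - v) ** 2) for H in Hs]))

rng = np.random.default_rng(4)
P, J, M, d, N = 2, 16, 20, 8, 8                 # toy sizes (assumption)
base = rng.standard_normal((P, J + 1)) * np.exp(-0.2 * np.arange(J + 1))
# Hypothetical fluctuation model: one perturbed RTF matrix per position.
Hs = [build_H(base + 0.05 * rng.standard_normal((P, J + 1)), M) for _ in range(N)]

H_bar = np.mean(Hs, axis=0)                                       # reference RTF
R_H = np.mean([(H - H_bar).T @ (H - H_bar) for H in Hs], axis=0)  # matrix (2)
v = np.zeros(M + J)
v[d] = 1.0
rhs = H_bar.T @ v

g_white = np.linalg.solve(H_bar.T @ H_bar + 1e-2 * np.eye(P * M), rhs)  # matrix (1)
g_struct = np.linalg.solve(H_bar.T @ H_bar + R_H, rhs)                  # Eq. (14)
cost_white = mean_cost(Hs, g_white, v)
cost_struct = mean_cost(Hs, g_struct, v)
```

Because the perturbations average to zero around the reference matrix, the filter built with the estimated correlation matrix minimizes the position-averaged error by construction, so its cost cannot exceed that of the filter regularized with a scaled identity matrix.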
5.2 Results of speech dereverberation
Finally, the dereverberation performance is shown using speech signals. Figure 15 shows spectrograms of the (a) original, (b) reverberant, and (c), (d) dereverberated speech signals. The reference RTFs were used to calculate the inverse filter, and the RTFs corresponding to the 5th new position in Figure 10 were used to calculate the reverberant speech and for dereverberation. The source position change is 1 cm. The filter length was set at $M = 1333$, and the modeling delay was $d = 500$. The SDR of the reverberant speech is 1.8 dB. Figure 15(c) shows a spectrogram of the dereverberated speech signal filtered by the inverse filter with the regularization parameter $\delta = 10^{-9}$. Although the figure appears less reverberant than Figure 15(b), there is some degradation and an SDR of 10.9 dB was obtained. Figure 15(d) shows a spectrogram of the dereverberated speech filtered by the inverse filter with $\delta = 10^{-2}$. When the proper regularization parameter was used, the SDR improved to 17 dB. This SDR value is 5 dB higher than that obtained using a white signal as shown in Figure 13. This difference comes from the fact that the distortion mainly occurs in the higher frequency range, where speech has low energy.

Figure 16(a) shows a spectrogram of noisy and reverberant speech. The SNR level at the microphone is 20 dB, and the SDR with respect to the source speech signal is 0.5 dB. Figure 16(b) shows a spectrogram of the dereverberated signal when $\delta = 10^{-9}$ is used. The SDR of the dereverberated speech signal is 5.1 dB. Although it appears less reverberant, the frequency components of the speech are buried in those of the noise. This is because the incoming noise was amplified by the filter. Figure 16(c) shows a spectrogram of the dereverberated signal when $\delta = 10^{-2}$ is used. When the proper regularization parameter was used, the noise became less noticeable, because the filter energy was small. As a result, an SDR of 15.9 dB was achieved while the output SNR was kept over 20 dB.
Figure 15: Spectrograms of speech signals (horizontal axis: time in s; vertical axis: frequency, 0 to 4000 Hz). (a) Clean speech. (b) Reverberant speech (SDR = 1.8 dB). (c) Recovered speech with fluctuation ($\delta = 10^{-9}$, SDR = 10.9 dB). (d) Recovered speech with fluctuation ($\delta = 10^{-2}$, SDR = 17 dB).
6 CONCLUSION
With a view to extending the applicability of inverse-filter-based dereverberation, this paper examined a design method for an inverse filter, in which the filter design parameters were adjusted to reduce the filter energy. The regularization parameter, modeling delay, and filter length were selected to improve the performance when the RTFs fluctuated and when slight interference noise was present in the microphone signals. Simulation results showed that the inverse filtering performance could be improved by properly adjusting the design parameters, which led to a reduction in the filter energy. Consequently, this approach was shown to be effective for both RTF fluctuation and interference noise.

We discussed the differences between the results we obtained for RTF fluctuations and white noise. We observed that the performance with the regularization parameter did not improve greatly with regard to the RTF fluctuations, while the performance for the white noise showed a clear peak corresponding to the input SNR level. This is because RTF fluctuations are not random, whereas the regularized inverse filter implicitly assumes that the fluctuation is random. To demonstrate this, we used the autocorrelation of the fluctuation to calculate the inverse filter. The simulation result revealed that the RTF fluctuation had time structures. Future work thus includes finding ways to incorporate such fluctuation’s time structures into the filter design process.

Systematic determination of the design parameters also remains as future work. Among the design parameters, a proper choice of the regularization parameter was important for the improvement in the performance, and the choice of the filter length and the modeling delay was less crucial than the regularization parameter. In the noisy case, the optimum regularization parameter that provides the best performance corresponds to the input SNR level, as shown in Figure 9. Thus, one way to determine the parameter is through the estimation of the input SNR [20]. For the RTF fluctuations, on the other hand, automatic determination of the parameter may not be simple. However, we observed from the results shown in Figure 13 that a relatively large value such as $\delta = 10^{-1}$ was effective in avoiding the degradation for small positional changes. Thus, using such a large value may be one solution for the RTF fluctuations.