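National Central University
Department of Computer Science and Information Engineering
Master Thesis
Complex-valued Gaussian Process Regression
for speech separation
Graduate student: Le Dinh Nguyen
Advisor: Professor Jia-Ching Wang
June 2017 (Republic of China year 106)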
NATIONAL CENTRAL UNIVERSITY
Department of Computer Science
Master Thesis
Complex-valued Gaussian Process Regression
for speech separation
Graduate student: Le Dinh Nguyen
Advisor: Jia-Ching Wang
June 2017 (Republic of China year 106)
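Chinese Abstract
Speech separation is a challenging problem in signal processing, and it plays an important role in various real-world applications, such as speech recognition systems and telecommunications. The main goal of speech separation is to estimate the speech of each individual speaker from a mixture that contains several speakers. Because speech signals in natural environments are often corrupted by noise or by other speech, speech separation has become an attractive research topic.
Gaussian process (GP), on the other hand, is a kernel-based machine learning method that has been widely applied in signal processing. In this study, we propose a method based on Gaussian process regression (GPR) to model the nonlinear mapping between the mixed speech signal and the clean speech; the reconstructed speech signal is obtained from the mean function of the GP model. The hyper-parameters of the model are optimized with the conjugate gradient method. Experiments on TIMIT speech data show that the proposed method performs comparatively well.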
Abstract
Speech separation is a challenging signal processing problem that plays a significant role in improving the accuracy of various real-world applications, such as speech recognition systems and telecommunications. Its main goal is to isolate or estimate the target voice of each speaker from a mixture of several speakers talking at the same time. Because speech signals collected in natural environments are frequently corrupted by noise, speech separation has been an attractive research topic for several decades.
Gaussian process (GP) is a flexible kernel-based learning method that has found widespread application in signal processing. In this thesis, a supervised method is proposed for the speech separation problem. We model the nonlinear mapping between mixed and clean speech with GP regression, in which the reconstructed audio signal is estimated by the predictive mean of the GP model. The nonlinear conjugate gradient method is used to optimize the hyper-parameters. An experiment on a subset of the TIMIT speech dataset is carried out to confirm the validity of the proposed approach.
Acknowledgements
The work presented in this thesis was carried out at the Department of Computer Science and Information Engineering, National Central University, Taiwan, during the years 2015-2017.
First of all, I wish to express my deepest gratitude to my research advisor, Professor Jia-Ching Wang, for guiding and encouraging me in my research. The fact that the thesis is finished at all is due in great part to his endless enthusiasm for discussing my work.
I also especially thank Ms. Sih-Huei Chen. She gave me great support on the theoretical side and helped me take my initial thesis proposal and develop it into a true body of work, resulting in several joint conference and workshop papers.
I would like to thank the students in the laboratory for many interesting discussions, for their help, and for making life at the laboratory so enjoyable. In particular, I thank Ms. Sih-Huei Chen for discussing and collaborating on the research, and Mr. Tuan Pham for helping me become familiar with source separation.
The financial support provided by the National Central University fellowship program and by my advisor, Professor Jia-Ching Wang, is gratefully acknowledged.
In addition, I wish to thank my family for their support in all my efforts.
Table of Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Aim and Objective
1.3 Thesis Overview
Chapter 2 Background knowledge
2.1 Gaussian Process
2.1.1 Introduction
2.1.2 Covariance functions
2.1.3 Optimization of hyper-parameters
2.2 Short-time Fourier transform
2.2.1 Introduction
2.2.2 Spectrogram of STFT
2.2.3 Inverse short-time Fourier transform
2.3 Overlap-add method
2.4 Complex-valued Derivatives
2.4.1 Differentiating complex exponentials of a real parameter
2.4.1.1 Differentiating complex exponentials
2.4.2 Differentiating function of a complex parameter
Chapter 3 Employed systems
3.1 System overview
3.1.1 Real-valued GP-based system for source separation
3.1.2 Complex-valued GP-based system for source separation
3.2 GP regression-based source separation
3.2.1 Real-valued GPR-based source separation
3.2.2 Complex-valued GPR-based source separation
Chapter 4 Experiments
4.1 Real-valued GP regression-based model for source separation
4.2 Complex-valued GP regression-based model for speech enhancement
Chapter 5 Conclusions and future work
Bibliography
List of Figures
Figure 1.1 Cocktail party problem
Figure 1.2 An example of single channel source separation
Figure 2.1 GP model for regression
Figure 2.2 GP model for regression
Figure 2.3 Windows overlapping
Figure 2.4 STFT of signal
Figure 2.5 (2-D) presentation of a spectrogram
Figure 2.6 iSTFT process
Figure 2.7 A general diagram of OLA analysis and synthesis system
Figure 2.8 Linear convolution
Figure 2.9 OLA overview
Figure 2.10 An example of OLA
Figure 3.1 Real-valued GPR-based system
Figure 3.2 Complex-valued GPR-based system
Figure 4.1 Spectrograms of mixture, 1 source and 1 de-noised speech
List of Tables
Table 2.1 List of common Kernel functions
Table 4.1 Source separation performance using 512-point STFT
Table 4.2 Source separation performance using 1024-point STFT
Table 4.3 SNR and SegSNR in dB averaged over the white noise
Table 4.4 SNR and SegSNR in dB averaged over the babble noise
List of Symbols and Abbreviations
Symbols
[symbol illegible] → Joint distribution
cov(f_*) → Predictive covariance
l → Characteristic length-scale
Re(z) → Real part of z
Abbreviations
GPR → Gaussian process regression
NMF → Nonnegative matrix factorization
SCSS → Single-channel speech separation
STFT → Short-time Fourier transform
iSTFT → Inverse short-time Fourier transform
iFFT → Inverse fast Fourier transform
SAR → Source-to-artifacts ratio
SIR → Source-to-interference ratio
SegSNR → Segmental signal-to-noise ratio
i.i.d. → Independent and identically distributed