


National Central University

Department of Computer Science and Information Engineering

Master's Thesis

Complex-valued Gaussian Process Regression

for speech separation

Graduate student: Le Dinh Nguyen

Advisor: Prof. Jia-Ching Wang

June 2017


NATIONAL CENTRAL UNIVERSITY

Department of Computer Science

Master's Thesis

Complex-valued Gaussian Process Regression

for speech separation

Graduate student: Le Dinh Nguyen

Advisor: Jia-Ching Wang

June 2017


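Chinese Abstract

Speech separation is a challenging problem in signal processing and plays an important role in many real-world applications, such as speech recognition systems and telecommunications. Its main goal is to estimate the speech of each individual speaker from a mixture containing several speakers. Because speech signals recorded in ordinary natural environments are often interfered with by noise or by other speech, speech separation has become an attractive research topic.

On the other hand, the Gaussian process (GP) is a kernel-based machine learning method that has been widely applied in signal processing. In this study, we propose a method based on Gaussian process regression (GPR) to model the nonlinear relationship between mixed speech signals and clean speech; the reconstructed speech signal is obtained from the mean function of the GP model. The hyper-parameters of the model are optimized with the conjugate gradient method. Experiments on the TIMIT speech database show that the proposed method achieves better performance.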
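The abstract describes the GP as a kernel-based method: everything the model knows about how two inputs relate is encoded in a covariance (kernel) function. As a rough illustration, the sketch below builds a covariance (Gram) matrix with the squared-exponential kernel, a common choice whose characteristic length-scale l also appears in the list of symbols. Whether the thesis uses exactly this kernel is not stated in this excerpt; the function names, default values, and the toy "feature frames" are hypothetical.

```python
import math


def squared_exponential(x, x_prime, length_scale=1.0, signal_var=1.0):
    """k(x, x') = signal_var * exp(-||x - x'||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return signal_var * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def gram_matrix(points, length_scale=1.0, signal_var=1.0):
    """Covariance (Gram) matrix with K[i][j] = k(points[i], points[j])."""
    return [[squared_exponential(p, q, length_scale, signal_var) for q in points]
            for p in points]


# Three hypothetical 2-D feature frames: two close together, one far away.
frames = [[0.0, 0.0], [0.5, 0.1], [3.0, 2.0]]
K = gram_matrix(frames, length_scale=1.0)
for row in K:
    print(" ".join(f"{value:.3f}" for value in row))
```

Nearby frames get a covariance close to `signal_var`, while distant frames get a covariance near zero; the length-scale controls how quickly that correlation decays with distance, which is why it is one of the hyper-parameters that must be optimized.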


Abstract

Speech separation is a challenging signal processing problem that plays a significant role in improving the accuracy of various real-world applications, such as speech recognition systems and telecommunications. Its main goal is to isolate or estimate the target voice of each speaker from mixed speech produced by several speakers talking at the same time. Because speech signals collected in natural environments are frequently corrupted by noise, speech separation has become an attractive research topic over the past several decades.

In addition, the Gaussian process (GP) is a flexible kernel-based learning method that has found widespread application in signal processing. In this thesis, a supervised method is proposed for the speech separation problem.

In this work, we focus on modeling a nonlinear mapping between mixed and clean speech based on GP regression, in which the reconstructed audio signal is estimated by the predictive mean of the GP model. The nonlinear conjugate gradient method is used to optimize the hyper-parameters. An experiment on a subset of the TIMIT speech dataset is carried out to confirm the validity of the proposed approach.
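The two technical steps named in the abstract, estimating clean speech by the GP predictive mean and fitting the hyper-parameters with nonlinear conjugate gradient, can be sketched as follows. This is a minimal real-valued sketch under stated assumptions, not the thesis implementation: a zero-mean GP, a squared-exponential kernel, i.i.d. Gaussian noise, and hyper-parameters tuned by minimizing the negative log marginal likelihood with SciPy's `minimize(method="CG")`, whose gradients are approximated here by finite differences. `X` stands in for hypothetical mixed-speech feature vectors and `y` for one clean-speech target dimension; all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def se_kernel(A, B, log_ell, log_sf):
    """Squared-exponential covariance between rows of A (n x d) and B (m x d)."""
    ell2 = np.exp(2.0 * log_ell)
    sf2 = np.exp(2.0 * log_sf)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb round-off.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sf2 * np.exp(-0.5 * np.maximum(sq, 0.0) / ell2)


def _cholesky_alpha(X, y, theta):
    """Return L = chol(K + sn^2 I) and alpha = (K + sn^2 I)^{-1} y."""
    log_ell, log_sf, log_sn = theta
    K = se_kernel(X, X, log_ell, log_sf)
    K += (np.exp(2.0 * log_sn) + 1e-8) * np.eye(len(X))  # noise variance + small jitter
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha


def neg_log_marginal_likelihood(theta, X, y):
    """-log p(y | X, theta) for a zero-mean GP with Gaussian noise."""
    try:
        L, alpha = _cholesky_alpha(X, y, theta)
    except np.linalg.LinAlgError:
        return 1e10  # covariance not positive definite for these hyper-parameters
    value = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2.0 * np.pi)
    return value if np.isfinite(value) else 1e10


def fit_hyperparameters(X, y, theta0=(0.0, 0.0, -1.0)):
    """Nonlinear conjugate gradient on the NLML over (log l, log sigma_f, log sigma_n)."""
    result = minimize(neg_log_marginal_likelihood, np.asarray(theta0, dtype=float),
                      args=(X, y), method="CG")
    return result.x


def predictive_mean(X_train, y_train, X_test, theta):
    """GP predictive mean k(X_test, X_train) (K + sn^2 I)^{-1} y_train."""
    _, alpha = _cholesky_alpha(X_train, y_train, theta)
    return se_kernel(X_test, X_train, theta[0], theta[1]) @ alpha


if __name__ == "__main__":
    # Toy stand-in for (mixture feature -> clean target) training pairs.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
    theta = fit_hyperparameters(X, y)
    X_test = np.linspace(-2.5, 2.5, 5)[:, None]
    print("fitted (log l, log sf, log sn):", np.round(theta, 3))
    print("predictive mean:", np.round(predictive_mean(X, y, X_test, theta), 3))
```

The hyper-parameters are kept in log form so the optimizer searches an unconstrained space while length-scale and variances stay positive. The complex-valued variant named in the thesis title would need a complex covariance structure and the complex-valued derivatives listed in Section 2.4; that extension is not attempted in this sketch.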


Acknowledgements

The work presented in this thesis was carried out at the Department of Computer Science and Information Engineering, National Central University, Taiwan, during the years 2015-2017.

First of all, I wish to express my deepest gratitude to my research advisor, Professor Jia-Ching Wang, for guiding and encouraging me in my research. That this thesis was finished at all is in great part due to his endless enthusiasm for discussing my work.

I also especially thank Ms. Sih-Huei Chen. She supported me greatly on the theoretical side and helped me develop my initial thesis proposal into a true body of work, which resulted in several conference and workshop papers written together.

I would like to thank the students in the laboratory for many interesting discussions, for their help in many ways, and for making life in the laboratory so enjoyable. In particular, I thank Ms. Sih-Huei Chen for discussing and collaborating on the research, and Mr. Tuan Pham for helping me become familiar with source separation.

The financial support provided by the National Central University fellowship program and by my advisor, Professor Jia-Ching Wang, is gratefully acknowledged.

In addition, I wish to thank my family for their support in all my efforts.


Table of Contents

Chapter 1 Introduction
1.1 Motivation
1.2 Aim and Objective
1.3 Thesis Overview
Chapter 2 Background knowledge
2.1 Gaussian Process
2.1.1 Introduction
2.1.2 Covariance functions
2.1.3 Optimization of hyper-parameters
2.2 Short-time Fourier transform
2.2.1 Introduction
2.2.2 Spectrogram of STFT
2.2.3 Inverse short-time Fourier transform
2.3 Overlap-add method
2.4 Complex-valued Derivatives
2.4.1 Differentiating complex exponentials of a real parameter
2.4.1.1 Differentiating complex exponentials
2.4.2 Differentiating function of a complex parameter
Chapter 3 Employed systems
3.1 System overview
3.1.1 Real-valued GP-based system for source separation
3.1.2 Complex-valued GP-based system for source separation
3.2 GP regression-based source separation
3.2.1 Real-valued GPR-based source separation
3.2.2 Complex-valued GPR-based source separation
Chapter 4 Experiments
4.1 Real-valued GP regression-based model for source separation
4.2 Complex-valued GP regression-based model for speech enhancement
Chapter 5 Conclusions and future work
Bibliographies


List of Figures

Figure 1.1 Cocktail party problem
Figure 1.2 An example of single-channel source separation
Figure 2.1 GP model for regression
Figure 2.2 GP model for regression
Figure 2.3 Overlapping windows
Figure 2.4 STFT of a signal
Figure 2.5 2-D representation of a spectrogram
Figure 2.6 ISTFT process
Figure 2.7 A general diagram of an OLA analysis and synthesis system
Figure 2.8 Linear convolution
Figure 2.9 OLA overview
Figure 2.10 An example of OLA
Figure 3.1 Real-valued GPR-based system
Figure 3.2 Complex-valued GPR-based system
Figure 4.1 Spectrograms of the mixture, one source, and one de-noised speech signal


List of Tables

Table 2.1 List of common kernel functions
Table 4.1 Source separation performance using a 512-point STFT
Table 4.2 Source separation performance using a 1024-point STFT
Table 4.3 SNR and SegSNR in dB averaged over white noise
Table 4.4 SNR and SegSNR in dB averaged over babble noise


List of Symbols and Abbreviations

Symbols

È → Joint distribution
cov(f*) → Predictive covariance
l → Characteristic length-scale
Re(z) → Real part of z
Im(z) → Imaginary part of z


Abbreviations

GPR → Gaussian process regression
NMF → Nonnegative Matrix Factorization
SCSS → Single-channel speech separation
STFT → Short-time Fourier transform
iSTFT → Inverse Short-time Fourier transform
iFFT → Inverse Fast Fourier transform
SAR → Source-to-artifacts ratio
SIR → Source-to-interference ratio
SegSNR → Segmental signal-to-noise ratio
i.i.d. → Independent and identically distributed
