
Master's thesis: Study and design a procedure for building speech corpora for minority languages in Vietnam


DOCUMENT INFORMATION

Basic information

Title Study and design a procedure for building speech corpora for minority languages in Vietnam
Author Đoàn Thị Ngọc Hiền
Supervisor Dr. Eric Castelli
University Hanoi University of Technology
Major Information Processing and Communication
Type Thesis
Year of publication 2005
City Hanoi
Number of pages 75
File size 176.49 KB


HANOI UNIVERSITY OF TECHNOLOGY

THESIS FOR THE DEGREE OF MASTER OF SCIENCE

STUDY AND DESIGN A PROCEDURE FOR BUILDING SPEECH CORPORA FOR MINORITY LANGUAGES IN VIETNAM

ĐOÀN THỊ NGỌC HIỀN

Supervisor: Dr. ERIC CASTELLI

HANOI 2005

For the Degree of

MASTER OF INFORMATION PROCESSING AND COMMUNICATION


Acknowledgments

During the course of my thesis work, many people were instrumental in helping me. I would like to take this opportunity to acknowledge some of them.

First, I would like to express my gratitude to my supervisor, Dr. Eric Castelli, whose expertise, understanding, patience, and constructively critical eye added considerably to my graduate experience.

Special thanks go to Dr. Nguyen Trong Giang and Dr. Pham Thi Ngoc Yen for providing me with the best possible working conditions during my time at the International Research Center MICA.

I would like to thank Mr. Tran Do Dat, who has extensive experience in building speech corpus databases and gave me helpful advice throughout the research and recording of the speech corpus.

I would also like to thank my family, especially my parents, for the support they have provided throughout my entire life; without their care and encouragement I would not have finished this thesis.

Finally, thanks go to all of my colleagues who helped me while I worked on this thesis.

Doan Thi Ngoc Hien, Master IPC 2003-2005


Thesis for the degree of master of Information Processing and Communication

3.3.1 Large Vocabulary Continuous Speech Recognition system for Vietnamese

3.3.2 Vietnamese Speech Synthesis

CHAPTER 4. THE VIETNAMESE MINORITY LANGUAGES

CHAPTER 5. THE SPEECH CORPUS AND THE ADAPTIVE TECHNIQUES FOR RECORDING THE MINORITY CORPUS


Figure 3.9 Table of Mel Frequency Coefficients

Figure 4.1 Austro-Asiatic family graph

Figure 4.2 Austronesian family graph

Figure 4.3 Tai-Kadai family graph

Figure 4.4 Miao-Yao family graph

Figure 4.5 Sino-Tibetan family graph

Figure 5.1 Portable Minidisc Recorder SONY Walkman MZ-N707

Figure 5.2 Sound Blaster Audigy 2 ZS Notebook

Figure 5.3 USBPre Microphone Interface for Computer Audio Recording

Figure 5.4 The waveform and spectrogram of sentence “Phôngv na t ix”


List of Figures

Figure 2.1 Schematic diagram of the human vocal mechanism

Figure 2.2 Block diagram of human speech production

Figure 2.3 Basic source-filter model for speech signals

Figure 2.4 (a) Waveform with (b) its corresponding wideband spectrogram. Darker areas mean higher energy for that time and frequency

Figure 2.5 Conversion between log-energy values (in the x-axis) and gray scale (in the y-axis)

[...]

Figure 2.8 Junction between two lossless tubes

Figure 2.9 Coupling of the nasal cavity with the oral cavity

Figure 2.10 Model of the glottal excitation for voiced sounds

Figure 2.11 General discrete-time model of speech production

Figure 2.12 Source-filter model for voiced and unvoiced speech

Figure 2.13 A mixed excitation source-filter model of speech

Figure 2.14 The orthogonality principle. The prediction error is orthogonal to the past [...]

[...] the linear-frequency cepstrum coefficients

Figure 2.17 Triangular filters used in the computation of the mel-cepstrum

Figure 3.1 The structure of the VNSpeechCorpus

Figure 3.2 Description of the nomenclature of the files in the SAM standard

Figure 3.3 Example of a file name of description of corpus

Figure 3.4 The process of building the speech database

Figure 3.5 The relation between tables of the speech database

Figure 3.6 The interface of the VNSpeechCorpus

Figure 3.7 The result of search by word and type of corpus


Abstract

In recent years, research in speech processing has produced considerable results, especially in speech recognition and synthesis, and these results are applied in many areas of life, such as speech-based accessibility systems. For example, there has been much work on speech-based and auditory interfaces that allow visually impaired users to access existing graphical interfaces. In general, multiple modalities have been used to make human-computer interaction accessible for people with disabilities.

Since the 1990s, many speech databases have been built around the world to study all aspects of speech, such as SpeechDat, SALA I-II, and SPEECON. In Vietnam, speech processing has been researched in recent years, and the International Research Center MICA (Multimedia Information Communication and Applications) has built a large Vietnamese speech database, called VNSpeechCorpus, comprising about 100 recorded hours with at least 50 speakers in different recording environments.

However, while many corpora for majority languages have been created and made available recently, less progress has been made in creating minority-language resources. Recognizing this problem, and drawing on the experience of building VNSpeechCorpus, we aim to design a procedure for building speech corpora for the minority languages of Vietnam. The speech material can be chosen to characterize the vowels and consonants, and the measured parameters will be the spectral characteristics of words (formants for the vowels, fundamental frequency for the tones, etc.).


Chapter 1 INTRODUCTION

So far, many projects have built speech corpora for majority languages, such as ATR-JSDB, SpeechDat, and SALA II. However, the procedure for building a speech corpus for a majority language cannot be applied directly to minority languages, because of the differences in characteristics between majority and minority languages and because of the areas where minorities live. The objective of this thesis is therefore to study and design a procedure for building speech corpora for minority languages in Vietnam.

The thesis rests on three foundations. The first is the Vietnamese speech database built at the International Research Center MICA, Hanoi University of Technology, Vietnam. The second is research on the history and characteristics of the minority languages in Vietnam. The third is the speech corpora that have been built for some minority languages elsewhere in the world.

To achieve this objective, we have to deal with four main problems. The first is to study the procedure used to build the Vietnamese speech database (called the VNSpeechCorpus). The second is to design a management program for the VNSpeechCorpus, because the first phase of building the VNSpeechCorpus stopped at the recording stage. The third is to design a procedure for building speech corpora for minority languages. Last but not least, the fourth is to try out the new procedure on a minority language in Vietnam.

This thesis is organized as follows. Chapter 2 gives an overview of speech signals and their representations. Chapter 3 discusses the building of a Vietnamese speech corpus. The languages of minorities in Vietnam are studied in Chapter 4. Chapter 5 will introduce the adaptive techniques for [...]


Figure 3.8 Table of Linear Predictive Coding Coefficients


2.2 Speech signal representations

2.2.3 Linear predictive coding

3.2 The program of management of the VNSpeechCorpus

3.2.1 Study of SAM standard

3.2.3 Conversion of SAM signal into WAV signal

Trang 27

2.2 Speech signal representations

2.2.3 Linear predictive cođing cccoeceocoooec 2G

3.2 Theprogram of management ọ the VNSpcechCorpus

321 Sludy of SAM standard

323 Conversion of SAM signal into WAV signal - 3L

Trang 28

Vigure 3.9 ‘Table of Melt requency Coefficients

Figure 4.1 Ausiro-Asiatic Carnily graph

Figure 4.2 Austronesian fiurily graph

Figure 4.3 Tai-Kadai family graph

Figure 4.4 Miao-Yao Ñmily graph -2

Figure 4.5 Sino-Sibetan family graph

Figure 5.1 Portable Minidisc Recorder SONY Walkman MZ-N707

Figure 5.2 Sound Blaster Audigy 2 ZS Notchaok

Figue 5.3 USBPrs Microphone Iterfaee for Computer Audio Recording

Iigue 5.4'The waveferm and spectrogram of sentenice “Phôngv na t ix”

Trang 29

Acknowledgments

During the course of my thesis work, there were many peuple who

were instrumental in helping me I would like to take this opportunity to

acknowledge sume of them

Firstly, T would like to express my gratitude to my supervisor, Dr ric

Castelli, whose expertise, understanding, patience, added considerably and constructively crilical cye Lo my graduate experience

Special thanks go out to Dr Nguyen Trong Giang and Dr Pham Thi Kgoc Yen for supporting me the best convenient conditions during time working in International Research Center MICA

I would like to thank to Ma Tran Do Dat who has a lot of experiences

in building a speech corpus database provided me helpful advices in the enuire

of researching and recording speech corpus

I would also like to thank my family, especially my parents for the

supporL thơy provided me through my cnlire life, withoul whose care,

encouragement 1 would not have finished this thesis

Finally, thanks go to all of my colleagues who helped me while I

worked wn this thesis

TDoan Thi Ngac Hicn _ Mastcr IPC 2003-2005

Trang 30

Acknowledgments

During the course of my thesis work, there were many peuple who

were instrumental in helping me I would like to take this opportunity to

acknowledge sume of them

Firstly, T would like to express my gratitude to my supervisor, Dr ric

Castelli, whose expertise, understanding, patience, added considerably and constructively crilical cye Lo my graduate experience

Special thanks go out to Dr Nguyen Trong Giang and Dr Pham Thi Kgoc Yen for supporting me the best convenient conditions during time working in International Research Center MICA

I would like to thank to Ma Tran Do Dat who has a lot of experiences

in building a speech corpus database provided me helpful advices in the enuire

of researching and recording speech corpus

I would also like to thank my family, especially my parents for the

supporL thơy provided me through my cnlire life, withoul whose care,

encouragement 1 would not have finished this thesis

Finally, thanks go to all of my colleagues who helped me while I

worked wn this thesis

TDoan Thi Ngac Hicn _ Mastcr IPC 2003-2005

Trang 31

Chapter 1 INTRODUCTION

So far, there were many projects of building speech corpus for majority

languages done such as ATR-JSDB, SpocchDal, SALA IL However, the

procedure of building speech corpus for majority languages can not be applied for minority languages because of the different characters between

majority and mmority languages and the residential arca of minorities

Therefore, study and design a procedure for building speech corpora for minority languages in Vietnam is the objective of this thesis

The thesis is implemented upon three following basis Firstly, the

Vietnamese speech database has been built in the Intemational Research

Center MICA, Hanoi University of Technology, Victnam Sccondly, it is the research on the history and characters of the minority languages in Vietnam

And the final base is the speech corpus for some minorities that have been built in the world

‘To obtain the objective of thesis, we have to deal with four big

problems The first problem is to study the procedure of building the

Vietnamese speech database (it is called the VNSpeechCorpus) ‘he second problem is to design a program of management of the VNSpeechCorpus

because the first phase of building VNSpeechCorpus stopped in the recording

‘The next is to design a procedure for building speech corpora for minority languages And the last but not least is to experiment with a new procedure

for a minority language in Vietnam

This thesis is organized as follows Chapter 2 gives an overview on

specch signals and their representatives Chapter 3 discusses the building ol a Vietnamese speech corpus The languages of minorities in Vietnam are

studied in Chapter 4 Chapter 5 will introduce the adaptive techniques for

TDoan Thi Ngac Hicn _ Mastcr IPC 2003-2005

Trang 32

List of Figures

Figure 2.1 Schematic diagram of the human vocal mechanism
Figure 2.2 Block diagram of human speech production
Figure 2.3 Basic source-filter model for speech signals
Figure 2.4 (a) Waveform with (b) its corresponding wideband spectrogram. Darker areas mean higher energy for that time and frequency
Figure 2.5 Conversion between log-energy values (in the x-axis) and gray scale (in the y-axis)
Figure 2.8 Junction between two lossless tubes
Figure 2.9 Coupling of the nasal cavity with the oral cavity
Figure 2.10 Model of the glottal excitation for voiced sounds
Figure 2.11 General discrete-time model of speech production
Figure 2.12 Source-filter model for voiced and unvoiced speech
Figure 2.13 A mixed excitation source-filter model of speech
Figure 2.14 The orthogonality principle. The prediction error is orthogonal to the past ...
... the linear-frequency cepstrum coefficients
Figure 2.17 Triangular filters used in the computation of the mel-cepstrum
Figure 3.1 The structure of the VNSpeechCorpus
Figure 3.2 Description of the nomenclature of the files in the SAM standard
Figure 3.3 Example of a file name of description of corpus
Figure 3.4 The process of building the speech database
Figure 3.5 The relation between tables of the speech database
Figure 3.6 The interface of the VNSpeechCorpus
Figure 3.7 The result of search by word and type of corpus


Thesis for the degree of master of Information Processing and Communication

3.3.1 Large Vocabulary Continuous Speech Recognition system for Vietnamese

3.3.2 Vietnamese Speech Synthesis

CHAPTER 4. THE VIETNAMESE MINORITY LANGUAGES

CHAPTER 5. THE SPEECH CORPUS AND THE ADAPTIVE TECHNIQUES FOR RECORDING THE MINORITY CORPUS


2.2 Speech signal representations

2.2.3 Linear predictive coding

3.2 The program of management of the VNSpeechCorpus

3.2.1 Study of SAM standard

3.2.3 Conversion of SAM signal into WAV signal





Date posted: 19/06/2025, 16:22
