HANOI UNIVERSITY OF TECHNOLOGY
THESIS FOR THE DEGREE OF MASTER
OF SCIENCE
STUDY AND DESIGN A PROCEDURE FOR
BUILDING SPEECH CORPORA FOR
MINORITY LANGUAGES IN VIETNAM
ĐOÀN THỊ NGỌC HIEN
Supervisor: Dr ERIC CASTELLI
HA NOI 2005
For the Degree of
MASTER OF INFORMATION PROCESSING AND COMMUNICATION
Acknowledgments
During the course of my thesis work, many people were instrumental in helping me. I would like to take this opportunity to acknowledge some of them.
Firstly, I would like to express my gratitude to my supervisor, Dr Eric Castelli, whose expertise, understanding, patience and constructively critical eye added considerably to my graduate experience.
Special thanks go to Dr Nguyen Trong Giang and Dr Pham Thi Ngoc Yen for providing me with the best possible conditions during my time working at the International Research Center MICA.
I would like to thank Mr Tran Do Dat, who has a great deal of experience in building speech corpus databases, for his helpful advice throughout the research and the recording of the speech corpus.
I would also like to thank my family, especially my parents, for the support they have given me throughout my entire life; without their care and encouragement I would not have finished this thesis.
Finally, thanks go to all of my colleagues who helped me while I worked on this thesis.
Doan Thi Ngoc Hien, Master IPC 2003-2005
Thesis for the degree of Master of Information Processing and Communication
3.3.1 Large Vocabulary Continuous Speech Recognition system for Vietnamese
3.3.2 Vietnamese Speech Synthesis
CHAPTER 4. THE VIETNAMESE MINORITY LANGUAGES
CHAPTER 5. THE SPEECH CORPUS AND THE ADAPTIVE TECHNIQUES FOR RECORDING THE MINORITY CORPUS
Figure 3.8 Table of Linear Predictive Coding Coefficients
Figure 3.9 Table of Mel-Frequency Coefficients
Figure 4.1 Austro-Asiatic family graph
Figure 4.2 Austronesian family graph
Figure 4.3 Tai-Kadai family graph
Figure 4.4 Miao-Yao family graph
Figure 4.5 Sino-Tibetan family graph
Figure 5.1 Portable MiniDisc Recorder SONY Walkman MZ-N707
Figure 5.2 Sound Blaster Audigy 2 ZS Notebook
Figure 5.3 USBPre Microphone Interface for Computer Audio Recording
Figure 5.4 The waveform and spectrogram of the sentence "Phôngv na t ix"
List of Figures
Figure 2.1 Schematic diagram of the human vocal mechanism
Figure 2.2 Block diagram of human speech production
Figure 2.3 Basic source-filter model for speech signals
Figure 2.4 (a) Waveform with (b) its corresponding wideband spectrogram. Darker areas mean higher energy at that time and frequency
Figure 2.5 Conversion between log-energy values (in the x-axis) and gray scale (in the y-axis)
Figure 2.8 Junction between two lossless tubes
Figure 2.9 Coupling of the nasal cavity with the oral cavity
Figure 2.10 Model of the glottal excitation for voiced sounds
Figure 2.11 General discrete-time model of speech production
Figure 2.12 Source-filter model for voiced and unvoiced speech
Figure 2.13 A mixed excitation source-filter model of speech
Figure 2.14 The orthogonality principle. The prediction error is orthogonal to the past
[...] the linear-frequency cepstrum coefficients
Figure 2.17 Triangular filters used in the computation of the mel-cepstrum
Figure 3.1 The structure of the VNSpeechCorpus
Figure 3.2 Description of the nomenclature of the files in the SAM standard
Figure 3.3 Example of a file name of the corpus description
Figure 3.4 The process of building the speech database
Figure 3.5 The relation between tables of the speech database
Figure 3.6 The interface of the VNSpeechCorpus
Figure 3.7 The result of search by word and type of corpus
Abstract
In recent years, research in the speech processing field has produced considerable results, especially in speech recognition and synthesis, which are now applied in many different areas of life, such as speech-based accessibility systems. For example, there has been much work on speech-based and auditory interfaces that allow visually impaired users to access existing graphical interfaces. In general, multiple modalities have been used to make human-computer interaction accessible to people with disabilities. Since the 1990s, many speech databases, such as SpeechDat, SALA II and SPEECON, have been built around the world to study all aspects of speech. In Vietnam, speech processing has been researched in recent years, and the International Research Center MICA (Multimedia, Information, Communication and Applications) has built a large Vietnamese speech database, called VNSpeechCorpus, which contains about 100 hours of recordings from at least 50 speakers in different recording environments. However, while many corpora for majority languages have been created and made available recently, less progress has been made in creating resources for minority languages. Recognizing this problem, and building on the experience gained with VNSpeechCorpus, we aim to design a procedure for building speech corpora of the minority languages in Vietnam. The speech material can be chosen to characterize the vowels and consonants, and the measured parameters will be the spectral characteristics of words (formants for the vowels, fundamental frequency for the tones, etc.).
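As an illustration of how the vowel formants mentioned above might be measured, the sketch below fits an all-pole (LPC) model to a frame using the autocorrelation method and the Levinson-Durbin recursion, then reads formant frequencies from the angles of the complex poles. This is a minimal sketch under stated assumptions, not the procedure used in this thesis: the function names, the Hamming window, and the pole filtering thresholds (90 Hz minimum frequency, 400 Hz maximum bandwidth) are illustrative choices.

```python
import numpy as np


def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Return [1, a1, ..., ap] for A(z) = 1 + sum_j a_j z^-j.

    Autocorrelation method on a Hamming-windowed frame, solved with the
    Levinson-Durbin recursion.
    """
    x = frame * np.hamming(len(frame))
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient of stage i
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k  # residual prediction energy after stage i
    return a


def estimate_formants(frame, fs, order, min_freq=90.0, max_bandwidth=400.0):
    """Formant candidates (Hz) taken from the complex roots of A(z)."""
    a = lpc_coefficients(np.asarray(frame, dtype=float), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    return np.sort(freqs[keep])
```

For real vowel recordings, the frame would normally be pre-emphasized first, and a common rule of thumb sets the LPC order to about fs/1000 + 2. Poles with very wide bandwidths usually model the spectral tilt rather than a formant, which is why they are discarded.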
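For the tones, the fundamental frequency has to be tracked over time. The sketch below uses a plain normalized-autocorrelation pitch estimator applied frame by frame. Autocorrelation is only one of several possible F0 methods, and nothing here claims it is the method adopted in this thesis. The search range (60 to 400 Hz), the 0.3 voicing threshold, the 40 ms frames and the 10 ms hop are illustrative assumptions.

```python
import numpy as np


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Autocorrelation pitch estimate for one frame.

    Returns F0 in Hz, or None when the frame looks unvoiced or silent.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return None
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(x) - 1)
    if lag_max < lag_min:
        return None  # frame too short for the requested pitch range
    lags = np.arange(lag_min, lag_max + 1)
    # Autocorrelation normalized by frame energy, so lag 0 would give 1.0
    r = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags]) / energy
    best = int(np.argmax(r))
    if r[best] < voicing_threshold:
        return None
    return fs / lags[best]


def f0_contour(signal, fs, frame_ms=40.0, hop_ms=10.0, **kwargs):
    """Frame-by-frame F0 track (Hz, or None per frame), e.g. for tone analysis."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    return [
        estimate_f0(signal[start : start + frame_len], fs, **kwargs)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

The resulting contour is what a tone description would be read from. In practice it is usually smoothed, and octave errors are corrected before tones are compared.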
Chapter 1 INTRODUCTION
So far, there have been many projects building speech corpora for majority languages, such as ATR-JSDB, SpeechDat and SALA II. However, the procedure used to build a speech corpus for a majority language cannot be applied directly to minority languages. The reasons are the differences in characteristics between majority and minority languages, and the areas in which minorities live. Therefore, the objective of this thesis is to study and design a procedure for building speech corpora for minority languages in Vietnam.
The thesis rests on three foundations. The first is the Vietnamese speech database built at the International Research Center MICA, Hanoi University of Technology, Vietnam. The second is research on the history and characteristics of the minority languages in Vietnam. The third is the speech corpora that have been built for some minority languages elsewhere in the world.
To achieve the objective of the thesis, we have to deal with four main problems. The first is to study the procedure used to build the Vietnamese speech database (called the VNSpeechCorpus). The second is to design a program for managing the VNSpeechCorpus, because the first phase of building the VNSpeechCorpus stopped at the recording stage. The third is to design a procedure for building speech corpora for minority languages. The last, but not least, is to test the new procedure on a minority language in Vietnam.
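One task listed in the table of contents for the management program is converting SAM signal files into WAV. Below is a minimal Python sketch of such a conversion. It assumes that a SAM signal file holds only headerless 16-bit linear PCM samples, and that the sample rate, channel count and byte order are known from the corpus description file. The function name `sam_to_wav` and the 16 kHz default are hypothetical choices made for illustration. They are not taken from the VNSpeechCorpus tools.

```python
import array
import sys
import wave


def sam_to_wav(sam_path, wav_path, sample_rate=16000, channels=1, big_endian=False):
    """Wrap a headerless 16-bit linear PCM signal file in a WAV container.

    Assumes the SAM signal file carries raw samples only; sample rate,
    channel count and byte order must come from the corpus description.
    Returns the number of frames written.
    """
    with open(sam_path, "rb") as f:
        raw = f.read()
    if len(raw) % (2 * channels):
        raise ValueError("file size is not a whole number of 16-bit frames")
    samples = array.array("h")  # signed 16-bit integers on mainstream platforms
    samples.frombytes(raw)  # interpreted in the host's native byte order
    source_order = "big" if big_endian else "little"
    if source_order != sys.byteorder:
        samples.byteswap()  # bring the file's byte order to native order
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        # wave takes native-order frames and stores them little-endian
        w.writeframes(samples.tobytes())
    return len(samples) // channels
```

Keeping the sample format as explicit parameters, rather than hard-coding it, lets the same routine handle recordings whose description files declare different rates or byte orders.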
This thesis is organized as follows. Chapter 2 gives an overview of speech signals and their representations. Chapter 3 discusses the building of a Vietnamese speech corpus. The languages of the minorities in Vietnam are studied in Chapter 4. Chapter 5 introduces the adaptive techniques for recording the minority corpus.
2.2 Speech signal representations
2.2.3 Linear predictive coding
3.2 The management program for the VNSpeechCorpus
3.2.1 Study of the SAM standard
3.2.3 Conversion of SAM signals into WAV signals
Trang 32List of Figures
Figure 2.1 Schematic diagram of Ihe human vocat mechanism 10 Figure 2.2 Block diagram of human speech production l2
Figure 2.3 Basic source-filter model for speech signals - - odd
Figure 2.4 (2) Waveform with (b) ils corresponding wideband spoctrogram 14 Darker areas mean higher energy ft thất time and ŸiequehcY eosocseeooooee 14 Figure 2.5 Conversion between log-energy values (in the x-axis) and gray scale (in the y-
Figuic 2.8 Junction between two lossless tnbcs 122
Figure 2.9 Coupling o£ the nasal cavity with the oral eaVïFy 24
Figure 2.10 Madel of the glottat excitation for voiced sounds 2⁄4 Figure 2.11 General disorete-tirne model oŸ speechh prodacfion - 35 Figure 2.12 Source-filter model for voiced and unvoiced speech 25 Figure 2.13 A mixed excitation source-filicr model of speach - 26 Figure 2.14 The orthogonality principte The prediction error is orlhogonat to the past
the linear-frequency cepstrum coefficients e368 Figure 2.17 Triangular filters used in the computation o£ the mel-ccpstruin 38 Figure 3.1 The strueture of the VNSpeechCorpus -.46 Figure 3.2 Description of the nomenclature of the files in the SAM standard 47 Figure 3.3 Example of a file name of description of corpus - 4Ð Figure 3.4 The process of building the speech database 50 Vigure 3.5 ‘The relation between tables of the speech database - s0 Figure 3.6 The interface of the VNSpeechCurpus - - ene SB Figure 3.7 The result of search by word and type oŸ eoTpus 54
2.2 Speech signal representations
2.2.3 Linear predictive coding
3.2 The program of management of the VNSpeechCorpus
3.2.1 Study of SAM standard
3.2.3 Conversion of SAM signal into WAV signal
Chapter 1 INTRODUCTION
So far, many projects have built speech corpora for majority languages, such as ATR-JSDB, SpeechDat and SALA II. However, the procedure for building a speech corpus for a majority language cannot be applied to minority languages, because of the differing characteristics of majority and minority languages and because of the areas where minority communities live.
Therefore, the objective of this thesis is to study and design a procedure for building speech corpora for minority languages in Vietnam.
The thesis is based on three foundations. Firstly, a Vietnamese speech database has been built at the International Research Center MICA, Hanoi University of Technology, Vietnam. Secondly, there is existing research on the history and characteristics of the minority languages in Vietnam. Finally, speech corpora have already been built for some minority languages elsewhere in the world.
To achieve this objective, we have to deal with four major problems. The first problem is to study the procedure for building the Vietnamese speech database, called the VNSpeechCorpus. The second problem is to design a management program for the VNSpeechCorpus, because the first phase of building the VNSpeechCorpus stopped at the recording stage. The third problem is to design a procedure for building speech corpora for minority languages. The fourth and last problem is to experiment with the new procedure on a minority language in Vietnam.
This thesis is organized as follows. Chapter 2 gives an overview of speech signals and their representations. Chapter 3 discusses the building of a Vietnamese speech corpus. The languages of minorities in Vietnam are studied in Chapter 4. Chapter 5 introduces the adaptive techniques for