Thesis for the degree of
MASTER OF SCIENCE
Modeling the prosody
of Vietnamese language for speech
Faculty of Information Technology
International research center of Multimedia Information, Communication and Application
Acknowledgment
Many people gave me generous help and inspiration during my time as a master's student.
First, I would like to express my deep respect and gratitude to my supervisors, Dr Eric Castelli and Prof Phạm Thị Ngọc Yến. Thank you very much for orienting and guiding my research in the speech processing domain. Thank you for all your useful advice, your honest criticism and your patience during my master's research.
Special thanks also go to Mrs Geneviève Caelen-Haumont, to the PhD students Trần Đỗ Đạt and Vũ Minh Quang, and to all members of MICA's speech group. I could not have completed this thesis without your support. Thank you all for your suggestions and your sincere remarks on my research.
I would like to thank Ms Đoàn Thị Ngọc Hiển, who guided me in recording the corpus. I would also like to thank the many MICA members who spent much of their time recording and testing for my research.
I am grateful to Prof Nguyễn Trọng Giảng and MICA's directorate for providing me with the best possible conditions during my time working at the International Research Center MICA.
Finally, I owe a great deal to my parents and my sister for their continued support. I also give very special thanks to my girlfriend for her constant encouragement, which gave me strength and motivation in my work and in my life.
Table of contents
Acknowledgment
1.1.1 The concept of prosody
1.1.2 Major components of prosody
1.1.3 The functions of prosody
1.1.4 Levels of representation of prosodic phenomena
1.2 Prosody modeling
1.2.1 Intonation model
1.2.2 Duration modeling
1.2.5 This thesis work approach
2 VIETNAMESE LANGUAGE AND PROSODY
3 TTS SYSTEM AND PROSODY GENERATION
3.2 Prosody generation
3.2.1 Overview of prosody generation
3.2.2 From text to prosody
3.3 Other research and our proposal
4 PROSODY PATTERNS EXTRACTION
4.1 Prosody corpus
Mạc Đăng Khoa
List of Figures
Figure 1-1: Category of methods for predicting syllable duration [6]
Figure 2-1: Example of the contours of six tones, as described in [21]
Figure 2-2: The shape of Tone 1 with female and male voice [18]
Figure 2-3: The shape of Tone 2 with female and male voice [18]
Figure 2-4: The shape of Tone 3 with female and male voice [18]
Figure 2-5: The shape of Tone 4 with female and male voice [18]
Figure 2-6: The shape of Tone 5 with female and male voice [18]
Figure 2-7: The shape of Tone 5b with female and male voice [18]
Figure 2-8: The shape of Tone 6 with female and male voice [18]
Figure 2-9: The shape of Tone 6b with female and male voice [18]
Figure 2-10: Sentence classification by structure [20]
Figure 2-11: The sentences "Lan thích ăn cơm không" in
Figure 2-12: The sentences "Bảo cố gắng tập đi" in
Figure 2-13: The sentences "Tân bỏ đi chứ" in
Figure 2-14: The differences of F0 contour between Assertive and Interrogative
Figure 3-3: Fujisaki model for tonal language [19]
Figure 3-4: Function diagram of proposed TTS system
Figure 3-5: Prosody generation module
Table 2.4: The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [14]
Table 3.1: Comparison between direct pattern and model pattern
Table 4.1: Prosody corpus structure
Table 4.2: Prosody corpus text information
Table 5.1: Confusion matrix (in %) for 8 tones with male voice
Table 5.2: Confusion matrix (in %) for 8 tones with female voice
Table 5.4: Confusion matrix (%) of sentence types with female voice
Table 5.5: Test data for Experiment 2
Table 5.6: Confusion matrix (in %) of sentence types (with male voice)
Table 5.7: Confusion matrix (in %) of sentence types (with female voice)
Table 5.8: Confusion matrix (in %) of sentence types (average of male and female)
Table 5.9: Correct recognition rate (%) with other types of sentences
Abstract
A Text-To-Speech (TTS) system is a computer system that produces speech from text. In a TTS system, the naturalness of the produced speech depends greatly on the variation of pitch, duration and energy during speaking. We call this the "prosody controlling ability". A TTS system with good prosody controlling ability can simulate human speech prosody appropriate to the speaking context.
In tonal languages such as Vietnamese, the prosody of an utterance results from the combination of two components: "micro-prosody", corresponding to the tone of each syllable in a sentence, and "macro-prosody", corresponding to the whole sentence.
The main goal of this thesis is to model the characteristics of Vietnamese prosody for speech synthesis. It focuses on the influence of the macro-prosody on the micro-prosody in three types of sentence: assertive, interrogative and imperative.
The first task is to set up a "prosody corpus" and extract all possible prosody parameters. Based on the extracted data, we defined seventy-two simple prosody patterns for Vietnamese syllables in three types of sentence. These patterns were then applied to synthesize some simple sentences. Finally, perception experiments were carried out to evaluate the synthesized sentences. The results show that the proposed patterns can be applied successfully to generate the prosody of simple sentences.
This is our preliminary work on Vietnamese prosody, concerning only the sentence type and the position of the syllable in a sentence. In the future, we expect to continue this research with more factors of Vietnamese prosody, improve our patterns, and apply them to a Vietnamese TTS system.
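To make the idea of macro-prosody acting on micro-prosody more concrete, the following is a minimal sketch, not the method of this thesis. It assumes that a sentence-level pitch trend and a per-syllable tone shape can simply be added on a semitone scale. The tone names, the tone shapes, the slope values, the base frequency and the function names are all hypothetical placeholders chosen for illustration; they are not measured values from the prosody corpus.

```python
# Hypothetical tone shapes (micro-prosody): pitch offsets in semitones
# at the start, middle and end of a syllable. Placeholder values only.
TONE_SHAPES = {
    "level": (0.0, 0.0, 0.0),
    "rising": (0.0, 1.0, 4.0),
    "falling": (0.0, -1.5, -3.0),
}

# Hypothetical sentence trends (macro-prosody): total pitch change in
# semitones from the start to the end of the utterance. Placeholder values only.
SENTENCE_SLOPES = {
    "assertive": -3.0,
    "interrogative": 2.0,
    "imperative": -1.0,
}


def semitones_to_hz(base_hz, semitones):
    """Convert an offset in semitones relative to base_hz into Hz."""
    return base_hz * 2.0 ** (semitones / 12.0)


def syllable_f0(tone, sentence_type, index, n_syllables, base_hz=200.0):
    """Return three F0 targets (Hz) for the syllable at position `index`.

    Assumption: the macro trend is linear over the utterance and is added
    to the tone shape on the semitone scale.
    """
    slope = SENTENCE_SLOPES[sentence_type]
    targets = []
    for k, micro in enumerate(TONE_SHAPES[tone]):
        # Relative position of this target within the utterance, in [0, 1].
        rel = (index + k / 2.0) / n_syllables
        macro = slope * rel
        targets.append(semitones_to_hz(base_hz, macro + micro))
    return targets


if __name__ == "__main__":
    # Same tone on the last syllable of a four-syllable utterance,
    # under two different sentence types.
    print(syllable_f0("level", "assertive", 3, 4))
    print(syllable_f0("level", "interrogative", 3, 4))
```

Under these assumptions, the same tone on the final syllable ends lower in an assertive sentence than in an interrogative one, which is the kind of sentence-type effect on syllable tone that the abstract describes.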
4.2.2 Extracting prosody parameters of key-syllable
4.3 Proposal of the patterns for Vietnamese prosody
4.3.1 Methodology
4.3.2 Prosody patterns
5 EXPERIMENTS AND EVALUATION
5.1 Experiment 1: Tone and non-sense phrase
5.1.1 Objectives
5.1.2 Method and implementation
5.1.3 Results and discussion
5.2 Experiment 2: Multi-type sentences
5.2.1 Objectives
5.2.2 Method and implementation
5.2.3 Results and discussion
5.3 Comparison and conclusion
6 CONCLUSION AND PERSPECTIVES
REFERENCES
APPENDIX
A: Text for prosody corpus
B: Datasheet of prosody patterns
Figure 5-3: An example of synthesized multi-type sentences
Figure 5-4: Interface for Perception test 2
Figure 5-5: Correct recognition rate with 8 tones of last syllable
Figure 5-6: Correct recognition rate (%) with other types of sentences
Figure 5-7: Result comparison of three experiments
4.1.2 Define the corpus text
4.1.3 Recording
4.1.4 Sentence segmentation
4.2 Analysis and extracting prosody parameters
4.2.1 Segmentation