Thesis: Modeling the prosody of Vietnamese language for speech synthesis

DOCUMENT INFORMATION

Basic information

Title Modeling the prosody of Vietnamese language for speech synthesis
Author Mạc Đăng Khoa
Supervisor Prof. Pham Thi Ngoc Yên
University Hanoi University of Technology
Major Information processing and communication
Type Thesis
Year of publication 2007
City Hanoi
Pages 75
Size 304.5 KB

Thesis for the degree of

MASTER OF SCIENCE

Modeling the prosody

of Vietnamese language for speech synthesis

Faculty of Information Technology

International research center of Multimedia Information, Communication and Application

Acknowledgment

Many people gave me generous help and inspiration during my time as a master's student.

First, I would like to express my deep sense of respect and gratitude towards my supervisors, Dr. Eric Castelli and Prof. Pham Thi Ngoc Yên. Thank you very much for orienting and guiding my research in the speech processing domain. Thank you for all your useful advice, your honest criticism and your patience during my master's research.

Special thanks also go to Mrs. Genevieve Caelen-Haumont, the PhD students Tran Đỗ Đạt and Vũ Minh Quang, and all members of MICA's speech group. I could not have done this thesis without your support. Thank you all for your suggestions and your sincere remarks on my research as a whole.

I would like to thank Ms. Đoàn Thị Ngọc Hiển, who guided me in recording the corpus. I would also like to thank the many MICA members who spent much of their time recording and testing for my research.

I am grateful to Prof. Nguyễn Trọng Giảng and MICA's directorate for providing me with the best possible conditions during my time working at the International Research Center MICA.

Finally, I owe a great deal to my parents and my sister for their continued support. I also give very special thanks to my girlfriend for her constant encouragement, which gave me strength and motivation in my work and in my life.

Table of contents

Acknowledgment

1.1.1 The concept of prosody
1.1.2 Major components of prosody
1.1.3 The functions of prosody
1.1.4 Levels of representation of prosodic phenomena
1.2 Prosody modeling
1.2.1 Intonation model
1.2.2 Duration modeling
1.2.5 This thesis work approach
2 VIETNAMESE LANGUAGE AND PROSODY
3 TTS SYSTEM AND PROSODY GENERATION
3.2 Prosody generation
3.2.1 Overview of prosody generation
3.2.2 From text to prosody
3.3 Other research and our proposal
4 PROSODY PATTERNS EXTRACTION
4.1 Prosody corpus

List of Figures

Figure 1-1: Category of methods for predicting syllable duration [6]
Figure 2-1: Example of the contours of six tones, as described in [21]
Figure 2-2: The shape of Tone 1 with female and male voice [18]
Figure 2-3: The shape of Tone 2 with female and male voice [18]
Figure 2-4: The shape of Tone 3 with female and male voice [18]
Figure 2-5: The shape of Tone 4 with female and male voice [18]
Figure 2-6: The shape of Tone 5 with female and male voice [18]
Figure 2-7: The shape of Tone 5b with female and male voice [18]
Figure 2-8: The shape of Tone 6 with female and male voice [18]
Figure 2-9: The shape of Tone 6b with female and male voice [18]
Figure 2-10: Sentence classification by structure [20]
Figure 2-11: The sentence "Lan thích ăn cơm không" in ...
Figure 2-12: The sentence "Bảo cố gắng tập đi" in ...
Figure 2-13: The sentence "Tân bỏ đi chứ" in ...
Figure 2-14: The differences of F0 contour between Assertive and Interrogative
Figure 3-3: Fujisaki model for tonal language [19]
Figure 3-4: Function diagram of the proposed TTS system
Figure 3-5: Prosody generation module

List of Tables

Table 2.4: The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [14]
Table 3.1: Comparison between direct pattern and model pattern
Table 4.1: Prosody corpus structure
Table 4.2: Prosody corpus text information
Table 5.1: Confusion matrix (in %) for 8 tones with male voice
Table 5.2: Confusion matrix (in %) for 8 tones with female voice
Table 5.4: Confusion matrix (in %) of sentence types with female voice
Table 5.5: Test data for Experiment 2
Table 5.6: Confusion matrix (in %) of sentence types (with male voice)
Table 5.7: Confusion matrix (in %) of sentence types (with female voice)
Table 5.8: Confusion matrix (in %) of sentence types (average of male and female)
Table 5.9: Correct recognition rate (%) with other types of sentences

Abstract

A Text-To-Speech (TTS) system is a computer system that can produce speech from text. In a TTS system, the naturalness of the produced speech depends greatly on the variation of pitch, duration and energy during speaking. We call this the "prosody controlling ability". A TTS system with good prosody controlling ability can simulate the prosody of human speech according to the speaking context.

In tonal languages such as Vietnamese, the prosody of an utterance results from the combination of two components: "micro-prosody", corresponding to the tone of each syllable in a sentence, and "macro-prosody", corresponding to the whole sentence.

The main goal of this thesis is to model the characteristics of Vietnamese prosody for speech synthesis. It focuses on the influence of the macro-prosody on the micro-prosody in three types of sentence: assertive, interrogative and imperative.

The first task is to set up a "prosody corpus" and extract all possible prosody parameters. Based on the extracted data, we defined seventy-two simple prosody patterns for Vietnamese syllables in three types of sentence. After that, these patterns were applied to synthesize some simple sentences. Finally, some perception experiments were carried out to evaluate the synthesized sentences. The results showed that the proposed patterns can be applied successfully to generate the prosody of simple sentences.
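As an illustration only, the idea of a fixed set of syllable-level prosody patterns can be sketched as a lookup table. The sketch below assumes that the seventy-two patterns are indexed by tone (the eight tone labels 1, 2, 3, 4, 5, 5b, 6, 6b that appear in the list of figures), sentence type (three) and syllable position (three hypothetical classes: initial, medial, final), which gives 8 x 3 x 3 = 72. This factorization, the position classes, the function names and the placeholder parameter values are assumptions made for the example; they are not taken from the thesis.

```python
from itertools import product

# Eight tone labels as they appear in the list of figures.
TONES = ["1", "2", "3", "4", "5", "5b", "6", "6b"]
SENTENCE_TYPES = ["assertive", "interrogative", "imperative"]
# Hypothetical position classes (assumption, not from the thesis).
POSITIONS = ["initial", "medial", "final"]


def build_pattern_table():
    """Build a table with one placeholder pattern per (tone, type, position)."""
    table = {}
    for tone, stype, pos in product(TONES, SENTENCE_TYPES, POSITIONS):
        # Placeholder values only; real patterns would come from corpus analysis.
        table[(tone, stype, pos)] = {
            "f0_scale": 1.0,
            "duration_scale": 1.0,
            "energy_scale": 1.0,
        }
    return table


def position_of(index, length):
    """Map a syllable index to a position class (one-syllable input counts as initial)."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def lookup_patterns(tones, sentence_type, table):
    """Return one pattern per syllable, given each syllable's tone label."""
    n = len(tones)
    return [table[(t, sentence_type, position_of(i, n))] for i, t in enumerate(tones)]


if __name__ == "__main__":
    table = build_pattern_table()
    print(len(table))
    print(lookup_patterns(["1", "5b", "6"], "interrogative", table))
```

Keying the table on discrete labels keeps the lookup trivial; a real system would replace the placeholder scales with measured pitch, duration and energy parameters.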

This is our preliminary work on Vietnamese prosody, concerning only the sentence types and the position of the syllable in a sentence. In the future, we expect to continue this research with more factors of Vietnamese prosody, improve our patterns and apply them to a Vietnamese TTS system.

4.2.2 Extracting prosody parameters of key-syllable
4.3 Proposal of patterns for Vietnamese prosody
4.3.1 Methodology
4.3.2 Prosody patterns
5 EXPERIMENTS AND EVALUATION
5.1 Experiment 1: Tone and non-sense phrase
5.1.1 Objectives
5.1.2 Method and implementation
5.1.3 Results and discussion
5.2 Experiment 2: Multi-type sentences
5.2.1 Objectives
5.2.2 Method and implementation
5.2.3 Results and discussion
5.3 Comparison and conclusion
6 CONCLUSION AND PERSPECTIVES
REFERENCES
APPENDIX
A: Text for prosody corpus
B: Datasheet of prosody patterns

Figure 5-3: An example of synthesized multi-type sentences

Figure 5-4: Interface for Perception test 2
Figure 5-5: Correct recognition rate with 8 tones of last syllable
Figure 5-6: Correct recognition rate (%) with other types of sentences
Figure 5-7: Result comparison of three experiments

4.1.2 Define the corpus text
4.1.3 Recording
4.1.4 Sentence segmentation
4.2 Analysis and extracting prosody parameters
4.2.1 Segmentation

Date posted: 22/06/2025, 08:01
