Thesis: Studying the phonetic characteristics of glottalized tones in Vietnamese expressive speech



MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Thi Lan

STUDYING THE PHONETIC CHARACTERISTICS OF GLOTTALIZED TONES IN VIETNAMESE EXPRESSIVE SPEECH

MASTER THESIS OF SCIENCE

INFORMATION TECHNOLOGY

SUPERVISOR:

Dr Tran Do Dat

Hanoi 2015


COMMITMENT

I hereby declare that I am the person responsible for conducting this study. All referenced figures were reproduced with their sources clearly indicated. The presented results are truthful and have not been published in any other person's work.

NGUYEN Thi Lan


ACKNOWLEDGEMENT

This is the second time that I sit here, at Hanoi University of Science and Technology, with the great honor of writing these words of gratitude to the people who have been supporting me since the first moment I entered the university. The first acknowledgement was written in my graduation thesis 2.5 years ago, and today this one awakens a special emotion in me.

I wish to thank all my professors and colleagues at the School of Information and Communication Technology and MICA International Research Institute, who have helped me with generous support. The advice and knowledge they imparted to me are gratefully appreciated and inspired me greatly to finish this thesis.

Special thanks to my supervisor, Dr Tran Do Dat, and to my colleagues in the Speech Communication Department, MICA Institute, including Dr Do Thi Ngoc Diep, Nguyen Thi Thu Trang, Nguyen Tuan Ninh, Tran Thi Anh Xuan, Dr Nguyen Viet Son, Dr Nguyen Cong Phuong, Nguyen Duc Anh and Nguyen Tie Thanh, for the advice and encouragement they gave me, and especially to Dr Mac Dang Khoa and Dr Alexis Michaud for their thorough review and invaluable suggestions.

Thanks also to the two thesis reviewers, Assoc. Prof. Truong Ninh Thuan (VNU) and Dr Vu Thi Luong Giang (SOICT, HUST), for their valuable comments, which considerably improved the presentation of the thesis.

Special thanks to my family and friends, who always stand by me and lift me up when I am down. Without them, my life would be meaningless!

NGUYEN Thị Lan



CONTENTS

COMMITMENT

Chapter 3 ANALYSING VARIATION IN REALIZATION OF GLOTTALIZED


3.3 Proposals for a full-scale study and statistical analysis of […] based on EGG

APPENDIX B: FIGURES OF AVERAGED F0&Oq CONTOURS OF EACH SPEAKER WITH STANDARD DEVIATION FOR THE USED ATTITUDES


LIST OF ABBREVIATIONS

SFP: Sentence-final particle

EGG: Electroglottography

DEGG: The derivative of the electroglottography signal

IPA: International Phonetic Association

DECPA: Derivative-Electroglottographic Closure Peak Amplitude

X-SAMPA: The Extended Speech Assessment Methods Phonetic Alphabet

F0: Fundamental frequency


LIST OF TABLES

Table 1-3 Phonetic characteristics of Vietnamese initial consonants

Table 1-4 Phonetic characteristics of Vietnamese final consonants

Table 1-5 Phonetic characteristics of Vietnamese vowels/diphthongs

Table 1-6 Summarized description of 8 tones of Vietnamese

Table 2-1 Intended attitudes

Table 3-1 Statistics of Mechanism I-A/Pressed Voice/Mechanism I-B of tone 6a

Table 3-2 Statistics of Mechanism I-A/Creaky Voice/Mechanism I-B of tone 3 with

Table 3-3 Statistics of Mechanism I-A/Pressed Voice/Mechanism I-B of tone 3 with


LIST OF FIGURES

Figure 1-1 Schematic diagram of Hanoi Vietnamese tones (Michaud, 2004a)

Figure 2-1 Speaker F7 (left) and M10 (right) in the recording booth

Figure 2-2 Sentence and Syllable Level Annotation with SoundForge (above)

Figure 3-1 Visualization of closing instant synchronized with EGG (above) and DEGG (below) signals (Henrich, 2001)

Figure 3-2 Visualization of opening instant synchronized with EGG (above) and DEGG (below) signals (Henrich, 2001)

Figure 3-3 Example of EGG and DEGG signals with indication of glottis closure

Figure 3-4 Two realizations of glottalization on SFP /a6a/ with two attitudes (a)

Figure 3-5 Average curves of F0 and Oq over 6 tokens of /a6a/, speaker M7

Figure 3-6 Two realizations of glottalization on SFP /da3/ of two attitudes: (a) declarative/neutral; (b) irritation. Speaker M6

Figure 3-7 Average curves of F0 and Oq over 6 tokens of /da3/, speaker M6

Figure 3-8 Determining mechanisms of voice based on DECPA and F0 parameters (each point of F0&Oq contour corresponds with a cycle on DEGG signal)

Figure 3-9 Determining the duration of pressed voice based on local dipping of Oq (each point of F0&Oq contour corresponds with a cycle on DEGG signal)

Figure 3-10 The tool for detection integrated three analysis modules

Figure 3-11 Some visually illustrative figures of creaky voice from the detection

Figure 3-1 Averaged Oq contours of SFP /a6a/ of 10 male speakers: surprise (left)

Figure 3-15 Averaged F0 contours of SFP /da3/ of 10 male speakers: irritation and

Figure 3-16 Averaged Oq contours of SFP /da3/ of 10 male speakers: declaration

Figure 3-17 Averaged Oq contours of SFP /da3/ of 10 male speakers: irritation

Figure 3-18 Proposed model for combination of speaker attitude, voice quality and glottalized tone in Vietnamese expressive speech processing


INTRODUCTION

Nowadays, using speech in human-machine interaction is gradually becoming the major trend, one which promises to replace traditional communication methods such as the mouse, keyboard and screen. However, a high-quality human-machine interaction system that behaves completely like a human being is still beyond our reach. One of the primary reasons is the lack of advanced techniques that enable precise processing (either synthesis or recognition) of the expression in human utterances.

Expression, in other words, refers to the attitudinal or emotional aspects of someone's speech, which can convey much linguistic information. From this perspective, the attitudinal aspects of speaker utterances, also called speaker attitudes, are of no small importance. If speaker attitudes play such an important role in interactions between humans, they need to be taken into account in interactions between humans and machines (Picard, 1997). Attitudinal information in a spoken utterance can be lexically encoded, but it can also be conveyed by intonation, including modifications of voice quality (Seibert, 2003).

However, the modification of these features in Vietnamese is quite complex because of the interplay between intonation and tones, and the complexity increases further for the glottalized tones, tone ngã and tone nặng. Furthermore, in expressive speech, how this interplay is expressed, how it is realized and through which mechanisms are among the many open questions.

Among the eight tones of Vietnamese, tone ngã and tone nặng are considered the most complicated, since they are accompanied by glottalization. For the simpler tones, the interaction between intonation and tone can in most cases be described by changes in fundamental frequency, intensity or duration, whereas for these two glottalized tones such parameters are not sufficient, since their glottalization can vary greatly depending on context. Many studies have approached this interplay, but they tend to avoid its most complicated aspect, namely the glottalization phenomenon in Vietnamese.

Therefore, with a view to applications in Vietnamese speech processing, the ultimate objective is to provide a sufficiently detailed account of the interplay between glottalized tones and intonation for both automatic speech recognition and text-to-speech systems, so that they can encode and decode attitudinal information in speakers' utterances.

The specific contents of the thesis are as follows:

Chapter 1 presents an overview of Vietnamese phonetics and phonology, tones and the expression of attitudes, as well as the existing issues that need to be dealt with and the thesis's approach.

Chapters 2 and 3 present the proposed methods for data acquisition and for analysis based on EGG and DEGG signals, in order to clarify the interaction mechanism between glottalized tones and expressive speech intonation.
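To make the EGG/DEGG-based analysis more concrete, the following is a minimal Python sketch, not the tool developed in this thesis. It assumes an EGG polarity in which increasing vocal-fold contact gives increasing amplitude, so that glottal closure appears as a sharp positive DEGG peak and opening as a negative DEGG peak. For each cycle it returns F0, the open quotient Oq (open-phase duration divided by the period) and the DEGG closure peak amplitude (DECPA). The function names and the relative threshold of 0.3 are illustrative assumptions.

```python
"""Sketch: per-cycle F0, Oq and DECPA from an EGG signal via its derivative (DEGG).

Illustrative assumptions, not the thesis's tool: positive DEGG peaks mark glottal
closing instants; the most negative DEGG value inside a cycle marks the opening instant.
"""


def degg(egg, fs):
    """Derivative of the EGG signal: first difference multiplied by the sampling rate."""
    return [(egg[i + 1] - egg[i]) * fs for i in range(len(egg) - 1)]


def closing_instants(d, rel_threshold=0.3):
    """Indices of local DEGG maxima exceeding rel_threshold * max(DEGG)."""
    thr = rel_threshold * max(d)
    return [i for i in range(1, len(d) - 1)
            if d[i] > thr and d[i] >= d[i - 1] and d[i] > d[i + 1]]


def cycle_parameters(egg, fs, rel_threshold=0.3):
    """Return (time_s, f0_hz, oq, decpa) for each cycle between two closing instants."""
    d = degg(egg, fs)
    closings = closing_instants(d, rel_threshold)
    cycles = []
    for start, end in zip(closings, closings[1:]):
        period = end - start                                  # period in samples
        opening = min(range(start, end), key=lambda i: d[i])  # strongest negative DEGG peak
        oq = (end - opening) / period                         # open phase / period
        cycles.append((start / fs, fs / period, oq, d[start]))
    return cycles


if __name__ == "__main__":
    # Synthetic EGG at 10 kHz, 100-sample periods, contact phase over samples 6-45.
    fs = 10_000
    egg = [1.0 if 6 <= n < 46 else 0.0 for n in range(100)] * 5
    for t, f0, oq, decpa in cycle_parameters(egg, fs):
        print(f"t={t:.4f} s  F0={f0:.1f} Hz  Oq={oq:.2f}  DECPA={decpa:.0f}")
```

On this idealized signal, each of the four complete cycles gives F0 = 100 Hz and Oq = 0.6. On real creaky or strongly glottalized phonation, DEGG can show double or imprecise closing peaks, so a peak picker this simple would need additional safeguards.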

Finally, Chapter 4 gives some conclusions and perspectives for expanding the study to cover a wider range of speaker attitudes and more tones in Vietnamese.

The obtained results include:

- Thesis report
- Attitudinal corpus: recorded with 10 male and 10 female speakers
- Method and tool for detecting and quantifying creaky and pressed voice in the surprise, irritation and declaration attitudes
- 1 international conference paper: INTERSPEECH 2013
- 1 national journal paper: Journal of Science & Technology of Technical Universities in Vietnam, 101 (2014)


Chapter 1 OVERVIEW

Like any other language, Vietnamese has a rich system of consonants and vowels, together with various rules for forming meaningful words. However, one of the special characteristics that make it even more attractive to researchers is its complex lexical tone system. So why is it considered complex, and why was its tone system chosen as the major focus of the thesis? Furthermore, the author also conducted research on expressive speech and emphasizes that the relationship between tonal realization and attitudinal expression in Vietnamese should be taken seriously; is this a unique point that distinguishes Vietnamese from other languages? This part gives a brief introduction to Vietnamese phonetics and phonology. The section on raising issues then clarifies the questions above as well as our interests.

1.1 Background knowledge

1.1.1 Vietnamese phonetics and phonology

Many works have studied the Vietnamese phonological system over the years, such as (Doan, 1977), (Nguyen, Edmondson & Jerold, 1998), (Hwa-Froelich, Hodson, & Edwards, 2002), (Nguyen, Carré, & Castelli, 2008), (Michaud & André-Georges, 2010) and (Hajek, 2008). These works adopt different concepts in establishing the Vietnamese phonological system, but in general, the consonants and vowels of Vietnamese can be summarized as in Table 1-1 and Table 1-2, respectively, in both the IPA-symbol system and the X-SAMPA-symbol system (Doan, 1977).

Where:

1: if initial followed by consonant, 8 or nothing

2: final only for this phoneme

3: final except after u, o, ô

4: ngh- initial only (before i, e, ê); ng- initial except before i, e, ê

5: gh- initial before i, e, ê; g- initial except before i, e, ê

6: initial except before i, e, ê, y; final after u, o, ô
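Notes 4 and 5 describe context-dependent spellings: the same initial phoneme is written differently depending on the following vowel letter. The Python sketch below encodes these rules under stated assumptions: the keys /ŋ/ and /ɣ/ are assumed IPA labels for the ng(h)- and g(h)- initials, the /k/ entry adds the standard k-/c- spelling rule (k- before i, e, ê, y), and the q- spelling used before /w/ is deliberately ignored. The names SPELLINGS and spell_initial are hypothetical.

```python
import unicodedata

# Hypothetical table: phoneme -> (spelling before front vowels, spelling elsewhere, front vowel bases).
# ŋ and ɣ follow notes 4 and 5; k follows the standard k-/c- rule (q- before /w/ is ignored).
SPELLINGS = {
    "ŋ": ("ngh", "ng", {"i", "e"}),
    "ɣ": ("gh", "g", {"i", "e"}),
    "k": ("k", "c", {"i", "e", "y"}),
}


def base_letter(ch):
    """Strip diacritics from a vowel letter, e.g. 'ê', 'ề' and 'é' all give 'e'."""
    return unicodedata.normalize("NFD", ch)[0]


def spell_initial(phoneme, rhyme):
    """Return the written initial for `phoneme` when followed by the written `rhyme`."""
    front, elsewhere, front_bases = SPELLINGS[phoneme]
    return front if base_letter(rhyme[0]) in front_bases else elsewhere


if __name__ == "__main__":
    for phoneme, rhyme in [("ŋ", "ỉ"), ("ŋ", "ày"), ("ɣ", "ê"), ("ɣ", "a"), ("k", "ông"), ("k", "ý")]:
        print(spell_initial(phoneme, rhyme) + rhyme)
```

Decomposing with NFD lets one rule cover every toned variant of a vowel letter (i, í, ì, ỉ, ĩ, ị), which is why the table only lists base letters.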



Table 1-2 Vietnamese vowels/diphthongs

… or several phonemes only follow certain vowels. In addition, there are only 9 long vowels, 4 short vowels and 3 diphthongs, which are combinations of single vowels.

Table 1-3 and Table 1-4 describe the phonetic characteristics of these consonants. In these tables, phonemes are written in the format "IPA-symbol (X-SAMPA-symbol)", where the (X-SAMPA-symbol) part is omitted if it is the same as the IPA symbol. The final consonants /ŋ/ and /k/ each have a variant after /u o ɔ/: /ŋm/ is a labial-velar nasal, while /kp/ is a voiceless labial-velar plosive (Hajek, 2008) (Doan, 1977).


Table 1-3 Phonetic characteristics of Vietnamese initial consonants

Green bold consonants: not present in the Northern dialect. In addition, in this dialect:

- ch- /c/ and tr- /ʈ/ are pronounced alike
- d-, gi- /z/ and r- /ʐ/ are pronounced alike
- x- /s/ and s- /ʂ/ are pronounced alike

Table 1-4 Phonetic characteristics of Vietnamese final consonants

Table 1-5 presents the phonetic characteristics of the 16 vowels and diphthongs of Vietnamese. As in other languages, they are distinguished from each other by which part of the tongue is involved (front, central, back) and how high the tongue is when the sound is produced (high, mid, low).



Table 1-5 Phonetic characteristics of Vietnamese vowels/diphthongs

This concludes the brief introduction to Vietnamese phonetics and phonology. The next section presents a problem that is always a challenge for anyone approaching Vietnamese: its tone system.

1.1.2 The phonetic characteristics of the complex lexical tone system in Vietnamese

Vietnamese is a tonal language; that is, the meaning of each word depends on the "tone" in which it is pronounced. Many other languages, such as Mandarin and Thai, also use tones. However, the Vietnamese tone system is relatively complex in comparison with the others, since it has a six-tone paradigm for sonorant-final syllables and a two-tone paradigm for obstruent-final syllables (Michaud, 2004a). The experimental evidence in that study warrants the conclusion that the rising (5b) and drop (6b) tones, i.e. the tones of syllables ending in /p/, /t/ or /k/ (checked syllables), are not glottalized, either in final or non-final position. Counting both paradigms, it can therefore be said that there are 8 different tones in Vietnamese: six on sonorant-final syllables and two on checked syllables. The work on oral airflow (Michaud, Vu, Angelique, & Bernard, 2006) brings out a clear difference between these two sets of rhymes: tone 6a (the drop tone in unchecked syllables) has low oral airflow, whereas tones 5b and 6b have relatively high oral airflow, close to the range of breathy voice.
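The count of eight tones follows from combining the orthographic tone with the syllable final: sắc and nặng split into 5a/6a on unchecked syllables and 5b/6b on checked syllables. The Python sketch below encodes that reasoning for written syllables. It is illustrative only and assumes standard Unicode tone diacritics and the written finals p, t, c and ch as the spellings of obstruent finals; the names tone_of, TONE_MARKS and CHECKED_FINALS are hypothetical.

```python
import unicodedata

# Combining tone diacritics of Vietnamese orthography, as they appear after NFD decomposition.
TONE_MARKS = {
    "\u0300": "huyền",  # grave
    "\u0301": "sắc",    # acute
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}
# Numbering used in this thesis: 1 ngang, 2 huyền, 3 ngã, 4 hỏi, 5 sắc, 6 nặng.
TONE_NUMBER = {"ngang": "1", "huyền": "2", "ngã": "3", "hỏi": "4", "sắc": "5", "nặng": "6"}
# Written obstruent (stop) finals: p, t, c and ch.
CHECKED_FINALS = ("p", "t", "c", "ch")


def tone_of(syllable):
    """Classify a written syllable into one of the 8 tones (1, 2, 3, 4, 5a, 6a, 5b, 6b)."""
    word = syllable.lower()
    name = next((TONE_MARKS[ch] for ch in unicodedata.normalize("NFD", word)
                 if ch in TONE_MARKS), "ngang")
    number = TONE_NUMBER[name]
    if number in ("5", "6"):  # only sắc and nặng occur on checked syllables
        number += "b" if word.endswith(CHECKED_FINALS) else "a"
    return number


if __name__ == "__main__":
    for s in ["ba", "mà", "đã", "hả", "sáng", "ạ", "sắc", "học"]:
        print(s, tone_of(s))
```

Vowel-quality diacritics (circumflex, breve, horn) are not in TONE_MARKS, so a syllable such as lên or công is correctly classified as tone 1.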



Table 1-6 Summarized description of 8 tones of Vietnamese

2 Huyền Falling Low Slightly falling Laxness, breathy

A phonetically detailed description of each tone, summarized from (Thompson, 1987)(Mixdorff, Nguyen, Fujisaki, & Luong, 2003)(Nguyen, 1997)(Michaud, 2004a), is as follows:

Tone 1 (level tone, "ngang") is modal and sometimes lax, and its contour is nearly level in non-final syllables not accompanied by heavy stress, although even in these cases it probably trails downward slightly.

Tone 2 (falling tone, "huyền") is lax, starts quite low and trails downward toward the bottom of the voice range. It is often accompanied by a kind of breathy voicing (voiceless + modal), reminiscent of a sigh. For some speakers it is lax to the point of breathiness, with somewhat lowered subglottal air pressure.

Tone 3 (broken tone, "ngã") is high and rising, its F0 contour being similar to that of tone 5, but it is accompanied by a rasping voice quality (strong creaky voice starting toward the middle of the vowel, which then lessens as the end of the syllable is approached) occasioned by tense glottal stricture. In careful speech, such syllables are sometimes interrupted completely by a glottal stop (or a rapid series of glottal stops). Its trajectory therefore sometimes shows a characteristic break in voicing at about half of the total duration of the syllable. Many speakers begin the vowel with modal voice, followed by strong creaky voice starting toward the middle of the vowel.

Tone 4 (curve tone, "hỏi") is tense and drops rather abruptly. It starts with modal voice phonation, which moves increasingly toward tense voice with accompanying harsh voice (although the harsh voice seems to vary by speaker). In final syllables, and especially in citation forms, this is followed by a sweeping rise at the end, and for this reason it is often called the 'dipping' tone. However, non-final syllables seem only to have a brief level portion at the end, and this is exceedingly elusive in rapid speech. Although tone 4 is usually described as a low falling and then rising tone, not all Vietnamese speakers have the rising part. Curve and broken tones are both tense, but their tension is not alike and is not distributed across the syllable in the same way.

Tone 5a (rising tone, "sắc") is high and rising (perhaps nearly level in rapid speech) and tense. Phonetically, tone 5a is produced with modal voice.

Tone 6a (drop tone, "nặng") is also tense: it starts somewhat lower than tone 4. Syllables bearing tone 6a have the same rasping voice quality as tone 3, drop very sharply and are almost immediately cut off by a strong glottal stop. Tone 6a is much shorter than the other tones, with a tendency to go lower.

As for tones 5b and 6b, the orthography identifies tone 5b with tone 5a as sắc and tone 6b with tone 6a as nặng, which are the names these tones carry in present-day Vietnamese orthography. However, tones 5b and 6b are not glottalized, either in final or non-final position (Michaud, 2004a). Tone 6a is characterized by a gesture of strong constriction that is distinct from creaky voice; tone 6b drops more sharply than tone 2, but it is never accompanied by the …

Figure 1-1 Schematic diagram of Hanoi Vietnamese tones (Michaud, 2004a)

This section has presented the features of Vietnamese tones that need to be taken into account when approaching the language. The next section discusses glottalization in expressive speech, first across languages and then in Vietnamese.

1.2 Glottalized tones in the context of expressive speech: raising issues

Glottalization is a challenge for speech processing: it disrupts F0 estimation (making it unclear how F0 should be measured) and raises problems for averaging and model building. Specifically, most current speech synthesis and recognition systems do not model the control of glottalization because of its complexity.
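The averaging problem can be illustrated with a small Python sketch. Assuming that cycles where no reliable F0 can be measured (for instance irregular creaky cycles) are stored as None, contours of different lengths can still be averaged on a normalized time axis by pooling only the valid values in each time bin and reporting how many values each bin rests on. This is a hypothetical illustration, not the averaging procedure used in this thesis; the function name average_contour and the equal-width binning are assumptions.

```python
def average_contour(tokens, n_points=10):
    """Average F0 contours of different lengths on a normalized time axis.

    tokens: list of contours; each contour is a list of F0 values in Hz, with None
    where no reliable F0 could be measured. Returns (means, counts): one mean per
    time bin (None if the bin is empty) and the number of valid values in that bin.
    """
    sums = [0.0] * n_points
    counts = [0] * n_points
    for contour in tokens:
        n = len(contour)
        for i, f0 in enumerate(contour):
            if f0 is None:  # unmeasurable cycle: leave it out instead of guessing a value
                continue
            b = min(int(i / n * n_points), n_points - 1)  # bin by relative position
            sums[b] += f0
            counts[b] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return means, counts


if __name__ == "__main__":
    tokens = [
        [100.0, 110.0, None, 90.0],  # token 1: one unmeasurable cycle
        [120.0, None, None, 80.0],   # token 2: two unmeasurable cycles
    ]
    means, counts = average_contour(tokens, n_points=2)
    print(means, counts)  # [110.0, 85.0] [3, 2]
```

Returning the counts alongside the means matters here: without them, a bin averaged from a single token would look as reliable as one pooled from many, which is exactly where glottalized portions tend to thin out the data.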

In languages such as English, the issue may appear secondary, as glottalization is not phonological in the standard variety. Glottalization is a characteristic of certain sociolects: creak in "drawl", and 'glottaling' of /t/, which is becoming increasingly common in familiar speech but used to be stigmatized as "working-class"/vulgar (Fabricius & Anne, 2002). Among the national languages of Europe, only Danish possesses phonological glottalization (stød) (Fischer-Jørgensen & Eli, 1989). There exist languages in which glottalization is controlled in greater phonological detail, for instance languages of the Mon-Khmer family, but these languages are relatively less well studied, and given the present state of the documentation, studies of the fine phonetic detail of these phenomena in discourse are seldom perceived as a priority by linguists (DiCanio & Christian, 2009).

Hanoi Vietnamese has a key role to play here: it has extremely rich glottalization phenomena, and as the official standard of a country with about 90 million inhabitants, it receives increasing attention from speech technology specialists. A salient aspect of the Hanoi Vietnamese tone system is the use of phonation-type characteristics (Nguyen et al., 1998)(Brunelle, Nguyen, & Nguyen, 2010)(Kirby, 2010)(Brunelle, 2009a), which are absent from other dialects (Tran, 1969). Hanoi Vietnamese makes use of glottalization as part of the lexical specification of some of its tones. In particular, tones 6a and 3 are glottalized. Tone 3 (also referred to by its orthographic label, ngã, or the English descriptor 'broken tone') is a rising tone with a strong glottalization in its first half. Tone 6a (orthographic nặng, 'drop tone') starts on a middle pitch and usually falls dramatically because of a strong glottalization in its second half. It has been reported that glottal constriction for tone 6a is consistently present both in a 'neutral' context and in an 'emphatic/impatient' context (Michaud & Vu, 2004).

Glottalization in Vietnamese is not only a distinctive characteristic of tone: fine details in its phonetic realization can convey intonational information. Vietnamese has salient intonational phenomena (Tran & Castelli, 2008). The surface realization of tones depends greatly on intonation: phrasing, prominence, and the expression of attitudes and emotions. It therefore appeared worthwhile to investigate how speaker attitude affects the realization of glottalization, a phonetic dimension that is cross-linguistically known to convey "paralinguistic" information (Fónagy, 1983)(Gobl & Ní Chasaide, 2003). Specifically, the research question is: how do fine-grained details in the phonetic realization of glottalized tones convey attitudinal information in Vietnamese expressive speech?

This is a challenge for speech processing: models such as Fujisaki's (Mixdorff et al., 2003), which focus exclusively on F0, would require substantial additions before they could handle such phenomena. New-generation speech processing for Vietnamese will have to face the challenge of synthesizing and fine-tuning phonation types.

1.3 The scope of the thesis

In view of the context set out above, the goal of the present study is to investigate the phonetic characteristics of glottalized tones in Vietnamese expressive speech, focusing on sentence-final particles. Owing to the limited scope of the present study, applications in speech processing will not be attempted. The aim is to provide a sufficiently detailed analysis of production data to pave the way for future work on the synthesis and recognition of attitudes in Vietnamese.

More precisely, we concentrate on tone 3 and tone 6a with three attitudes, Declaration, Surprise and Irritation, since they are the most clearly perceived (Mac, 2009). The objective is to answer the question of how these attitudes change the realization of glottalization on these two tones and the use of its special voice qualities. Even so, the speech corpus was not limited to these objects, so that it can serve further research as well.

1.4 Conclusion

This chapter has presented an overview of Vietnamese phonetics and phonology as well as the phonetic characteristics of the Vietnamese lexical tone system. It then set out the existing issues and the author's interest in glottalized tones and expressive speech as the main focus of the thesis. In the next chapter, the author proposes an approach that uses expressive morphemes called sentence-final particles as the objects for studying glottalization in the interaction between lexical tone function and attitudinal function. That chapter presents the construction of our corpus for this research.



Chapter 2 BUILDING VIETNAMESE ATTITUDINAL SPEECH CORPUS FOR SENTENCE-FINAL PARTICLES

As announced in the last chapter, this chapter focuses on the construction of a speech corpus for investigating the interplay between glottalized tones and attitudinal expression in Vietnamese. Several special SFPs, which carry both lexical tones and attitudinal information, were used to construct target sentences that concentrate on basic speaker attitudes and glottalized tones.

There already exists a corpus designed for the study of social attitudes in Vietnamese (Mac et al., 2009), but it does not contain SFPs. We therefore decided to record new data. Speech data acquisition is an underestimated challenge (Niebuhr and Michaud), especially when attempting to capture such elusive aspects of speech as attitudinal information. Special attention was therefore paid to the elaboration of materials and recording procedures.

In particular, the research was divided into two phases, and a different corpus was built for each. The first phase was a pilot study with a small corpus and four speakers, to explore initial hypotheses on SFPs, glottalized tones and speaker attitudes. The second phase, with a larger corpus recorded with 20 speakers, expanded on the results of the pilot study. Within the scope of the thesis, we aimed to demonstrate the qualitative observations by concentrating on analyses of tone 3, tone 6a, the three studied attitudes and male speakers; the rest of the corpus was reserved for further research. This chapter presents both corpora.

2.1 Method of using expressive morphemes carrying lexical tones: Sentence-final particles

Languages differ in the means that they offer for the expression of attitudes and emotions. In English, intonation is known to fulfill a considerable range of functions, including subtle nuances related to attitudes and emotions. Japanese and Cantonese are famous examples of languages that possess morphemes which have been described as performing functions that intonation performs in a language such as English (Chan & Marjorie, 1999). For instance, in Cantonese, the particle … can serve as an illustration: this particle is suffixed to a declarative sentence to convert it into a question of disbelief or surprise (Wu 2008, p. 24) or a "query to the truth of something" (Kwok 1984, p. 88).

The particles specifically called sentence-final particles (hereafter SFPs) constitute a marginal class of expressive words indicating speech act types, evidential/epistemic nuances, and affective/emotional colouring. There are about ten SFPs in Mandarin, thirty in Cantonese (Kwok & Helen, 1984), and about the same number in Vietnamese (Tran, 2010); SFPs are ubiquitous in casual, conversational speech. SFPs "often carry much of the meaning and function that intonation does in non-tone languages" (Chan & Marjorie, 1998). The relationship is not simply one of functional equivalence between intonation and SFPs, however, since SFPs also carry intonational information: sentence-level intonational phenomena are known to cluster on SFPs. One and the same SFP can take on different nuances (creating different sense-effects) depending on the intonational realization of the SFP itself (the 'tune' that it carries) and of the sentence as a whole.

In Vietnamese, where SFPs clearly have a tone of their own, they provide an exemplary illustration of the superposition of tone and intonation. An important proportion of sentence-level intonation, conveying sentence mode and attitudes, is concentrated at the end of the utterance, on the SFP(s) (Do, Tran, & Georges, 1998). This superposition affects F0 (Nguyen & Tran, 2012), but also phonation types. The purpose of the present study is to investigate how speaker attitude affects the realization of glottalization for the two glottalized tones 6a and 3 (orthographic nặng and ngã) carried by SFPs. A pilot study (Nguyen, Michaud, Tran, & Mac, 2013) suggests that glottalization is phased earlier for surprise than for declaration, and that irritation also tends to be reflected in earlier glottalization, but with an added glottal stop/constriction at the end of the SFP. The present study builds on a more extensive empirical basis, relying on materials constructed to elicit a wider range of attitudes in Vietnamese.

2.2 Designing sample corpus

Due to the great amount of carry-over tonal co-articulation in Vietnamese (Brunelle, 2009b), the tones of SFPs are strongly influenced by those of preceding syllables (Nguyen & Tran, 2012). The study therefore aimed to devise a sentence using only syllables carrying tone 1 (ngang), which is phonetically the simplest: a level, non-low tone. This resulted in sentence (1), used in our pilot study (Nguyen et al., 2013):

(1) Lam lên công ty

proper_name   to go up   workplace/company

This sentence was then combined with the SFPs ạ [IPA: /a6a/], conveying politeness, and đã [IPA: /da3/], conveying tense-aspect-modality information. This yields:

(2) Lam lên công ty ạ

(3) Lam lên công ty đã

Finally, sentences (1-3) were placed inside precisely contextualized dialogues. The attitudes under study are (i) politeness, associated lexically with the SFP ạ, and (ii) declaration, irritation and surprise, elicited by context. The general context is as follows: Lam, Minh and An are three friends who have just moved into a shared flat; today is Saturday, a day when they go neither to class nor to their workplace, but Lam is suddenly requested to go to work on urgent business.

However, it turned out that the SFP ạ [IPA: /a6a/] sometimes tended to coalesce with the preceding syllable, ty [IPA: /ti1/], in carrier sentence (1). In hyper-articulated speech, the SFP /a6a/ begins with a glottal onset (empty-onset filler), which sets it off from what precedes. However, the onset of this syllable is one of the parameters that vary strongly depending on context, including cases where no initial glottalization is detectable in the acoustic signal, which causes segmentation difficulties. This detracts from the precision of measurements.

As a consequence, a slightly different set of materials was devised for the full-scale study. The details are set out below.

In the target sentence, a given name was chosen as grammatical subject. Among the wealth of Vietnamese given names, Ba, meaning 'three', i.e. 'third child', was chosen for two main reasons. First, its vowel /a/ allows phonetic comparison with the vowel /a/ in ạ and đã. Second, its tone is phonetically simple: tone 1, ngang, a non-low tone that is relatively level.
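The tonal make-up of a carrier sentence can be checked mechanically, since in Vietnamese orthography tone 1 (ngang) is the only tone written without a tone diacritic. The Python sketch below is a hypothetical helper, not part of the thesis's workflow: it lists the syllables of a sentence that carry a tone mark, deliberately ignoring the vowel-quality diacritics (circumflex, breve, horn).

```python
import unicodedata

# Combining tone diacritics (grave, acute, tilde, hook above, dot below) after NFD decomposition.
# Circumflex, breve and horn mark vowel quality, not tone, so they are not listed.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def non_level_syllables(sentence):
    """Return the syllables of `sentence` that do not carry tone 1 (ngang)."""
    return [syl for syl in sentence.split()
            if any(ch in TONE_MARKS for ch in unicodedata.normalize("NFD", syl))]


if __name__ == "__main__":
    print(non_level_syllables("Lam lên công ty"))    # pilot carrier sentence
    print(non_level_syllables("Lam lên công ty ạ"))  # with the SFP ạ
    print(non_level_syllables("Ba đi học"))          # full-scale frame
```

For the pilot carrier the list is empty, whereas the full-scale frame flags học, which carries tone 6b (nặng on a checked syllable); only the subject Ba was selected for its level tone.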

Table 2-1 presents the speech materials Labels for the intended attitudes follow

the terminology proposed by (Mac, Aubergé, Rilliard, & Castelli, 2009), which

distinguishes 16 attitudes, and which treats sentence mode (declarative,

interrogative or imperative) as part of speaker attitude

Table 2-1 Intended attitudes

Where: 'Sentence' = contextualized sentence; DEC = Declaration; INT = Interrogation; SUR = Surprise; OBV = Obviousness; IRR = Irritation; POL = Politeness; AUT = Authority; SAR = Sarcastic irony. Slots in grey indicate combinations judged implausible. Politeness (POL) is conveyed semantically by the SFP /a6a/.

The data are not fully symmetrical because of the semantics of SFPs. The attitudes of sarcastic irony and authority are antagonistic with the respect (acknowledgment of the addressee's seniority) conveyed by the SFP /a6a/; likewise, surprise and interrogation are antagonistic with the assertiveness conveyed by the SFP /da3/, hence the four empty slots in Table 2-1.

In addition, two other SFPs were used, because we wanted to demonstrate that the final glottalization of /a6a/ and the medial glottalization of /da3/ are due to their lexical tone and not to intonational factors; we checked this point with SFPs whose tones do not involve glottal constriction. In order to cover the same range of attitudes as for the SFPs /a6a/ and /da3/, two different SFPs had to be used: hả, carrying tone 4, is compatible with the expression of interrogation and surprise, and mà, carrying tone 2, was recorded with the other four attitudes.

The four target sentences are as follows:

1. Ba đi học ạ

2. Ba đi học đã

3. Ba đi học mà

4. Ba đi học hả

Next, the target sentence without SFP, with declarative attitude, and the 4 sentences with SFPs, each with its attitudes as indicated in Table 2-1, were set in 17 specifically designed contexts to facilitate recording and to ensure the naturalness of the utterances. For example, context number 2, with SFP ạ and interrogative attitude (Ba đi học ạ? 'Does Ba go to school?'), was set in a situation where one of Ba's classmates comes and asks one of Ba's older relatives, just to get some extra information; in that case, the classmate should show respect and politeness. By contrast, context number 3, with SFP ạ and surprise, represents a situation where another classmate comes to meet Ba because he thought that Ba was still at home. The answer from Ba's brother or sister, asserting that Ba had gone to school, then comes as a big surprise to him.

Contexts number 8 and 10, with irritation and sarcastic irony respectively, provide further illustration. Context number 8 describes a situation in which Ba is being dragged along by some troublemakers when he needs to go to school right away; an assertion expressing irritation is then the most natural way for him to show his firm refusal. In context number 10, Ba tells his roommate that he needs to go to school immediately, perhaps for an English class, so their discussion has to be interrupted. The roommate, doubting that Ba could be such a hard-working student as to go to school even at the weekend, teases him by repeating his utterance with sarcastic irony.

The examples above illustrate our context-based method of data collection. In total, 17 contexts were created to fit the selected attitudes and SFPs (see Appendix A).

2.3 The process of building the sample corpus

2.3.1 Elicitation method and speakers

In the pilot study, two different approaches to data collection were used. The first aimed at maximal ecological validity, eliciting the intended attitude through contextualization from two speakers who were unaware of the purpose of the study. (Their speaker codes, assigned as part of a larger database, are M4 and M5, respectively.) The second aimed at maximal clarity in contrasting different attitudes: two speech scientists (M6 and M7) who were aware of the purpose of the study deliberately expressed the intended attitude as identified by the labels in Table 2-1.

M4 and M5 are aged 24; they have a university education in software engineering. M6 and M7 are aged 26 and 31, respectively. They were born in Hanoi and are permanent residents there, apart from a total of 2 years in France for M7. All four can speak some English, and M7 is also fluent in French.

For the full-scale study, it was possible to recruit a sizeable group of speakers (10 female and 10 male) from the Hanoi Academy of Theatre and Cinema, where many well-known Vietnamese actors are trained; this also ensured a consistent age group. Moreover, all of them are from the Department of Spoken Theatre, so their ability to express different attitudes orally is their main asset for this study. No group of speakers is "ideal", and this choice raises concerns about naturalness: there is a risk that the speakers reproduce stereotyped patterns for the expression of attitudes, that is, patterns designed for the stage that do not correspond to patterns found in ordinary speech. Great care was therefore taken to verify perceptually that the intended attitudes were recognized by listeners outside the narrow circle of the performing arts, using science and technology students as subjects for the perception tests. The information concerning the speakers is summarized below.

Table 2-2 List of speakers

15 | F10 | Female | 21 | Quangninh | English
16 | F11 | Female | 21 | Haiduong | English
18 | F13 | Female | 19 | Hanoi | English
20 | F15 | Female | 22 | Hanoi | English

2.3.2 Recording conditions

The recordings were conducted in the MICA Institute's sound-treated booth. The participants received information about electroglottography (EGG) and its complete harmlessness (Fabre, 1957; Baken, 1992). They were then given time to familiarize themselves with the scripts of the dialogues, and their questions were answered through discussion of the context. After this, the speakers read the dialogues three times, then swapped roles and read them another three times. They were instructed to read "like actors", an indirect way to elicit vivid, expressive dialogue.


Figure 2-1 Speaker F7 (left) and M10 (right) in the recording booth

2.3.3 Post-processing and annotation

The average recording time per speaker was between 25 and 35 minutes; in total, 10 hours of speech were collected from the 20 speakers. The electroglottographic signal from an EG2-PC (for one of the speakers) and the audio signals (from one microphone per speaker) were recorded as three synchronized WAV files (44,100 Hz, 24-bit). After recording, the 6 best repetitions of each target sentence by each speaker, those in which the speaker's voice sounded most natural, were extracted and annotated at sentence level in SoundForge and at syllable level in Praat, as shown in Figure 2-2.
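As a quick consistency check on the figures above, the following Python sketch multiplies the reported per-speaker recording time by the number of speakers, and computes the raw data rate of the three synchronized files. It assumes that each WAV file is mono, uncompressed 24-bit PCM (our reading of "one EGG channel plus one microphone per speaker in a dialogue"); the variable names are illustrative only.

```python
# Sanity check of the corpus-size figures in Section 2.3.3 (illustrative only).
N_SPEAKERS = 20
MINUTES_PER_SPEAKER = (25, 35)  # reported range of recording time per speaker


def total_hours(n_speakers: int, minutes_each: float) -> float:
    """Total recording time in hours for n_speakers each recorded minutes_each minutes."""
    return n_speakers * minutes_each / 60


low_h = total_hours(N_SPEAKERS, MINUTES_PER_SPEAKER[0])
high_h = total_hours(N_SPEAKERS, MINUTES_PER_SPEAKER[1])
mid_h = total_hours(N_SPEAKERS, sum(MINUTES_PER_SPEAKER) / 2)

# Raw data rate of the three synchronized WAV files, ASSUMING each file is
# mono, uncompressed PCM (presumably one EGG channel and two microphone channels).
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 24 // 8  # 24-bit samples
N_FILES = 3
bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * N_FILES

if __name__ == "__main__":
    print(f"total recording time: {low_h:.1f} to {high_h:.1f} h (midpoint {mid_h:.1f} h)")
    print(f"raw data rate: {bytes_per_second} bytes/s")
```

The midpoint of the range, 10 hours, matches the reported total; the full range (about 8.3 to 11.7 hours) shows how sensitive that total is to the per-speaker duration.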

Figure 2-2 Sentence-level and syllable-level annotation of the corpus with SoundForge (above) and Praat (below)

The recordings made for the pilot study are available online as part of the MICA Institute's AuCo project; long-term archiving and online availability are guaranteed through the Pangloss Collection. See http://lacito.vjf.cnrs.fr/archivage/languages/Vietnamese_en.htm. We plan to make the materials of the full study available online in the future, ideally by the time the results of the full study are published.

2.4 Conclusion

This chapter has presented the whole process of building the corpus, with a detailed explanation of the methods used to choose the sample material and the speakers, and of the recording procedure. To the best of our knowledge, this is the first expressive speech corpus that can be used to study how the glottalization of tones varies with speaker attitude in Vietnamese. In addition, the 20 speakers are professionally trained, with a strong ability to convey attitudinal information.

Furthermore, we did not limit ourselves to building a small corpus covering only the ngã and nặng tones with the three attitudes within our objectives: surprise, declaration and irritation. Instead, to take advantage of the professional speakers, an additional corpus covering a wider range of tones and attitudes was built as well. The corpus can therefore be used for further research on other aspects of Vietnamese expressive speech processing, including cross-gender studies, since it contains equal numbers of male and female speakers.

Within the scope of this thesis, a sufficient part of the corpus was exploited to introduce a new approach to this issue. The next chapter presents the results of a thorough analysis, with illustrative figures; in particular, the discussion at the end of the chapter offers insight into how a model can be built from these results.


Chapter 3 ANALYSING VARIATION IN REALIZATION OF GLOTTALIZED TONES BY VARIOUS ATTITUDES

As mentioned in Chapter 2, two separate corpora were collected from two different groups of speakers. The first was recorded with 4 male speakers in contexts covering only 4 attitudes; the second was recorded with 10 male and 10 female speakers and covers a wider range of attitudes. The first corpus was used for a pilot study, to make a preliminary investigation and formulate hypotheses about the interplay of glottalized tones and attitudes in Vietnamese. This study played a very important role in the design of the second corpus, which was intended for a full-scale study. This chapter presents both studies in full.

3.1 Analysis Method

To analyze the EGG signal recorded simultaneously with the acoustic signals, we used a method based on the derivative of the EGG signal (Henrich, d'Alessandro, Castellengo, & Doval, 2004), which allows for the measurement of cycle length (and hence F0), glottal open quotient (Oq) (Vu-Ngoc, d'Alessandro, & Michaud, 2008), and a parameter called DECPA: Derivative-Electroglottographic Closure Peak Amplitude (Michaud, 2004b). Specifically, the EGG signal reflects changes in vocal fold contact area. It rises sharply when the glottis closes, reaches a maximum, then decreases slowly until the vocal folds separate along their upper rim, at which point the EGG signal decreases most rapidly. The derivative of the EGG signal (DEGG) therefore typically has a positive peak at glottal closure and a negative peak at glottal opening. Figure 3-1 and Figure 3-2 illustrate a closing and an opening instant of the vocal folds, together with the synchronized EGG and DEGG signals.
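The principle described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of Henrich et al. (2004) or the tool used in this thesis. It takes the first difference of the EGG signal as the DEGG, treats local positive DEGG peaks above a relative threshold as closure instants and local negative peaks as opening instants, and derives for each cycle F0 (sampling rate divided by cycle length), Oq (open-phase duration divided by cycle length) and DECPA (amplitude of the DEGG closure peak). The threshold value and all function names are hypothetical; the published method also deals with imprecise and double peaks, which this sketch handles only by keeping the strongest opening peak in each cycle.

```python
"""Simplified DEGG-based glottal cycle analysis (illustrative sketch only)."""
from typing import List, NamedTuple


class Cycle(NamedTuple):
    f0_hz: float  # fundamental frequency: 1 / cycle length
    oq: float     # open quotient: open-phase duration / cycle length
    decpa: float  # DEGG closure peak amplitude (EGG units per sample)


def degg(egg: List[float]) -> List[float]:
    """First difference of the EGG signal: d[n] = egg[n + 1] - egg[n]."""
    return [b - a for a, b in zip(egg, egg[1:])]


def peak_indices(x: List[float], threshold: float, sign: int) -> List[int]:
    """Indices of local maxima of sign * x that exceed threshold."""
    out = []
    for n in range(1, len(x) - 1):
        v = sign * x[n]
        if v > threshold and v >= sign * x[n - 1] and v > sign * x[n + 1]:
            out.append(n)
    return out


def analyse_cycles(egg: List[float], fs: float, rel_threshold: float = 0.3) -> List[Cycle]:
    """Estimate F0, Oq and DECPA for each cycle delimited by two closure peaks."""
    d = degg(egg)
    if not d:
        return []
    # Hypothetical thresholds: a fixed fraction of the largest positive / negative peak.
    closures = peak_indices(d, rel_threshold * max(d), +1)
    openings = peak_indices(d, rel_threshold * max(-v for v in d), -1)
    cycles = []
    for c0, c1 in zip(closures, closures[1:]):
        inside = [o for o in openings if c0 < o < c1]
        if not inside:
            continue  # no detectable opening peak: skip this cycle
        opening = min(inside, key=lambda n: d[n])  # strongest negative peak
        period = c1 - c0  # cycle length in samples
        cycles.append(Cycle(f0_hz=fs / period,
                            oq=(c1 - opening) / period,
                            decpa=d[c0]))
    return cycles


if __name__ == "__main__":
    # Synthetic EGG: sharp rise at closure, slow decrease, sharp drop at opening
    # (closed phase = 60 of every 100 samples), sampled at 10 kHz.
    fs = 10_000
    egg = [1.0 - 0.002 * (n % 100) if n % 100 < 60 else 0.0 for n in range(500)]
    for c in analyse_cycles(egg, fs):
        print(c)
```

On the synthetic signal in the usage block, each complete cycle lasts 100 samples at 10 kHz and is open for 40 of them, so the sketch reports three cycles with F0 = 100 Hz, Oq = 0.4 and DECPA = 1.0. Delimiting cycles by successive closure peaks, rather than by zero crossings of the acoustic signal, reflects the observation above that glottal closure gives the sharpest and most reliable DEGG event.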


Figure 3-1 Visualization of a closing instant, with synchronized EGG (above) and DEGG (below) signals (Henrich, 2001)


Title: Studying the phonetic characteristics of glottalized tones in Vietnamese expressive speech
Author: Nguyen Thi Lan
Supervisor: Dr. Tran Do Dat
University: Hanoi University of Science and Technology
Major: Information Technology
Type: Master's thesis
Year of publication: 2015
City: Hanoi

Format
Number of pages: 73
File size: 1.84 MB