
Thesis: Characterization of Vietnamese Intonation for Questions


DOCUMENT INFORMATION

Basic information

Title: Characterization of Vietnamese Intonation for Questions
Author: Ninh Khanh Duy
Supervisor: Dr. Eric Castelli
Institution: Hanoi University of Technology
Field: Linguistics / Phonetics
Type: Thesis
Year: 2005
City: Hanoi
Pages: 84
File size: 1.24 MB


THESIS FOR THE DEGREE OF MASTER OF SCIENCE

CHARACTERIZATION OF VIETNAMESE

INTONATION FOR QUESTIONS

NINH KHANH DUY

Supervisor: Dr ERIC CASTELLI

HA NOI 2005

Acknowledgments

Firstly, I would like to express my gratitude to my supervisor, Dr. Eric Castelli, whose expertise, understanding, patience, and constructively critical eye added considerably to my graduate experience.

Special thanks go to Ir. Nguyen Trong Giang and Dr. Pham Thi Ngoc Yen for providing me with the most favorable conditions during my working time at the International Research Center MICA.

I would like to thank the PhD students Nguyen Viet Tung, Tran Do Dat, Vu Minh Quang and Le Xuan Hung, who helped me a lot in finishing the thesis.

I would also like to thank my family, especially my parents, for the support they have provided me throughout my entire life; without their care and encouragement I would not have finished this thesis.

Finally, thanks go to all of my colleagues who helped me while I worked on this thesis.

3.1 The concepts of prosody and intonation
3.2 Levels of representation of prosodic phenomena
3.4 Applications of intonation
Chapter 4 PROSODY IN VIETNAMESE
4.1 General characteristics of Vietnamese language
4.1.5 Modality, attitude and morphosyntactic structures
Chapter 5 FUNDAMENTAL FREQUENCY DETECTION
5.1 Introduction
5.2 Some pitch detection algorithms
5.2.1 The autocorrelation method
5.2.2 The average magnitude difference function method
5.2.3 The simple inverse filtering tracking method
5.2.4 The cepstrum-based method
5.3 The Praat pitch tracker
A List of questions in the corpus
B List of statements in the corpus

Figure 2.1 The underlying determinants of speech generation and understanding. The gray boxes indicate the corresponding computer system components for spoken language processing [1]
Figure 2.2 Application of sound energy causes alternating compression/rarefaction of air molecules, described by a sine wave [1]
Figure 2.3 A schematic diagram of the human speech production apparatus
Figure 2.4 Schematic representation of the complete physiological mechanism of speech production [2]
Figure 2.5 A section of waveform of the utterance “sa”: the unvoiced sound “s” in the first part and the voiced sound “a” in the second part
Figure 2.6 Vocal fold cycling at the larynx: (a) closed with sub-glottal pressure buildup; (b) trans-glottal pressure differential causing folds to blow apart; (c) pressure equalization and tissue elasticity forcing temporary reclosure of vocal folds, ready to begin next cycle [1]
Figure 2.7 Glottal airflow and the resulting sound pressure at the mouth [2]
Figure 4.1 Example of the contours of six tones (female subject PNY), as described in [7]
Figure 4.2 F0 variations of 2 typical pairs of sentences in [9]
Figure 5.1 Autocorrelation function for (a) and (b) voiced speech, and (c) ...
Figure 5.2 Example of waveforms and correlation function: (a) no clipping, ...
Figure 5.3 AMDF function for same speech segments as in Figure 5.1 [10]
Figure 5.4 Block diagram of the SIFT algorithm [10]
Figure 5.5 Cepstrum of an example segment of: (a) voiced speech, (b) ...
Figure 5.6 Windowing a signal and estimating the ACF of a signal segment from the ACF of its windowed version [15]
Figure 5.7 Some F0 points are detected in the unvoiced consonant “kh” of the ...
Figure 5.8 Pitch halving errors in the middle of the word “tra” (female subject ...
Figure 5.9 Some F0 points are missed in the voiced consonant “b” of the word ...
Figure 5.10 Some F0 points are missed in the middle of the word “rd” (female ...
Figure 5.11 Some F0 points are missed at the end of the word “vay” (male ...
Figure 6.1 Speech waveform (in the background) and F0 contour (blue dotted line) of the utterance “Bây giờ anh ở đâu?” (male subject ND). The final syllable “đâu” is bounded by two vertical lines
Figure 6.2 F0 contour (blue dotted line) and proposed intonation contour (red dotted line) of the utterance “Hiện tại anh làm việc ở đâu?” (male subject ...
Figure 6.3 The intonation contour (red dotted line) of the statement “Ba ay ...
Figure 6.5 F0 level of all speakers for questions (Q) and statements (S)
Figure 6.6 Time waveform (top), F0 contour (middle) and the position of 5 ...

Table 4.3 Arrangement of Vietnamese consonants
Table 4.4 The phonological hierarchy of Vietnamese syllables with total ...
Table 5.1 Praat PDA evaluation for male speech and female speech
Table 6.2 Statistics on F0 level of all speakers for questions (Q) and statements (S) including: mean, minimum (min), maximum (max) and ...
Table 6.3 Representative values of “ngang” tone in final position of questions and statements for all speakers

Vocal technologies are important and strategic in the development of information technology. The increasing demand for the application of speech in man-machine communication in all areas ranging from telephony, telematics, and automated translation to aids for the handicapped requires sophisticated technology for the recognition and synthesis of speech. However, to build automatic modules for speech synthesis or speech recognition for a given language, it is essential to know the characteristics of that language thoroughly, particularly in terms of phonetics and phonology.

Since intonation forms such a central part of human speech communication, conveying not only diverse linguistic information but also information about the speaker and the speaker’s mood and attitude, it certainly ought to be useful in the applications above. In the field of speech recognition, the more the task develops from the recognition of single words in a limited vocabulary towards the understanding of complex utterances, the more suprasegmental features like intonation have to be taken into account. These are important cues for the segmentation and classification (question vs. statement, for instance) of utterances. In speech synthesis, modeling intonational features is indispensable for increasing the intelligibility and naturalness of synthetic speech. This is the reason I chose to study the characteristics of Vietnamese intonation in questions.

The thesis is organized as follows. Chapter 2 gives a brief review of the human speech production system and an introduction to some related fundamental concepts. An overview of prosody, which includes intonation, is presented in Chapter 3. Chapter 4 describes the general characteristics of the Vietnamese language and some studies on Vietnamese prosody. Fundamental frequency, the acoustical correlate of intonation, and the problem of its estimation are covered in Chapter 5. Chapter 6 presents the experiments carried out in the work and the results obtained. Finally, the conclusion and the perspectives of the study are given in Chapter 7.


2.1 Introduction

As we will see, intonation is based on the vibration of the vocal folds, which is an inherent characteristic of the speech production process; in other words, once there is speech there is normally intonation too. An understanding of the speech production process is therefore necessary for understanding how intonation is formed. This chapter gives a brief review of the human speech production system and introduces some fundamental concepts used in the thesis.

Spoken language is used to communicate information from a speaker to a listener. Speech production and perception are both important components of the speech chain. Speech begins with a thought and intent to communicate in the brain, which activates muscular movements to produce speech sounds. A listener receives it in the auditory system, processing it for conversion to neurological signals the brain can understand. The speaker continuously monitors and controls the vocal organs by receiving his or her own speech as feedback. Considering the universal components of speech communication as shown in Figure 2.1, the fabric of spoken interaction is woven from many distinct elements. The speech production process starts with the semantic message in a person’s mind to be transmitted to the listener via speech. The computer counterpart to the process of message formulation is the application semantics that creates the concept to be expressed. After the message is created, the next step is to convert the message into a sequence of words. Each word consists of a sequence of phonemes that corresponds to the pronunciation of the word. Each sentence also contains a prosodic pattern that denotes the duration of each phoneme, the intonation of the sentence, and the loudness of the sounds. Once the language system finishes the mapping, the talker executes a series of neuromuscular signals. The neuromuscular commands perform articulatory mapping to control the vocal cords, lips, jaw, tongue, and velum, thereby producing the sound sequence as the final output.

The speech understanding process works in reverse order. First the signal is passed to the cochlea in the inner ear, which performs frequency analysis as a filter bank. A neural transduction process follows and converts the spectral signal into activity signals on the auditory nerve, corresponding roughly to a feature extraction component. Currently, it is unclear how neural activity is mapped into the language system and how message comprehension is achieved in the brain.

[Figure labels: Speech Generation, Speech Understanding, Cochlea Motion]

Figure 2.1 The underlying determinants of speech generation and understanding. The gray boxes indicate the corresponding computer system components for spoken language processing [1]

2.2 Sound

Sound is a longitudinal pressure wave formed of compressions and rarefactions of air molecules, in a direction parallel to that of the application of energy. Compressions are zones where air molecules have been forced by the application of energy into a tighter-than-usual configuration, and rarefactions are zones where air molecules are less tightly packed. The alternating configurations of compression and rarefaction of air molecules along the path of an energy source are sometimes described by the graph of a sine wave, as shown in Figure 2.2. In this representation, crests of the sine curve correspond to moments of maximal compression and troughs to moments of maximal rarefaction.

Figure 2.2 Application of sound energy causes alternating compression/rarefaction of air molecules, described by a sine wave [1]

The use of the sine graph in Figure 2.2 is only a notational convenience for charting local pressure variations over time, since sound does not form a transverse wave, and the air particles are just oscillating in place along the line of application of energy. The amount of work done to generate the energy that sets the air molecules in motion is reflected in the amount of displacement of the molecules from their resting position. This degree of displacement is measured as the amplitude of a sound, as shown in Figure 2.2.
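As a minimal illustration of the sine representation in Figure 2.2 (not part of the original thesis), the Python sketch below samples a pure tone and reads off its crest and trough. The function name `pressure_wave`, the 8 kHz sampling rate, the 100 Hz frequency and the amplitude of 0.5 are all assumptions chosen for the example.

```python
import math


def pressure_wave(amplitude, frequency_hz, duration_s, sample_rate_hz=8000):
    """Sample p(t) = A * sin(2*pi*f*t), the local pressure deviation over time."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


# One cycle of a 100 Hz tone lasts 10 ms, i.e. 80 samples at 8 kHz.
wave = pressure_wave(amplitude=0.5, frequency_hz=100.0, duration_s=0.01)

crest = max(wave)    # moment of maximal compression
trough = min(wave)   # moment of maximal rarefaction
```

The crest and trough are equal in size to the amplitude parameter, which is the sense in which amplitude measures the degree of displacement from the resting position.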

2.3 Speech production

2.3.1 Articulators

Speech is produced by air-pressure waves emanating from the mouth and the nostrils of a speaker. In most of the world’s languages, the inventory of phonemes can be split into two basic classes:

* consonants - articulated in the presence of constrictions in the throat or obstructions in the mouth (tongue, teeth, lips) as we speak
* vowels - articulated without major constrictions and obstructions

Figure 2.3 A schematic diagram of the human speech production apparatus

The sounds can be further partitioned into subgroups based on certain articulatory properties. These properties derive from the anatomy of a handful of important articulators and the places where they touch the boundaries of the human vocal tract. Additionally, a large number of muscles contribute to articulator positioning and motion. A schematic view of only the major articulators is diagrammed in Figure 2.3. The gross components of the speech production apparatus are the lungs, trachea, larynx (organ of voice production), pharyngeal cavity (throat), and oral and nasal cavities. The pharyngeal and oral cavities are typically referred to as the vocal tract, and the nasal cavity as the nasal tract. As illustrated in Figure 2.3, the human speech production apparatus consists of:

* Lungs: source of air during speech
* Vocal cords: when the vocal cords are held close together and oscillate against one another during a speech sound, the sound is said to be voiced. When the cords are too slack or tense to vibrate periodically, the sound is said to be unvoiced. The place where the vocal cords come together is called the glottis
* Velum (soft palate): operates as a valve, opening to allow passage of air through the nasal cavity. Sounds produced with the flap open include m and n
* Hard palate: a long, relatively hard surface at the roof inside the mouth, which, when the tongue is placed against it, enables consonant articulation
* Tongue: flexible articulator, shaped away from the palate for vowels, placed close to or on the palate or other hard surfaces for consonant articulation
* Teeth: another place of articulation, used to brace the tongue for certain consonants
* Lips: can be rounded or spread to affect vowel quality, and closed completely to stop the oral air flow in certain consonants (p, b, m)

Figure 2.4 Schematic representation of the complete physiological mechanism of speech production [2]

A simplified representation of the complete physiological mechanism for creating speech is shown in Figure 2.4. Air enters the lungs via the normal breathing mechanism. As air is expelled from the lungs through the trachea (or windpipe), the tensed vocal cords within the larynx are caused to vibrate (in the mode of a relaxation oscillator) by the air flow. The air flow is chopped into quasi-periodic pulses, which are then modulated in frequency in passing through the throat (pharynx cavity), the mouth cavity, and possibly the nasal cavity. Depending on the positions of the various articulators, different sounds are produced.

2.3.2 The voicing mechanism

The most fundamental distinction between sound types in speech is the voiced/unvoiced distinction. Voiced sounds, including vowels, have in their time and frequency structure a roughly regular pattern that voiceless sounds lack.

Figure 2.5 A section of waveform of the utterance “sa”: the unvoiced sound “s” in the first part and the voiced sound “a” in the second part

What in the speech production mechanism creates this fundamental distinction? When the vocal folds vibrate during phoneme articulation, the phoneme is considered voiced; otherwise it is unvoiced. Voiced sounds are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation, thereby producing quasi-periodic pulses of air which excite the vocal tract. The resulting speech waveform is therefore quasi-periodic. Unvoiced sounds are generated by forming a constriction at some point in the vocal tract (usually toward the mouth end) and forcing air through the constriction at a high enough velocity to produce turbulence. This creates a broad-spectrum noise source to excite the vocal tract, so the resulting speech waveform is aperiodic or random in nature.

The vocal folds vibrate at slower or faster rates, from as low as 60 cycles per second (Hz) for a large man to as high as 300 Hz or higher for a small woman or child. The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency, or F0, because it sets the periodic baseline for all higher-frequency harmonics contributed by the pharyngeal and oral resonance cavities above. To a reasonable approximation, F0 correlates well with the subjective experience of pitch (the rising and falling of the voice).
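The contrast between quasi-periodic voiced frames and noise-like unvoiced frames can be made concrete with a small autocorrelation sketch. This is only an illustration built on synthetic signals, not the procedure used in the thesis: the function name `estimate_f0`, the voicing threshold of 0.5, the 8 kHz sampling rate and the 40 ms frame are assumptions, and only the 60 to 300 Hz search range comes from the text. Real speech would also need windowing and further safeguards.

```python
import math
import random


def estimate_f0(frame, sample_rate_hz, f0_min_hz=60.0, f0_max_hz=300.0,
                voicing_threshold=0.5):
    """Return an F0 estimate in Hz, or None if the frame looks unvoiced."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    # Candidate periods (in samples) for the 60-300 Hz range given in the text.
    lag_min = int(sample_rate_hz / f0_max_hz)
    lag_max = min(int(sample_rate_hz / f0_min_hz), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        score = r / energy
        if score > best_score:
            best_lag, best_score = lag, score
    # A weak peak means no clear periodicity: treat the frame as unvoiced.
    if best_lag is None or best_score < voicing_threshold:
        return None
    return sample_rate_hz / best_lag


SAMPLE_RATE_HZ = 8000
FRAME_LEN = 320  # 40 ms at 8 kHz

# Synthetic "voiced" frame: 100 Hz fundamental plus a weaker second harmonic.
voiced = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
    for n in range(FRAME_LEN)
]

# Synthetic "unvoiced" frame: white noise from a seeded generator.
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

f0_voiced = estimate_f0(voiced, SAMPLE_RATE_HZ)      # close to 100 Hz
f0_unvoiced = estimate_f0(unvoiced, SAMPLE_RATE_HZ)  # no clear period expected
```

The periodic frame produces a strong autocorrelation peak at a lag of one pitch period (80 samples here), while the noise frame has no comparable peak, so the sketch reports it as unvoiced.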

Figure 2.6 Vocal fold cycling at the larynx: (a) closed with sub-glottal pressure buildup; (b) trans-glottal pressure differential causing folds to blow apart; (c) pressure equalization and tissue elasticity forcing temporary reclosure of vocal folds, ready to begin next cycle [1]

The glottal cycle is illustrated in Figure 2.6. At stage (a), the vocal folds are closed and the air stream from the lungs is indicated by the arrow. At some point, the air pressure on the underside of the barrier formed by the vocal folds increases until it overcomes the resistance of the vocal fold closure, and the higher air pressure below blows them apart (b). However, the tissues and muscles of the larynx and the vocal folds have a natural elasticity which tends to make them fall back into place rapidly once air pressure is temporarily equalized (c). The successive airbursts resulting from this process are the source of energy for all voiced sounds. The time for a single open-close cycle depends on the stiffness and size of the vocal folds and the amount of sub-glottal air pressure. These factors can be controlled by a speaker to raise and lower the perceived frequency or pitch of a voiced sound.

The glottal air flow (volume velocity waveform) and the resulting sound pressure at the mouth for a typical vowel sound are shown in Figure 2.7. The glottal waveform shows a gradual build-up to a quasi-periodic pulse train of air, taking about 15 ms to reach steady state. This build-up is also reflected in the acoustic waveform shown at the bottom of the figure.

Figure 2.7 Glottal airflow and the resulting sound pressure at the mouth [2]

Chapter 3 AN OVERVIEW OF PROSODY

3.1 The concepts of prosody and intonation

The term prosody (ngôn điệu) refers to certain properties of the speech signal such as audible changes in pitch, loudness, and syllable length [3]. For some authors, the set of prosodic features also includes other aspects related to speech timing, such as rhythm and speech rate.

Because prosodic events appear to be time-aligned with syllables or groups of syllables rather than with segments (phonemes), they are also referred to as suprasegmental phenomena.

The term intonation (ngữ điệu) is used by some as a synonym for prosody, while others restrict it to the tonal (melodic) aspects of prosody [3]; the latter usage is adopted here. In this thesis, intonation therefore refers to pitch variation in speech production and is part of prosody.

3.2 Levels of representation of prosodic phenomena

As for other properties of the speech signal, prosodic events can be studied at various levels of representation (see Table 3.1):

* First, the acoustic level: the acoustic manifestation of prosody (fundamental frequency, amplitude, and duration) can be measured directly, using specialized hardware or algorithms (such as pitch determination algorithms).
* Second, the perceptual level represents the prosodic events as heard by the listener. As for spectral properties of speech sounds, acoustic characteristics that can be measured are not always perceptible. The perceptual representation is accessible to the individual listener, but this mental representation can hardly be measured. Alternatively, it can be computed with a fair amount of precision on the basis of our knowledge about psychoacoustics.
* Finally, the linguistic level represents the prosody of an utterance as a sequence of abstract units (signs, symbols), some of which have a communicative function in speech, while others may just fulfill syntactic requirements. The linguistic structure of prosody is not some hidden code that can simply be revealed using some standard procedure.
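The remark that the perceptual representation can be computed from psychoacoustic knowledge can be illustrated with the semitone scale, a common logarithmic approximation of perceived pitch intervals. The thesis does not say which perceptual scale it uses, so the semitone formula, the function name `hz_to_semitones` and the 100 Hz reference are assumptions made for this sketch.

```python
import math


def hz_to_semitones(f0_hz, reference_hz=100.0):
    """Convert an F0 value in Hz to semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / reference_hz)


# Doubling the frequency (one octave) always spans 12 semitones.
octave = hz_to_semitones(200.0)

# Two rises of 50 Hz and 100 Hz cover the same interval on this scale,
# because both multiply the starting frequency by 1.5.
rise_low = hz_to_semitones(150.0) - hz_to_semitones(100.0)
rise_high = hz_to_semitones(300.0) - hz_to_semitones(200.0)
```

On such a scale, equal frequency ratios rather than equal differences in Hz correspond to equal steps, which is one reason a measured F0 difference does not map directly onto a perceived pitch difference.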

Table 3.1 Links between levels of representation of prosodic phenomena [3]

Acoustic | Perceptual | Linguistic
Fundamental frequency (F0) | Pitch | Tone, intonation, aspect of stress

As one moves away from the acoustic level towards the perceptual and/or linguistic levels, the measurement of a given prosodic property will progressively involve segmentation (for example, into syllables), context (such as relative prominence), and structural information (the linguistic interpretation of a syllabic tone, for example, often depends on whether the related syllable is stressed or not, which requires a prior analysis of the segmental layer).

3.3 The functions of prosody

The functions of prosody can be divided into those which modify meaning and those which do not (see Table 3.2). The former can also be seen as the part of the information which is consciously and intentionally provided by the speaker, the message, whereas the latter involuntarily accompanies it.

Table 3.2 Information conveyed by prosody, ‘*’ marking feature discussed in this study

Linguistic | Paralinguistic | Non-linguistic
focus | intention, attitude | sex, (native language, dialect)

Prosodic features have specific functions in speech communication. One of the most uncontroversial functions of intonation is that of conveying different illocutionary aspects, or sentence modes. Thus it is commonly maintained that a distinction between declarative and interrogative modes is one of the most universal characteristics of intonation systems. My work is centered on the contribution of prosody to the expression of the interrogative mode.

One of the most apparent effects of prosody is that of focus. For instance, certain pitch events make a syllable stand out within the utterance, and indirectly the word or syntactic group it belongs to will be highlighted as an important or new component in the meaning of that utterance. The presence of a focus marking may have various effects, such as contrast, depending on the place where it occurs or the semantic context of the utterance.

Prosodic features create a segmentation of the speech chain into groups of syllables or, put the other way round, they give rise to the grouping of syllables and words into larger chunks.

All these aspects of intonation can be grouped under the heading of linguistic aspects of intonation. They are part of the structure of language (and specific to any given language) in the same way as morphology and syntax are. The linguistic features concern the way a message is formally coded and organized into intonational units of a certain language. They correspond to the surface structure of the message on a still rather abstract level. The actual meaning of the message often cannot be decoded without interpreting the underlying paralinguistic information. Paralinguistic information is defined as the information that is not inferable from the written counterpart but is deliberately added by the speaker to modify or supplement the linguistic information. A written sentence can be uttered in various ways to express different intentions and attitudes which are under the conscious control of the speaker. The question “Are you tired?”, for instance, is simply a request for information on someone's psychological and physiological condition. If it is asked with a concerned undertone, then the message may be “Come on, you’ve been working so hard, you have to get yourself some sleep!” With an ironic undertone, it may mean “You lazy guy, you’ve been sleeping all day and still you're tired!”

There is, however, another range of phenomena that are also expressed by prosodic means (such as pitch) but do not modify the meaning of a message. They can convey information about the age, gender, and emotional or physical state of the speaker. These factors are not directly related to the linguistic and paralinguistic contents of the utterances and cannot generally be controlled by the speaker. Angry people, for instance, usually have faster pitch changes, a larger pitch range, and a larger dynamic amplitude range, whereas depressed people typically show the opposite trend. But while the pitch range may be affected by such emotional factors, the basic functional pitch shapes and configurations remain unaffected. The emotional state does not alter the linguistic code; it merely affects its realization. This is why these aspects are called non-linguistic aspects.

The understanding of information conveyed by intonation is important for intonation study. Each type of information has its effect on tonal variations, i.e. intonation. These effects need to be taken into account in intonation analysis.

3.4 Applications of intonation

Since intonation forms such a central part of human speech communication, conveying not only diverse linguistic information but also information about the speaker and the speaker’s mood and attitude, it certainly ought to be useful in many applications. Apart from language technology and speech synthesis, where intonation is an established application, diverse areas of medical and educational application where intonation is less commonplace are being developed.

Speech processing:

The increasing demand for the application of speech in man-machine communication in all areas ranging from telephony, telematics, and automated translation to aids for the handicapped requires sophisticated technology for the analysis, recognition and synthesis of speech. In the field of speech recognition, the more the task develops from the recognition of single words in a limited vocabulary towards the understanding of complex utterances, the more suprasegmental features like intonation have to be taken into account. These are important cues for the segmentation and classification (question vs. declaration, for instance) of utterances. In this context, modeling intonation is an important task.

In speech synthesis, modeling intonational features is indispensable for increasing the intelligibility and naturalness of synthetic speech. Sophisticated models of intonation are needed which predict F0 contours for a particular sentence at a given speech rate.

Automatic language identification could be important, especially in telecom applications where the spectral content of the speech can be expected to be distorted. Intonation cues are especially interesting in this case. The varied intonational structure of languages could be exploited in this application. In this recognition task, the intonation cues need to be combined with other types of information.

Speech Pathology:

Hearing impairments, especially if they are congenital or acquired at an early age, are accompanied by a reduced intelligibility of speech. Major factors for this are an imperfect command of phonatory effort and a lack of control of the laryngeal function, which result in a high degree of variation in the pitch patterns produced. The speech of hearing-impaired people may sound monotonous or, on the contrary, excessively emotional. The basic pitch is often kept at a level either too high or too low, and it has been observed that hearing-impaired persons have difficulties in changing their pitch within a single syllable. Teaching aids have been developed to overcome these problems which provide feedback for pitch over tactile or visual channels.

Foreign Language Education:

It is widely agreed that the acquisition of a good command of intonational features in a foreign language is one of the most difficult tasks a student must accomplish. Yet it is crucial for the degree of intelligibility he or she will achieve. In traditional language education, however, intonation usually comes second to segmental phonetics, which itself forms only a small part of the curriculum of common language courses. This deficit has become more apparent as political and economic globalization requires better communicative skills on the part of the learner of a foreign language. In this context, individual computer-based language education will play a growing role. Software is needed which addresses the special problems of the speaker of a language L1 who studies a target language L2. Although a number of programs exist whereby the student can train his or her lexical, grammatical or orthographic skills, there are few systems which use speech input to help correct the student's pronunciation. In this context, visualization of speech can provide additional feedback where the auditory channel fails because of mother-tongue interference.

It seems desirable to develop more intelligent systems which are customized to the special requirements of students with the same native language. Contrastive studies of the intonational systems of L1 and L2 can help to predict problems and select appropriate teaching materials.

Chapter 4 PROSODY IN VIETNAMESE

The understanding of the phonetic and phonological characteristics of a language plays an important role in studies on speech processing in general and on intonation analysis in particular. This chapter provides a review of the characteristics of the Vietnamese language and some works of other authors related to my study.

4.1 General characteristics of Vietnamese language

Vietnamese is known as a tonal language, which uses tone to distinguish lexical meaning. Vietnamese has basically six lexical tones. Each tone contributes to forming the morpheme and the meaning of a word, e.g. me, mé, mé, ma, mé, me. This is not the case for non-tonal languages. In English, for example, it is the position of the stressed syllable within a word that is lexically distinctive.

4.1.1 Phoneme system

The Vietnamese phoneme system includes 14 vowels or vowel combinations and 22 consonants.

The Vietnamese vowels include eleven vowels and three diphthongs [5] (see Table 4.1). All vowels are voiced sounds.

Table 4.1 Vietnamese vowels

/ie/ | ia, iê, ya, yê | kia kia, yêu kiểu
/uo/ | ua, uô | tua rua, luôn luôn
/ɯɤ/ | ưa, ươ | lưa thưa, lượt thượt

Vietnamese includes 22 consonants [5]:

Table 4.2 Vietnamese consonants

/z/ | dé, gié and do | d, gi | duyên dáng, giữ gìn

... of consonant in syllable. Based on these features, Vietnamese consonants can ...

[Table 4.3 Arrangement of Vietnamese consonants, by articulation position (apical, dental, laminal, ...) and articulation method]

Vietnamese grammarians and linguists have long considered the syllable in Vietnamese as a fundamental unit. A syllable in full structure (a tonal syllable) has five parts: initial sound, medial sound, nucleus sound, final sound and tone (Table 4.4) [5]. For instance, the syllable “toán” has the following components: initial sound /t/, medial sound /o/, nucleus sound /a/, final sound /n/, and tone “sắc” (or rising tone). A syllable must have a nucleus sound; the other components are optional. A nucleus sound alone can form a syllable, for instance a, ô, ơ. Apart from the initial sound (called the INITIAL part), the rest of the syllable is called the FINAL part. A tone is a fundamental frequency variation spreading over the whole syllable. A tone has the same function as a phoneme. It is always assigned to a syllable, and its influence covers the entire syllable. There are a few constraints: if a syllable ends with unvoiced ...
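The five-part syllable structure described above can be sketched as a small Python data structure. This is an illustration only: the class name `Syllable`, the method `final_part`, and the choice of defaulting the tone to “ngang” for syllables written without a tone mark are assumptions made here, not definitions from the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Syllable:
    nucleus: str                    # the only obligatory segment
    tone: str = "ngang"             # assumption: unmarked syllables get the level tone
    initial: Optional[str] = None   # the INITIAL part
    medial: Optional[str] = None
    final: Optional[str] = None

    def final_part(self) -> Tuple[Tuple[str, ...], str]:
        """Everything except the initial sound: remaining segments plus the tone."""
        segments = tuple(s for s in (self.medial, self.nucleus, self.final) if s)
        return segments, self.tone


# The worked example from the text: initial /t/, medial /o/, nucleus /a/,
# final /n/ and the rising tone.
toan = Syllable(initial="t", medial="o", nucleus="a", final="n", tone="sắc")

# A nucleus on its own is already a complete syllable.
bare_vowel = Syllable(nucleus="a")
```

Making `nucleus` the only required field mirrors the statement that every syllable needs a nucleus while the other components are optional.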

4.1.3 The tonal system

There are six syllabic tones in Vietnamese (see Table 4.5). To describe the tonal system on a physical basis, most linguists have studied tones in isolated syllables, where they are likely to be realized as closely as possible to their phonotype. In terms of distinctive features, Vietnamese tones can be described according to register, contour and glottalization (the complete or partial closure of the glottis during the articulation of another sound). The tones can be separated into two groups according to register: “ngang”, “sắc” and “ngã” are realized in a higher register, while “huyền”, “nặng” and “hỏi” are realized in a lower one. Based on the glottalization feature, the six tones can also be classified into two groups: the “ngã” and “nặng” tones are glottalized, whereas “ngang”, “huyền”, “sắc” and “hỏi” are non-glottalized. The F0 contours of the six Vietnamese tones (examples are shown in Figure 4.1) are described as follows [6]:

Table 4.5 The six Vietnamese tones

Tone 1 | Tone 2 | Tone 3 | Tone 4 | Tone 5 | Tone 6

ngang | huyền | ngã | hỏi | sắc | nặng


Figure 4.1 Example of the contours of six tones (female subject PNY), as described in [7]

• Tone 1 - Level tone (“ngang”): a high tone. At the beginning of the syllable, it is the highest tone. The steady state of the level contour is observed consistently.

• Tone 2 - Falling tone (“huyền”): the onset of the falling tone is lower than that of tone 1, tone 5 and tone 3. The low F0 at the onset gradually falls toward the end.

• Tone 3 - Broken tone (“ngã”): the onset is as high as that of tone 5 and higher than that of the falling tone. The second third of the contour of this tone is characterized by an abrupt dip caused by glottalization. In most cases, the bottom of the dip occurs between the mid-point and the point two-thirds from the onset. A creaky voice is heard during this dip.

• Tone 4 - Curve tone (“hỏi”): the onset is the lowest among the six tones. From this low onset, F0 falls further gradually until the point two-thirds from the onset. From this point, the extremely low F0 starts to rise toward the end.


• Tone 5 - Rising tone (“sắc”): the onset is also high. Starting from the high onset, the F0 gradually rises for the first two thirds of the duration. After this point, the rise becomes more rapid.

• Tone 6 - Drop tone (“nặng”): the onset is usually higher than that of the falling or curve tone but considerably lower than that of tone 1, tone 5 and tone 3. This tone is characterized by glottalization at the end and by a considerably shorter duration than the other tones: approximately two thirds of their duration. The main body of this tone is almost level or slightly falling.

These descriptions hold only for the Northern dialect, in particular the Hanoi dialect, which is the standard dialect of Vietnamese. They differ for the dialects of the South and the Center of Vietnam. In these regions there are only 5 tones instead of the 6 of the Hanoi dialect, because tone 3 and tone 4 are pronounced identically.
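The register and glottalization groupings given in this section, together with the dialect remark above, can be written down as a small lookup table. The sketch below is only a convenience for checking the groupings; the names `TONES`, `tones_where` and `merged_inventory` are hypothetical, and the binary "high"/"low" register labels are a simplification of the description in the text.

```python
# Tone number -> name, register and glottalization, as grouped in this section.
TONES = {
    1: {"name": "ngang", "register": "high", "glottalized": False},
    2: {"name": "huyền", "register": "low", "glottalized": False},
    3: {"name": "ngã", "register": "high", "glottalized": True},
    4: {"name": "hỏi", "register": "low", "glottalized": False},
    5: {"name": "sắc", "register": "high", "glottalized": False},
    6: {"name": "nặng", "register": "low", "glottalized": True},
}


def tones_where(**features):
    """Names of the tones whose features match all given values."""
    return [
        tone["name"]
        for _, tone in sorted(TONES.items())
        if all(tone[key] == value for key, value in features.items())
    ]


def merged_inventory(merged=(3, 4)):
    """Tone numbers left when the tones in `merged` are pronounced identically."""
    keep = min(merged)
    return sorted({keep if number in merged else number for number in TONES})


print(tones_where(register="high"))
print(tones_where(glottalized=True))
print(len(merged_inventory()))
```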

4.1.4 Tones in context

In continuous speech, tones seldom reach their target values. They are generally affected by context: stressed vs. unstressed syllable, influence of neighbouring tones, tempo. These influences have rarely been studied. Tonal variation due to the influence of neighbouring tones is described by linguists as a type of tonal coarticulation. Đỗ Thế Dũng [8] observed that after a rising tone such as “sắc” or “ngã”, any immediately following tone will start one or two quarter tones higher than its normal target value, and after a falling tone such as “nặng” or “huyền” it will start one or two quarter tones lower. This variation is stronger in unstressed positions than in stressed ones; in spite of this, a relative difference in register and contour is preserved.
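To give a sense of the size of this effect, the quarter-tone shift can be converted into a frequency ratio. A quarter tone is half an equal-tempered semitone, so there are 24 quarter tones per octave and a shift of n quarter tones multiplies F0 by 2^(n/24). The sketch below applies this to a hypothetical onset target; the function name `shifted_onset`, the 200 Hz value and the choice to leave the onset unchanged after tones not named in the text are assumptions made for illustration.

```python
QUARTER_TONES_PER_OCTAVE = 24  # a quarter tone is half a semitone

# Only the tones named in the text as examples of rising and falling contexts.
RISING = {"sắc", "ngã"}
FALLING = {"nặng", "huyền"}


def shifted_onset(target_hz, previous_tone, quarter_tones=1):
    """Onset F0 of a tone after `previous_tone`, shifted by quarter tones."""
    if previous_tone in RISING:
        steps = quarter_tones
    elif previous_tone in FALLING:
        steps = -quarter_tones
    else:
        steps = 0  # assumption: no shift after the other tones
    return target_hz * 2 ** (steps / QUARTER_TONES_PER_OCTAVE)


# Hypothetical 200 Hz onset target, shifted by one and two quarter tones.
for n in (1, 2):
    up = shifted_onset(200.0, "sắc", n)
    down = shifted_onset(200.0, "huyền", n)
    print(f"{n} quarter tone(s): {down:.1f} Hz .. {up:.1f} Hz")
```

For a 200 Hz target this amounts to a few hertz per quarter tone, which is small compared with the register differences between the tones themselves.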


4.1.5 Modality, attitude and morphosyntactic structures

In Vietnamese there are two possible ways of expressing modality, mood or attitude: the first uses only prosodic features, and the second uses lexico-syntactic markers, possibly combined with prosodic features [8]. In the first case, as the pragmatic information relies entirely on prosodic structure, it has to be clearly marked. In the second case, as intonation becomes redundant, it is interesting to see whether it can still play a role in characterizing the pragmatic type.

The Vietnamese language has a system of syntactic markers which occur mostly at the end (occasionally at the beginning or in the middle) of a declarative sentence. They are used to express modal and attitudinal meanings. For example, from the declarative sentence

Trời mưa

we may obtain a yes-no question by adding “không”:

Trời mưa không?

With another morpheme, “à”, we obtain a question expressing the speaker’s surprise:

Trời mưa à?

The morphosyntactic elements can be put into three classes according to their semantic values: question, imperative and attitudinal markers.


surprise or astonishment and cannot be considered a “neutral” interrogative type [8]. Some controversies remain about the classification of interrogative markers. It seems, however, reasonable to distinguish two types of question.

Yes-no questions use the following markers: “không” expresses a question on the predicative relation itself, for instance “Trời mưa không?”; “chưa” has an aspectual value, for example “Trời mưa chưa?”; “hay” gives an explicit alternative choice, for example “Trời mưa hay trời nắng?”.

Open questions use indefinite words in the same way as wh-markers: ai (who), bao giờ (when), bao lâu (how long), bao nhiêu (how many), bao xa (how far), đâu (where), gì (what), mấy (how many), mấy giờ (at what time), nào (which or what), như thế nào (how), sao/tại sao/vì sao (why), sao/làm sao (how).

Some linguists have also mentioned a third type of question, called biased questions (suggesting an expected answer), which are associated with the expression of an attitude. They are syntactically marked with the final morphemes “à, ư” (surprise), “chứ” (logical evidence), “hả, hử, hở” (insisting and astonishment), “nhé” (supposition, suggestion).
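The marker inventory above suggests a very rough rule-based labelling of written questions. The sketch below is a naive illustration under strong assumptions: it looks only at surface words, uses a subset of the markers listed in the text, and ignores the fact that several of these forms (for example “không” or “sao”) are homonymous with ordinary lexical items. The function name `classify_question` and its labels are hypothetical.

```python
import unicodedata


def _nfc(text):
    # Normalise to composed Unicode so accented letters compare reliably.
    return unicodedata.normalize("NFC", text)


# Subsets of the markers listed in the text (not exhaustive).
YES_NO_FINAL = {_nfc(m) for m in ("không", "chưa")}
BIASED_FINAL = {_nfc(m) for m in ("à", "ư", "chứ", "hả", "hử", "hở")}
WH_WORDS = [_nfc(w) for w in (
    "ai", "bao giờ", "bao lâu", "bao nhiêu", "bao xa",
    "đâu", "gì", "mấy", "nào", "sao",
)]


def classify_question(sentence):
    """Label a sentence as 'yes-no', 'open', 'biased' or 'unmarked'."""
    words = _nfc(sentence).lower().rstrip("?.! ").split()
    if not words:
        return "unmarked"
    if words[-1] in BIASED_FINAL:
        return "biased"
    # "hay" marks an explicit alternative and may occur mid-sentence.
    if words[-1] in YES_NO_FINAL or _nfc("hay") in words:
        return "yes-no"
    padded = " " + " ".join(words) + " "
    if any(f" {w} " in padded for w in WH_WORDS):
        return "open"
    return "unmarked"


for s in ("Trời mưa", "Trời mưa không?", "Trời mưa à?", "Trời mưa hay trời nắng?"):
    print(s, "->", classify_question(s))
```

A sentence labelled "unmarked" carries no lexical cue at all, which is exactly the case where the interrogative reading has to be carried by prosody alone.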

4.1.5.2 Injunctive markers

Injunction is expressed by the presence of “đi” at the end of a declarative structure, for instance “Trời mưa đi!”.

A weaker injunction is expressed with “nhé”, and a stronger (insisting) one is expressed with the compound marker “hãy đi”.

4.1.5.3 Attitude and emotional markers

In Vietnamese, a final marker can be used to express the speaker’s attitude. Lê Thị Xuyến gave the following list: “ạ” (respect), “đấy” (admiration), “rồi” (conclusive), “mà” (insistence), “sao” (surprise), “chăng” (doubt), “hả” (anger), “nhỉ” (familiarity), “vậy” (external obligation) [8].

4.2 Some studies on Vietnamese prosody

In a tonal language like Vietnamese, which has six lexical tones and moreover a system of morphosyntactic markers to express emotions, attitudes, mood and modality, it would not be surprising if intonation played a lesser role than in non-tonal languages such as French or English: what is usually conveyed by intonation in many other languages is already marked. This idea was developed by Gordina and Bystrov: “the more a language uses morphosyntactic or syntactic means to express mood, modality and emotions, the less it would rely on intonation for the same functions” [8].

This explains why there are very few studies on intonation in Vietnamese. There are a few remarks in general grammar books. The statements about intonation made by grammarians or linguists are rather intuitive and not based on experimental description. For example, declarative sentences are said to be “falling”, with such descriptive terms as “fading” or “decreasing” (Thompson), “falling” (Nguyễn Đăng Liêm), “normal” or “low pitch” (Jones and Huỳnh Sanh Thông), whereas interrogative sentences are said to be “rising” (Nguyễn Đăng Liêm), “sustaining” (Thompson), “higher pitch level 1” (Jones and Huỳnh Sanh Thông). Expressive sentences, on the other hand, are said to have a rising contour with a higher pitch level: “higher pitch level 2 or 3” (Jones and Huỳnh Sanh Thông), “increasing” (Thompson), “rising-falling” (Nguyễn Đăng Liêm) [8].

There are a small number of experimental studies, by Gordina and Bystrov, Lê Thị Xuyến, and Nguyễn and Boulakia [9]. These studies have already given some idea of the role and function of intonation in Vietnamese.


According to Gordina and Bystrov, the shorter the sentence, the greater the difference between the intonation patterns [8]. In their examples

(a) Anh ấy đi sang nước Anh à?

(b) Anh ấy đi sang nước Anh.

(c) Không sách à?

(d) Không sách.

the difference is greater between the (c) and (d) patterns than between (a) and (b), though in each case a declarative is contrasted with an interrogative.

According to these same authors, an interrogative without a morphosyntactic marker has a well differentiated pattern when compared to an interrogative with a marker. In:

(e) Không sách à?

(f) Không sách?

(g) Không sách.

the difference between the intonation patterns of (g) and (f) is greater than between (g) and (e).

Lê Thị Xuyến set out to establish whether six different attitudinal sentence types (statement, irony, exasperation, anger, sadness and admiration) were differentiated on the prosodic level [8]. Her corpus consists of the following five sentences:

“Mưa.”

“Trời mưa.”

“Cô ta xinh.”

“Khuya rồi.”

“Nam về lúc khoảng một rưỡi.”

Each sentence was read with different attitudes by 2 speakers (one male, one female) and judged by 20 hearers. The results of her experiments showed that only irony, anger and statement were identified above chance level (75%, 52.5% and 67.5% respectively). According to her, the neutral declarative is characterized by a low register and a moderate tempo; irony has a higher register, a larger tone movement and a slower tempo resulting in increased sentence length; whereas anger is conveyed by a speeding up of tempo, greater and more abrupt pitch movement, shortening of the utterance and an increase in the overall intensity.

In order to bring out the pertinent prosodic features corresponding to assertive, interrogative and imperative modes, while excluding attitudinal variations, and to produce natural Vietnamese utterances, Nguyễn and Boulakia [9] used a number of utterance pairs in which the final question or imperative marker can be replaced by a homonymous lexical item. The resulting pairs have the same syllabic and tonal structures but differing morphosyntactic structures. They are therefore considered to be ambiguous, and if they are discriminated, it has to be due to the presence of prosodic differences. Some example pairs of sentences in their corpus are as follows:

Statement - Question:

Lan thích ăn cơm không. (Lan only likes to eat rice.)

Lan thích ăn cơm không? (Does Lan like to eat rice?)

Statement - Imperative:

Tân bỏ đi chứ? (Tân, did you leave?)

Tân bỏ đi chứ! (Tân, do leave it!)

From the five morphemes “không”, “hả”, “sao”, “chứ” and “đi”, nine such pairs of sentences were formulated. These 18 sentences were read by 4 speakers (2 males and 2 females) and judged by 22 hearers. The results of the prosodic analysis showed that questions are shorter than statements and that this difference is significant. Imperatives are even shorter, but the difference with questions is not significant. In terms of intensity, the difference is significant for the statement-imperative pair, but not for the statement-question and question-imperative ones. As for intonation, the two members of the same pair have the same overall F0 contour but differ in register: the register of questions and imperatives is clearly higher than that of statements, while there is no difference between questions and imperatives (Figure 4.2). There is an obvious difference in the last syllable: the “ngang” tone falls in statements and is much higher and rising in questions, while its mean value and movement are halfway between the two for imperatives. The rising tones, “sắc” or “hỏi”, rise even more in questions than in statements, while they tend to become flat or even fall slightly in the final part in imperatives. This means that intonation influences the tone of the sentence-final syllable.
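A register difference of the kind reported above is commonly expressed in semitones, using 12 · log2(f_a / f_b) for two frequencies f_a and f_b. The sketch below compares the mean F0 of two utterances in this way. It is not the analysis procedure of [9]: taking the mean over F0 samples in hertz is one of several possible choices, and the function name `register_shift_semitones` and all F0 values are invented for illustration only.

```python
import math
from statistics import mean


def register_shift_semitones(f0_a_hz, f0_b_hz):
    """Mean-F0 difference of utterance a relative to utterance b, in semitones."""
    return 12 * math.log2(mean(f0_a_hz) / mean(f0_b_hz))


# Hypothetical F0 samples (Hz) from the voiced frames of one sentence pair.
statement_f0 = [200.0, 195.0, 190.0, 180.0]
question_f0 = [225.0, 220.0, 215.0, 230.0]

shift = register_shift_semitones(question_f0, statement_f0)
print(f"question register relative to statement: {shift:+.2f} semitones")
```

Because the two members of a pair share the same syllables and tones, a positive shift on such a pair would reflect sentence-level register rather than a difference in lexical tones.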
