Characterization of Vietnamese intonation for questions

THESIS FOR THE DEGREE OF MASTER OF SCIENCE

Acknowledgments

Firstly, I would like to express my gratitude to my supervisor, Dr Eric Castelli, whose expertise, understanding, patience and constructively critical eye added considerably to my graduate experience.

Special thanks go to Dr Nguyen Trong Giang and Dr Pham Thi Ngoc Yen for providing me with the most favorable conditions during my working time at the International Research Center MICA.

I would like to thank the PhD students Nguyen Viet Tung, Tran Do Dat, Vu Minh Quang and Le Xuan Hung, who helped me a lot in finishing the thesis.

I would also like to thank my family, especially my parents, for the support they have provided me throughout my life; without their care and encouragement I would not have finished this thesis.

Finally, thanks go to all of my colleagues who helped me while I worked on this thesis.

Table of contents

Acknowledgments
List of Figures
List of Tables
Chapter 1 INTRODUCTION
Chapter 2 SPEECH PRODUCTION PROCESS
2.1 Introduction
2.2 Sound
2.3 Speech production
2.3.1 Articulators
2.3.2 The voicing mechanism
Chapter 3 AN OVERVIEW OF PROSODY
3.1 The concepts of prosody and intonation
3.2 Levels of representation of prosodic phenomena
3.3 The functions of prosody
3.4 Applications of intonation
Chapter 4 PROSODY IN VIETNAMESE
4.1 General characteristics of Vietnamese language
4.1.1 Phoneme system
4.1.2 Syllable structure
4.1.3 The tonal system
4.1.4 Tones in context
4.1.5 Modality, attitude and morphosyntactic structures
4.2 Some studies on Vietnamese prosody
Chapter 5 FUNDAMENTAL FREQUENCY DETECTION
5.1 Introduction
5.2 Some pitch detection algorithms
5.2.1 The autocorrelation method
5.2.2 The average magnitude difference function method
5.2.3 The simple inverse filtering tracking method
5.2.4 The cepstrum-based method
5.3 The Praat pitch tracker
5.3.1 Introduction
5.3.2 Windowing and sampling problems
5.3.3 Evaluation
Chapter 6 EXPERIMENTAL INTONATION ANALYSIS
6.1 Objective
6.2 Speech corpus
6.3 Hypotheses
6.4 Experiments
6.4.1 First experiment
6.4.2 Second experiment
6.4.3 Third experiment
Chapter 7 CONCLUSION AND PERSPECTIVES
References
Appendix
A List of questions in the corpus
B List of statements in the corpus

List of Figures

Figure 2.1 The underlying determinants of speech generation and understanding. The gray boxes indicate the corresponding computer system components for spoken language processing [1]
Figure 2.2 Application of sound energy causes alternating compression/rarefaction of air molecules, described by a sine wave [1]
Figure 2.3 A schematic diagram of the human speech production apparatus
Figure 2.4 Schematic representation of the complete physiological mechanism of speech production [2]
Figure 2.5 A section of waveform of the utterance “sa”. The unvoiced sound “s” in the first part and the voiced sound “a” in the second part
Figure 2.6 Vocal fold cycling at the larynx: (a) closed with sub-glottal pressure buildup; (b) trans-glottal pressure differential causing folds to blow apart; (c) pressure equalization and tissue elasticity forcing temporary reclosure of vocal folds, ready to begin next cycle [1]
Figure 2.7 Glottal airflow and the resulting sound pressure at the mouth [2]
Figure 4.1 Example of the contours of six tones (female subject PNY), as described in [7]
Figure 4.2 F0 variations of 2 typical pairs of sentences in [9]
Figure 5.1 Autocorrelation function for (a) and (b) voiced speech, and (c) unvoiced speech [10]
Figure 5.2 Example of waveforms and correlation function: (a) no clipping, (b) center clipped [10]
Figure 5.3 AMDF function for same speech segments as in Figure 5.1 [10]
Figure 5.4 Block diagram of the SIFT algorithm [10]
Figure 5.5 Cepstrum of an example segment of: (a) voiced speech, (b) unvoiced speech
Figure 5.6 Windowing a signal and estimating the ACF of a signal segment from the ACF of its windowed version [15]
Figure 5.7 Some F0 points are detected in the unvoiced consonant “kh” of the word “không” (female subject HT)
Figure 5.8 Pitch halving errors in the middle of the word “trà” (female subject LH)
Figure 5.9 Some F0 points are missed in the voiced consonant “b” of the word “biết” (female subject LH)
Figure 5.10 Some F0 points are missed in the middle of the word “rõ” (female subject VL)
Figure 5.11 Some F0 points are missed at the end of the word “vậy” (male subject VN)
Figure 6.1 Speech waveform (in the background) and F0 contour (blue dotted line) of the utterance “Bây giờ anh ở đâu?” (male subject ND). The final syllable “đâu” is bounded by two vertical lines
Figure 6.2 F0 contour (blue dotted line) and proposed intonation contour (red [...] VN)
Figure 6.3 The intonation contour (red dotted line) of the statement “Bà ấy làm giáo viên.” (male subject VN)
Figure 6.4 The intonation contour (red dotted line) of the question “Bà có nhìn rõ không?” (male subject VN)
Figure 6.5 F0 level of all speakers for questions (Q) and statements (S)
Figure 6.6 Time waveform (top), F0 contour (middle) and the position of 5 representative points

List of Tables

Table 3.1 Links between levels of representation of prosodic phenomena [3]
Table 3.2 Information conveyed by prosody, ‘*’ marking feature discussed in this study [4]
Table 4.1 Vietnamese vowels
Table 4.2 Vietnamese consonants
Table 4.3 Arrangement of Vietnamese consonants
Table 4.4 The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [6]
Table 4.5 The six Vietnamese tones
Table 5.1 Praat PDA evaluation for male speech and female speech
Table 6.1 Speakers’ information
Table 6.2 Statistics on F0 level of all speakers for questions (Q) and statements (S) including: mean, minimum (min), maximum (max) and standard deviation (std)
Table 6.3 Representative values of “ngang” tone in final position of questions and statements for all speakers

Chapter 1 INTRODUCTION

Since intonation forms such a central part of human speech communication, conveying not only diverse linguistic information but also information about the speaker and the speaker’s mood and attitude, it certainly ought to be useful in such applications. In the field of speech recognition, the more the task develops from the recognition of single words in a limited vocabulary towards the understanding of complex utterances, the more suprasegmental features like intonation have to be taken into account. These are important cues for the segmentation and classification (question vs. statement, for instance) of utterances. In speech synthesis, modeling intonational features is indispensable for increasing the intelligibility and naturalness of synthetic speech. This is the reason I chose to study the characteristics of Vietnamese intonation in questions.

The thesis is organized as follows. Chapter 2 gives a brief review of the human speech production system and an introduction to some related fundamental concepts. An overview of prosody, which includes intonation, is presented in Chapter 3. Chapter 4 describes the general characteristics of the Vietnamese language and some studies on Vietnamese prosody. Fundamental frequency, the acoustical correlate of intonation, and the problem of its estimation are discussed in Chapter 5. Chapter 6 presents the experiments carried out in this work and the results obtained. Finally, the conclusion and the perspectives of the study are given in Chapter 7.

Chapter 2 SPEECH PRODUCTION PROCESS

Spoken language is used to communicate information from a speaker to a listener. Speech production and perception are both important components of the speech chain. Speech begins with a thought and intent to communicate in the brain, which activates muscular movements to produce speech sounds. A listener receives it in the auditory system, processing it for conversion to neurological signals the brain can understand. The speaker continuously monitors and controls the vocal organs by receiving his or her own speech as feedback. Considering the universal components of speech communication as shown in Figure 2.1, the fabric of spoken interaction is woven from many distinct elements. The speech production process starts with the semantic message in a person’s mind to be transmitted to the listener via speech. The computer counterpart to the process of message formulation is the application semantics that creates the concept to be expressed. After the message is created, the next step is to convert the message into a sequence of words. Each word consists of a sequence of phonemes that corresponds to the pronunciation of the word. Each sentence also contains a prosodic pattern that denotes the duration of each phoneme, the intonation of the sentence, and the loudness of the sounds. Once the language system finishes the mapping, the talker executes a series of neuromuscular signals. The neuromuscular commands perform articulatory mapping to control the vocal cords, lips, jaw, tongue, and velum, thereby producing the sound sequence as the final output. The speech understanding process works in reverse order. First the signal is passed to the cochlea in the inner ear, which performs frequency analysis as a filter bank. A neural transduction process follows and converts the spectral signal into activity signals on the auditory nerve, corresponding roughly to a feature extraction component. Currently, it is unclear how neural activity is mapped into the language system and how message comprehension is achieved in the brain.

Figure 2.1 The underlying determinants of speech generation and understanding. The gray boxes indicate the corresponding computer system components for spoken language processing [1]

Figure 2.2 Application of sound energy causes alternating compression/rarefaction of air molecules, described by a sine wave [1]

The use of the sine graph in Figure 2.2 is only a notational convenience for charting local pressure variations over time, since sound does not form a transverse wave, and the air particles are just oscillating in place along the line of application of energy. The amount of work done to generate the energy that sets the air molecules in motion is reflected in the amount of displacement of the molecules from their resting position. This degree of displacement is measured as the amplitude of a sound, as shown in Figure 2.2.
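As a minimal sketch of this description, the following Python fragment samples a sinusoidal pressure variation and reads off its amplitude as the largest displacement from the resting value. The function names, the 100 Hz frequency and the 8 kHz sampling rate are illustrative assumptions, not values taken from this thesis.

```python
import math

def pressure_samples(amplitude, frequency_hz, duration_s, sample_rate_hz):
    """Sample a sine-shaped local pressure variation around the resting value 0."""
    n = round(duration_s * sample_rate_hz)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(n)
    ]

def peak_amplitude(samples):
    """Amplitude = largest displacement from the resting position."""
    return max(abs(s) for s in samples)

# Hypothetical example: two full cycles of a 100 Hz tone sampled at 8 kHz.
samples = pressure_samples(amplitude=0.5, frequency_hz=100.0,
                           duration_s=0.02, sample_rate_hz=8000)
measured = peak_amplitude(samples)
```

A larger `amplitude` argument corresponds to more work done on the air molecules; the frequency only changes how quickly the compression and rarefaction alternate.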

2.3 Speech production

2.3.1 Articulators

Speech is produced by air-pressure waves emanating from the mouth and the nostrils of a speaker. In most of the world’s languages, the inventory of phonemes can be split into two basic classes:

throat or obstructions in the mouth (tongue, teeth, lips) as we speak

Figure 2.3 A schematic diagram of the human speech production apparatus

The sounds can be further partitioned into subgroups based on certain articulatory properties. These properties derive from the anatomy of a handful of important articulators and the places where they touch the boundaries of the human vocal tract. Additionally, a large number of muscles contribute to articulator positioning and motion. A schematic view of only the major articulators is diagrammed in Figure 2.3. The gross components of the speech production apparatus are the lungs, trachea, larynx (organ of voice production), pharyngeal cavity (throat), and oral and nasal cavities. The pharyngeal and oral cavities are typically referred to as the vocal tract, and the nasal cavity as the nasal tract. As illustrated in Figure 2.3, the human speech production apparatus consists of:

oscillate against one another during a speech sound, the sound is said to be voiced. When the cords are too slack or tense to vibrate periodically, the sound is said to be unvoiced. The place where the vocal cords come together is called the glottis.

passage of air through the nasal cavity Sounds produced with

mouth, which, when the tongue is placed against it, enables consonant articulation

vowels, placed close to or on the palate or other hard surfaces for consonant articulation

certain consonants

closed completely to stop the oral air flow in certain consonants (p, b, m)

Figure 2.4 Schematic representation of the complete physiological mechanism of speech production [2]

A simplified representation of the complete physiological mechanism for creating speech is shown in Figure 2.4. Air enters the lungs via the normal breathing mechanism. As air is expelled from the lungs to the trachea (or windpipe), the tensed vocal cords within the larynx are caused to vibrate (in the mode of a relaxation oscillator) by the air flow. The air flow is chopped into quasi-periodic pulses, which are then modulated in frequency in passing through the throat (pharynx cavity), the mouth cavity, and possibly the nasal cavity. Depending on the positions of the various articulators, different sounds are produced.

2.3.2 The voicing mechanism

The most fundamental distinction between sound types in speech is the voiced/unvoiced distinction. Voiced sounds, including vowels, have in their time and frequency structure a roughly regular pattern that voiceless sounds, such as consonants like s, lack. Voiced sounds typically have more energy, as shown in Figure 2.5. We see here a part of the waveform of the utterance “sa”, which consists of two phonemes: the unvoiced consonant /s/ and the vowel /a/.

Figure 2.5 A section of waveform of the utterance “sa”. The unvoiced sound “s” in the first part and the voiced sound “a” in the second part
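The two cues mentioned here, higher energy and a more regular pattern in voiced sounds, can be sketched with two simple frame measures: short-time energy and zero-crossing rate. The Python fragment below is only an illustration on synthetic frames (a 150 Hz sine standing in for a vowel, low-level random noise standing in for /s/). The thresholds in `looks_voiced` are arbitrary assumptions, and this is not the voicing decision used later in the thesis.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared sample value of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def looks_voiced(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Crude heuristic: voiced frames are energetic and cross zero rarely.
    Both thresholds are hypothetical values chosen for this synthetic example."""
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)

sample_rate_hz = 8000
# Synthetic stand-ins: 50 ms of a 150 Hz sine ("a") and of weak noise ("s").
voiced_frame = [0.5 * math.sin(2 * math.pi * 150 * i / sample_rate_hz)
                for i in range(400)]
rng = random.Random(0)
noise_frame = [rng.uniform(-0.1, 0.1) for _ in range(400)]
```

On real recordings both measures vary with the speaker, the microphone and the background noise, so fixed thresholds like these would need tuning.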

What in the speech production mechanism creates this fundamental distinction? When the vocal folds vibrate during phoneme articulation, the phoneme is considered voiced; otherwise it is unvoiced. Voiced sounds are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation, thereby producing quasi-periodic pulses of air which excite the vocal tract. The resulting speech waveform is therefore quasi-periodic. Unvoiced sounds are generated by forming a constriction at some point in the vocal tract (usually toward the mouth end), and forcing air through the constriction at a high enough velocity to produce turbulence. This creates a broad-spectrum noise source to excite the vocal tract, so the resulting speech waveform is aperiodic or random in nature. The vocal folds vibrate at slower or faster rates, from as low as 60 cycles per second (Hz) for a large man to as high as 300 Hz or higher for a small woman or child. The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency or F0. This is because it sets the periodic baseline for all higher-frequency harmonics contributed by the pharyngeal and oral

well with the subjective experience of pitch (the rising and falling of voice tones). It is therefore common practice to use the terms F0 and pitch interchangeably, and in the remainder of the thesis I will do the same.
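To make the notion of F0 concrete, the sketch below estimates it from a synthetic quasi-periodic signal by picking the lag with the largest autocorrelation, restricted to the 60 to 300 Hz range quoted above. It is a bare illustration of the autocorrelation idea only: the function name, the test signal and the 8 kHz sampling rate are assumptions, and practical pitch trackers such as the one in Praat add windowing, voicing decisions and path finding.

```python
import math

def estimate_f0(samples, sample_rate_hz, f0_min_hz=60.0, f0_max_hz=300.0):
    """Return the F0 (Hz) whose period maximizes the raw autocorrelation."""
    n = len(samples)
    min_lag = int(sample_rate_hz / f0_max_hz)               # shortest period
    max_lag = min(int(sample_rate_hz / f0_min_hz), n - 1)   # longest period
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate_hz / best_lag

sample_rate_hz = 8000
true_f0_hz = 125.0  # hypothetical speaker F0; one period is 64 samples
# 40 ms of a fundamental plus a weaker second harmonic.
voiced = [
    math.sin(2 * math.pi * true_f0_hz * i / sample_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 2 * true_f0_hz * i / sample_rate_hz)
    for i in range(320)
]
f0_hz = estimate_f0(voiced, sample_rate_hz)
```

Leaving the autocorrelation unnormalized slightly favors shorter lags, which in this toy case helps avoid choosing a multiple of the true period.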

Figure 2.6 Vocal fold cycling at the larynx: (a) closed with sub-glottal pressure buildup; (b) trans-glottal pressure differential causing folds to blow apart; (c) pressure equalization and tissue elasticity forcing temporary reclosure of vocal folds, ready to begin next cycle [1]

The glottal cycle is illustrated in Figure 2.6. At stage (a), the vocal folds are closed and the air stream from the lungs is indicated by the arrow. At some point, the air pressure on the underside of the barrier formed by the vocal folds increases until it overcomes the resistance of the vocal fold closure and the higher air pressure below blows them apart (b). However, the tissues and muscles of the larynx and the vocal folds have a natural elasticity which tends to make them fall back into place rapidly, once air pressure is temporarily equalized (c). The successive airbursts resulting from this process are the source of energy for all voiced sounds. The time for a single open-close cycle depends on the stiffness and size of the vocal folds and the amount of sub-glottal air pressure. These factors can be controlled by a speaker to raise and lower the perceived frequency or pitch of a voiced sound.

The glottal air flow (volume velocity waveform) and the resulting sound pressure at the mouth for a typical vowel sound are shown in Figure 2.7. The glottal waveform shows a gradual build-up to a quasi-periodic pulse train of air, taking about 15 ms to reach steady state. This build-up is also reflected in the acoustic waveform shown at the bottom of the figure.

Figure 2.7 Glottal airflow and the resulting sound pressure at the mouth [2]

Chapter 3 AN OVERVIEW OF PROSODY

3.1 The concepts of prosody and intonation

The term prosody (ngôn điệu) refers to certain properties of the speech signal such as audible changes in pitch, loudness, and syllable length [3]. For some authors the set of prosodic features also includes other aspects related to speech timing such as rhythm and speech rate.

Because prosodic events appear to be time-aligned with syllables or groups of syllables rather than with segments (phonemes), they are also referred to as suprasegmental phenomena.

prosody. It is restricted to the tonal (melodic) aspects of prosody by others [3], as will be the case here. That is, in this thesis, intonation refers to pitch variation in speech production and is part of prosody.

3.2 Levels of representation of prosodic phenomena

As with other properties of the speech signal, prosodic events can be studied at various levels of representation (see Table 3.1):

(fundamental frequency, amplitude, and duration) can be measured directly, using specialized hardware or algorithms (such as pitch determination algorithms).

Second, the perceptual level represents the prosodic events as heard by the listener. As with spectral properties of speech sounds, acoustic characteristics that can be measured are not always perceptible. The perceptual representation is accessible to the individual listener, but this mental representation can hardly be measured. Alternatively, it can be computed with a fair amount of precision on the basis of our knowledge of psychoacoustics.

as a sequence of abstract units (signs, symbols), some of which have a communicative function in speech, while others may just fulfill syntactic requirements. The linguistic structure of prosody is not some hidden code that can simply be revealed using some standard procedure.

Table 3.1 Links between levels of representation of prosodic phenomena [3]

Acoustic Perceptual Linguistic

Fundamental frequency

(F0)

As one moves away from the acoustic level towards the perceptual and/or linguistic levels, the measurement of a given prosodic property will progressively involve segmentation (for example, into syllables), context (such as relative prominence), and structural information (the linguistic interpretation of a syllabic tone, for example, often depends on whether the related syllable is stressed or not, which requires a prior analysis of the segmental layer).

3.3 The functions of prosody

The functions of prosody can be divided into those which modify meaning and those which do not (see Table 3.2). The former can also be seen as the part of the information which is consciously and intentionally provided by the speaker, the message, whereas the latter involuntarily accompanies it.

Table 3.2 Information conveyed by prosody, ‘*’ marking feature discussed in this study [4]

speaker's intention, attitude

age, sex, speaker's background (native language, dialect), emotional condition

Prosodic features have specific functions in speech communication. One of the most uncontroversial functions of intonation is that of conveying different illocutionary aspects, or sentence modes. Thus it is commonly maintained that a distinction between declarative and interrogative modes is one of the most universal characteristics of intonation systems. My work centers on the contribution of prosody to the expression of the interrogative mode.

One of the most apparent effects of prosody is that of focus. For instance, certain pitch events make a syllable stand out within the utterance, and indirectly the word or syntactic group it belongs to will be highlighted as an important or new component in the meaning of that utterance. The presence of a focus marking may have various effects, such as contrast, depending on the place where it occurs, or the semantic context of the utterance.

Prosodic features create a segmentation of the speech chain into groups of syllables, or, put the other way round, they give rise to the grouping of syllables and words into larger chunks.

All these aspects of intonation can be grouped under the heading of linguistic aspects of intonation. They are part of the structure of language (and specific to any given language) in the same way as morphology and syntax are. The linguistic features concern the way a message is formally coded and organized into intonational units of a certain language. They correspond to the surface structure of the message on a still rather abstract level. The actual meaning of the message often cannot be decoded without interpreting the underlying paralinguistic information. Paralinguistic information is defined as the information that is not inferable from the written counterpart but is deliberately added by the speaker to modify or supplement the linguistic information. A written sentence can be uttered in various ways to express different intentions and attitudes which are under the conscious control of the speaker. The question “Are you tired?”, for instance, is simply a request for information about someone’s psychological and physiological condition. If it is asked with a concerned undertone then the message may be: “Come on, you’ve been working so hard, you have to get yourself some sleep!” With an ironical undertone, it may mean “You lazy guy, you’ve been sleeping all day and still you’re tired!”

There is, however, another range of phenomena that are also expressed by prosodic means (such as pitch), but do not modify the meaning of a message. They can convey information about the age, gender, and emotional or physical state of the speaker. These factors are not directly related to the linguistic and paralinguistic contents of the utterances and cannot generally be controlled by the speaker. Angry people, for instance, usually have faster pitch changes, a larger pitch range, and a larger dynamic amplitude range, whereas depressed people typically show the opposite trend. But while the pitch range may be affected by such emotional factors, the basic functional pitch shapes and configurations remain unaffected. The emotional state does not alter the linguistic code; it merely affects its realization. This is why these aspects are called non-linguistic aspects.

Understanding the information conveyed by intonation is important for the study of intonation. Each type of information has its effect on tonal variations, i.e., intonation. These effects need to be taken into account in intonation analysis.

3.4 Applications of intonation

Since intonation forms such a central part of human speech communication, conveying not only diverse linguistic information but also information about the speaker and the speaker’s mood and attitude, it certainly ought to be useful in many applications. Apart from language technology and speech synthesis, where intonation is an established application, diverse medical and educational applications where intonation is less commonplace are being developed.

Speech processing:

The increasing demand for the application of speech in man-machine communication in all areas ranging from telephony, telematics, and automated translation to aids for the handicapped requires sophisticated technology for the analysis, recognition and synthesis of speech. In the field of speech recognition, the more the task develops from the recognition of single words in a limited vocabulary towards the understanding of complex utterances, the more suprasegmental features like intonation have to be taken into account. These are important cues for the segmentation and classification (question vs. declaration, for instance) of utterances. In this context, modeling intonation is an important task.

In speech synthesis, modeling intonational features is indispensable for increasing the intelligibility and naturalness of synthetic speech. Sophisticated

sentence at a given speech rate

Automatic language identification could be important especially in different telecom applications, when the spectral content of the speech could be expected to be distorted. Intonation cues are in this case especially interesting. The varied intonational structure of languages could be exploited in this application. In this recognition task, the intonation cues need to be combined with other types of information.

Speech Pathology:

Hearing impairments, especially if they are congenital or acquired at an early age, are accompanied by a reduced intelligibility of speech. Major factors for this are an imperfect command of phonatory effort and a lack of control of the laryngeal function, which result in a high degree of variation in the pitch patterns produced. The speech of hearing impaired people may sound monotonous or, on the contrary, excessively emotional. The basic pitch is often kept on a level either too high or too low, and it has been observed that hearing impaired persons have difficulties in changing their pitch within a single syllable. Teaching aids have been developed to overcome these problems, which provide pitch feedback over tactile or visual channels.

Foreign Language Education:

It is widely agreed that the acquisition of a good command of intonational features in a foreign language is one of the most difficult tasks a student must accomplish. Yet it is crucial for the degree of intelligibility he or she will achieve. In traditional language education, however, intonation usually comes second to segmental phonetics, which itself forms only a small part of the curriculum of common language courses. This deficit has become more apparent as political and economic globalization requires better communicative skills on the part of the learner of a foreign language. In this context, individual computer-based language education will play a growing role. Software is needed which is suited to the special problems of the speaker of a language L1 who studies a target language L2. Although a number of programs exist whereby the student can train his or her lexical, grammatical or orthographic skills, there are few systems which use speech input to help correct the student's pronunciation. In this context, visualization of speech can provide additional feedback where the auditory channel fails because of mother tongue interference.

It seems desirable to develop more intelligent systems which are customized to the special requirements of students with the same native language. Contrastive studies of the intonational systems of L1 and L2 can help to predict problems and select appropriate teaching materials.

Chapter 4 PROSODY IN VIETNAMESE

Understanding the phonetic and phonological characteristics of a language plays an important role in studies on speech processing in general and on intonation analysis in particular. This chapter provides a review of the characteristics of the Vietnamese language and some works of other authors related to my study.

4.1 General characteristics of Vietnamese language

Vietnamese is known as a tonal language, which uses tone to distinguish lexical meaning. Vietnamese has basically six lexical tones. Each tone could

mẽ, mẹ. This is not the case for non-tonal languages. In English, for example, the position of the stressed syllable within a word is lexically distinctive.

Table 4.1 Vietnamese vowels

Transcription Reading Letters Example

Vietnamese includes 22 consonants [5]:

Table 4.2 Vietnamese consonants

Transcription Reading Letter Example

of consonant in syllable. Based on these features, Vietnamese consonants can

varieties of Vietnamese, the whole tonal paradigm can occur.

Table 4.4 The phonological hierarchy of Vietnamese syllables with total numbers of each phonetic unit [6]

TONAL SYLLABLE (6492)
    BASE SYLLABLE (2376)
        Initial (22)
        Final (155)
            Medial (1)
            Nucleus (16)
            Ending (8)
    TONE (6)

4.1.3 The tonal system

There are six syllabic tones in Vietnamese (see Table 4.5). To describe the tonal system on a physical basis, most linguists have studied tones in isolated syllables, where they are likely to be realized as closely as possible to their phonotype. In terms of distinctive features, Vietnamese tones can be described according to register, contour and glottalization (the complete or partial closure of the glottis during the articulation of another sound). These tones can be separated into two groups according to register: “ngang”, “sắc”, “ngã” are realized in a higher register while “huyền”, “nặng”, “hỏi” are realized in a lower one. Based on the glottalization feature, these six tones can be classified into two groups: “ngã” and “nặng” tones are

F0 contours of the six Vietnamese tones (examples are shown in Figure 4.1), are described as follows [6]:

Table 4.5 The six Vietnamese tones

Figure 4.1 Example of the contours of six tones (female subject PNY), as described in [7]

syllable, it is the highest tone. The steady state of the level contour is observed consistently.

lower than tone 1, tone 5 and tone 3. The low F0 at the onset gradually falls toward the end.

level of tone 5, it is higher than the falling tone. The second third of the contour of this tone is characterized by an abrupt dip caused by a glottalization. In most cases, the bottom of the dip occurs between the mid-point and the point two-thirds from onset. A creaky voice is heard during this dip.

six tones. The low onset falls further gradually until the point two-thirds from the onset. From this point, the extremely low F0 starts to rise toward the end.

• Tone 5 - Rising tone (“sắc”): the onset is also high. Starting from the high onset, the F0 gradually rises for the first two thirds of the duration. After this point, the rise becomes more rapid.

of the falling or curve tone but considerably lower than tone 1, tone 5 and tone 3. This tone is characterized by a glottalization at the end and also by its considerably shorter duration than the other tones. The duration of this tone is approximately two thirds of that of the other tones. The main body of this tone is almost level or slightly falling.

These descriptions apply only to the Northern dialect, in particular the Hanoi dialect, which is the standard dialect of Vietnamese. They differ for the other dialects in the South and the Center of Vietnam. In these regions, there are only five tones instead of the six of the Hanoi dialect, because tone 3 and tone 4 are pronounced identically.
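The feature-based classification above can be written down as a small lookup table. The Python sketch below is only an illustrative encoding: the register values follow the grouping stated in this section, while the `glottalized` flags for “ngã” and “nặng” are my reading of the descriptions above (a mid-contour dip and a final glottalization, respectively) and should be treated as an assumption. The names `TONE_FEATURES` and `tones_with` are hypothetical.

```python
# Register and glottalization of the six Hanoi-dialect tones.
# Register grouping as stated in the text; glottalization flags are an
# assumption based on the tone descriptions.
TONE_FEATURES = {
    "ngang": {"register": "high", "glottalized": False},
    "sắc":   {"register": "high", "glottalized": False},
    "ngã":   {"register": "high", "glottalized": True},
    "huyền": {"register": "low",  "glottalized": False},
    "hỏi":   {"register": "low",  "glottalized": False},
    "nặng":  {"register": "low",  "glottalized": True},
}

def tones_with(feature, value):
    """Return the set of tone names whose given feature has the given value."""
    return {name for name, feats in TONE_FEATURES.items()
            if feats[feature] == value}

high_register = tones_with("register", "high")
glottalized = tones_with("glottalized", True)
```

Such a table makes the two-way splits explicit: each tone is identified by its register, its glottalization and (not encoded here) its contour.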

4.1.4 Tones in context

In continuous speech, tones seldom reach their target values. They are generally affected by context: stressed vs. unstressed syllable, influence of neighbouring tones, tempo, and so on. These influences have rarely been studied. Tonal variation due to the influence of neighbouring tones is described by linguists as a type of tonal coarticulation. Đỗ Thế Dũng [8] observed that after a rising tone such as “sắc” or “ngã”, any immediately following tone will start one or two quarter tones higher than its normal target value, and after a falling tone such as “nặng” or “huyền” it will start one or two quarter tones lower. This variation is stronger in unstressed positions than in stressed ones; in spite of this, a relative difference in register and contour is preserved.
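To make the size of this coarticulation effect concrete, the shift can be expressed in hertz. The sketch below is only an illustration: it assumes an equal-tempered quarter tone (a frequency ratio of 2^(1/24)), and the function names, the choice to leave the onset unshifted after tones not mentioned above, and the 200 Hz example value are assumptions rather than material from [8].

```python
# Illustrative sketch only: names and the 200 Hz example are hypothetical.
QUARTER_TONES_PER_OCTAVE = 24  # assumption: equal-tempered quarter tones

RISING_TONES = {"sắc", "ngã"}      # tones after which the next onset is raised
FALLING_TONES = {"nặng", "huyền"}  # tones after which the next onset is lowered


def shift_by_quarter_tones(f0_hz, n_quarter_tones):
    """Return f0_hz moved up (positive n) or down (negative n) by quarter tones."""
    return f0_hz * 2.0 ** (n_quarter_tones / QUARTER_TONES_PER_OCTAVE)


def coarticulated_onset(target_onset_hz, preceding_tone, n_quarter_tones=1):
    """Onset F0 of a tone, adjusted for the tone of the preceding syllable.

    n_quarter_tones is 1 or 2 in the observation cited above. Tones that the
    observation does not mention are left unshifted here (an assumption).
    """
    if preceding_tone in RISING_TONES:
        return shift_by_quarter_tones(target_onset_hz, n_quarter_tones)
    if preceding_tone in FALLING_TONES:
        return shift_by_quarter_tones(target_onset_hz, -n_quarter_tones)
    return target_onset_hz


if __name__ == "__main__":
    for tone in ("sắc", "huyền", "ngang"):
        onset = coarticulated_onset(200.0, tone, n_quarter_tones=2)
        print(f"after {tone}: onset = {onset:.1f} Hz")
```

With a 200 Hz target, a shift of two quarter tones corresponds to roughly 12 Hz in either direction, which gives a feeling for the magnitude of the variation described.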


4.1.5 Modality, attitude and morphosyntactic structures

In Vietnamese there are two possible ways of expressing modality, mood or attitude: the first uses only prosodic features, and the second uses lexico-syntactic markers, possibly combined with prosodic features [8]. In the first case, as the pragmatic information relies entirely on prosodic structure, it has to be clearly marked. In the second case, as intonation becomes redundant, it is interesting to see whether it can still play a role in characterizing the pragmatic type.

The Vietnamese language has a system of syntactic markers which occur mostly at the end (occasionally at the beginning or in the middle) of a declarative sentence. They are used to express modal and attitudinal meanings. For example, from a declarative sentence

Trời mưa

we may obtain a yes-no question by adding “không”:

Trời mưa không?

With another morpheme “à”, we obtain a question expressing the speaker’s surprise:

Trời mưa à?

The morphosyntactic elements can be put into three classes according to their semantic values: question, imperative and attitudinal markers.

4.1.5.1 Question markers

It has been considered that only questions with morphosyntactic markers express simple interrogative modality in Vietnamese, and that questions with only prosodic markers are always interrogatives expressing surprise or astonishment and cannot be considered a “neutral” interrogative type [8]. Some controversies remain about the classification of interrogative markers. It seems reasonable, however, to distinguish two types of question.

Yes-no questions use the following markers: “không” expresses a question on the predicative relation itself, for instance “Trời mưa không?”; “chưa” has an aspectual value, for example “Trời mưa chưa?”; “hay” gives an explicit alternative choice, for example “Trời mưa hay trời nắng?”.

Open questions use indefinite words in the same way as wh-markers: ai (who), bao giờ (when), bao lâu (how long), bao nhiêu (how many), bao xa (how far), đâu (where), gì (what), mấy (how many), mấy giờ (at what time), nào (which or what), như thế nào (how), sao/tại sao/vì sao (why), sao/làm sao (how).

Some linguists have also mentioned a third type of question, called biased questions (suggesting an expected answer), which are associated with the expression of an attitude. They are syntactically marked with the final morphemes “à, ư” (surprise), “chứ” (logical evidence), “hả, hử, hở” (insisting and astonishment), “nhé” (supposition, suggestion).
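The three classes of question markers listed in this section can be summarized as a small lookup, sketched below in Python. This is only an illustration of the classification, not a parser: the function name and the returned labels are hypothetical, word segmentation is reduced to splitting on spaces, and homonymous lexical items (for example “không” or “sao” used with a non-interrogative meaning) are not disambiguated.

```python
import unicodedata


def _nfc(text):
    """Normalize to NFC so that composed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


# Marker inventories copied from the lists in this section.
YES_NO_FINAL = {_nfc(m) for m in ("không", "chưa")}
BIASED_FINAL = {_nfc(m) for m in ("à", "ư", "chứ", "hả", "hử", "hở", "nhé")}
WH_ITEMS = [_nfc(m) for m in (
    "ai", "bao giờ", "bao lâu", "bao nhiêu", "bao xa", "đâu", "gì", "mấy",
    "mấy giờ", "nào", "như thế nào", "sao", "tại sao", "vì sao", "làm sao",
)]


def classify_question(sentence):
    """Return 'biased', 'yes-no', 'open' or 'unmarked' (hypothetical labels)."""
    words = _nfc(sentence).lower().strip(" ?!.").split()
    if not words:
        return "unmarked"
    if words[-1] in BIASED_FINAL:
        return "biased"
    # "hay" marks an alternative only when it stands between two parts.
    if words[-1] in YES_NO_FINAL or _nfc("hay") in words[1:-1]:
        return "yes-no"
    padded = " " + " ".join(words) + " "
    if any(" " + item + " " in padded for item in WH_ITEMS):
        return "open"
    return "unmarked"


if __name__ == "__main__":
    for s in ("Trời mưa", "Trời mưa không?", "Trời mưa à?",
              "Trời mưa hay trời nắng?"):
        print(s, "->", classify_question(s))
```

A sentence classified as “unmarked” here is exactly the case in which any interrogative reading has to be carried by prosody alone.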

4.1.5.2 Injunctive markers

Injunction is expressed by the presence of “đi” at the end of a declarative structure, for instance “Trời mưa đi!”.

A weaker injunction is expressed with “nhé”, and a stronger (insisting) one is expressed with the compound marker “hãy…đi”.

4.1.5.3 Attitude and emotional markers

In Vietnamese, a final marker can be used to express speaker attitude. Lê Thị Xuyến gave the following list: “ạ” (respect), “đấy” (admiration), “rồi” (conclusive), “mà” (insistence), “sao” (surprise), “chăng” (doubt), “hả” (anger), “nhỉ” (familiarity), “vậy” (external obligation) [8].

4.2 Some studies on Vietnamese prosody

In a tonal language like Vietnamese, which has six lexical tones and moreover a system of morphosyntactic markers to express emotions, attitudes, mood and modality, it would not be surprising if intonation played a lesser role than in non-tonal languages such as French or English: what is usually conveyed by intonation in many other languages is already marked. This idea was developed by Gordina and Bystrov: “the more a language uses morphosyntactic or syntactic means to express mood, modality and emotions, the less it would rely on intonation for the same functions” [8].

This explains why there are very few studies on intonation in Vietnamese. There are a few remarks in general grammar books. The statements about intonation made by grammarians or linguists are rather intuitive, not based on experimental description. For example, declarative sentences are said to be “falling”, with such descriptive terms as “fading” or “decreasing” (Thompson), “falling” (Nguyễn Đăng Liêm), “normal” or “low pitch” (Jones and Huỳnh Sanh Thông); whereas interrogative sentences are said to be “rising” (Nguyễn Đăng Liêm), “sustaining” (Thompson), “higher pitch level 1” (Jones and Huỳnh Sanh Thông). Expressive sentences, on the other hand, are said to have a rising contour with a higher pitch level: “higher pitch level 2 or 3” (Jones and Huỳnh Sanh Thông), “increasing” (Thompson), “rising-falling” (Nguyễn Đăng Liêm) [8].

There are a small number of experimental studies by Gordina and

given some ideas of the role and function of intonation in Vietnamese


According to Gordina and Bystrov, the shorter the sentence, the greater the difference between the intonation patterns [8]. In their examples:

(a) Anh ấy đi sang nước Anh à?

(b) Anh ấy đi sang nước Anh

(c) Không sách à?

(d) Không sách

the difference is greater between the (c) and (d) patterns than between (a) and (b), though in each case a declarative is contrasted with an interrogative.

According to these same authors, an interrogative without a morphosyntactic marker has a well-differentiated pattern when compared to an interrogative with a marker. In:

‘Mưa.’

‘Trời mưa.’

‘Cô ta xinh.’


‘Khuya rồi.’

‘Nam về lúc khoảng một rưỡi.’

Each sentence was read with different attitudes by 2 speakers (one male, one female) and judged by 20 hearers. Results of her experiments showed that only irony, anger and statement were identified above chance level (75%, 52.5% and 67.5% respectively). According to her, the neutral declarative is characterized by a low register and a moderate tempo; irony has a higher register, a larger tone movement and a slower tempo, resulting in increased sentence length; whereas anger is conveyed by a speeding up of tempo, greater and more abrupt pitch movement, shortening of the utterance and an increase in the overall intensity.

In order to bring out the pertinent prosodic features corresponding to assertive, interrogative and imperative modes, while excluding attitudinal variations, and to produce natural Vietnamese utterances, Nguyễn and Boulakia [9] used a number of utterance pairs in which the final question or imperative marker can be replaced by a homonymous lexical item. The resulting pairs have the same syllabic and tonal structures but differing morphosyntactic structures. They are therefore considered to be ambiguous, and if they are discriminated, it has to be due to the presence of prosodic differences. Some example pairs of sentences in their corpus are as follows:

Statement – Question:

Lan thích ăn cơm không (Lan only likes to eat rice.)

Lan thích ăn cơm không? (Does Lan like to eat rice?)

Statement – Imperative:

Bảo cố gắng tập đi (Bảo is making an effort to practice walking.)

Bảo cố gắng tập đi! (Bảo, make an effort to practice!)

Question – Imperative:

Tân bỏ đi chứ? (Tân, did you leave?)

Tân bỏ đi chứ! (Tân, do leave it!)

From the five morphemes “không”, “hả”, “sao”, “chứ” and “đi”, nine such pairs of sentences were formulated. These 18 sentences were read by 4 speakers (2 males and 2 females) and judged by 22 hearers. The results of the prosodic analysis showed that questions are shorter than statements, and that this difference is significant. Imperatives are even shorter, but their difference from questions is not significant. In terms of intensity, the difference is significant for the statement-imperative pair, but not for the statement-question and question-imperative ones. As for intonation, the two members of the same pair have the same overall F0 contour, but there is a difference in terms of register: the register of questions and imperatives is clearly higher than that of statements, while there is no difference between questions and imperatives (Figure 4.2). There is an obvious difference in the last syllable: the “ngang” tone falls in statements and is much higher and rising in questions, while for imperatives its mean value and movement lie halfway between. The rising tones, “sắc” or “hỏi”, rise even more in questions than in statements, while they tend to become flat or even fall slightly in the final part in imperatives. This means that intonation influences the tone of the sentence-final syllable.
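The two cues reported above, overall register and the movement of the final syllable, can be quantified from a pair of F0 contours. The sketch below is one possible way of doing so and is not the procedure of [9]: the semitone scale with a 100 Hz reference, the function names, the convention that unvoiced frames are stored as 0, and the two toy contours are all assumptions made for illustration; the numbers are invented and are not measurements.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def _voiced(contour_hz):
    """Keep voiced frames only; unvoiced frames are assumed to be stored as 0."""
    voiced = [f for f in contour_hz if f > 0]
    if not voiced:
        raise ValueError("contour contains no voiced frames")
    return voiced


def register_st(contour_hz, ref_hz=100.0):
    """Register of an utterance, taken here as its mean F0 in semitones."""
    voiced = _voiced(contour_hz)
    return sum(hz_to_semitones(f, ref_hz) for f in voiced) / len(voiced)


def final_movement_st(contour_hz, n_final=3):
    """F0 change in semitones over the last n_final voiced frames.

    Positive values indicate a final rise, negative values a final fall.
    """
    tail = _voiced(contour_hz)[-n_final:]
    return hz_to_semitones(tail[-1]) - hz_to_semitones(tail[0])


if __name__ == "__main__":
    # Invented toy contours in Hz, for illustration only (0 = unvoiced frame).
    statement = [200.0, 198.0, 0.0, 196.0, 190.0, 185.0]
    question = [220.0, 218.0, 0.0, 216.0, 222.0, 230.0]

    diff = register_st(question) - register_st(statement)
    print(f"register difference (question - statement): {diff:.2f} st")
    print(f"final movement, statement: {final_movement_st(statement):.2f} st")
    print(f"final movement, question:  {final_movement_st(question):.2f} st")
```

Working in semitones rather than hertz makes the comparison less dependent on whether the speaker is male or female, which matters when contours from several speakers are pooled.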

Date posted: 27/02/2021, 23:43

References

1. Huang X., Acero A., et al. (2001), Spoken language processing: A guide to theory, algorithm and system development, Prentice Hall PTR.
2. Rabiner L. and Juang B.H. (1993), Fundamentals of speech recognition, Prentice Hall.
3. Dutoit T. (1997), An Introduction to Text-to-Speech Synthesis, Springer.
4. Mixdorff H. (1998), Intonation patterns of German - Model-based quantitative analysis and synthesis of F0 contours, PhD thesis, TU Dresden.
5. Nguyen H.Q. (2001), Ngữ pháp tiếng Việt, Nhà xuất bản từ điển Bách Khoa.
6. Tran D.D., Castelli E., et al. (2005), "Influence of F0 on Vietnamese syllable perception", Interspeech.
7. Nguyen Q.C. (2002), Reconnaissance de la parole en langue Vietnamienne, PhD thesis, Institut National Polytechnique de Grenoble.
8. Do T.D., Tran T.H., et al. (1998), Intonation in Vietnamese, in Hirst and Di Cristo (eds.), Intonation Systems - A survey of twenty languages (chap. 22), Cambridge University Press.
9. Nguyen T.T.H. and Boulakia G. (1999), "Another look at Vietnamese intonation", ICPhS'99.
10. Rabiner L.R. and Schafer R.W. (1978), Digital processing of speech signals, Prentice Hall.
11. de Cheveigné A. and Kawahara H. (2001), "Comparative evaluation of F0 estimation algorithms", Eurospeech.
12. Govender N., Barnard E., et al. (2005), "Fundamental frequency and tone in isiZulu: initial experiments", Interspeech.
13. Bagshaw P.C., Hiller S.M., et al. (1993), "Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching", Eurospeech.
14. Praat toolkit's website: http://www.fon.hum.uva.nl/praat/
15. Boersma P. (1993), "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound", Institute of Phonetic Sciences, University of Amsterdam, Proceedings 17.
16. Nguyen T.T.H. (2004), Contribution à l'étude de la prosodie du vietnamien: variations de l'intonation dans les modalités - assertive, interrogative et impérative, PhD thesis, Université Paris 7.
