
How Vietnamese attitudes can be recognized and confused:

Cross-cultural perception and speech prosody analysis

Dang-Khoa Mac, Eric Castelli

International Research Center MICA

HUST-CNRS/UMI 2954 Grenoble INP

Hanoi, Vietnam {dang-khoa.mac, eric.castelli}@mica.edu.vn

Véronique Aubergé 1, Albert Rilliard 2

1 Laboratory of Informatics of Grenoble (LIG), 2 LIMSI

CNRS

1 Grenoble, 2 Orsay, France

1 veronique.auberge@imag.fr, 2 albert.rilliard@limsi.fr

Abstract - Prosodic attitudes, or social affects, are a major part of face-to-face interaction and are linked to the language through the culture. This paper presents a study on prosodic attitudes in Vietnamese, a tonal language. Perception experiments on 16 Vietnamese attitudes were carried out with Vietnamese and French participants. The results revealed perception differences between native and non-native listeners. As attitudinal expressions are partially carried through speech prosody, a prosodic analysis was also carried out in order to better understand why these attitudes are recognized or confused, and to bring out some prosodic characteristics of Vietnamese social affects.

Keywords - Vietnamese, attitude, perception, prosodic analysis

I. INTRODUCTION

During communication between humans, speech is an important information channel for expressing mental, intentional, attitudinal and emotional states. According to some theoretical models of affect [1], affective expression in speech communication may be controlled at different levels of cognitive processing, from involuntarily controlled expressions of emotion to intentionally, voluntarily controlled expressions of attitudes. Therefore, attitudes and emotions can be distinguished depending on the nature of the control exerted by the speaker (voluntary vs. involuntary) [2]. Some types of expressivity may be expressed either as an attitude or as an emotion. For example, "surprise" can be considered an attitude when it is expressed through a voluntary process; otherwise it can be considered an emotion.

Attitude expression carries the intention and point of view of the speaker (e.g. surprise, confirmation, politeness, etc.) [3]. Attitudes are constructed for each language and each culture, and they need to be learned by children or by second-language students [5]. As all attitudinal expressions are constructed for a given language and culture, they can differ between languages. Some attitudes can be expected to have a universal value (e.g. "surprise"), but specific attitudes in one language may not be recognized, or may be ambiguous, in another language [7]. The understanding of this phenomenon may benefit from cross-cultural studies [3,6,7].

The important role of prosody in emotional and attitudinal expression has been shown in many studies [4,9]. According to [4], some emotions can be characterized by the mean level and the range of F0; this study also showed different contour shapes for different emotions. In a tonal language such as Vietnamese, the acoustic parameters involved in the linguistic and affective functions of prosody (F0, intensity, timing) also play an important role at the phonemic level for lexical access. Moreover, Vietnamese tones use voice quality settings such as creaky voice [12], which are used in the morphology of attitudes and emotions in some other languages [11].

After presenting the corpus, we describe the perceptual experiment with Vietnamese and French participants. Its results show differences in attitude perception between native and non-native listeners. A prosodic analysis is then presented and discussed to offer some explanations of the perception test results. The paper concludes with a discussion.

II. EXPERIMENTS

A. The corpus

In research on social affects in different languages [5,10], the attitudes were selected from the didactic literature on foreign-language teaching. Unfortunately, as Vietnamese is an under-resourced language, there is little research on Vietnamese expressive speech. We found only one study [12], which describes 16 Vietnamese attitudes (cf. Table I). These attitudes were selected and audio-visually recorded by a male native speaker from Hanoi (standard pronunciation of Vietnamese). However, for the purpose of prosodic analysis, this paper addresses only the audio information of the Vietnamese attitudes.

TABLE I. SELECTION OF 16 VIETNAMESE ATTITUDES, WITH THEIR ABBREVIATIONS

Exclamation of neutral surprise: EXo
Exclamation of positive surprise: EXp
Exclamation of negative surprise: EXn
Scorn: SCO
Politeness: POL
Admiration: ADM

B. Perception tests

The perception test was carried out to study how native and non-native listeners recognize and confuse the 16 Vietnamese attitudes. To examine the influence of sentence length, three sentences of one, two and five syllables were chosen from the corpus. To control for a possible effect of Vietnamese tone on the perception of attitudes, all syllables carry tone 1 (the level tone). The perception test therefore comprises 48 stimuli (3 sentences × 16 attitudes).
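The stimulus count, and the chance level used later in the result analysis, follow from simple counting. A minimal sketch, where the function name `build_stimuli` and the list ordering are illustrative assumptions rather than part of the original test software:

```python
from itertools import product

# The 16 attitude abbreviations used in this paper.
ATTITUDES = ["EXo", "EXp", "EXn", "SCO", "POL", "ADM", "DEC", "AUT",
             "IRR", "SAR", "SED", "IDS", "COL", "DOU", "OBV", "INT"]
# Sentence lengths in syllables; every syllable carries tone 1 (level tone).
SENTENCE_LENGTHS = [1, 2, 5]

def build_stimuli():
    """Hypothetical helper: one (sentence_length, attitude) pair per stimulus."""
    return list(product(SENTENCE_LENGTHS, ATTITUDES))

stimuli = build_stimuli()
chance_level = 1 / len(ATTITUDES)  # forced choice among 16 labels
print(len(stimuli), chance_level)  # 48 0.0625
```

The same 1/16 = 6.25% figure is the chance level marked in Figure 1, and twice that value (12.5%) is the threshold used for the confusion graphs.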

2011 International Conference on Asian Language Processing


Forty listeners participated in this experiment: 20 Vietnamese (10 men and 10 women) who speak the same dialect as the speaker, and 20 French (10 men and 10 women) who had never been exposed to the Vietnamese language. The test interface gave them the labels and the definitions of the 16 attitudes (in the listeners' native language). No listener reported any difficulty in understanding the concepts of these 16 attitudes. All subjects listened to each stimulus only once. After each stimulus, they were asked to indicate the perceived attitude among the 16 proposed ones.

C. Result analysis

Effect of factors: First, a repeated-measures ANOVA was carried out to evaluate the relative importance of the following factors on the listeners' perception: sentence length (number of syllables), the listeners' linguistic background (native vs. non-native) and the listeners' gender. The ANOVA shows that the listeners' linguistic background has a significant effect on perception (p < 0.01): Vietnamese and French listeners do not perceive these expressions in the same way. In contrast, sentence length and listener gender have no significant effect on perception at the 1% level (p > 0.01).

TABLE II. OUTPUT OF THE ANOVA ON THE PERCENTAGE OF CORRECT ANSWERS. SIGNIFICANT EFFECTS AT THE 1% LEVEL ARE MARKED WITH AN ASTERISK (*).

Factor | df | F | p
Listener (Vietnamese or French) | 1 | 1286.772 | 0.000 *
Gender of listener | 1 | 3.754 | 0.053
Sentence length (number of syllables) | 2 | 1.376 | 0.253
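Table II reports, for each factor, its degrees of freedom, an F value and a p value. The full design mixes between-subject factors (linguistic background, gender) with a within-subject factor (sentence length), which a dedicated statistics package would handle. As a simplified illustration only, the sketch below computes a one-way between-groups F statistic for a single factor such as linguistic background. It ignores the within-subject structure and does not compute the p value, and the example scores are invented, not the paper's data:

```python
from statistics import mean

def one_way_f(*groups):
    """One-way between-groups F statistic (simplified; not a repeated-measures ANOVA).

    Each group is a list of scores, e.g. percent correct per listener.
    Returns (F, df_between, df_within).
    """
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(scores)
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Variability of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Invented per-listener scores for two hypothetical groups.
print(one_way_f([1, 2, 3], [4, 5, 6]))  # (13.5, 1, 4)
```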

Attitude recognition: Figure 1 presents the recognition rates (in percent) of the 16 attitudes for both groups of listeners. Overall, most of the attitudes were recognized above chance level, and native listeners had higher recognition scores than foreign ones. Some attitudes were well recognized by both Vietnamese and French listeners: DEC, AUT, IRR, SAR and SED.

Figure 1. Recognition rate of the 16 attitudes by Vietnamese and French listeners. The dashed line indicates the chance level (6.25%).

Some other attitudes received low recognition scores (POL) or were recognized by neither Vietnamese nor French listeners (ADM). The SCO and IDS attitudes were well recognized by Vietnamese listeners but hardly recognized by French listeners. Conversely, the EXn attitude was recognized by French listeners, but not by Vietnamese ones.
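Recognition rates are the diagonal of the row-normalized confusion matrix. A minimal sketch, assuming a hypothetical `counts[i][j]` matrix of answers `labels[j]` given to stimuli of attitude `labels[i]`; note that it only compares raw rates with the 6.25% chance level, whereas a claim of "above chance" would normally rest on a statistical test:

```python
def recognition_rates(counts, labels):
    """Percent of correct answers per attitude.

    counts[i][j]: number of times answer labels[j] was given to stimuli
    of attitude labels[i] (hypothetical input layout).
    """
    rates = {}
    for i, label in enumerate(labels):
        total = sum(counts[i])
        rates[label] = 100.0 * counts[i][i] / total if total else 0.0
    return rates

CHANCE_PERCENT = 100.0 / 16  # 6.25 % for a 16-way forced choice

def above_chance(rates, chance=CHANCE_PERCENT):
    """Labels whose raw recognition rate exceeds the chance level."""
    return sorted(label for label, rate in rates.items() if rate > chance)

# Invented toy counts for three labels (40 answers each).
toy = recognition_rates([[30, 10, 0], [8, 32, 0], [22, 16, 2]], ["AUT", "IRR", "ADM"])
print(toy, above_chance(toy))
```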

Attitude confusion: The analysis of the confusions between attitudes gives interesting details on the perceptual proximity between the 16 expressive labels. From the confusion matrices, confusion graphs (cf. Figure 2) were built, reporting all confusions higher than twice the chance level (i.e., 12.5%).
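The graph construction described above can be sketched as a filter on the off-diagonal cells of a percentage confusion matrix, followed by a search for edges present in both directions. The function names and input layout are illustrative assumptions; the threshold is the paper's 12.5%:

```python
CHANCE_PERCENT = 100.0 / 16
THRESHOLD = 2 * CHANCE_PERCENT  # 12.5 %, twice the chance level

def confusion_edges(percent, labels, threshold=THRESHOLD):
    """Directed edges (stimulus -> answer) whose off-diagonal percentage exceeds threshold.

    percent[i][j]: percent of stimuli of labels[i] answered as labels[j].
    """
    n = len(labels)
    return {(labels[i], labels[j]): percent[i][j]
            for i in range(n) for j in range(n)
            if i != j and percent[i][j] > threshold}

def reciprocal_pairs(edges):
    """Unordered pairs confused in both directions (the bold edges of a confusion graph)."""
    return sorted({tuple(sorted(pair)) for pair in edges if pair[::-1] in edges})

# Invented toy matrix; each row sums to 100.
toy_labels = ["SAR", "SCO", "DEC"]
toy_percent = [[55.0, 30.0, 15.0], [20.0, 70.0, 10.0], [0.0, 12.5, 87.5]]
toy_edges = confusion_edges(toy_percent, toy_labels)
print(reciprocal_pairs(toy_edges))  # [('SAR', 'SCO')]
```

The strict comparison (`>`) follows the wording "higher than twice the chance level", so a confusion of exactly 12.5% is not drawn.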

For both Vietnamese and French listeners, ADM was not recognized: Vietnamese listeners confused it with COL and EXo, and French listeners with COL and IDS. Vietnamese listeners did not recognize the EXn attitude and confused it with EXo and DOU. French listeners did not recognize IDS and confused it with SAR or DOU. Vietnamese listeners made reciprocal confusions between some pairs or groups of attitudes: SAR and SCO; POL and DEC; SED and COL; EXo, EXn and DOU. French listeners made reciprocal confusions between AUT and IRR; DEC and OBV; DOU and EXn; DOU and EXo.


Figure 2. Confusion graphs (in percent of responses) for Vietnamese (top) and French (bottom) listeners. Reciprocal confusions are in bold.

Some similarities can be found between the confusions made by Vietnamese and French listeners. Both groups made reciprocal confusions between EXn, DOU and EXo. Both strongly confused EXp with EXn (>30%) and IRR with AUT (about 25%), and both confused POL, COL, INT and EXn with DEC. However, there are some differences between the two groups. SED was strongly confused with COL (33% of responses) by Vietnamese listeners, but not by French listeners. For Vietnamese listeners, SAR and SCO show strong reciprocal confusions, while French listeners show no confusion between these two attitudes.

III. PROSODIC ANALYSIS

A prosodic analysis was carried out to give some acoustic explanations of the recognition and confusion of the 16 Vietnamese attitudes. According to the ANOVA (cf. Table II), sentence length has no significant influence on the perception of attitudes. Of the three sentence types, only the five-syllable sentence has the complete structure of a Vietnamese sentence (Subject-Verb-Object). The five-syllable sentence also allows us to analyze the variations of the prosodic parameters in the different parts of the sentence (first, middle and last parts). Therefore, and to save space, the prosodic analysis was carried out only on the five-syllable sentence.

A. Principal Component Analysis (PCA)

The audio signals of the 16 attitudes were manually segmented at the phonetic level. Three acoustic parameters were extracted automatically: F0 (in semitones, with 1 Hz as the reference value), syllabic duration (in seconds) and intensity (in dB). We calculated the mean values of F0 and intensity over each sentence (F0_mean, Int_mean), the slope of the last syllable (F0_final_slope, Int_Final_slope) and the slope of the whole sentence (i.e., the mean value of the last syllable minus the mean value of the first syllable: F0_slope, Int_slope). For syllabic duration, the mean (dur_mean) and the length of the final syllable (final_length) were calculated. Using these parameters as features, separate Principal Component Analyses were carried out to see how well the acoustic parameters distinguish the 16 attitudes (Figure 3).
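The feature definitions above can be made concrete in a short sketch. The per-syllable input format (dicts with `f0`, `int`, `dur`) is hypothetical, and so is the slope definition for the final syllable: the paper does not say how that slope was computed, so the code assumes last frame minus first frame, divided by the syllable duration. Intensity frames are averaged directly in dB, which is also an assumption:

```python
import math
from statistics import mean

def hz_to_semitones(f_hz, ref_hz=1.0):
    """F0 in semitones relative to ref_hz (1 Hz, as in the paper)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def endpoint_slope(frames, duration_s):
    """Assumed slope definition: last frame minus first frame, per second."""
    return (frames[-1] - frames[0]) / duration_s

def prosodic_features(syllables):
    """Per-sentence features, from a hypothetical input format.

    syllables: one dict per syllable with
      'f0'  - F0 frames in Hz (0 marks unvoiced frames, which are dropped),
      'int' - intensity frames in dB,
      'dur' - syllable duration in seconds.
    """
    f0 = [[hz_to_semitones(f) for f in s["f0"] if f > 0] for s in syllables]
    inten = [s["int"] for s in syllables]
    last = syllables[-1]
    return {
        "F0_mean": mean(x for syl in f0 for x in syl),
        "Int_mean": mean(x for syl in inten for x in syl),
        "F0_final_slope": endpoint_slope(f0[-1], last["dur"]),
        "Int_Final_slope": endpoint_slope(inten[-1], last["dur"]),
        # Whole-sentence slope: mean of last syllable minus mean of first syllable.
        "F0_slope": mean(f0[-1]) - mean(f0[0]),
        "Int_slope": mean(inten[-1]) - mean(inten[0]),
        "dur_mean": mean(s["dur"] for s in syllables),
        "final_length": last["dur"],
    }
```

Because the scale is logarithmic, an octave (doubling of F0, e.g. 100 Hz to 200 Hz) is always 12 semitones regardless of the reference, so the 1 Hz reference only shifts absolute values such as F0_mean.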

In the PCA based on the F0 parameters, F0_slope separates the 16 attitudes into two groups: attitudes with a rising F0 contour (EXp, IRR, EXo, DOU, EXn, ADM, OBV) and the others, with a falling F0 contour. F0_final_slope shows that ADM, EXn, DOU, INT and DEC have a rising F0 on the last syllable, whereas OBV, AUT and IDS have a falling F0 on the last syllable. IRR and EXp are characterized by a high F0 mean and a high positive F0 slope. OBV and AUT are distinguished from the other attitudes by a very low, negative F0 final slope.

In the PCA based on intensity, mean intensity singles out some attitudes with very low intensity (ADM, COL, SED, SCO, POL). AUT, IDS and EXp have the highest mean intensity and a positive final slope. The Int_Slope parameter is important for distinguishing IRR (highest positive slope) and SED (lowest, negative slope).

With the duration parameters, IDS, SCO and SAR are separated by a high mean duration. IDS is further distinguished by high values of both mean duration and final syllable length.
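The paper does not state how the PCAs were computed. As a hedged sketch, the code below standardizes each feature (an assumption; the paper does not say whether features were scaled), builds the covariance matrix and extracts the first two principal components by power iteration with deflation. It is written in pure Python so that it runs without external libraries; a linear-algebra library would normally be used instead. Each input row would be one attitude's feature vector, e.g. (F0_mean, F0_slope, F0_final_slope) for the F0-based PCA:

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """z-score each feature column (assumed preprocessing step)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_components(cov, k=2, iters=1000):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric matrix.

    Power iteration with deflation; the arbitrary start vector is assumed
    not to be orthogonal to the eigenvectors sought.
    """
    d = len(cov)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
        comps.append((eigval, v))
        # Deflation: remove the component just found before the next pass.
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

def project(rows, comps):
    """Coordinates of each row (one attitude) on the principal components."""
    return [[sum(a * b for a, b in zip(row, v)) for _, v in comps] for row in rows]
```

Plotting the two `project` coordinates per attitude would give a map comparable in spirit to the panels of Figure 3, although the exact preprocessing used by the authors is unknown.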

B. Comparison of prosodic contours

For all attitudes, the F0 contours were extracted (in semitones, with 1 Hz as the reference value) to examine the similarities and the specific shapes of the intonation contours. Figure 4 shows the F0 contours (in semitones) of the five-syllable sentences for the 16 Vietnamese attitudes. Overall, most attitudes have a duration of 0.8 to 1 s. However, three attitudes, SCO, SAR and IDS, last about twice as long as the others.

Figure 3. First two PCA dimensions for the 16 attitudes, based on F0 (top), intensity (middle) and duration (bottom).

For most attitudes, the F0 curves in the middle of the sentence (from the second to the next-to-last syllable) are nearly identical. The F0 contours of the attitudes differ mostly at the first and last syllables. Research on other languages also shows the informative weight of the first syllable [8]. In Vietnamese, the attitudes AUT, IRR, OBV and EXp have a first syllable with a long duration and a rising F0. Among them, IRR has a level contour on the last syllable, whereas EXp, OBV and AUT have a falling contour on the last syllable.

The F0 contours of DEC, POL and ADM are nearly identical, with a flat shape over all syllables, which may explain why they were confused in the perception test. INT, EXn and DOU have the same last-syllable shape (slightly rising), which may cause some confusion between them.
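The contour comparison in the paper is visual. One hypothetical way to quantify "nearly identical" contours, such as those of DEC, POL and ADM, is to time-normalize each syllable and compute an RMS distance between mean-centred contours. This is not the authors' method: the 10 points per syllable and the mean-centring are arbitrary choices of this sketch. Because each syllable is time-normalized, duration differences (e.g. the long SCO, SAR and IDS sentences) would have to be compared separately:

```python
import math

def resample(frames, n=10):
    """Linearly interpolate one syllable's F0 frames (semitones) to n points (n >= 2)."""
    if len(frames) == 1:
        return frames * n
    out = []
    for k in range(n):
        pos = k * (len(frames) - 1) / (n - 1)
        i = min(int(pos), len(frames) - 2)
        frac = pos - i
        out.append(frames[i] * (1 - frac) + frames[i + 1] * frac)
    return out

def contour_distance(a, b, n=10, center=True):
    """RMS distance between two sentences' F0 contours.

    a, b: lists of per-syllable F0 frame lists (semitones), same syllable count.
    With center=True each contour's mean is removed, so only shape is compared,
    not overall pitch register.
    """
    if len(a) != len(b):
        raise ValueError("contours must have the same number of syllables")
    xa = [x for syl in a for x in resample(syl, n)]
    xb = [x for syl in b for x in resample(syl, n)]
    if center:
        ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
        xa = [x - ma for x in xa]
        xb = [x - mb for x in xb]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(xa, xb)) / len(xa))
```

A flat contour and the same contour shifted up by a few semitones get a distance close to zero, whereas a contour with a rising last syllable gets a clearly positive distance from both.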

According to the perception test, Vietnamese listeners recognized the SAR and SCO attitudes, but with a strong reciprocal confusion. This result can also be explained by the similar shapes of their F0 contours. Both attitudes have a long overall duration, due to a considerable lengthening of their first and last syllables. Their F0 contours rise rapidly from the first syllable and fall after the second syllable. EXp and OBV have a special last-syllable shape, which rises at the beginning but falls rapidly at the end. IDS can also be distinguished from the other attitudes by its longest duration.

IV. DISCUSSION AND CONCLUSIONS

Using a cross-cultural perception test, 16 Vietnamese attitudes were evaluated by native and non-native listeners. The experimental results show no significant effect of either listener gender or sentence length. On the contrary, there are clear differences between the perception of native and non-native listeners. Some attitudes, such as DEC, AUT, IRR, SAR and SED, were well recognized by both Vietnamese and French listeners. One can suppose that the concepts and the expressions of these attitudes are similar in the two languages and the two cultures. Other attitudes (SCO and IDS) are recognized by native listeners but hardly by non-native ones. Such attitudes may be conceptually encoded using different strategies by Vietnamese and French speakers.

The fact that some attitudes were not recognized by Vietnamese listeners, French listeners or both may be explained by the assumption that such attitudes cannot be satisfactorily distinguished from others on the basis of audio information alone, outside any relevant interaction context: listeners may need more information, particularly visual information from the face or from gestures, to distinguish such attitudes. This raises interesting questions for future research on audio-visual perception and the analysis of facial parameters. This is particularly the case for the EXn attitude, which is not recognized by native listeners while non-native listeners do recognize it: the subtle variations of prosody may not be sufficient when they conflict with the meaning of the sentence, a problem that non-native listeners do not have.

The prosodic analysis proposed some reasonable explanations for the recognition and confusion of these 16 attitudes. It also provides some basic characteristics of Vietnamese attitudes, which form the basis of our future work on modeling Vietnamese prosodic attitudes. However, this analysis was limited to three prosodic parameters (F0, intensity and duration). Future work will also address voice quality and visual parameters, in order to provide a more complete description of Vietnamese social affects. Future work will also explore the importance of the tonal system for the production and perception of Vietnamese attitudes, not only by native speakers but also by foreign speakers without any linguistic knowledge of a tonal language: will they be able to separate tonal from attitudinal information?

Figure 4. The F0 contours of the five-syllable sentences for the 16 Vietnamese attitudes.

REFERENCES

[1] K. R. Scherer and H. Ellgring, "Multimodal Expression of Emotion: Affect Programs or Componential Appraisal Patterns?", Emotion, 7(1), pp. 158-171, 2007.

[2] V. Aubergé, "A Gestalt Morphology of Prosody Directed by Functions: the Example of a Step by Step Model Developed at ICP", Speech Prosody, 2002.

[3] F. Danes, "Involvement with language and in language", Journal of Pragmatics, 22, pp. 251-264, 1994.

[4] T. Banziger and K. R. Scherer, "The role of intonation in emotional expressions", Speech Communication, 46(3-4), pp. 252-267, 2005.

[5] P. Delattre, "Les dix intonations de base du français", The French Review, 40(1), pp. 1-14, 1966.

[6] S. Shigeno, "Cultural similarities and differences in the recognition of audio-visual speech stimuli", ICSLP 98, 1998.

[7] K. R. Scherer, R. Banse and H. G. Wallbott, "Emotion inferences from vocal expression correlate across languages and cultures", Journal of Cross-Cultural Psychology, 32(1), pp. 76-92, 2001.

[8] V. Aubergé, T. Grépillat and A. Rilliard, "Can we perceive attitudes before the end of sentences? The gating paradigm for prosodic contours", 5th Eurospeech, 1997.

[9] S. Mozziconacci, "Prosody and Emotion", Speech Prosody, 2002.

[10] M.-L. Diaféria, "Les Attitudes de l'Anglais : Premiers Indices Prosodiques", Master's thesis, INP Grenoble, France, 2002.

[11] C. Gobl and A. Ni Chasaide, "The role of voice quality in communicating emotion, mood and attitude", Speech Communication, 40(1-2), pp. 189-212, 2003.

[12] T. X. Le, "Etude contrastive de l'intonation expressive en français et en vietnamien", PhD thesis in Linguistics and Phonetics, Université Paris 3, 1989.
