
Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns


Document information

Title: Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns
Authors: T. Anh-Thu Nguyễn, C.L. John Ingram, J. Rob Pensalfini
Institution: University of Queensland
Field: Phonetics and Language Acquisition
Type: Journal article
Year of publication: 2008
City: St Lucia
Pages: 33
File size: 637.71 KB


Journal of Phonetics 36 (2008) 158–190

Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns

School of English, Media Studies, and Art History, University of Queensland, St Lucia, Qld 4072, Australia

Received 15 June 2004; received in revised form 7 August 2007; accepted 7 September 2007

Abstract

This paper reports a study of prosodic transfer effects in the production and perception of three English stress patterns (broad-focus noun phrase, narrow-focus noun phrase and compound) at the level of word and phrase prosody by Vietnamese learners of English. The experiments examined the acoustic features and the perceptual strategies that native Australian English speakers and different groups of non-native speakers (Vietnamese beginning learners and advanced speakers of English) use to distinguish the three stress patterns. The results showed that native speakers and non-native speakers differ in their use of acoustic patterns which are optimally suited to their respective first language phonologies for realizing the three English stress patterns. Native speakers of English employed a combination of syntagmatic f0 (and correlated intensity) contrasts and duration in distinguishing the three stress patterns. Vietnamese speakers had no problem in manipulating contrastive levels of f0 and intensity on accent-bearing syllables but failed to realize the timing contrast between compound words and phrases and the syntagmatic contrast of accent in larger units such as polysyllabic words or phrases, as evidenced by their failure to deaccent the second element of the compound and narrow-focus patterns. Nevertheless, the advanced speakers’ ability to compress the constituents of the compounds and to deaccent the final nouns shows the effect of language learning/experience on prosodic acquisition. Possible mechanisms that underlie the transfer effects involved in three stress patterns are also discussed.

Crown Copyright © 2007. Published by Elsevier Ltd. All rights reserved.

1 Introduction

A wealth of studies, based upon loanword formation (Blair & Ingram, 2003; LaCharité & Paradis, 2005; Silverman, 1992, among others) and phonetic and phonological accommodation in second language (L2) learning (Archibald, 1998; Best, 1995; Flege, 1995; Iverson et al., 2003; Kuhl, 1993, among others) have shown that native speakers perceive and produce words and utterances of L2 through a phonetic or phonological ‘filter’ of their native language (L1). Studies of segmental feature transfer effects dominate the literature, and experimental studies of prosodic accommodation to a second language remain scarce, though in recent years they have begun to appear (McGory, 1997; Nguyễn & Ingram, 2005; Ueyama, 2000; Ueyama & Jun, 1998).


Current models of transfer effects (Best, 1995; Flege, 1995; Kuhl, 1993) have been formulated almost entirely upon studies of segmental contrasts. All models acknowledge the importance of prior phonological learning, in the form of L1-imposed categorical boundaries on otherwise gradable phonetic dimensions. But because they limit themselves to segmental transfer effects (where sound categories at a single level of phonological contrast map onto those of another language), such models avoid the more awkward and interesting questions arising from considerations of how phonetic similarities interact with structural differences of the kind that are inevitably encountered in even the simplest cases of prosodic contact phenomena.

This paper reports a study of prosodic transfer effects in the production and perception of three contrastive English stress patterns by Vietnamese learners of English. These contrasts are framed as sets of ‘minimal triplets’, disambiguated in their linguistic function by a preceding contextual phrase such as:

Our original motivation was simply to fill a gap in the literature by undertaking a study of prosodic transfer effects across two languages with sharply different prosodic systems, using a set of stimuli that would involve minimal confounding of segmental transfer effects, and two groups of second language learners (beginners and advanced) that might provide some purchase on the learnability of the relevant phonetic features required to master the target phonological contrasts in perception and production.

From a strictly phonetic perspective, the three English ‘stress’ patterns may be viewed as contrasting patterns of pitch or accentual prominence (right edge prominence for the broad-focus noun phrase [B], left edge prominence for the narrow-focus noun phrase [N] or compound [C]), plus a temporal factor (distinguishing the compound from the phrase). We demonstrate this in the first part of the data analysis by showing that two orthogonal linear discriminant functions, based upon vowel nuclei fundamental frequency (f0) difference measures and normalized word or phrase durations, successfully classify spoken English native-speaker tokens into the three stress groupings. Thus, a linear phonetic feature detector, trained to recognize category boundaries on the relevant pitch and timing dimensions, may be all that is required to discriminate/identify the three target stress types.
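
As a rough illustration of what such a two-parameter detector could look like, the Python sketch below classifies tokens by their nearest class centroid in a two-dimensional space of f0 difference and normalized duration. It is not the discriminant analysis reported in this paper: nearest-centroid classification is swapped in as a simpler linear stand-in, and the feature definitions (semitone difference between the first and second vowel nuclei, duration relative to a speaker mean), the names `TRAIN`, `fit` and `classify`, and all numeric values are hypothetical placeholders rather than measurements from the study.

```python
from statistics import mean, pstdev

# Hypothetical toy tokens, NOT data from the study. Each token is
# (f0 difference in semitones, first nucleus minus second nucleus;
#  item duration divided by an assumed speaker mean duration).
TRAIN = {
    "B": [(-2.0, 1.10), (-1.5, 1.05), (-2.5, 1.15)],  # broad focus: right-edge prominence
    "N": [(4.0, 1.10), (3.5, 1.05), (4.5, 1.15)],     # narrow focus: left-edge prominence
    "C": [(4.0, 0.85), (3.5, 0.80), (4.5, 0.90)],     # compound: left-edge, compressed timing
}


def fit(train):
    """Z-score both cues over all tokens and store one centroid per class."""
    points = [p for tokens in train.values() for p in tokens]
    mu = [mean([p[i] for p in points]) for i in range(2)]
    sd = [pstdev([p[i] for p in points]) for i in range(2)]
    centroids = {
        label: tuple(mean([(p[i] - mu[i]) / sd[i] for p in tokens]) for i in range(2))
        for label, tokens in train.items()
    }
    return mu, sd, centroids


def classify(token, model):
    """Return the label whose centroid is closest to the z-scored token."""
    mu, sd, centroids = model
    z = [(token[i] - mu[i]) / sd[i] for i in range(2)]
    return min(
        centroids,
        key=lambda label: sum((z[i] - centroids[label][i]) ** 2 for i in range(2)),
    )


model = fit(TRAIN)
print(classify((-1.8, 1.08), model))  # falling-to-rising pitch, phrase timing
print(classify((3.8, 0.88), model))   # early pitch peak, compressed timing
```

With these invented values the pitch cue separates B from the other two classes and the timing cue separates C from N, which mirrors the division of labour between the two dimensions described above.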

However, the finding that native speakers of Australian English performed significantly worse in the perceptual experiment than a simple linear discriminator supplied with the critical acoustic measurements of nuclear f0 peak differences and syllable duration gave pause for reflection: a two-parameter feature detector may be seriously flawed as a perceptual model of discrimination between the three stress patterns. Further calling into question the appropriateness of the simple parametric model is the phonological consideration that the broad focus, narrow focus and compound contrasts span two distinct domains of contrastive feature assignment: the lexicon in the case of compounds, and the post-lexical domain of phrasal accent assignment in the case of the broad and narrow focus NPs. Hence the listener’s task in perceptually judging or producing acceptable tokens of the three stress patterns is likely to involve simultaneous access to two autonomous aspects of prosodic competence, concerned with phrase level control of intonational focus marking on the one hand, and control over the lexical prosody that distinguishes compounds as words from their otherwise homophonous phrasal counterparts on the other. It will be argued that this simultaneous dual-level access to prosodic forms complicates the listener’s task in accurately perceiving the three stress patterns. From the perspective of the second language learner, any phonological transfer effects from Vietnamese to English prosody will likely involve considerations of the lexical prosody of Vietnamese compounds and intonation-based accent assignment. Elaboration of a model of the interaction of lexical stress with phrasal accent assignment constitutes a central theme in the attempted integration of autosegmental theory (from phonology) with phonetic investigations of suprasegmental features in speech production (Beckman & Ayers, 1994; Beckman & Pierrehumbert, 1986; Fletcher & Harrington, 2001; among others). Section 2 of this paper reviews the phonetics and phonology of Vietnamese compounds and word prosody, with a view to predicting prosodic interference effects that might be expected from the perspective of an autosegmental model of lexical stress and accent assignment.

Notwithstanding its shortcomings, the two-factor phonetic model, when it is supplemented by a distinction between ‘active’ and ‘non-active’ prosody control parameters derived from L1 (Ueyama, 2000), proves worthy of closer scrutiny for its ability to predict transfer effects in discrimination and production of the three stress patterns for Vietnamese learners of English. The argument strategy of this paper is to push the predictions of an admittedly overly simple phonetic model of prosodic interference effects, to reveal how well it succeeds and where it fails to predict the perceptual responses and production characteristics of learners, as well as the perceptual responses of native listeners. Our analysis of the experimental findings emphasizes the linguistic information processing demands imposed by the perception and production tasks and the strategies that native and non-native speaker–listeners probably adopted to meet these task demands.

The organization of the paper is as follows: Section 2 begins with a review of recent phonetic studies of prosodic transfer effects, focussing on contact situations between languages that differ in terms of the familiar typology of ‘tone’, ‘stress’, and ‘pitch accent’ languages. Relevant background information on Vietnamese word and phrase level prosody is then discussed; specifically, (a) the lexical tonal system of Vietnamese and tonal transfer effects in the production and perception of English word stress and (b) a comparative phonetic analysis of the compound–phrasal stress contrast in English and Vietnamese.

Section 3 describes the experiments that were conducted. Section 3.1 describes the acoustic properties of the stimulus materials that were subsequently used in the perceptual experiment and as training material for the production experiment. We first show that the broad focus (B), narrow focus (N) and compound (C) categories may be discriminated using just two critical acoustic parameters (f0 and normalized duration), among a range of acoustic correlates of stress that were tested. Next (Section 3.2), we investigate how well the critical f0 and timing parameters are preserved in the non-native speakers’ productions of the target English stress patterns, elicited as appropriate continuations of the context sentence cue. Discriminant analysis and other statistical tests indicate that Vietnamese subjects primarily respond to pitch cues, but that some acquired sensitivity to the durational contrast between compound and phrasal stress is evident from the responses of the advanced learner group. In Section 3.3 a perceptual discrimination experiment is reported. Native Australian English listeners as well as Vietnamese learners were given the task of identifying the appropriate stress pattern (presented as auditory stimuli on the carrier phrase: It’s a _.) for a given context.

Section 4 addresses apparent discrepancies in the results of the production and perception tests in terms of differential task demands imposed on native Australian English listeners and Vietnamese learners. We argue that beginning learners’ responses are dominated by a tonal transfer strategy in which Vietnamese lexical tones are substituted for English accentual and boundary tones on the basis of tone ‘shape’ similarity, constrained by Vietnamese phonotactics on the distribution of lexical tones. In contrast, the advanced learner group evinces some accommodation to the temporal cues that contrast (compound) word and phrase phonology in English and shows evidence of de-accenting the rightmost components of the narrow-focus phrases under conditions of contrastive focus at the level of phrasal prosody. The case for these distinct learner response strategies is strengthened by a qualitative ToBI style analysis of pitch accent types and their distribution in the native English and learner tokens. Some tentative conclusions are then offered in Section 5.

2 Prosodic transfer effects

A few studies that closely examine the phonetic properties of L2 prosody production by learners from various language backgrounds have shown how L1 phonology constrains the production and perception of L2 prosodic patterns. Willems (1982) investigated intonational deviations in the English produced by native speakers of Dutch. He found that L2 productions of English deviate from the native British English norm mainly in the size and direction of pitch movements. This could be clearly attributed to the transfer of intonational characteristics from Dutch, which was examined through an instrumental comparison of the production of English utterances by monolingual English speakers and Dutch learners of English, and the production of comparable Dutch utterances by functionally monolingual Dutch speakers.


A similar transfer effect from L1 intonation was found for Seoul Korean and Mandarin Chinese speakers of English. McGory (1997) investigated the production of American English word pairs differing in the location of stress (e.g., memorizes vs memorial) in statements and questions and in several focus conditions by Seoul Korean and Mandarin Chinese speakers. Both groups of non-native speakers appeared to have difficulties producing native English prominence relations: where native English speakers produced pitch accents in prominent target words only, non-native speakers produced stressed syllables with higher f0 values in both prominent and less prominent words. In addition, the non-native speakers did not distinguish between statements and questions in their f0 patterns. The differences in intonation patterns between non-native and native speakers of English could (to a large extent) be attributed to influences of the L1, which was clearly shown in the different error patterns produced by non-native speakers of two different L1 backgrounds: Mandarin and Korean speakers.

In a similar study, Ueyama and Jun (1998) examined the realization of English post-focus deaccentuation produced by Tokyo Japanese speakers and Seoul Korean speakers at different proficiency levels. They found that post-focus deaccentuation was easier for Japanese than for Korean speakers at the same level of proficiency, which was attributed to an L1 transfer effect. It appears that Japanese downstep (i.e., the pitch accent of an accented word following another accented word has a lower f0 peak relative to the preceding pitch accent) was being positively transferred to L2 production and the lack of downstep in Korean (in which phrases are marked by a phrase-final H tone) was negatively transferred to L2 production. They also found that the degree of deaccentuation correlated with proficiency levels: the more fluent the speaker, the greater the degree of dephrasing.

The above studies are restricted to the deviation of f0 patterns (tonal shapes) in L2 intonation production. Studies that have jointly examined transfer effects in the acoustic correlates of adaptation to L2 temporal and accentual structure are even rarer, but are now beginning to appear (see Mennen, 2004). In a study on the production of accent peak alignment by Dutch non-native speakers of Greek, Mennen (2004) found a bi-directional interference in the realization of an accent contrast common to both languages. The majority of the L2 learners (four out of five) in her study failed to produce native-like f0 peak alignment values in the L2. They produced the peak as early as that of the native Dutch control group (i.e., within the accented vowel) in statements with long vowels in the accented syllable of the test word, and considerably earlier than that of the native Greek control group, who realized the peak in the following unaccented vowel. Ueyama (2000) and Nguyễn and Ingram (2005) are two recent studies that investigate the acoustic correlates of L2 word-level prosody. In a study on Japanese learners’ production of English words, Ueyama (2000) found that the active role of f0 in the L1 Japanese pitch accent system positively transfers to English, as indicated by a consistently higher f0 in accented than in unaccented syllables. By contrast, even though Japanese has a phonemic vowel length contrast, beginning Japanese speakers of English were less successful in realizing accented vs non-accented syllable duration contrasts than advanced speakers, because L1 Japanese word accent production is restricted to f0 contrasts while duration is not actively manipulated. In a companion study to the present paper, using the same subjects as this study (Vietnamese speakers of English at two different levels of proficiency) but investigating a different L2 prosodic contrast, the production of English word stress contrasts in segmentally homophonous noun/verb pairs (e.g., permit[n] vs permit[v]), Nguyễn and Ingram (2005) found that Vietnamese learners can differentiate between stressed and unstressed syllables in English by means of f0 contrast, an acoustic correlate available in both languages. However, in the early stage of second language learning they fail to produce a syllable duration contrast that characterizes native productions and fail to reduce vowels in unstressed syllables, possibly because these two important phonetic features are not active in Vietnamese tonal contrasts.

In brief, both studies that examined acoustic correlates of L2 prosody suggest that L2 learners will have less difficulty realizing an acoustic correlate that is actively used for prosodic contrasts in both native and target languages (e.g., f0 in Japanese: pitch accent, f0 in Vietnamese: lexical tone vs f0 in English: word stress and accent) than those that are not active in L1 (e.g., stress-induced duration contrasts and vowel reduction in Japanese and Vietnamese). Nevertheless, these two studies are restricted to the acoustic correlates of L2 word-level prosody. The present study takes a further step in investigating the adaptation to contrasting temporal and accentual structure of (compound) word vs phrase prosody using both quantitative acoustic measurement and qualitative tonal transcription (ToBI: Beckman & Ayers, 1994).


2.1 Comparative analysis of English word stress vs Vietnamese tones

Vietnamese (a tone language) and English (a stress accent language) have quite different systems of word prosody. English has a system of culminative word stress with predominantly short stressed word roots and reduced suffixes, and thus the majority of words have stressed first syllables (Dauer, 1983; Garde, 1965). Also, stressed syllables have more complex structures, whereas unstressed syllables often have reduced vowels. Vietnamese, on the other hand, has a system of lexically distinctive tones (Nguyễn, 1970, 1980) and is strongly syllabic in its phonological organization and morphology. Most syllables are independent morphemes and every syllable in an utterance bears an independent lexical tone specification which is not neutralized (does not become toneless) in context. In addition, no system of culminative word stress has been found; nevertheless, it is widely accepted that there is stress in the sense of accentual prominence at the phrasal level (Nguyễn, 1970; Thompson, 1987).

English and Vietnamese also differ in terms of how they manipulate the acoustic correlates at word-level prosody. Studies on the acoustic correlates of English stress show that judgements of linguistically significant stress in English are contingent upon at least four acoustic parameters: fundamental frequency, duration, amplitude, and vowel quality (Beckman, 1986; Fry, 1955; among others). On the other hand, in Vietnamese, in addition to direction of f0 movement (tone contour) and f0 height (the two primary dimensions of linguistic tone), voice quality, intensity and duration have also been found to distinguish tones (Nguyễn & Edmondson, 1997; Phạm, 2003; Vũ, 1981, 1982). Voice quality, particularly the laryngeal features of creakiness and breathiness, is found to accompany some particular tones across dialects. Creakiness, in addition to occurring as a regular feature on the Broken (ngã) and Drop (nặng) tones of the Northern dialect and the Curve (hỏi) tone of the Central dialect, also occurs on some local variants of the Southern Drop tone (Vũ, 1981). Creakiness and breathiness are found to accompany Falling (huyền), Drop, Curve and Broken tones of the Hanoi dialect and claimed to be a distinctive register feature, distinguishing low register tones from high register tones (Phạm, 2003). Intensity was found to highly correlate with f0 (Vũ, 1981) and thus can be said to be supplementary to f0. Duration, or more particularly tonal length, has been found not to be a distinctive feature in Vietnamese (Phạm, 2003; Vũ, 1981) but only varies in segmental contexts (i.e., tones in stop-final syllables are inherently shorter than tones in other environments). From a study on native speakers’ perception of Vietnamese tones, Vũ (1981) came to the conclusion that the direction of f0 movement, f0 height and voice quality play a more important role than other tonal dimensions, such as duration and intensity, in the identification of tones. Intensity and duration supportively contribute to perception but play no independent role in tone recognition.

The aforementioned studies show that even though both languages employ f0 as a perceptual cue (to tones in Vietnamese and word stress and accent in English), the two languages differ in terms of the manipulation of the acoustic cues. Evidence on the transfer of tonal pitch features into Vietnamese learners’ English production and perception has been observed (Hồ, 1997; Nguyễn, 1970, 1980, 2003; Pittam & Ingram, 1991; Riney, 1988). For example, at the production level, Nguyễn (1970) noted that Vietnamese speakers of English tend to substitute the high rising (sắc) tone for primary stress, resulting in exaggerated pitch changes on stressed syllables. Pittam and Ingram (1991) and Riney (1988) observed tonal effects provoked by English words with syllables closed by an obstruent, which were produced with a checked quality of the Rising (sắc) tone and easily identified by its abrupt high rise (Nguyễn & Ingram, 2004). In a recent study on Vietnamese perception of English polysyllabic words, Nguyễn (2003) found that an English syllable could be perceived as a certain Vietnamese tone depending on the syllable structure (a closed syllable ending in an obstruent or a syllable ending in a sonorant) and stress levels (stressed and unstressed), namely stressed syllables associated with high level tones and unstressed syllables with low level tones, which suggests that there is perceptual tonal transfer which is constrained by relative pitch levels and the segmental composition of the syllables. The results of these studies suggest that Vietnamese learners make reference to pitch in tone in the perception of English stress; in other words, they seem to interpret the intonation patterns on English words and phrases in terms of their native language tone categories.

However, the fact that word stress is ‘culminative’ in English in the sense that every content word or larger domain has exactly one primary-stressed syllable, and whatever syllables remain are subordinate to it (Trubetskoy, 1939/1969), potentially has far-reaching implications for gestural timing and rhythmic differences between the two languages, and for how such differences may be modelled phonologically. In English, stress contrasts are enhanced segmentally: stressed syllables are longer than unstressed syllables (i.e., duration is a distinctive and active correlate in word stress production) and unstressed vowels tend to be reduced. In contrast, in Vietnamese, generally considered a syllable-timed language (Nguyễn, 1970, 1980) in which each syllable has a lexical tone specification, no systematic difference in duration or vowel quality among syllables has been found. Production-wise, one of the five or six lexical tones, namely tone nặng (dropping), is much shorter than all other tones (Brunelle, 2003; Phạm, 2003). However, tonal length was found to have no distinctive status in Vietnamese (Phạm, 2003; Vũ, 1981), suggesting that duration is not an active cue in tonal contrast. From this comparative analysis, it is predicted that Vietnamese learners of English will be able to produce f0 contrast, an acoustic correlate available in both languages, but will have problems using duration to enhance prominence contrast, because they make limited use of it in their L1, and will fail to reduce unstressed syllables.
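
The two cues contrasted in this prediction can be made concrete with a short Python sketch. It assumes, purely for illustration, that the f0 contrast between two syllables is expressed in semitones and that the duration contrast is expressed as a simple ratio; the function names and the example values are hypothetical and are not taken from the study.

```python
from math import log2


def f0_contrast_semitones(f0_stressed_hz, f0_unstressed_hz):
    """Pitch interval between two syllables in semitones (12 per octave)."""
    return 12 * log2(f0_stressed_hz / f0_unstressed_hz)


def duration_contrast_ratio(dur_stressed_ms, dur_unstressed_ms):
    """How many times longer the stressed syllable is than the unstressed one."""
    return dur_stressed_ms / dur_unstressed_ms


# Hypothetical token: a clear pitch contrast but no duration contrast,
# the profile predicted above for beginning learners.
print(f0_contrast_semitones(200.0, 100.0))    # one octave -> 12.0
print(duration_contrast_ratio(150.0, 150.0))  # equal durations -> 1.0
```

A ratio near 1.0 alongside a large semitone value would correspond to a speaker who marks prominence with pitch alone.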

2.2 Comparative analysis of compound—phrasal accent contrasts in English and Vietnamese

The three stress patterns of interest in this study (illustrated in examples (a)–(c) below) exemplify three types of prominence:

(a) blackberry = bláckberry (compound, meaning: a kind of fruit)

(b) black berry = bláck bérry (broad-focus noun phrase, meaning: a berry that is black)

(c) black berry = bláck berry (narrow-focus noun phrase, with an emphatic contrastive accent on black, as opposed to green berry)

The compound bláckberry as a single three-syllable word has a primary word stress on black and secondary stress on berry. The broad-focus noun phrase bláck bérry consists of two accented constituents: a phrasal stress (or default accent assignment) on ber-, the first syllable of berry, with a pre-nuclear accent on black (Farnetani & Cosi, 1988; Hardcastle, 1968). In the narrow-focus noun phrase bláck berry the syllable black receives an emphatic or contrastive stress.

There has been controversy over the pragmatic functions, the phonological structures and the phonetic cues associated with these three “stress” patterns. Firstly, there is the question of the nature of the distinction between broad and narrow focus, which is commonly described in terms of scope: whether the listeners’ attention is drawn to new information that has scope over the whole phrase (broad focus) or only to the element within the phrase which contains new information (narrow scope). Narrow scope is often considered to convey the specialized communicative function of countermanding some erroneous assumption that the speaker believes the listener to hold. In this usage, narrow scope is identified as “contrastive stress”. There is ongoing debate as to whether contrastive stress is a distinct type of prosodic effect or whether it should be simply treated as a case of accentual prominence. Indeed, some maintain that contrastive accents are formally different from other accents, either because the type of accent is different for the contrastive cases or because they are more prominent. Couper-Kuhlen (1984) and Chafe (1974) mention the existence of a sudden drop in f0 after the contrastive accent, whereas a non-contrastive accent is more likely to be sustained.

Pierrehumbert and Hirschberg (1990) suggested that contrastive accents have an L+H* pattern while novelty accents have an H* form. Ladd and Morton (1997) have shown (for standard southern British English) that the “emphatic” peak accent type has a higher, later peak than the “normal” peak accent type, which is consistent with Pierrehumbert and Hirschberg’s (1990) L+H* and H* contours. Bartels and Kingston (1994) argue that what distinguishes narrow focus is enhanced prominence on the focused element.

In English, broad and late narrow focus have identical accent patterns: pitch accents are aligned with the last accentable constituents within an intonation contour. Under early narrow focus the pitch accent is claimed to shift to an earlier location and the last accentable syllable is deaccented (Beckman & Pierrehumbert, 1986; Jackendoff, 1972; Ladd, 1980). Nevertheless, in a study on the production and perception of narrow-focus patterns (e.g., RED ball vs red BALL) by 42 American English children (age 3–10) and 6 adults, Jannedy (1997) found that children as well as adults accent the noun regardless of whether the adjective or the noun is contrasted. However, there appears to be a strong tendency to use non- or less prominent accent types on the noun when the adjective is narrowly focussed. The perception results showed that adult listeners can reliably interpret the pragmatic information of an early narrow focus regardless of whether the following noun is deaccented or not. The results of this study suggested that deaccenting the noun does not involve taking the accent completely away but using less prominent accent types over more prominent ones and that children have to learn to use less prominent accent types over prominent ones.

Secondly, although the phonological status of the contrast between compound and phrasal stress is not in question for prototypical cases such as blackberry vs black berry, there is some doubt as to whether the prosodic pattern of the compound form is reliably distinguishable from its phrasal counterpart both in terms of production and perception by native listeners. In fact, Atkinson-King (1973) and Vogel and Raimy (2002) investigated the acquisition of compound vs phrasal stress (hót dog vs hot dóg) in English by children aged 5, 7, 9 and 11. The subjects were shown pairs of pictures representing a compound word and the corresponding phrase. They heard a prerecorded tape with the names of the items, and were asked to indicate which one they heard. The results of both studies (Vogel and Raimy (2002) replicated Atkinson-King’s (1973) study) showed that even though the youngest children produced the right pattern, they did not parse the pattern in perception until as late as 11 or 12 years of age.

Despite the vast body of work on the phonology and the phonetics of English stress, the interface between word and phrasal prosody as exemplified by the acoustic correlates and perception of these three stress patterns has not been adequately investigated. Hardcastle (1968) examined the f0 and intensity changes between the accented syllables of each stress pattern in Australian English and found that the f0 and intensity changes were clearly greatest for the syllables carrying the emphatic stress pattern (narrow focus). He also found the f0 changes associated with the narrow focus closely resembled those associated with the compound stress pattern. A sharp upward f0 movement at the beginning of the second element was found in the broad-focus noun phrases, which did not occur in items associated with the emphatic or compound stress pattern. Surprisingly perhaps, Hardcastle’s perception experiment showed that a significant majority of listeners had difficulty in reliably distinguishing the three stress patterns. Narrow-focus phrasal stress was often confused with the compound stress. This he attributed to the similar f0 and intensity changes associated with these two stress patterns.

Some other studies have focussed on the investigation of the acoustic and perceptual correlates of two patterns only: compounds and broad-focus noun phrases. Bolinger and Gerstman (1957) found that in three-constituent pairs like lighthouse keeper vs light housekeeper, the temporal interval between the constituents was an efficient cue for distinguishing these pairs. But there has been no evidence of temporal interval as a distinctive cue in simpler pair constructions, like lighthouse and light house. Faure, Hirst, and Chafcouloff (1980) investigated two-constituent minimal pairs blackbird and black bird, finding that the two constructions differed significantly in duration; the total duration of phrases was 20% longer than the total duration of compounds. Their data suggested that, while both duration and fundamental frequency are important features in the production of compounds and phrases, pitch, but not temporal structure, is crucial for their perception. Farnetani and Cosi (1988) found that while duration is the major differentiating parameter in production (compounds are shorter in comparison to phrases), the perceptual distinction lies primarily in the different prominence pattern: a sequence of an accented constituent followed by an unaccented one in compounds and of two accented constituents (the second heard as stronger than the first) in non-compounds.
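
The duration difference reported by Faure, Hirst, and Chafcouloff (1980) can be expressed as a relative lengthening of the phrase over the compound. The Python sketch below shows that arithmetic; the function name and the millisecond values are hypothetical and chosen only so that the result matches the 20% figure cited above.

```python
def percent_longer(phrase_ms, compound_ms):
    """Relative lengthening of the phrase over the compound, in percent."""
    return 100 * (phrase_ms - compound_ms) / compound_ms


# Hypothetical durations: a 500 ms compound and a 600 ms phrase.
print(percent_longer(600, 500))  # 20.0
```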

In Vietnamese, it seems that compounds and noun phrases are syntactically and semantically contrastive but not phonologically contrastive under normal circumstances of production. In terms of syntactic structure, Vietnamese compounds and phrases have a reversed word order from English (Vietnamese: Noun+Adjective: hoa [flower] hồng [pink] vs. English: Adjective+Noun: black berry). A subtype of Vietnamese compounds (e.g., specializing compounds vs. phrases) is claimed to have a reverse prosodic pattern of prominence from the English compound-phrase pattern; that is, weak–strong for compounds and strong–weak for noun phrases (Thompson, 1987). However, no conclusive acoustic evidence has been found to support this compound-phrasal prominence pattern (Nguyễn & Ingram, 2007). In addition, it is generally claimed that there is no tonal neutralization due to "sandhi" in Vietnamese (except in a subclass of reduplication), a phenomenon that occurs in other tone languages like Chinese and Thai (Chen, 2000; Gandour, 1974); that is, there is no toneless syllable such as in Shanghainese and the other Wu dialects, or a systematic change of tone when words occur in combination such as in Mandarin Chinese (Chen, 2000). No systematic prosodic difference between compounds and phrases has been found in Vietnamese. Instead, every syllable in both a compound word and a phrase has a full vowel and a lexical tone specification.

In a recent experiment, Nguyễn and Ingram (2007) investigated the acoustic and perceptual correlates that distinguish compounds (hoa hồng: a rose) from phrases (hoa hồng: a pink flower) in Vietnamese under two experimental conditions: one with a picture-naming task (representing spontaneous natural speech) and one with a minimal pair sentence task (the 'maximally contrastive' elicitation condition), with 45 Vietnamese native speakers of three dialects (Hanoi, Hue, and Saigon). It was found that even under conditions of maximal contrast, there was no conclusive acoustic evidence (in terms of f0 [Hz], intensity [dB], spectral tilt and duration) to support the claim of contrastive stress patterns between compounds and noun phrases in Vietnamese. If forced to realize a prosodic contrast under elicitation conditions of 'maximal contrast', Vietnamese speakers produced a juncture between the two constituents of the noun phrases, but only under this condition, and no juncture was present between components of compounds. Compound words as a whole were not temporally compressed in comparison to their phrasal counterparts, as they are in English, a stress language with stress- or foot-based timing. Listeners relied only on the juncture between the two components of noun phrases as a cue to distinguish between phrases and compounds, and failed to distinguish noun phrases from compounds in stimuli elicited under the picture-naming task, where no juncture was produced between the two constituents of a phrase.

With regard to phonetic cues of contrastive or corrective accentual focus in Vietnamese, some authors, such as Hoàng and Hoàng (1975) or Gsell (1980), consider that full tonal realization of accented syllables is one of the positive marks of prominence (accent) at phrasal level in Vietnamese. In a recent study on the effect of emphasis on glottalized and non-glottalized Vietnamese tones (the Hanoi creaky falling tone, i.e. the nặng tone, in obstruent vs. sonorant final consonant environment respectively), Michaud and Vũ (2004) found that in Vietnamese emphasis, syllable lengthening appears as a speaker-dependent variable, whereas a stable correlate of emphasis is curve amplification, manifested as increased slope of the f0 curve or as f0 register raising.

3 Three experiments

3.1 Preliminary experiment

The aims of the first experiment were: (a) to construct a set of native speaker exemplar stimuli that could be used in perception and production experiments to test the mastery of the three-way pattern of prosodic contrasts between compound words and broad and narrow-focus noun phrases by Vietnamese learners of English, and (b) to establish the effectiveness of the accentual pitch and timing cues discussed earlier for discriminating among the three Australian English stress patterns.

3.1.1 Linguistic materials

Three sets of minimally contrastive triplets of compound words (C), broad-focus noun phrases (B), and narrow-focus noun phrases (N) were constructed, using three syllabic templates: monosyllabic first element plus disyllabic second element (e.g. black berry); disyllabic first element plus monosyllabic second element (e.g. butter fish); disyllabic first and second element (e.g. English teacher). There were four tokens for each syllable type, yielding 12 sets of triplets or 36 items for each speaker. Each item was made up of a short context sentence, followed by a fixed carrier sentence (This/it/he is a ...) which ensured that target contrasts appeared as the final elements in each sentence and were in approximately the same position in the sentence intonation contour (see item example below and Appendices A and B for a complete list of stimuli).
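The item counts quoted above follow directly from the design. As a quick arithmetic check, here is a minimal Python sketch; the variable names are illustrative only and do not come from the original study.

```python
# Stimulus design as described in the text.
# All names are illustrative; they are not taken from the original study.
templates = ["1+2 syllables", "2+1 syllables", "2+2 syllables"]
tokens_per_template = 4
patterns = ["C", "B", "N"]  # Compound, Broad focus, Narrow focus

# Each token of each template yields one minimal triplet.
triplets = len(templates) * tokens_per_template
# Each triplet contributes one item per stress pattern.
items_per_speaker = triplets * len(patterns)

print(triplets, items_per_speaker)
```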

3.1.2 The speakers

Four native speakers of Australian English, experienced in producing good quality exemplars for phonetic experiments, were used: two adult males (J.I. and R.P.) and two females (E.C. and F.H.). J.I. is a phonetician and R.P. a linguist and actor with professional voice training (both are co-authors of this paper). E.C. and F.H. are speech pathologists with extensive clinical and teaching experience. They were presented with a randomized list of the test triplets with their preceding context sentences and instructed to read each item (context+target sentence) in a natural speaking manner.

3.1.3 Measurements

The sentences were digitized (at 20 kHz sampling rate and 16 bit precision) and spectrographic measurements were made via a sound editing and analysis program, the Emu Speech Tools (Cassidy, 1999). First, the Emu Labeller was used to mark the edges of the target syllables and vowels, relying primarily on the spectrographic display in the Labeller. Then the Emu Query Tool was used to extract syllable durations (ms) and f0 (Hz) and intensity (dB) values at vowel midpoint.

The segmentation criteria were generally based on the major discontinuities of the energy distribution over frequency and time visible on the spectrograms. Taking butter fish as an example, the syllable bu- was measured from the onset of the closure for [b] to the cessation of the vowel formants; the syllable -ter from the onset of closure for [t] to the onset of fricative noise for [f]; the syllable fish from the onset of fricative noise for [f] to the offset of high-frequency fricative noise for [ʃ]. Since all the stops of the test items appear utterance-medially, the onset of closure for the stops at the start of the syllable was taken from the offset/cessation of the preceding word/segment.

Studies of the effects of stress and accent on duration in English have shown that not only the rhymes but also the initial consonants are lengthened relative to their counterparts in unstressed syllables (Ingrisano & Weismer, 1979; Umeda, 1977; among others). Therefore, in this experiment, the duration of the whole syllable, including the onset and the rhyme, was measured.

f0 and intensity measurements were taken at the center of the vowels of the stressed syllables, which were extracted automatically by using an EMU-R query command on the basis of the labelled vowel onset and offset.
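The midpoint sampling described above can be illustrated with a small Python sketch. This is not the EMU-R query used in the study; it only assumes that a vowel has a labelled onset and offset (in ms) and that the f0 or intensity track is a time-sorted list of (time, value) frames. The function names and the sample track are hypothetical.

```python
from bisect import bisect_left

def midpoint(onset_ms, offset_ms):
    """Temporal center of a labelled vowel, in ms."""
    return (onset_ms + offset_ms) / 2.0

def value_at(track, t_ms):
    """Return the value of the frame closest in time to t_ms.

    `track` is a list of (time_ms, value) pairs sorted by time.
    Ties between two equally distant frames go to the earlier frame.
    """
    times = [t for t, _ in track]
    i = bisect_left(times, t_ms)
    if i == 0:
        return track[0][1]
    if i == len(track):
        return track[-1][1]
    before, after = track[i - 1], track[i]
    return before[1] if t_ms - before[0] <= after[0] - t_ms else after[1]

# Made-up f0 track (Hz) sampled every 5 ms, and made-up vowel labels.
f0_track = [(100, 182.0), (105, 185.0), (110, 190.0), (115, 193.0), (120, 191.0)]
vowel_onset, vowel_offset = 102, 121

t_mid = midpoint(vowel_onset, vowel_offset)
f0_mid = value_at(f0_track, t_mid)
print(t_mid, f0_mid)
```

The same lookup would apply unchanged to an intensity track in dB.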

3.1.4 Analysis

The acoustic analysis concerns fundamental frequency (f0), duration and intensity of the constituents of test items. The following acoustic parameters were investigated:

1. mid-vowel f0 value of the first and second stressed syllable (e.g., English teacher: V1F0, V2F0),

2. mid-vowel intensity value of the first and second stressed syllable (V1 and V2 intensity),

3. f0 change (V1F0–V2F0),

4. intensity change (V1 intensity–V2 intensity),

5. duration of the constituent syllables (e.g., English teacher: S1 and S2 for the underlined accent-bearing syllables and U1 and U2 for the italic unstressed syllables),

6. duration of the whole compound words or noun phrases (blackberry, English teacher).
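The derived parameters in the list above (items 3, 4 and 6) are simple functions of the raw measurements. A minimal sketch in Python follows; the `Token` class and the sample values are hypothetical, and summing the syllable durations to obtain the whole-word duration is an assumption that holds only when the syllables are contiguous with no intervening pause.

```python
from dataclasses import dataclass

@dataclass
class Token:
    v1_f0: float       # mid-vowel f0 of first stressed syllable (Hz)
    v2_f0: float       # mid-vowel f0 of second stressed syllable (Hz)
    v1_int: float      # mid-vowel intensity of first stressed syllable (dB)
    v2_int: float      # mid-vowel intensity of second stressed syllable (dB)
    syll_durs: dict    # syllable durations in ms, keyed by S1, U1, S2, U2

    def f0_change(self):
        """Parameter 3: V1F0 minus V2F0."""
        return self.v1_f0 - self.v2_f0

    def intensity_change(self):
        """Parameter 4: V1 intensity minus V2 intensity."""
        return self.v1_int - self.v2_int

    def total_duration(self):
        """Parameter 6, assuming contiguous syllables (no pause)."""
        return sum(self.syll_durs.values())

# Made-up values for a four-syllable item such as "English teacher".
tok = Token(210.0, 150.0, 72.0, 65.0,
            {"S1": 180.0, "U1": 90.0, "S2": 200.0, "U2": 130.0})
print(tok.f0_change(), tok.intensity_change(), tok.total_duration())
```

A positive f0 change means the first stressed syllable is higher than the second, which is the left-headed configuration discussed later for the narrow-focus and compound patterns.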

In order to control for segment compositional effects in duration measurements, intrinsic and context-dependent f0 effects, and individual speaker differences, all measurements were analyzed as pair-wise comparisons within items (i = 1, 6) and speakers (j = 1, 4), and a mixed model ANOVA was used. The mixed model two-way ANOVA, with stress patterns (Compound, Broad, and Narrow) and speaker groups (native, advanced, beginner) as fixed effects and speakers and items as random effects, was conducted on each acoustic parameter. The restricted maximum likelihood (REML) method was used to estimate variance components. A Tukey post hoc test (with the criterion p-value set at 0.05) was then conducted to determine the significant differences among levels of the main fixed factors and their interaction effects (i.e., the pair-wise comparison among the three stress patterns within each speaker group).
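The idea of comparing stress patterns within the same item and speaker can be sketched as follows. This Python fragment covers only the within-cell differencing step; it does not implement the REML mixed-model ANOVA or the Tukey test, and the function name and sample values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_differences(rows):
    """Difference scores between stress patterns within each cell.

    `rows` is an iterable of (speaker, item, pattern, value) tuples.
    Returns {(speaker, item): {(p, q): value_p - value_q}} for every
    pair of patterns present in that speaker-by-item cell, with the
    pattern labels of each pair in alphabetical order.
    """
    cells = defaultdict(dict)
    for speaker, item, pattern, value in rows:
        cells[(speaker, item)][pattern] = value
    return {
        key: {(p, q): by_pattern[p] - by_pattern[q]
              for p, q in combinations(sorted(by_pattern), 2)}
        for key, by_pattern in cells.items()
    }

# Made-up f0 change values (Hz) for one item and two speakers.
rows = [
    ("S1", "blackberry", "N", 70.0),
    ("S1", "blackberry", "C", 45.0),
    ("S1", "blackberry", "B", 10.0),
    ("S2", "blackberry", "N", 55.0),
    ("S2", "blackberry", "C", 30.0),
    ("S2", "blackberry", "B", 5.0),
]
diffs = pairwise_differences(rows)
print(diffs[("S1", "blackberry")])
```

Because each difference is taken inside one speaker-by-item cell, segmental make-up and speaker pitch range are held constant, which is the motivation given in the text.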


Guided by the ANOVA results, two of the acoustic measures (f0 change and duration of the whole words or noun phrases) were selected as a basis for classification of the stimulus tokens produced by the four native speakers. A scatter plot diagram of the test items on these two parameters, supported by a discriminant function analysis, provided an acoustic basis for distinguishing the three target stress patterns.

It is also worth noting that the preliminary analysis on all nine acoustic parameters showed no systematic significant difference among the three syllabic templates (e.g. black berry, butter fish and English teacher); therefore, the syllabic template as a factor was excluded from further analysis. However, in some four-syllable compounds (e.g., English teacher), the second accent-bearing syllables, in spite of having less prominent f0, were not totally deaccented; these will be discussed separately in the qualitative analysis of the f0 contour section.

3.1.5 Results: classification of native speaker productions

A detailed report of individual parameters of the native English productions is presented along with the Vietnamese learners' productions in Section 3.2. Here we merely present the results of the discriminant function analysis, which established that the three English contrastive stress patterns may be successfully discriminated on the basis of speaker-normalized peak f0 measurements and rate-normalized duration differences between compound words and phrases.

In order to examine whether stress patterns produced by native speakers are classifiable on the basis of f0 and duration cues, the f0 change measure (V1F0–V2F0, defined previously) and a normalized word duration measure were fed into a linear discriminant analysis (S-Plus 2000™) to partition the stimulus items into three non-overlapping groups in acoustic space. The results are shown in a scatter plot (Fig. 1).

The normalized duration measure used was a modified Z-score, derived as follows. Measurements of the duration of the whole compound word or noun phrase were expressed as a difference score for each item from the mean of its minimal set (thus normalizing for intrinsic phone value and speaker differences):

Z-score = (duration value − mean duration value) / standard deviation

Then, the Z-score was converted to a t-score (t-score = Z-score × 10 + 50) in order to yield a whole and positive number.
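The two-step normalization can be written out as a short Python sketch. The text does not say over which set of values the standard deviation was computed, so using the sample standard deviation of the minimal set here is an assumption, as are the function name and the example durations.

```python
from statistics import mean, stdev

def t_scores(durations):
    """Convert the durations of one minimal set to t-scores.

    Z = (duration - mean) / sd, then t = Z * 10 + 50.
    Assumption: sd is the sample standard deviation of the same set.
    """
    m = mean(durations)
    s = stdev(durations)
    return [(d - m) / s * 10 + 50 for d in durations]

# Made-up whole-word durations (ms) for one triplet: C, N, B.
scores = t_scores([500.0, 600.0, 700.0])
print(scores)
```

A token with exactly the mean duration of its set receives a t-score of 50, shorter tokens fall below 50 and longer ones above it.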

The scatter plot of native speakers' items (Fig. 1) shows that all but 12 of the 144 experimental stimuli (i.e., 92% of tokens) were correctly classified into their Broad, Narrow and Compound groupings on the basis of the f0 change measure and the normalized 'word' duration scores.
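For readers unfamiliar with linear discriminant analysis, the following is a minimal two-feature sketch in plain Python. It is not the S-Plus routine used in the study: it assumes a pooled within-class covariance matrix and priors proportional to class size, and all data points are invented for illustration.

```python
from math import log

def fit_lda(samples):
    """Fit a 2-D linear discriminant model.

    `samples` is a list of (label, (x, y)) pairs. Returns class means,
    the inverse of the pooled within-class covariance, and class priors.
    """
    by_class = {}
    for label, point in samples:
        by_class.setdefault(label, []).append(point)
    n, k = len(samples), len(by_class)
    means = {c: (sum(p[0] for p in pts) / len(pts),
                 sum(p[1] for p in pts) / len(pts))
             for c, pts in by_class.items()}
    sxx = sxy = syy = 0.0
    for c, pts in by_class.items():
        mx, my = means[c]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    sxx, sxy, syy = sxx / (n - k), sxy / (n - k), syy / (n - k)
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    priors = {c: len(pts) / n for c, pts in by_class.items()}
    return means, inv, priors

def classify(model, point):
    """Assign `point` to the class with the highest linear score."""
    means, inv, priors = model

    def score(c):
        mx, my = means[c]
        wx = inv[0][0] * mx + inv[0][1] * my   # inverse covariance times mean
        wy = inv[1][0] * mx + inv[1][1] * my
        return (point[0] * wx + point[1] * wy
                - 0.5 * (mx * wx + my * wy) + log(priors[c]))

    return max(means, key=score)

# Invented (f0 change in Hz, duration t-score) points for each pattern.
samples = [
    ("B", (5.0, 50.8)), ("B", (10.0, 50.5)), ("B", (8.0, 51.0)), ("B", (12.0, 50.6)),
    ("C", (35.0, 48.8)), ("C", (40.0, 49.0)), ("C", (38.0, 48.6)), ("C", (42.0, 49.2)),
    ("N", (70.0, 50.2)), ("N", (75.0, 50.0)), ("N", (80.0, 50.4)), ("N", (72.0, 49.9)),
]
model = fit_lda(samples)
correct = sum(classify(model, p) == lab for lab, p in samples)
print(correct, len(samples))
```

The proportion `correct / len(samples)` corresponds to the kind of classification rate reported above, although here it is computed on deliberately well-separated toy data.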

3.1.6 Conclusions: classification of native speaker productions

The foregoing result demonstrates that, on a set of tokens carefully produced by four phonetically aware native speakers, a simple linear discriminator equipped with the ability to measure pitch changes across adjacent stressed syllables and the relative timing of these intervals can discriminate the three stress patterns rather well: better, apparently, than native listeners, as the subsequent perception experiment revealed. We take this seemingly clear result both as a validation of the role of temporal as well as accentual patterning in English compound (word) vs. phrasal contrasts and, as it turns out, as an indication of the inadequacy of a simple feature detector to adequately model perceptual discrimination of the prosodic patterns by native listeners.

Fig. 1. The scatter plots of four native speakers' stress patterns (B: Broad, N: Narrow, C: Compound), plotted on the f0 change and normalized duration measures.

3.2 Production experiment: Vietnamese learners of English

Following a perceptual experiment (reported in Section 3.3) which familiarized Vietnamese learners with the stimuli and three target stress patterns, a production experiment was conducted to gather the same set of acoustic measures described previously, in order to assess prosodic transfer effects and to test the hypothesis that 'active' prosodic parameters in L1 feature prominently in the early stages of L2 acquisition, as well as whether 'inactive' prosodic parameters can become 'activated' through prolonged exposure to L2.

3.2.1 Subjects

Two groups of learners participated in this experiment: Vietnamese beginning learners of English, and Vietnamese advanced speakers of English. The beginner group consisted of 10 subjects (five males and five females) who were first year university students majoring in English in Hanoi. They were paid for their participation in the experiment. They all started learning English at the age of 12 (in secondary school) with the Grammar Translation method, which focuses mainly on vocabulary and grammar learning. However, they were exposed to communicative English learning during their first year in university. As soon as they finished their first year, they participated in this experiment.

The advanced group consisted of ten postgraduate students at the University of Queensland (five males and five females). They were in the age range 25–32. They had been in Australia for between 8 months and 10 years. All of them could be classified as competent and good users of English, since they had at least an average band score of 6.5 on the IELTS test (International English Language Testing System, a nine-band proficiency test of English on four skills: listening, speaking, writing and reading). They were all teachers of English who held a BA, a degree in EFL teaching, had 2–3 years' teaching experience and were undertaking an MA in TESOL. All of the subjects started learning English at the age of 12 with the Grammar Translation method, which they studied through secondary and high school. They were exposed to the communicative language teaching method during 4 years of undergraduate study. They can speak Vietnamese and English, and have very limited knowledge of French, which they learned as a second foreign language at university, where the curriculum has a strong emphasis on grammar.

3.2.2 Procedure

Subjects had gained prior familiarity with the triplets in a perceptual discrimination experiment, conducted in the previous hour. For the production experiment, subjects read the target sentences, accompanied by their prior context, from cards presented in quasi-random order. Before the recording, subjects were allowed sufficient time for familiarization and practice. They then read the text aloud three times in their normal speaking manner. Only the third repetition was recorded and used for analysis. The recording was made in a quiet room using sound recording and editing software (Speech Station) at 20 kHz sampling rate and 16-bit precision. In the case of the beginner group, a 5-min review of the three stress patterns and an explanation of the meaning of the triplets were given in both English and Vietnamese, only so as to make sure they understood the target stress contrasts.

3.2.3 Signal processing and measurements

Signal processing and measurements of fundamental frequency, duration and intensity were identical to those reported previously (Sections 3.1.3–3.1.4 above). In addition, for purposes of qualitative comparisons between the accentual patterns produced by the Vietnamese learners and those of the native Australian English exemplars, a ToBI-style analysis of the intonation contours of the target sentences was conducted.

3.2.4 Prosodic transcription of f0 contours

Prosodic transcription of the f0 contours on the accent-bearing syllables of the stress patterns was made in accordance with the guidelines for the English ToBI labelling (Beckman & Ayers, 1994). The English ToBI tonal patterns observed in the utterances were H*, L+H*, L*, L*+H, !H*, L+!H*, and H+!H*. The utterances were transcribed by two transcribers familiar with ToBI, one as the primary and the other as the secondary transcriber, who made an independent categorization of a subset of tokens. The inter-transcriber differences were then reviewed, discussed with the input of a third rater and resolved.

3.2.5 Results: Vietnamese learners’ productions and native speaker exemplars

In the sections that follow, the beginners' and advanced learners' acoustic production parameters are compared with those obtained from the four native speakers' exemplars of the three stress patterns, the key f0 and duration characteristics of which were reported previously. Statistical test results refer to the mixed model two-way ANOVA and post hoc Tukey tests previously described (Section 3.1.4).

3.2.5.1 f0 values: quantitative analysis. The two-way ANOVA results consistently showed a significant main effect of stress pattern for all three f0 parameters (V1F0: F(2,823) = 24.1, p < .0001; V2F0: F(2,823) = 34.78, p < .0001; f0 change: F(2,823) = 45.97, p < .0001) and of the stress pattern × speaker group interaction (V1F0: F(4,823) = 2.96, p < .02; V2F0: F(4,823) = 2.46, p < .03; f0 change: F(4,823) = 2.96, p < .02), while there was no significant effect for speaker group. Post hoc results (Table 1) indicated that across the three speaker groups, the V1F0 of the narrow-focus pattern was significantly greater than the f0 value on the same syllable of the compound and broad-focus patterns, while there was no or only a marginally significant difference in V1F0 between the compound and the broad patterns. By contrast, V2F0 of the broad-focus pattern was significantly greater than the f0 value on the second stressed syllable of the narrow and compound patterns across the three groups. There was no significant difference in V2F0 between the narrow and compound patterns produced by the advanced and beginner groups, while a marginal difference was found in items produced by the native speakers' group (N < C, p < .04). The result on f0 change showed that the magnitude of f0 change from the first stressed syllable to the second stressed syllable of the narrow pattern was significantly greater than that of the broad and compound patterns across the three speaker groups, and that of the compound pattern was greater than that of the broad pattern except for the beginner group (p = .07, n.s.).

Fig. 2 illustrates the difference in magnitude of f0 change among the three stress patterns across the three speaker groups, supported by the Tukey results in Table 1. f0 change was greatest in the narrow-focus pattern, less in the compound pattern and least in the broad-focus pattern. The degree of f0 change is also significantly different among the three stress patterns for the native and advanced groups (N > C > B), while for the beginner group, only the f0 change of the narrow pattern was significantly greater than that of the broad and compound patterns (N > C = B). It is also shown in Fig. 2 that the degree of f0 change was less for non-native speakers than for native speakers, and least of all for the beginner group.

Table 1. The post hoc Tukey results on F0 values (MD: mean difference).

The compact f0 change in the broad pattern indicates that the two accent-bearing syllables of this pattern have a comparatively high f0 level, i.e., both the adjective and the noun are accented. The large f0 decline from the first to the second accent-bearing element showed that the narrow and compound patterns are left headed, and with its greater f0 change the narrow pattern has a higher f0 peak than the compound. The insignificant difference in f0 change between the compound and broad patterns, as well as the small f0 change from the first to the second accent-bearing element produced across the three stress patterns by the beginners compared to native and advanced speakers, indicates that they failed to de-accent the accent-bearing nouns in the narrow and compound patterns.

3.2.5.2 f0 contours: qualitative analysis. Generally both transcribers agreed on the presence/absence of a pitch accent on utterances produced by the three speaker groups (90%). Disagreement was mainly on tone types produced by non-native speakers. The agreement rate on tone types was much better for utterances produced by the advanced speakers than by the beginners (advanced: 81%, beginner: 53%). It was sometimes difficult to decide on a tone type for many of the beginners' utterances because their various tonal shapes were fairly different from standard English ToBI tones, probably as a result of lexical tone transfer. The results are reported in terms of the proportions (percentages) of tonal patterns observed on each of the three target stress patterns (broad, narrow, compound) for each speaker group.
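Agreement rates of the kind quoted above are simple proportions of matching labels. The Python sketch below shows one way to compute them; the label sequences are invented, and the convention that "-" marks the absence of a pitch accent is hypothetical rather than part of ToBI or of the study's procedure.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of positions where two transcribers gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)

def has_accent(label):
    """True if a pitch accent was marked ("-" = none; hypothetical convention)."""
    return label != "-"

# Invented labels from a primary and a secondary transcriber.
primary = ["H*", "L+H*", "-", "!H*", "H*"]
secondary = ["H*", "H*", "-", "L+!H*", "H*"]

tone_type = percent_agreement(primary, secondary)
presence = percent_agreement([has_accent(x) for x in primary],
                             [has_accent(x) for x in secondary])
print(tone_type, presence)
```

In this toy example the transcribers always agree on whether an accent is present but agree on its type in only three of five cases, mirroring the pattern reported in the text where presence/absence agreement exceeds tone-type agreement.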

The results in Table 2 show that the intonation pattern of broad-focus accent corresponds with a double accent across the three speaker groups, with three major f0 patterns: (1) a step-up pattern, with a sharp upward movement on the second word from a flat high tone of the first word followed by a steep fall (H* L+H*; Fig. 3a1) or two successive step-up peaks (L+H* L+H*), (2) two comparative f0 peaks, either with the flat hat pattern (H* H*; Fig. 3b1) or a comparative double peak pattern (L+H* L+H*), and (3) a step-down pattern with the second tone lower than the first one (L+H* L+!H*, H* !H*, or H* H+!H*; Fig. 3c1). As shown in Table 2, fewer than half of the test items across the three speaker groups (native: 30%, advanced: 47%, beginner: 36%) have the clear step-up patterns consistent with Hardcastle's findings, while a majority of the test items have comparative f0 peaks. The step-down f0 in some items may be due to an utterance-final pitch declination effect.

The intonation contour of the narrow-focus pattern spoken by Australian English speakers contains a single peak with an enhanced prominence (either L+H* or H*; Fig. 3a2, b2, c2, d2 and e2) on the contrastive adjective, followed by a sudden drop in f0 after the contrastive accent and a deaccented noun, consistent with findings in previous studies on contrastive focus in English (Beckman & Ayers, 1994; Ladd, 1980; Pierrehumbert & Hirschberg, 1990). As shown in Table 2, while the native speakers consistently deaccented the noun in the narrow-focus pattern (94%), Vietnamese beginning learners of English failed to deaccent the noun (only 5% of cases involved noun deaccentuation, but 46% of items carried comparative peaks and 49% involved step-down accent peaks; Fig. 3g2 and h2). Nevertheless, the advanced Vietnamese speakers deaccented the noun of the narrow-focus pattern in 70% of occurrences (compared to 11% with comparative peaks and 19% with step-down peak patterns).

The f0 contour of the compound patterns spoken by native Australian English speakers had a single f0 peak similar to that of the narrow pattern, but with a less prominent f0 on the adjective (a total of 83%, including 79% with H* and 4% with L+H* patterns; Fig. 3a3, b3, c3 and e3). The remaining 17% of native speakers' compounds have a step-down accent pattern with a non-deaccented noun (H* H+!H*; Fig. 3d3). This pattern is found mainly in the four-syllable template (e.g., open classroom, plastic money, sleeping partner). An explanation for this might be that these four-syllable compounds have not yet been completely lexicalized. While the advanced Vietnamese speakers could deaccent the noun of the compound on 67% of tokens (compared to 11% with a comparative peak and 22% with step-down peak patterns), beginners mostly preserved the accent on the noun of the compound (only 4% with noun deaccentuation, up to 52% with comparative peaks and 45% with step-down peaks; e.g., Fig. 3g3 and h3).

Table 2. Percentage of tonal pitch patterns per stress pattern per speaker group.

Fig. 3. Samples of intonation patterns. The three target patterns were segmented from their respective carrier phrases in the experimental data set and pasted on the same time axis for purposes of illustration only.

The qualitative examination of the f0 contour, consistent with the statistical results on f0 values, showed that the narrow-focus noun phrases and compounds had a single accent with a left-headed f0 pattern, while the broad pattern had a double accent. The contrastive accent in the narrow pattern had a higher f0 peak than the compound. The results from Table 2 showed that non-native speakers markedly differed from native speakers

