CHAPTER ONE
PRELIMINARIES AND MOTIVATIONS FOR THIS STUDY
The important role of vocal expression in the communication of emotion has been recognised since antiquity. In his classic work on the expression of emotion in animals and human beings, Darwin (1872/1965) attributed primary importance to the voice as a carrier of emotional cues. As Scherer (1989:233) points out, “the use of the voice for emotional expression is such a pervasive phenomenon that it has been frequently commented upon since the beginning of systematic scientific interest in human expressive behavior.” Vocal expressions are extremely powerful and may have the ability to elicit similar emotional states in others. Despite this, however, there is little systematic knowledge about the details of the auditory cues which are actually responsible for the expression and perception of emotion in the voice. Studies regarding emotional speech have been done in the last few decades, but few researchers actually agree on how to define the phonetic quality of expressed emotions.
Among the various vocal cues of emotion that have been studied, intonation is the most common, and many researchers have shown that intonation is an effective means of expressing the speaker’s emotion (Williams & Stevens, 1972; O’Connor & Arnold, 1973; Abe, 1980; Bolinger, 1986; Chung, 1995) – the same word or phrase, when spoken using varying intonation, can reflect very different emotions or attitudes which are easily recognisable to listeners. Few studies, however, involve an analysis of the vowels and consonants in emotional speech, in spite of the fact that these segments of speech are affected by emotion (Williams & Stevens, 1972; Scherer, 1986; Chung, 1995; Hirose et al., 1997). Hence, one of the aims of this study is to examine segmental features – and the role they play – in the expression of emotion in English.
Because this study is conducted in Singapore, it is also interesting to look at certain features of the local variety of English. There have been many studies on Singapore English (henceforth SE) in the past few decades, which have progressed from identifying the structural mistakes of SE (Elliott, 1980) to establishing SE as a standard form of English and describing its features (Tongue, 1974; Platt et al., 1984; Gupta, 1992; Brown, 1999; Zhu, 2003). Recent researchers have generally agreed on the existence of certain features of SE, and are turning their attention to ethnic variation in these features, since Singapore is a multi-ethnic society (Ho, 1999; Poedjosoedarmo, 2000; Lim, 2001). This study aims to approach SE research from a new angle by looking at the relationship between certain SE features and emotions.
It is hoped that the findings of this study on vowel and consonant qualities will support the position that these are significant vocal cues in emotional speech which deserve more attention in this area of research, and will also provide a deeper understanding of how SE is used in natural, emotional conversation.
1.2 Emotional speech
Each aspect of a study of emotional speech is rather complex in itself. There is a wide variety of possible vocal cues to look at, and an even wider range of emotions under the different kinds of classification. It is therefore necessary to explain the choice of emotions and vocal cues examined in this study. The following sections provide a background to emotions and their categories, followed by a discussion of the relationship between emotions and the voice, and of how the decision was made on which emotions to examine.
1.2.1 Emotion labels and categories
One of the first difficulties a researcher on emotion faces is having to sieve through and choose from a myriad of emotion labels in order to decide which emotions to study. The number of emotion labels is virtually unlimited, for when it comes to labelling emotions, the tendency has been to include almost any adjective or noun remotely expressive of affect. After all, “the most obvious approach to describing emotion is to use the category labels that are provided by everyday language” (Cowie, 2000:2). According to an estimate by Crystal (1969), between the two studies by Schubiger (1958) and O’Connor & Arnold (1973), nearly 300 different labels are used to describe affect. It seems that the only bounds imposed here are those of the English lexicon. Thus, in the face of such a multitude of labels, some kind of systematisation, in order to constrain the labels introduced, is indispensable.
However, grouping emotions into categories is also a difficult issue. As mentioned, there are thousands of emotion labels, and the similarity between them is a matter of degree. If so, no natural boundaries exist that separate discrete clusters of emotions. As a consequence, there are many reasonable ways to group emotion labels together, and because there has never been a commonly accepted approach to categorising emotional states, it is no surprise that researchers on emotions differ on the number and kinds of categories to use. The following sections will highlight the ways in which some researchers have categorised emotions.
1.2.1.1 Biological approach
Panksepp (1994), who looks at emotions from a neurophysiological point of view, suggests that affective processes can be divided into three conceptual categories. He points out that while most models accept fear, anger, sadness, and joy as major species of emotions, it is hard to agree on emotions such as surprise, disgust, interest, love, guilt, and shame, and harder to explain why strong feelings such as hunger, thirst, and lust should be excluded. Panksepp therefore tries to include all affective processes in his three categories. Category One – “the Reflexive Affects” – consists of affective states which are organised in quite low regions of the brainstem, such as pain, the startle reflex, and surprise. Category Two – “the Blue-Ribbon, Grade-A Emotions” – consists of emotions produced by a set of circuits situated in intermediate areas of the brain which orchestrate coherent behavioural, physiological, cognitive, and affective consequences. Emotions like fear, anger, sadness, joy, affection, and interest fall under this category. Lastly, Category Three – “the Higher Sentiments” – consists of the emotional processes that emerge from the recent evolutionary expansion of the forebrain, such as the more subtle social emotions, including shame, guilt, contempt, envy, and empathy.
However, because the concerns of a biological-neurophysiological study are vastly different from those of a linguistic study, this method of categorisation is not commonly referred to by linguistic researchers of emotional speech. Instead, the question is more often “whether emotions are better thought of as discrete systems or as interrelated entities that differ along global dimensions” (Keltner & Ekman, 2000:237). Linguistic researchers who take the stand that emotions are discrete systems would study a small number of emotions they take to be primary emotions (the mixing of which produces multiple secondary emotions), while researchers who follow the dimensional approach study a much greater number of emotions (viewed as equally important), placing them along a continuum based on the vocal cues they examine. The next two sections will briefly cover these different views on emotions.
1.2.1.2 Discrete-emotions approach
The more familiar emotion theories articulate a sort of “dual-phase model of emotion that begins with ‘primary’ biological affects and then adds ‘secondary’ cultural or cognitive processes” (White, 2000:32). Cowie (2000:2) states that “probably the best known theoretical idea in emotion research is that certain emotion categories are primary, others are secondary.” Cornelius (1996) summarises six basic or primary emotion categories, calling them the “big six”: fear, anger, happiness, sadness, surprise, and disgust. Similarly, Plutchik’s (1962) theory points towards eight basic emotions. He views primary emotions as adaptive devices that have played a role in individual survival. According to this comprehensive theory, the basic prototype dimensions of adaptive behaviour and the emotions related to them are as follows: (1) incorporation (acceptance), (2) rejection (disgust), (3) destruction (anger), (4) protection (fear), (5) reproduction (joy), (6) deprivation (sorrow), (7) orientation (surprise), and (8) exploration (expectation). The interaction of these eight primary emotions in various intensities produces the different emotions observed in everyday life.
This issue is discussed more fully in Plutchik (1980). He points out that emotions vary in intensity (e.g. annoyance is less intense than rage), in similarity (e.g. depression and misery are more similar than happiness and surprise), and in polarity (e.g. joy is the opposite of sadness). In his later work (Plutchik, 1989), he reiterates the concept that the names for the primary emotions are based on factor-analytic evidence, similarity scaling studies, and certain evolutionary considerations, and that emotions designated as primary should reflect the properties of intensity, similarity, and polarity. Therefore, “if one uses the ordinary subjective language of affects, the primary emotions may be labelled as joy and sadness, anger and fear, acceptance and disgust, and surprise and anticipation.”
Lewis (2000a, 2000b) also presents a model of emotional development which involves basic or primary emotions. In his model, the advent of the meta-representation of the idea of me, or the consciousness, plays a central role. He lists joy, fear, anger, sadness, disgust, and surprise as the six primary emotions, which are the emotional expressions we observe in a person in the first six months of life. These early emotions are transformed in the middle of the second year of life as the idea of me, or the meta-representation, is acquired and matures. Lewis calls this transformation “an additive model” because it allows for the development of new emotions. He stresses that the acquisition of the meta-representation does not transform the basic emotions; rather, it utilises them in an additive fashion, thereby creating new emotions. The primary emotions are transformed but not lost, and therefore the process is additive.
1.2.1.3 Dimensional approach
The dimensional perspective is more common among those who view emotions as socially learned and culturally variable (Keltner & Ekman, 2000). This approach argues that emotions are not discrete and separate, but are better measured and conceptualised as differing only in degree on one or another dimension, such as valence, activity, or approach versus withdrawal (Schlosberg, 1954; Ekman et al., 1982; Russell, 1997). One way of representing these dimensions is by classifying emotions along bipolar continua such as tense – calm, elated – depressed, and happy – sad. Interestingly, this method of classification is similar to Plutchik’s (1980) concept of polarity variation mentioned above. As an example of bipolar continua, Uldall (1972) sets out 14 pairs of opposed adjectives placed at the two ends of a seven-point scale, such as
Bored   extremely – quite – slightly – neutral – slightly – quite – extremely   Interested
The other pairs of adjectives are: polite – rude; timid – confident; sincere – insincere; tense – relaxed; disapproving – approving; deferential – arrogant; impatient – patient; emphatic – unemphatic; agreeable – disagreeable; authoritative – submissive; unpleasant – pleasant; genuine – pretended; weak – strong.
A more systematic method represents categories and dimensions structurally by converging multidimensional scaling and factor analyses of emotion-related words and situations, such that categories are placed within a two- or three-dimensional space, like that shown in Figure 1.1. The implication of such a structure is that a particular instance is typically a member not of only one category (among mutually exclusive categories), but of several categories, albeit to varying degrees (Russell & Bullock, 1986; Russell & Fehr, 1994).
Figure 1.1: A circumplex structure of emotion concepts. Figure taken from Russell & Lemay (2000:497).
Figure 1.2: Multidimensional scaling of emotion-related words. Figure taken from Russell (1980).
Russell (1989) also suggests the use of multidimensional scaling, with distance in a space representing similarity, to better represent the interrelationships among emotion labels. Figure 1.2 (Russell, 1980) shows the scaling of 28 emotion-related words, based empirically on subjects’ judgments of how the words are interrelated. He claims that such a model is a better solution to the categorisation of emotions as it reflects the continuous variation of the emotions. It also asserts the correlation between emotions – the closer two emotions are placed together, the more likely it is that an emotional state can be classified as both. While the “prototypical” emotion categories are not placed at the periphery of the space like those in Figure 1.1, they are generally in similar positions in relation to the other emotion labels. Multidimensional scaling diagrams like Figures 1.1 and 1.2 have been derived for numerous languages besides English, such as those spoken in the Pacific (e.g. Lutz (1982) on Ifaluk; Gerber (1985) on Samoa; White (2000) on the Solomon Islands) and in Asia (e.g. Heider (1991) on Indonesia; Romney et al. (1997) on Japan).
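As an illustration of the technique behind diagrams like Figure 1.2, classical multidimensional scaling takes a matrix of judged dissimilarities between emotion words and recovers coordinates in a low-dimensional space whose inter-point distances approximate those judgments. The sketch below is purely illustrative: the four words and their dissimilarity ratings are invented, not drawn from Russell’s data.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Torgerson's classical multidimensional scaling.

    Given a symmetric matrix of pairwise dissimilarities between items
    (here, emotion words), return coordinates in n_dims dimensions whose
    inter-point distances approximate the dissimilarities.
    """
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # Eigendecomposition; keep the largest n_dims components.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_dims]
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Toy dissimilarities among four emotion words (hypothetical ratings):
# 'happy'/'excited' are judged similar, as are 'sad'/'gloomy', while
# the two clusters are judged far apart.
words = ["happy", "excited", "sad", "gloomy"]
dissim = np.array([[0, 1, 9, 9],
                   [1, 0, 9, 9],
                   [9, 9, 0, 1],
                   [9, 9, 1, 0]], dtype=float)
coords = classical_mds(dissim)  # 4 x 2 array of plotting coordinates
```

Plotting `coords` would place the two similar words in each invented cluster close together, exactly as the published scaling diagrams place correlated emotion labels near one another.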
There are other approaches and models which represent how emotions may be conceptualised, but it is beyond the scope of this study to explain each and every one of them. It is likely that no single theory or model is the “correct” one, as some psychologists would like to think, and that each really paints a partial picture, and highlights different properties, of emotion concepts. They are highly interrelated (though some have received much more attention than others) and are not competing accounts (Keltner & Ekman, 2000).
mouth or larynx, accelerated breathing rate, and muscle tension. Scherer (1979), for instance, points out that arousal of the sympathetic nervous system, which characterises emotional states, increases muscle tonus – generally meaning an increase in fundamental frequency – and also affects the coordination and the rhythms of reciprocal inhibition and excitation of muscle groups. The latter effect “will affect pitch and loudness variability, stress patterns and intonation contours, speech rate… and many other speech parameters” (Scherer, 1979:501f). Thus recent linguistic research has looked at a wide range of emotions.
Among the many emotions researchers have studied, anger, sadness, and happiness/joy are the most common (van Bezooijen, 1984; Scherer et al., 1991; Chung, 1995; Johnstone et al., 1995; Klasmeyer & Sendlmeier, 1995; Laukkanen et al., 1995; McGilloway et al., 1995; Mozziconacci, 1995, 1998; Nushikyan, 1995; Tosa & Nakatsu, 1996; Hirose et al., 1997; Nicholson et al., 2000). A fair number of studies also examine fear and/or boredom (Scherer et al., 1991; Johnstone et al., 1995; Klasmeyer & Sendlmeier, 1995; McGilloway et al., 1995; Mozziconacci, 1995, 1998; Nushikyan, 1995; Tosa & Nakatsu, 1996; Nicholson et al., 2000). Disgust is another relatively popular choice (van Bezooijen, 1984; Scherer et al., 1991; Johnstone et al., 1995; Klasmeyer & Sendlmeier, 1995; Tosa & Nakatsu, 1996; Nicholson et al., 2000). Other emotions which some researchers study include despair, indignation, interest, shame, and surprise. The majority of studies of emotional speech tend to include neutral as an emotion. While – strictly speaking – neutral is not an emotion, one can understand the necessity of non-emotional data, which serves to bring out the innate values of the speech sounds so that emotional data has a basis of comparison (Cruttenden, 1997).
It should be noted that for the emotion anger, some researchers make the distinction between hot and cold anger (Johnstone et al., 1995; Hirose et al., 1997; Pereira, 2000). Hirose et al. (1997) explain that anger can be expressed openly or suppressed, resulting in different prosodic features, and this is supported by their results, which show two opposite cases for samples of anger.
1.2.3 Choices of emotions for this study
Because natural conversation was recorded (in the form of anecdotal narratives recollecting emotional events) as data for this study, fear and boredom were not chosen, since people do not normally recall events which made them feel fearful or bored and still speak with traces of the emotions felt at the time of the events. Disgust is also not an option, as people do not usually sustain a tone of disgust throughout significant lengths of their narrative.
(Hot) anger, sadness, happiness, and neutral are chosen as the four emotions for this study. According to the theories formulated by Lewis (2000a, 2000b) and Plutchik (1962, 1980, 1989), anger, sadness, and happiness are considered primary emotions. In other words, they are distinguishable from one another and none of them is a derivative of another. They are also shown to be in different quadrants of the multidimensional scaling diagrams, Figure 1.1 (Keltner & Ekman, 2000) and Figure 1.2 (Russell, 1980). This means that anger, sadness, and happiness are dimensionally dissimilar from one another. Neutral is necessary because, as mentioned, it serves as a basis of comparison for the data of the other emotions. Furthermore, if neutral had a place in the multidimensional scaling diagrams, it would probably be close to calm, and hence would be in the fourth quadrant, reasonably different from anger, sadness, and happiness.
It is also useful to note that although these are four relatively distinct emotions, they can be grouped into two opposite pairs of ‘active’ and ‘passive’ emotions, as suggested by the vertical axis of the circumplex structure in Figure 1.1. Active emotions – in the case of this study, Angry and Happy – are characterised by a heightened sense of activity or adrenaline, while passive emotions – Sad and Neutral – are characterised by low levels of physical or physiological activity. (Neutral is taken to be similar to calm in Figure 1.1 and also to At ease and Relaxed in Figure 1.2, and is also found by Chung (1995) to be similar to other passive emotions in terms of pitch, duration, and intonation.) Such pairing is useful as it provides one more way by which to compare and contrast the four emotions.
1.3 Two major traditions of research
There are two major traditions of research in the area of emotional speech: encoding and decoding studies (Scherer, 1989). Encoding studies attempt to identify the acoustic (or sometimes phonatory-articulatory) features of recordings of a person’s vocal utterances while he is in different emotional states. In the majority of encoding studies, these emotional states are not real but are mimicked by actors. In contrast, decoding studies are concerned not so much with acoustic features as with the ability of judges to correctly recognise or infer affective state or attitude from voice samples.
Due to the large number of speech sounds examined across the different emotions, this study will mainly be an encoding study. However, a preliminary identification test will be conducted using some of the speech extracts from which the vocal cues used for analysis are taken. The purpose of this short listening test is to support my assumption that certain segments of conversation are representative of the emotions portrayed. This test will be explained in further detail in Chapter 2.
Before stating the motivations for this study and its aims, the following sections will provide a summary of past research on the aspects of voice which are cues to emotion, as well as a brief description of the segmental features of Singapore English.
1.4 Past research on emotional speech
A glance at the way language is used shows us that emotions are expressed in countless different ways, and Lieberman & Michaels (1962) note that speakers may favour different acoustic parameters in transmitting emotions (just as listeners may rely on different acoustic parameters in identifying emotions). In the last few decades, researchers have studied a wide range of acoustic cues, looking for the ones which play a role in emotive speech.
1.4.1 Intonation
… speakers rarely if ever objectify the choice of an intonation pattern; they do not stop and ask themselves “Which form would be here for my purpose?”… Instead, they identify the feeling they wish to convey, and the intonation is triggered by it.

(Bolinger, 1986:27)
It is an undisputed fact that intonation has an important role to play in the expression of emotions. In fact, it is generally recognised that the use of intonation to express emotions is universal (Nushikyan, 1995), i.e. there are tendencies in the repetition of intonational forms across different languages (Bolinger, 1980:475-524). This explains why there is more literature regarding intonational patterns in emotional speech than regarding any other aspect of speech. Because pitch is the feature most centrally involved in intonation (Cruttenden, 1997), studies on intonation tend to focus on pitch variations.
Mozziconacci (1995) focuses exclusively on the role of pitch in English and shows that pitch register varies systematically as a function of emotion. Her acoustic analysis shows that neutral and boredom have low pitch means and narrow pitch ranges, while joy, anger, sadness, fear, and indignation have wider ranges, with the latter two emotions having the largest means. However, emotion-identification performance in her perception test is low. She attributes this to the fact that no characteristics other than pitch had been manipulated, which implies that pitch is not the only feature involved in differentiating emotions in speech. Chung (1995) observes that in Korean and French, the pitch contour seems to carry a large part of the emotional information, and anger and joy have a wider pitch range than sorrow or tenderness. Likewise, McGilloway et al. (1995) find that in English utterances, happiness, compared to fear, anger, sadness, and neutral, has the widest pitch range, longer pitch falls, faster pitch rises, and a pitch duration that is shorter and of a narrower range. Kent et al. (1996) generalise that large intonation shifts usually accompany states of excitement, while calm and subdued states tend to manifest a narrow range of intonation variation.
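The pitch measures reported in these studies largely reduce to descriptive statistics over a fundamental-frequency (F0) track. The sketch below computes the two most commonly reported measures, pitch mean and pitch range; the F0 values are invented for illustration and are not taken from any of the studies cited.

```python
import numpy as np

def f0_statistics(f0_track):
    """Summarise a fundamental-frequency (F0) track in Hz.

    Unvoiced frames are conventionally marked with 0 and excluded.
    Returns the mean and the range (max - min): the two measures
    most often reported as 'pitch mean' and 'pitch range'.
    """
    voiced = np.asarray(f0_track, dtype=float)
    voiced = voiced[voiced > 0]
    return voiced.mean(), voiced.max() - voiced.min()

# Invented F0 tracks: a lively 'happy' utterance vs a flat 'neutral' one
# (zeros mark unvoiced frames).
happy_f0 = [210, 0, 250, 310, 180, 0, 290, 340]
neutral_f0 = [190, 195, 0, 200, 192, 198]

happy_mean, happy_range = f0_statistics(happy_f0)
neutral_mean, neutral_range = f0_statistics(neutral_f0)
```

On these toy tracks the happy utterance shows the wider pitch range, which is the pattern the studies above report for active emotions.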
However, intonation is not the only means of communicating emotions. According to Silverman et al. (1983), certain attitudes are indistinguishable on the basis of intonation. Uldall’s (1972) results demonstrate that some attitudes (e.g. the adjective pair “genuine – pretended”) are apparently rarely expressed by intonation. Therefore, while intonation is a significant means of conveying expressive meaning, it is not the only one, and there are certainly other, equally important, phenomena.
1.4.2 Other vocal cues
It seems obvious that intonation is not the only means of differentiating emotions, and that “other aspects such as duration and voice quality must also be taken into consideration” (Mozziconacci, 1995:181). Cruttenden (1997) points out that there are a number of emotions, like joy, anger, fear, and sorrow, which are not usually associated directly with tones, but may be indicated by a combination of factors like accent range, key, register, overall loudness, and tempo. Murray & Arnott (1993) note that the most commonly referenced vocal parameters are pitch, duration, intensity, and voice quality (though the last term is not clearly defined). Nevertheless, there are fewer studies on any one of these aspects than on intonation. Furthermore, the cues and parameters mentioned by Cruttenden (1997) and Murray & Arnott (1993) are still examples of prosodic features, and neither author mentions the role of segmental features.
Chung’s (1995) study of Korean and French suggests that the vowel duration of the last syllable differs according to the emotion: it is very short in anger and long in joy and tenderness. Consonantal duration, however, is less regular; lengthening tends to occur on stressed words. (No mention is made, however, of whether these words are sentence-final or sentence-medial.) Hirose et al. (1997) find speech rate to be higher in emotional speech than in non-emotional speech, and Chung (1995) elaborates that it is high in anger and joy but low in sorrow and tenderness (in both studies, speech rate, while not explicitly defined, is measured over a sentence).
With regard to intensity, the most consistent finding is that it increases with anger (Williams & Stevens, 1972; Scherer, 1986; Chung, 1995; Leinonen et al., 1997). Other findings include intensity being significantly higher in joy than in sadness and tenderness (Chung, 1995; Hirose et al., 1997).
Klasmeyer & Sendlmeier (1995) study glottal movement by analysing the glottal pulse shape in emotional speech data. Laukkanen et al. (1995) also examine the glottis and the role of the glottal airflow waveform in the identification of emotions in speech, but the study is unfortunately inconclusive: as the researchers admit, because the glottal waveform was studied only at the F0 maximum, it remains uncertain whether the relevance of voice quality in their samples was related to the glottal waveform or to a pitch-synchronous change in it.
1.4.3 Comparing between genders
There is little literature comparing male and female speech, much less comparing male and female emotional speech. This is probably because early work in phonetics focused mainly on the adult male speaker, mostly for social and technical reasons (Kent & Read, 2002:53). But it is an undeniable fact that the genders differ acoustically in speech. A classic portrayal of this acoustic diversity is given by Peterson & Barney (1952), who, from a sample of 76 men, women, and children asked to utter several vowels, derive F1-F2 frequencies which fall within three distinct (but overlapping) clusters (men, women, and children). Likewise, Tosa (2000) discovers, after running preliminary (artificial intelligence) training tests with data from males and females, that two separate recognition systems – one for male speakers and another for female speakers – are needed, as the emotional expressions of males and females are different and cannot be handled by the same program model. However, further research has not been done to find the reason behind this gender difference.
We do know that the differences are due in part to biological factors: women have a shorter membranous length of the vocal folds, which results in a higher fundamental frequency (F0) and greater mean airflow (Titze, 1989). Women’s voices are also physiologically conditioned to have a higher and wider pitch range than men’s, particularly when they are excited (Brend, 1975; Abe, 1980). In an experiment in which 3rd, 4th, and 5th grade children were asked to retell a story, Key (1972) observed that the girls used very expressive intonation (i.e. highly varied throughout speech), while the boys toned down intonational features even to the point of monotony.
1.5 Singapore English
Singapore is a multi-ethnic society whose resident population of four million is made up of 76.8% Chinese, 13.9% Malays, 7.9% Indians, and 1.4% other races (Leow, 2001). While the official languages of the three main ethnic groups are Mandarin, Malay, and Tamil, respectively, English is the primary working language, used in education and administration. Because of this multi-ethnolinguistic situation, the variety of English spoken in Singapore is distinctive and most interesting to study. There has been much interest in two particular ways of studying SE. One describes the nature and characteristics of SE, documenting its semantic, syntactic, phonological, and lexical categories. The other provides an account of the emergence of certain linguistic features of SE; for example, some studies show evidence of the influence of the ethnic languages on SE.
SE generally has two varieties: Standard Singapore English (SSE) and Colloquial Singapore English (CSE) (Gupta, 1994). SSE is very similar to most Standard Englishes, while CSE differs from Standard Englishes in terms of pronunciation, syntax, etc. But many educated Singaporeans today speak a mixture of both varieties: the morphology, lexicon, and syntax are those of SSE, but the pronunciation system is that of CSE (Lim, 1999).
Since the interest of this study lies in the relationship between emotions and the articulation of segmental features, a brief description will be given of the phonological phenomenon of vowel and consonant conflation in SE.
1.5.1 Vowels
It is commonly agreed by researchers that one of the most distinctive features of SE pronunciation is the conflation of vowel pairs. Much research has been done on this phenomenon, and some researchers focus on the conflation of pairs of short and long vowels such as [ɪ]/[iː], [ʌ]/[ɑː], [ɒ]/[ɔː], and [ʊ]/[uː] (Brown, 1992; Poedjosoedarmo, 2000; Gupta, 2001). Brown (1992) finds that Singaporeans make no distinction within the abovementioned long and short vowel pairs, i.e. each pair is conflated. Poedjosoedarmo (2000), in her study of standard SE, examines only the vowel pair [ɪ]/[iː], which she takes as representative of the phenomenon of vowel conflation in SE, and finds that the pair is indeed conflated. Gupta (2001), however, finds that the standard SE vowel pairs, placed in descending order of how often they are conflated, are: [ɛ]/[æ], [ʊ]/[uː], [ɒ]/[ɔː], [ʌ]/[ɑː], [ɪ]/[iː]. In other words, most Singaporeans make the distinction between the vowels of [ɪ]/[iː], but few do so for the vowels of [ɛ]/[æ].
Other studies further include [ɛ]/[æ], analysing the differences in tongue position for the vowels in recorded standard SE, spontaneous or otherwise (Lim, 1992; Loke, 1993; Ong, 1993). Lim (1992) plots a formant chart for the vowel pairs [ɪ]/[iː], [ɛ]/[æ], [ʌ]/[ɑː], [ɒ]/[ɔː], and [ʊ]/[uː] and finds the members of each pair statistically similar in terms of formant frequency (i.e. the vowel pairs are conflated). Likewise, Loke (1993) finds conflation of the vowel pairs [ɪ]/[iː], [ɛ]/[æ], [ʌ]/[ɑː], and [ʊ]/[uː] by examining vowel formants in spectrograms. Ong (1993), on the other hand, finds that the vowel pairs [ɪ]/[iː], [ɛ]/[æ], and [ɒ]/[ɔː] do conflate but that “there is no clear evidence” of conflation of [ʊ]/[uː], while the vowel pair [ʌ]/[ɑː] appears to conflate only in terms of tongue height.
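The formant-chart analyses above amount to asking whether two vowel categories occupy the same region of F1–F2 space. A rough sketch of that comparison follows; the formant values are invented for illustration and are not taken from the studies cited, and the distance threshold for calling a pair "merged" would in practice be established statistically, not by eye.

```python
import math

def formant_distance(tokens_a, tokens_b):
    """Euclidean distance between the mean (F1, F2) points of two
    vowel categories, in Hz. A small distance is consistent with the
    two vowels being conflated (merged) in a speaker's production."""
    def mean_point(tokens):
        f1 = sum(t[0] for t in tokens) / len(tokens)
        f2 = sum(t[1] for t in tokens) / len(tokens)
        return f1, f2
    (a1, a2), (b1, b2) = mean_point(tokens_a), mean_point(tokens_b)
    return math.hypot(a1 - b1, a2 - b2)

# Invented (F1, F2) measurements in Hz for two vowel pairs:
# a merged pair, whose categories overlap in formant space...
kit    = [(360, 2200), (370, 2150), (355, 2250)]
fleece = [(350, 2230), (365, 2180), (360, 2210)]
# ...and a distinct pair kept well apart.
dress  = [(550, 1900), (560, 1850)]
trap   = [(780, 1650), (790, 1700)]

merged_dist = formant_distance(kit, fleece)
distinct_dist = formant_distance(dress, trap)
```

On these toy data the merged pair's category means sit a few Hz apart while the distinct pair's means are separated by hundreds of Hz, mirroring the kind of contrast the formant-chart studies report.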
However, less attention is paid to the conflation of the final pair of Received Pronunciation (RP) monophthong vowels, [ə]/[ɜː], though they too have been found to be conflated in standard SE (Deterding, 1994; Hung, 1995; Bao, 1998). While most studies that omit this vowel pair do not provide reasons for the omission, Brown (1988) provides one: the distinction between [ə] and [ɜː] is primarily one of length rather than tongue position, which may be solely related to stress, with [ɜː] appearing in stressed syllables and [ə] in unstressed ones. He reasons that since SE rhythm is typically not stress-based, he will not consider the distinction between these two vowels in his study. Another possible problem with studying the conflation of [ə]/[ɜː] is that, in natural or spontaneous speech, words containing these vowels in the stressed syllable occur much less frequently (Kent & Read, 2002).
To recapitulate, Singaporeans generally conflate the vowel pairs [ɪ]/[iː], [ɛ]/[æ], [ʌ]/[ɑː], [ɒ]/[ɔː], [ʊ]/[uː], and [ə]/[ɜː], which are normally distinguished in RP. Table 1.1 shows how the vowels are conflated. Certain diphthongs are also shortened (to monophthongs) in SE (Bao, 1998; Gupta, 2001), but because monophthongs are the focus of this study’s vowel analysis, this section will not cover that aspect of Singaporean vowel conflation.
Table 1.1: Vowels of RP and SE (adapted from Bao, 1998:158)
In final position, stops (especially voiceless stops) usually appear as glottal stops [ʔ], and consonant clusters tend to be simplified, often by the omission of the final stop (e.g. tact pronounced as [tɛʔ] or [tɛk]; lift as [lif]) (Gupta, 2001). This is more common in informal speech, but speakers are actually able to produce the appropriate stops or consonant clusters in careful speech.
Also, standard SE speakers do not distinguish voiced from voiceless stops, fricatives, and affricates in final position (Gupta, 2001). The contrast between the voiced and voiceless obstruents is neutralised, such that all obstruents are voiceless and fortis in final position, with no shortening of the vowel before them. For example, edge [ɛdʒ] is pronounced by SE speakers as [ɛtʃ], and rice [raɪs] and rise [raɪz] are pronounced identically as [raɪs]. According to Gupta (2001), this conflation apparently occurs even in careful speech in most SE speakers.
1.6 Motivations for this study
It is fascinating that expressing and identifying emotions come so naturally to
us, and yet are so difficult to define. Despite the fact that the voice is an important indicator of emotional states, research on vocal expression of emotion lags behind the study of facial emotion expression. This is perhaps because of the overwhelming number of emotions – or more precisely, emotion labels – and the many different possible vocal cues to study, such that researchers take their pick of emotions and vocal cues in a seemingly random fashion. This makes it difficult to view the studies collectively in order to determine the distinction between emotions.
This study attempts to go back to the basics, so to speak, starting with emotions that are “more primary”, less subtle, and most dissimilar from one another, and the vocal cues that are most basic to any language – the vowels and consonants.
And because this study is conducted in Singapore, it is an excellent opportunity to examine SE from a different angle, applying what is known about SE segments – in this case, vowel conflation – to an area of research in which the study of SE is completely new (i.e. emotional speech), in the hope of providing a deeper understanding of conversational SE and the way its features interact with affect, which is ever present in natural conversations. Intuitively, one would expect vowel conflation to be affected by emotions, because vowels are conflated by duration (Brown, 1992; Poedjosoedarmo, 2000; Gupta, 2001) and/or tongue position (Lim, 1992; Loke, 1993; Ong, 1993; Nihalani, 1995), and both duration and tongue position are variables of the voice affected by physiology, which in turn is affected by emotional states (Scherer, 1979; Ohala, 1981; Johnstone et al., 1995). In short, emotional speech involves physiological changes which affect the degree to which vowel pairs conflate. Hence this study hopes to discover the relationship between emotional states and vowel conflation, i.e. whether vowels conflate more often in a certain emotion, and if so, which vowel pairs and how they conflate.
1.7 Aims of this study
The main aim is to determine the vocal cues that distinguish emotions from one another when expressed in English, and how they serve to do so. This study also aims to discover any relationship between emotions and SE vowel conflation, as well as to determine the difference in the expression of emotions between males and females.
Vocal cues of four different emotions are examined: anger, sadness, happiness, and neutral. The vocal cues fall under two general categories: vowels and consonants. Twelve vowels – [ɪ], [iː], [ɛ], [æ], [ʌ], [ɑː], [ɒ], [ɔː], [ʊ], [uː], [ə], [ɜː] – as well as eight obstruents – [p], [t], [k], [f], [θ], [s], [ʃ], [tʃ] – will be analysed. The variables examined are vowel and consonantal duration and intensity, as well as vowel fundamental frequency. Formant measurements will be taken of the vowels in order to compare vowel quality, and VOT and spectral measurements will be taken of the obstruents (depending on the manner of articulation) in order to compare them within their classes. The vocal production and the method of measurement of the specific cues examined are elaborated on in Chapter 3.
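As a rough sketch of how two of these variables – duration and intensity – can be computed from a labelled stretch of waveform (the function names, sampling rate, and sample values below are invented for illustration; this is not the measurement procedure of this study):

```python
import math

def segment_duration(n_samples, sample_rate):
    """Duration (in seconds) of a labelled segment of n_samples samples."""
    return n_samples / sample_rate

def segment_intensity_db(samples, ref=1.0):
    """Root-mean-square intensity of a segment, in dB re a reference amplitude."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / ref)

# A hypothetical 100 ms vowel token sampled at 16 kHz:
print(segment_duration(1600, 16000))                 # 0.1 (seconds)
print(round(segment_intensity_db([0.5] * 1600), 2))  # -6.02 (dB)
```

In practice such measurements are taken with speech-analysis software over hand-labelled segment boundaries; the sketch only shows the underlying arithmetic.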
The measurements of the vocal cues of anger, sadness, and happiness are compared with those of neutral to determine how these emotions are expressed through these cues. The quality of the vowel pairs (as mentioned in the earlier section on SE vowels) will also be compared across emotions to find out if there is a relationship between vowel conflation and emotional expression. Also, the average measurements of all vocal cues of males are compared with those of females.
In short, the research questions of this study are:
I. whether segmental aspects of natural speech can distinguish emotions, and if so, by which vocal cues (e.g. intensity, duration, spectra, etc.);
II. whether a relationship exists between emotions and the vowel conflation that is pervasive in Singapore English; and
III. whether there is an obvious or great difference in emotional expression between males and females.
CHAPTER TWO
RESEARCH DESIGN AND METHODOLOGY
2.1 The phonetics study
The analysis of the sounds of a language can be done in two ways: by auditory or instrumental means. In this study, the choice of speech extracts (i.e. passages taken from the anecdotal narratives) from which data will be obtained for analysis is based on auditory judgment, at the researcher’s own discretion. The data is then analysed instrumentally. However, since auditory perception is subjective, a perception test – using short utterances taken from the speech extracts chosen by the researcher – is conducted in order to verify that the researcher’s choices are relatively accurate and representative of general opinion. The perception test will be elaborated on in further sections.
b) He has studied English as a first language in school up to at least GCE ‘A’ level and very possibly, University.
c) He uses English as his predominant or only language at work
Two decades later, despite any sociological change in Singapore, the criteria have not changed very much; Lim & Foley’s (to appear) general description of speakers who are considered native speakers of SE is as follows:
i. They are Singaporean, having been born in Singapore and having lived all, if not most, of their life in Singapore.
ii. They have been educated in English as a first language, with educational qualifications ranging from Cambridge GCE ‘A’ level (General Certificate of Education Advanced level) to a bachelor’s degree at the local university, and English is used as the medium of instruction at every level in all schools.
iii. They use English as their main language at home, with friends, at school or at work; at the same time, most also speak other languages at home, at work, and with friends.
The six subjects for this study, consisting of three males and three females since gender is an independent variable in this study, fulfil all these criteria, and thus can be said to be educated speakers of SE.
All subjects are Chinese Singaporeans between 22 and 27 years of age, and are either students or graduates of the National University of Singapore (NUS) or La Salle (a Singapore college of the Arts). Those who have graduated are presently employed. All of them have studied English as a first language in school, and speak predominantly in English to family, friends, and colleagues.
The subjects are all close friends or family of the researcher and thus are comfortable with relating personal anecdotes to the researcher on a one-to-one basis.
2.3 Data
This study compares the vowels and consonants expressed in four emotions, namely anger, sadness, happiness, and neutral. These emotions are chosen because, as mentioned before, they are the most commonly used emotions in research on emotional speech, and because they are relatively distinct from one another.
The vowels examined in this study are the vowel pairs [ɪ] and [iː], [ɛ] and [æ], [ʌ] and [ɑː], [ɒ] and [ɔː], [ʊ] and [uː], [ə] and [ɜː]. These vowel pairs are commonly conflated in SE, and one of the aims of this study is to examine the relationship between emotional speech and vowel conflation in SE.
The consonants examined are all voiceless obstruents: stops [p], [t], [k], fricatives [f], [θ], [s], [ʃ], and affricate [tʃ]. Voiceless instead of voiced obstruents are examined because voiceless obstruents tend to have greater aspiration and frication. The consonantal conflations that occur specifically in final position in SE (as described in the earlier chapter) will not be examined, because no consonants in final position will be examined at all: certain stops and fricatives – such as stops [p] and [t], and fricatives [θ] and [ʃ] – drop in intensity when placed in final position (Kent & Read, 2002).
To recapitulate, the research aims of this study are to (i) determine which vocal cues distinguish emotions, (ii) discover if there is a relationship between emotions and SE vowel conflation, and (iii) determine the difference in emotional expression between males and females. The following subsections explain how data is collected for the purpose of this study.
2.3.1 Data elicitation
Many studies on emotional speech tend to rely on professional or amateur actors to mimic emotions. There are advantages in this practice, such as control of data obtained, ease of obtaining data, and the ability to ensure clarity of recording, which in turn allows greater ease and accuracy in the analysis of the recorded data. However, actor portrayals may be attributable to theatre conventions or cultural display rules, and reproduce stereotypes which stress the obvious cues but miss the more subtle ones which further differentiate discrete emotions in natural expression (Kramer, 1963; Scherer, 1986; Pittam & Scherer, 1993). Hence, this study intends to obtain data from spontaneous speech rather than actor simulation.
A long-term research programme (Rimé et al., 1998) has shown that most people tend to share their emotions by talking about their emotional experiences to others. This means that most people will engage in emotional speech while recounting emotional experiences, and therefore such recounts should have an abundance of speech segments uttered emotionally. Thus, data for this study is elicited by engaging the subjects in natural conversation and asking them to recall personal emotional experiences pertaining to the emotions examined in this study: anger, sadness, and happiness. With regard to neutral, subjects are asked about their average day at work or school (depending on which applies to them) and perhaps also asked to explain the nature of their job or schoolwork.
2.3.2 Mood-setting tasks
Each subject was required to have only one recording session with the researcher to record the Angry, Sad, and Happy anecdotes and Neutral descriptions. This was in order to ensure that the recording environment and conditions of each subject were kept as constant as possible for all of his or her anecdotes and descriptions. Since the subjects had to attempt to naturally express diverse emotions in the span of just a few hours, they were given mood-setting tasks to complete before they recorded each emotional anecdote. These tasks aimed to set the mood – and possibly to prepare the subjects mentally and emotionally – for the emotional experiences which the subjects were about to relate. They also helped to smoothen the transition between the end of an emotional anecdote and the beginning of another in a completely different emotion, making it less abrupt and awkward for both the researcher and the subject.
The mood-setting task to be completed before relating the Angry anecdote was to play a personal computer (PC) game, called Save Them Goldfish!, supplied by the researcher on a diskette. The game was played on either a nearby PC or, if there was no PC in the immediate vicinity of the recording, the researcher’s notebook. The game was simple, engaging, and most importantly, its pace became more frantic the longer it ran, thereby causing the subject to become more tense and excited. The task ended either when the game finally got too quick for the subject and the subject lost, or – if the subject proved to be very adept at it – at the end of five minutes. The stress-inducing task aimed to agitate the subject so that by the end of it, regardless of whether the subject had actually enjoyed playing it, the subject’s adrenaline had increased and he or she was better able to relate the Angry anecdote with feeling than if he or she had been casually asked to do so.
For the Sad anecdote, the preceding task involved reading a pet memorial found on a website, as well as a tribute to the firemen who perished in the collapse of the United States World Trade Center on September 11, 2001, taken from the December 2001 issue of Reader’s Digest. Because all the recordings were done between July and October 2002, the memories of the September 11 tragedy were still vivid, and the relevance of the tribute was possibly renewed since it was around the time of the first anniversary of the tragedy. The researcher allowed the subject to read in silence for as long as it took, after which the subject was asked which article he or she related to better, and to explain the choice. The purpose of asking the subject to talk about the article which affected him or her more was to attempt to make the topic and the tragedy of the depicted situation more personal for the subject, thereby setting a subdued mood necessary for the Sad anecdote.
The mood-setting task for the Happy anecdote was simply to engage in idle humorous chatter for a few minutes. Since all the subjects are close friends and family of the researcher, the researcher knew which topics were close to the hearts of the subjects and could easily lighten the mood.
There was no mood-setting task for Neutral. The subjects were just asked about their average day at work or school, and, if they had little to say about their average day, asked to explain the nature of their job or schoolwork.
It should be noted that the researcher changed her tone of voice in her task instructions and conversations with the subjects in order to suit each task and the following anecdote. This also served to set the mood for each anecdotal recording.
2.4 Procedure
The subjects were approached (for their consent to be recorded) months before the researcher’s estimated dates of recordings, and when they agreed to be recorded, they were asked to think of personal experiences which had caused them to be Angry, Sad, and Happy. They were not told of the specific research aims of this study, only that each anecdote should take about five to ten minutes to relate, but that if they could not think of a single past event significant enough to take five to ten minutes to talk about, they could relate several short anecdotes. The subjects were not asked to avoid rehearsing their stories as if each was a story-telling performance, because the researcher assumed – correctly – that the subjects would not even attempt to do so due to their own busy schedules. In fact, in one case, the subject even decided on his anecdotes no more than an hour before the actual recording session.
For the recording, the subjects could pick any place of recording in which they felt most comfortable, provided the surroundings were quiet with minimal interruptions. Five of the subjects were recorded in their own homes while one was recorded in the Research Scholar’s Room at the university. The subjects could sit or rest anywhere during the recording as long as they did not move about too much while they were being recorded. They could also have props or memoirs if they felt that the objects would be helpful and necessary. A sensitive, unobtrusive PZM microphone (model: Sound Grabber II) was placed between the subject and the researcher, and the recording was done on a Sony mini-disc recorder (model: MZ-R55).
Before the start of the recording, the subjects were assured that they were not being interviewed and did not need to feel awkward or stressed; they were merely conversing with the researcher as they normally do and just had some personal stories to tell. They were reminded to speak in English and to avoid using any other languages as far as possible. Ample time was given for them to relax so that they would speak as naturally as possible, and they were told they did not have to watch their language and could use expletives if they wanted to.
The order of the anecdotes told by each subject was fixed: Neutral, Sad, Angry, then Happy. This order seemed to work because it was easy (on both the researcher and the subject) to start a recording by asking the subject to describe a day at work. Furthermore, subjects seemed to be able to talk at length when describing the nature of their (career or school) work because they wanted to be clearly understood, and this period of time was useful for the subjects to get accustomed to speaking in the presence of a microphone, no matter how inconspicuous. It was noticed that the subjects quickly learned to ignore the microphone and could engage in natural conversation with the researcher for most of the recording. In fact, the majority of the subjects were comfortable enough to become rather caught up, emotionally, in telling their anecdotes; one subject – a close friend of the researcher – even broke down during her Sad anecdote, and then was animatedly annoyed during her Angry anecdote 45 minutes later.
After the end of each anecdote and before the mood-setting task of the next, the subjects were always asked if they wanted to take a break, since their anecdotes could sometimes be rather lengthy. On average, each subject took about two hours to complete his or her recording of anecdotes.
2.5 A pilot recording
A pilot recording was conducted to test and improve on the effectiveness of the mood-setting tasks and the general format of a recording session. Despite a couple of minor flaws in the initial recording design, which are described in the following paragraphs, the pilot recording is included as data because the subject was very open and honest with her emotions while she was relating her various personal experiences.
For Neutral data elicitation, the plan was originally to ask subjects to describe their surroundings. However, the pilot recording revealed that the subject would speak slowly and end with rising intonation for each observation she made, as if she was reciting a list, which did not sound natural. But when the subject came to a jigsaw puzzle of a Van Gogh painting on her wall and was asked more about it, her speech flowed naturally (and in a neutral tone) as she explained in detail the history of the painting and the painter. Because the subject has a strong interest in Art and is also a qualified Art teacher, it was realised that it was more effective to ask subjects to explain something which was familiar to them rather than to ask for a visual description of the surroundings. Hence the prompt for Neutral was changed to asking subjects about their average day and possibly asking them to elaborate on their work.
The mood-setting task for the Sad recording initially consisted of two articles on the September 11, 2001 tragedy: a tribute to the firemen, and a two-page article on several families who had exchanged last words with their loved ones on Flight 93 – both of which were taken from the December 2001 issue of Reader’s Digest. Subjects were then supposed to be asked what they thought was most regrettable about the tragedy. The subject for the pilot recording ended up expressing her political opinion, but as mentioned, she was emotionally honest when she related her Sad personal experience (to the extent of weeping at certain points of her tale), and thus her recording was still suitable for use as data despite the fact that the task did not serve its purpose. Following the suggestion of the subject, the longer article was replaced by a pet memorial, which would be an effective mood-setting task for subjects who are animal lovers, or who have or have had pets.
2.6 The perception test
The perception test was taken by 15 males and 15 females, all students of NUS and between 22 and 25 years of age. The listeners were given listening test sheets on which all the utterances were written out – without indication of who the speakers were – so that they could read while they listened, in case they could not make out the words in the utterances. The listeners were told that they would hear 72 utterances from different speakers, and that the utterances were pre-recorded in a random order but in the sequence as printed on the test sheets. They were given clear instructions that they would hear each utterance only once, after which they would have approximately ten seconds to decide whether it sounded Angry, Sad, Happy, or Neutral, and that they had to indicate their choice by ticking the appropriate boxes corresponding to the emotions. The listeners were also reminded to judge the utterances based on the manner – instead of content – of expression.
2.6.1 Extracts for the test
The speech extracts for the perception test were taken from all the recordings of the three male and three female subjects. Three utterances were taken from each of the Angry, Sad, Happy, and Neutral recordings of each subject, making a total of 72 utterances for the entire perception test. Two of the three utterances were extracted from the sections of the recordings which were considered very expressive, and one was extracted from the sections which were considered somewhat expressive (cf. Chapter Three, section 3.2.1.1, regarding segmenting recordings according to expressiveness). The utterances were randomly chosen from the expressive sections by the researcher.
All the utterances started and ended with a breath pause, indicating the start and end of a complete and meaningful expression. It was ensured that they consisted only of clear-sounding speech, and lasted at least two full seconds. This was so that the listeners could discern what was being said, and so that the utterances were not too short for the listeners to perceive anything.
The test lasted about 15 minutes. However, it was felt that the length of time for the test was sufficient, as increasing the number of utterances for each emotion per subject would result in having too many utterances for the listeners to listen to.
2.6.2 Test results
With 30 listeners judging three utterances from each of the six subjects, the total number of possible matches for each of the four emotions is 540. (A match occurs when the emotion perceived by the listener is the same as the intended emotion of the anecdote from which the utterance was extracted.) A breakdown of the results is shown in the table below, and illustrated by the bar chart following the table.
It should be stressed that the perception test is done to verify that the researcher is able to identify utterances which are representative of general opinion (on the emotion perceived), not to determine the specific utterances from which tokens for analysis are later taken. The a priori cut-off for acceptance that the researcher’s choices are accurate is set at 60%, which means that as long as there are more than 60% matches in an emotion, it is concluded that the researcher is able to accurately pick utterances which listeners in general feel are representative of that emotion. This in turn means that it is acceptable for the researcher to rate the expressiveness of the sections of recordings and also to (randomly) select the tokens for analysis. However, if less than 60% of matches are made in an emotion in the perception test, it means that the researcher’s perception of emotions is not similar to that of listeners in general, and independent raters of the expressiveness of the sections of recordings will be needed before the researcher can select tokens for analysis from those sections.
Table 2.1: Results of perception test
Intended emotion   Matches          Misidentifications
Angry              519 (96.11 %)    13 Neutral (2.41 %); 8 Happy (1.48 %)
Sad                439 (81.30 %)    99 Neutral (18.33 %); 2 Angry (0.37 %)
Happy              371 (68.70 %)    138 Neutral (25.56 %); 22 Angry (4.07 %); 9 Sad (1.67 %)
Neutral            488 (90.37 %)    9 Angry (1.67 %); 20 Happy (3.70 %); 23 Sad (4.26 %)

Figure 2.1: Bar chart of results of perception test
[Bar chart omitted: number of matches and misidentifications (Sad, Happy, Angry, Neutral, Match) by intended emotion of anecdote.]
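The match percentages in Table 2.1 and the 60% acceptance criterion described earlier can be checked with a small computation (the function names are mine; the counts are those reported above, out of 540 judgments per emotion):

```python
def match_rate(matches, total=540):
    """Proportion of listener judgments matching the intended emotion."""
    return matches / total

def choices_acceptable(matches, total=540, cutoff=0.60):
    """True if the match rate exceeds the a priori 60% cut-off."""
    return match_rate(matches, total) > cutoff

# Matches per intended emotion, from Table 2.1
# (30 listeners x 3 utterances x 6 subjects = 540 judgments per emotion)
matches = {"Angry": 519, "Sad": 439, "Happy": 371, "Neutral": 488}
for emotion, m in matches.items():
    print(emotion, round(100 * match_rate(m), 2), choices_acceptable(m))
# Angry 96.11 True / Sad 81.3 True / Happy 68.7 True / Neutral 90.37 True
```

All four emotions clear the 60% cut-off, which is what licenses the researcher's own ratings of expressiveness in the analysis that follows.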
As can be seen from the table and chart of the results, there is a high accuracy of recognition for Angry, Neutral, and Sad, and a percentage of matches large enough for Happy, such that it can be concluded that the researcher’s perception of emotions is an accurate reflection of that of listeners in general.
One possible reason for the large number of matches across emotions is that the test only required listeners to choose from four emotions which were relatively dissimilar from one another. Taking the dimensional approach, it can be explained that when listeners are asked to recognise emotions with relatively different positions in the underlying dimensional space, they only have to infer approximate positions on the dimension in order to make accurate discriminations (Pittam & Scherer, 1993).
Another reason could be that despite the reminder from the researcher to judge based on manner of expression, the semantics of the utterances might still have played a part in affecting the decisions of the listeners. However, these reasons do not discount the fact that listeners are generally able to infer emotions from voice samples, regardless of the verbal content spoken, with a degree of accuracy that largely exceeds chance (Johnstone & Scherer, 2000:228).
It is interesting to note that Neutral forms a large fraction of the misidentifications of the Angry, Sad, and Happy utterances. This is probably due to the fact that people are not extremely expressive when recounting past experiences. Days, months, or even years might have passed since the event itself, and hence the emotions expressed are possibly watered down to some extent. It is therefore understandable that when these expressions are extracted in the form of short utterances and judged without the help of context, they can sound like Neutral utterances. However, while the absolute differences between the emotions might be smaller because they may all be closer to Neutral, the relative differences between the emotions are still accurate representations of the relative differences between fully-expressed emotions.
Generally, the results of this perception test show that a large percentage of listeners could identify the intended emotion of the utterances (likewise perceived by the researcher as expressively uttered in the respective emotion) in the perception test. Another possible interpretation of the results is that a large percentage of the utterances chosen for the perception test could be correctly identified by the emotion expressed. It can thus be concluded that the researcher’s choices of data are generally accurate and representative of general opinion, and thus the researcher can rate the expressiveness of the sections of recordings from which tokens of sound segments are analysed.
CHAPTER THREE
PRE-ANALYSIS DISCUSSION
3.1 Speech sounds explained
Before the presentation and analysis of data, it is necessary to briefly explain how the speech sounds that are examined in this study are produced in general.
3.1.1 Vowels
In the course of speech, the source of a sound produced during phonation consists of energy at the fundamental frequency and its harmonics. The sound energy from this source is then filtered through the supralaryngeal vocal tract (Lieberman & Blumstein, 1988:34ff; Kent & Read, 2002:18). The articulators such as the tongue and the lips are responsible for the production of different vowels (Kent & Read, 2002:24). The oral cavity changes its shape according to the tongue position and lip rounding when we speak, and the cavity is shaped differently for different vowels. It is when the air in each uniquely shaped cavity resonates at different frequencies simultaneously that its characteristic sounds are produced (Ladefoged, 2001:171). These frequencies then translate as dark bands of energy – known as formants – at various frequencies on a spectrogram. The lowest of these bands is known as the first formant or F1, and the subsequent bands are numbered accordingly (2001:173).
The first and second formants (F1 and F2) are most commonly used to exemplify the articulatory-acoustic relationship in speech production, especially that of vowels (see Figure 3.1). F1 varies inversely with vowel height: the lower the F1, the higher the vowel. F2 is generally related to the degree of backness of the vowel (Kent & Read, 2002:92): F2 is lower for the back vowels than for the front vowels. However, the degree of backness correlates better with the distance between F1 and F2, i.e. F2-F1 (Ladefoged, 2001:177): its value is higher for front vowels and lower for back vowels.
Figure 3.1: A schematic representation of the articulatory-acoustic relationships
Figure adapted from Ladefoged (2001:200)
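The articulatory-acoustic relationships sketched in Figure 3.1 can be illustrated with a toy comparison (the formant values here are invented round numbers for illustration, not measurements from this study):

```python
def higher_than(f1_a, f1_b):
    """Vowel A is higher (closer) than vowel B if A's F1 is lower."""
    return f1_a < f1_b

def backness_index(f1, f2):
    """F2 - F1: higher for front vowels, lower for back vowels."""
    return f2 - f1

# Hypothetical formant values (Hz): a close front vowel vs. an open back vowel
close_front = {"F1": 300, "F2": 2300}   # roughly [i]-like
open_back = {"F1": 700, "F2": 1100}     # roughly [ɑ]-like

print(higher_than(close_front["F1"], open_back["F1"]))        # True: lower F1 = higher vowel
print(backness_index(close_front["F1"], close_front["F2"]))   # 2000: large value, front vowel
print(backness_index(open_back["F1"], open_back["F2"]))       # 400: small value, back vowel
```

Comparisons of this kind (within a speaker, across emotions) are what the formant measurements in this study are used for.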
3.1.2 Consonants
Consonants differ significantly among themselves in their acoustic properties, so it is easier to discuss them in groups that are distinctive in their acoustic properties (Kent & Read, 2002:105). In this study, the groups of consonants examined are the stops, fricatives, and affricates. The following sub-sections briefly explain these groups of consonants articulatorily and acoustically so as to provide a general understanding of their differences and why they cannot be treated simply as one large class.
3.1.2.1 Stops
A stop consonant is formed by a momentary blockage of the vocal tract, followed by a release of the pressure. When the vocal tract is obstructed, little or no acoustic energy is produced, but upon the release, a burst of energy is created as the impounded air escapes. In English, the blockage occurs at one of three sites: bilabial, alveolar, or velar (the glottal is usually considered separate from the rest) (Kent & Read, 2002:105-6). The stops examined in this study are the voiceless bilabial [p], alveolar [t], and velar [k].
Stops are typically classified as either syllable-initial prevocalic or syllable-final postvocalic (2002:106). Syllable-initial prevocalic stops are produced by, first, a blockage of the vocal tract (stop gap), followed by a release of the pressure (noise burst), and finally, formant transitions. Syllable-final postvocalic stops begin with formant transitions, followed by the stop gap, and finally, an optional noise burst. Only syllable-initial prevocalic stops are examined in this study since syllable-final postvocalic stops do not always have a noise burst and are therefore not reliable cues.

The stop gap is an interval of minimal energy because little or no sound is produced, and for voiceless stops, the stop gap is virtually silent (2002:110). Silent segments can sometimes be pauses instead of stop gaps, and thus stop gaps are relatively difficult to identify and quantify in a spectrogram, especially when a stop follows a pause.
The noise burst can be identified in a spectrogram by a short spike of energy, usually lasting no longer than 40 milliseconds (2002:110).
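As an illustration of the 40 ms criterion, a crude burst detector over a frame-by-frame energy envelope might look like the following (a sketch only; the frame length, threshold, and envelope values are invented, and real burst detection is considerably more involved):

```python
def find_burst(energy, frame_ms, threshold, max_ms=40):
    """Return (start, end) frame indices of the first above-threshold run
    short enough to be a stop burst, or None if no such run exists."""
    start = None
    for i, e in enumerate(list(energy) + [0.0]):  # sentinel closes a final run
        if e >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * frame_ms <= max_ms:
                return (start, i)
            start = None  # run too long to be a burst; keep scanning
    return None

# Hypothetical 5 ms frames: silence, a 15 ms spike (the burst), then weak aspiration
envelope = [0.02, 0.03, 0.90, 0.85, 0.70, 0.10, 0.12]
print(find_burst(envelope, frame_ms=5, threshold=0.5))  # (2, 5): frames 2-4, i.e. 15 ms
```

The sketch captures only the duration criterion; in this study bursts were located by inspection of the spectrogram, not automatically.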
Formant transitions are the shifts of formant frequencies between a vowel and an adjacent consonant. In the case of the syllable-initial prevocalic stops examined in this study, formant transitions are the shift of formant frequencies from their values for the stop to those for the vowel. Considering that formant transitions are mentioned as part of the acoustic properties of stops, it was decided that formant transitions should be included in the measurements of the voiceless stops in this study. The manner of inclusion of formant transition values will be explained in the later section on consonant measurements.
There are a couple of acoustic properties of stops which are commonly measured, one of which is the spectrum of the stop burst, which varies with the place of articulation (Halle et al., 1957; Blumstein & Stevens, 1979; Forrest et al., 1988). Kent & Read (2002:112-5) give a brief overview of some of the studies that have been done on the identification of stops from their bursts, and surmise that correct identification of stops is possible if several features are examined, namely the spectrum at burst onset, the spectrum at voice onset, and the time of voice onset relative to burst onset (VOT).
VOT, or voice onset time, is another acoustic property commonly associated with the measurement of stops. It is the time interval “between the articulatory release of the stop and the onset of vocal fold vibrations” (2002:108). When the voicing onset precedes the stop release (usually the case for voiced stops), the VOT has a negative value. A positive value is obtained when the onset of voicing slightly lags the articulatory release (usually so for voiceless stops).
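The sign convention for VOT can be stated directly in code (a sketch; the time values are invented, in seconds from the start of a recording):

```python
def vot(release_time, voicing_onset_time):
    """Voice onset time: onset of vocal fold vibration minus stop release.
    Negative when voicing precedes the release (typically voiced stops);
    positive when voicing lags the release (typically voiceless stops)."""
    return voicing_onset_time - release_time

print(round(vot(0.100, 0.160), 3))  # 0.06: voicing lags the release, voiceless-like
print(round(vot(0.100, 0.080), 3))  # -0.02: voicing precedes the release, voiced-like
```

Since only voiceless stops are measured in this study, the VOT values reported later are expected to be positive.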
These acoustic properties will be referred to in the later section on consonant measurements, where the choice of acoustic properties measured, as well as the methods by which the measurements are made, are explained.
3.1.2.2 Fricatives
Fricative consonants are formed by air passing through a narrow constriction maintained at a certain place in the vocal tract, which then generates turbulence noise (Kent & Read, 2002:121-2). A fricative can be identified in a spectrogram by its relatively long period of turbulence noise. And as with stops, formant transitions join