Chapter Seven
The Voice
Chapters 5 and 6 were about social cues that engage the eyes. This chapter is devoted to the ear. Chapter 7 completes the discussion of characters' basic social equipment with discussion of the voice—the rich messages that people convey through how they say things.1 The chapter includes an overview of the kinds of social cues that the voice conveys, with many listenable examples from games (including Warcraft III: Reign of Chaos, Final Fantasy X, The Sims™, Grim Fandango, and Curse of Monkey Island), and offers design tips for considering the aural side of social signals when crafting character voices. Chapter 7 also includes discussion of some future-facing voice technology and an interview with two pioneers in using emotion detection from voice cues to adjust interfaces.
Before reading this section, take a moment to listen to the first two voice samples on the DVD (Clips 7.1 and 7.2). While listening to each person, try to form a mental picture: How old are they? What gender? Is this person of high or low status? Are they in a good or bad mood? Then see Section 7.9 for photos of the speakers. Most likely you correctly identified the majority of these visible traits from voice alone.
Listening to a person's voice on the telephone, you can often make a good guess about age, gender, social status, mood, and other characteristics without any visual cues to help. Even if the person is speaking another language and you cannot understand the meaning of the speech, you can still get pretty far in assessing these qualities. How is this possible?
Researchers point to the evolutionary roots of speech in the grunts and calls of our primate ancestors. There are striking similarities in the vocal characteristics of fright, anger, and dominance, among other social cues, when one compares primate and human voices. Researchers who asked participants in a study to listen to male macaque monkeys found that more than 80% of the listeners could accurately identify what the dominance calls meant (Tusing and Dillard 2000, 149).

1 Analyzing the social meaning of what characters say moves into the territory of linguistics,
Social scientists refer to the information that is not conveyed by the words in speech as paralinguistic cues. A large proportion of the meaning in everyday conversation emerges through paralinguistic cues—shifts in voice quality while speaking, pauses, grunts, and other nonlinguistic utterances. Paralinguistic cues play an even bigger role in communication between people who already know each other well—a well-placed sigh or lack of a heartfelt tone conveys volumes.

To make characters seem richly human in their communication, then, a designer should have a solid understanding of what they are conveying with how they say things.
7.2.1 The Mechanics of Speech
To speak, a person pushes air from the lungs through the larynx, mouth, and nose. The pitch and the qualities of sounds are affected in two different ways: phonation and articulation. Phonation is the way a person moves the larynx itself to make the initial sound. When the shape of the muscle folds in the larynx (which used to be called the vocal cords) is altered, it produces different sound pitches (called fundamental frequencies) and also different sound qualities, such as breathiness or harshness. These qualities can shift due to a person's emotional state—tenseness, tiredness, depression, and excitement all can have effects on phonation. Articulation is when a person uses the natural resonance of the mouth, nose, and even of the chest cavity, as well as moving the tongue and lips and palate, to alter the sound as it comes out. People are very sensitive to shifts in articulation—for example, a person can "hear" a smile in another's voice, in part because the shift in lip shape when speaking affects the articulation of the sound (see Figure 7.1). Listen to audio Clips 7.3 and 7.4 on the DVD. Can you tell which recording was made while smiling? See Section 7.9 for the answer.
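Fundamental frequency can also be measured directly from recorded audio, which is one way a game or tool could track the pitch contour of a voice. The sketch below is a minimal illustration of a common approach (autocorrelation over a short frame), written in Python with NumPy; the sample rate, pitch bounds, and voicing threshold are illustrative assumptions, not values from this chapter.

    import numpy as np

    def estimate_f0(frame, sample_rate=16000, fmin=75.0, fmax=400.0):
        """Rough fundamental-frequency estimate for one short audio frame,
        using autocorrelation. Returns None if the frame seems unvoiced."""
        frame = np.asarray(frame, dtype=float)
        frame = frame - np.mean(frame)            # remove DC offset
        corr = np.correlate(frame, frame, mode="full")
        corr = corr[len(corr) // 2:]              # keep non-negative lags

        if corr[0] <= 1e-12:                      # silent frame
            return None

        # Search only lags that correspond to plausible speech pitch.
        min_lag = int(sample_rate / fmax)
        max_lag = int(sample_rate / fmin)
        if max_lag >= len(corr):
            return None
        best_lag = min_lag + int(np.argmax(corr[min_lag:max_lag]))

        # Weak periodicity: treat the frame as unvoiced (noise, fricatives).
        if corr[best_lag] < 0.3 * corr[0]:
            return None
        return sample_rate / best_lag

Tracking this estimate frame by frame yields the pitch contour that, together with loudness and timing, carries much of the social and emotional signal described in the rest of this chapter.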
As with facial expression and gesture, some of what people hear in others' voices comes from their physical qualities and their body's involuntary reactions to circumstances. Some comes from learned strategies and responses to social circumstances. For example, gender and age come across in voice because of physical qualities of the person's vocal equipment itself (which can be a problem for people whose voices fall outside the usual range for their gender or age group). Mood and emotion are signalled involuntarily (at least in part) because of changes in vocal production as the person's nervous system reacts—for example, the dry mouth and speedier heart rate of anxiety also have effects on the muscles in the larynx and on breathing itself. However, a person can also mold the tone of his or her voice in some ways, adopting a pacifying, pleading, arrogant, or neutral tone of voice using intonation and rhythm (referred to as prosody by researchers). Failing to adopt the proper social tone of voice is a communication in and of itself.
7.2.2 The Social Signals in Voice
Emotion in Voice
Emotions underpin decision-making, including social action and reaction (see [Damasio 1994] for a fascinating account of the role of emotions in thinking). Knowing that another person is angry is crucial to understanding how they are interpreting social actions and the world at large, thus helping to predict what they might do next. Failure to recognize emotional expression is a serious liability in human interaction—it is in fact a symptom of some disorders in the autism spectrum.
FIGURE 7.1 Both phonation—the action of the larynx (vocal cords)—and articulation—the shaping of the mouth, tongue, and lips—create the subtle alterations in tone that carry social and emotional information (based on Kappas, Hess, and Scherer 1991).

Chapter 5 touched upon Ekman and colleagues' work on recognizing facial expressions of emotion. There has also been extensive work on the expression of emotion in the voice. Voice researchers have found, when they look for consistent signatures of emotions, that there are clear patterns (see [Kappas, Hess, and Scherer 1991; Cahn 1990; and Burkhardt and Sendlmeier 2000] for more detail on the taxonomy that follows; a rough code sketch of this mapping appears just after the list):
• Anger (hot): Tense voice, faster speech rate, higher pitch, broader pitch range.
• Anger (cold): Tense voice, faster speech rate, higher fundamental frequency and intensity, tendency toward downward-directed intonation contours.
• Joy: Faster speech rate, raised pitch, broader pitch range, rising pitch pattern.
• Fear: Raised pitch, faster speech rate, broadened range, high-frequency energy.
• Boredom: Slower speech rate, additional lengthening of stressed syllables, lowered pitch, reduced pitch range and variability.
• Sadness (crying despair): Slower speech rate, raised pitch, narrowed pitch range, narrowed variability.
• Sadness (quiet sorrow): Slower speech rate, lowered pitch, narrower pitch range, narrower variability, downward-directed contours, lower mean intensity, less precision of articulation.
• Depression: Lower intensity and dynamic range, downward contours.
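For designers working with parametric speech synthesis or runtime voice processing, the taxonomy above can be captured as a small lookup table that biases a neutral voice toward an emotion. This is a hypothetical sketch rather than a published model: only the direction of each adjustment (faster or slower, higher or lower, wider or narrower) follows the research summarized above, and the specific multipliers are placeholders to be tuned by ear.

    # Directional prosody adjustments per emotion, relative to a neutral voice.
    # "rate" is a speech-rate multiplier, "pitch" a fundamental-frequency
    # multiplier, and "range" a pitch-range multiplier. Values are placeholders.
    EMOTION_PROSODY = {
        "anger_hot":   {"rate": 1.3,  "pitch": 1.2,  "range": 1.4},
        "anger_cold":  {"rate": 1.2,  "pitch": 1.1,  "range": 0.9},
        "joy":         {"rate": 1.2,  "pitch": 1.15, "range": 1.3},
        "fear":        {"rate": 1.3,  "pitch": 1.25, "range": 1.3},
        "boredom":     {"rate": 0.8,  "pitch": 0.9,  "range": 0.7},
        "sad_despair": {"rate": 0.8,  "pitch": 1.1,  "range": 0.75},
        "sad_quiet":   {"rate": 0.75, "pitch": 0.85, "range": 0.7},
        "depression":  {"rate": 0.8,  "pitch": 0.9,  "range": 0.6},
    }

    def prosody_for(emotion, intensity=1.0):
        """Blend an emotion's adjustments toward neutral by intensity (0..1)."""
        neutral = {"rate": 1.0, "pitch": 1.0, "range": 1.0}
        base = EMOTION_PROSODY.get(emotion, neutral)
        return {k: 1.0 + (v - 1.0) * intensity for k, v in base.items()}

A synthesizer, or a real-time pitch and time-stretch effect applied to recorded lines, could scale its neutral settings by these factors, with the intensity value driven by game state.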
Notice the similarities among emotions—fear, anger, and joy all seem to be signalled by faster speech, higher pitch, and more range. In contrast, quiet sadness, depression, and boredom share slowing of pace, lower pitch, and less variability. These effects can be traced back to what is going on in the person's nervous system. The arousal of a person's sympathetic nervous system, which causes things like increased heart rate and sweating, also causes these changes in the voice. When a person's parasympathetic system, which decreases blood pressure and slows heart rate, moves into action, it also shifts what happens in the voice itself. A glance back at the Laban movement graphs in Chapter 6 (Figures 6.12 and 6.13) shows that body movement style also seems to be modulated in this way.
So how do people learn to tell apart the high-energy or low-energy emotions in the voice? Certainly they use context contributed by the words themselves, but people are also able to tell what position the mouth is in, based upon sound. As mentioned above, a person can "hear" a smile. Voice researchers have detected different patterns of intonation as well—such as the characteristic rising pitch pattern of joy. It is also the case that people acclimate to one another's vocal patterns—knowing someone well includes knowing how they, in particular, signal sadness or joy with their voice.

One game that takes full advantage of the power of paralinguistic cues in conveying emotion is The Sims™. Sim characters speak to one another, but their words are entirely incomprehensible. Simlish may be gibberish, but it is laden with emotional signals, and it allows the player to draw conclusions about how his or her Sim is feeling in general, and in relation to other Sim characters (see Figure 7.2). For example, listen to Clip 7.5. As the Sim characters move from joy to jealousy, it is easy to follow along despite the lack of words.
Interestingly, researchers have found connections between the expression of emotion in the human voice and strategies for evoking emotion through music. Two researchers performed a meta-analysis comparing research on evoking specific emotions with music to work on emotions in speech. Data "strongly suggest that there are emotion-specific patterns of acoustic cues that can be used to communicate discrete emotions in both vocal and musical expressions of emotion" (Juslin and Laukka 2003, 799). This makes sense if one considers that the playing of musical instruments tends to reflect the muscular tension and general arousal state of the performer—creating a bridge to the listener into a particular emotional state. Some games, for example, Grim Fandango, make use of this connection between music and emotion to heighten the player's experience of a character's emotional reactions. See Figure 7.3 and Clip 7.6, in which Manny's boss berates him for a mistake. Notice the music in the background, which displays some of the same aural qualities as the boss's tirade.

FIGURE 7.2 The Sim language—"Simlish"—uses paralinguistic cues of emotion (listen to Clip 7.5). The Sims™ Unleashed image © 2005 Electronic Arts Inc. The Sims is a registered trademark of Electronic Arts Inc. in the U.S. and other countries. All rights reserved.

FIGURE 7.3 Grim Fandango uses music to heighten the player's reaction to an NPC's tirade (listen to Clip 7.6). © 1998 Lucasfilm Entertainment Company Ltd. All rights reserved.
Social Context and Identity
As researchers begin to assemble a more detailed picture of how the voice contributes to social interaction, one thing they are realizing is that emotion is not necessarily the predominant message communicated. Researchers in Japan who gathered a large body of recorded speech by asking people to wear headsets around in everyday life found few examples of strong emotion in voices. Day to day, people tended to keep their emotional reactions mostly to themselves. What did show up were big differences in patterns of voice depending upon who the person was speaking to—adjustment based on social roles and relationships (Campbell 2004)—and, of course, individual differences in vocal style that emerged from each person's own personality and physical qualities.
Some traces of social roles and relationships in voices can be broken down along the dimensions first discussed in Chapter 2: cues of dominance and of friendliness. People demonstrating dominance tend to lower their voice somewhat and to construct shorter utterances in general. They may sometimes speak more loudly, depending upon the situation. Showing submission with voice involves using a softer, more highly pitched voice, and subordinates tend to say more. As was mentioned earlier in this chapter, these general vocal contours of dominance are true of other primates as well as people. Clips 7.7 and 7.8 demonstrate the difference between dominant and submissive voices. Although the butler (Raoul) is initially very dominant, he moves to submissive obsequiousness in the second clip once Manny has a pass to the VIP lounge (see Figure 7.4). In general, Grim Fandango makes brilliant use of vocal dominance cues to heighten comic effect. Other examples of games that use dominance cues in similar ways are The Curse of Monkey Island (Figure 7.5, Clip 7.9) and Warcraft III (Figure 7.6, Clip 7.10).

FIGURE 7.4 Manny tries to gain entrance to the VIP lounge (from Grim Fandango). Listen to Clips 7.7 and 7.8 for the shift in the butler's voice once he knows Manny will be admitted. © 1998 Lucasfilm Entertainment Company Ltd. All rights reserved.
Friendliness is shown in various ways. Meeting with a friend usually leads to warmth and energy in the voice, the signals of joy. Conversation among close friends includes more range of emotion than between more distant acquaintances—more revelation of personal emotional state, and empathizing with the other's state through modulation of your own voice. Intimacy with someone is often reflected with a more breathy quality in the voice. For an example of the breathiness of intimacy, contrast the clips from Grim Fandango above with Clip 7.11—a conversation between Manny and his love interest, Meche. Both Manny and Meche have a great deal of breathiness in their voices.

FIGURE 7.5 The Curse of Monkey Island also makes use of dominance cues to heighten comic effect (listen to Clip 7.9). © 1997 Lucasfilm Entertainment Company Ltd. All rights reserved.

FIGURE 7.6 The peons in Warcraft III are charmingly submissive in their voices and responses to player commands (listen to Clip 7.10). Warcraft III: Reign of Chaos provided courtesy of Blizzard Entertainment, Inc.
Individual personality can come through in the voice as characteristic patterns of emotion and energy. For example, in Grim Fandango, the hat-check girl is a high-energy character who makes rapid turns from enthusiasm to anger (see Figure 7.7, Clip 7.12).
Social Interaction Logistics
Vocal modulations during an interaction show that a person is listening and comprehending, and also help to orchestrate turn-taking in conversation. "Back-channel" responses such as "uh hunh" make the speaker feel the listener is engaged with what is happening. People also use such noises to indicate that they are still thinking, or to express a range of emotions in response to a statement before they can put them into words.
Games with elaborate and extensive cut scenes, such as Final Fantasy X, make artful use of these sorts of cues to reveal the nuances of relationships among characters (see Figure 7.8).
Back-channel responses may be one reason that people enjoy using voice-enabled multiplayer online games as well—players can hear the triumph or despair in one another's voices as they play, heightening the experience itself, and can use vocal cues (e.g., "whoa!" or "uhhhh…") to help guide one another's actions.

FIGURE 7.7 The hat-check girl in Grim Fandango has a distinctive way of speaking (listen to Clip 7.12). © 1998 Lucasfilm Entertainment Company Ltd. All rights reserved.
7.3 Design Pointers
Missed Opportunity: Real-Time Vocal Adaptation
Currently, NPCs rarely offer real-time back-channel sounds and comments during game play. Revealing more complex awareness of a player, as well as reactions to the player through emotion- and information-laden audio cues as play situations unfold, could greatly increase the sense of social presence and connection a player feels toward an NPC. Imagine a sidekick or a just-rescued character gasping as the player executes a tricky move, or making a subtle noise of doubt and hesitation as the player starts to move in a fruitless direction.
This will become increasingly practical as voice synthesis becomes more and more realistic, eliminating the need for a huge body of prerecorded audio files (see Section 7.8 for more information about speech synthesis).
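As a thought experiment, here is a hedged sketch of how such back-channel reactions might be wired up: game events map to short vocal barks, with a cooldown and some randomness so the companion does not chatter constantly. The event names, clip identifiers, and timing values are hypothetical and not drawn from any particular engine.

    import random
    import time

    class BackchannelReactor:
        """Chooses short vocal reactions for a companion NPC in response to
        game events, throttled so they stay occasional and feel natural."""

        # Hypothetical event-to-bark mapping; clip names are placeholders.
        REACTIONS = {
            "player_tricky_move":  ["gasp_01", "whoa_02", "impressed_01"],
            "player_wrong_way":    ["hmm_doubt_01", "uh_hesitant_02"],
            "player_took_big_hit": ["wince_01", "worried_01"],
            "player_won_fight":    ["relieved_laugh_01", "cheer_02"],
        }

        def __init__(self, cooldown_seconds=6.0, chance=0.6):
            self.cooldown = cooldown_seconds
            self.chance = chance
            self._last_time = -float("inf")

        def react(self, event):
            """Return an audio clip id to play, or None to stay quiet."""
            now = time.monotonic()
            clips = self.REACTIONS.get(event)
            if not clips or now - self._last_time < self.cooldown:
                return None
            if random.random() > self.chance:
                return None   # skip some reactions so they stay surprising
            self._last_time = now
            return random.choice(clips)

In use, the game would call something like reactor.react("player_tricky_move") when its own logic detects the event, and play whatever clip id comes back on the sidekick's voice channel.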
FIGURE 7.8 The cut-scenes in Final Fantasy X use vocal cues to heighten the player's experience of the NPCs' emotions and unfolding relationships. © 2001 Square Enix Co., Ltd. Character Design: Tetsuya Nomura.
7.3.2 Give NPCs Audio Personality
If a character has strong personality traits, make sure they come through in the voice as well. It is possible to create humorous contrasts between voice and appearance (as in the case of Daxter from Jak and Daxter, discussed in Chapter 2; see Figure 7.10).
7.3.3 Use Voice (and Music) as an Emotional Regulator
Character voices can make a player calmer, more enthusiastic, or triumphant, so use voice to shape a player's emotional experience of game play: light-hearted words from a sidekick after an intense battle sequence, for example, or a gruff pep talk from a guide or mentor if things went badly. Consider using music to bolster the effects of an NPC's words, as well as to help manage player emotions as game play unfolds. (A rough code sketch of this pairing appears after the figure captions below.)

FIGURE 7.9 Final Fantasy X uses vocal cues to heighten the player's sense of characters' relationships. © 2001 Square Enix Co., Ltd. Character Design: Tetsuya Nomura.

FIGURE 7.10 Daxter (from Jak and Daxter) has a dominant voice and mannerisms and a small body (see Clip 2.7). Jak and Daxter: The Precursor Legacy is a registered trademark of Sony Computer Entertainment America Inc. Created and developed by Naughty Dog, Inc. © 2001 Sony Computer Entertainment America Inc.
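One way to act on this pointer is to let a single post-encounter mood choice select both the NPC's voice line and a music cue, so the two channels reinforce each other. A minimal sketch, with invented asset names and a deliberately simple notion of game state:

    # Hypothetical pairings of NPC voice lines and music cues, chosen to steer
    # the player's emotional state after an encounter.
    MOOD_CUES = {
        "wind_down": {"voice": "sidekick_lighthearted_03", "music": "calm_theme_a"},
        "rally":     {"voice": "mentor_gruff_peptalk_01",  "music": "resolve_theme_b"},
        "celebrate": {"voice": "sidekick_cheer_02",        "music": "victory_fanfare"},
    }

    def cues_after_encounter(player_won, was_intense):
        """Pick a voice line and music cue to regulate the player's emotion."""
        if not player_won:
            mood = "rally"          # gruff pep talk after a loss
        elif was_intense:
            mood = "wind_down"      # light-hearted words after a hard fight
        else:
            mood = "celebrate"
        return MOOD_CUES[mood]

A real decision would of course draw on much richer state, but the principle is the same: voice and music are selected together, as one emotional gesture.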
7.3.4 Voice Checklist
When specifying the audio for each character in a game, take a moment to consider each type of social cue. As audio assets are created, revisit the criteria to see if the desired qualities are coming through:

• Emotional state: How is this character feeling right now? In general? Toward the player? Toward other NPCs?
• Social status and context: What is this character's relationship to the others in the action? In general? What about right now?
• Interaction logistics with the player and other characters: How does the character acknowledge the actions and reactions of other characters?
7.5 Interview: MIT Media Lab's Zeynep Inanoglu and Ron Caneel

To respond in real time with appropriate emotion, a character needs to know how a player is feeling. Designers can fake this social awareness to some degree, because much is known about the state of the player from the game engine itself—did the player just triumph, get badly beaten, and so forth. However, there is work being done on alternative methods for assessing player emotion. Speech researchers have been working for years to detect the traces of emotions in voices, and increases in processing power and in the understanding of emotion cues in voices are beginning to yield results.

Zeynep Inanoglu and Ron Caneel, graduate students at the MIT Media Laboratory, have created a program and interface for detecting the emotional content of voice messages. The system, called Emotive Alert, looks for vocal patterns indicating valence (positive or negative feelings), activation (level of energy), formality, and urgency. The system is meant to allow the user to sort and prioritize messages and to alert the user to those that are most urgent.
Q: What was your inspiration for creating Emotive Alert?
Emotive Alert was mainly inspired by a seminar that we both attended last spring (2004). Both of us had various experiences working with speech signals, so when we came up with the idea, Professor Rosalind Picard, who was giving the seminar, encouraged us to take on this project. It also helped that Zeynep had access to her group's voicemail system and was already using the voicemail data in other projects.
Q: How did you choose the emotions to analyze from the messages?
In addition to the classical valence-arousal dimensions (Russell 1980) (happy/sad and excited/calm), we chose urgency and formality, since these are more interesting to look at in the voicemail domain. Since our approach only analyzes prosodic speech features (intonation, perceived loudness, rhythm), we hoped that these features would vary sufficiently in the dimensions that we chose.
Q: Could the method you’ve evolved for analyzing the messages be helpful for analyzing “trash talk” among players in an online game-play environment?
Our method can be retrained to detect variances from a given speaker's normal speaking style. Acoustically, one would hope that these variances imply unusual behavior (i.e., trash talk in games). However, to make such systems reliable, a key-word spotting capability should also be incorporated along with acoustic tracking.
Q: Where do you think voice analysis and synthesis are heading next? Will there
be effective real-time analysis of emotion in conversation? What about lifelike synthesis of emotion in voices?
There is a lot of room to grow in both emotional synthesis and analysis. Effective real-time analysis of emotion in conversation is a possibility, depending on what emotional categories we are tracking. The problem is not only an issue of implementation but also of theories of emotion and of available emotion data to train these systems on.
FIGURE 7.11 Emotive Alert analyzes the pitch and energy of a voicemail message, applying emotion models to suggest the predominant emotional tone of the message to the user.
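For readers curious what analyzing prosodic speech features can look like in practice, here is a hedged sketch of the kind of per-message summary a system in the spirit of Emotive Alert might compute: pitch statistics, loudness, and the proportion of pauses. It is not the actual Emotive Alert code; the feature set and the silence threshold are assumptions, and estimate_f0 is the helper sketched earlier in Section 7.2.1.

    import numpy as np

    def prosodic_features(signal, sample_rate=16000, frame_seconds=0.03):
        """Summarize a voice recording with a few prosodic statistics that
        models of activation, valence, or urgency could be trained on."""
        signal = np.asarray(signal, dtype=float)
        hop = int(frame_seconds * sample_rate)
        pitches, energies = [], []
        for start in range(0, len(signal) - hop, hop):
            frame = signal[start:start + hop]
            energies.append(float(np.sqrt(np.mean(frame ** 2))))  # RMS loudness
            f0 = estimate_f0(frame, sample_rate)  # pitch helper from Section 7.2.1
            if f0 is not None:
                pitches.append(f0)

        energies = np.array(energies)
        silence = 0.1 * energies.max() if energies.size else 0.0  # assumed threshold
        return {
            "pitch_mean":  float(np.mean(pitches)) if pitches else 0.0,
            "pitch_range": float(np.ptp(pitches)) if pitches else 0.0,
            "loudness":    float(np.mean(energies)) if energies.size else 0.0,
            "pause_ratio": float(np.mean(energies < silence)) if energies.size else 0.0,
        }

Feeding such feature vectors, labeled by listeners, into even a simple classifier is the general shape of the approach the interviewees describe; the hard parts, as they note, are the emotion categories themselves and the labeled data.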
This chapter described social qualities of voices, including emotion, social context, and identity, and the handling of social logistics in interactions. Design discussion included the power of ongoing vocal feedback, a missed opportunity in making NPCs even more lifelike and engaging, as well as ways to incorporate other social cues into character vocal design. Part IV shifts focus to particular social functions that characters have in games and how these should affect design thinking.
Each person should capture a brief segment from a movie or television show in which two characters are speaking. Take turns listening to (not watching) these brief snippets of dialogue in a group, and have everyone try to identify the relative social status and relationship of the characters, as well as their personality traits, emotional state, and as much of the social context as possible. If there are members of the group fluent in two languages, they should bring snippets from their second language. See if the group can identify status, personality, and emotions regardless of understanding the words. Discuss what it is that you are hearing and which cues are the most legible and accurate (e.g., that the group can most easily identify and agree upon).
7.8 Further Reading

Emotion and Reason
Damasio, A. R. 1994. Descartes' Error: Emotion, Reason, and the Human Brain. New York: Quill (an imprint of HarperCollins Publishers).

Russell, J. A. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39(6):1161–1178.
Voice and Emotion
Bachorowski, J. 1999. Vocal expression and perception of emotion. Current Directions in Psychological Science 8(2):53–57.

Kappas, A., U. Hess, and K. R. Scherer. 1991. Voice and emotion. In Fundamentals of Nonverbal Behavior, eds. R. S. Feldman and B. Rimé, 200–238. Cambridge: Cambridge University Press.

Massaro, D. W., and P. B. Egan. 1996. Perceiving affect from the voice and the face. Psychonomic Bulletin & Review 3(2):215–221.

ISCA. 2003. Speech Communication 40 (1 and 2), April. Special issues on emotion and speech, based upon the ISCA Speech and Emotion workshop.

van Bezooyen, R. 1984. Characteristics and Recognizability of Vocal Expressions of Emotion. Dordrecht, Holland: Foris Publications.
Music and Voice
Juslin, P. N., and P. Laukka. 2003. Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin 129(5):770–814.
Voice and Social Characteristics
Tepper, D. T., Jr., and R. F. Haase. 2001. Verbal and nonverbal communication of facilitative conditions. In Helping Skills: The Empirical Foundation, ed. C. E. Hill. Washington, DC: American Psychological Association.

Tusing, K., and J. Dillard. 2000. The sounds of dominance: Vocal precursors of perceived dominance during interpersonal influence. Human Communication Research 26:148–171.
Modeling Users from Voice
Fernandez, R., and R. W. Picard. 2003. Modeling drivers' speech under stress. Speech Communication 40:145–149.
Speech Synthesis
Burkhardt, F., and W. F. Sendlmeier. 2000. Verification of acoustical correlates of emotional speech using formant synthesis. In Proceedings of the ISCA Workshop (ITRW) on Speech and Emotion, Belfast 2000. http://www.qub.ac.uk/en/isca/proceeding

Cahn, J. E. 1990. The generation of affect in synthesized speech. Journal of the American Voice I/O Society 8 (July):1–19.

Campbell, N. 2004. Getting to the heart of the matter: Speech is more than just the expression of text or language. LREC keynote. http://feast.his.atr.jp/nick/cv.html.

Murray, I., and J. Arnott. 1993. Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America 93(2):1097–1108.
Linguistics
Clark, H. H. 1996. Using Language. Cambridge: Cambridge University Press.