
The Oxford Handbook of Computational and Mathematical Psychology

Area Editors, Personality and Social Psychology: Kay Deaux and Mark Snyder

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide.

Oxford New York

Auckland Cape Town Dar es Salaam Hong Kong Karachi

Kuala Lumpur Madrid Melbourne Mexico City Nairobi

New Delhi Shanghai Taipei Toronto

With offices in

Argentina Austria Brazil Chile Czech Republic France Greece

Guatemala Hungary Italy Japan Poland Portugal Singapore

South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press

in the UK and certain other countries.

Published in the United States of America by

Oxford University Press

198 Madison Avenue, New York, NY 10016

© Oxford University Press 2015

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data

Oxford handbook of computational and mathematical psychology / edited by Jerome R. Busemeyer, Zheng Wang, James T. Townsend, and Ami Eidels.

pages cm – (Oxford library of psychology)

Includes bibliographical references and index.

ISBN 978-0-19-995799-6

1. Cognition. 2. Cognitive science. 3. Psychology–Mathematical models. 4. Psychometrics. I. Busemeyer, Jerome R.


Dedicated to the memory of
Dr. William K. Estes (1919–2011) and Dr. R. Duncan Luce (1925–2012),
two of the founders of modern mathematical psychology

SHORT CONTENTS

Oxford Library of Psychology

OXFORD LIBRARY OF PSYCHOLOGY

The Oxford Library of Psychology, a landmark series of handbooks, is published by Oxford University Press, one of the world's oldest and most highly respected publishers, with a tradition of publishing significant books in psychology. The ambitious goal of the Oxford Library of Psychology is nothing less than to span a vibrant, wide-ranging field and, in so doing, to fill a clear market need.

Encompassing a comprehensive set of handbooks, organized hierarchically, the Library incorporates volumes at different levels, each designed to meet a distinct need. At one level are a set of handbooks designed broadly to survey the major subfields of psychology; at another are numerous handbooks that cover important current focal research and scholarly areas of psychology in depth and detail. Planned as a reflection of the dynamism of psychology, the Library will grow and expand as psychology itself develops, thereby highlighting significant new research that will impact on the field. Adding to its accessibility and ease of use, the Library will be published in print and, later on, electronically.

The Library surveys psychology's principal subfields with a set of handbooks that capture the current status and future prospects of those major subdisciplines. The initial set includes handbooks of social and personality psychology, clinical psychology, counseling psychology, school psychology, educational psychology, industrial and organizational psychology, cognitive psychology, cognitive neuroscience, methods and measurements, history, neuropsychology, personality assessment, developmental psychology, and more. Each handbook undertakes to review one of psychology's major subdisciplines with breadth, comprehensiveness, and exemplary scholarship. In addition to these broadly conceived volumes, the Library also includes a large number of handbooks designed to explore in depth more specialized areas of scholarship and research, such as stress, health and coping, anxiety and related disorders, cognitive development, or child and adolescent assessment. In contrast to the broad coverage of the subfield handbooks, each of these latter volumes focuses on an especially productive, more highly focused line of scholarship and research.

Whether at the broadest or most specific level, however, all of the Library handbooks offer synthetic coverage that reviews and evaluates the relevant past and present research and anticipates research in the future. Each handbook in the Library includes introductory and concluding chapters written by its editor to provide a roadmap to the handbook's table of contents and to offer informed anticipations of significant future developments in that field.

An undertaking of this scope calls for handbook editors and chapter authors who are established scholars in the areas about which they write. Many of the nation's and world's most productive and best-respected psychologists have agreed to edit Library handbooks or write authoritative chapters in their areas of expertise. Readers, we hope, will find in the Library the information they seek on the subfield or focal area of psychology in which they work or are interested.

Befitting its commitment to accessibility, each handbook includes a comprehensive index, as well as extensive references to help guide research. And because the Library was designed from its inception as an online as well as print resource, its structure and contents will be readily and rationally searchable online. Further, once the Library is released online, the handbooks will be regularly and thoroughly updated.

In summary, the Oxford Library of Psychology will grow organically to provide a thoroughly informed perspective on the field of psychology, one that reflects both psychology's dynamism and its increasing interdisciplinarity. Once published electronically, the Library is also destined to become a uniquely valuable interactive tool, with extended search and browsing capabilities. As you begin to consult this handbook, we sincerely hope you will share our enthusiasm for the more than 500-year tradition of Oxford University Press for excellence, innovation, and quality, as exemplified by the Oxford Library of Psychology.

Peter E. Nathan
Editor-in-Chief, Oxford Library of Psychology

ABOUT THE EDITORS

Jerome R. Busemeyer is Provost Professor of Psychology at Indiana University. He was the president of the Society for Mathematical Psychology and editor of the Journal of Mathematical Psychology. His theoretical contributions include decision field theory and, more recently, pioneering the new field of quantum cognition.

Zheng Wang is Associate Professor at the Ohio State University and directs the Communication and Psychophysiology Lab. Much of her research tries to understand how our cognition, decision making, and communication are contextualized.

James T. Townsend is Distinguished Rudy Professor of Psychology at Indiana University. He was the president of the Society for Mathematical Psychology and editor of the Journal of Mathematical Psychology. His theoretical contributions include systems factorial technology and general recognition theory.

Ami Eidels is Senior Lecturer at the School of Psychology, University of Newcastle, Australia, and a principal investigator in the Newcastle Cognition Lab. His research focuses on human cognition, especially visual perception and attention, combined with computational and mathematical modeling.

CONTRIBUTORS

Department of Psychological and Brain Sciences
University of California, Santa Barbara

Department of Psychological and Brain Sciences
Cognitive Science Program

Samuel J. Gershman
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
Cambridge, MA

Thomas L. Griffiths
Department of Psychology
University of California, Berkeley
Berkeley, CA

Todd M. Gureckis
Department of Psychology
New York University
New York, NY

Robert X. D. Hawkins
Department of Psychological and Brain Sciences
Indiana University
Bloomington, IN

Andrew Heathcote
School of Psychology
University of Newcastle
Callaghan, NSW, Australia

Marc W. Howard
Department of Psychological and Brain Sciences
Center for Memory and Brain
Boston University
Boston, MA

Brett Jefferson
Department of Psychological and Brain Sciences
Indiana University
Bloomington, IN

Michael N. Jones
Department of Psychological and Brain Sciences
Indiana University
Bloomington, IN

Vanderbilt Vision Research Center
Center for Integrative and Cognitive Neuroscience

Center for Adaptive Rationality (ARC)
Max Planck Institute for Human Development
Berlin, Germany

Emmanuel Pothos
Department of Psychology
City University London
London, UK

Babette Rae
School of Psychology
University of Newcastle
Callaghan, NSW, Australia

Roger Ratcliff
Department of Psychology
The Ohio State University
Columbus, OH

Tadamasa Sawada
Department of Psychology
Higher School of Economics
Moscow, Russia

Jeffrey D. Schall
Department of Psychology
Vanderbilt Vision Research Center
Center for Integrative and Cognitive Neuroscience
Vanderbilt University
Nashville, TN

Philip Smith
School of Psychological Sciences
The University of Melbourne
Parkville, VIC, Australia

Fabian A. Soto
Department of Psychological and Brain Sciences
University of California, Santa Barbara
Santa Barbara, CA

Joshua B. Tenenbaum
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
Cambridge, MA

James T. Townsend
Department of Psychological and Brain Sciences
Cognitive Science Program
Indiana University
Bloomington, IN

Joachim Vandekerckhove
Department of Cognitive Sciences
University of California, Irvine
Irvine, CA

Wolf Vanpaemel
Faculty of Psychology and Educational Sciences
University of Leuven
Leuven, Belgium

CONTENTS

1. Review of Basic Mathematical Concepts Used in Computational and Mathematical Modeling
Jerome R. Busemeyer, Zheng Wang, Ami Eidels, and James T. Townsend

Part I: Elementary Cognitive Mechanisms

2. Multidimensional Signal Detection Theory
F. Gregory Ashby and Fabian A. Soto

3. Modeling Simple Decisions and Applications Using a Diffusion Model
Roger Ratcliff and Philip Smith

4. Features of Response Times: Identification of Cognitive Mechanisms
Daniel Algom, Ami Eidels, Robert X. D. Hawkins, Brett Jefferson, and James T. Townsend

Todd M. Gureckis and Bradley C. Love

Part II: Basic Cognitive Skills

6. Why Is Accurately Labeling Simple Magnitudes So Hard? A Past,
Chris Donkin, Babette Rae, Andrew Heathcote, and Scott D. Brown

7. An Exemplar-Based Random-Walk Model of Categorization and Recognition
Robert M. Nosofsky and Thomas J. Palmeri

Amy H. Criss and Marc W. Howard

Part III: Higher Level Cognition

Joseph L. Austerweil, Samuel J. Gershman, Joshua B. Tenenbaum, and Thomas L. Griffiths

10. Models of Decision Making under Risk and Uncertainty
Timothy J. Pleskac, Adele Diederich, and Thomas S. Wallsten

Michael N. Jones, Jon Willits, and Simon Dennis

Tadamasa Sawada, Yunfeng Li, and Zygmunt Pizlo

Part IV: New Directions

John K. Kruschke and Wolf Vanpaemel

Joachim Vandekerckhove, Dora Matzke, and Eric-Jan Wagenmakers

Thomas J. Palmeri, Jeffrey D. Schall, and Gordon D. Logan

16. Mathematical and Computational Modeling in Clinical Psychology
Richard W. J. Neufeld

Jerome R. Busemeyer, Zheng Wang, and Emmanuel Pothos

PREFACE

Computational and mathematical psychology has enjoyed rapid growth over the past decade. Our vision for the Oxford Handbook of Computational and Mathematical Psychology is to invite and organize a set of chapters that review these most important developments, especially those that have impacted, and will continue to impact, other fields such as cognitive psychology, developmental psychology, clinical psychology, and neuroscience. Together with a group of dedicated authors, who are leading scientists in their areas, we believe we have realized our vision. Specifically, the chapters cover the key developments in elementary cognitive mechanisms (e.g., signal detection, information processing, reinforcement learning), basic cognitive skills (e.g., perceptual judgment, categorization, episodic memory), higher-level cognition (e.g., Bayesian cognition, decision making, semantic memory, shape perception), modeling tools (e.g., Bayesian estimation and other new model comparison methods), and emerging new directions (e.g., neurocognitive modeling, applications to clinical psychology, quantum cognition) in computational and mathematical psychology.

An important feature of this handbook is that it aims to engage readers with various levels of modeling experience. Each chapter is self-contained and written by authoritative figures in the topic area. Each chapter is designed to be a relatively applied introduction with a great emphasis on empirical examples (see the New Handbook of Mathematical Psychology (2014) by Batchelder, Colonius, Dzhafarov, and Myung for a more mathematically foundational and less applied presentation). Each chapter endeavors to immediately involve readers, inspire them to apply the introduced models to their own research interests, and refer them to more rigorous mathematical treatments when needed. First, each chapter provides an elementary overview of the basic concepts, techniques, and models in the topic area. Some chapters also offer a historical perspective on their area or approach. Second, each chapter emphasizes empirical applications of the models. Each chapter shows how the models are being used to understand human cognition and illustrates the use of the models in a tutorial manner. Third, each chapter strives to create engaging, precise, and lucid writing that inspires the use of the models.

The chapters were written for a typical graduate student in virtually any area of psychology, cognitive science, and related social and behavioral sciences, such as consumer behavior and communication. We also expect it to be useful for readers ranging from advanced undergraduate students to experienced faculty members and researchers. Beyond being a handy reference book, it should be beneficial as a textbook for self-teaching, and for graduate-level (or advanced undergraduate-level) courses in computational and mathematical psychology.

We would like to thank all the authors for their excellent contributions. Also, we thank the following scholars who helped review the book chapters in addition to the editors (listed alphabetically): Woo-Young Ahn, Greg Ashby, Scott Brown, Cody Cooper, Amy Criss, Adele Diederich, Chris Donkin, Yehiam Eldad, Pegah Fakhari, Birte Forstmann, Tom Griffiths, Andrew Heathcote, Alex Hedstrom, Joseph Houpt, Marc Howard, Matt Irwin, Mike Jones, John Kruschke, Peter Kvam, Bradley Love, Dora Matzke, Jay Myung, Robert Nosofsky, Tim Pleskac, Emmanuel Pothos, Noah Silbert, Tyler Solloway, Fabian Soto, Jennifer Trueblood, Joachim Vandekerckhove, Wolf Vanpaemel, Eric-Jan Wagenmakers, and Paul Williams. The authors' and reviewers' efforts ensure our confidence in the high quality of this handbook.

Finally, we would like to express how much we appreciate the outstanding assistance and guidance provided by our editorial team and production team at Oxford University Press. The hard work provided by Joan Bossert, Louis Gulino, Anne Dellinger, A. Joseph Lurdu Antoine, the production team of Newgen Knowledge Works Pvt. Ltd., and others at Oxford University Press was essential for the development of this handbook. It has been a true pleasure working with this team!

Jerome R. Busemeyer
Zheng Wang
James T. Townsend
Ami Eidels
December 16, 2014

1. Review of Basic Mathematical Concepts Used in Computational and Mathematical Modeling

expectations, maximum likelihood estimation

We have three ways to build theories to explain and predict how variables interact and relate to each other in psychological phenomena: using natural verbal languages, using formal mathematics, and using computational methods. Human intuitive and verbal reasoning has many limitations. For example, Hintzman (1991) summarized at least 10 critical limitations, including our inability to imagine how a dynamic system works. Formal models, including both mathematical and computational models, can address these limitations of human reasoning.

Mathematics is a "radically empirical" science (Suppes, 1984, p. 78), with consistent and rigorous evidence (the proof) that is "presented with a completeness not characteristic of any other area of science" (p. 78). Mathematical models can help avoid logic and reasoning errors that are typically encountered in human verbal reasoning. The complexity of theorizing and data often requires the aid of computers and computational languages. Computational models and mathematical models can be thought of as a continuum of a theorizing process. Every computational model is based on a certain mathematical model, and almost every mathematical model can be implemented as a computational model.

Psychological theories may start as a verbal description, which then can be formalized using mathematical language and subsequently coded into computational language. By testing the models using empirical data, the model fitting outcomes can provide feedback to improve the models, as well as our initial understanding and verbal descriptions. For readers who are newcomers to this exciting field, this chapter provides a review of basic concepts of mathematics, probability, and statistics used in computational and mathematical modeling of psychological representation, mechanisms, and processes. See Busemeyer and Diederich (2010) and Lewandowsky and Farrell (2010) for more detailed presentations.

Mathematical Functions

Mathematical functions are used to map a set of points called the domain of the function into a set of points called the range of the function, such that only one point in the range is assigned to each point in the domain.1 As a simple example, the linear function is defined as $f(x) = a \cdot x$, where the constant $a$ is the slope of a straight line. In general, we use the notation $f(x)$ to represent a function $f$ that maps a domain point $x$ into a range point $y = f(x)$. If a function $f(x)$ has the property that each range point $y$ can only be reached by a single unique domain point $x$, then we can define the inverse function $f^{-1}(y) = x$ that maps each range point $y = f(x)$ back to the corresponding domain point $x$. For example, the quadratic function is defined as the map $f(x) = x^2 = x \cdot x$, and if we pick the number $x = 3.5$, then $f(3.5) = 3.5^2 = 12.25$. The quadratic function is defined on a domain of both positive and negative real numbers, and it does not have an inverse because, for example, $(-x)^2 = x^2$, and so there are two ways to get back from each range point $y$ to the domain. However, if we restrict the domain to the non-negative real numbers, then the inverse of $x^2$ exists and it is the square root function defined on non-negative real numbers, $\sqrt{y} = \sqrt{x^2} = x$. There are, of course, a large number of functions used in mathematical psychology, but some of the most popular ones include the following.

The power function is denoted $x^a$, where the variable $x$ is a positive real number and the constant $a$ is called the power. A quadratic function can be obtained by setting $a = 2$, but we could instead choose $a = 0.50$, which is the square root function, or $a = -1$, which produces the reciprocal $x^{-1} = 1/x$, or we could choose any real number such as $a = 1.37$. Using a calculator, one finds that if $x = 15.25$ and $a = 1.37$, then $15.25^{1.37} = 41.8658$. One important property to remember about power functions is that $x^a \cdot x^b = x^{a+b}$ and $x^b \cdot y^b = (x \cdot y)^b$ and $(x^a)^b = x^{ab}$. Also note that $x^0 = 1$. Note that when working with the power function, the variable $x$ appears in the base, and the constant $a$ appears as the power.

The exponential function is denoted $e^x$, where the exponent $x$ is any real-valued variable and the constant base $e$ stands for a special number that is approximately $e \approx 2.7183$. Sometimes it is more convenient to use the notation $e^x = \exp(x)$ instead. Using a calculator, we can calculate $e^{2.5} = 2.7183^{2.5} = 12.1825$. Note that the exponent can be negative, $-x < 0$, in which case we can write $e^{-x} = 1/e^x$. If $x = 0$, then $e^0 = 1$. The exponential function always returns a positive value, $e^x > 0$, and it approaches zero as $x$ approaches negative infinity. More complex forms of the exponential are often used. For example, you will later see the function $e^{-\left(\frac{x-\mu}{\sigma}\right)^2}$, where $x$ is a variable and $\mu$ and $\sigma$ are constants. In this case, it is more convenient to first compute the squared deviation $y = \left(\frac{x-\mu}{\sigma}\right)^2$ and then compute $e^{-y}$. The exponential function obeys the rules $e^x \cdot e^y = e^{x+y}$ and $(e^x)^a = e^{a \cdot x}$. In contrast to the power function, the base of the exponential is a constant and the exponent is a variable.

The (natural) log function is denoted $\ln(x)$ for positive values of $x$. For example, using a calculator, $\ln(12.1825) = 2.5$. (Here we normally use the natural base $e = 2.7183$; if instead we used base 10, then $\log_{10}(10) = 1$.) The log function obeys the rules $\ln(x \cdot y) = \ln(x) + \ln(y)$ and $\ln(x^a) = a \cdot \ln(x)$. The log function is the inverse of the exponential function, $\ln(\exp(x)) = x$, and the exponential function is the inverse of the log function, $\exp(\ln(x)) = x$. The function $a^x$, where $a$ is a constant and $x$ is a variable, can be rewritten in terms of the exponential function: define $b = \ln(a)$; then $e^{bx} = (e^b)^x = (e^{\ln(a)})^x = a^x$.

Figure 1.1 illustrates the power, exponential, and log functions using different coefficient values for each function. As can be seen, the coefficient changes the curve of the functions.
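As a quick numerical check of these functions and rules, here is a short Python sketch; apart from the worked examples above, the particular values plugged in are arbitrary:

```python
import math

# Power function: the worked example from the text, 15.25 ** 1.37
x, a, b = 15.25, 1.37, 2.5
print(x ** a)                                        # approximately 41.8658

# Power rules: x^a * x^b = x^(a+b), (x^a)^b = x^(a*b), x^0 = 1
print(math.isclose(x**a * x**b, x**(a + b)))
print(math.isclose((x**a)**b, x**(a * b)))
print(x ** 0)                                        # 1.0

# Exponential function: e^2.5, and the rule e^x * e^y = e^(x+y)
print(math.exp(2.5))                                 # approximately 12.1825
print(math.isclose(math.exp(1.2) * math.exp(0.7), math.exp(1.9)))

# Natural log is the inverse of exp, and ln(x*y) = ln(x) + ln(y)
print(math.isclose(math.log(math.exp(2.5)), 2.5))
print(math.isclose(math.log(x * b), math.log(x) + math.log(b)))

# Rewriting a^x as exp(x * ln(a))
base = 3.0
print(math.isclose(base ** 2.2, math.exp(2.2 * math.log(base))))
```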

Last but not least are the trigonometric functions based on a circle. Figure 1.2 shows a circle with its center located at coordinates (0, 0) in an $(X, Y)$ plane. Now imagine a line segment of radius $r = 1$ that extends from the center point to the circumference of the circle. This line segment intersects the circumference at coordinates $(\cos(t \cdot \pi), \sin(t \cdot \pi))$ in the plane. The coordinate $\cos(t \cdot \pi)$ is the projection of the point on the circumference down onto the $X$ axis, and the point $\sin(t \cdot \pi)$ is the projection of the point on the circumference onto the $Y$ axis. The variable $t$ (which, for example, can be time) moves this point around the circle, with positive values moving the point counterclockwise and negative values moving it clockwise; $\pi$ equals one-half cycle around the circle, and $2\pi$ is the period of time it takes to go all the way around once. The two functions are related by a translation (called the phase) in time: $\cos(t \cdot \pi - \pi/2) = \sin(t \cdot \pi)$. Note that cos is an even function because $\cos(t \cdot \pi) = \cos(-t \cdot \pi)$, whereas sin is an odd function because $-\sin(t \cdot \pi) = \sin(-t \cdot \pi)$. Also note that these functions are periodic in the sense that, for example, $\cos(t \cdot \pi) = \cos(t \cdot \pi + 2 \cdot k \cdot \pi)$ for any integer $k$. We can generalize these two functions by changing the frequency and the phase. For example, $\cos(\omega \cdot t \cdot \pi + \theta)$ is a cosine function with a frequency $\omega$ (changing the time it takes to complete a cycle) and a phase $\theta$ (advancing or delaying the initial value at time $t = 0$).

Fig. 1.1 Examples of three important functions, with various parameter values. From left to right: power function, exponential function, and log function. See text for details.

Fig. 1.2 Left panel illustrates a point on a unit circle with a radius equal to one. Vertical line shows sine, horizontal line shows cosine. Right panel shows sine as a function of time. The point on the Y axis of the right panel corresponds to the point on the Y axis of the left panel.

Derivatives and Integrals

A derivative of a continuous function is the rate of change or the slope of the function at some point. Suppose $f(x)$ is some continuous function. For a small increment $\Delta$, the change in this function is $\Delta f(x) = f(x) - f(x - \Delta)$, and the rate of change is the change divided by the increment,
$$\frac{\Delta f(x)}{\Delta} = \frac{f(x) - f(x - \Delta)}{\Delta}.$$
As the increment approaches zero, this ratio converges to the derivative of the function at $x$, denoted $\frac{d}{dx} f(x)$. Rules for computing the derivatives of various functions are derived in calculus (see Stewart (2012) or any calculus textbook for an introduction to calculus). For example, in calculus it is shown that $\frac{d}{dx} e^{c \cdot x} = c \cdot e^{c \cdot x}$, which says that the slope of the exponential function at any point $x$ is proportional to the exponential function itself. As another example, it is shown in calculus that $\frac{d}{dx} x^a = a \cdot x^{a-1}$, which is the derivative of the power function. For example, the derivative of a quadratic function $a \cdot x^2$ is a linear function $2 \cdot a \cdot x$, and the derivative of the linear function $a \cdot x$ is the constant $a$. The derivative of the cosine function is $\frac{d}{dt} \cos(t) = -\sin(t)$, and the derivative of sine is $\frac{d}{dt} \sin(t) = \cos(t)$.

Fig. 1.3 Illustration of the power function and its derivatives. The curved lines in both panels mark the power function. The slope of the dotted line (the tangent to the function) is given by the derivative of that function (in this example, at $x = 1$).

Figure 1.3 illustrates the derivative of the power function at the value $x = 1$ for two different coefficients. The curved line shows the power function, and the straight line touches the curve at $x = 1$. The slope of this line is the derivative.
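The derivative rules above can be checked numerically by computing the change in a function over a small increment, exactly as in the definition. A minimal Python sketch (the step size and evaluation points are arbitrary choices):

```python
import math

def numerical_derivative(f, x, delta=1e-6):
    """Approximate d/dx f(x) by the change over a small increment."""
    return (f(x) - f(x - delta)) / delta

a, c = 2.0, 0.5

# d/dx x^a = a * x^(a-1); at x = 1 the slope is simply a
print(numerical_derivative(lambda x: x ** a, 1.0))           # close to 2.0

# d/dx e^(c*x) = c * e^(c*x); the slope is proportional to the function itself
print(numerical_derivative(lambda x: math.exp(c * x), 1.0))  # close to c * e^c

# d/dt cos(t) = -sin(t)
t = 0.7
print(numerical_derivative(math.cos, t), -math.sin(t))
```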

The integral of a continuous function is the area under the curve within some interval (see Fig. 1.4). Suppose $f(x)$ is a continuous function of $x$ within the interval $[a, b]$. A simple way to approximate this area is to divide the interval into $N$ very small steps, with a small increment $\Delta = (b - a)/N$ being a step; compute the area $f(x_i) \cdot \Delta$ of a rectangle at each step; and finally sum all the areas of the rectangles to obtain an approximate area under the function. As the number of intervals becomes arbitrarily large and the increments get arbitrarily small, this sum converges to the integral
$$\int_a^b f(x)\, dx.$$
If we allow the upper limit of the integral to be a variable, say $z$, then the integral becomes a function of the upper limit, which can be written as $F(z) = \int_0^z f(x)\, dx$. What is the derivative of an integral? Let's examine the change in the area divided by the increment: as the increment shrinks, this ratio converges to $\frac{d}{dz} F(z) = f(z)$; this is the fundamental theorem of calculus. The theorem can then be used to find the integral of a function. For example, the integral $\int_0^z x^a\, dx = (a+1)^{-1} z^{a+1}$ because $\frac{d}{dz}\left[(a+1)^{-1} z^{a+1}\right] = z^a$.

Fig. 1.4 The integral of the function is the area under the curve. It can be approximated as the sum of the areas of the rectangles (left panel). As the rectangles become narrower (middle), the sum of their areas converges to the true integral (right).
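The rectangle approximation just described is easy to implement directly. The following Python sketch sums rectangle areas for $f(x) = x^a$ on $[0, z]$ and compares the result with the closed form $(a+1)^{-1} z^{a+1}$ given above; the number of rectangles is an arbitrary choice:

```python
def riemann_sum(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] as a sum of n rectangle areas."""
    delta = (b - a) / n
    return sum(f(a + (i + 1) * delta) * delta for i in range(n))

power = 2.0
z = 3.0
approx = riemann_sum(lambda x: x ** power, 0.0, z)
exact = z ** (power + 1) / (power + 1)     # (a+1)^(-1) * z^(a+1)
print(approx, exact)                        # both approximately 9.0
```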

Difference and differential equations are often used to model learning and other dynamic psychological processes. For example, suppose $V(t)$ represents the strength of a neural connection between an input and an output at time $t$, and suppose $x(t)$ is the reward signal guiding the learning process. A simple, discrete-time linear model of learning can be $V(t) = (1 - \alpha) \cdot V(t-1) + \alpha \cdot x(t)$, where $\alpha$ is a learning rate parameter. We can rewrite this as a difference equation:
$$\Delta V(t) = V(t) - V(t-1) = -\alpha \cdot V(t-1) + \alpha \cdot x(t) = -\alpha \cdot (V(t-1) - x(t)).$$

This model states that the change in strength at time $t$ is proportional to the negative of the error signal, which is defined as the difference between the previous strength and the new reward. If we wish to describe learning as occurring more continuously in time, we can introduce a small time increment $\Delta t$ into the model so that it states
$$\Delta V(t) = -\alpha \cdot \Delta t \cdot (V(t - \Delta t) - x(t)),$$

which says that the change in strength is proportional to the negative of the error signal, with the constant of proportionality now modified by the time increment. Dividing both sides by the time increment $\Delta t$, we obtain
$$\frac{\Delta V(t)}{\Delta t} = -\alpha \cdot (V(t - \Delta t) - x(t)),$$
and now if we allow the time increment to approach zero in the limit, $\Delta t \to 0$, then the preceding equation converges to a limit that is the differential equation
$$\frac{d}{dt} V(t) = -\alpha \cdot (V(t) - x(t)),$$
which states that the rate of change in strength is proportional to the negative of the error signal. Sometimes we can solve the differential equation for a simple solution. For example, the solution to the equation $\frac{d}{dt} V(t) = -\alpha \cdot V(t) + c$ is $V(t) = \frac{c}{\alpha}\left(1 - e^{-\alpha \cdot t}\right)$ (starting from $V(0) = 0$), because when we substitute this solution back into the differential equation, it satisfies the equality of the differential equation.
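To make the link between the difference equation and the differential equation concrete, the sketch below iterates the discrete update with a constant reward signal and compares the result with the closed-form solution; the constant reward, the learning rate, and the starting value $V(0) = 0$ are illustrative assumptions:

```python
import math

alpha = 0.3      # learning rate
x = 1.0          # constant reward signal x(t) = x (assumed constant for this sketch)
dt = 0.01        # small time increment
T = 10.0         # total time simulated

# Iterate the difference equation dV = -alpha * dt * (V - x), starting at V(0) = 0.
V = 0.0
for _ in range(int(T / dt)):
    V += -alpha * dt * (V - x)

# Closed-form solution of dV/dt = -alpha * (V - x) with V(0) = 0,
# i.e., dV/dt = -alpha * V + c with c = alpha * x.
V_exact = x * (1.0 - math.exp(-alpha * T))

print(V, V_exact)   # the two values agree closely when dt is small
```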

A stochastic difference equation is frequently used in cognitive modeling to represent how a state changes across time when it is perturbed by noise. For example, if we assume that the strength of a connection changes according to the preceding learning model, but with some noise (denoted $\varepsilon(t)$) added at each step, then we obtain the stochastic difference equation
$$\Delta V(t) = -\alpha \cdot \Delta t \cdot (V(t - \Delta t) - x(t)) + \sqrt{\Delta t} \cdot \varepsilon(t).$$
Note that the noise is multiplied by $\sqrt{\Delta t}$ instead of $\Delta t$. This is required so that the effect of the noise does not vanish as the time increment approaches zero: the variance contributed by the noise then remains proportional to $\Delta t$ (which is the key characteristic of Brownian motion processes). See Bhattacharya and Waymire (2009) for an excellent book on stochastic processes.

Elementary Probability Theory

Probability theory describes how to assign probabilities to events. See Feller (1968) for a review of probability theory. We start with a sample space that is a set, denoted $\Omega$, which contains all the unique outcomes that can be realized. For simplicity, we will assume (unless noted otherwise) that the sample space is finite. (There could be a very large number of outcomes, but the number is finite.) For example, if a person takes two medical tests, test A and test B, and each test can be positive or negative, then the sample space contains four mutually exclusive and exhaustive outcomes: all four combinations of positive and negative test results from tests A and B. Figure 1.5 illustrates the situation for this simple example. An event, such as the event A (e.g., test A is positive), is a subset of the sample space. Suppose for the moment that A and B are two events. The disjunctive event A or B (e.g., test A is positive or test B is positive) is represented as the union $A \cup B$. The conjunctive event A and B (e.g., test A is positive and test B is positive) is represented as the intersection $A \cap B$. The impossible event (e.g., test A is neither positive nor negative), denoted $\emptyset$, is an empty set. The certain event is the entire sample space $\Omega$. The complement of an event A, containing all outcomes in the sample space that are not in A, is denoted $\bar{A}$.

Fig. 1.5 Two ways to illustrate the probability space of events A and B. The contingency table (left) and the Venn diagram (right) correspond in the following way: positive values on both tests in the table (the conjunctive event, $A \cap B$) are represented by the overlap of the circles in the Venn diagram. Positive values on one test but not on the other in the table (the XOR event, A positive and B negative, or vice versa) are represented by the nonoverlapping areas of circles A and B. Finally, tests that are both negative (upper left entry in the table) correspond in the Venn diagram to the area within the rectangle (the so-called "sample space") that is not occupied by either of the circles.

A probability function $p$ assigns a number between zero and one to each event. The impossible event is assigned zero, and the certain event is assigned one. The other events are assigned probabilities $0 \leq p(A) \leq 1$, with $p(\bar{A}) = 1 - p(A)$. However, these probabilities must obey the following additive rule: if $A \cap B = \emptyset$, then $p(A \cup B) = p(A) + p(B)$. What if the events are not mutually exclusive, so that $A \cap B \neq \emptyset$? The answer is called the "or" rule, which follows from the previous one: $p(A \cup B) = p(A) + p(B) - p(A \cap B)$.

Suppose we learn that some event A has occurred, and now we wish to define the new probability for event B conditioned on this known event. The conditional probability $p(B|A)$ stands for the probability of event B given that event A has occurred, which is defined as $p(B|A) = \frac{p(A \cap B)}{p(A)}$. Similarly, $p(A|B) = \frac{p(A \cap B)}{p(B)}$ is the probability of event A given that B has occurred. Using the definition of conditional probability, we can then define the "and" rule for joint probabilities as follows: the probability of A and B equals $p(A \cap B) = p(A) \cdot p(B|A) = p(B) \cdot p(A|B)$.

An important theorem of probability is called Bayes' rule. It describes how to revise one's beliefs based on evidence. Suppose we have two mutually exclusive and exhaustive hypotheses denoted $H_1$ and $H_2$. For example, $H_1$ could be that a certain disease is present and $H_2$ that the disease is not present. Define the event D as some observed data that provide evidence for or against each hypothesis, such as a medical test result. Suppose $p(D|H_1)$ and $p(D|H_2)$ are known. These are called the likelihoods of the data for each hypothesis. For example, medical testing would be used to determine the likelihood of a positive versus negative test result when the disease is known to be present, and the likelihood of a positive versus negative test would also be known when the disease is not present. We define $p(H_1)$ and $p(H_2)$ as the prior probabilities of each hypothesis. For example, these priors may be based on base rates for disease present or not. Then, according to the conditional probability definition,
$$p(H_1|D) = \frac{p(H_1 \cap D)}{p(D)} = \frac{p(H_1)\,p(D|H_1)}{p(H_1)\,p(D|H_1) + p(H_2)\,p(D|H_2)}.$$
The last expression is Bayes' rule. The probability $p(H_1|D)$ is called the posterior probability of the hypothesis given the data. It reflects the revision from the prior produced by the evidence from the data. If there are $M \geq 2$ hypotheses, then the rule is extended to be
$$p(H_i|D) = \frac{p(H_i)\,p(D|H_i)}{\sum_{j=1}^{M} p(H_j)\,p(D|H_j)}.$$

We often work with events that are assigned to numbers. A random variable is a function that assigns real numbers to events. For example, a person may look at an advertisement and then rate how effective it is on a nine-point scale. In this case, there are nine mutually exclusive and exhaustive categories to choose from on the rating scale, and each choice is assigned a number (say, 1, 2, ..., or 9). Then we can define a random variable $X(R)$, which is a function that maps the category event $R$ onto one of the nine numbers. For example, if the person chooses the middle rating option, so that $R$ = middle, then we assign the value $X(R) = 5$. Often we suppress the reference to the event and instead write the random variable simply as $X$. For example, we can ask what is the probability that the random variable is assigned the value 5. Then we assign a probability to each value of a random variable by assigning it the probability of the event that produces the value. For example, $p(X = 5)$ equals the probability of the event that the person picks the middle value. Suppose the random variable has $N$ values $x_1, x_2, \ldots, x_i, \ldots, x_N$. In our previous example with the rating scale, the random variable had nine values. The function $p(X = x_i)$ (interpreted as the probability that the person picks a choice corresponding to value $x_i$) is called the probability mass function for the random variable $X$. This function has the following properties: $p(X = x_i) \geq 0$ for every value, and $\sum_i p(X = x_i) = 1$.

Often we measure more than one random variable. For example, we could present an advertisement and ask how effective it is for the participant personally, but also ask how effective the participant believes it is for others. Suppose $X$ is the random variable for the nine-point rating scale for self, and let $Y$ be the random variable for the nine-point rating scale for others. Then we can define a joint probability that $x_i$ is selected for self and that $y_j$ is selected for others. These joint probabilities form a two-way $9 \times 9$ table with $p(X = x_i, Y = y_j)$ in each cell. This joint probability function has the properties that $p(X = x_i, Y = y_j) \geq 0$ and $\sum_i \sum_j p(X = x_i, Y = y_j) = 1$.

Often, we work with random variables that have a continuous rather than a discrete and finite distribution, such as the normal distribution. Suppose $X$ is a univariate continuous random variable. In this case, the probability assigned to each real number is zero (there are uncountably infinitely many of them in any interval). Instead, we start by defining the cumulative distribution function $F(x) = p(X \leq x)$. Then we define the probability density at each value of $x$ as the derivative of the cumulative probability function, $f(x) = \frac{d}{dx} F(x)$. Given the density, we compute the probability of $X$ falling in some interval $[a, b]$ as $p(X \in [a, b]) = \int_a^b f(x)\, dx$. The increment $f(x) \cdot dx$ for the continuous random variable is conceptually related to the mass function of the discrete case. The most important example is the normal distribution, with density
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right),$$
where $\mu$ is the mean of the distribution and $\sigma$ is the standard deviation of the distribution. The normal distribution is popular because of the central limit theorem, which states that the sample mean $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$ of a random variable $X$ will approach normal as the number of samples becomes arbitrarily large, even if the original random variable $X$ is not normal. We often work with sample means, which we expect to be approximately normal because of the central limit theorem.

Expectations

When working with random variables, we are often interested in their moments (i.e., their means, variances, or correlations). See Hogg and Craig (1970) for a reference on mathematical statistics. These different moments are different concepts, but they are all defined as expected values of the random variables. The expectation of the random variable is defined as $E[X] = \sum_i p(X = x_i) \cdot x_i$ for the discrete case, and it is defined as $E[X] = \int f(x) \cdot x \cdot dx$ for the continuous case. The mean of a random variable $X$, denoted $\mu_X$, is defined as the expectation of the random variable, $\mu_X = E[X]$. The variance of a random variable $X$, denoted $\sigma^2_X$, is defined as the expectation of the squared deviation around the mean: $\sigma^2_X = \mathrm{Var}(X) = E\left[(X - \mu_X)^2\right]$. For example, in the discrete case, this equals $\sigma^2_X = \sum_i p(X = x_i) \cdot (x_i - \mu_X)^2$. The standard deviation is the square root of the variance, $\sigma_X = \sqrt{\sigma^2_X}$. The covariance between two random variables $(X, Y)$, denoted $\sigma_{XY}$, is defined by the expectation of the product of deviations around the two means, $\sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right]$.

Often we need to combine two random variables by a linear combination $Z = a \cdot X + b \cdot Y$, where $a, b$ are two constants. For example, we may sum two scores, $a = 1$ and $b = 1$, or take a difference between two scores, $a = 1, b = -1$. There are two important rules for determining the mean and the variance of a linear transformation. The expectation operation is linear: $E[a \cdot X + b \cdot Y] = a \cdot E[X] + b \cdot E[Y]$. The variance operator, however, is not linear: $\mathrm{var}(a \cdot X + b \cdot Y) = a^2 \mathrm{var}(X) + b^2 \mathrm{var}(Y) + 2ab \cdot \mathrm{cov}(X, Y)$.
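Both rules can be verified by simulation. The sketch below builds two correlated scores (the particular distributions and the constants $a$ and $b$ are arbitrary) and checks the mean and variance of their linear combination against the formulas:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two correlated random variables, simulated with a large sample
n = 1_000_000
X = rng.normal(loc=2.0, scale=1.5, size=n)
Y = 0.6 * X + rng.normal(loc=1.0, scale=1.0, size=n)   # Y depends on X, so cov(X, Y) != 0

a, b = 1.0, -1.0                  # a difference score Z = X - Y
Z = a * X + b * Y

# E[aX + bY] = a E[X] + b E[Y]
print(Z.mean(), a * X.mean() + b * Y.mean())

# var(aX + bY) = a^2 var(X) + b^2 var(Y) + 2ab cov(X, Y)
cov_xy = np.cov(X, Y, bias=True)[0, 1]
print(Z.var(), a**2 * X.var() + b**2 * Y.var() + 2 * a * b * cov_xy)
```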

Maximum Likelihood Estimation

Computational and mathematical models of psychology contain parameters that need to be estimated from the data. For example, suppose a person can choose to play or not play a slot machine at the beginning of each trial. The slot machine pays the amount $x(t)$ on trial $t$, but this amount is not revealed until the trial is over. Consider a simple model that assumes that the probability of choosing to gamble on trial $t$, denoted $p(t)$, is predicted by
$$p(t) = \frac{1}{1 + e^{-\beta \cdot V(t)}},$$
where $V(t)$ is the strength produced by the linear learning model described earlier, with learning rate $\alpha$. This model has two parameters, $\alpha$ and $\beta$, that must be estimated from the data. This is analogous to the estimation problem one faces when using multiple linear regression, where the linear regression is the model and the regression coefficients are the model parameters. However, computational and mathematical models, such as the earlier learning model example, are nonlinear with respect to the model parameters, which makes them more complicated, and one cannot use simple linear regression fitting routines.

The model parameters are estimated from the empirical experimental data. These experiments usually consist of a sample of participants, and each participant provides a series of responses to several experimental conditions. For example, a study of learning to gamble could obtain 100 choice trials at each of 5 payoff conditions from each of 50 participants. One of the first issues for modeling is the level of analysis of the data. On the one hand, a group-level analysis would fit a model to all the data from all the participants, ignoring individual differences. This is not a good idea if there are substantial individual differences. On the other hand, an individual-level analysis would fit a model to each individual separately, allowing arbitrary individual differences. This introduces a new set of parameters for each person, which is unparsimonious. A hierarchical model applies the model to all of the individuals, but it includes a model for the distribution of individual differences. This is a good compromise, but it requires a good model of the distribution of individual differences. Chapter 13 of this book describes the hierarchical approach. Here we describe the basic ideas of fitting models at the individual level using a method called maximum likelihood (see Myung, 2003, for a detailed tutorial on maximum likelihood estimation). Also see Hogg and Craig (1970) for the general properties of maximum likelihood estimates.

Suppose we obtain 100 choice trials (gamble, not gamble) from 5 payoff conditions from each participant. The above learning model has two parameters $(\alpha, \beta)$ that we wish to estimate using these data, $D = [x_1, x_2, \ldots, x_t, \ldots, x_{500}]$, where each $x_t$ is zero (not gamble) or one (gamble). If we pick values for the two parameters $(\alpha, \beta)$, then we can insert these into our learning model and compute the probability of gambling, $p(t)$, for each trial from the model. Define $p(x_t, t)$ as the probability that the model predicts the value $x_t$ observed on trial $t$. For example, if $x_t = 1$, then $p(x_t, t) = p(t)$, but if $x_t = 0$, then $p(x_t, t) = 1 - p(t)$, where recall that $p(t)$ is the predicted probability of choosing the gamble. Then we compute the likelihood of the observed sequence of data $D$ given the model parameters as the product
$$L(D \mid \alpha, \beta) = \prod_{t} p(x_t, t).$$

To make this computationally feasible, we use the log likelihood instead,
$$\ln L(D \mid \alpha, \beta) = \sum_{t} \ln p(x_t, t).$$
This likelihood changes depending on our choice of the parameter values $(\alpha, \beta)$. Parameter search routines in computational software, such as MATLAB, R, Gauss, and Mathematica, can be used to find the maximum likelihood estimates. The log likelihood is a goodness-of-fit measure: higher values indicate better fit. Actually, the computer algorithms find the minimum of the badness-of-fit measure
$$G^2 = -2 \cdot \ln L(D \mid \alpha, \beta).$$
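A minimal Python sketch of this fitting procedure is given below. The payoff schedule, the assumption that $V(t)$ is updated by every trial's payoff, and the optimizer settings are illustrative choices rather than the chapter's specification; the data are simulated from known parameter values so that recovery can be checked:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Illustrative payoff schedule; in a real study the payoffs and choices come from the experiment.
n_trials = 500
payoffs = rng.normal(loc=0.5, scale=1.0, size=n_trials)

def predicted_p(alpha, beta, payoffs):
    """Trial-by-trial gamble probabilities: V(t) follows the linear learning model
    (updated here with each trial's payoff, an assumption of this sketch) and
    p(t) = 1 / (1 + exp(-beta * V(t)))."""
    V = 0.0
    p = np.empty(len(payoffs))
    for t in range(len(payoffs)):
        p[t] = 1.0 / (1.0 + np.exp(-beta * V))
        V = (1.0 - alpha) * V + alpha * payoffs[t]
    return p

# Simulate choices (1 = gamble, 0 = not gamble) from known generating parameters.
true_alpha, true_beta = 0.2, 1.5
choices = (rng.random(n_trials) < predicted_p(true_alpha, true_beta, payoffs)).astype(int)

def G2(params):
    """Badness-of-fit measure: G2 = -2 * log likelihood of the observed choices."""
    alpha, beta = params
    if not 0.0 < alpha < 1.0:
        return np.inf
    p = np.clip(predicted_p(alpha, beta, payoffs), 1e-10, 1 - 1e-10)
    log_likelihood = np.sum(choices * np.log(p) + (1 - choices) * np.log(1 - p))
    return -2.0 * log_likelihood

fit = minimize(G2, x0=[0.5, 1.0], method="Nelder-Mead")
print(fit.x)    # maximum likelihood estimates of (alpha, beta)
print(fit.fun)  # minimized G2
```

In practice, one would typically rerun the search from several starting values to reduce the risk of stopping at a local minimum.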

Maximum likelihood is not restricted to learning models, and it can be used to fit all kinds of models. For example, if we observe a response time on each trial, and our model predicts the response time for each trial, then the preceding equation can be applied with $x_t$ equal to the observed response time on a trial, and with $p(x_t, t)$ equal to the predicted probability for the observed value of response time on that trial. Figure 1.6 shows an example in which a sample of response time data (summarized in the figure by a histogram) was fit by a gamma distribution model for response time using two model parameters.

Fig. 1.6 Example of maximum likelihood estimation. The histograms describe data sampled from a gamma distribution with scale and shape parameters both equal to 20. Using maximum likelihood we can estimate the parameter values of a gamma distribution that best fits the sample (they turn out to be 20.4 and 19.5, respectively), and plot its probability density function (solid line).

Now suppose we have two different competing learning models. The model we just described has two parameters. Suppose the competing model is quite different and it is more complex, with four parameters. Also suppose the models are not nested, so that it is not possible to compute the same predictions for the simpler model using the more complex model. Then we can compare models by using a Bayesian information criterion (BIC; see Wasserman, 2000, for a review). This criterion is derived on the basis of choosing the model that is most probable given the data. (However, the derivation only holds asymptotically as the sample size increases indefinitely.) For each model we wish to compare, we can compute a BIC index: $\mathrm{BIC}_{\text{model}} = G^2_{\text{model}} + n_{\text{model}} \cdot \ln(N)$, where $n_{\text{model}}$ equals the number of model parameters estimated and $N$ equals the number of observations. The BIC index is an index that balances model fit with model complexity as measured by the number of parameters. (Note, however, that model complexity is more than the number of parameters; see Chapter 13.) It is a badness-of-fit index, and so we choose the model with the lowest BIC index. See Chapter 14 for a detailed review on model comparison.
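Given the minimized $G^2$ values, the BIC comparison is a one-line computation per model; the $G^2$ values and parameter counts below are hypothetical:

```python
import math

def bic(g2, n_params, n_obs):
    """BIC = G^2 + (number of parameters) * ln(number of observations)."""
    return g2 + n_params * math.log(n_obs)

n_obs = 500                             # e.g., 500 choice trials
g2_simple, g2_complex = 612.4, 603.9    # hypothetical minimized G^2 values

print(bic(g2_simple, 2, n_obs))   # 2-parameter learning model
print(bic(g2_complex, 4, n_obs))  # 4-parameter competitor; the lower BIC is preferred
```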

… perception. In addition, we provide two chapters on modeling tools, including Bayesian estimation in hierarchical models and model comparison methods. We conclude the handbook with three chapters on new directions in the field, including neurocognitive modeling, mathematical and computational modeling in clinical psychology, and cognitive and decision models based upon quantum probability theory.

The models reviewed in the handbook make use of many of the mathematical ideas presented in this review chapter. Probabilistic models appear in chapters covering signal detection theory (Chapter 2), probabilistic models of cognition (Chapter 9), decision theory (Chapters 10 and 17), and clinical applications (Chapter 16). Stochastic models (i.e., models that are dynamic and probabilistic) appear in chapters covering information processing (Chapter 4), perceptual judgment (Chapter 6), and random walk/diffusion models of choice and response time in various cognitive tasks (Chapters 3, 7, 10, and 15). Learning and memory models are reviewed in Chapters 5, 7, 8, and 11. Models using vector spaces and geometry are introduced in Chapters 11, 12, and 17.

The basic concepts reviewed in this chapter should be helpful for readers who are new to mathematical and computational models to jumpstart reading the rest of the book. In addition, each chapter is self-contained, presents a tutorial-style introduction to the topic area exemplified by many applications, and provides a specific glossary list of the basic concepts in the topic area. We believe you will have a rewarding reading experience.

Note

1. This chapter is restricted to real numbers.

References

Bhattacharya, R. N., & Waymire, E. C. (2009). Stochastic processes with applications (Vol. 61). Philadelphia, PA: SIAM.

Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. Thousand Oaks, CA: SAGE.

Cox, D. R., & Miller, H. D. (1965). The theory of stochastic processes (Vol. 134). Boca Raton, FL: CRC Press.

Feller, W. (1968). An introduction to probability theory and its applications (3rd ed., Vol. 1). New York, NY: Wiley.

Hintzman, D. L. (1991). Why are formal models useful in psychology? In W. E. Hockley & S. Lewandowsky (Eds.), Relating theory and data: Essays on human memory in honor of Bennet B. Murdock (pp. 39–56). Hillsdale, NJ: Erlbaum.

Hogg, R. V., & Craig, A. T. (1970). Introduction to mathematical statistics (3rd ed.). New York, NY: Macmillan.

Lewandowsky, S., & Farrell, S. (2010). Computational modeling in cognition: Principles and practice. Thousand Oaks, CA: SAGE.

Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47, 90–100.

Stewart, J. (2012). Calculus (7th ed.). Belmont, CA: Brooks/Cole.

Suppes, P. (1984). Probabilistic metaphysics. Oxford: Basil Blackwell.

Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1), 92–107.

PART I

Elementary Cognitive Mechanisms

2. Multidimensional Signal Detection Theory

F. Gregory Ashby and Fabian A. Soto

Abstract

Multidimensional signal detection theory is a multivariate extension of signal detection theory that makes two fundamental assumptions, namely that every mental state is noisy and that every action requires a decision. The most widely studied version is known as general recognition theory (GRT). General recognition theory assumes that the percept on each trial can be modeled as a random sample from a multivariate probability distribution defined over the perceptual space. Decision bounds divide this space into regions that are each associated with a response alternative. General recognition theory rigorously defines and tests a number of important perceptual and cognitive conditions, including perceptual and decisional separability and perceptual independence. General recognition theory has been used to analyze data from identification experiments in two ways: (1) fitting and comparing models that make different assumptions about perceptual and decisional processing, and (2) testing assumptions by computing summary statistics and checking whether these satisfy certain conditions. Much has been learned recently about the neural networks that mediate the perceptual and decisional processing modeled by GRT, and this knowledge can be used to improve the design of experiments where a GRT analysis is anticipated.

Key Words: perceptual independence, identification, categorization

Introduction

Signal detection theory revolutionized psychophysics in two different ways. First, it introduced the idea that trial-by-trial variability in sensation can significantly affect a subject's performance. And second, it introduced the field to the then-radical idea that every psychophysical response requires a decision from the subject, even when the task is as simple as detecting a signal in the presence of noise. Of course, signal detection theory proved to be wildly successful, and both of these assumptions are now routinely accepted without question in virtually all areas of psychology.

The mathematical basis of signal detection theory is rooted in statistical decision theory, which itself has a history that dates back at least several centuries. The insight of signal detection theorists was that this model of statistical decisions was also a good model of sensory decisions. The first signal detection theory publication appeared in 1954 (Peterson, Birdsall, & Fox, 1954), but the theory did not really become widely known in psychology until the seminal article of Swets, Tanner, and Birdsall appeared in Psychological Review in 1961. From then until 1986, almost all applications of signal detection theory assumed only one sensory dimension (Tanner, 1956, is the principal exception). In almost all cases, this dimension was meant to represent sensory magnitude. For a detailed description of this standard univariate theory, see the excellent texts of either Macmillan and Creelman (2005) or Wickens (2002). This chapter describes multivariate generalizations of signal detection theory.

Multidimensional signal detection theory is a multivariate extension of signal detection to cases in which there is more than one perceptual dimension. It has all the advantages of univariate signal detection theory (i.e., it separates perceptual and decision processes), but it also offers the best existing method for examining interactions among perceptual dimensions (or components). The most widely studied version of multidimensional signal detection theory is known as general recognition theory (GRT; Ashby & Townsend, 1986). Since its inception, more than 350 articles have applied GRT to a wide variety of phenomena, including categorization (e.g., Ashby & Gott, 1988; Maddox & Ashby, 1993), similarity judgment (Ashby & Perrin, 1988), face perception (Blaha, Silbert, & Townsend, 2011; Thomas, 2001; Wenger & Ingvalson, 2002), recognition and source memory (Banks, 2000; Rotello, Macmillan, & Reeder, 2004), source monitoring (DeCarlo, 2003), attention (Maddox, Ashby, & Waldron, 2002), object recognition (Cohen, 1997; Demeyer, Zaenen, & Wagemans, 2007), perception/action interactions (Amazeen & DaSilva, 2005), auditory and speech perception (Silbert, 2012; Silbert, Townsend, & Lentz, 2009), haptic perception (Giordano et al., 2012; Louw, Kappers, & Koenderink, 2002), and the perception of sexual interest (Farris, Viken, & Treat, 2010).

Extending signal detection theory to multiple dimensions might seem like a straightforward mathematical exercise, but, in fact, several new conceptual problems must be solved. First, with more than one dimension, it becomes necessary to model interactions (or the lack thereof) among those dimensions. During the 1960s and 1970s, a great many terms were coined that attempted to describe perceptual interactions among separate stimulus components. None of these, however, were rigorously defined or had any underlying theoretical foundation. Included in this list were perceptual independence, separability, integrality, performance parity, and sampling independence. Thus, to be useful as a model of perception, any multivariate extension of signal detection theory needed to provide theoretical interpretations of these terms and show rigorously how they were related to one another.

Second, the problem of how to model decision processes when the perceptual space is multidimensional is far more difficult than when there is only one sensory dimension. A standard signal detection theory lecture is to show that almost any decision strategy is mathematically equivalent to setting a criterion on the single sensory dimension, then giving one response if the sensory value falls on one side of this criterion, and the other response if the sensory value falls on the other side. For example, in the normal, equal-variance model, this is true regardless of whether subjects base their decision on sensory magnitude or on likelihood ratio. A straightforward generalization of this model to two perceptual dimensions divides the perceptual plane into two response regions. One response is given if the percept falls in the first region and the other response is given if the percept falls in the second region. The obvious problem is that, unlike a line, there are an infinite number of ways to divide a plane into two regions. How do we know which of these has the most empirical validity?

The solution to the first of these two problems (that is, the sensory problem) was proposed by Ashby and Townsend (1986) in the article that first developed GRT. The GRT model of sensory interactions has been embellished during the past 25 years, but the core concepts introduced by Ashby and Townsend (1986) remain unchanged (i.e., perceptual independence, perceptual separability). In contrast, the decision problem has been much more difficult. Ashby and Townsend (1986) proposed some candidate decision processes, but at that time they were largely without empirical support. In the ensuing 25 years, however, hundreds of studies have attacked this problem, and today much is known about human decision processes in perceptual and cognitive tasks that use multidimensional perceptual stimuli.

Box 1. Notation

$A_iB_j$ = stimulus constructed by setting component A to level $i$ and component B to level $j$

$a_ib_j$ = response in an identification experiment signaling that component A is at level $i$ and component B is at level $j$

$X_1$ = perceived value of component A

$X_2$ = perceived value of component B

$f_{ij}(x_1, x_2)$ = joint likelihood that the perceived value of component A is $x_1$ and the perceived value of component B is $x_2$ on a trial when the presented stimulus is $A_iB_j$

$g_{ij}(x_1)$ = marginal pdf of component A on trials when stimulus $A_iB_j$ is presented

$r_{ij}$ = frequency with which the subject responded $R_j$ on trials when stimulus $S_i$ was presented

$P(R_j \mid S_i)$ = probability that response $R_j$ is given on a trial when stimulus $S_i$ is presented

General Recognition Theory

General recognition theory (see the Glossary for key concepts related to GRT) can be applied to virtually any task. The most common applications, however, are to tasks in which the stimuli vary on two stimulus components or dimensions. As an example, consider an experiment in which participants are asked to categorize or identify faces that vary across trials on gender and age. Suppose there are four stimuli (i.e., faces) that are created by factorially combining two levels of each dimension. In this case we could denote the two levels of the gender dimension by $A_1$ (male) and $A_2$ (female) and the two levels of the age dimension by $B_1$ (teen) and $B_2$ (adult). Then the four faces are denoted as $A_1B_1$ (male teen), $A_1B_2$ (male adult), $A_2B_1$ (female teen), and $A_2B_2$ (female adult).

As with signal detection theory, a fundamental assumption of GRT is that all perceptual systems are inherently noisy. There is noise both in the stimulus (e.g., photon noise) and in the neural systems that determine its sensory representation (Ashby & Lee, 1993). Even so, the perceived value on each sensory dimension will tend to increase as the level of the relevant stimulus component increases. In other words, the distribution of percepts will change when the stimulus changes. So, for example, each time the $A_1B_1$ face is presented, its perceived age and maleness will tend to be slightly different.

General recognition theory models the sensory or perceptual effects of a stimulus AiBj via the joint probability density function (pdf) fij(x1, x2) (see Box 1 for a description of the notation used in this article). On any particular trial when stimulus AiBj is presented, GRT assumes that the subject's percept can be modeled as a random sample from this joint pdf. Any such sample defines an ordered pair (x1, x2), the entries of which fix the perceived value of the stimulus on the two sensory dimensions. General recognition theory assumes that the subject uses these values to select a response.

In GRT, the relationship of the joint pdf to the marginal pdfs plays a critical role in determining whether the stimulus dimensions are perceptually integral or separable. The marginal pdf gij(x1) simply describes the likelihoods of all possible sensory values of X1. Note that the marginal pdfs are identical to the one-dimensional pdfs of classical signal detection theory.

Component A is perceptually separable from component B if the subject's perception of A does not change when the level of B is varied. For example, age is perceptually separable from gender if the perceived age of the adult in our face experiment is the same for the male adult as for the female adult, and if a similar invariance holds for the perceived age of the teen. More formally, in an experiment with the four stimuli A1B1, A1B2, A2B1, and A2B2, component A is perceptually separable from B if and only if

g11(x1) = g12(x1) and g21(x1) = g22(x1), (1)

for all values of x1. Similarly, component B is perceptually separable from A if and only if

g11(x2) = g21(x2) and g12(x2) = g22(x2), (2)

for all values of x2. If perceptual separability fails, then A and B are said to be perceptually integral. Note that this definition is purely perceptual since it places no constraints on any decision processes.
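These separability conditions are easy to check numerically once a parametric perceptual model has been specified. The following Python sketch is illustrative only: it assumes normal marginals (an assumption introduced formally in a later section), and every numerical value is invented. Component A is separable from B when the X1 marginals do not depend on the level of B, and likewise for B on dimension X2.

import numpy as np

# Hypothetical Gaussian GRT model: for each stimulus AiBj, the marginal of X1 is
# N(mu, sd^2) given by the "x1" pair and the marginal of X2 by the "x2" pair.
marginals = {
    "A1B1": {"x1": (0.0, 1.0), "x2": (0.0, 1.0)},
    "A1B2": {"x1": (0.0, 1.0), "x2": (2.0, 1.0)},
    "A2B1": {"x1": (2.0, 1.0), "x2": (0.5, 1.0)},  # X2 mean shifts with the level of A
    "A2B2": {"x1": (2.0, 1.0), "x2": (2.5, 1.0)},
}

def same_normal(p, q):
    # Two normal marginals are identical iff their means and standard deviations match.
    return np.allclose(p, q)

# Component A is separable from B iff the X1 marginals are unaffected by the level of B (Eq. 1).
a_separable = (same_normal(marginals["A1B1"]["x1"], marginals["A1B2"]["x1"]) and
               same_normal(marginals["A2B1"]["x1"], marginals["A2B2"]["x1"]))

# Component B is separable from A iff the X2 marginals are unaffected by the level of A (Eq. 2).
b_separable = (same_normal(marginals["A1B1"]["x2"], marginals["A2B1"]["x2"]) and
               same_normal(marginals["A1B2"]["x2"], marginals["A2B2"]["x2"]))

print(a_separable, b_separable)  # True, False for these invented values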

Another purely perceptual phenomenon is perceptual independence. According to GRT, components A and B are perceived independently in stimulus AiBj if and only if the perceptual value of component A is statistically independent of the perceptual value of component B on AiBj trials. More specifically, A and B are perceived independently in stimulus AiBj if and only if

fij(x1, x2) = gij(x1) gij(x2) (3)

for all values of x1 and x2. If perceptual independence is violated, then components A and B are perceived dependently. Note that perceptual independence is a property of a single stimulus, whereas perceptual separability is a property of groups of stimuli.
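Equation 3 can likewise be verified numerically for any candidate joint pdf. The following Python sketch, with an invented bivariate normal percept distribution, discretizes fij on a grid, integrates it to obtain the two marginals, and measures how far the joint pdf departs from the product of its marginals; a nonzero discrepancy signals a violation of perceptual independence.

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical perceptual distribution for one stimulus AiBj (values are invented).
mean = [0.0, 0.0]
cov = [[1.0, 0.4],   # nonzero covariance, so perceptual independence should fail
       [0.4, 1.0]]

# Discretize the joint pdf f_ij(x1, x2) on a grid.
x1 = np.linspace(-5, 5, 401)
x2 = np.linspace(-5, 5, 401)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
f = multivariate_normal(mean, cov).pdf(np.dstack([X1, X2]))

dx1 = x1[1] - x1[0]
dx2 = x2[1] - x2[0]

# Marginal pdfs g_ij(x1) and g_ij(x2) obtained by numerical integration.
g1 = f.sum(axis=1) * dx2
g2 = f.sum(axis=0) * dx1

# Perceptual independence (Eq. 3): f_ij(x1, x2) = g_ij(x1) g_ij(x2) for all x1, x2.
product = np.outer(g1, g2)
max_discrepancy = np.max(np.abs(f - product))
print(f"max |f - g1*g2| = {max_discrepancy:.4f}")  # clearly greater than 0 here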

A third important construct from GRT is decisional separability. In our hypothetical experiment with stimuli A1B1, A1B2, A2B1, and A2B2, and two perceptual dimensions X1 and X2, decisional separability holds on dimension X1 (for example), if the subject's decision about whether stimulus component A is at level 1 or 2 depends only on the perceived value on dimension X1. A decision bound is a line or curve that separates regions of the perceptual space that elicit different responses. The only types of decision bounds that satisfy decisional separability are vertical and horizontal lines.

The Multivariate Normal Model

So far we have made no assumptions about the form of the joint or marginal pdfs. Our only assumption has been that there exists some probability distribution associated with each stimulus and that these distributions are all embedded in some Euclidean space (e.g., with orthogonal dimensions). There have been some efforts to extend GRT to more general geometric spaces (i.e., Riemannian manifolds; Townsend, Aisbett, Assadi, & Busemeyer, 2006; Townsend & Spencer-Smith, 2004), but much more common is to add more restrictions to the original version of GRT, not fewer. For example, some applications of GRT have been distribution free (e.g., Ashby & Maddox, 1994; Ashby & Townsend, 1986), but most have assumed that the percepts are multivariate normally distributed. The multivariate normal distribution includes two assumptions. First, the marginal distributions are all normal. Second, the only possible dependencies are pairwise linear relationships. Thus, in multivariate normal distributions, uncorrelated random variables are statistically independent.

A hypothetical example of a GRT model that assumes multivariate normal distributions is shown in Figure 2.1. The ellipses shown there are contours of equal likelihood; that is, all points on the same ellipse are equally likely to be sampled from the underlying distribution. The contours of equal likelihood also describe the shape a scatterplot of points would take if they were random samples from the underlying distribution. Geometrically, the contours are created by taking a slice through the distribution parallel to the perceptual plane and looking down at the result from above. Contours of equal likelihood in multivariate normal distributions are always circles or ellipses. Bivariate normal distributions, like those depicted in Figure 2.1, are each characterized by five parameters: a mean on each dimension, a variance on each dimension, and a covariance or correlation between the values on the two dimensions. These are typically catalogued

in a mean vector and a variance-covariance matrix. For example, consider a bivariate normal distribution with joint density function f(x1, x2). Then the mean vector would equal

μ = (μ1, μ2)′,

and the variance-covariance matrix would equal

Σ = | σ1²     ρσ1σ2 |
    | ρσ1σ2   σ2²   |,

where μk and σk² are the mean and variance on dimension k, and ρ is the correlation between the values on the two dimensions (so the covariance equals ρσ1σ2).
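To make the parameterization concrete, the following Python sketch builds one such perceptual distribution from its five parameters and simulates percepts from it; the numerical values are arbitrary and are not taken from Figure 2.1.

import numpy as np

rng = np.random.default_rng(0)

# Five parameters of one bivariate normal perceptual distribution (values are arbitrary).
mu1, mu2 = 2.0, 1.0        # means on the two perceptual dimensions
sd1, sd2 = 1.0, 0.8        # standard deviations
rho = 0.5                  # correlation between X1 and X2

mean = np.array([mu1, mu2])
cov = np.array([[sd1**2,          rho * sd1 * sd2],
                [rho * sd1 * sd2, sd2**2        ]])

# Each trial's percept is a random sample from this joint pdf.
percepts = rng.multivariate_normal(mean, cov, size=1000)

# The sample statistics recover the parameters (up to sampling error).
print("sample means:", percepts.mean(axis=0))
print("sample correlation:", np.corrcoef(percepts.T)[0, 1])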

The multivariate normal distribution has another important property. Consider an identification task with only two stimuli and suppose the perceptual effects associated with the presentation of each stimulus can be modeled as a multivariate normal distribution. Then it is straightforward to show that the decision boundary that maximizes accuracy is always linear or quadratic (e.g., Ashby, 1992). The optimal boundary is linear if the two perceptual distributions have equal variance-covariance matrices (and so the contours of equal likelihood have the same shape and are just translations of each other), and the optimal boundary is quadratic if the two variance-covariance matrices are unequal. Thus, in the Gaussian version of GRT, the only decision bounds that are typically considered are either linear or quadratic.
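This property can be checked directly by writing the optimal classifier as a log likelihood ratio. In the Python sketch below (with invented parameters), two stimuli share a variance-covariance matrix, so the decision variable ln f1(x) − ln f2(x) is linear in x; numerically, its second differences along a line through the space are zero. With unequal matrices the quadratic terms would not cancel.

import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
sigma_equal = np.array([[1.0, 0.3], [0.3, 1.0]])

f1 = multivariate_normal(mu1, sigma_equal)
f2 = multivariate_normal(mu2, sigma_equal)

def log_likelihood_ratio(x):
    # Optimal decision variable: respond "stimulus 1" when h(x) > 0.
    return f1.logpdf(x) - f2.logpdf(x)

# With equal covariance matrices h(x) is linear, so along any line in the plane it
# changes at a constant rate; the second differences are (numerically) zero.
xs = np.column_stack([np.linspace(-3, 3, 7), np.zeros(7)])
h = log_likelihood_ratio(xs)
print(np.round(np.diff(h, n=2), 10))  # all zeros, indicating a linear bound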

In Figure 2.1, note that perceptual independence holds for all stimuli except A2B2. This can be seen in the contours of equal likelihood. Note that the major and minor axes of the ellipses that define the contours of equal likelihood for stimuli A1B1, A1B2, and A2B1 are all parallel to the two perceptual dimensions. Thus, a scatterplot of samples from each of these distributions would be characterized by zero correlation and, therefore, statistical independence (i.e., in the special Gaussian case). However, the major and minor axes of the A2B2 distribution are tilted, reflecting a positive correlation and hence a violation of perceptual independence.

Fig 2.1 Contours of equal likelihood, decision bounds, and marginal perceptual distributions from a hypothetical multivariate normal GRT model that describes the results of an identification experiment with four stimuli that were constructed by factorially combining two levels of two stimulus dimensions.

Next, note in Figure 2.1 that stimulus component A is perceptually separable from stimulus component B, but B is not perceptually separable from A. To see this, note that the marginal distributions for stimulus component A are the same, regardless of the level of component B [i.e., g11(x1) = g12(x1) and g21(x1) = g22(x1), for all values of x1].

Thus, the subject's perception of component A does not depend on the level of B and, therefore, stimulus component A is perceptually separable from B. On the other hand, note that the subject's perception of component B does change when the level of component A changes [i.e., g11(x2) ≠ g21(x2) and g12(x2) ≠ g22(x2) for most values of x2]. In particular, when A changes from level 1 to level 2, the subject's mean perceived value of each level of component B increases. Thus, the perception of component B depends on the level of component A, and therefore B is not perceptually separable from A.

Finally, note that decisional separability holds on dimension 1 but not on dimension 2. On dimension 1 the decision bound is vertical. Thus, the subject has adopted the following decision rule:

Component A is at level 2 if x1 > Xc1; otherwise component A is at level 1,

where Xc1 is a constant (i.e., the x1 intercept of the vertical decision bound). Thus, the subject's decision about whether component A is at level 1 or 2 does not depend on the perceived value of component B. So component A is decisionally separable from component B. On the other hand, the decision bound on dimension x2 is not horizontal, so the criterion used to judge whether component B is at level 1 or 2 changes with the perceived value of component A (at least for larger perceived values of A). As a result, component B is not decisionally separable from component A.
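A decision process of this kind is straightforward to express as a rule that maps each percept to a response. The Python sketch below is purely illustrative; the bound parameters (xc1, slope, intercept) are invented rather than estimated from Figure 2.1. The component-A decision depends only on x1 (decisional separability), whereas the component-B criterion shifts with x1 (no decisional separability).

def respond(x1, x2, xc1=0.0, slope=0.3, intercept=0.0):
    """Map a percept (x1, x2) to one of the four identification responses.

    Hypothetical bounds: a vertical bound at x1 = xc1 for component A, and a
    tilted linear bound x2 = slope * x1 + intercept for component B.
    """
    a_level = 2 if x1 > xc1 else 1                       # depends only on x1
    b_level = 2 if x2 > slope * x1 + intercept else 1    # criterion depends on x1
    return f"a{a_level}b{b_level}"

print(respond(1.2, 0.1))   # 'a2b1'
print(respond(-0.5, 0.6))  # 'a1b2'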

Applying GRT to Data

The most common applications of GRT are to data collected in an identification experiment like the one modeled in Figure 2.1. The key data from such experiments are collected in a confusion matrix, which contains a row for every stimulus and a column for every response (Table 2.1 displays an example of a confusion matrix, which will be discussed and analyzed later). The entry in row i and column j lists the number of trials on which stimulus Si was presented and the subject gave response Rj. Thus, the entries on the main diagonal give the frequencies of all correct responses and the off-diagonal entries describe the various errors (or confusions). Note that each row sum equals the total number of stimulus presentations of that type. So if each stimulus is presented 100 times, then the sum of all entries in each row will equal 100. This means that there is one constraint per row, so an n × n confusion matrix will have n × (n – 1) degrees of freedom.
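The bookkeeping is simple to reproduce. The Python sketch below tallies a confusion matrix from fabricated identification trials and confirms the row-sum constraint and the resulting degrees-of-freedom count; all trial data are simulated for the example.

import numpy as np

n_stimuli = 4  # e.g., A1B1, A1B2, A2B1, A2B2
rng = np.random.default_rng(1)

# Fabricated identification data: 100 presentations of each stimulus, with the
# correct response most likely on every trial.
confusion = np.zeros((n_stimuli, n_stimuli), dtype=int)
for i in range(n_stimuli):
    p = np.full(n_stimuli, 0.1)
    p[i] = 0.7
    responses = rng.choice(n_stimuli, size=100, p=p)
    confusion[i] = np.bincount(responses, minlength=n_stimuli)

print(confusion)
print("row sums:", confusion.sum(axis=1))                   # each equals 100
print("degrees of freedom:", n_stimuli * (n_stimuli - 1))   # n x (n - 1) = 12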

General recognition theory has been used to analyze data from confusion matrices in two different ways. One is to fit the model to the entire confusion matrix. In this method, a GRT model is constructed with specific numerical values of all of its parameters and a predicted confusion matrix is computed. Next, values of each parameter are found that make the predicted matrix as close as possible to the empirical confusion matrix. To test various assumptions about perceptual and decisional processing—for example, whether perceptual independence holds—a version of the model that assumes perceptual independence is fit to the data as well as a version that makes no assumptions about perceptual independence. This latter version contains the former version as a special case (i.e., in which all covariance parameters are set to zero), so it can never fit worse. After fitting these two models, we assume that perceptual independence is violated if the more general model fits significantly better than the more restricted model that assumes perceptual independence. The other method for using GRT to test assumptions about perceptual processing, which is arguably more popular, is to compute certain summary statistics from the empirical confusion matrix and then to check whether these satisfy certain conditions that are characteristic of perceptual separability or independence. Because these two methods are so different, we will discuss each in turn.
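When the two nested models are both fit by maximum likelihood, one standard way to decide whether the general model fits significantly better is a likelihood ratio test. The Python sketch below illustrates the computation; the two log-likelihoods and the difference in the number of free parameters are placeholders that would come from the actual model fits.

from scipy.stats import chi2

# Maximized log-likelihoods of the restricted model (perceptual independence:
# all covariance parameters fixed at zero) and of the general model. Placeholder values.
logL_restricted = -512.3
logL_general = -498.7
extra_params = 4  # e.g., one covariance parameter freed per stimulus in a 2 x 2 design

G2 = 2.0 * (logL_general - logL_restricted)   # likelihood ratio statistic
p_value = chi2.sf(G2, df=extra_params)

print(f"G^2 = {G2:.2f}, p = {p_value:.4f}")
# A small p-value favors the general model, i.e., evidence against perceptual independence.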

It is important to note, however, that regardless of which method is used, there are certain nonidentifiabilities in the GRT model that could limit the conclusions that are possible to draw from any such analyses (e.g., Menneer, Wenger, & Blaha, 2010; Silbert & Thomas, 2013). The problems are most acute with 2 × 2 identification data (i.e., when the stimuli are A1B1, A1B2, A2B1, and A2B2). For example, Silbert and Thomas (2013) showed that in 2 × 2 applications where there are two linear decision bounds that do not satisfy decisional separability, there always exists an alternative model that makes the exact same empirical predictions and satisfies decisional separability (and these two models are related by an affine transformation). Thus, decisional separability is not testable with standard applications of GRT to 2 × 2 identification data (nor can the slopes of the decision bounds be uniquely estimated). For several reasons, however, these nonidentifiabilities are not catastrophic.

First, the problems don't generally exist with 3 × 3 or larger identification tasks. In the 3 × 3 case the GRT model with linear bounds requires at least 4 decision bounds to divide the perceptual space into 9 response regions (e.g., in a tic-tac-toe configuration). Typically, two will have a generally vertical orientation and two will have a generally horizontal orientation. In this case, there is no affine transformation that guarantees decisional separability except in the special case where the two vertical-tending bounds are parallel and the two horizontal-tending bounds are parallel (because parallel lines remain parallel after affine transformations). Thus, in 3 × 3 (or higher) designs, decisional separability is typically identifiable and testable.

Second, there are simple experimental manipulations that can be added to any 2 × 2 identification experiment to test for decisional separability. In particular, switching the locations of the response keys is known to interfere with performance if decisional separability fails but not if decisional separability holds (Maddox, Glass, O'Brien, Filoteo, & Ashby, 2010; for more information on this, see the section later entitled "Neural Implementations of GRT"). Thus, one could add 100 extra trials to the end of a 2 × 2 identification experiment where the response key locations are randomly interchanged (and participants are informed of this change). If accuracy drops significantly during this period, then decisional separability can be rejected, whereas if accuracy is unaffected then decisional separability is supported.

Third, the nonidentifiability problem can be avoided by using the newly developed GRT model with individual differences (GRT-wIND; Soto, Vucovich, Musgrave, & Ashby, in press), which was patterned after the INDSCAL model of multidimensional scaling (Carroll & Chang, 1970). GRT-wIND is fit to the data from all individuals simultaneously. All participants are assumed to share the same group perceptual distributions, but different participants are allowed different linear bounds and they are assumed to allocate different amounts of attention to each perceptual dimension. The model does not suffer from the identifiability problems identified by Silbert and Thomas (2013), even in the 2 × 2 case, because with different linear bounds for each participant there is no affine transformation that simultaneously makes all these bounds satisfy decisional separability.

Fitting the GRT Model to Identification Data

Computing the Likelihood Function

When the full GRT model is fit to identification data, the best-fitting values of all free parameters must be found. Ideally, this is done via the method of maximum likelihood—that is, numerical values of all parameters are found that maximize the likelihood of the data given the model. Let S1, S2, ..., Sn denote the n stimuli in an identification experiment and let R1, R2, ..., Rn denote the n responses. Let rij denote the frequency with which the subject responded Rj on trials when stimulus Si was presented. Thus, rij is the entry in row i and column j of the confusion matrix. Note that the rij are random variables. The entries in each row have a multinomial distribution. In particular, if P(Rj|Si) is the true probability that response Rj is given on trials when stimulus Si is presented, then the probability of observing the response frequencies ri1, ri2, ..., rin in row i equals

ni!/(ri1! ri2! ··· rin!) P(R1|Si)^ri1 P(R2|Si)^ri2 ··· P(Rn|Si)^rin, (6)

where ni is the total number of trials on which stimulus Si was presented during the course of the experiment. The probability or joint likelihood of observing the entire confusion matrix is the product of the probabilities of observing each row; that is,

L = [ ∏i ni!/(ri1! ri2! ··· rin!) ] [ ∏i ∏j P(Rj|Si)^rij ]. (7)

General recognition theory models predict that P(Rj|Si) has a specific form. Specifically, they predict that P(Rj|Si) is the volume in the Rj response region under the multivariate distribution of perceptual effects elicited when stimulus Si is presented. This requires computing a multiple integral.

The maximum likelihood estimators of the GRT model parameters are those numerical values of each parameter that maximize L. Note that the first term in Eq. 7 does not depend on the values of any model parameters. Rather, it only depends on the data. Thus, the parameter values that maximize the second term also maximize the whole expression. For this reason, the first term can be ignored during the maximization process. Another common practice is to take logs of both sides of Eq. 7. Parameter values that maximize L will also maximize any monotonic function of L (and log is a monotonic transformation). So, the standard approach is to find values of the free parameters that maximize

∑i ∑j rij ln P(Rj|Si). (8)
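In code, the objective in Eq. 8 is a single weighted sum once the model's predicted response probabilities are in hand. The Python sketch below uses placeholder matrices; in a real application the predicted P(Rj|Si) would be computed from the GRT model as described in the next subsection, and a standard optimizer (e.g., scipy.optimize.minimize applied to the negative of this value) would search the parameter space.

import numpy as np

def log_likelihood(confusion, predicted):
    """Eq. 8: sum_i sum_j r_ij * ln P(Rj|Si).

    `confusion` holds the observed response frequencies r_ij and `predicted`
    the model's response probabilities P(Rj|Si); rows index stimuli.
    """
    predicted = np.clip(predicted, 1e-12, 1.0)  # guard against log(0)
    return np.sum(confusion * np.log(predicted))

# Placeholder data for a 2-stimulus identification task.
confusion = np.array([[80, 20],
                      [30, 70]])
predicted = np.array([[0.75, 0.25],
                      [0.35, 0.65]])

print(log_likelihood(confusion, predicted))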

Estimating the Parameters

In the case of the multivariate normal model, the predicted probability P(Rj|Si) in Eq. 8 equals the volume under the multivariate normal pdf that describes the subject's perceptual experiences on trials when stimulus Si is presented over the response region associated with response Rj. To estimate the best-fitting parameter values using a standard minimization routine, such integrals must be evaluated many times. If decisional separability is assumed, then the problem simplifies considerably. For example, under these conditions, Wickens (1992) derived the first and second derivatives necessary to quickly estimate parameters of the model using the Newton-Raphson method. Other methods must be used for more general models that do not assume decisional separability. Ennis and Ashby (2003) proposed an efficient algorithm for evaluating the integrals that arise when fitting any GRT model. This algorithm allows the parameters of virtually any GRT model to be estimated via standard minimization software. The remainder of this section describes this method.

The left side of Figure 2.2 shows a contour of equal likelihood from the bivariate normal distribution that describes the perceptual effects of stimulus Si, and the solid lines denote two possible decision bounds in this hypothetical task. In Figure 2.2 the bounds are linear, but the method works for any number of bounds that have any parametric form. The shaded region is the Rj response region. Thus, according to GRT, computing P(Rj|Si) is equivalent to computing the volume under the Si perceptual distribution in the Rj response region. This volume is indicated by the shaded region in the figure. First note that any linear bound can be written in discriminant function form as

h(x1, x2) = b1x1 + b2x2 + c,

where b1, b2, and c are constants. The discriminant function form of any decision bound has the property that positive values are obtained if any point on one side of the bound is inserted into the function, and negative values are obtained if any point on the opposite side is inserted. So, for example, in Figure 2.2, the constants b1, b2, and c can be selected so that h1(x) < 0 for any point below the bound. Similarly, for the h2 bound, the constants can be selected so that h2(x) > 0 for any point to the right of the bound and h2(x) < 0 for any point to the left. Note that under these conditions, the Rj response region is defined as the set of all x such that h1(x) > 0 and h2(x) > 0. If we denote the multivariate normal (mvn) pdf for stimulus Si as mvn(μi, Σi), then

P(Rj|Si) = ∫∫ mvn(x; μi, Σi) dx,

where the double integral extends over the region in which h1(x) > 0 and h2(x) > 0. The key to evaluating integrals of this type is to transform the problem using a multivariate form of the well-known z transformation. Ennis and Ashby proposed using the Cholesky transformation. Any random vector x that has a multivariate normal distribution can always be rewritten as

x = μ + Pz,

where μ is the mean vector of x, z is a random vector with a multivariate z distribution (i.e., a multivariate normal distribution with mean vector 0 and variance-covariance matrix equal to the identity matrix), and P is a lower triangular matrix chosen so that PP′ equals the variance-covariance matrix of x.
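The Python sketch below is not the Ennis and Ashby (2003) algorithm itself, but it illustrates the role of the transformation: standard normal vectors z are mapped to percepts x = μ + Pz using the Cholesky factor P of Σi, and P(Rj|Si) is then estimated as the proportion of simulated percepts falling in the region where h1(x) > 0 and h2(x) > 0. All numerical values, including the two bounds, are invented.

import numpy as np

rng = np.random.default_rng(2)

# Perceptual distribution for stimulus Si (invented parameters).
mu = np.array([1.0, 0.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.8]])
P = np.linalg.cholesky(Sigma)   # lower triangular, P @ P.T == Sigma

# Two linear bounds in discriminant-function form h(x) = b1*x1 + b2*x2 + c.
def h1(x):  # > 0 above this bound
    return 0.2 * x[:, 0] + 1.0 * x[:, 1] - 0.3

def h2(x):  # > 0 to the right of this bound
    return 1.0 * x[:, 0] - 0.1 * x[:, 1] - 0.5

# Cholesky transformation: x = mu + P z, with z multivariate standard normal.
z = rng.standard_normal((200_000, 2))
x = mu + z @ P.T

# Monte Carlo estimate of P(Rj | Si): proportion of percepts in the Rj region.
p_rj_given_si = np.mean((h1(x) > 0) & (h2(x) > 0))
print(round(p_rj_given_si, 3))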

