Journal of Psychopathology and Behavioral Assessment, Vol. 12, No. 4, 1990
A Decision Tree Approach to Selecting an Appropriate Observation Reliability Index
Hoi K. Suen,1,3 Donald Ary,2 and Wesley C. Covalt2
Accepted: September 17, 1990
Based on the conceptual framework outlined by Cone (1986) and Suen (1988), a practical decision tree is developed as an aid for the selection of observational reliability indices.
KEY WORDS: interobserver agreement; intraobserver reliability.
The purpose of this brief paper is to suggest a practical process through which the appropriate choice of reliability index can be determined, based on the conceptual framework provided by Cone (1986) and Suen (1988). There are numerous methods of assessing interobserver agreement and/or reliability today. These methods have been derived from different statistical assumptions, are appropriate for different epistemological paradigms, and yield different types of information. While many authors have compared various indices conceptually or have shown mathematical relationships among them, this paper uses a decision tree format to show when different indices are appropriate.
Observation researchers can be divided according to the paradigms to which they adhere. There are those who position themselves close to the traditional Skinnerian view that only directly observable behaviors can be measured; this is the idiographic-behavior paradigm. On the other hand, there are researchers who attempt to develop observational measures of psychological constructs or mediating processes which cannot be directly
1 230 CEDAR Building, Department of Educational & School Psychology & Special Education, The Pennsylvania State University, University Park, Pennsylvania 16802.
2 Northern Illinois University, De Kalb, Illinois 60115.
3 To whom correspondence should be addressed.
Table I. A 2 × 2 Table Representing the Observational Data of Two Observers^a

                        Observer 2
                        0        1
    Observer 1     1    a        b        p1
                   0    c        d        q1
                        q2       p2

^a 0 = nonoccurrence of behavior; 1 = occurrence of behavior; p1 = proportion of occurrence reported by Observer 1; q1 = proportion of nonoccurrence reported by Observer 1; p2 = proportion of occurrence reported by Observer 2; q2 = proportion of nonoccurrence reported by Observer 2.
measured; this is the nomothetic-trait paradigm (cf. Cone, 1986; Suen, 1988). Because the object of measurement is a directly observable behavior, those concerned with idiographic-behavior measures need to show correspondence between the behavior and some external criterion. Thus, a criterion-referenced interpretation of data, with its external criterion, best serves the idiographic-behavior paradigm. The nomothetic-trait view has a wider scope of acceptable events to measure. Therefore, either a criterion-referenced or a norm-referenced interpretation of the data will be appropriate, depending on the selection of assessment: a set, external criterion or a subject's relation to a normed group. With these points in mind, observational reliability indices can be discussed.
Observational reliability indices can be divided into three groups, labeled interobserver agreement, intraobserver reliability, and intraclass generalizability. Interobserver agreement indices are appropriate for nomothetic or idiographic paradigms and criterion-referenced interpretations. They are essentially measures of observer interchangeability in situations of multiple observers with a single subject or event. All interobserver agreement indices are omnibus indicators which do not distinguish between systematic and random measurement errors. Some commonly used or frequently recommended interobserver agreement indices include proportion agreement (Po), occurrence agreement (Pocc), nonoccurrence agreement (Pnon), kappa (κ), Dice's index (SD), Scott's pi (π), and Maxwell and Pillemer's r11 (Hartmann, 1982). Whereas κ, π, and r11 adjust for chance agreement, the other interobserver agreement indices do not. Furthermore, using the conventional cell labels presented in Table I, it can be added
that π and r11 are influenced by b-c and a-d differences; Pocc, Pnon, and SD are influenced by b-c differences only; the remaining indices treat all agreements and disagreements the same.

Fig. 1. A decision tree for the selection of observational reliability indices. The tree branches on paradigm (nomothetic vs. idiographic orientation), interpretation (norm- vs. criterion-referenced), information desired (omnibus indicator vs. information on sources of error), unit of analysis (single observation vs. average across observers), and whether the index is corrected for chance.
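Given raw counts for the four cells of Table I, the interobserver agreement indices above reduce to a few lines of arithmetic. The following Python sketch is ours, not part of the original paper; the function name and the convention of accepting raw counts (converted internally to proportions) are assumptions for illustration:

```python
import math

def agreement_indices(a, b, c, d):
    """Interobserver agreement indices from the 2 x 2 table of Table I.

    a, b, c, d are raw cell counts: b = both observers report occurrence,
    c = both report nonoccurrence, a and d are the two disagreement cells.
    """
    n = a + b + c + d
    a, b, c, d = a / n, b / n, c / n, d / n    # convert counts to proportions
    p1, q1 = a + b, c + d                      # Observer 1 marginals
    p2, q2 = b + d, a + c                      # Observer 2 marginals
    return {
        "Po":    b + c,                        # proportion agreement
        "Pocc":  b / (a + b + d),              # occurrence agreement
        "Pnon":  c / (a + c + d),              # nonoccurrence agreement
        "kappa": (b + c - p1 * p2 - q1 * q2) / (1 - p1 * p2 - q1 * q2),
        "phi":   (b * c - a * d) / math.sqrt(p1 * p2 * q1 * q2),
        "SD":    2 * b / (p1 + p2),            # Dice's index
        "pi":    (4 * (b * c - a * d) - (a - d) ** 2)
                 / ((p1 + p2) * (q1 + q2)),    # Scott's pi
        "r11":   2 * (b * c - a * d) / (p1 * q1 + p2 * q2),
    }
```

With counts such as a = 10, b = 50, c = 30, d = 10 the two observers have equal marginals, in which case κ, φ, π, and r11 all coincide; that degeneracy is a convenient sanity check on an implementation.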
Intraobserver indices are appropriate for the nomothetic paradigm and for norm-referenced orientations only. The intraobserver reliability index is Pearson's r (or the φ coefficient). Pearson's r (or φ) measures the reliability of a single score from a single observer. Problems arise if the classical parallel tests assumptions are violated (cf. Nunnally, 1978; Suen and Ary, 1989), for then the results are uninterpretable. However, once these assumptions are met, r (or φ) becomes an intraobserver reliability index.
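For interval or ratio scores from two observers, Pearson's r is the standard product-moment formula. A small Python sketch (the function name and the six-session scores are invented for illustration):

```python
def pearson_r(x, y):
    """Product-moment correlation between two observers' score series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical frequency counts from two observers over six sessions
obs1 = [4, 7, 5, 9, 6, 8]
obs2 = [5, 8, 5, 10, 6, 9]
r = pearson_r(obs1, obs2)
```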
Table II. Formulae for the Computation of Reliability Indices

Given Table I:
    Po   = b + c
    Pocc = b/(a + b + d)
    Pnon = c/(a + c + d)
    κ    = (b + c - p1p2 - q1q2)/(1 - p1p2 - q1q2)
    φ    = (bc - ad)/(p1p2q1q2)^(1/2)
    SD   = 2b/(p1 + p2)
    π    = [4(bc - ad) - (a - d)²]/[(p1 + p2)(q1 + q2)]
    r11  = 2(bc - ad)/(p1q1 + p2q2)

Given the following:
    σs² = (MSs - MSe)/K
    σo² = (MSo - MSe)/N
    σe² = MSe
where MSs is subject mean square, MSo is observer mean square, MSe is residual mean square, K is number of observers, and N is number of subjects:
    rn² = σs²/(σs² + σe²)
    rc² = σs²/(σs² + σo² + σe²)
    αn  = Kσs²/(Kσs² + σe²)
    αc  = Kσs²/(Kσs² + σo² + σe²)
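Once the ANOVA mean squares are in hand, the variance components and the four intraclass indices above are a few lines of arithmetic. A minimal Python sketch (function and variable names are ours):

```python
def intraclass_indices(ms_s, ms_o, ms_e, k, n):
    """Intraclass reliability indices from a subjects-by-observers ANOVA.

    ms_s: subject mean square; ms_o: observer mean square;
    ms_e: residual mean square; k: number of observers; n: number of subjects.
    """
    var_s = (ms_s - ms_e) / k    # subject ("true") variance
    var_o = (ms_o - ms_e) / n    # observer variance
    var_e = ms_e                 # residual variance
    return {
        "rn2":     var_s / (var_s + var_e),
        "rc2":     var_s / (var_s + var_o + var_e),
        "alpha_n": k * var_s / (k * var_s + var_e),
        "alpha_c": k * var_s / (k * var_s + var_o + var_e),
    }
```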
The intraclass measures make calculations based on an ANOVA. In an interobserver situation, the subject (event) variance is considered the "true" variance and other estimable variances are treated as error. The exception is when certain conditions (or facets) of measurement are considered fixed. In these situations, variance due to the interaction between the condition and the subject becomes part of the "true" variance as well (cf. Suen, 1990). Commonly recommended intraclass indices for interobserver reliability include Hartmann's coefficient (rn²), Berk's r12 and r22 (denoted rc² and αc, respectively, in this paper), and Cronbach's alpha as suggested by Bakeman and Gottman (αn) (cf. Bakeman & Gottman, 1986; Suen, 1988). The rn² index is equivalent to Pearson's r or φ yet does not require the restrictive classical parallel tests assumptions. Like Pearson's r, it is appropriate for nomothetic paradigms and norm-referenced interpretations. The αn is also appropriate for nomothetic paradigms and norm-referenced orientations but is an indicator of the reliability of average scores across observers; it is appropriate only if the average score across a number of observers equal to the number used in the reliability check is used as the unit of analysis in subsequent comparisons and analyses. The αc is the same as αn but is used for criterion-referenced data and either nomothetic or idiographic paradigms. Finally, the rc² index is analogous to rn² and is useful for nomothetic or idiographic paradigms and criterion-referenced orientations. Both αc and rc² provide information on both interobserver agreement and intraobserver reliability yet do not require the restrictive
classical parallel tests assumptions. All intraclass indices provide information on the specific sources of measurement errors.
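The mean squares these intraclass indices require come from a two-way subjects-by-observers ANOVA without replication. A minimal sketch of that computation, assuming a complete N × K matrix of scores with no missing cells (names are ours):

```python
def mean_squares(data):
    """Mean squares from an N-subjects x K-observers score matrix.

    Returns (MS_subject, MS_observer, MS_residual) for a two-way
    ANOVA without replication.
    """
    n, k = len(data), len(data[0])
    grand = sum(x for row in data for x in row) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_s = k * sum((m - grand) ** 2 for m in row_means)    # subjects
    ss_o = n * sum((m - grand) ** 2 for m in col_means)    # observers
    ss_tot = sum((x - grand) ** 2 for row in data for x in row)
    ss_e = ss_tot - ss_s - ss_o                            # residual
    return ss_s / (n - 1), ss_o / (k - 1), ss_e / ((n - 1) * (k - 1))
```

The returned triple can be fed directly into the formulas of Table II, with K the number of observers and N the number of subjects.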
Figure 1 provides a decision tree to aid practitioners in the choice of observational reliability indices. These indices are appropriate when all other conditions of measurement are controlled and standardized and the only dimensions that change are the observer used and the behavior observed. Also presented (Table II) is a list of formulas to calculate each of the above indices.
REFERENCES
Bakeman, R., & Gottman, J. M. (1986). Observing interaction: An introduction to sequential analysis. London: Cambridge University Press.
Cone, J. D. (1986). Idiographic, nomothetic, and related perspectives in behavioral assessment. In R. O. Nelson & S. C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 111-128). New York: Guilford.
Hartmann, D. P. (1982). Assessing the dependability of observational data. In D. P. Hartmann (Ed.), Using observers to study behavior (pp. 51-66). San Francisco, CA: Jossey-Bass.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Suen, H. K. (1988). Agreement, reliability, accuracy, and validity: Toward a clarification. Behavioral Assessment, 10, 343-366.
Suen, H. K. (1990). Principles of test theories. Hillsdale, NJ: Lawrence Erlbaum (in press).
Suen, H. K., & Ary, D. (1989). Analyzing quantitative behavioral observation data. Hillsdale, NJ: Lawrence Erlbaum.