Journal of Psychopathology and Behavioral Assessment, Vol. 12, No. 4, 1990
A Decision Tree Approach to Selecting an Appropriate Observation Reliability Index
Hoi K. Suen,1,3 Donald Ary,2 and Wesley C. Covalt2
Accepted: September 17, 1990
Based on the conceptual framework outlined by Cone (1986) and Suen (1988), a practical decision tree is developed as an aid for the selection of observational reliability indices.
KEY WORDS: interobserver agreement; intraobserver reliability.
The purpose of this brief paper is to suggest a practical process through which the appropriate choice of reliability index can be determined, based on the conceptual framework provided by Cone (1986) and Suen (1988). There are numerous methods of assessing interobserver agreement and/or reliability today. These methods have been derived from different statistical assumptions, are appropriate for different epistemological paradigms, and yield different types of information. While many authors have compared various indices conceptually or have shown mathematical relationships among them, this paper uses a decision tree format to show when different indices are appropriate.
Observation researchers can be divided according to the paradigms to which they adhere. There are those who position themselves close to the traditional Skinnerian view that only directly observable behaviors can be measured; this is the idiographic-behavior paradigm. On the other hand, there are researchers who attempt to develop observational measures of psychological constructs or mediating processes which cannot be directly
1 230 CEDAR Building, Department of Educational & School Psychology & Special Education, The Pennsylvania State University, University Park, Pennsylvania 16802.
2 Northern Illinois University, De Kalb, Illinois 60115.
3 To whom correspondence should be addressed.
Table I. A 2 × 2 Table Representing the Observational Data of Two Observers^a

                        Observer 2
                        0        1
    Observer 1     1    a        b        p1
                   0    c        d        q1
                        q2       p2

^a 0 = nonoccurrence of behavior; 1 = occurrence of behavior; p1 = proportion of occurrence reported by Observer 1; q1 = proportion of nonoccurrence reported by Observer 1; p2 = proportion of occurrence reported by Observer 2; q2 = proportion of nonoccurrence reported by Observer 2.
measured; this is the nomothetic-trait paradigm (cf. Cone, 1986; Suen, 1988). Because the object of measurement is a directly observable behavior, those concerned with idiographic-behavior measures need to show correspondence between the behavior and some external criterion. Thus, a criterion-referenced interpretation of data, with its external criterion, best serves the idiographic-behavior paradigm. The nomothetic-trait view has a wider scope of acceptable events to measure. Therefore, either a criterion-referenced or a norm-referenced interpretation of the data will be appropriate, depending on the selection of assessment: a set, external criterion or a subject's relation to a normed group. With these points in mind, observational reliability indices can be discussed.
Observational reliability indices can be divided into three groups, labeled interobserver agreement, intraobserver reliability, and intraclass generalizability. Interobserver agreement indices are appropriate for nomothetic or idiographic paradigms and criterion-referenced interpretations. They are essentially measures of observer interchangeability in situations of multiple observers with a single subject or event. All interobserver agreement indices are omnibus indicators which do not distinguish between systematic and random measurement errors. Some commonly used or frequently recommended interobserver agreement indices include proportion agreement (Po), occurrence agreement (Pocc), nonoccurrence agreement (Pnon), kappa (κ), Dice's index (SD), Scott's pi (π), and Maxwell and Pillemer's r11 (Hartmann, 1982). Whereas κ, π, and r11 adjust for chance agreement, the other interobserver agreement indices do not. Furthermore, using the conventional cell labels presented in Table I, it can be added
that π and r11 are influenced by b-c and a-d differences; Pocc, Pnon, and SD are influenced by b-c differences only; the remaining indices treat all agreements and disagreements the same.

Fig. 1. A decision tree for the selection of observational reliability indices. The tree branches on paradigm (nomothetic vs. idiographic orientation), interpretation (norm- vs. criterion-referenced), information desired (omnibus indicator vs. information on sources of error), unit of analysis (single observation vs. average across observers), and whether the index is corrected for chance.
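Given raw counts for the four cells of Table I, the interobserver agreement indices above reduce to a few lines of arithmetic. The following Python sketch is ours, not part of the original paper; the function name and the convention of accepting raw counts (converted internally to proportions) are assumptions for illustration:

```python
import math

def agreement_indices(a, b, c, d):
    """Interobserver agreement indices from the 2 x 2 table of Table I.

    a, b, c, d are raw cell counts: b = both observers report occurrence,
    c = both report nonoccurrence, a and d are the two disagreement cells.
    """
    n = a + b + c + d
    a, b, c, d = a / n, b / n, c / n, d / n    # convert counts to proportions
    p1, q1 = a + b, c + d                      # Observer 1 marginals
    p2, q2 = b + d, a + c                      # Observer 2 marginals
    return {
        "Po":    b + c,                        # proportion agreement
        "Pocc":  b / (a + b + d),              # occurrence agreement
        "Pnon":  c / (a + c + d),              # nonoccurrence agreement
        "kappa": (b + c - p1 * p2 - q1 * q2) / (1 - p1 * p2 - q1 * q2),
        "phi":   (b * c - a * d) / math.sqrt(p1 * p2 * q1 * q2),
        "SD":    2 * b / (p1 + p2),            # Dice's index
        "pi":    (4 * (b * c - a * d) - (a - d) ** 2)
                 / ((p1 + p2) * (q1 + q2)),    # Scott's pi
        "r11":   2 * (b * c - a * d) / (p1 * q1 + p2 * q2),
    }
```

With counts such as a = 10, b = 50, c = 30, d = 10 the two observers have equal marginals, in which case κ, φ, π, and r11 all coincide; that degeneracy is a convenient sanity check on an implementation.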
Intraobserver indices are appropriate for the nomothetic paradigm and for norm-referenced orientations only. The intraobserver reliability index is Pearson's r (or the φ coefficient). Pearson's r (or φ) measures the reliability of a single score from a single observer. Problems arise if the classical parallel tests assumptions are violated (cf. Nunnally, 1978; Suen and Ary, 1989), for then the results are uninterpretable. However, once these assumptions are met, r (or φ) becomes an intraobserver reliability index.
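For interval or ratio scores from two observers, Pearson's r is the standard product-moment formula. A small Python sketch (the function name and the six-session scores are invented for illustration):

```python
def pearson_r(x, y):
    """Product-moment correlation between two observers' score series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical frequency counts from two observers over six sessions
obs1 = [4, 7, 5, 9, 6, 8]
obs2 = [5, 8, 5, 10, 6, 9]
r = pearson_r(obs1, obs2)
```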
Table II. Formulae for the Computation of Reliability Indices

Given Table I:
    Po   = b + c
    Pocc = b/(a + b + d)
    Pnon = c/(a + c + d)
    κ    = (b + c - p1p2 - q1q2)/(1 - p1p2 - q1q2)
    φ    = (bc - ad)/(p1p2q1q2)^(1/2)
    SD   = 2b/(p1 + p2)
    π    = [4(bc - ad) - (a - d)²]/[(p1 + p2)(q1 + q2)]
    r11  = 2(bc - ad)/(p1q1 + p2q2)

Given the following:
    σs² = (MSs - MSe)/K
    σo² = (MSo - MSe)/N
    σe² = MSe
where MSs is subject mean square, MSo is observer mean square, MSe is residual mean square, K is number of observers, and N is number of subjects:
    rn² = σs²/(σs² + σe²)
    rc² = σs²/(σs² + σo² + σe²)
    αn  = Kσs²/(Kσs² + σe²)
    αc  = Kσs²/(Kσs² + σo² + σe²)
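Once the ANOVA mean squares are in hand, the variance components and the four intraclass indices above are a few lines of arithmetic. A minimal Python sketch (function and variable names are ours):

```python
def intraclass_indices(ms_s, ms_o, ms_e, k, n):
    """Intraclass reliability indices from a subjects-by-observers ANOVA.

    ms_s: subject mean square; ms_o: observer mean square;
    ms_e: residual mean square; k: number of observers; n: number of subjects.
    """
    var_s = (ms_s - ms_e) / k    # subject ("true") variance
    var_o = (ms_o - ms_e) / n    # observer variance
    var_e = ms_e                 # residual variance
    return {
        "rn2":     var_s / (var_s + var_e),
        "rc2":     var_s / (var_s + var_o + var_e),
        "alpha_n": k * var_s / (k * var_s + var_e),
        "alpha_c": k * var_s / (k * var_s + var_o + var_e),
    }
```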
The intraclass measures make calculations based on an ANOVA. In an interobserver situation, the subject (event) variance is considered the "true" variance and other estimable variances are treated as error. The exception is when certain conditions (or facets) of measurement are considered fixed. In these situations, variance due to the interaction between the condition and the subject becomes part of the "true" variance as well (cf. Suen, 1990). Commonly recommended intraclass indices for interobserver reliability include Hartmann's coefficient (rn²), Berk's r12 and r22 (denoted rc² and αc, respectively, in this paper), and Cronbach's alpha as suggested by Bakeman and Gottman (αn) (cf. Bakeman & Gottman, 1986; Suen, 1988). The rn² index is equivalent to Pearson's r or φ yet does not require the restrictive classical parallel tests assumptions. Like Pearson's r, it is appropriate for nomothetic paradigms and norm-referenced interpretations. The αn is also appropriate for nomothetic paradigms and norm-referenced orientations but is an indicator of the reliability of average scores across observers; it is appropriate only if the average score across a number of observers equal to the number used in the reliability check is used as the unit of analysis in subsequent comparisons and analyses. The αc is the same as αn but is used for criterion-referenced data and either nomothetic or idiographic paradigms. Finally, the rc² index is analogous to rn² and is useful for nomothetic or idiographic paradigms and criterion-referenced orientations. Both αc and rc² provide information on both interobserver agreement and intraobserver reliability yet do not require the restrictive
classical parallel tests assumptions. All intraclass indices provide information on the specific sources of measurement errors.
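The mean squares these intraclass indices require come from a two-way subjects-by-observers ANOVA without replication. A minimal sketch of that computation, assuming a complete N × K matrix of scores with no missing cells (names are ours):

```python
def mean_squares(data):
    """Mean squares from an N-subjects x K-observers score matrix.

    Returns (MS_subject, MS_observer, MS_residual) for a two-way
    ANOVA without replication.
    """
    n, k = len(data), len(data[0])
    grand = sum(x for row in data for x in row) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_s = k * sum((m - grand) ** 2 for m in row_means)    # subjects
    ss_o = n * sum((m - grand) ** 2 for m in col_means)    # observers
    ss_tot = sum((x - grand) ** 2 for row in data for x in row)
    ss_e = ss_tot - ss_s - ss_o                            # residual
    return ss_s / (n - 1), ss_o / (k - 1), ss_e / ((n - 1) * (k - 1))
```

The returned triple can be fed directly into the formulas of Table II, with K the number of observers and N the number of subjects.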
Figure 1 provides a decision tree to aid practitioners in the choice of observational reliability indices. These indices are appropriate when all other conditions of measurement are controlled and standardized and the only dimensions that change are the observer used and the behavior observed. Also presented (Table II) is a list of formulas to calculate each of the above indices.
REFERENCES
Bakeman, R., & Gottman, J. M. (1986). Observing interaction: An introduction to sequential analysis. London: Cambridge University Press.
Cone, J. D. (1986). Idiographic, nomothetic, and related perspectives in behavioral assessment. In R. O. Nelson & S. C. Hayes (Eds.), Conceptual foundations of behavioral assessment (pp. 111-128). New York: Guilford.
Hartmann, D. P. (1982). Assessing the dependability of observational data. In D. P. Hartmann (Ed.), Using observers to study behavior (pp. 51-66). San Francisco, CA: Jossey-Bass.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Suen, H. K. (1988). Agreement, reliability, accuracy, and validity: Toward a clarification. Behavioral Assessment, 10, 343-366.
Suen, H. K. (1990). Principles of test theories. Hillsdale, NJ: Lawrence Erlbaum (in press).
Suen, H. K., & Ary, D. (1989). Analyzing quantitative behavioral observation data. Hillsdale, NJ: Lawrence Erlbaum.