Publisher: Routledge
Informa Ltd, Registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Language Assessment Quarterly
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/hlaq20
Variability in ESL Essay Rating Processes: The Role of the Rating Scale and Rater Experience
Khaled Barkaoui a
a York University
Version of record first published: 19 Feb 2010
To cite this article: Khaled Barkaoui (2010): Variability in ESL Essay Rating Processes: The Role of the Rating Scale and Rater Experience, Language Assessment Quarterly, 7:1, 54-74
To link to this article: http://dx.doi.org/10.1080/15434300903464418
Language Assessment Quarterly, 7(1), 54–74, 2010
ISSN: 1543-4303
DOI: 10.1080/15434300903464418
Variability in ESL Essay Rating Processes: The Role of the
Rating Scale and Rater Experience
This study examined the roles and effects of two sources of variability in the rating context, the rating scale and rater experience, on English as a second language (ESL) essay rating processes.

It may be useful to think of the rating process as involving a reader/rater interacting with three texts (the writing task, the essay, and the rating scale) within a specific sociocultural context (e.g., an institution) that specifies the criteria, purposes, and possibly processes of reading and interpreting the three texts to arrive at a rating decision (Lumley, 2005; Weigle, 2002). Although various factors can contribute to variability in scores and rater decision-making processes, research on second-language essay rating has tended to focus on such factors as task requirements, rater characteristics, and/or essay features (Barkaoui, 2007a).
However, other contextual factors, such as rating procedures, clearly also influence raters' judgments of student performance and the scores they assign. As Schoonen (2005) argued, "The effects of task and rater are most likely dependent on what has to be scored in a text and how it has to be scored" (p. 5). In addition, the rating scale is an important component of the rating context because it specifies what raters should look for in a written performance and will ultimately influence the validity of the inferences and the fairness of the decisions that educators make about individuals and programs based on essay test scores (Weigle, 2002). This aspect of the rating context, however, has received little attention (Barkaoui, 2007a; Hamp-Lyons & Kroll, 1997; Weigle, 2002).
Correspondence should be sent to Khaled Barkaoui, York University, Faculty of Education, 235 Winters College, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada. E-mail: kbarkaoui@edu.yorku.ca
This article focuses on two types of rating scales, holistic and analytic, that are widely used in large-scale and classroom assessments (Hamp-Lyons, 1991; Weigle, 2002). These two types of scales differ in terms of scoring methods and implications for rater decision-making processes (Goulden, 1992, 1994; Weigle, 2002). In terms of scoring method, in analytic scoring raters assign subscores to individual writing traits (e.g., language, content, organization); these subscores may then be summed to arrive at an overall score. In holistic scoring, the rater may also consider individual elements of writing but chooses one score to reflect the overall quality of the paper (Goulden, 1992, 1994). In terms of decision-making processes, with analytic scoring the rater has to evaluate the different writing traits separately. In holistic scoring the rater has to consider different writing traits too, but also to weight and combine assessments of the different traits to arrive at one overall score, which is likely to make the rating task more cognitively demanding.
These differences are likely to influence essay rating processes and outcomes. However, although the literature is replete with arguments for and against the two rating methods, little is known about whether and how they affect ESL essay reading and rating processes and scores (Barkaoui, 2007a; Hamp-Lyons & Kroll, 1997; Weigle, 2002). Such studies as have been reported in the literature (e.g., Bacha, 2001; O'Loughlin, 1994; Schoonen, 2005; Song & Caruso, 1996) examined the effects of rating scales on rater and score reliability but did not consider the rating process. Furthermore, the findings of some of these studies are mixed. For example, in two studies comparing the holistic and analytic scores assigned by ESL and English teachers to ESL essays, O'Loughlin (1994) found that holistic ratings achieved higher levels of interrater agreement across both rater groups, whereas Song and Caruso (1996) found significant differences in terms of the holistic, but not the analytic, scores across rater groups. Bacha (2001), on the other hand, reported high levels of inter- and intrarater reliability for both types of rating scales.
I am not aware of any study that has examined the effects of different types of rating scales on L2 essay rating processes (but see Barkaoui, 2007b). Most qualitative studies have investigated the decision-making behaviors and aspects of writing that raters attend to when rating essays with no specific rating guidelines (e.g., Cumming, Kantor, & Powers, 2002; Delaruelle, 1997), or when using holistic (e.g., Milanovic, Saville, & Shuhong, 1996; Sakyi, 2003; Vaughan, 1991) or analytic scoring (e.g., Cumming, 1990; Lumley, 2005; Smith, 2000; Weigle, 1999). Lumley and Smith may be two exceptions in that, although they did not specifically compare different rating scales, their findings raise several relevant questions concerning the role of the rating scale in essay rating processes. Smith found that raters attend to textual features other than those mentioned in the rating scale, that raters with different reading strategies interpret and apply the rating criteria differently, and that the rating criteria have different effects on raters with different approaches to essay reading and rating. Lumley found that (a) raters may understand the rating criteria similarly in general but emphasize different components and apply them in different ways, and (b) raters may face problems reconciling their impression of the text, the specific features of the text, and the wording of the rating scale.
Another limitation of previous research is that the frameworks that describe the essay rating process (e.g., Cumming et al., 2002; Freedman & Calfee, 1983; Homburg, 1984; Milanovic et al., 1996; Ruth & Murphy, 1988; Sakyi, 2003) do not discuss whether and how the content and organization of the rating scale influence rater decision-making behaviors and the aspects of writing raters attend to. For example, Freedman and Calfee seemed to suggest that essay rating is a linear process where the rater reads the essay, forms a mental representation of it, compares and matches this representation to the rating criteria, and then articulates a rating decision. Other studies of essay rating did not include any rating scales (e.g., Cumming et al., 2002). As a result, these studies do not discuss the role of the rating scale in variation in rater decision-making behaviors. Such information is crucial for designing, selecting, and improving rating scales and rater training, as well as for the validation of ESL writing assessments.
To examine rating scales inevitably means examining the individuals using them, that is, raters. As Lumley (2005) emphasized, the rater is at the center of the rating activity (cf. Cumming, Kantor, & Powers, 2001; Erdosy, 2004). One of the rater factors that seems to play an important role in the rating process is rater experience (e.g., Cumming, 1990; Lumley, 2005; Schoonen, Vergeer, & Eiting, 1997; Wolfe, 2006). Schoonen et al., for instance, argued that the expertise and knowledge that raters bring to the rating task are essential for a reliable and valid rating (p. 158). There is a relatively extensive literature on the effects of rater expertise on ESL essay rating processes (Cumming, 1990; Delaruelle, 1997; Erdosy, 2004; Sakyi, 2003; Weigle, 1999). This research indicates that experienced and novice raters employ qualitatively different rating processes. Cumming (1990), for example, found that experienced teachers had a much fuller mental representation of the essay assessment task and used a large and varied number of criteria, self-control strategies,1 and knowledge sources to read and judge ESL essays. Novice raters, by contrast, tended to evaluate essays with only a few of these component skills and criteria, using skills that may derive from their general reading abilities or other knowledge they had acquired previously (e.g., editing).
However, there is no research on how raters with different levels of experience approach essay rating with different types of rating scales. Cumming (1990) hypothesized that novice raters, unlike experienced raters, may benefit from analytic scoring procedures to direct their attention to specific aspects of writing as well as to appropriate evaluation strategies and criteria, whereas Goulden (1994) hypothesized that analytic scoring is easier for inexperienced raters because fewer unguided decisions (e.g., weighting different evaluation criteria) are required. The aim of the present study was to investigate these empirical issues. Specifically, the current study used think-aloud protocols to examine the roles of rating scale type (holistic vs. analytic), rater experience (novice vs. experienced), and the interaction between them in variability in ESL essay rating processes. Following previous research (e.g., Cumming et al., 2002; Lumley, 2005; Milanovic et al., 1996), rating processes are defined as the decision-making behaviors of the raters and the aspects of writing they attend to while reading and rating ESL essays.
Participants

Experienced raters were graduate students and/or ESL instructors who had been teaching and rating ESL writing for at least 5 years, had an M.A. or M.Ed. degree, had received specific training in assessment and essay rating, and rated themselves as competent or expert raters. Novice raters were mainly TESL (teaching English as a second language) students who were enrolled in or had just completed a preservice or teacher training program in ESL, had no ESL teaching and rating experience at all at the time of data collection, and rated themselves as novice raters. The participants were recruited from various ESL and ESL teacher education (TESL) programs at universities in southern Ontario. They varied in terms of their gender, age, and first-language backgrounds, but all were native or highly proficient nonnative speakers of English. Table 1 describes the profile of a typical participant in each group.
Data Collection Procedures
The study included 180 essays produced under real-exam conditions by adult ESL learners from diverse parts of the world and with varying levels of proficiency in English. Each essay was written within 30 minutes in response to one of two comparable independent prompts (Study and Sports).
Each rater rated a random sample of 24 essays, 12 silently and 12 while thinking aloud. To ensure counterbalancing, half the participants in each group were randomly assigned to start with holistic rating and the other half to start with analytic rating. The holistic and analytic scales, borrowed from Hamp-Lyons (1991, pp. 247–251), included the same evaluation criteria, wording, and number of score levels (9), but differed in terms of whether one overall score (holistic) or multiple scores (analytic) were assigned to each essay. The rating criteria in the analytic scale were grouped under five categories: communicative quality, organization, argumentation, linguistic accuracy, and linguistic appropriacy.
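For readers replicating the design, the counterbalanced assignment could be implemented as in the following minimal sketch. The rater identifiers and the use of Python are assumptions for illustration; the study does not report how the randomization was actually carried out.

```python
import random

def counterbalance(rater_ids, seed=None):
    """Randomly split one rater group in half: one half starts with the
    holistic scale, the other half with the analytic scale.
    Illustrative only; rater identifiers are hypothetical."""
    rng = random.Random(seed)
    shuffled = rater_ids[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "holistic_first": shuffled[:half],
        "analytic_first": shuffled[half:],
    }

# Example with the 11 novice raters in the study (hypothetical IDs).
groups = counterbalance([f"novice_{i:02d}" for i in range(1, 12)], seed=42)
print(groups["holistic_first"])
print(groups["analytic_first"])
```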
Each participant attended a 30-minute individual training session about one of the rating scales and rated and discussed a sample of four essays. Next, each rated 12 essays silently at home using the first rating scale (these silent ratings are not considered in this article). Each rater then attended a 30-minute session in which they received detailed instructions and careful training on how to think aloud while rating the essays, following the procedures and instructions in Cumming et al. (2001, pp. 83–85). Later, each participant rated the remaining 12 essays while thinking aloud into a tape recorder.
TABLE 1
Typical Profile of a Novice and an Experienced Rater

                                      Novice a          Experienced b
Role at time of the research          TESL student      ESL teacher
ESL teaching experience               None              10 years or more
Rating experience                     None              5 years or more
Postgraduate study                    None              M.A./M.Ed.
Received training in assessment       No                Yes
Self-assessment of rating ability     Novice            Competent or expert

Note. TESL = teaching English as a second language; ESL = English as a second language.
a n = 11. b n = 14.
At least two weeks later, each participant attended a second training session with the second rating scale and rated 12 essays silently and 12 while thinking aloud. Each participant rated the same 12 think-aloud essays with both scales, but in a different random order of essays and prompts. All participants did all the think-aloud protocols individually, at home, to allow them enough time to verbalize and to minimize researcher effects on their performance. Figure 1 summarizes the data collection procedures.
Data Analysis
Data for the current study consisted of the participants' think-aloud protocols only. Because some raters did not record their thinking aloud while rating some of the essays, and because of poor recording quality, only 558 protocols (out of 600) were analyzed. The novice raters provided 264 of these protocols (47%). There were equal numbers of protocols for each rating scale and for each prompt. The protocols were coded with the assistance of the computer program Observer 5.0 (Noldus Information Technology, 2003), a software package for the organization, analysis, and management of audio and video data. Using Observer allowed coding to be carried out directly from the protocol audio recordings (instead of from transcripts).
The unit of analysis for the think-aloud protocols was the decision-making statement, which was segmented using the following criteria from Cumming et al. (2002): (a) a pause of five seconds or more, (b) the rater reading aloud a segment of the essay, and/or (c) the end or beginning of the assessment of a single essay. The coding scheme was developed based mainly on Cumming et al.'s (2002) empirically based schemes of rater decision-making behaviors and aspects of writing raters attend to. Cumming et al.'s main model of rater behavior, as it applied to the rating of independent prompts,2 consists of various decision-making behaviors grouped under three foci (rater self-monitoring behavior, ideational and rhetorical elements of the text, control of language within the text) and two strategies (interpretation and judgment). Interpretation strategies consist of reading strategies aimed at comprehending the essay, whereas judgment concerns evaluation strategies for formulating a rating. Cumming et al. also distinguished among three general types of decision-making behavior: a focus on self-monitoring (i.e., a focus on one's own rating behavior, e.g., monitoring for personal bias), a focus on the essay's realization of ideational and rhetorical elements (e.g., essay rhetorical structure, coherence, relevance), and a focus on the essay's accuracy and fluency in the English language (e.g., syntax, lexis).
2. Cumming et al. (2001, 2002) developed three frameworks based on data from different types of tasks and both ESL and English teachers.
FIGURE 1 Summary of data collection procedures.

Phase 1:
1. Orientation session for rating scale 1 (scales counterbalanced).
2. Rating 12 essays silently using scale 1 (at home).
3. Think-aloud training.
4. Rating 12 essays while thinking aloud using scale 1 (at home).

Phase 2:
5. Orientation session for rating scale 2.
6. Rating 12 essays silently using scale 2 (same essays as in Step 2 above) (at home).
7. Rating 12 essays while thinking aloud using scale 2 (same essays as in Step 4 above) (at home).
Trang 7essay’s realization of ideational and rhetorical elements (e.g., essay rhetorical structure,coherence, relevance), and a focus on the essay’s accuracy and fluency in the English language(e.g., syntax, lexis).
Based on preliminary inspections of the data, 36 codes were selected from Cumming et al.'s frameworks and three new ones were added: (a) Read, interpret, refer, or comment on rating scale, to account for the raters' uses of the rating scales; (b) Assess communicative effectiveness or quality, which pertains to text comprehensibility and clarity at both the local and global levels; and (c) Compare scores across rating categories, to account for participants' comparisons of scores assigned to the same essay on different analytic rating categories. The final coding scheme consisted of 39 codes. A complete list of the codes, with examples from the current study, is presented in the appendix.
The author coded all the protocols by assigning each decision-making statement all the relevant codes in the coding scheme. To check the reliability of the coding, the coding scheme was discussed with another researcher, who then independently coded a random sample of 70 protocols (3,083 codes). Percentage agreement achieved was 81%, computed for agreement in terms of the main categories in the appendix. Percentage agreement within each main category varied, however (e.g., 76% for self-monitoring-judgment, 85% for language-judgment). In most cases, the coders were able to reconcile the codes. In the few cases where they were not able to reach agreement, the author decided the final code to be assigned.
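For illustration, percentage agreement of this kind can be computed as in the sketch below. This is a minimal sketch only: it assumes the two coders' codes are aligned statement by statement and that agreement means assigning the same main category; the category labels shown are hypothetical, and the study's exact matching rule is not reported.

```python
def percentage_agreement(coder_a, coder_b):
    """Simple percentage agreement: the share of coded statements to
    which two coders assigned the same main category.
    Assumes the two lists are aligned statement by statement."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Coders must code the same statements")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

# Hypothetical main-category codes for six decision-making statements.
a = ["self-monitoring", "language", "rhetorical",
     "language", "self-monitoring", "rhetorical"]
b = ["self-monitoring", "language", "rhetorical",
     "self-monitoring", "self-monitoring", "rhetorical"]
print(f"{percentage_agreement(a, b):.0f}%")  # -> 83%
```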
As in previous studies (e.g., Cumming, 1990; Cumming et al., 2002; Wolfe, 2006; Wolfe, Kao, & Ranney, 1998), the focus in this study is on comparing the frequency of the decision-making behaviors and aspects of writing attended to. Consequently, the coded protocol data were tallied, and percentages were computed for each rater for each code in the coding scheme. These percentages served as the data for comparison across rater groups and rating scales. Statistical tests were then conducted on the main categories in the appendix. Subcategories were used for descriptive purposes only and to explain significant differences in main categories. Because the coded data did not seem to meet the statistical assumptions of parametric tests, nonparametric tests were used to compare coded data across rating scales (Wilcoxon signed-ranks test) and across rater groups (Mann-Whitney test).3 Because these tests rely on ranks, the following descriptive statistics are reported: the median (Mdn) and the highest (Max) and lowest (Min) values for each main category.
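The two tests can be run as in the following sketch, which uses scipy.stats on hypothetical per-rater percentages aggregated to one value per rater per scale (the aggregation is described next). Nothing here implies the study's analyses were carried out in Python.

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

# Hypothetical per-rater percentages for one main category
# (e.g., "self-monitoring focus"), one value per rater per scale.
holistic = np.array([43.9, 40.2, 51.7, 38.5, 45.0, 47.3, 42.1, 49.8])
analytic = np.array([50.4, 46.8, 55.2, 44.1, 52.3, 49.9, 48.0, 53.6])

# Within-rater comparison across scales: Wilcoxon signed-ranks test
# (nonparametric counterpart of the dependent t test).
w_stat, w_p = wilcoxon(holistic, analytic)
print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.3f}")

# Between-group comparison (e.g., novice vs. experienced raters):
# Mann-Whitney U test (nonparametric counterpart of the independent t test).
novice = np.array([49.1, 47.5, 52.0, 44.8, 50.6])
experienced = np.array([45.3, 43.0, 48.1, 41.9, 46.7, 44.2])
u_stat, u_p = mannwhitneyu(novice, experienced, alternative="two-sided")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.3f}")
```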
Finally, because each participant provided 12 protocols for each rating scale, each rater had 24 percentages for each code. For example, each rater had 24 percentages, 1 for each essay for each rating scale (i.e., 12 essays × 2 rating scales), for the code "scan whole composition." To be able to analyze the coded data statistically, these percentages had to be aggregated as follows. To compare coded data across rating scales, the protocols were aggregated at the rater level, by type of rating scale, to obtain two average percentages for each code for each rater, one for each rating scale. To compare the coded data across rater groups, the protocols were aggregated at the rater level to obtain one proportion per rater. Statistical tests were then conducted on the aggregated data.
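This aggregation step could look like the following sketch; the column names and values are hypothetical, and pandas is assumed purely for convenience.

```python
import pandas as pd

# One row per (rater, essay, scale, code): the percentage of that rater's
# decision-making statements on that essay that received that code.
df = pd.DataFrame({
    "rater": ["r1", "r1", "r1", "r1"],
    "essay": ["e01", "e02", "e01", "e02"],
    "scale": ["holistic", "holistic", "analytic", "analytic"],
    "code":  ["scan whole composition"] * 4,
    "pct":   [4.0, 6.0, 2.0, 3.0],
})

# Across rating scales: one average percentage per rater per scale per code.
by_scale = df.groupby(["rater", "scale", "code"], as_index=False)["pct"].mean()

# Across rater groups: one overall proportion per rater per code.
by_rater = df.groupby(["rater", "code"], as_index=False)["pct"].mean()

print(by_scale, by_rater, sep="\n\n")
```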
3. The Wilcoxon signed-ranks test is a nonparametric equivalent of the dependent t test, whereas the Mann-Whitney test is a nonparametric equivalent of the independent t test for comparing two independent groups.
Rating Scale Effects

Table 2 reports descriptive statistics for the percentages of think-aloud codes by main category across rating scales. Overall, the holistic scale elicited significantly more interpretation strategies (particularly language interpretation, Mdn = 6%, and rhetorical and ideational interpretation, Mdn = 4%) and more language focus (Mdn = 23%) than did the analytic scale, which elicited significantly more judgment strategies (Mdn = 63%) and self-monitoring focus (Mdn = 50%) than did the holistic scale.
In terms of subcategories, Table 3 shows the strategies that were reported more frequently with each rating scale. There were more references to specific linguistic features (e.g., syntax, lexis, spelling) with the holistic scale, whereas the analytic scale elicited more references to rating language overall (see the appendix for examples). In addition, with holistic scoring raters tended to read and interpret the essay more frequently, whereas the analytic scale elicited more references to the rating scale and to articulating and justifying scores. Finally, the analytic scale prompted more references to text organization and linguistic appropriacy.
TABLE 2
Descriptive Statistics for Decision-Making Behaviors by Rating Scale

                            Holistic                  Analytic
                      Mdn     Min     Max       Mdn     Min     Max
Focus
  Self-monitoring*   43.88   36.18   62.30     50.40   39.29   62.53
  Rhetorical         31.00   18.58   44.10     28.10   22.24   36.84
  Language*          22.84   12.12   37.77     20.39   11.99   33.96
Strategy
  Interpretation*    41.70   32.86   51.12     37.38   25.12   43.67
  Judgment*          58.30   48.88   67.14     62.62   56.33   74.88
Strategy × Focus
  Interpretation
    Self-monitoring* 30.96   26.50   36.38     29.71   18.20   35.92
    Rhetorical*       3.67     .35   13.99      3.42     .80    6.31
    Language*         5.75    2.20   11.46      3.76    1.07   11.70
  Judgment
    Self-monitoring* 13.41    7.67   28.84     22.06   10.97   31.98
    Rhetorical       24.83   15.57   36.08     26.09   19.40   33.10
    Language         17.51    9.92   27.57     15.56    9.99   27.51

Note. N = 25 raters.
*Wilcoxon signed-ranks tests indicated that the differences across rating scales were statistically significant at p < .05.
TABLE 3
Medians for Strategies That Differed by 1% or More Across Rating Scales

                                                       Holistic   Analytic
Higher with the holistic scale
  Read or reread essay                                  19.32%     14.33%
  Interpret ambiguous or unclear phrases                 2.36%      1.29%
  Articulate general impression                          2.97%      1.83%
  Rate ideas and/or rhetoric                             3.18%      2.09%
  Classify errors into types                             3.26%      1.82%
  Consider lexis                                         2.28%      1.28%
  Consider syntax and morphology                         3.62%      2.24%
  Consider spelling or punctuation                       3.78%      1.91%
Higher with the analytic scale
  Refer to, read, or interpret rating scale              7.78%     11.07%
  Articulate, justify, or revise scoring decision        8.55%     16.85%
  Assess text organization                               2.98%      4.54%
  Assess style, register, or linguistic appropriacy      1.10%      3.49%
  Rate language overall                                  1.04%      3.32%
Rater Experience Effects
Table 4 reports descriptive statistics for the percentages of think-aloud codes by main category across rater groups. It shows that, overall, (a) both groups reported more judgment (Mdn = 59% and 61% for novices and experts, respectively) than interpretation (Mdn = 41% and 39%) strategies; (b) self-monitoring was the most frequently mentioned focus (Mdn = 49% and 45%) and language the least frequently mentioned focus (Mdn = 23% and 20%) for both groups; and (c) the novice raters reported slightly more interpretation strategies (Mdn = 41%) and self-monitoring focus (Mdn = 49%) than the experienced group (Mdn = 39% and 45%, respectively), who reported slightly more judgment strategies (Mdn = 61%) and rhetorical and ideational focus (Mdn = 30%). Mann-Whitney tests indicated that none of these differences was statistically significant at p < .05, however.

TABLE 4
Descriptive Statistics for Decision-Making Behaviors by Rater Group

                            Novice a                  Experienced b
                      Mdn     Min     Max       Mdn     Min     Max
Focus
  Self-monitoring    49.08   43.29   62.42     45.25   39.91   54.74
  Rhetorical         27.70   22.43   37.38     30.32   21.32   38.83
  Language           23.04   13.45   27.99     20.43   13.91   34.62
Strategy
  Interpretation     40.88   36.19   45.09     38.51   31.94   45.52
  Judgment           59.12   54.91   63.81     61.49   54.48   68.06
Strategy × Focus
  Interpretation
    Self-monitoring  30.20   25.91   34.69     29.42   23.77   33.24
    Rhetorical        4.47     .81    9.31      3.33     .90    7.30
    Language          4.81    2.31   11.58      4.58    1.76    9.65
  Judgment
    Self-monitoring  18.30   14.81   27.73     16.79   10.83   26.43
    Rhetorical       23.32   19.01   30.68     27.02   18.68   33.52
    Language         16.28   11.15   20.19     15.75   12.15   24.97

a n = 11 raters. b n = 14 raters.
Table 5 shows the subcategories that each rater group reported more frequently than the other group did. Overall, the novices tended to refer to the rating scale and to focus on local textual aspects and on understanding essay content (e.g., summarize ideas) more frequently than did the experienced raters, who tended to refer more frequently to the essay and to rhetorical aspects of writing such as text organization and ideas, as well as to the writer's situation and essay length, two aspects that were not included in the rating scales.

TABLE 5
Medians for Strategies That Differed by 1% or More Across Rater Groups

                                                       Novice   Experienced
Higher for the novice group
  Refer to, read, or interpret rating scale            10.15%      8.38%
  Articulate or justify score                          13.79%     11.22%
  Interpret ambiguous or unclear phrases                2.03%      1.02%
  Summarize ideas and propositions                      1.87%      0.71%
  Edit or interpret unclear phrases                     1.69%      0.51%
  Consider spelling and punctuation                     3.78%      2.64%
Higher for the experienced group
  Read or reread essay                                 15.89%     17.35%
  Envision writer's personal situation                  0.66%      1.67%
  Assess text organization                              2.42%      4.19%
  Rate ideas and/or rhetoric                            2.01%      3.13%
  Assess quantity                                       1.01%      2.17%
Interaction Effects
Table 6 reports descriptive statistics for the percentages of think-aloud codes by main category across rating scales and rater groups. First, comparing across rating scales within each rater group, Table 6 shows that both rater groups reported more self-monitoring focus and judgment strategies with the analytic scale and more interpretation strategies and language interpretation with the holistic scale. Wilcoxon signed-ranks tests indicated that these differences across rating scales were statistically significant for both rater groups at p < .05. In addition, the novice raters reported significantly more language focus (Mdn = 27%, particularly language interpretation) and rhetorical interpretation (Mdn = 6%) with the holistic scale than they did with the analytic scale (Mdn = 18% and 3%, respectively), whereas the experienced raters reported significantly more self-monitoring interpretation (Mdn = 31%) with the holistic scale than they did with the analytic scale (Mdn = 29%). The following is a list of the subcategories of strategies that each rater group reported more frequently with each rating scale.
The novices reported the following subcategories more frequently with
TABLE 6
Descriptive Statistics for Decision-Making Behaviors by Rating Scale and Rater Groups

                            Holistic                  Analytic
                      Mdn     Min     Max       Mdn     Min     Max
Novice a
  Strategy
    Interpretation   41.70   35.48   51.12     37.38   33.61   42.44
    Judgment         58.30   48.88   64.52     62.62   57.56   66.39
  Strategy × Focus
    Interpretation
      Self-monitoring 30.56  26.50   33.59     30.04   25.33   35.92
      Rhetorical       5.92    .35   13.99      3.39    1.28    6.31
      Language         6.52   2.20   11.46      3.73    1.07   11.70
    Judgment
      Self-monitoring 14.84   7.67   28.84     22.72   18.89   28.20
      Rhetorical      22.49  16.45   32.68     25.04   19.40   28.67
      Language        15.25   9.92   27.57     14.63    9.99   19.34
Experienced b
  Focus
    Self-monitoring  43.66   38.83   51.94     45.79   39.29   60.11
    Rhetorical       31.11   18.58   44.10     29.98   23.60   36.84
    Language         22.23   12.24   37.77     21.03   12.92   33.96
  Strategy
    Interpretation   41.63   32.86   49.32     37.05   25.12   43.67
    Judgment         58.37   50.68   67.14     62.95   56.33   74.88
  Strategy × Focus
    Interpretation
      Self-monitoring 31.08  26.95   36.38     29.36   18.20   33.00
      Rhetorical       3.40    .65   10.42      3.48     .80    4.55
      Language         5.14   2.24   11.21      4.02    1.27    8.34
    Judgment
      Self-monitoring 12.23   8.43   21.97     20.57   10.97   31.98
      Rhetorical      26.15  15.57   36.08     27.01   21.27   33.10
      Language        17.66  10.00   26.55     17.06   10.84   27.51

a n = 11. b n = 14.