ASSESSMENT OF TERTIARY ENGLISH-MAJOR STUDENTS' WRITING: VIETNAMESE
A thesis submitted in partial fulfilment of the requirements for the degree of Master of Education (TESOL - International), Faculty of Education, Monash University, Melbourne, Australia
TABLE OF CONTENTS
Abstract
Acknowledgements
Declaration
List of Tables
CHAPTER ONE: INTRODUCTION
1.1 Background of the research
1.2 Research aims
1.3 Outline of the thesis
CHAPTER TWO: REVIEW OF THE LITERATURE
2.1 Communicative competence
2.2 Principles of Communicative Language Testing
2.2.1 Validity
2.2.2 Reliability
2.2.3 Practicality
2.3 Issues in testing writing skills
2.3.1 Aspects in testing writing
2.3.2 Marking scheme
2.3.3 Difficulties in assessing writing
2.4 Summary of the chapter
CHAPTER THREE: METHODOLOGY
3.1 A qualitative approach
3.2 Selection of participants
3.3 Methods for data collection
3.4 Methods for data analysis
CHAPTER FOUR: FINDINGS AND DISCUSSION
4.1 An overview of assessment practices and writing syllabus in some Vietnamese English major universities
4.1.1 Assessment practices
4.1.2 Writing syllabus
4.2 Teachers' perspectives on assessment criteria
4.2.1 Process of developing criteria checklist
4.2.2 Process of scoring
4.3 Characteristics of a "good" argument essay
4.3.1 Purpose
4.3.2 Thesis
4.3.3 Evidence
4.3.4 Refutation
4.3.5 Persona
4.4 Factors affecting teachers' assessment judgements
4.4.1 The influence of raters' culturally-based perspectives and norms
4.4.2 The influence of the marking scheme
4.5 Assessment criteria
4.6 The usefulness of the process of discussion of assessment criteria
CHAPTER FIVE: CONCLUSION AND RECOMMENDATIONS
5.1 Summary of the study
5.2 Recommendations
5.3 Limitations
5.4 Directions for future research
REFERENCES
Appendix 1: Anderson's marking scheme
Appendix 2: Jacobs et al.'s scoring profile
Appendix 3: A comparison of two approaches
Appendix 4: Students' writing texts
Appendix 5: Reflection questions
Appendix 6: Reflection answers
Appendix 7: English translation of interviews
Appendix 8: Vietnamese transcripts of interviews
Abstract
Direct tests seem to be becoming increasingly popular in Vietnam, as getting students to write is the best way to test their writing ability (Hughes, 2003). One of the most significant challenges in assessing writing is the subjectivity of judgements and ensuring that these judgements are consistent. Unfair decisions may affect individuals' lives (Hughes, 1989). For this reason, this research was carried out in order to explore how teacher raters make their scoring judgements, to develop collaboratively a set of criteria in a checklist through which teachers' assumptions about 'good' writing were revealed, and to gather teachers' perspectives on the usefulness of the process for reliable and valid scoring.

Five Vietnamese teachers, who are pursuing their Master degrees in Melbourne, Australia, and who teach in universities with English-major courses, were involved in this study. A criteria checklist was first developed by two experienced teachers. A workshop was then held in which all five teachers applied the checklist to score a sample essay. Changes were made after the discussion of the marking, and a new criteria checklist was established. The agreed-upon criteria checklist was used to rate four essays. The participants were then asked to provide their reflections in writing on the usefulness of this process of training for their work as raters.

The findings showed that inconsistency in scores exists among raters even when there is a shared criteria checklist. There was a change in consistency across raters (inter-rater reliability) after the discussion of the marking of the sample essay. Nonetheless, the question of how much raters scale up or down in their grading remains challenging. Raters' cultural perspectives (the norm, Western style or Oriental style, that raters favour) and the rating scheme (holistic or analytical scoring) also influence teachers' judgements. All of these would be improved through on-going rater training and moderation and the development of a more detailed criteria checklist. Also, the characteristics of a 'good' argument essay (the writing genre that was assessed in this study) and the usefulness of the discussion workshop were presented through the teachers' perspectives. Finally, in response to the findings, a holistic criteria checklist was developed with the Vietnamese ten-point scale and level descriptors.
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my supervisor, Mrs Rosemary Viete, for her whole-hearted assistance during the development of this research thesis under her supervision, without which it could not have been completed.

I am especially indebted to my family and my husband for their constant encouragement during the course of my study.

I am also very grateful to the teacher participants, without whom this thesis would have been impossible.

I sincerely express my great thanks to Mr Le Thanh Dzung, Dean of the English Department, Hanoi University of Foreign Studies, for his kind assistance.

My special thanks also go to Dr Sophie Arkoudis, Department of Language, Literacy & Arts Education, University of Melbourne, for her valuable help.

Finally, I would also like to convey my gratitude to my colleagues who have helped me in various ways.
Declaration

Full name: Nguyen Tra My

The plan for this research was approved by the Standing Committee on Ethics in Research Involving Humans on 6 August 2003 (Reference 2003/524).
List of Tables

• Table 4.1 The calculation of students' final results
• Table 4.3 Criteria checklist set up by two experienced teachers
• Table 4.7 Summary of agreements and disagreements in the
CHAPTER ONE: INTRODUCTION
1.1 Background of the research
The major language of international communication for the Socialist Republic of Vietnam was Russian from 1954 until the recent political changes in Eastern and Central Europe. For the South of Vietnam, French was the first foreign language (this area was under French occupation) until 1954, and then English (due to the involvement of the US in the Vietnam War) until the reunification of the country in 1975. After reunification, Russian was the first national foreign language for a number of years, and little attention was paid to the teaching of either English or French (Do, 1999; Nguyen and Crabbe, 1999). In the context of political renovation and the open-door policy pursued by the Vietnamese government in the past decade, English has become the first foreign language. In recent years, Vietnam has extended its political, diplomatic and economic relationships with other countries and consequently, it has witnessed an explosion in the demand for English (Brogan and Nguyen, 1999). With the move to a market economy by the Vietnamese government and the growth of international business as well as an increasing number of foreign tourists, knowledge of English has become the passport to a better-paid job not only in the tourism and hospitality industries, but also in many other enterprises (Nguyen and Crabbe, 1999).
The spread of English as a global means of communication has had much impact on English language teaching and learning in Vietnam. Language testing, which often goes in line with language teaching and learning, is of high importance. It works as the motivation for teaching and learning processes, measures learners' levels and has an influence on the curriculum, since designers might revise program goals and objectives in the on-going development of the curriculum (Brown, 1995). A number of studies have been conducted in the field of testing in general and in testing writing in particular (Freedman, 1979; Hamp-Lyons, 1991; Vaughan, 1991; O'Loughlin, 1992; O'Hagan, 1999; Lumley, 2002; Weigle, 2002). Writing assessment in the context of Vietnam, however, seems to be largely unexplored. This fact has prompted me to conduct research on the assessment of writing, a relatively subjective form of assessment, in Vietnamese universities.
In terms of subjectivity, direct tests were discouraged and avoided in the past, as reliability dominated in language testing. This was true around the world in the 1950s and 1960s (McNamara, 2000: 38) and still is in some Vietnamese universities, as I have observed. Grammatical structures and knowledge of vocabulary were assessed instead of writing skills. However, in many universities nowadays direct tests are becoming common. This gives rise to issues of reliability and validity. The problem of subjectivity has increasingly been recognised as "something that had to be faced and managed" in direct tests (McNamara, 2000: 38). This strengthens my wish to do research in this field.
I wish to understand which criteria Vietnamese teacher raters have employed in assessing students' writing, how much weight they give to each criterion and to what extent their perspectives on assessment criteria are similar or different, which might explain degrees of disagreement and discrepancy among raters in the final scores for a piece of writing.
Also, my interest lies in the factors that affect teachers' assessment judgements. I am keen to know which norms Vietnamese teacher raters favour in their assessment, since both Western writing style and Oriental writing style, linear and circular respectively, according to views expressed in Liddicoat (1997), might well be observed in students' writing. My study, in addition, involves identifying which marking schemes (holistic or analytical marking) teachers favour and how these reflect students' best abilities.
Finally, teachers' perspectives on the usefulness of discussion of assessment criteria, a kind of moderation, are explored. The fact that we young teachers are often not given assessment guidelines and training prompted me to offer, and to investigate teachers' perceptions of, this process of training cum moderation.
1.2 Research aims
The purpose of this research is to: 1) find out the assumptions English teachers in Vietnamese universities share about 'good writing' for an argument essay; 2) have participants collaboratively develop a set of criteria for scoring such writing; 3) identify the basis on which teachers make scoring judgements against these criteria; and 4) find out teachers' perceptions of the usefulness of the process as a tool for more reliable and valid training and scoring.
1.3 Outline of the thesis
This thesis consists of five chapters. Chapter One presents the introduction and the research aims. Chapter Two reviews the literature on communicative language testing and the testing of writing. Chapter Three presents the qualitative methodology used for the research, with a focus on in-depth interviews and open-ended questionnaires. Chapter Four deals with the discussion of the findings. The summary of the findings and the recommendations for teachers to be better supported in assessing writing performance are presented in Chapter Five. Following the chapters are the References and Appendices.
CHAPTER TWO: REVIEW OF THE LITERATURE
In this chapter, I will deal with the notion of communicative competence, which is considered the framework for communicative language testing, the principles of which will then be presented with three fundamental criteria: validity, reliability and practicality. Other issues in language testing will also be looked at. Lastly, I will discuss issues in testing writing skills, which are relevant to the focus of my research.
2.1 Communicative competence
Communicative language teaching (CLT) was devised in the late 1960s to satisfy the new demands of using English (Soler and Guzman, 2000). Communicative competence, a principal concept of this approach, has generated a number of discussions around its definition. Many authors have mentioned the distinction between "competence" and "performance". Savignon (1983: 9) argued that "competence is what one knows. Performance is what one does". Kempson (1977, cited in Canale and Swain, 1980) claimed that competence is identified as the language user's knowledge and performance is the study of the use of that knowledge. Canale and Swain (1983) later developed a framework of communicative competence consisting of four aspects:
1. Grammatical competence includes those competences involved in language use, i.e. the knowledge of such linguistic aspects as lexicology, morphology, syntax, phonetics and phonology;
2. Sociolinguistic competence refers to control of the conventions of language use that are determined by the features of the specific language use;
3. Discourse competence means the mastery of how to combine grammatical forms and meanings to achieve unity of a spoken or written text in different genres; and
4. Strategic competence is defined as the mastery of verbal and non-verbal communication strategies used to compensate for breakdowns in communication, and to enhance the effect of utterances.
(Adapted from Savignon, 1983; Bachman, 1990; Berns, 1990 and Shaw, 1992)
Canale and Swain (1980) also demonstrated implications for a communicative testing programme as follows:
communicative testing must be devoted not only to what the learner knows about the second language and about how to use it (competence) but also to what extent the learner is able to actually demonstrate this knowledge in a meaningful communicative situation (performance). (34)
The notion of communicative competence can be taken into account by test designers in terms of test content and test methods. It is also useful in working out assessment criteria and the marking scheme.
2.2 Principles of Communicative Language Testing
Three basic considerations in language testing that are mentioned by a number of researchers (Hughes, 1989; Bachman, 1990; Weir, 1993; McNamara, 2000) are validity, reliability and practicality.
2.2.1 Validity
Validity is defined as whether the test measures what it is meant to measure (Weir, 1990). Weir (1990) demonstrated five sub-components of validity: construct validity, content validity, face validity, washback validity and criterion-related validity. Other kinds of validity have also been mentioned, such as concurrent validity and predictive validity (Bachman, 1990; Davies, 1990), operational validity (Viete, 1992) and consequential validity (McNamara, 2000). All subgroups of validity will be discussed below.
Bachman (1990: 255) argued that construct validity deals with "the extent to which performance on tests is consistent with predictions that we make on the basis of a theory of abilities". Hughes (1989), Davies (1990) and Weir (1990) share similar views on the notion of construct validity.
Content validity, according to Anastasi (1982: 131, cited in Weir, 1990: 25), is defined as "the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured". The relevance of the test content is discussed by Bachman (1990) and McNamara (2000). The former holds the view that content validity involves content relevance and content coverage, as agreed by Davies (1990). The latter argues that "judgements as to the relevance of content are often quite complex, and the validation effort is accordingly elaborate" (51).
McNamara (2000: 133) defined face validity as "the extent to which a test meets the expectations of those involved in its use, e.g. administrators, teachers, candidates and test score users". Weir (1990) argued that students would not perform at their best in the absence of face validity. It, however, must be the first to be neglected if there is a conflict between it and any of the other validities (Davies, 1990).
Another type of validity is washback validity, which refers to the influence of the test "on the teaching and learning that precedes it" (Weir, 1990: 27). It is recognised that if language teachers equip students with skills relevant to present and future needs and the test is designed to reflect these, the relationship between the test and the teaching that precedes it will become closer.
Criterion-related validity concerns the relationship between test scores and a suitable criterion of performance (Bachman, 1990; Weir, 1990). Concurrent validity (examining the correlation between test scores and another measure of performance, usually an older established test) and predictive validity (concerning whether test scores can predict future performance) are two types of criterion-related validity (Bachman, 1990; Weir, 1990).
Operational validity describes "the relationship between the 'real world' performance and the performance measured by the test" (Viete, 1992: 122). In other words, only by observing the candidate functioning in the real world and comparing this with performance on the test can operational validity be established (Viete, 1992).
Consequential validity concerns changes that occur as a consequence of a test's introduction and that "may in turn have an impact on what is being measured by the test, in such a way that the fairness of inferences about candidates is called into question" (McNamara, 2000: 53).
Among the different aspects of validity, construct validity is regarded as the most important. As Cumming (1996) argued:
Rather than enumerating various types of validity, the concept of construct validity has been widely agreed upon as the single, fundamental principle that subsumes various other aspects of validation, relegating their status to research strategies or categories of empirical evidence by which construct validity might be assessed or asserted. (5)
What is more, Gipps (1994: 61) stressed that "construct validity is needed not only to support test interpretation, but also to justify test use". Bachman and Palmer (1996) added that construct validity helps to interpret scores from language assessment as indicators of learners' language ability. A crucial question emerges: "To what extent can we justify these interpretations?" (Bachman and Palmer, 1996: 21). In writing assessment, the issue of construct validity underlying concerns about reliability in scoring has been investigated (Hamp-Lyons, 1990). Several aspects of such research involve the decisions and criteria that raters employ to form their judgements and the empirical validation of scales and criteria used for scoring (Hamp-Lyons, 1990). In this research, the construct validity underpinning the basis on which teacher raters made their judgements of students' writing texts and established the criteria checklist was investigated.
2.2.2 Reliability
McNamara (2000: 136) referred to reliability as "consistency of measurement of individuals by a test". Davies (1990: 21) offered a similar definition: "reliability [is] the consistency of test judgements and results". Two main groups of factors affecting the reliability of tests are test-related factors and scorer-related factors (Viete, 1992). Several aspects of reliability should be taken into consideration.
Test-related factors consist of the testing environment (familiarity, personnel involved in the test, timing, physical conditions), test rubric (test organization, time allocation and instructions), the input (format, nature of language), the expected response (format, nature of language, restrictions on response), and the relationship between input and response (reciprocal, nonreciprocal, adaptive) (Bachman, 1990).
Scorer-related factors refer to the format and nature of the assessment criteria, criterion-referenced scoring methods (holistic or analytical scoring), degree of experience and training of scorers, conditions for scorers, number of scorers (multiple scoring is preferred), sequence and number of performances scored, degree of independence of scorers, existence of moderation procedures and anonymity of tests (Viete, 1992). According to my observation, in the majority of Vietnamese English-major universities or departments, scoring criteria are not always available, training of scorers appears to be absent, and raters get tired from marking too many writing tasks within a short time due to the large number of students and limited staff and time. Moderation procedures, which ensure individual scorers use all criteria and procedures consistently and which assist in making final decisions about scores where major discrepancies occur amongst scorers (Hughes, 1989 and Walker, 1990, quoted in Viete, 1992), usually do not exist. Other scholars have described scorer-related factors including the consistency of scoring among different raters ["inter-marker reliability" (Bachman, 1990: 180; Weir, 1990: 32)] and the consistency of each individual rater ["intra-marker reliability" (Bachman, 1990: 179; Weir, 1990: 32)]. In my research, inter-marker reliability was considered.
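Inter-marker reliability is often summarised as an agreement or correlation index across two raters' scores for the same set of scripts. The sketch below is not part of the thesis; it simply illustrates one such index (a Pearson correlation) on hypothetical marks, and both the essay scores and the choice of statistic are assumptions for illustration only.

```python
# A minimal sketch (not from the thesis) of estimating inter-marker consistency
# for two raters. The essay marks below are hypothetical.

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical marks (Vietnamese ten-point scale) given by two raters to five essays.
rater_a = [7.0, 5.5, 8.0, 6.0, 4.5]
rater_b = [6.5, 6.0, 8.5, 5.0, 5.0]

print(f"Inter-marker correlation: {pearson(rater_a, rater_b):.2f}")
```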
Validity and reliability are interrelated, because a valid test must be a reliable one, and a test which is a reliable measure of something other than what we intend to measure (i.e. not valid) is useless (Hughes, 1989; Weir, 1990). Weir (1990: 33) argued that "it is sometimes essential to sacrifice a degree of reliability in order to enhance validity". Later, he agreed with Guilford (1965: 481, cited in Weir, 1990: 33) that "if a choice has to be made, validity, after all, is more important". A compromise between the two, however, should be sought depending on the purpose of the test.
A number of sources of variability in raters' judgements have been identified in numerous studies. Among others, these include raters' cultural or disciplinary background (O'Loughlin, 1992; Cumming et al., 2002), raters' training and moderation (Hamp-Lyons, 1991; Weir, 1993; Weigle, 1994; Alderson, 1995; Bachman and Palmer, 1996; Lumley, 2002; Hughes, 2003), and different interpretations of assessment criteria (Gipps, 1994). There have been increasing attempts to enhance test reliability. Assessment criteria, for instance, have been developed to provide raters with a basis from which to form their judgements (Cumming et al., 2002). What is more, rater training and moderation have been carried out to help raters reach a degree of agreement about assessment criteria and rating scales, or in other words to "help bring raters to a temporary agreement on a set of common standards" (Weigle, 2002: 72). Nevertheless, the issue of reliability might undeniably persist, since "raters will never be in complete agreement on writing scores" (Weigle, 2002: 72) and complete elimination of inconsistencies would be an unrealistic goal, as Bachman and Palmer (1996) demonstrated.
2.2.3 Practicality
Practicality or "test efficiency" involves the "financial viability" of the test design, administration and scoring (Weir, 1990: 34-35). It is almost impossible to maintain high validity and reliability in a test that is not too costly and does not require a lot of people, time and materials (Davies, 1990). Bachman (1996) argued that a given test cannot simply be said to be more or less practical than another, since practicality depends on the specific testing situation, where the resources required vary. Compromise is necessary to maintain the balance among validity, reliability and practicality of the test (Bachman, 1996).
2.3 Issues in testing writing skills
2.3.1 Aspects in testing writing
Weir (1993) identified three aspects that should be taken into account when testing written production: conditions, operations and quality of output. The literature suggests that text types, topic and time allowance are different conditions that impact on the reliability and validity of writing tests. It is argued that having more than one writing task (text types) to perform increases reliability and validity, because it is relatively difficult to know about the candidate's general writing ability through one writing task (Hughes, 1989; Hamp-Lyons, 1990-1991, cited in Weir, 1993). Test practicality, however, will be an influence in terms of the time taken by such variety. Writing topics should be relevant to students' background knowledge to ensure that they are able to write something on the topics (Weir, 1993). A choice of topics could affect test reliability because "too much uncontrolled variance" will appear in the test (Weir, 1993: 135). In regard to the appropriate time allowed for the completion of writing tasks, it is necessary to provide sufficient time for candidates to produce texts that are long enough to be marked reliably (Weir, 1993).
Two different approaches to assessing writing ability described by Hamp-Lyons (1991, cited in Weir, 1993) are the indirect method and the direct method. The former deals with a discrete-point framework covering grammar, vocabulary, spelling, etc., and these elements can be tested separately by the use of objective tests (Weir, 1993). It would be difficult to make statements about how well candidates write from such discrete-item tests. The latter refers to "more direct extended writing tasks" which involve "the production of continuous texts" in which writers can raise their own ideas (Weir, 1993: 133).
In communicative testing and process-oriented curricula, direct tests seem to be more suitable, although the process in the writing examination does not usually reflect the process of writing, including brainstorming, outlining, writing and rewriting, editing and revising (Veit et al., 1994). The fact that in the Vietnamese educational context there is a large number of students, limited staff, and a heavily exam-oriented curriculum explains this. Nonetheless, in the present study, the test length of 50 minutes is expected to give students enough time to carry out these steps to produce a 250-word writing text, achieving a relative balance between practicality and validity.
2.3.2 Marking scheme
Two basic approaches to scoring, analytical scoring and holistic scoring, are discussed by a number of authors (Hughes, 1989; Hamp-Lyons, 1991; Weir, 1993; O'Malley and Pierce, 1996; McNamara, 2000; Weigle, 2002). In the following subsections, the definitions of the two marking approaches and the arguments over which approach to adopt are presented.
2.3.2.1 Analytical marking
Analytical marking is the method in which each aspect of a performance, e.g. content, grammar, organization, etc., is rated separately and the final score is the total of these individual scores (Weir, 1993). Other writers share similar definitions with slightly different wordings, such as "analytic scales separate the features of a composition into components that are each scored separately" (O'Malley and Pierce, 1996: 144) or "scripts are rated on several aspects of writing or criteria rather than given a single score" (Weigle, 2002: 114).
Analytical scoring has a number of advantages. First, it provides more detailed feedback regarding specific aspects of students' writing performance and diagnostic information for teachers in planning instruction (Perkins, 1983, cited in O'Malley and Pierce, 1996; Bachman and Palmer, 1996; Weigle, 2002). Moreover, the components of writing in which students have progressed most rapidly can be seen through analytical scoring (Hamp-Lyons, 1991), and the problem of uneven development of subskills can be revealed (Hughes, 1989). Another advantage in terms of scorers and the scoring process is that every aspect of writing skill that might otherwise be ignored has to be looked at, and giving more scores for each component can result in more reliable scores (Hughes, 1989). In addition, explicit concern reflected in teachers' feedback, particularly teachers' praise of the positive aspects of students' writing, makes students feel motivated, encouraged and invited to write, as shown by Tran (2002) in relation to Vietnamese students of English.
Limitations can also be seen in this kind of marking scheme. First, it is obvious that analytical marking takes longer than holistic schemes, an issue of practicality, since a separate score is required for each component (Hughes, 1989; O'Malley and Pierce, 1996; Weigle, 2002). The major problem, as seen by Hughes (1989), Weir (1993) and Weigle (2002), is whether scorers can judge each aspect separately from the others [called a "halo effect" by Hughes (1989: 103) and Weir (1993: 163)]. In other words, "rating of one criterion might have a knock-on effect in the rating of the next" (Weir, 1993: 164), since every component in a piece of writing is integrated. Madsen (1983) and O'Malley and Pierce (1996) also raised another issue: teacher raters may not agree with the weight given to each component (O'Malley and Pierce, 1996) or may not know how to weigh each error (Madsen, 1983). It might even be the case that experienced raters use the analytical scoring scheme but rate more holistically to come to a single score (Weigle, 2002).
In analytic marking schemes, each aspect, such as organization, vocabulary and grammar, might be equally weighted, as in Anderson's scheme (cited in Hughes, 1989: 101-102, see Appendix 1), which consists of five scales, each divided into six levels with score points ranging from 1 to 6, where the final score is the total of all weighted scales. A noteworthy point in this scheme is "the conjunction of frequency of error and the effect of errors on communication" (Hughes, 1989: 103). Put a different way, a small number of grammatical errors can have a more serious effect on communication than a series of errors of another kind (Hughes, 1989). A different scheme can be seen in Jacobs et al.'s scoring profile (1981, cited in Hughes, 1989, see Appendix 2). It is apparent from this scheme that the more significant an aspect is, the more weight it receives (Hughes, 1989; Weigle, 2002). Five components of writing, namely content, language use, organization, vocabulary and mechanics, receive 30, 25, 20, 20 and 5 points respectively, in order of different emphasis. The weightings can vary according to students' levels (Hughes, 1989). The association of each score with its descriptors helps raters to award scores in accordance with students' levels (Hughes, 1989).
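As a concrete illustration of how such a weighted profile produces a single mark, the sketch below sums component sub-scores under the maximum weights cited above. It is not part of the thesis, and the sub-scores for the sample script are hypothetical.

```python
# A minimal sketch (not from the thesis) of an analytic total under the component
# weights cited above from Jacobs et al.'s profile (content 30, language use 25,
# organization 20, vocabulary 20, mechanics 5). The sample sub-scores are hypothetical.

MAX_POINTS = {
    "content": 30,
    "language use": 25,
    "organization": 20,
    "vocabulary": 20,
    "mechanics": 5,
}

def analytic_total(sub_scores):
    """Sum component sub-scores, checking each stays within its maximum."""
    for component, score in sub_scores.items():
        if not 0 <= score <= MAX_POINTS[component]:
            raise ValueError(f"{component} score {score} outside 0-{MAX_POINTS[component]}")
    return sum(sub_scores.values())

# Hypothetical ratings for one script.
script = {"content": 24, "language use": 18, "organization": 15, "vocabulary": 16, "mechanics": 4}

print(f"Analytic total: {analytic_total(script)} / {sum(MAX_POINTS.values())}")
```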
2.3.2.2 Holistic marking
Holistic marking (referred to as global marking by Weir, 1993 and Bachman and Palmer, 1996, as general impression marking by Weir, 1990 and Weigle, 2002, or as "impressionistic" scoring by Hughes, 1989) refers to the rating of a performance as a whole (McNamara, 2000). In this approach, scores are not required for each component of the criteria.
Hamp-Lyons (1991) and Weigle (2002) share the view that holistic scoring has become prevalent in writing assessment over the past 25 years. A number of positive features explain this trend. Apparently, this approach to scoring is faster and consequently less expensive than any other approach (Hughes, 1989; Hamp-Lyons, 1991; Weir, 1993; Weigle, 2002). It takes experienced scorers a couple of minutes (Hughes, 1989) or even one minute or less (Hamp-Lyons, 1991) to assign a score to a one-page text. For this reason, the use of more than one rater is encouraged (Weir, 1993) "to compensate for interrater unreliability" (P. Cooper, 1984: 243, cited in Hamp-Lyons, 1991). This notion is also shared by Hughes (1989) and Hamp-Lyons (1991), who argue that scores given by multiple raters are more reliable than those given by a single one. This is, however, true
only if the markers are equally consistent in their own marking. If this is not the case the reliability of the more consistent marker on his own might be better than the combined reliability estimate for two markers who exhibit unequal consistencies (Weir, 1993: 165)
Another advantage of this kind of scoring is the intention to focus the reader's attention on the strengths of the writing, as White (1985, cited in Hamp-Lyons, 1991) claimed. Readers' attention can concentrate on certain aspects of writing and can therefore provide appropriate information in an efficient way (Weigle, 2002). What is more, holistic scoring best reflects the authentic and personal reaction of a reader to a text (White, 1984, cited in Weigle, 2002) and "reinforces the vision of reading and writing as intensely individual activities involving the full self" (White, 1985: 33, quoted in Hamp-Lyons, 1991).
Holistic marking, on the other hand, presents several weaknesses. First, a person's writing ability cannot be seen through the single score, since diagnostic information through scores for each component of a writing task, such as organization, content and vocabulary, is not provided (Weir, 1993; Bachman and Palmer, 1996; Weigle, 2002). One student might have a good command of grammar but not be very good at organizing ideas; others might have abundant ideas organized in a logical way but be poor at sentence structure (Weigle, 2002). A profile of student writers, including a description of language ability (errors) or a prescription for treatment, is expected, but holistic scoring fails to provide this (Bachman and Palmer, 1996).
A major problem of holistic scoring is the employment of multiple 'hidden' components of language ability when arriving at the final score, as Bachman and Palmer (1996) and Weigle (2002) demonstrated. It is difficult to interpret the score, since different raters do not necessarily use the same criteria and, if they do, different components might be weighted differently (Bachman, 1996; Weigle, 2002). "Superficial characteristics" (Bachman and Palmer, 1996: 144), namely length and handwriting (Markham, 1976; Sloan and McGinnis, 1982, cited in Weigle, 2002), and word choice and spelling errors (Charney, 1984, cited in Vaughan, 1991), which are easy to pick out but irrelevant when assessing writing ability, have an influence on raters (Stewart and Grobe, 1979, cited in Vaughan, 1991).
Holistic scoring may be done wholly subjectively, giving few or no indications of how different components are considered in arriving at a single mark (Bachman and Palmer, 1996). Reliability will be increased if a scoring guide or rubric is given (Hamp-Lyons, 1991), since information in the scoring scales should be considered in "the ultimate decision about the individual test taker" (Bachman and Palmer, 1996: 210). Weir (1990: 64) also argued that the provision of "a detailed mark scheme and (by) the efficient standardization of examiners prior to the marking exercise" would be of great help.
Another way to improve reliability is the call for multiple scoring proposed by Hughes (1989), Weir (1990, 1993), Hamp-Lyons (1991), Bachman and Palmer (1996) and Salies (1998). Francis (1977, cited in Weir, 1990) also suggested that markers in the rating process first read 10 to 25% of sample scripts so that a standard is established in their minds, and then read all scripts and allocate grades.
To move towards increasingly reliable scores, the development of a "sense of community" is necessary (White, 1985: 24, quoted in Hamp-Lyons, 1991). Raters should give up their subjectivity in the holistic approach and adjust their reactions in line with those of other raters (Hamp-Lyons, 1991). The construct of community is not for the purpose of scoring essays that ends within the day, but rather a process continuing through the year. Hamp-Lyons (1991) defined the construct of community and its value as follows:
Community in this context is achieved by the same readers meeting together, reading essays, and talking about them, again and again. Under these conditions, with sensitive leadership, readers come to find disagreement irrelevant and discussion becomes the appropriate term for their collegial exploration. (245)
For the present study, I asked my participants for their opinions about the criteria in an assessment checklist using an analytical approach, and sought their perspectives on which approach best reflects students' ability and on the usefulness of the construct of community.
2.3.2.3 Analytical marking versus holistic marking
A summary of the pros and cons of the two approaches is presented by Weigle (2002, see Appendix 3). Weir (1990) held the view that the analytical method is more reliable, quoting a number of scholars who conducted research in this field, such as Hartog et al. (1936), Cast (1939) and Francis (1977). All shared the view that "variation between markers was, to some extent, reduced by the analytical method" (Hartog et al., 1936, cited in Weir, 1990: 64) or found "the analytical method slightly superior [to the holistic one]" (Cast, 1939, cited in Weir, 1990: 64). Hamp-Lyons (1991) also supported the analytical method, reasoning that reliability tends to increase.
Other scholars, on the other hand, favour the holistic approach. Madsen (1983: 122) argued that "the holistic approach is better [than the analytical approach]". What is more, holistic scoring is more valid than analytical scoring, as White (1984, cited in Weigle, 2002) proposed. The choice of either holistic or analytical scoring depends on the purpose of the testing and the circumstances of scoring (Hughes, 1989). Holistic scoring, for instance, might be more appropriate for a small group in one place, while analytical scoring might be the right choice for a less well-trained group or for scoring in different places (Hughes, 1989).
2.3.3 Difficulties in assessing writing
The more communicative and authentic a productive test needs to be, the more difficult it is for a tester to construct the test and produce an appropriate score. The difficulties come mainly from the following factors:
2.3.3.1 Sampling
A one-task test can hardly cover the whole range of activities set in the course objectives, so tests with more than one task are recommended by Weir (1993). However, too much freedom should not be given to candidates, as it will affect the test's validity and reliability. In the case of this study, writing skill is assessed; it should not be assessed through the examination of command of grammar or vocabulary (McNamara, 2000). To test students' writing ability is to get students to write, as suggested by Hughes (1989), through direct tests (McNamara, 2000).
Candidates should know how long the essay should be and for whom it is to be written, so that length can be adjusted and a decision about a colloquial or an academic style can be made (Bachman, 1991; Alderson, 1995). Writing tasks should not aim at testing intelligence or wide general knowledge (Hughes, 1989; Weir, 1993; Alderson, 1995). If students are not well informed on the topic or find it irrelevant or boring, they may not have enough time to exhibit their level of English proficiency or may not be able to write interestingly (Weir, 1993; Alderson, 1995). This raises the issue of whether to allow a choice of topics, which could affect test reliability since different topics may result in different responses (Weir, 1993). Jacobs et al. (1981, quoted in Weir, 1993: 135) claimed that all students should write on the same topics "because allowing a choice of topics introduces too much uncontrolled variance into the test". Choice, therefore, reduces reader consistency and reliability when evaluating the test (Jacobs et al., 1981, quoted in Weir, 1993).
2.3.3.2 Establishing criteria for marking
Establishing the criteria for marking is essential in assessing writing performance, a subjective scoring process, since the absence of criteria affects the reliability of the test with regard to scorers' inconsistency in their marking (McNamara, 2000). According to Bachman and Palmer (1996), there are three causes of inconsistency, a potential problem with ratings: different interpretations of scales, different standards of severity, and reaction to elements not relevant to the scales.
Different interpretations of scales may occur between different raters or within the same rater on different occasions, resulting in inconsistent scoring (Bachman and Palmer, 1996). In assessing writing performance, the interpretation of register, for instance, may differ among raters. This is more complicated in holistic scoring, since "raters may tend to develop their own internal componential rating criteria" (Bachman and Palmer, 1996: 221).
Different standards of severity refers to the lack of agreement on the meanings of the levels of ability. The same writing text may receive various sub-scores for one component, organization for instance, from different raters (Bachman and Palmer, 1996). Reaction to elements not relevant to scales means the influence of irrelevant material such as handwriting, the positions raters hold on the issue, and other elements discussed in Section 2.3.2.2 (Bachman and Palmer, 1996).
Assessing writing performance is not a matter of making "right-wrong" decisions, but of making decisions about how well students complete their task (Alderson et al., 1995). A rating scale is thus needed. It involves "descriptors" (or "level descriptors", the term used by McNamara, 2000: 40) consisting of numbers, letters or labels (e.g. Excellent or Very good) describing "the kind of behaviour that each point on the scales refers to" (Alderson, 1995: 107).
Samples of writing scales with detailed descriptors for the holistic approach and the analytical approach are presented in Table 2.1 and Table 2.2 below.
Table 2.1 A sample holistic scale (Alderson et al., 1995: 108)

18-20 Excellent: Natural English with minimal errors and complete realisation of the task set.
16-17 Very good: More than a collection of simple sentences, with good vocabulary and structures. Some non-basic errors.
12-15 Good: Simple but accurate realisation of the task set with sufficient naturalness of English and not many errors.
8-11 Pass: Reasonably correct but awkward and non-communicative OR fair and natural treatment of subject, with some serious errors.
5-7 Weak: Original vocabulary and grammar both inadequate to the subject.
0-4 Very poor: Incoherent. Errors show lack of basic knowledge of English.
Trang 25Table 2.2 A sample analytical scale (Weir, 1993: 160 and Alders on et al., 1995: 109-110)
Relevance and Adeauacv of Content
0 The answer bears almost no relation to the task set Totally inadequate answer
1 Answer o f limited relevance to the task set Possibly major gaps in treatment o f topic and/or pointless repetition
2 For the most part answers the task set, though there may be some gaps or redundant information
3 Relevant and adequate answer to the task set
Compositional organization
0 No apparent organization o f content
1 Very little organization o f content Underlying structures not sufficiently apparent
2 Some organizational skills in evidence but not adequately controlled
3 Overall shape and internal pattern clear Organizational skills adequately controlled
3 Satisfactory use o f cohesion resulting in effective communication
Adequacy of Vocabulary for Purpose
0 Vocabulary inadequate even for the most basic parts o f the intended communication
1 Frequent inadequacies in vocabulary for the task Perhaps frequent lexical inappropriacies and/or repetitions
2 Some inadequacies in vocabulary for the task Perhaps some lexical inappropriacies and/or circumlocution
3 Almost no inadequacies in vocabulary for the task Only rare inappropriacies and/or circumlocution
Grammar
0 Almost all grammatical patterns inaccurate
1 Frequent grammatical inaccuracies
Trang 262 Some grammatical inaccuracies.
3 Almost no grammatical inaccuracies
Mechanical Accuracy I (Punctuation)
0 Ignorance o f conventions o f punctuation
1 Low standard o f accuracy o f punctuation
2 Some inaccuracies o f punctuation
3 Almost no inaccuracies o f punctuation
Mechanical Accuracy II iSoelline)
1 Almost all spelling inaccurate
2 Low standard o f accuracy in spelling
3 Some inaccuracies in spelling
4 Almost no inaccuracies in spelling
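To make the mechanics of these two scale types concrete, the sketch below, which is not part of the thesis, shows how a total might be derived from analytic sub-scores like those in Table 2.2 and how a single holistic mark might be mapped to a band label like those in Table 2.1. The sub-scores and the sample 0-20 mark are hypothetical.

```python
# A minimal sketch (not from the thesis) illustrating the two scale types above.
# Sub-scores and the sample holistic mark are hypothetical.

# Analytic scale: each criterion from Table 2.2 is scored 0-3 and the total is reported.
analytic_subscores = {
    "content": 2,
    "organization": 3,
    "cohesion": 2,
    "vocabulary": 2,
    "grammar": 1,
    "punctuation": 3,
    "spelling": 3,
}
analytic_total = sum(analytic_subscores.values())
print(f"Analytic total: {analytic_total} / {3 * len(analytic_subscores)}")

# Holistic scale: a single 0-20 mark is mapped to a band from Table 2.1.
HOLISTIC_BANDS = [
    (18, "Excellent"),
    (16, "Very good"),
    (12, "Good"),
    (8, "Pass"),
    (5, "Weak"),
    (0, "Very poor"),
]

def holistic_band(mark):
    """Return the Table 2.1 band label for a 0-20 holistic mark."""
    for floor, label in HOLISTIC_BANDS:
        if mark >= floor:
            return label
    raise ValueError("mark must be between 0 and 20")

print(holistic_band(13))  # -> "Good"
```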
In my research, the teacher participants were asked about the criteria they employed in their scoring. Scorers' inconsistency was observed when comparing the scores they gave for the students' essays. The issues of which criteria Vietnamese teachers of English employ when assessing students' writing and how much emphasis they place on each criterion were dealt with.
2.3.3.3 Scoring
When scoring written performance, test scorers have to cope with a number of factors such as handwriting or the appearance of the written work (discussed in Section 2.3.2.2). These irrelevant factors should be excluded when scoring to ensure the validity and reliability of the test.
2.4 Summary of the chapter
This chapter has dealt with the notion of communicative competence, the framework for communicative language testing, and three key aspects of assessment: validity, reliability and practicality. A number of problems concerning raters' judgements emerged, namely different interpretations of assessment criteria, the various weightings given to each component, marking schemes (holistic or analytical marking), etc., leading to discrepancies among raters in terms of results. This raises the issue of developing a set of criteria for scoring (including the characteristics of the genre being tested) and of rater training and moderation, which are further explored in Chapter Four.
CHAPTER THREE: METHODOLOGY
In the previous chapter, the literature relating to the topic of this study was reviewed. This chapter focuses on the qualitative approach this study adopted. The selection of participants and the methods for data collection and analysis will be discussed in the later sections of this chapter.
3.1 A qualitative approach
A qualitative approach is required in my research because "interpretive" and "critical" approaches are employed (Scott and Usher, 1996: 18). It is argued that all human action is meaningful and has to be interpreted and understood within the context of social practice. Researchers, then, listen to participants and interpret what they say according to their own values and beliefs (Scott and Usher, 1996). Audio equipment was used and interviews were conducted to collect data. Information from the recorded interviews was analysed and synthesised using an "interpretive epistemology" approach (Scott and Usher, 1996). As Denzin and Lincoln (1994: 4) stated, qualitative methods "emphasise processes and meanings that are not rigorously examined or measured in terms of quantity, amount, intensity or frequency". Gay (1996) argued that a qualitative approach seeks to obtain a more comprehensive picture of the educational process, a holistic, in-depth understanding of the phenomenon under study, rather than just finding answers to the question of how well, how much or how accurately something is done. Moreover, Maykut and Morehouse (1994) affirmed that qualitative research generally examines people's words and actions in narrative or descriptive ways that more closely represent the situation experienced by the participants. The aim of qualitative research is "to discover patterns which emerge after close observation, careful documentation and thoughtful analysis of the research topic" (Maykut and Morehouse, 1994: 21). All the features introduced by these respected authors seem to suit my study, because I want to obtain, through interviews, a deep understanding of what criteria Vietnamese teachers of English at tertiary level employ when marking writing tasks and how much emphasis they place on each criterion.
3.3 Methods for data collection
I used students' essays (photocopies, see Appendix 4) from my university, which were sent to me from my home country with permission from the Department Dean. These are end-of-term argument essays written by second-year students. The old scores, remarks and students' names were removed. Since I hoped to capture teachers' reactions as they gave feedback on pieces of writing, the essays were left in their original handwritten form so that the nature of the rating task would seem more authentic. This procedure aimed to ensure an authentic testing atmosphere, because teachers worked on real tests with real handwriting.
Two rounds of interviews were conducted. First, two experienced teachers were asked to set up a criteria checklist for assessing argument essays, through which the characteristics of an argument essay were also revealed from their perspectives. A workshop was then held with the participation of all five teachers, after allowing some time for the researcher to summarise the first interview. In order to collect information on teachers' assumptions about students' writing and the criteria checklist, the participants were asked to read one selected essay among the five and rate it based on the criteria checklist set up in the first interview. There were several reasons why I asked my participants to rate one essay prior to the discussion of the criteria. First, the teacher participants would familiarise themselves with the writing text and the criteria. In addition, the process let them think about any category in the criteria that was vague to them, or suggest any category or sub-category they wanted to add after marking this essay. Finally, they could quote evidence from the same essay, which they all knew from the process of rating, to support their arguments when collaboratively developing a set of criteria for scoring such writing, which was one aim of this research. The workshop, in my opinion, was helpful because arguments and suggestions from different participants were expected to help teachers recall ideas from their own experience and even pave the way for new, practical ideas; McDonough and McDonough (1997: 183) hold a similar view, claiming that "individuals can 'spark' each other into perceptive lines of discussion".
The criteria checklist set up in the workshop was the basis for the marking of the remaining four essays, among which two (one generally considered to be good and one generally considered to be bad) were then chosen for raters to discuss their judgements. Discrepancies in scores for the same essay were observed, resulting from contradictory opinions appearing at this stage, which provided broader views of writing assessment. The content of the discussion is presented in Chapter Four.
The last aim of the research was to investigate teachers' perspectives on the usefulness of the process for reliable training and scoring. Questions (see Appendix 5) were sent to participants through e-mail and answers (see Appendix 6) were received within a week, while the process of marking and discussion was still fresh. Suggestions from the participants about how scorers can best be supported in writing assessment were gathered.
Among the three main possibilities for recording interview data, namely write-up after the interview, audio recording and note-taking (McDonough and McDonough, 1997), audio recording was chosen for my research because, though time-consuming, it shows a number of advantages over the other two in that "it ensures an accurate and detailed record of actual language data which may not just be the vehicle for the interview, but its object as well" (McDonough and McDonough, 1997: 186). In contrast, write-up after the interview might encounter the problem of memory and a quantity of vague data, while distraction of the attention of both interviewers and interviewees can be seen in the note-taking method.
I chose semi-structured interviews from among the three types of interview (structured, semi-structured and unstructured) introduced by Minichiello (1990) and McDonough and McDonough (1997). While structured interviews seem to be rigid because they require "standardised questions" and are regarded as a "one-way process" (Minichiello, 1990: 90-91), and unstructured ones refer to "formal interview schedules and ordering of questions" and are controlled by interviewers (Minichiello, 1990: 92), semi-structured interviews appear to be relatively suitable for my research because they are based on "an interview guide or schedule (which) is developed around a list of topics without fixed wording or fixed ordering of questions" (Minichiello, 1990: 92). Interviewers, therefore, are allowed greater flexibility than in the survey-style interview. What is more, open-ended questions were employed in my interviews, since they "allow the respondents to feel that they can contribute more individual points of view and more detailed information than is elicited in closed questions" (McDonough and McDonough, 1997: 176). In my research, I could stop my participants and ask them for elaboration whenever I encountered vague ideas. In-depth interviewing is a method used in interpretive research (Minichiello, 1990), the approach that I have chosen.
The interviews were conducted in Vietnamese, as it was believed that this would help the interviews to be informal and intimate. The interviewees would also find it easier to express their ideas.
3.4 Methods for data analysis
The audiotaped interviews were transcribed in Vietnamese and then translated into English. The data suggested some categories for characterising the information in my study, as suggested by Wiersma (1995). The data were organised and coded according to categories and subcategories (Wiersma, 1995). Based on the main source of data from the interviews, the similarities and differences teachers shared about the characteristics of a 'good' argument essay and the criteria applied when scoring were identified. Scores awarded by teacher raters were compared; discrepancies were observed and explained in relation to the literature. Recommendations on the further development of writing assessment, including a detailed criteria checklist and points of focus in criteria discussion, were collected and sorted to fulfil the aim of the research. The English versions of the transcripts were line-numbered to make it easier to locate references in the chapter dealing with the discussion of the findings, which is based on the translated versions.
CHAPTER FOUR: FINDINGS AND DISCUSSION
This chapter first presents a broad overview of the assessment practices and the writing syllabus in some Vietnamese universities with English-major courses, and then the findings from the interviews and the reflection questions conducted with the five teacher participants. For confidentiality, pseudonyms were assigned to participants, namely Khanh, Linh, Ha, Thanh and Vinh (referred to as K, L, H, T and V respectively when cited in interviews). The English versions of the interviews, which were used for data analysis, were divided into four parts, A, B, C and D, line-numbered and coded E. This chapter focuses on the following categories: an overview of assessment practices and the writing syllabus, teachers' perspectives on assessment criteria, characteristics of an argument essay, factors affecting teachers' assessment judgements, assessment criteria, and the usefulness of the process of discussion of assessment criteria. These categories were derived from the research aims and the literature review. The data are presented and then discussed in light of the literature under each category.
4.1 An overview of assessment practices and writing syllabus in some Vietnamese English major universities
The description comes from personal interviews with the teacher participants and other English-major teachers pursuing their Master degrees in Melbourne, Australia.
4.1.1 Assessment practices
At several universities where English is taught as a main subject, continuous assessment has been applied. This means students are assessed twice in a semester, through a mid-term test and an end-of-term one. If they fail a subject, they can retake the exam at a later date. There has been a tendency towards continuous assessment in some other institutions where it is currently absent. The calculation of students' final results at the end of the semester varies among universities, as presented in Table 4.1 below.
Table 4.1 The calculation of students' final results

Table 4.2 Classifying system (Le, 2002: 40) of universities with English major courses in Vietnam
4.1.2 Writing syllabus
Essay writing is often introduced to students in the second year, after they have become familiar with paragraph writing in various genres such as description, cause and effect, comparison and contrast, and argument in their first year. Argument essay writing mainly appears in the second semester of the second year at some English major universities. At this stage, students' levels are expected to be equivalent to upper-intermediate. Normally, 2.5 periods per week in a 20-week semester, or 3 periods per week in a 15-week semester, are reserved for essay writing. The types of essay taught (e.g. narrative, description, etc.) might differ among universities, but the argument essay is always present.
The book Writing Academic English by Oshima and Hogue (1991), which covers how to write a paragraph and then an essay, seems to be the basis for general essay writing at a number of educational institutions. Depending on the essay type, the teachers in charge may find relevant materials from different sources or may be given some teaching guidelines by the Department.
4.2 Teachers’ perspectives on assessment criteria
4.2.1 Process of developing criteria checklist
As introduced in Section 3.3 (under Chapter Three: Methodology), two experienced teachers, namely Linh (L) and Khanh (K), were asked to set up a criteria checklist for assessing argument essays.
The two most important aspects of the text, as the two teacher participants agreed, are content and organization (App.7, AE: 2, 3; AE: 11-13). By content, Khanh implied a good expression of ideas (AE: 4), while Linh meant the relevance of the ideas expressed to the topic, and a clear and explicit response to the question (App.7, AE: 12-13). Organization, as suggested by Khanh, is the inclusion of an introduction, body development, a conclusion and linkage between paragraphs through cohesion and coherence (App.7, AE: 7-8). Linh shared a similar opinion and also added unity of ideas and flow of ideas to this category (App.7, AE: 16-17). Due
to their importance, they thus received the most weight. On the ten-point scale, Khanh was inclined to give content 3-4 marks and organization 3 marks (App.7, AE: 73-74). Meanwhile, Linh tended to assign content 4 marks and organization 2.5 marks (App.7, AE: 78). Finally, they both agreed to give content 4 marks and organization 3 marks. The weightings assigned to content and organization are congruent with other studies, one of which is that of Freedman (1979: 112, cited in Hamp-Lyons, 1991), who stated that "content was the most significant single factor in the final judgment of an essay". Similar findings about the majority of raters' attention to content were reported by Cumming (1989, cited in Weigle, 2002) and Connor and Carrell (1993, cited in Weigle, 2002). Madsen (1983: 121) also argued that "a larger element of writing (than mechanics, punctuation and grammar) to be included might well be organization". Another study conducted recently with six Vietnamese teachers of English by Le (2002) consolidated this finding. Almost all of the six participants in Le's study gave content and organization the most weight (3-4 marks on a ten-point scale).
Following content and organization is grammar, as the two teacher participants suggested (App.7, AE: 65; App.7, AE: 9). According to Linh, "Grammar includes syntax, vocabulary and word choice" (App.7, AE: 67). While not disagreeing, Khanh added that there was a tendency to focus on a variety of grammar structures when assessing students' writing in some Vietnamese universities (App.7, AE: 18-19). Khanh argued:
A number of raters in my department pay attention to a variety of grammar structures like passive and active voices, complex and compound sentences, etc. due to the fact that teaching English in Vietnam focuses on grammar. A variety of grammar structures should be seen in a good piece of writing. (App.7, AE: 18-19)
The weight for grammar differed slightly between the two teachers, as Linh gave 1.5 marks and Khanh gave 2 marks (Khanh gave grammar and lexical choice 1 mark each, then combined the two into one category: grammar) (App.7, AE: 78-79; AE: 74-75). They then agreed to assign grammar 2 marks (App.7, AE: 81-82).
Other factors established were mechanics and overall judgement. Mechanics consists of punctuation and spelling (App.7, AE: 17), and overall judgement includes originality, impression and creativity in Linh's opinion (App.7, AE: 27). Linh gave each category 0.5 marks (App.7, AE: 82-83). For Khanh's part, he tended to give 1 mark for each category (App.7, AE: 75-76), as he argued that "0.5 marks for mechanics and overall judgement is so little" (App.7, AE: 84). After discussion, he finally agreed with Linh to give mechanics and overall judgement 0.5 marks each (App.7, AE: 87).
The summary of the criteria established is presented in Table 4.3 below. This criteria checklist was handed to the five participants prior to the process of rating and discussion. The participants were given time to think about the criteria categories. They used this criteria checklist in assessing one student's writing, then gave feedback about the criteria categories and discussed any changes at the workshop.
Table 4.3 Criteria checklist set up by two experienced teacher participants
Content (including idea expression, argument development, topical knowledge, relevance of ideas, etc.): 4 marks
Organization (including paragraph structure, essay structure, coherence, cohesion, etc.): 3 marks
Grammar (including syntax, word choice, variety of structures, etc.): 2 marks
Mechanics (including punctuation, spelling, etc.): 0.5 marks
Overall judgement (including originality, creativity, impression, etc.): 0.5 marks
Total: 10 marks
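To make the weighting in Table 4.3 concrete, the short sketch below shows how a rater's sub-scores could be combined into a total mark out of ten. It is an illustration only: the function name and the example sub-scores are hypothetical and are not drawn from the study's data.

```python
# Illustrative sketch only: combining analytic sub-scores under the checklist in Table 4.3.
# The maximum marks come from the table; the example sub-scores below are hypothetical.

MAX_MARKS = {
    "content": 4.0,
    "organization": 3.0,
    "grammar": 2.0,
    "mechanics": 0.5,
    "overall_judgement": 0.5,
}  # the maxima add up to 10 marks

def total_score(sub_scores):
    """Check each sub-score against its maximum and return the total out of 10."""
    for criterion, score in sub_scores.items():
        if not 0 <= score <= MAX_MARKS[criterion]:
            raise ValueError(f"{criterion} must lie between 0 and {MAX_MARKS[criterion]}")
    return sum(sub_scores.values())

# A hypothetical rating of one essay:
example = {"content": 3.0, "organization": 2.5, "grammar": 1.5,
           "mechanics": 0.5, "overall_judgement": 0.5}
print(total_score(example))  # prints 8.0 (out of 10)
```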
4.2.2 Process of scoring

After marking the first essay, the scores given by the teachers were revealed, and discrepancies in marking and differences in the interpretation of the criteria categories emerged. The teacher participants referred back to the criteria checklist, discussed it and elaborated on it. Changes were made to the categories and sub-scores of the criteria checklist. The newly amended criteria checklist was then used to rate the remaining four essays, of which two (one generally regarded as good and one generally regarded as bad), namely essay 3 and essay 1 respectively, were selected for identification of the basis on which teachers made their scoring judgements against the criteria. The process is reported under three sub-headings: discussion of marking a sample essay and the criteria checklist, discussion of marking essay 3, and discussion of marking essay 1.
4.2.2.1 Discussion of criteria checklist
The scores and sub-scores that the teachers gave to the sample essay (essay 4) are displayed in Table 4.4 below.
Table 4.4 Scores on essay 4
A gap of 1.75 points leads to a two-level difference in the classification, since 6.5 points belongs to Average-Good while 8.25 points is in Very Good, as shown in Table 4.2. Discussion about the basis on which the teacher raters made their scoring judgements is presented below.
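Before turning to that discussion, the sketch below illustrates how a gap of this size can shift an essay across two classification levels. Since Table 4.2 is not reproduced here beyond the fact that 6.5 falls in Average-Good and 8.25 in Very Good, the band boundaries used are assumptions made for illustration, not values taken from the table.

```python
# Illustrative sketch only: mapping a ten-point total to a classification band.
# The band boundaries are assumed; they are consistent with 6.5 falling in
# Average-Good and 8.25 in Very Good, but are not taken from Table 4.2 itself.

BANDS = [
    (9.0, "Excellent"),
    (8.0, "Very Good"),
    (7.0, "Good"),
    (6.0, "Average-Good"),
    (5.0, "Average"),
    (0.0, "Weak"),
]

def classify(total):
    """Return the label of the first band whose lower boundary the total reaches."""
    for lower_bound, label in BANDS:
        if total >= lower_bound:
            return label
    return "Weak"

print(classify(6.5))   # Average-Good
print(classify(8.25))  # Very Good: two levels higher, despite a gap of only 1.75 points
```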
On discussing content, Vinh and Ha were the two raters who gave 3 marks, while the remaining three awarded 2.5 marks. All raters appreciated the presence of the counter argument in the essay, one characteristic of a "good" argument essay which is discussed in Section 4.3. According to Vinh, "the student's level is rather good and the essay met the requirements" (App.7, BE: 9-10). She argued that this essay possesses a number of good points in terms of clear idea expression, the combination of illustration and argument (supporting evidence), a reasonable number of arguments, and a balance between supporting arguments and counter arguments (App.7, BE: 2-8). Khanh agreed with Vinh and was impressed by a clear statement and the inclusion of one counter argument (App.7, BE: 11-12). Meanwhile, Ha commented:
Argument is relevant. The writer's point of view is rather clear in the way he supported one side of the issue while the other side was also considered through the counter argument. (App.7, BE: 13-15)
A comprehensive overview of the issue through the argument and counter argument, a clear way of building the argument (topic sentences and supporting sentences in each paragraph) and evidence to illustrate are the reasons why Linh allocated 2.5 marks for content (App.7, BE: 26-27; BE: 43-44). Thanh also expressed her attention to argument and counter argument but claimed that they belong to organization rather than content (App.7, BE: 17-18).
One of the issues arising from the discussion concerns the requirement of persuasive writing, as Vinh suggested that "It is hard to ask second-year students to write persuasively when starting to study the argument essay, which should be required at a higher level" (App.7, BE: 45-46). Vinh's point might be valid if students were not introduced to paragraph writing (including argument paragraphs) in their first year, as discussed in Section 4.1.2. Linh thus held the view that "the level of students, curriculum objectives and syllabus should be taken into account when setting an assessment criteria checklist"; otherwise different raters can focus on different subcategories within the same criterion, content in this case (App.7, BE: 52-53).
The counter argument, an indispensable part of a "good" argument essay (discussed in Section 4.3), is another issue that emerged. A bonus point for the inclusion of counter arguments was proposed by Vinh (App.7, BE: 85). Thanh and Ha, however, had different views, as they stated:
If the counter argument is not very good and well supported, will a bonus be given?
(App.7, BE: 86)
A counter argument can be seen in this essay but it repeated the first idea. I prefer two persuasive ideas without a counter argument to three ideas with a counter argument that does not contribute more to the content.
(App.7, BE: 87-89)
Khanh and Linh shared similar opinions with Thanh and Ha. They finally came to the conclusion that a counter argument is optional and the essay would not be marked down without it (App.7, BE: 91-92).
With regard to organization, different teachers assigned different marks. While Vinh and Linh gave 3 and 2.5 marks respectively, the rest (Thanh, Ha and Khanh) gave 2 marks. The different interpretations of organization resulted in this uneven allocation of marks. The first disagreement concerned informality, as expressed in the extracts below.
I am not quite sure where informal style like "never give up" or "don't" [in this essay] belongs. I marked down a lot for informality in organization in this essay.

(Thanh's opinion, App.7, BE: 107-108)
I do not think informality is in organization. It belongs to style in overall judgement and is not important.
(Linh’s opinion, App.7, BE: 109-110)
In my opinion, formality in writing is significant, so it cannot be in overall judgement.
(Thanh’s opinion, App.7, BE: 112-113)
I am confused because register is not in the criteria checklist. What I think is informality (like "don't" or "can't" in this essay) belongs to mechanics.

(Ha's opinion, App.7, BE: 117-118)
At last, after discussion, register (style) was added as a separate element in the criteria checklist. In deciding what mark to reserve for register and which categories in the criteria checklist to deduct marks from, various opinions were observed. Khanh thought "A mark can be deducted from either content or organization" (App.7, BE: 143) and Linh favoured a mark deducted from organization (App.7, BE: 144). Ha, nevertheless, preferred the mark to be deducted from content, as she stated that "More attention is given to structure, coherence and cohesion rather than content by teachers, as I think from my perspective as a former student" (App.7, BE: 149-150). Ha's opinion seemed to meet with strong disagreement from Khanh and Linh. They argued that:
Content is significant in an argument essay. A piece of writing is not considered to be good with good vocabulary and grammar but poor content.
(App.7, BE: 147-148)
Organization is paid more attention in teaching. We teachers often emphasize that a paragraph needs a topic sentence and supporting sentences. This does not mean organization is more important than content. Organization is easy for teachers to teach and easy for students to remember. It is more difficult to teach how to build up an argument. Teachers' focus on organization when giving feedback unconsciously sends a message to students saying that more attention is given to organization. Content, in fact, is significant.
Khanh paid attention to linkage within each paragraph. Their comments are as follows:
"Firstly", "secondly" and "thirdly" are appropriately used although they are repeated.

(Vinh's opinion, App.7, BE: 120-121)
A variety of coherence devices rather than discourse markers should be seen in excellent essays. Other linking devices can be used, like ideas expressed in