

BRIEF REPORTS

TESOL Quarterly invites readers to submit short reports and updates on their work. These summaries may address any areas of interest to TQ readers.

Edited by ALI SHEHADEH

United Arab Emirates University

JOHN LEVIS

Iowa State University

Diagnosing the Support Needs of Second Language Writers: Does the Time Allowance Matter?

CATHIE ELDER

University of Melbourne

Carleton, Victoria, Australia

UTE KNOCH

University of Melbourne

Carleton, Victoria, Australia

RONGHUI ZHANG

Shenzhen Polytechnic Institute

Shenzhen, China

This study investigates the impact of changing the time allowance for the writing component of a diagnostic English language assessment administered on a voluntary basis to first-year undergraduates at two universities with large populations of immigrant and international students following their admission to the university. The test is diagnostic in the sense of identifying areas where students may have difficulties and therefore benefit from targeted English language intervention concurrently with their academic studies. A change in the time allocation for the writing component of this assessment (from 55 to 30 minutes) was introduced in 2006 for practical reasons. It was believed by those responsible for implementing the assessment that a reduced time frame would minimize the problems associated with scheduling the test and accordingly encourage faculties to adopt the assessment tool as a means of identifying their students' language learning needs. The current study aims to explore how the shorter time allowance would influence the validity, reliability, and overall fairness of an EAP writing assessment as a diagnostic tool.

The impetus for the study arose from anecdotal reports from test raters to the effect that, under the new time limits, students were either planning inadequately in preparation for the task or else failing to meet the word requirements. The absence of planning time was perceived to have a negative impact on the quality of students' written discourse. Concerns were also expressed that the limited nature of the writing sample made it difficult to provide an accurate and reliable assessment of students' ability to cope with the writing demands of the academic situation.

As discussed in Weigle (2002), the time allowed for test administration raises issues of authenticity, validity, reliability, and practicality. Most academic writing tasks in the real world are not performed under time limits, and academic essays usually require reflection, planning, and multiple revisions. A writing task within a reduced time frame without access to dictionaries and source materials will inevitably be inauthentic in the sense that it fails to replicate the conditions under which academic writing is normally performed. Moreover, unless a test task is designed expressly to measure the speed at which test takers can answer the question posed, rigid time limits potentially threaten the validity of score inferences about test takers' writing ability. The limited amount of writing produced under time pressure may also make it difficult for raters to accurately assess the writer's competence. On the other hand, institutional constraints on the resources available for any assessment are inevitable. A timed essay test is certainly easier and more economical to administer, and it can be argued that even a limited sample of writing elicited under less than optimal conditions may be better than no assessment at all as a means of flagging potential writing needs. Achieving a balance between what is optimal in terms of validity, authenticity, and reliability and what is institutionally feasible is clearly important in any test situation.

Research investigating the time variable in writing assessment has produced somewhat contradictory findings, perhaps because of the different tasks, participants, contexts, and methodologies involved and also the differing time allocations investigated. Some studies suggest that allowing more time results in improved writing performance (Biola, 1982; Crone, Wright, & Baron, 1993; Livingston, 1987; Powers & Fowles, 1996; Younkin, 1986), whereas others find that changing the time allowance makes no difference to performance as far as rater reliability and/or rank ordering of students is concerned (Caudery, 1990; Hale, 1992; Kroll, 1990). Not all studies use independent ability measures (such as test scores from a different language test) or a counterbalanced design that controls for extraneous effects such as task difficulty and order of presentation (but see Powers & Fowles, 1996). Investigative methods also differ, with most studies looking only at mean score differences across tasks without considering the validity implications of any differences in the relative standing of learners when the time variable is manipulated (but see Hale, 1992). Moreover, most studies have focused on overall scores, based on holistic scoring or performance aggregates, rather than exploring whether the time condition has a variable impact on different dimensions of performance, such as fluency and accuracy (but see Caudery, 1990). It is particularly important to consider these different dimensions when one is dealing with assessment for diagnostic purposes, where the prime function of the test score is to provide feedback to teachers and learners about future learning needs. If changing the time allocation influences the nature of the information yielded about particular dimensions of writing ability, this result may have important validity implications as well as practical consequences.

THE STUDY

This study aims to establish whether altering the time conditions on an academic writing test has an effect on (a) the analytic and overall (average) scores raters assigned to students' writing performance and (b) the level of interrater reliability of the test. If scores differ according to time condition, this result would have implications for who is identified as needing language support, and if consistent rating is harder to achieve under one or another condition, then decisions made about individual candidates' ability cannot be relied on. Thirty students each completed two writing tasks aimed at diagnosing their language support needs. For one of these tasks they were given a maximum of 30 minutes of writing time, and for the other they were given 55 minutes. A fully counterbalanced design was chosen to control for task version and order effects.

RESEARCH QUESTIONS

The study investigated the following research questions:

1. Do students' scores on the various dimensions of writing ability differ between the long (55-minute) and short (30-minute) time conditions?

2. Are raters' judgments of these dimensions of writing ability equally reliable under each time condition?


METHOD

Context of the Research

The preliminary study reported in this article was conducted in the context of a diagnostic assessment administered in very similar forms at both the University of Melbourne and the University of Auckland. The assessment serves to identify the English language needs of undergraduate students following their admission to one or the other university and to guide them to the appropriate language support offered on campus. The Diagnostic English Language (Needs) Assessment, or DELA/DELNA (the name of the testing procedure differs at each university), is a general rather than discipline-specific measure of academic English. The writing subtest, which is the focus of this study, is described in more detail in the Instruments section. The data for the current study were collected at the University of Auckland and analysed at the University of Melbourne.

Participants

Test Takers

The participants in the study were 30 first-year undergraduate students at the University of Auckland, ranging in age from 20 to 39 years. The group comprised 19 females and 11 males. All participants were English as an additional language (EAL) students from a range of L1 backgrounds, broadly reflecting the diversity of the EAL student population at the University of Auckland. The majority (64%) were Chinese speakers, while other L1 backgrounds included French, Malay, Korean, German, and Hindi. The mean length of residence in New Zealand was 5.3 years.

Raters

Two experienced DELNA raters were recruited to rate the essays collected for the study. DELNA raters are regularly trained and monitored (see, e.g., Elder, Barkhuizen, Knoch, & von Randow, 2007; Elder, Knoch, Barkhuizen, & von Randow, 2005; Knoch, Read, & von Randow, 2007). Both raters had postgraduate qualifications in TESOL as well as rating experience in other assessment contexts (e.g., the International English Language Testing System).


Instruments

Tasks

To achieve a counterbalanced design, two prompts were chosen for the study. The topics of the essays were as follows:

Version A: Every citizen has a duty to do some sort of voluntary work.
Version B: Should intellectually gifted children be given special assistance in schools?

The task required students to write an argument essay of approximately 300 words in response to these questions. To help students formulate the content of the essays, students were provided with a number of brief supporting or opposing statements, although they were asked not to include the exact wording of these statements in their essays.

To ascertain that the two prompts used were of similar difficulty, overall ratings were compared across the 60 essays. An independent samples t test showed that the two prompts were statistically equivalent with respect to the challenge they presented to test takers, t(58) = 0.415, p = 0.680.
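This equivalence check can be illustrated in Python; a minimal sketch, in which the ratings are randomly generated placeholders rather than the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical overall ratings (band 1-6) for 30 essays per prompt.
prompt_a = rng.normal(loc=4.1, scale=0.7, size=30)
prompt_b = rng.normal(loc=4.1, scale=0.7, size=30)

# Independent samples t test across the 60 essays: a non-significant
# result (p > .05) is consistent with the prompts being equally difficult.
t, p = stats.ttest_ind(prompt_a, prompt_b)
df = len(prompt_a) + len(prompt_b) - 2
print(f"t({df}) = {t:.3f}, p = {p:.3f}")
```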

Rating Scale

The rating scale used was an analytic rating scale with three rating categories (fluency, content, and form) rated on six band levels ranging from 1 to 6, where a score of 4 or less indicates a need for English language support. Raters were asked to produce ratings for each of the three categories. These ratings were also averaged to produce an overall score.

Procedures

Data Collection

To obtain an independent measure of the students' language ability, the students first completed a screening test comprising a vocabulary and speed-reading task (Elder & von Randow, in press). Based on these scores, the students were divided into four groups of more or less equal ability. Then, to control for prompt and order effects, a fully counterbalanced design was used, as outlined in Table 1.

TABLE 1
Research Design

Group    N    Version    Time limit    Version    Time limit
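The rows of Table 1 are not recoverable from this copy, so the sketch below shows only what a fully counterbalanced assignment of version and order could look like; the group orderings are a hypothetical reconstruction, not the study's actual Table 1:

```python
# Hypothetical reconstruction of a fully counterbalanced design: every
# combination of (which version is taken under the short time limit) and
# (whether the short or long condition comes first) defines one group.
versions = ["A", "B"]

groups = []
for short_version in versions:
    long_version = "B" if short_version == "A" else "A"
    pairing = [(short_version, 30), (long_version, 55)]
    for order in (pairing, pairing[::-1]):  # short-first, then long-first
        groups.append(order)

for i, (first, second) in enumerate(groups, start=1):
    print(f"Group {i}: Version {first[0]} ({first[1]} min), "
          f"then Version {second[0]} ({second[1]} min)")
```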

The writing scripts were presented in random order to the raters, who were given no information about the condition under which the writing was produced, so as to eliminate the possibility of their taking the time allowance into account when assigning the scores. Raters have been found in other studies (e.g., McNamara & Lumley, 1997) to compensate candidates for task conditions which they feel may have disadvantaged them.

Data Analysis

The scores produced by the two raters were entered into SPSS (2006). t tests and correlational analyses were used to answer the two research questions.
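Although the authors worked in SPSS, the same two analyses can be sketched in Python with scipy; the score vectors below are hypothetical stand-ins for the 30 students' averaged ratings under each condition:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical averaged ratings for the same 30 students under the
# short (30-minute) and long (55-minute) conditions.
short = np.clip(rng.normal(4.07, 0.71, size=30), 1, 6)
long_ = np.clip(short + rng.normal(0.12, 0.40, size=30), 1, 6)

# Paired samples t test: same students rated under both conditions.
t, p = stats.ttest_rel(short, long_)
print(f"paired t(29) = {t:.3f}, p = {p:.3f}")

# Spearman rho: are candidates ranked similarly under the two conditions?
rho, p_rho = stats.spearmanr(short, long_)
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3f}")
```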

RESULTS

Research Question 1. Do students' scores on the various dimensions of writing ability differ between the long (55-minute) and short (30-minute) time conditions?

Two different types of analyses were used to explore variation in students' scores under the two time conditions. First, mean scores obtained under each condition were compared (see Table 2). The means for form and fluency were almost identical in each time condition, whereas for content, the long writing task elicited ratings almost half a band higher than those allocated to the short one. Although mean scores for each of the analytic criteria were consistently higher in the 55-minute condition, a paired samples t test (Table 2) showed that none of these mean differences was statistically significant. Second, a Spearman rho correlation was used to ascertain if the ranking of the candidates was different under the two time conditions. Table 3 presents the correlations for the fluency, content, and form scores under the two conditions as well as a correlation for the averaged, overall score.

TABLE 2
Paired Samples t Tests

Variable                 Mean-short  SD-short  Mean-long  SD-long     t     df     p
Average fluency rating      4.13       0.73       4.15      0.79    0.128   29   0.899
Average content rating      4.18       0.79       4.40      0.86    1.58    29   0.125
Average form rating         3.90       0.78       4.02      0.80    1.07    29   0.293
Average total rating        4.07       0.71       4.19      0.76    1.14    29   0.262

Note. SD = standard deviation.

Although the correlations in Table 3 are all significant, they vary somewhat in strength. The average scores for writing produced under the short and long time conditions correlate more strongly than do the analytic scores assigned to particular writing features. The correlations are lowest for the fluency criterion, although a Fisher r-to-z transformation indicates that the size of this coefficient does not differ significantly from the others.
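The Fisher r-to-z comparison can be sketched as follows. The coefficients are placeholders, since Table 3's values are not recoverable from this copy, and the standard formula treats the two correlations as independent, which is only an approximation here because both come from the same 30 students:

```python
import math
from scipy import stats

def compare_correlations(r1, r2, n1, n2):
    """Two-tailed test of whether two correlations differ, via the
    Fisher r-to-z transformation (assumes independent samples)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # SE of the difference
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))                  # two-tailed p value
    return z, p

# Hypothetical coefficients: the fluency cross-condition correlation
# versus the overall-score correlation, each based on n = 30.
z, p = compare_correlations(r1=0.55, r2=0.75, n1=30, n2=30)
print(f"z = {z:.2f}, p = {p:.3f}")
```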

Research Question 2. Are raters' judgments of writing ability equally reliable under each time allocation?

It was of further interest to determine if there were any differences in the reliability of rater judgments under the two time conditions. Table 4 presents the correlations between the two raters under the two time conditions. Although the correlation coefficients for the short and long conditions were not significantly different from one another, Table 4 shows that correlations were consistently higher for the short time condition.

TABLE 3
Correlations of Scores Under the Short and Long Conditions

Note. All results significant at the 0.01 level (2-tailed).

TABLE 4
Rater Correlations

Note. All results significant at the 0.01 level (2-tailed).
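Interrater reliability of the kind reported in Table 4 reduces to a correlation between the two raters' score vectors. A minimal sketch using a Pearson correlation on hypothetical ratings (the article does not specify which coefficient was used, and Table 4's values are not recoverable from this copy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical band scores (1-6) from two raters for the same 30 scripts.
rater1 = np.clip(rng.normal(4.1, 0.8, size=30), 1, 6).round()
rater2 = np.clip(rater1 + rng.normal(0.0, 0.5, size=30), 1, 6).round()

# In the study this would be computed separately for scripts written
# under the short and the long time conditions.
r, p = stats.pearsonr(rater1, rater2)
print(f"interrater r = {r:.2f}, p = {p:.3f}")
```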


DISCUSSION

The current study's purpose was to determine both the validity and practical implications of reducing the time allocation for the DELA/DELNA writing test from 55 to 30 minutes. Mean score comparisons showed that students performed very similarly across the two task conditions. Although this result accords with those of writing researchers such as Kroll (1990), Caudery (1990), and Powers and Fowles (1996), it is somewhat at odds with Biola (1982), Crone et al. (1993), and Younkin (1986), who showed that students performed significantly better when more time was given for their writing. However, as already suggested in our review of the literature, the differences between these studies' findings may be partly a function of sample size.

Worthy of note in our study is the greater discrepancy in means for content between the long and short writing conditions. The fact that test takers scored marginally higher on this category under the 55-minute condition is unsurprising, given that it affords more time for test takers to generate ideas on the given topic. In general, however, the practical impact of the score differences observable in this study is likely to be negligible. One might argue that shortening the task will produce slightly depressed means for the undergraduate population as a whole, with the result that a marginally higher proportion of students receive a recommendation of "needs support." However, this effect is hardly of a magnitude that would create significant strain on institutional resources and is, in any case, potentially of benefit in terms of ensuring that a larger number of borderline students are flagged, thereby gaining access to language support classes.

More important is the question of whether the writing construct changes when the time allocation decreases, because this has implications for the validity of inferences drawn from test scores. The cross-test correlational statistics are not strong for any of the rating criteria, and this is particularly true for fluency, implying that opportunities to display coherence and other aspects of writing fluency may differ under the two time conditions. These construct differences have potential implications for EAP support teachers who may use DELA/DELNA writing profiles to determine how to focus their interventions. It cannot, however, be assumed that the writing produced in the short time condition is a less valid indicator of candidates' academic writing ability than writing produced within the long time frame.

As for interrater reliability, the findings of this study revealed (as in the Hale, 1992, study) that scoring consistency was acceptable and comparable across the two time conditions. In fact, the data reported here suggest that alignment between raters increases slightly in the short writing condition on each of the writing criteria. Because this finding is not statistically significant, it is not appropriate to speculate further about possible reasons for this outcome, but the issue is certainly worth exploring further with a larger data set. In the meantime we can conclude that shortening the writing task presents no disadvantage as far as reliability of rating is concerned.

The issue investigated in this small-scale preliminary study certainly warrants further investigation, both with a larger sample and using methods not yet applied in research on the impact of timing on writing performance. Procedures such as think-aloud verbal reports and discourse analysis could be used to get a better sense of any construct differences resulting from the time variable than can be gleaned from a quantitative analysis. If writing produced under the 55-minute condition were found to show more of the known and researched characteristics of academic discourse than that produced within the 30-minute condition, this result would have important validity implications with regard to the diagnostic capacity of the procedure and its usefulness for students, teaching staff, and other stakeholders. A further issue, which is the subject of a subsequent investigation, is how test takers feel about doing the writing task under more stringent time conditions. Although we have shown that enforcing more stringent time conditions does not make a difference to test scores, it may be perceived as unfair, making it less likely that students will take their results seriously and act on the advice given. However, we would caution that any decision based on these results will, as is the case with any language testing endeavor, involve a trade-off between what is feasible and what is desirable in the context of concern.

ACKNOWLEDGMENTS

The authors thank Martin von Randow for assistance with aspects of the study design and Janet von Randow and Jeremy Dumble for their efforts in administering the test tasks and recruiting participants and raters for this study.

THE AUTHORS

Cathie Elder is director of the Language Testing Research Centre at the University of Melbourne, in Carleton, Victoria, Australia. Her major research efforts and output have been in the area of language assessment. She has a particular interest in issues of fairness and bias in language testing and in the challenges posed by the assessment of language proficiency for specific professional and academic purposes.

Ute Knoch is a research fellow at the Language Testing Research Centre, University of Melbourne, in Carleton, Victoria, Australia. Her research interests are in the areas of writing assessment, rating scale development, rater training, and assessing languages for specific purposes.

Ronghui Zhang is a lecturer in the Department of Foreign Languages at Shenzhen Polytechnic Institute, Shenzhen, China. Her research interests are in the areas of foreign language pedagogy and writing assessment.


REFERENCES

Biola, H. R. (1982). Time limits and topic assignments for essay tests. Research in the Teaching of English, 16, 97–98.

Caudery, T. (1990). The validity of timed essay tests in the assessment of writing skills. ELT Journal, 44, 122–131.

Crone, C., Wright, D., & Baron, P. (1993). Performance of examinees for whom English is their second language on the spring 1992 SAT II: Writing Test. Unpublished manuscript prepared for ETS, Princeton, NJ.

Elder, C., Barkhuizen, G., Knoch, U., & von Randow, J. (2007). Evaluating rater responses to an online rater training program. Language Testing, 24, 37–64.

Elder, C., Knoch, U., Barkhuizen, G., & von Randow, J. (2005). Individual feedback to enhance rater training: Does it work? Language Assessment Quarterly, 2, 175–196.

Elder, C., & von Randow, J. (in press). Exploring the utility of a Web-based English language screening tool. Language Assessment Quarterly.

Ellis, R. (Ed.). (2005). Planning and task performance in a second language. Oxford: Oxford University Press.

Hale, G. (1992). Effects of amount of time allocated on the Test of Written English (Research Report No. 92-27). Princeton, NJ: Educational Testing Service.

Knoch, U., Read, J., & von Randow, J. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12, 26–43.

Kroll, B. (1990). What does time buy? ESL student performance on home versus class compositions. In B. Kroll (Ed.), Second language writing: Research insights for the classroom. Cambridge: Cambridge University Press.

Livingston, S. A. (1987, April). The effects of time limits on the quality of student-written essays. Paper presented at the meeting of the American Educational Research Association, Washington, DC, United States.

McNamara, T., & Lumley, T. (1997). The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings. Language Testing, 14, 140–156.

Powers, D. E., & Fowles, M. E. (1996). Effects of applying different time limits to a proposed GRE writing test. Journal of Educational Measurement, 33, 433–452.

SPSS, Inc. (2006). SPSS (Version 15) [Computer software]. Chicago: Author.

Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.

Younkin, W. F. (1986). Speededness as a source of test bias for non-native English speakers on the College Level Academic Skills Test. Dissertation Abstracts International, 47/11-A, 4072.

