Effect of Repetition of Exposure and Proficiency
Level in L2 Listening Tests
HIDEKI SAKAI
Shinshu University
Nagano, Japan
Second language (L2) listening test developers must take into account a variety of factors, such as the characteristics of the input, the task, and the test takers (see, e.g., Brindley, 1998; Buck, 2001; Rost, 2002; Thompson, 1995). One such issue is the number of times a listening passage should be played. This issue concerns the characteristics of both the input and the task: the number of exposures to a listening passage is a matter of how the assessment task is administered, and at the same time it can increase the redundancy of the input in the passage. To better understand the role of repetition in listening tests, it is important to examine whether repetition and proficiency level exhibit any interactional effect, because if differential effects of repetition are observed for various L2 learners, only a portion of the test takers will benefit from the repeated exposure. This study addresses the issue of the interactional effect between repetition and proficiency levels.
Although this study focuses primarily on issues related to the testing of listening, repeated presentation of a listening passage is not limited to testing; it is frequently and widely used in listening instruction (e.g., Harmer, 1998, p. 100). Thus, this issue has so far been addressed mainly in studies that investigated the effects of repetition on L2 listening comprehension. In general, research conducted to date on the effect of repeated exposure has shown that repetition may facilitate L2 listening comprehension (e.g., Berne, 1995). However, previous studies that included listeners’ proficiency levels as an independent variable yielded mixed results about the interactional effect between repetition and proficiency levels (Cervantes & Gainer, 1992; Chang & Read, 2006; Iimura, 2007; Lund, 1991). On one hand, Lund, and Chang and Read, reported evidence supporting the interactional effect. Lund examined the effects of repetition and different course levels (i.e., proficiency levels) on listening and reading comprehension in German as an L2. He found a significant three-way (modality, trial, and course level) interactional effect in the lexical item analysis of the recall protocols. That is, the improvement of the first-semester and second-semester students in the listening recall task was about half that of the third-semester students, whereas there was no difference in improvement among the students at different proficiency levels in the reading recall task. Therefore, he argued, only third-semester students benefited from the repeated exposure in the listening task. Chang and Read examined the effects of four different types of listening support: preview of the questions, repetition of the input, provision of topic knowledge, and vocabulary instruction. They also investigated the interactional effects of these support types with proficiency levels, which were based on the results of the listening section of the Test of English for International Communication (TOEIC). Results showed that the effects of the four listening support types differed according to proficiency level. In the conditions of repetition of the input and preview of the questions, the high listening proficiency group outperformed the low listening proficiency group; in the other two conditions (provision of topic knowledge and vocabulary instruction), both groups scored similarly. For the high listening proficiency group, repetition of the input and the provision of background knowledge were more effective than vocabulary instruction; for the low listening proficiency group, the provision of topic knowledge was more effective than vocabulary instruction and preview of the questions. Based on these results, Chang and Read suggested (a) that the low listening proficiency group benefited less than the high listening proficiency group from preview of the questions and repetition of the input, (b) that both groups benefited from the provision of topic knowledge, and (c) that vocabulary instruction was the least effective for both groups.
On the other hand, Cervantes and Gainer (1992) and Iimura (2007) reported the lack of an interactional effect between repetition and proficiency levels. Cervantes and Gainer examined the effect of input modification (including repetition) on Japanese university students’ listening comprehension of a lecture in English. Results of the study showed that both simplification and repetition were more facilitative of comprehension than no modification and that no interactional effect between input conditions and proficiency levels was observed. Thus, they argued that repetition augments listening comprehension for both higher and lower listening proficiency learners. Iimura examined the effect of repeated exposure, question types, and proficiency levels on the listening comprehension of Japanese senior high school students. The participants were divided into three listening proficiency groups based on the results of the listening section of the third-grade level of the Society for Testing English Proficiency (STEP) test. He found that repetition improved performance on both question types (local and global questions) irrespective of proficiency levels.
In summary, these four previous studies have produced mixed results regarding the interactional effect between repetition and proficiency level. It must be noted that these studies used different tasks to assess listening comprehension: a free written recall task (Lund, 1991), a partial dictation task (Cervantes & Gainer, 1992), a multiple-choice test (Chang & Read, 2006), and an open-ended question task (Iimura, 2007). The choice of task seems to be quite important because the effect of repetition may easily be confounded with the effect of the preview of questions. For example, in a research design using multiple-choice tests or open-ended questions, participants in the repetition condition hear the passage, read the questions, and answer them; and then the procedure is repeated. This inevitably forms what Sherman (1997) called the sandwich version of administering questions. Sherman found that the sandwich version was more effective than the questions–listening–listening condition or the listening–listening–questions condition. Thus, even if L2 test takers were exposed to the listening passage the same number of times, varying the timing of administering questions may lead to different degrees of listening comprehension.
To avoid the confounding effects of preview of questions, the current study used free written recall tasks in which, after listening to the passage(s), test takers were required to write what they understood (Thompson, 1995, p. 28). Because no intervening elements exist between the test taker and the text in free written recall tasks (Alderson, 2000, p. 230), it is possible to separate the effect of repetition from the effect of previewing questions. Free written recall tasks have another advantage. As Alderson put it, “it [the free written recall task] is also claimed to provide a picture of learner processes” (p. 230). Comparing written protocols with the original text will enable researchers to analyze idiosyncratic recall protocols (i.e., additive information that does not appear in the original text) and misinterpretations (i.e., incorrect recall protocols) and to obtain useful and detailed information about how learners listen in the L2.
RESEARCH QUESTIONS
The following three research questions (RQs) were posited for this study:
1 Does repetition affect learners’ listening recall performance?
2 Is the effect of repetition on listening comprehension the same for learners at different proficiency levels?
3 How does repetition affect learners’ production of idiosyncratic recalls and misinterpretations?
As mentioned earlier, previous studies have supported the effect of repetition on L2 listening comprehension; nevertheless, only one study (Lund, 1991) used free written recall tasks. RQ 1 was posed in order to accumulate empirical evidence regarding the effect of repeated exposure. The current study used quantitative and qualitative analyses to investigate the effect of repeated exposure for different listening proficiency groups. RQs 2 and 3 were posited for the respective analyses.
METHOD
Participants
The participants in this study were 36 learners of English (6 males and 30 females) from the author’s intact class at the Faculty of Education of a university in central Japan. All the participants had received formal instruction in English at junior and senior high schools for 6 years before entering the university. Of the 36 participants, 16 were second-year students, 16 were third-year students, and 4 were fourth-year students. Two of the participants reported that they had studied abroad in English-speaking countries for about 1 year.
In order to divide the participants into two listening proficiency groups, the listening sections (k = 60) of three forms (A, B, and C) of the Michigan English Placement Test (Corrigan, Dobson, Kellman, Spaan, & Tyma, 1993) were administered to them 1 month before the experimental task. The reliability coefficient (Cronbach’s alpha) was 0.75. The mean score of 34 was used as the cutoff point. Those who scored 35 or above were assigned to the higher listening proficiency group (HG); those who scored 34 or below were assigned to the lower listening proficiency group (LG). Thus, the HG consisted of 16 participants (M = 39.94; SD = 4.06), and the LG had 20 participants (M = 29.90; SD = 5.01). The difference between the two groups was statistically significant: t(34) = 6.639, p < 0.001.
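As a rough check on the group comparison, a t statistic can be recomputed from the rounded group means, standard deviations, and group sizes given above. The sketch below is not the analysis reported in the article: the source does not state which variant of the t test was run, the function names are illustrative, and because the inputs are rounded the output need not match the reported value exactly.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t (equal variances not assumed); returns the statistic only."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (m1 - m2) / se

# Summary statistics as given in the text (mean, SD, n), rounded to two decimals.
HG = (39.94, 4.06, 16)
LG = (29.90, 5.01, 20)

if __name__ == "__main__":
    t, df = pooled_t(*HG, *LG)
    print(f"pooled t({df}) = {t:.2f}")
    print(f"Welch t = {welch_t(*HG, *LG):.2f}")
```

Under either variant the statistic is well above 6 on these inputs, which is in line with the large difference between the two groups described in the text.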
Procedures
The experiment was carried out in a classroom where listening materials were played on a CD player. The participants were given a blank sheet of paper. They listened to the first passage and were told to write down in Japanese everything they understood as extensively and accurately as possible on one side of the sheet after listening. While listening to the passage, they were not allowed to take notes, but were asked to concentrate on listening. The allotted writing time was 3 minutes. Then a second passage was played. After listening to the second passage, the participants were told to write under the recall protocols of the first passage on the same side of the paper. The allotted writing time was 3 minutes. For the second trial, the participants were asked to turn over the paper so that they could not access what they had written for the first listening trial. The same procedure was repeated. The total time was about 15 minutes. The instructions were provided in Japanese.
Materials
The passages were taken from past examinations of the presecond grade of the STEP tests, and the accompanying CD was used (Obunsha, 2004). The STEP tests are widely known in Japan as tests of English proficiency. They come in seven levels: first grade (the highest level), prefirst grade, second grade, presecond grade, third grade, fourth grade, and fifth grade (the lowest level). Each grade has its own test aimed at a different proficiency level. The presecond grade test is targeted at the senior high school level (for more information about the STEP, see Society for Testing English Proficiency, n.d.). This grade’s test was chosen because it was considered not too difficult for participants performing free written recall tasks, which are cognitively demanding tasks in which participants need to listen to the passages, understand the information, and write down the information that they comprehend.
Both passages were monologue narratives. Monologue narratives were chosen for this study because they constitute one of the common text types used on the STEP tests. Because the STEP tests are used widely in Japan, a large number of test takers encounter similar passages during each test administration. The first passage, read by a male speaker, contained 47 words in four sentences, whereas the second, read by a female speaker, contained 48 words in four sentences. The recording times were 27 seconds and 29 seconds, respectively. Thus, the reading speeds were 104.4 words per minute and 99.3 words per minute, respectively.
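The reading speeds follow directly from the word counts and recording times. A minimal sketch of that arithmetic is given here; the function and variable names are illustrative only.

```python
def words_per_minute(word_count, seconds):
    """Speech rate in words per minute."""
    return word_count / seconds * 60

# (word count, recording time in seconds) as stated in the text.
passages = {"first": (47, 27), "second": (48, 29)}

rates = {name: round(words_per_minute(words, secs), 1)
         for name, (words, secs) in passages.items()}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name} passage: {rate} words per minute")
```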
To check the difficulty level of the listening passages, the participants were asked to underline the unknown words in the scripts of the two passages 2 months after the experiment. This interval occurred because the summer vacation fell between the semesters. Responses from 34 participants (19 from LG and 15 from HG) were examined. Six (3 LG learners and 3 HG learners) reported that they did not know the word secretary in the first passage; 2 (1 LG learner and 1 HG learner) reported that they did not know the word poetry in the second passage. Because the numbers of participants reporting unknown words were similar in the two groups and because only two words were reported as unknown, the passages used for this study appear to have been easy for both groups.
Scoring
The recall protocols written in the participants’ first language (L1) were analyzed by idea unit analysis. The passages were divided into idea units in advance, mainly on the basis of Carrell’s (1985) definition of idea units:
Basically, each idea unit consisted of a single clause (main or subordinate, including adverbial and relative clauses). Each infinitival construction, gerundive, nominalized verb phrase, and conjunct was also identified as a separate idea unit. In addition, optional and/or heavy prepositional phrases were also designated as separate idea units. (p. 737)
In addition, to make idea units shorter for the analysis of recall protocols on the listening tests, adverbials and nonheavy prepositional phrases functioning as adverbials were counted as separate idea units. Based on these criteria, the two passages for this study were divided into 16 and 12 idea units, respectively (see appendix). Exact recall of each idea unit was then assigned one point. Thus, the highest possible score was 28.
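The scoring rule amounts to counting the idea units that were recalled exactly. A minimal sketch follows, assuming (hypothetically) that idea units are labelled 101 to 116 for the first passage and 201 to 212 for the second; this labelling is only inferred from the unit numbers quoted elsewhere in the article, and the sample protocol is invented for illustration.

```python
# Hypothetical labelling: passage number * 100 + position within the passage.
PASSAGE_UNITS = {1: 16, 2: 12}  # idea units per passage, as stated in the text
ALL_UNITS = {p * 100 + i
             for p, n in PASSAGE_UNITS.items()
             for i in range(1, n + 1)}
MAX_SCORE = len(ALL_UNITS)  # 16 + 12 = 28

def score_protocol(recalled_units):
    """One point per exactly recalled idea unit; unknown labels are ignored."""
    return len(set(recalled_units) & ALL_UNITS)

# Invented example: units that a rater judged to be exactly recalled.
example = [101, 102, 105, 201, 203, 999]

if __name__ == "__main__":
    print(f"score = {score_protocol(example)} out of {MAX_SCORE}")
```

Representing a protocol as a set of unit labels keeps the one-point-per-unit rule explicit: a unit recalled twice still earns a single point.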
All the recall protocols were scored by the author. To check the intrarater reliability, they were scored again after a 1-month interval. The agreement rate was 98.02% (988 out of 1008 idea units). To assess interrater reliability for the scoring, about 20% of the protocols (14 out of the 72 protocols) were scored by another rater. The interrater agreement rate was 99.49% (390 out of 392 idea units). The test reliability (Cronbach’s alpha) for the first trial was 0.69; the alpha for the second trial was 0.76.
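The two agreement rates are simple proportions of idea units on which the two scorings agreed. The sketch below recomputes them from the counts given in the text; the function name is illustrative.

```python
def agreement_rate(agreed, total):
    """Percentage of idea units scored identically, rounded to two decimals."""
    return round(100 * agreed / total, 2)

# Counts as stated in the text.
intrarater = agreement_rate(988, 1008)
interrater = agreement_rate(390, 392)

# 14 protocols, each scored on 28 idea units, give the interrater total.
interrater_units = 14 * 28

if __name__ == "__main__":
    print(f"intrarater agreement: {intrarater}%")
    print(f"interrater agreement: {interrater}% of {interrater_units} idea units")
```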
RESULTS
RQs 1 and 2: Main Effect of Repetition and Interactional
Effect of Repetition and Proficiency Levels
Table 1 shows descriptive statistics of recall performance on the first listening and the second listening for each group. The results indicate that repetition facilitated listening comprehension for both groups. First, for both groups, the second effort was better than the first effort. Second, both groups improved to a similar degree: HG improved by 5.88 points, and LG improved by 5.00 points. It is important to note that HG outperformed LG on the first listening. This result may support the use of the Michigan English Placement Test for the division of the participants.
TABLE 1 Descriptive Statistics of Recall Performance by Group and Time
Groups: HG (n = 16); LG (n = 20)
Note. Dif = 2nd listening − 1st listening; M = mean; SD = standard deviation; SES = standard error of skewness; SEK = standard error of kurtosis.
A two-way ANOVA was performed with time (first listening and second listening) as a within-subjects factor and proficiency level (high and low) as a between-subjects factor. The main effects of time and proficiency level were significant: F(1, 34) = 82.40, p < 0.001, and F(1, 34) = 6.45, p = 0.016, respectively. Effect sizes measured as Pearson’s correlation coefficients (Field, 2005, pp. 514–516) were also calculated. According to Field (p. 33), a coefficient of 0.50 or above is considered to show a large effect; a coefficient of 0.30–0.50, a medium effect; and a coefficient of 0.10–0.30, a small effect. The effect size of time was large, r = 0.84, explaining 70.6% of the total variance; the effect size of proficiency level was medium, r = 0.40, explaining 16.0% of the total variance. Thus, the results showed that HG outperformed LG and that the second effort was better than the first. On the other hand, the interactional effect of time and proficiency level was not significant: F(1, 34) = 0.53, p = 0.470, r = 0.12. Thus, repetition facilitated the performance of both HG and LG to a similar degree.
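The reported effect sizes can be reproduced from the F ratios. The sketch below assumes the conversion commonly given for F ratios with one numerator degree of freedom, r = sqrt(F / (F + df_error)); the article cites Field (2005) for its effect sizes but does not print the formula, so this conversion and the function names are assumptions. The size labels use the thresholds stated in the text.

```python
import math

def r_from_f(f_value, df_error):
    """Effect size r for an F ratio with 1 numerator degree of freedom."""
    return math.sqrt(f_value / (f_value + df_error))

def label(r):
    """Size label using the thresholds quoted in the text."""
    if r >= 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "negligible"

# F ratios as reported, all with 34 error degrees of freedom.
effects = {"time": 82.40, "proficiency level": 6.45, "interaction": 0.53}

if __name__ == "__main__":
    for name, f_value in effects.items():
        r = r_from_f(f_value, 34)
        r2 = round(r, 2) ** 2  # variance explained, from the rounded r
        print(f"{name}: r = {r:.2f} ({label(r)}), r squared = {100 * r2:.1f}%")
```

Squaring the rounded r values gives the proportions of variance quoted in the text (0.84 squared is about 0.706 and 0.40 squared is 0.16).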
RQ 3: Idiosyncratic Recall Units, Misinterpretation, and
Repetition
For this study, idiosyncratic recall units were operationalized as recall protocols of information which the original passages did not contain. In other words, idiosyncratic recall units are additional information not found in the passages. The following are examples of idiosyncratic recall units from the data of this study:
1 She usually has lunch at a noodle shop every day
2 She wanted to become a singer in the future
3 Nancy wants to become an English teacher
The italicized parts were not contained in the original passages (see appendix); therefore, they were coded as idiosyncratic recall units. The results show that although more participants in LG than in HG produced idiosyncratic recall units on the first listening (12 out of 20, 60.0%, for LG versus 6 out of 16, 37.50%, for HG), repeated exposure brought about improvement for 66.67% of those who had produced idiosyncratic recall units in both groups (8 out of 12 for LG; 4 out of 6 for HG). In addition, the number of those who produced new idiosyncratic recall units was larger in LG (6 out of 20) than in HG (1 out of 16). Thus, the results suggest that less proficient learners produce more idiosyncratic recall units than more proficient learners at both listening times, that repeated exposure helps decrease idiosyncratic recall units, and that the effectiveness of repeated exposure is the same for both groups.
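The proportions in this paragraph can be tabulated from the raw counts. The following sketch does only that; the dictionary and function names are illustrative, and the counts are those given in the text.

```python
def pct(count, total):
    """Percentage rounded to two decimals."""
    return round(100 * count / total, 2)

# (participants concerned, size of the group or subgroup), as stated in the text.
first_listening = {"LG": (12, 20), "HG": (6, 16)}  # produced idiosyncratic units
improved = {"LG": (8, 12), "HG": (4, 6)}           # of those, improved on repetition
new_units = {"LG": (6, 20), "HG": (1, 16)}         # produced new idiosyncratic units

if __name__ == "__main__":
    tables = [("first listening", first_listening),
              ("improved", improved),
              ("new units", new_units)]
    for name, table in tables:
        for group, (count, total) in table.items():
            print(f"{name:16s} {group}: {count}/{total} = {pct(count, total)}%")
```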
Analysis of misinterpretations, operationalized as incorrect recall protocols of the original texts (e.g., “to become a swimmer” for Idea Unit 203, to become a singer), shows that repetition is beneficial for both groups as well. To clarify this point, misinterpretations of Idea Unit 112 (Before she went back) are shown here as an example. In LG, only 1 participant recalled this idea unit correctly on the first listening. Thus, the other 19 participants did not get points for this idea unit. Of the 19 participants, 12 did not write any protocols for this idea unit; 7 provided incomplete protocols and were not assigned a point. On the second listening, of the 12 participants who did not write any protocols on the first listening, 3 did not produce any protocols; 1 wrote correct protocols; and 8 gave almost correct protocols but misinterpreted the conjunction before in this unit (Before she went back) as “after she went back” or “on the way back.” The 7 participants who provided incomplete protocols on the first listening did not improve on the second listening. In HG, the number of participants who produced no protocols on the first listening was smaller (n = 6) than in LG. Of the 6 participants, 2 did not recall anything on the second listening; 1 produced correct protocols; and 3 gave almost correct protocols but failed to recall the word before. Of those who wrote almost correct protocols, that is, misinterpretations, on the first listening (n = 6), 3 produced correct protocols on the second listening. Thus, this example shows that repetition helped both groups understand the text further even when the protocols were not assigned a point.
DISCUSSION
Guided by three research questions, this study provided the following findings. First, repetition was shown to have facilitated listening comprehension. Moreover, repetition reduced the production of idiosyncratic recall protocols. In other words, repetition led to more precise comprehension of the passages. Second, this study did not find any interactional effect between repetition and proficiency levels. That is to say, repetition was effective for HG and LG to the same degree. Third, repetition helped both groups reduce production of idiosyncratic recall protocols, although LG produced more idiosyncratic recall protocols than HG. Therefore, this study found that repeated exposure facilitated listening comprehension for both HG and LG and did not support the argument that the effect of repetition varies according to proficiency level. Analysis of misinterpretations also supported these findings. It is important to note that the findings of this study should be limited to the case in which learners have sufficient L2 ability to understand the lexical items of the listening passages.
These findings lend support to the results of Cervantes and Gainer (1992) and Iimura (2007) and give evidence against the studies of Chang and Read (2006) and Lund (1991), who argued for the interactional effect between repetition and proficiency levels. Here it is worthwhile to examine the results of Chang and Read, in particular in terms of the effect between repetition and proficiency levels; although Chang and Read argued for a differential effect of repetition for different proficiency levels, another interpretation seems possible. They argued that “LLP [low listening proficiency] learners benefited less than HLP [high listening proficiency] learners from PQ [preview of the questions] and RI [repetition of the input],” mainly based on the finding that the high listening proficiency group scored better than the low listening proficiency group in these two conditions (p. 389). Because all four conditions in Chang and Read’s study included previewing the questions, they stated that “in effect the PQ [preview of the questions] group was a comparison group to provide a basis for evaluating the enhanced listening support experienced by the other three groups” (p. 385). However, they did not make such comparisons in their discussion of the results. It seems natural that the high listening proficiency group outperformed the low listening proficiency group because the two groups differed in their proficiency levels. For the condition of the preview of the questions, the mean scores of the two proficiency groups were 18.39 and 14.91, respectively. If these mean scores are treated as a baseline, both proficiency groups in the repetition condition obtained higher means (20.47 and 16.44, respectively), although the difference between the two conditions was not significant. Thus, another possible interpretation of Chang and Read’s results is that repetition may have improved the performance of both proficiency groups, but the changes were not statistically significant. In other words, their results may not provide evidence that repetition affects different proficiency levels differently.
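The baseline comparison suggested here can be laid out as simple differences between condition means. The sketch below uses the four means quoted above from Chang and Read (2006); it computes raw differences only and says nothing about statistical significance, and the names are illustrative.

```python
# Mean scores quoted in the text from Chang and Read (2006).
# PQ = preview of the questions (baseline); RI = repetition of the input.
means = {
    "HLP": {"PQ": 18.39, "RI": 20.47},
    "LLP": {"PQ": 14.91, "RI": 16.44},
}

def gain_over_baseline(group):
    """Raw difference between the RI mean and the PQ baseline mean."""
    return round(means[group]["RI"] - means[group]["PQ"], 2)

if __name__ == "__main__":
    for group in means:
        print(f"{group}: RI minus PQ = {gain_over_baseline(group)} points")
```

On these figures both groups score higher under repetition than under the preview-only baseline (about 2.1 and 1.5 points), which is the pattern the alternative interpretation relies on.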
Therefore, aside from Chang and Read (2006), a statistically significant interactional effect between repetition and proficiency level was reported only by Lund (1991). It should be noted that he found a statistically significant interactional effect in only one of the two analyses of the recall protocols, that is, in the lexical item analysis but not in the idea unit analysis. Although the current study used free written recall tasks like Lund’s study, it did not carry out lexical item analysis. It is possible that detailed scoring systems may be necessary to detect the differential effect of repetition. If this is correct, it is plausible that some studies did not find the differential effect of repetition: The studies (Cervantes & Gainer, 1992; Chang & Read, 2006; Iimura, 2007) used a partial dictation task, a multiple-choice test, and an open-ended question task, respectively, which require test takers to attend to only part of the passages rather than to understand everything in them. Thus, the mixed results of the previous studies may be due to different analysis methods.