Estimating Guessing Effects on the Vocabulary Levels Test for Differing Degrees of Word Knowledge


BRIEF REPORTS AND SUMMARIES

TESOL Quarterly invites readers to submit short reports and updates on their work. These summaries may address any areas of interest to Quarterly readers.

Edited by ALI SHEHADEH

United Arab Emirates University

ANNE BURNS

Macquarie University

Estimating Guessing Effects on the Vocabulary Levels Test for Differing Degrees of Word Knowledge

JEFFREY STEWART

Kyushu Sangyo University

Fukuoka, Japan

DAVID A. WHITE

Harvard University

Cambridge, Massachusetts, United States

doi: 10.5054/tq.2011.254523

The Vocabulary Levels Test (Nation, 1990) has been referred to by Meara as "the nearest thing we have to a standard test in vocabulary" (Meara, 1996, p. 38). Multiple-choice tests such as the Vocabulary Levels Test (VLT) are often viewed as a preferable estimator of vocabulary knowledge when compared to yes/no checklists, because self-reporting tests introduce the possibility of students overreporting or underreporting scores. However, multiple-choice tests have their own unique disadvantages. Simply put, if a multiple-choice test lists possible answers, there is a possibility that test takers will guess the correct answer regardless of their knowledge or ability. It has long been acknowledged that guessing on multiple-choice tests affects test reliability and inflates scores (Zimmerman & Williams, 1965; Baldauf, 1982), and scoring formulas such as cfg (Huibregtse, Admiraal, & Meara, 2002) have been proposed to adjust scores for guessing, under the assumption that the probability of a correct guess is consistent among test takers. In item response theory, a family of approaches that link person ability to item difficulty with probabilistic models (Brown & Hudson, 2002), the three-parameter logistic model (Birnbaum, 1968) has been developed to account for the effect of guessing on estimations of ability.

However, estimating the effects of guessing on tests such as the VLT is complicated by the fact that distractors are chosen from the same frequency level of words as the correct answer, and therefore from the tested domain. This introduces the possibility that increases in scores due to guessing could vary depending on the proportion of words in the tested domain known by the test taker. Determining the relationship between proportions of words known and score increases due to guessing is the goal of this study.

BACKGROUND

This study arises from previous research by one of the authors (Stubbe, Stewart, & Pritchard, 2010) on the validity of yes/no vocabulary tests developed by Meara and Buxton (1987), which ask test takers to self-report which words they know on checklists. There are concerns that students may overestimate vocabulary on such checklists; Chall and Dale (1950, cited in Anderson & Freebody, 1981) reported that test takers tended to overestimate vocabulary at a rate of approximately 11%, and Janssens (1999, cited in Beeckmans, Eyckmans, Janssens, Dufranne, & Van de Velde, 2001) found that a majority (69%) of the learners studied overestimated their vocabulary knowledge on yes/no tests. In the recent study by Stubbe, Stewart, and Pritchard (2010), scores on the yes/no tests were compared to subsequent scores on a bilingual vocabulary test of the same words using the VLT format, for the purpose of determining the potential effects of vocabulary overestimation by learners using the tests. Interestingly, scores on the VLT-style test were substantially higher, with a mean of 70.9%, compared to 50.7% on the yes/no tests (N = 97). It was concluded from the results that, in contrast to some similar studies (e.g., Mochida & Harrington, 2006), the participants in this experiment (lower level Japanese university students) had a tendency to underestimate their vocabulary sizes on checklists. However, it remained unclear to what degree the difference in scores could be accounted for by guessing effects made possible by the VLT's multiple-choice format, leading to questions regarding the degree to which the VLT could inflate reports of words known by test takers, and how these figures could vary depending on the proportion of tested words known by learners. Were we to assume students' self-estimates were accurate and that they knew 50% of the tested words, what increase in scores could we expect due to guessing? With this parameter known, the extent to which scores differed due to genuine underestimation on the yes/no form could be determined.


Format of the Vocabulary Levels Test

The VLT employs a format with three questions sharing six words as possible choices (see Figure 1).

In addition to the original monolingual version, which uses English definitions, there are numerous bilingual versions (see Figure 2). The dependencies created on the VLT by clustering items in groups of three have been noted, though previous research has determined that the dependency does not have an overly adverse effect on test reliability (Beglar & Hunt, 1999). However, it should be noted that the principal analysis conducted by Beglar and Hunt, the Rasch model (Rasch, 1960), assumes minimal guessing. Wright (1995) argues that mean square fit statistics, designed to detect inconsistencies between predicted and observed responses by test takers on items, can compensate for the lack of a c_i (guessing) parameter in Rasch measurement. However, underlying this view is the assumption that guesses are entirely random in nature and that therefore the test takers' level of knowledge does not affect the component of the score due to guessing. While the fit statistics are useful for examining unpredicted or erratic response patterns in test takers (for example, a test taker of low ability correctly answering a difficult item), they do not account for systematic guessing effects that are consistent throughout the test and predictable at the test takers' level of ability (Martin, del Pino, & De Boeck, 2006). Consequently, the extent to which the VLT question format is subject to consistent guessing effects has not been thoroughly explored.

FIGURE 1 Example of the VLT format, original monolingual version.

FIGURE 2 Example of the VLT format, Japanese version.


How does a learner's vocabulary size (which the VLT attempts to measure) affect the accuracy rate of the learner's guessing, and the overall score increase from guessing? Assume first that the test taker knows none of the words to be matched and none of the distractors in a given set of three words to be matched. They must then guess three times, each time with a 1/6 probability of choosing the correct answer. Now suppose that the learner still knows none of the words to be matched, but does know two of the distractors within a given set. They still must guess three times, but due to the process of elimination, they now have a 1/4 chance of getting each guess right. Clearly, then, increased levels of vocabulary knowledge can lead to increased efficacy of guessing. At the same time, as knowledge levels increase, learners are more likely to know the correct answers and thus less likely to need to guess in the first place. The relationship between levels of knowledge and guessing efficacy is thus somewhat complicated.
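The elimination effect described above reduces to a one-line calculation. The helper below is an illustrative sketch (not code from the study): with i target words and j distractors known in a cluster, 6 − i − j options remain, so each blind guess succeeds with probability 1/(6 − i − j).

```python
def per_guess_probability(known_targets: int, known_distractors: int) -> float:
    """Probability that one blind guess in a 3-question, 6-option VLT
    cluster is correct, after eliminating the options the learner knows."""
    remaining = 6 - known_targets - known_distractors
    return 1.0 / remaining if remaining > 0 else 0.0

# The two scenarios described above:
assert per_guess_probability(0, 0) == 1 / 6  # nothing known
assert per_guess_probability(0, 2) == 1 / 4  # two distractors known
```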

FINDING THE EXPECTED SCORE GIVEN A LEVEL OF WORD KNOWLEDGE

For this study, the precise relationship between the proportion of words a student knows and their expected score was determined using elementary probability theory. The assumption is that students will be able to choose the correct L1 translation or English definition given its corresponding English word if and only if they know that English word given the definition or translation. Applying Bayes' rule and the properties of a binomial distribution, which gives the frequency distribution of the probability of a number of successful outcomes for a number of independent trials, the following formula was derived:

\[
S \sum_{i=0}^{3} \sum_{j=0}^{3} \left( i + \frac{3-i}{6-i-j} \right) \binom{3}{i} \binom{3}{j}\, p^{i+j} (1-p)^{6-i-j}
\]
where S is the number of sets and p is the proportion of Japanese (L1)–English word pairs that the student actually knows. For the sake of conciseness, the formula contains an indeterminate form (the term with i = 3 and j = 3, where the denominator 6 − i − j is zero); this should be evaluated to zero. Algebraic simplification and a passage from the absolute score to the percentage yield the following, more elegant formula:

\[
E = p + \frac{1}{6} - \frac{p^6}{6}
\]


where E is the expected score as a percentage. Note that p refers not to the proportion of words known in a given test but rather to the proportion of words known in the total set of testable words. Furthermore, it should be noted that prior research has shown that scores of known words can be somewhat lower when the monolingual format of the VLT is used, particularly when higher frequency words are tested (Stewart, 2009). This is due to a change in the construct of what constitutes a known word between test formats. For the purposes of the formula, this means that different formats of the test may correspond to differing values of p. However, the formula itself may be applied to either format.

Expected scores, adjusted for guessing, for given percentages of words known are listed in Figure 3.
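The derivation can be checked numerically. The sketch below (illustrative code, not the authors' program) evaluates the double-sum formula directly, treating the indeterminate i = j = 3 term as zero, and confirms that it agrees with the simplified formula across the range of p:

```python
from math import comb

def expected_score(p: float, sets: int = 1) -> float:
    """Expected absolute score from the double-sum formula.
    i = known target words, j = known distractors in one 3-question cluster."""
    total = 0.0
    for i in range(4):
        for j in range(4):
            remaining = 6 - i - j
            # the i = j = 3 term is an indeterminate 0/0 form: evaluate to zero
            guesses = (3 - i) / remaining if remaining > 0 else 0.0
            total += ((i + guesses) * comb(3, i) * comb(3, j)
                      * p ** (i + j) * (1 - p) ** remaining)
    return sets * total

def expected_percent(p: float) -> float:
    """Simplified closed form: expected score as a proportion."""
    return p + 1 / 6 - p ** 6 / 6

# The two expressions agree for any knowledge level p
# (per-cluster score divided by 3 questions gives the proportion):
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(expected_score(p) / 3 - expected_percent(p)) < 1e-12
```

At p = 0 both evaluate to 1/6, matching the 16.7% chance-level score discussed below.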

When zero words are presumed known on a 99-item test, the expected increase in points due to guessing is 16.7. Interestingly, this 16.7-point discrepancy remains nearly constant as knowledge levels increase, despite the fact that as more words are known, fewer words remain to be guessed. For example, if a student knows zero words on a 100-item test, the predicted 16.7% correctness rate for guessing would on average lead to a 16.7-point increase from guessing. However, if a student knows 50 of the words on the test, there are only 50 remaining words for which guessing is possible. Therefore, a 16.7% correctness rate for guessing the remaining words would lead to only an 8.3-point increase overall.

FIGURE 3 Expected VLT scores given percentage of words known.

This effect is also evident in the simplified formula for the expected score. We observe from the formula that for any given level of knowledge p, the expected observed score can be simplified to the sum of the knowledge level p and the term 1/6 − p^6/6, which expresses the expected contribution to the score due to guessing. For low values of p (below 0.6), p^6/6 is below 0.01, so the contribution due to guessing stays steady at approximately 1/6, or roughly 16.7%. However, the expected contribution due to guessing falls for higher p, and indeed is zero when p = 1. Qualitatively speaking, as knowledge levels rise, fewer words remain to be guessed, but those that do remain can be guessed more accurately due to the process of elimination. For low levels of knowledge these two tendencies cancel out, but as knowledge levels become high, the fact that very few words remain to be guessed strictly limits the expected contribution due to guessing.

We note that the derivative of the expected score with respect to p is 1 − p^5. This evaluates to 1 when p = 0 and then strictly decreases to zero at p = 1, confirming that the expected score is monotonically increasing with respect to knowledge level and that, as knowledge levels become higher, additional increments have less of an effect on the observed score.
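This follows directly from differentiating the simplified expected-score formula term by term:

\[
\frac{dE}{dp} = \frac{d}{dp}\left( p + \frac{1}{6} - \frac{p^6}{6} \right) = 1 - p^5
\]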

While the above formula calculates the expected observed score based on the true percentage of words known, a program was written to find the most likely true percentage of words known given an observed score and the number of questions on the test, using maximum likelihood estimation. Results from the program for observed scores in 5-point intervals are listed in Table 1, assuming a test with 99 items.

Note that the proportion of words known that will produce a given expected score and the most likely proportion of words known if that score is observed are usually very close, especially for percentages in the middle of the range. These numbers are by no means always identical, but they become closer as the number of questions on the test increases.
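The authors' maximum likelihood program is not reproduced here; as a simpler illustrative stand-in, the expected-score formula can be inverted for p by bisection, since E is strictly increasing in p:

```python
def invert_expected(score: float, tol: float = 1e-9) -> float:
    """Find the knowledge proportion p whose expected score equals `score`.
    E(p) = p + 1/6 - p**6/6 is strictly increasing, so bisection applies.
    Scores at or below chance level (1/6) are mapped to p = 0."""
    E = lambda p: p + 1 / 6 - p ** 6 / 6
    if score <= E(0.0):
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if E(mid) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, an observed score of about 66.4% maps back to roughly p = 0.5, in line with the observation that the inverted expected score and the maximum likelihood estimate are usually very close.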

VALIDATION

For a statistical validation of the above formula, it was necessary to perform multiple guessing simulations on the VLT format. Early attempts at simulations yielded results that varied somewhat trial by trial, and it was clear that large numbers of trials were necessary to obtain reasonably narrow confidence intervals for mean score increases. Therefore, a software program was written in C++ to run multiple simulations. This allowed thousands of simulations to be run in a relatively short period of time.

We ran 1000 trials for 100 "known" word rates ranging from 1% presumed known to 100% presumed known, for a total of 100,000 99-item VLT-format test simulations. For the sake of brevity, mean test scores for proportions of known words are listed in intervals of 0.05. It should be noted that Table 2 reports mean score increases from guessing; the standard deviation reveals further information about the nature of the distributions. Furthermore, it should be noted that the proportion of words known is taken to be an overall measurement, of which the test is a sample.
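The simulation procedure can be sketched as follows, in Python here rather than the authors' C++ (the details of their implementation are assumed): each cluster draws which of its six words are known, answers known questions correctly, and randomly matches the remaining options to the remaining questions.

```python
import random

def simulate_cluster(p: float, rng: random.Random) -> int:
    """Score on one 3-question, 6-option VLT cluster at knowledge level p."""
    known_targets = sum(rng.random() < p for _ in range(3))
    known_distractors = sum(rng.random() < p for _ in range(3))
    correct = known_targets  # known words are answered correctly
    unknown_targets = 3 - known_targets
    remaining = 6 - known_targets - known_distractors
    if unknown_targets > 0:
        # Label the remaining options 0..remaining-1 so that option q is the
        # true answer to unguessed question q; guess by random matching
        # without replacement (process of elimination).
        picks = rng.sample(range(remaining), unknown_targets)
        correct += sum(1 for q, pick in enumerate(picks) if pick == q)
    return correct

def mean_score(p: float, trials: int = 20000, seed: int = 1) -> float:
    """Mean per-cluster score over many simulated clusters."""
    rng = random.Random(seed)
    return sum(simulate_cluster(p, rng) for _ in range(trials)) / trials
```

With p = 0.5, the simulated per-cluster mean converges on the formula's prediction of 3 × 0.664 ≈ 1.99 correct answers out of 3.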

When 0–60% of words were presumed known, the mean score increase was 16.658 points, with a mean standard deviation of 4.93. After this point, the guessing score increases began to decrease, dropping sharply when more than 0.8 of words were considered known, a relationship identical to that demonstrated in Figure 4 using the simplified formula.

The lower score increases from guessing for simulations with higher proportions of known words were surprising, as the data confirmed that the likelihood of correctly guessing unknown words did increase as the proportion of known words increased.

TABLE 1 Estimated Percentage of Words Known Given Observed VLT Score

However, it appears that the greater proportion of correct guesses is countered by a ceiling effect for guessing; for students who know more than 84% of words on a test, it is impossible to see an increase of more than 16% from guessing. With 95% of words presumed known, the score increase from guessing is under 5 points, despite a very high proportion of correctly guessed words (0.89).

FIGURE 4 Proportion of correctly guessed words by proportion of words known.

TABLE 2 Simulation Mean Test Scores and Increases by Proportion of "Known" Words (N = 1000)
(Columns: proportion of "known" words; mean test score; score increase; standard deviation; each statistic with its standard error.)

CONCLUSION

As the proportion of known words rises, so does the probability of correctly guessing the diminishing number of remaining unknown words. This results in a fairly consistent score increase of approximately 16–17 points on a 99-item VLT test until over 60% of words are known, at which point the score increase due to guessing gradually begins to diminish. It should be noted, however, that although means for guessing effects have been estimated to narrow confidence intervals, the standard deviations for guessing effects at most levels of vocabulary knowledge are around 5 points, meaning that individual students may see increases substantially higher or lower.

Implications for Educators and Researchers

Latent trait theory models such as Rasch measurement have gained increased popularity in vocabulary testing in recent years (Laufer & Goldstein, 2004; Beglar, 2010). But whereas latent trait theory holds great promise in language testing, tests of learners' vocabulary sizes are arguably an instance in which the use of classical statistics is theoretically justified: learners' knowledge of a sample of words from a given frequency level is polled in order to make inferences regarding the true proportion of words known at that frequency level.

However, it should be noted that the multiple-choice test format employed by tests such as the VLT inflates estimates of a learner's vocabulary size. This problem is compounded by the fact that on tests such as the VLT, distractors are drawn from the same frequency level as the target word, and therefore the probability of a successful guess cannot simply be determined from the number of distractors used. For this reason the authors recommend that, when possible, tests that do not employ a multiple-choice format be used, such as yes/no vocabulary tests or productive tests of vocabulary knowledge (e.g., Laufer & Nation, 1999), in which learners provide words.

Furthermore, it should be noted that studies that have tested and compared learners' active and passive vocabulary knowledge (e.g., Laufer & Goldstein, 2004; Laufer, Elder, Hill, & Congdon, 2004) frequently employ a multiple-choice format for tests of passive recognition. Whereas such studies commonly report active knowledge as lagging passive knowledge by large degrees, the extent to which the multiple-choice format inflates scores on passive measures, thereby enlarging differences in scores, should be considered by researchers when reporting results.

THE AUTHORS

Jeffrey Stewart is a lecturer at Kyushu Sangyo University in Fukuoka, Japan. His research interests include vocabulary acquisition and language testing.

David A. White is a mathematics undergraduate at Harvard University in Cambridge, Massachusetts, United States. His interests include computer programming and Japanese language.

REFERENCES

Anderson, R. C., & Freebody, P. (1981). Vocabulary knowledge. In J. T. Guthrie (Ed.), Comprehension and teaching: Research reviews (pp. 77–117). Newark, DE: International Reading Association.

Baldauf, R. (1982). The effects of guessing and item dependence on the reliability and validity of recognition based cloze tests. Educational and Psychological Measurement, 42, 855–867. doi:10.1177/001316448204200321

Beeckmans, R., Eyckmans, J., Janssens, V., Dufranne, M., & Van de Velde, H. (2001). Examining the yes/no vocabulary test: Some methodological issues in theory and practice. Language Testing, 18, 235–274.

Beglar, D. (2010). A Rasch-based validation of the Vocabulary Size Test. Language Testing, 27, 101–118. doi:10.1177/0265532209340194

Beglar, D., & Hunt, A. (1999). Revising and validating the 2000 word level and university word level vocabulary tests. Language Testing, 16, 131–162.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.

Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. Cambridge, England: Cambridge University Press.

Chall, J. S., & Dale, E. (1950). Familiarity of selected health terms. Educational Research Bulletin, 39, 197–206.

Huibregtse, I., Admiraal, W., & Meara, P. (2002). Scores on a yes–no vocabulary test: Correction for guessing and response style. Language Testing, 19, 227–245. doi:10.1191/0265532202lt229oa

Janssens, V. (1999). Over 'slapen' en 'snurken' en de hulp van de context hierbij. ANBF-nieuwsbrief, 4, 29–45.

Laufer, B., & Nation, P. (1999). A vocabulary-size test of controlled productive ability. Language Testing, 16, 33–51.

Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21, 202–226. doi:10.1191/0265532204lt277oa

Laufer, B., & Goldstein, Z. (2004). Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning, 54, 399–436. doi:10.1111/j.0023-8333.2004.00260.x

Martin, E., del Pino, G., & De Boeck, P. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30, 183–203. doi:10.1177/0146621605282773
