Module 5119 Language Testing and Assessment
Week 3
Stages of test development and test specifications
From Week 2
Test usefulness model
• Reliability - consistency of measurement
• Construct validity - appropriateness of score interpretation
• Authenticity - linking test tasks to TLU tasks
• Interactiveness - interaction between language ability and test tasks
• Impact - on individuals, teaching, educational institutions and society
• Practicality - consideration of resources
Quiz
In a small group, decide how you would evaluate each of the 12 given assessment scenarios according to the six factors listed. Fill in the chart with 5-4-3-2-1 scores, with 5 indicating that the principle is highly fulfilled and 1 indicating very low or no fulfillment. Use your best intuition to supply these evaluations, even though you don't have complete information on each context. Report your group's findings to the rest of the class and compare.
Stages of Test Development
Thinking stage (initial ground-clearing)
• Establish the major purpose of the test (achievement? placement?)
• Determine appropriate objectives (sample on next slide); since a test cannot cover every objective, choose a feasible subset of the objectives to test
• Theories of language and language use (e.g., linguistic competence, communicative competence)
Thinking stage (cont.)
Understanding the constraints
Writing stage
Writing test content in light of test specifications
• What goes into the test or what the test contains (i.e., tasks or items)
• Test construct (the underlying ability/trait being captured/measured by the test)
• A sample of the TLU tasks: choosing the most characteristic tasks from the domains
• Relevance, coverage and authenticity of tasks
• Content selection involves test methods
Test methods
• Presentation of materials or tasks eliciting responses (prompting)
• Response format
The way in which candidates will be required to respond to/engage with the test materials
– Fixed response format (e.g., MCQ, True/False) (-$)
– Constructed response format (e.g., cloze test, short-answer questions) (+$)
– Authenticity of task format and response
• Scoring method
How candidate responses will be rated or scored
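The difference between fixed and constructed response formats shows up directly in scoring: fixed-response items can be scored mechanically against a key. A minimal sketch, with an invented answer key and invented candidate responses (all names here are hypothetical, not from any real test):

```python
# Hypothetical sketch: dichotomous scoring of a fixed-response (MCQ) section.
# The answer key and candidate responses below are invented for illustration.

def score_fixed_response(key, responses):
    """Return the number of items answered correctly (1 point each)."""
    return sum(1 for k, r in zip(key, responses) if r == k)

key = ["B", "D", "A", "C", "B"]
candidate = ["B", "D", "C", "C", "A"]

raw_score = score_fixed_response(key, candidate)
print(raw_score)             # 3 items correct
print(raw_score / len(key))  # 0.6 proportion correct
```

Constructed responses, by contrast, need human raters or more elaborate automated scoring, which is part of why they cost more (+$) than keyed items (-$).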
Issues of authenticity
• Simulation of the real-world tasks and settings
• Direct inference from test performance to likely
TLU domain
• Principled compromise
Issues of authenticity (cont.)
Examples
• IELTS speaking tasks
• Academic listening test - listening to lectures
and note-taking
– The question of interruption
– What to note and what not to
– Pre-set questions
Piloting stage
Trying out the test:
• Analysis of the trial data
• Establish reliability and validity
• Understanding perceptions of test takers
• Revising the test in light of analysis and
feedback
• Selection and training of raters
Implementation stage
Test specifications
Test Specifications (Hughes, 1989)
• Format and Timing
• Criterial levels of performance
• Scoring procedures
• Sampling
• Item writing and moderation
• Pretesting
• Addressees: the kind of people that test takers are expected to write or speak to
• Topics: topics selected according to suitability for the
test takers and the type of test
Format and Timing
• Test structure (including time allocated to components)
• Item types and procedures, with examples
• What weighting is assigned to each component?
• How many pages will be presented (for reading) or required (for writing)?
• How many items for each component?
Criterial levels of performance:
Required level(s) of performance for success to be specified
Item writing and moderation
Critical questions that may be asked:
• Is the task perfectly clear?
• Is there more than one possible correct response?
• Can test-takers arrive at the correct response without having the skill supposedly being tested?
• Do test-takers have enough time to perform the
tasks?
The best way to identify items that have to be improved or abandoned is through teamwork/collaborative work.
What should test specifications look like?
• What is the purpose of the test? (placement? achievement?)
• What sort of learner will be taking the test? (age, sex, level?)
• How many sections/papers?
• What target language situation is envisaged for the test?
• What text types should be chosen? (written or spoken?)
• What language skills should be tested? (micro skills?)
• What language elements? (gram structures/features?)
• What sort of tasks are required? (simulated authentic?)
• How many items are required for each section?
• What test methods? (multiple choice, gap filling?)
• What rubrics are used as instructions for candidates? (examples?)
• Which criteria are to be used for assessment by markers? (accuracy, appropriacy?)
Test specifications
The specification is based on Criterion-Referenced Measurement (CRM) as opposed to Norm-Referenced Measurement (NRM).
NRM: concerned with determining the relative standing or rank order of test-takers (scores in percentiles); individual performance is evaluated in terms of its typicality for the population in question (How good was it compared with the performance of others?)
CRM: concerned with determining the absolute standing of test-takers in relation to the criterion that is tested (scores refer to the extent of the domain or criterion mastered): Did it meet what was required?
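The two frames of reference lead to two different score interpretations for the same raw score. A minimal sketch of the contrast, with an invented score distribution and an invented 70% mastery cut-off (neither comes from any real test):

```python
# Hypothetical sketch contrasting NRM and CRM score interpretation.
# The group scores and the 0.70 mastery cut-off are invented for illustration.

def percentile_rank(score, all_scores):
    """NRM view: what percentage of the group scored below this candidate?"""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def criterion_met(score, max_score, cut=0.70):
    """CRM view: did the candidate master enough of the tested domain?"""
    return score / max_score >= cut

group = [42, 55, 61, 67, 73, 78, 84, 90]
candidate = 73

print(percentile_rank(candidate, group))        # 50.0 - relative standing (NRM)
print(criterion_met(candidate, max_score=100))  # True - absolute standing (CRM)
```

Note that the NRM interpretation changes if the group changes, while the CRM interpretation stays fixed as long as the criterion does.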
Norm-Referenced Measurement (NRM)
• Individual performance is evaluated in terms of its typicality for the population in question (How good was it compared with the performance of others?)
• Uses a comparison between individuals as a
frame of reference
• Requires a score distribution
• Typically associated with comprehension tests or
tests of grammar and vocabulary
Criterion-Referenced Measurement (CRM)
• Concerned with determining the absolute standing of test takers in relation to the criterion that is tested (scores refer to the extent of the domain or criterion mastered): Did it meet what was required?
• Individual performances are evaluated against a verbal description of a satisfactory performance at a given level
• A series of performance goals set for individual learners → learners can reach these at their own rate
• Motivation is maintained: striving is for a 'personal best' rather than against other learners
• Typically involves judgements as to how a performance should be classified
• Includes indices of the quality of raters (e.g., inter-rater reliability indices, classification analysis)
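One widely used inter-rater reliability index for classification decisions of this kind is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch, with invented pass/fail ratings from two hypothetical raters:

```python
# Hypothetical sketch of one inter-rater reliability index: Cohen's kappa
# for two raters classifying the same performances. Ratings are invented.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(a, b), 2))  # 0.67
```

Kappa runs from 1.0 (perfect agreement) down through 0 (chance-level agreement), so a value like 0.67 signals substantial but imperfect rater consistency and would feed into rater training and moderation.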
Sample test specifications
• Handouts
On the basis of experience or intuition, try to write a specification for a test designed to measure the level of language proficiency of students applying to study an academic subject in the medium of English at an overseas university. Compare your specification with those of tests which have actually been constructed for that purpose; for example, you might look at ELTS and TOEFL. If specifications are not available, you will have to infer them from sample tests or past papers.