Module 5119 Language Testing and Assessment
Week 3
Stages of test development and test specifications
From Week 2
Test usefulness model
• Reliability - consistency of measurement
• Construct validity - appropriateness of score interpretation
• Authenticity - linking test tasks to TLU tasks
• Interactiveness - interaction between language ability and test tasks
• Impact - on individuals, teaching, educational institutions and society
• Practicality - consideration of resources
Quiz
In a small group, decide how you would evaluate each of the 12 given assessment scenarios according to the six factors listed. Fill in the chart with 5-4-3-2-1 scores, with 5 indicating that the principle is highly fulfilled and 1 indicating very low or no fulfillment. Use your best intuition to supply these evaluations, even though you don't have complete information on each context. Report your group's findings to the rest of the class and compare.
Stages of Test Development
Thinking stage (initial ground-clearing)
• Establish the major purpose of the test (achievement? placement?)
• Determine appropriate objectives (sample on next slide); since a test cannot cover every objective, choose a feasible subset of the objectives to test
• Theories of language and language use (e.g., linguistic competence, communicative competence)
Thinking stage (cont.)
Understanding the constraints
Writing stage
Writing test content in light of test specifications
• What goes into the test or what the test contains (i.e., tasks or items)
• Test construct (the underlying ability/trait being captured/measured by the test)
• A sample of the TLU tasks: choosing the most characteristic tasks from the domains
• Relevance, coverage and authenticity of tasks
• Content selection involves test methods
Test methods
• Presentation of materials or tasks eliciting responses (prompting)
• Response format
The way in which candidates will be required to respond to/engage with the test materials
– Fixed response format (e.g., MCQ, True/False) (-$)
– Constructed response format (e.g., cloze test, short-answer questions) (+$)
– Authenticity of task format and response
• Scoring method
How candidate responses will be rated or scored
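The difference between fixed and constructed response formats shows up directly in scoring: fixed-response items can be scored mechanically against a key. A minimal sketch, with an invented answer key and invented candidate responses (all names here are hypothetical, not from any real test):

```python
# Hypothetical sketch: dichotomous scoring of a fixed-response (MCQ) section.
# The answer key and candidate responses below are invented for illustration.

def score_fixed_response(key, responses):
    """Return the number of items answered correctly (1 point each)."""
    return sum(1 for k, r in zip(key, responses) if r == k)

key = ["B", "D", "A", "C", "B"]
candidate = ["B", "D", "C", "C", "A"]

raw_score = score_fixed_response(key, candidate)
print(raw_score)             # 3 items correct
print(raw_score / len(key))  # 0.6 proportion correct
```

Constructed responses, by contrast, need human raters or more elaborate automated scoring, which is part of why they cost more (+$) than keyed items (-$).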
Issues of authenticity
• Simulation of the real-world tasks and settings
• Direct inference from test performance to likely
TLU domain
• Principled compromise
Issues of authenticity (cont.)
Examples
• IELTS speaking tasks
• Academic listening test - listening to lectures
and note-taking
– The question of interruption
– What to note and what not to
– Pre-set questions
Piloting stage
Trying out the test:
• Analysis of the trial data
• Establish reliability and validity
• Understanding perceptions of test takers
• Revising the test in light of analysis and
feedback
• Selection and training of raters
Implementation stage
Test specifications
Test Specifications (Hughes, 1989)
• Format and Timing
• Criterial levels of performance
• Scoring procedures
• Sampling
• Item writing and moderation
• Pretesting
• Addressees: the kind of people that test takers are expected to write or speak to
• Topics: topics selected according to suitability for the
test takers and the type of test
Format and Timing
• Test structure (including time allocated to components)
• Item types and procedures, with examples
• What weighting is assigned to each component?
• How many pages will be presented (for reading) or required (for writing)?
• How many items for each component?
Criterial levels of performance:
Required level(s) of performance for success to be specified
Item writing and moderation
Critical questions that may be asked:
• Is the task perfectly clear?
• Is there more than one possible correct response?
• Can test-takers arrive at the correct response without having the skill supposedly being tested?
• Do test-takers have enough time to perform the
tasks?
The best way to identify items that have to be improved or abandoned is through teamwork/collaborative work.
What should test specifications look like?
• What is the purpose of the test? (placement? achievement?)
• What sort of learner will be taking the test? (age, sex, level?)
• How many sections/papers?
• What target language situation is envisaged for the test?
• What text types should be chosen? (written or spoken?)
• What language skills should be tested? (micro skills?)
• What language elements? (gram structures/features?)
• What sort of tasks are required? (simulated authentic?)
• How many items are required for each section?
• What test methods? (multiple choice, gap filling?)
• What rubrics are used as instructions for candidates? (examples?)
• Which criteria are to be used for assessment by markers? (accuracy, appropriacy?)
Test specifications
The specification is based on Criterion-Referenced Measurement (CRM) as opposed to Norm-Referenced Measurement (NRM).
NRM: concerned with determining the relative standing or rank order of test-takers (scores in percentiles); individual performance is evaluated in terms of its typicality for the population in question (How good was it compared with the performance of others?)
CRM: concerned with determining the absolute standing of test-takers in relation to the criterion that is tested (scores refer to the extent of the domain or criterion mastered): Did it meet what was required?
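The two frames of reference lead to two different score interpretations for the same raw score. A minimal sketch of the contrast, with an invented score distribution and an invented 70% mastery cut-off (neither comes from any real test):

```python
# Hypothetical sketch contrasting NRM and CRM score interpretation.
# The group scores and the 0.70 mastery cut-off are invented for illustration.

def percentile_rank(score, all_scores):
    """NRM view: what percentage of the group scored below this candidate?"""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def criterion_met(score, max_score, cut=0.70):
    """CRM view: did the candidate master enough of the tested domain?"""
    return score / max_score >= cut

group = [42, 55, 61, 67, 73, 78, 84, 90]
candidate = 73

print(percentile_rank(candidate, group))        # 50.0 - relative standing (NRM)
print(criterion_met(candidate, max_score=100))  # True - absolute standing (CRM)
```

Note that the NRM interpretation changes if the group changes, while the CRM interpretation stays fixed as long as the criterion does.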
Norm-Referenced Measurement (NRM)
• Individual performance is evaluated in terms of its typicality for the population in question (How good was it compared with the performance of others?)
• Uses a comparison between individuals as a
frame of reference
• Requires a score distribution
• Typically associated with comprehension tests or
tests of grammar and vocabulary
Criterion-Referenced Measurement (CRM)
• Concerned with determining the absolute standing of test takers in relation to the criterion that is tested (scores refer to the extent of the domain or criterion mastered): Did it meet what was required?
• Individual performances are evaluated against a verbal description of a satisfactory performance at a given level
• A series of performance goals set for individual learners → learners can reach these at their own rate
• Motivation is maintained: striving is for a 'personal best' rather than against other learners
• Typically involves judgements as to how a performance should be classified
• Includes indices of the quality of raters (e.g., inter-rater reliability indices, classification analysis)
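One widely used inter-rater reliability index for classification decisions of this kind is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch, with invented pass/fail ratings from two hypothetical raters:

```python
# Hypothetical sketch of one inter-rater reliability index: Cohen's kappa
# for two raters classifying the same performances. Ratings are invented.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(a, b), 2))  # 0.67
```

Kappa runs from 1.0 (perfect agreement) down through 0 (chance-level agreement), so a value like 0.67 signals substantial but imperfect rater consistency and would feed into rater training and moderation.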
Sample test specifications
• Handouts
On the basis of experience or intuition, try to write a specification for a test designed to measure the level of language proficiency of students applying to study an academic subject in the medium of English at an overseas university. Compare your specification with those of tests which have actually been constructed for that purpose; for example, you might look at ELTS and TOEFL. If specifications are not available, you will have to infer them from sample tests or past papers.