(ESSAY) LANGUAGE ASSESSMENT evaluative essay analyze the targets of all the questions in the exam paper, the tasks in all the skills assessment, and the question writing techniques



UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES

VIETNAM NATIONAL UNIVERSITY

Lecturer: Cao Thúy Hồng

Hanoi, December 2021


● Analyze the targets of all the questions in the exam paper, the tasks in all the skills assessment, and the question writing techniques

● Evaluate the given exam paper based on prescribed criteria in the rating scale below

● Estimate the match between the exam paper (targets, tasks) and the contents (targets, tasks) that students have learned in the 9th grade, second-semester English textbook

RESPONSE

In this essay, we analyze and evaluate the chosen test, an end-of-term II exam paper for Vietnamese 9th graders. The essay explores the assessment targets of the test tasks and examines the test in terms of five language assessment principles and question-writing techniques.

I Assessment targets

There are six units in the coursebook (Tieng Anh 9 - Volume 2), namely Recipes and eating habits, Tourism, English around the world, Space travel, Changing roles in society, and My future career. Each unit covers six components: vocabulary, grammar, listening, reading, writing, and speaking.

The table below shows the assessment targets of the test according to the six components mentioned above.

Table 1 Assessment targets

Columns: Genres | Topics | Conditions | Performance levels | Target contents (the cell contents survive only as fragments; [...] marks unrecoverable text)

VOCABULARY (covered in II Reading - Exercise 1; III Writing - Exercise 1; IV Speaking - Exercise 1)
Forms, meanings and uses of [...] phrases in constructing sentences, provided that all the lexical [...] course

LISTENING (covered in I Listening)
About the teaching career, provided the audio is around 150-200 words in length, in which [...] speakers and speech is delivered relatively slowly and clearly in standard dialect; [...] information
About the teaching career, provided the audio is around 200 words in length, in which [...] above A2 level (CEFR); speech is delivered relatively slowly and clearly in standard dialect

READING (covered in II Reading)
[...] information

WRITING (covered in III Writing, Exercise 2)
About a trip [...] remember the [...] most; [...] some cues are given

SPEAKING (covered in IV Speaking)
[...] to five [...]
About personal eating habits, provided that related lexical [...]
About the cooking clubs, provided that context is made [...] given

In summary, the given test tasks show that this test aims to assess one language component (vocabulary) and four language skills. Pronunciation can be deduced to be more or less incorporated into the speaking assessment. The above-mentioned assessment targets cover several important learning targets, namely vocabulary, speaking, listening, reading, and writing (See Appendix A), but noticeably fail to assess grammar. Specifically, since this is an achievement test, the assessment of vocabulary depth (Units 7, 8 and 9) is integrated into the assessment of reading, speaking and writing skills. In addition, the listening tasks touch upon two major sub-skills in the syllabus: listening for general information and listening for specific information (Units 9 and 12). Regarding reading skills, the tasks comprehensively cover three crucial sub-skills for 9th graders: reading for general information, reading for specific information, and making lexical inferences (Units 9 and 10). Concerning writing skills, the assessment target is questionable because the chosen topic is irrelevant to the learning targets. This, together with the exclusion of the subject matter of Unit 11, shows a marked discrepancy between the assessment targets of the test and the learning targets of the textbook. The effects of this will be discussed in the following sections.

II Qualities of the test

To determine the quality of this assessment, this section analyzes the test tasks against five benchmarks of language assessment, namely reliability, validity, authenticity, washback, and practicality.

1 Reliability

A test is considered reliable when it yields consistent results on different occasions of administration (Brown, 2004). While test administration factors and student-related factors cannot be measured, test/retest and rater reliability can be examined based on the given test tasks. In this respect, the test displays a considerable degree of unreliability.

Firstly, the 45-minute time allowance seems too tight for a test covering four language skills, which comprise 13 selected-response items, 12 limited-response items and 3 extended-response tasks. Furthermore, the mismatch between learning targets and assessment targets can cause test unreliability, as students are expected to revise according to the predetermined lesson objectives only. In addition, poorly written test items such as writing task 2, which will be discussed later, can interfere with the interpretation of students’ performances, leading to test unreliability.

Besides, the reliability of the test is influenced by human error and subjectivity in the scoring process (Brown, 2004). While inter-rater reliability is not an issue, since this type of test is rarely graded by more than one teacher, problems might arise in the consistency of a single scorer's grading, known as intra-rater reliability. In this test, the inclusion of 3 selected-response tasks and 3 limited-response tasks (See Table 2) entails higher intra-rater reliability. However, the objectivity in scoring the latter can be compromised, as alternative answers might occur. Additionally, the 3 performance tasks in the writing and speaking sections are subject to scoring subjectivity if the marking rubrics are not well constructed. The grading of students’ competencies then lies at the sole mercy of the scorer, impeding the impartial assessment of the students. In this case, since there is only a sample piece of writing for reference and no marking rubric is provided, we are limited in our ability to further assess the test’s rater reliability.

textbook (See Appendix A), such as identifying general and specific information, and delivering a talk or conversation about the given topics. Meanwhile, the listening and writing tasks show substantial room for improvement. For the listening tasks, the gap-filling items in Q1-2 of task 2 underrepresent the target skill of listening for specific information. Moreover, the recording, which is supposed to consist of 2 talks, has been turned into scripted monologues, losing the natural characteristics of the target situations. As for the writing tasks, while the sentence completion task matches the target of assessing vocabulary, the paragraph writing task fails to clearly communicate the expected outcome to the students regarding the genre and topic of the writing. These shortcomings indicate that the test is construct-invalid, which might hinder the interpretation of the test scores in evaluating students’ performances.

In addition, the test is partially content-invalid. While sufficient tasks are provided to assess vocabulary and four language skills on a range of topics, grammar, a crucial language component, is not assessed in any task. Besides, the content of Unit 11 is not covered in any part of the test. This results in inadequate representation of the learning targets, which reduces the content validity of the test. Furthermore, the sentence completion task can be seen as an indirect test of vocabulary, which might also lower the content validity (Brown, 2004). At the same time, Q4 of this task seems out of place since it touches on a grammatical point that is not included in the curriculum and does not serve any meaningful assessment target within the task as a whole. Writing task 2 also deals with a topic that is not among the writing targets specified in the textbook (See Appendix A). Generally speaking, the content validity of this test is seriously weakened.

3 Authenticity

Authenticity is the degree to which test materials and test conditions reflect what happens in the real target situation (Brown, 2004). To evaluate authenticity precisely, Brown (2004) suggested several criteria, namely language use, items, topics, thematic organization and resemblance to real-life situations. Taking these into consideration, the listening recording shows an appropriate use of language; however, it is adapted into scripted monologues with little intonation and a relatively slow speaking pace. For the reading section, although Task 1 Q2-3 are contextualized to measure students’ sub-skill of inferring the meaning of unknown words from the context, neither of the texts is attributed to an authentic source. As for the speaking part, it successfully resembles real-world tasks in which students have to apply their learned knowledge and skills. In addition, one advantage that the four parts have in common is the topics of the tasks, which are meaningful and relevant to the course.

4 Washback

Washback is the effect of testing on how students prepare for the test (Brown, 2004). There are two types of washback, positive and negative, depending on whether it has beneficial or undesirable effects on educational practices (Hughes, 2003). In this case, the analyzed test has a somewhat positive washback, as it thoroughly covers a sizable portion of the learning targets identified in the textbook. However, the test fails to assess grammar as well as the knowledge learned in Unit 11: Changing roles in society. Therefore, students might be perplexed when their preparation is not reflected in the test, potentially leaving them demotivated in later assessments.

5 Practicality

Practicality is the relationship between the resources that will be required and the resources that will be available (Bachman & Palmer, 1996). The chosen test requires reasonably priced printing materials and well-prepared equipment such as speakers and exam papers, the availability of which cannot be measured from the test paper alone. The only criterion that can be evaluated is the time allowance, which appears impractical: it is challenging for students to complete the test within the set time frame of 45 minutes. To elaborate, the test includes a 7-minute recording, 150-200-word texts, and 3 performance tasks, which might make it impractical for teachers to administer the test and for students to finish it within the time limit. Moreover, as the test does not provide a scoring system or a procedure for administering the two speaking tasks, it is difficult to evaluate the practicality of this part in particular and of the test in general.

III Question writing techniques

The analyzed test encompasses two types of assessment methods: selected-response assessment and constructed-response assessment. The specifics of the question types are presented in the following table:

Table 2 Assessment methods

Multiple choice: 1 (covered in II Reading - Exercise 1)
[...] (covered in II Reading - Exercise 2)
Extended-response: 3
[...] (covered in III Writing - Exercise 2)
Interview: response to open-ended questions (covered in IV Speaking - Exercise 1)

1 Listening

The listening tasks take the form of gap-filling (Task 1, Q1-5; Task 2, Q1-2) and sequencing (Task 2, Q3-6). The items of the former are designed with few modifications from the tapescript, while those of the latter are synthesized to fit the aim of listening for general ideas. With this construction, both tasks minimize the probability of guessing and allow highly objective scoring. Besides, the instructions for both tasks are written clearly, with a brief description of the context. However, there is no direct instruction on how to note the order of events in the sequencing task, which might pose an unnecessary challenge for students in meeting the task requirements. Regarding the input, the gap-filling items of task 1 (Q1-5) are efficiently laid out in a table with reasonable intervals between them. This allows students to track the items easily and gives them enough time to fill in one gap before the next item is mentioned. Meanwhile, the gap-filling items in task 2 (Q1-2) do little to serve the purpose of listening for specific information and instead merely test the recognition of words in use. Vocabulary-wise, the great majority of words and phrases in the tasks are within A2 level (roughly 90% in task 1 and 85% in task 2). Most of the higher-level words, such as although (B1) and combine (B2), have been taught previously. However, advanced words (C1, C2), accounting for 5% of task 2, might hinder the comprehensibility of the recording.

2 Reading

The instructions for both reading tasks are written clearly and briefly, which makes them effective. The test designers also make good use of action verbs (read, complete, circle) and clarify the task requirements (decide if the statements are true or false). In terms of topic, both tasks are relevant to the content of the course, as they draw on the knowledge learned in Unit 9 and Unit 10.

Besides, there are noticeable differences between the two reading tasks. Reading task 1 is a multiple-choice assessment, which contributes to the objectivity of the rating process. However, as the answers for Q4-5 are taken directly from the passage without being paraphrased, there is a fair chance that students can guess the answers correctly without comprehending the text. In terms of input, approximately 20% of the words in passage 1 are above A2 level, with only 1 word above B2 level (See Appendix C). Given that most of these are included in the syllabus, the chosen passage is generally comprehensible to the targeted learners. Meanwhile, reading task 2 takes the form of a true/false assessment, thus requiring students’ understanding of the text and enabling objective grading. Regarding the input of text 2, although approximately 23% and 12% of the words are at B1 and B2 level respectively (See Appendix C), the majority of the B2-level words, such as launched, missions and telescopes, have already been taught to students in Unit 10 (See Appendix B), which improves the comprehensibility of the text. However, the text contains several spelling and grammatical mistakes, such as accommodate, flybies, serve as space environment… In terms of layout, task 2 is effective, as it requires students to circle the correct answers instead of writing them down, which ensures uniform and objective answers.

of the test tasks. It is highly likely that students will struggle to meet the requirements of the task since the task specifications are not clearly communicated. Apart from that, the cues provided are sufficient and contain simple language appropriate for students at A2 level.

4 Speaking

There are several problems in the instructions of the speaking section that need adjusting. Both sets of instructions are unclear, informal, and lengthy, which might confuse the test-takers. A clear context for the role-play in task 2 is also missing. In addition, task 2 should include an example, as its requirement is rather complicated and students might have difficulty performing it. Regarding the assessment method, task 1 is designed as an interview, requiring students to respond to open-ended questions, while task 2 is a paired test in which two students carry out a conversation with given prompts. These two formats are particularly well suited to assessing the grasp of vocabulary and speaking skills set out in the targets. Nevertheless, the test designers should elaborate more on how examiners can conduct the tasks successfully given the tight time limit of the test. Besides, both speaking tasks contain simple vocabulary, which is suitable for students at A2 level. In terms of topic, the focus is placed on Unit 7: Recipes and eating habits, which is relevant to the course and interesting to students.

Title	Language assessment evaluative essay analyze the targets of all the questions in the exam paper, the tasks in all the skills assessment, and the question writing techniques
Author(s)	Nguyễn Minh Ngọc, Phạm Đỗ Nguyên Hương
Supervisor	Lecturer Cao Thúy Hồng
Institution	University of Languages and International Studies, Vietnam National University, Hanoi
Major	English Language Teaching
Genre	Evaluative essay
Year of publication	2021
City	Hanoi
