VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POST-GRADUATE STUDIES
SUBMITTED BY: TRAN THỊ THU HƯƠNG
A thesis submitted in partial fulfillment of the requirements
for the degree of Master of Arts
EVALUATING THE VALIDITY OF THE FINAL ACHIEVEMENT TEST FOR SECOND-YEAR NON-MAJOR STUDENTS AT ELECTRONIC - ELECTRICAL ENGINEERING DEPARTMENT, NAM DINH UNIVERSITY OF TECHNOLOGY EDUCATION
(Đánh giá độ giá trị của bài kiểm tra cuối kỳ cho sinh viên không chuyên tiếng Anh năm thứ hai tại khoa Điện - Điện tử, Trường Đại học Sư phạm Kỹ thuật Nam Định)
M.A. MINOR THESIS
Field: Language Teaching Methodology
Code: 60 14 10
HANOI, 2011
Supervisor: Pham Lan Anh, M.A
TABLE OF CONTENTS
DECLARATION
2.4.3 Reasons for choosing face validity
2.5 Some measures to increase face validity
CHAPTER 3: THE STUDY
3.1 English learning and teaching at Nam Dinh University of Technology Education
3.1.2 The English teaching staff
3.1.6 Difficulty level and discrimination of the final test
3.2 English testing at Nam Dinh University of Technology Education
3.2.1 Testing situation
3.3.1 Survey questionnaire
3.4 Data analysis of survey questionnaires and interviews
3.4.1 Data analysis of the administration of the test
3.4.1.2 Data analysis of the logistics of the test
3.4.2 Data analysis of face validity of the test
3.4.2.1 Data analysis of general opinion about the test
3.4.2.2 Data analysis of reading comprehension task
3.4.2.3 Data analysis of grammar knowledge task
3.4.2.4 Data analysis of translation task
3.5 Discussion and findings
3.5.1 Similarities in teachers' and students' perception
LIST OF FIGURES, TABLES AND CHARTS
Figures
Figure 1: Three Considerations for Test Choice
Figure 2: The Scope of Impact of Language Tests
Figure 3: Relationship between reliability and validity
Tables
Table 1: ESP syllabus content allocation
Table 2: Specification of test 12
Table 3: Teachers' and students' comments on the format of the test
Table 4: Teachers' and students' comments on the administration of the test
Table 5: Teachers' and students' comments on the whole test
Table 6: Teachers' and students' comments on students' reading comprehension ability, theme and instruction of the reading comprehension task
Table 7: Teachers' and students' comments on the grammar task
Table 8: Teachers' and students' comments on the translation task
Chart 3: Percentage of the results which students get in the test
Chart 4: Percentage of what test tasks students cannot do
Chart 5: Percentage of teachers' and students' comments on the length of the reading text
Chart 6: Students' comments on whether or not the reading text is difficult
LIST OF ABBREVIATIONS
NUTE: Nam Dinh University of Technology Education
EGP: English for General Purposes
ESP: English for Specific Purposes
language ability is through a language test. In education, especially at the Faculty of Foreign Languages at Nam Dinh University of Technology Education (NUTE), testing the students' achievement toward teaching objectives is needed. Without an achievement test, it is difficult to see how rational educational decisions can be made.
NUTE is a technological university, and students' ability to learn English is really low. Evaluating a test is also one of the methods to improve students' learning process and results. Although many theses have addressed this problem, at NUTE it is still new and very necessary, because how to evaluate the test after each semester still receives little attention and, up to now, the process of test analysis after each examination has not been fully investigated. Consequently, students' results are getting worse and worse. As a teacher myself, I see that we, teachers at NUTE, just stop at an experience-based level of test making, test administration and test marking during and after examinations. We do not seek evaluation of the tests from other teachers and students, so the test results have still not improved.
1.2 Scope of the study
The scope of this thesis is limited to research on teachers' and test-takers' evaluation of the existing achievement test in terms of its face validity for second-year non-English major students at the Electronic - Electrical Engineering Department, NUTE, due to limitations in time, ability and availability of data. Moreover, it is impossible for the author to cover all the final achievement tests in use or to design a sample achievement test for second-year students. Instead, only a test specification for test 12 in semester 3 is presented.
1.3 Aims of the study
Following the scope of the research above, the aims of this research are:
1. To identify the English teachers' and students' evaluation of the existing final achievement test (test 12) at NUTE in terms of face validity.
2. To provide suggestions for test designers.
1.4 Methods of the study
In order to achieve the above aims, the study has been carried out as follows:
First, the author went to the library to read theory on assessment and testing, on achievement tests and the characteristics of a good achievement test, and on test validity, with a special focus on face validity and some measures to increase it. From her critical reading, many reference materials have been gathered, analyzed and synthesized to draw out a theoretical basis for evaluating the current test being used for third-semester students in terms of its face validity.
Then, qualitative methodologies involving data collected through survey questionnaires and interviews were employed with both teachers and students at NUTE.
1.5 Research questions
This study is implemented to find answers to the following research questions:
1. What are the teachers' and test takers' (students') perceptions of the final third-semester English achievement test at NUTE in terms of its face validity?
2. What are suggestions to improve the face validity of the final third-semester English achievement test at NUTE?
1.6 Design of the study
The thesis is divided into four major chapters:
Chapter 1: Introduction presents basic information such as the rationale, the scope, the aims, the methods, the research questions and the design of the study.
Chapter 2: Literature review reviews the theoretical background on evaluating a test, which includes the relationship between teaching, learning and assessment, the purposes of formative and summative assessments, achievement tests, the characteristics of good EGP and ESP tests, face validity and some measures to increase face validity.
Chapter 3: The study is the main part of the thesis, showing the context of the study and the detailed results obtained from the collected tests and the findings in response to the research questions. Then, the author gives some solutions to improve the final achievement test.
Chapter 4: Conclusion offers conclusions and proposes some suggestions for further research on the topic.
CHAPTER 2: LITERATURE REVIEW
This chapter provides an overview of the theoretical background of the study. It includes five main sections. Section 2.1 discusses the relationship between teaching, learning and assessment. Section 2.2 focuses on the purposes of formative assessment and summative assessment. Section 2.3 gives a brief description of achievement tests and the characteristics of a good EGP test and ESP test. It is then followed by section 2.4, in which face validity is the focus. Finally, section 2.5 suggests some measures to increase face validity.
2.1 Relationship between teaching, learning and assessment
In the relationship between teaching, learning and assessment, curriculum and content standards also play an important role. Curriculum is best characterized as what should take place in the classroom. It describes the topics, themes, units and questions contained within the content standards. Content standards are the framework for curriculum. Curriculum can vary from program to program, as well as from instructor to instructor. Unlike content standards, curriculum focuses on delivering the "big" ideas and concepts that the content standards identify as necessary for the learner to understand and apply. Curriculum serves as a guide for instructors, addressing teaching techniques, recommending activities, scope and sequence, and modes of presentation considered most effective. In addition, curriculum indicates the textbooks, materials, activities and equipment that best help learners achieve the content standards. In the teaching and learning process, assessment is a tool that specifies the nature of the evidence required to demonstrate that the content standards have been met. To ensure valid and reliable accountability, the assessment selected should test the state standards. Clearly, assessment, curriculum and content standards have a close relationship: assessment is the basis for the content standards, and curriculum is generalized from the content standards.
The Longman Dictionary of Language Teaching and Applied Linguistics (3rd edition) (Richards et al., 2005) defines assessment as "a systematic approach to collecting information and making inferences about the ability of a student or the quality or success of a teaching course on the basis of various sources of evidence".
Assessment is a critical link between teaching and learning, and it also plays a vital role in the process of curriculum design and teaching implementation. From the perspective of behavior research in classroom teaching, Richards & Nunan (1990) hold that assessment refers to the set of processes through which we make judgments about a learner's level of skills and knowledge. Assessment should:
- Ensure reliability and validity;
- Provide for pre-, while- and post-testing;
- Be criterion- or standards-referenced;
- Inform instruction;
- Serve as an accountability measure;
- Be adaptable to a variety of instructional environments;
- Accommodate learners with special needs.
Various assessment measures are known to all, such as evaluation, examination, questionnaire, interview, discussion, observation and so on. Testing is the most readily available means of implementing assessment in the teaching process. In Brown's (2001) view, the curriculum system comprises needs analysis, objectives, testing, materials, teaching and evaluation. Likewise, Richards (1990) says that exploration of the language curriculum involves needs analysis, goals and objectives, syllabus design, methodology, testing and evaluation. Both of them emphasize the importance of testing.
Bachman (1990: 20) defines the term "test" as "a measurement instrument designed to elicit a specific sample of an individual's behavior". This definition gives a basic, general account of tests. Oller (1979: 1) defines a language test as an instrument that attempts to measure the extent to which students have learned in a foreign language course. From the two definitions, this research agrees that a language test is a set of instruments in forms of
in the beginning of the English courses. Feedback and feedforward are very important in the teaching and learning process. The author expresses the relationship between feedback and feedforward through an example of catching a ball. When we move to catch a ball, we must interpret our view of the ball's movement to estimate its future trajectory. Our attempt to catch the ball incorporates this anticipation of the ball's movement in determining our own movement. As the ball gets closer, or exhibits spin, we may find it departing from the expected trajectory, and we must adjust our movement accordingly. This means that feedforward helps teachers to present the anticipated problems at the beginning so that students can feel more
In addition, testing may have many impacts on teaching and learning. Hughes (1989: 1) calls the effect of testing on teaching and learning "backwash". He appreciates the role of backwash in the teaching-learning process. Backwash can be harmful if the test content does not go with the objectives of the course. It leads to the problem of teaching in one way and testing in another way, and vice versa. However, backwash need not always be harmful; it can be positive, too. A test which is based directly on the needs of a specific group of learners will be useful for them to perform in real life.
In view of the important role of language tests in the education system, Shohamy (2001: 2) emphasizes that "language tests need to be of high quality and follow careful rules of science of psychometrics." In other words, a good language test must present accurate answers to the test takers in reference to the aspect of knowledge that it measures. Furthermore, a high-quality language test must be reliable and valid so as to give precise information on the test takers' language ability. Language tests may differ according to the purposes of their design and how they are designed (see figure 1).
Figure 1: Three Considerations for Test Choice
A test's intended impacts refer to the effects that the test designer intends (see figure 2). Bachman and Palmer (1996) point out that the entities potentially affected by a test include individuals (students and teachers), language classes and programs, and society.
Figure 2: The Scope of Impact of Language Tests
Obviously, the importance of testing cannot be denied. In detail, this research focuses on English for Specific Purposes (ESP) testing. ESP has been playing an important role in teaching and learning at universities. Since the early 1960s, ESP has grown to become one of the most prominent areas of English foreign language teaching. This development is reflected in an increasing number of publications, conferences and journals dedicated to ESP discussions. Similarly, more traditional general English courses have given place to courses aimed at specific areas, for example, English for Business Purposes. With the emergence of ESP, a strong need for testing of specific groups of learners was created. As a result, the ESP testing movement has shown a slow but definite growth over the past few years. Obviously, ESP testing and EFL testing are indispensable in the teaching and learning process.
To sum up, teaching, learning and assessment are closely related because testing, teaching and learning are not separate entities. A good test can be used as a valuable teaching and learning device. Teaching has always been a process of helping others to discover "new" ideas and "new" ways of organizing what they have learned. Whether this process takes place through systematic teaching, learning and testing, or whether it is through a discovery approach, testing was, and remains, an integral part of teaching and learning.
2.2 Purposes of formative and summative assessments
As said above, assessment is the process of documenting and measuring knowledge, skills, attitudes and beliefs. There are many kinds of assessment in a course, such as continuous assessment, formative assessment, summative assessment, peer-assessment, self-assessment and so on. However, in this research the author will focus on the relationship between two main kinds of assessment: formative assessment and summative assessment. "As coach and facilitator, the teacher uses formative assessment to help support and enhance student learning. As judge and jury, the teacher makes summative judgments about a student's achievement" (Atkin, Black & Coffey, 2001).
Formative assessment is designed to provide feedback and feedforward to students and instructors for the purpose of developing teaching and learning. From a student's perspective, formative assessment provides information on the student's performance, how they are progressing with the skills and knowledge required by a particular course, and the problems which they will have in the course. Generally, the results of formative assessment do not contribute to a student's final grade but are purely for the purpose of assisting students to understand their strengths and weaknesses in order to work towards improving their overall performance. From an instructor's perspective, formative assessment is a diagnostic tool that can be used to evaluate the effectiveness of course and curriculum design. Formative assessment has the potential to highlight areas in which teaching and curriculum design need to be improved, as well as any areas where teaching methods have been very effective in improving student learning. Sample tests of this kind are the diagnostic test and the placement test. A placement test is used at the beginning of a course to identify a student's level of language and find the best class for them. A diagnostic test is used to identify problems that students have with language. The teacher diagnoses the language problems students have. It helps the teacher to plan what to teach in future and provide students with the anticipated problems and solutions.
The purpose of summative assessment is to provide "a sampling of student achievements which lead to a meaningful statement of what they know, understand and can do" (Brown & Knight, 1999: 37). Generally, summative assessment occurs at the end of a topic or the end of a course in order to evaluate how well students have acquired the knowledge and skills presented in that section or during the complete course. An achievement test is a typical sample of this kind of summative assessment.
Clearly, the relationship between formative and summative assessment is cohesive, which is expressed through their purposes: the teacher needs to use both to evaluate the student's ability and enhance the quality of teaching and learning. The teacher has to identify the student's problems in order to assign the student's level and adjust the teaching method, and finally test to know how well students have acquired the lesson.
2.3 Achievement tests and their characteristics
There are two above-mentioned assessments, and in this research the author only uses summative assessment because of its purpose. This research evaluates the final ESP test, so summative assessment
school levels
Sparatt (1985: 145) supposes that "an achievement test is one of the means available to teachers and students alike of assessing progress. It is the aim and content of an achievement test that distinguishes it from other kinds of test".
David (1999: 2) also shares the idea that "achievement refers to the mastery of what has been learnt, what has been taught or what is in the syllabus, textbook, materials, etc. An achievement test therefore is an instrument designed to measure what a person has learnt within or up to a given time".
Similarly, Brown (1994b: 259) proposes that "An achievement test is related directly to classroom lessons, units or even a total curriculum. They are limited to particular materials covered in a curriculum within a particular time frame". Unlike a progress test, an achievement test should attempt to cover as much of the syllabus as possible. If we confine our test to only part of the syllabus, the contents of the test will not reflect all that has been learned.
There are two kinds of achievement tests: the final achievement test and the progress achievement test.
Progress achievement tests (short-term achievement tests) are always administered during the course, after a chapter or a term, and are often written by the teacher. These tests are of course based on the teaching program. Hughes (1990: 12) claims that "these tests are intended to measure the progress that students are making". In other words, progress achievement tests are supposed to help teachers judge the degree of success of their teaching and find out how much students have gained from what has been taught. Accordingly, the teachers can identify the weaknesses of the learners or diagnose the areas not properly achieved during the course of study. On the other hand, for students, this test can be regarded as a useful device that provides them with a good chance to perform the target language in a positive and effective manner and to gain additional confidence in doing so. It can be a good preparatory and supportive step towards the final achievement test because students will become familiar with the tests and the strategies for doing them.
Final achievement tests (longer-term achievement tests) are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or members of teaching institutions. They are used to check how well learners have done after a whole course in terms of the objectives and content of the course. According to Hughes (1990: 11), there are two approaches to final achievement tests: the syllabus-content approach and the syllabus-objective approach.
The syllabus-content approach is based directly on a detailed course syllabus or on the books and other materials used. The test only contains what it is thought that the students have actually encountered, and thus can be considered, in this respect at least, a fair test. The disadvantage of this type is that if the syllabus is badly designed, or the books and other materials are badly chosen, then the results of a test can be very misleading.
forces course designers to be explicit about course objectives. Second, students' performance on the test can show how far they have achieved those objectives. This in turn puts pressure on those who are responsible for the syllabus and for the selection of books and materials to ensure that these are consistent with the course objectives. Tests based on course objectives work against the perpetuation of poor teaching practice. The author believes that test content based on course objectives is much preferable, since it provides more accurate information about individual and group achievement and is likely to promote a more beneficial backwash effect on teaching.
2.3.2 Characteristics of a good EGP test
In order to make a well-designed test, teachers have to take into account a variety of factors such as the purpose of the test, the content of the syllabus, the students' background, the goals of administrators and so forth. Moreover, test characteristics play a very important role in constructing good English for General Purposes (EGP) tests. The most important quality of a test is its usefulness. Usefulness generally consists of four main components: reliability, validity, practicality and washback.
Reliability has been defined in different ways by different authors. Berkowitz, Wolkowitz, Fitch and Kopriva (2000) define reliability as "the degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker". Bachman (1990: 24) considers test reliability as "a quality of test scores". Clearly, both views relate to the scores obtained on a test. Every test should be reliable. If a group of students were to take the same test on two occasions, their results should be roughly the same, provided that nothing has happened in the interval. Thus if the students' results are very different, the test cannot be described as reliable.
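The idea of taking the same test on two occasions can be illustrated with a small calculation. The following is only a minimal sketch, not a procedure used in this study: the scores are invented for illustration, the helper name `pearson_r` is hypothetical, and the Pearson correlation between the two sittings is assumed as one common way of expressing how consistent the results are.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the two means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (out of 10) for the same six students on two sittings.
first_sitting = [4.0, 5.5, 6.0, 7.0, 8.5, 9.0]
second_sitting = [4.5, 5.0, 6.5, 7.0, 8.0, 9.5]

test_retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A coefficient close to 1 would suggest that students are ranked in much the same way on both occasions; a much lower value would suggest that the scores are not dependable.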
Validity refers to the degree to which a test actually measures what it was designed to measure. Validity is often discussed under the headings: face, content, construct and criterion-related.
Content validity
This is a non-statistical type of validity that involves "the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured" (Anastasi & Urbina, 1997: 114). A test has content validity built into it by careful selection of which items to include. Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcroft et al. (2004: 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts will be able to review the items and comment on whether the items cover a representative sample of the behavior domain.
Construct validity
A test has construct validity if it accurately measures a theoretical, non-observable construct or trait. The construct validity of a test is worked out over a period of time on the basis of an accumulation of evidence. There are a number of ways to establish construct validity. Two methods of establishing a test's construct validity are convergent/divergent validation and factor analysis.
A test has convergent validity if it has a high correlation with another test that measures the same construct. By contrast, a test's divergent validity is demonstrated through a low correlation with a test that measures a different construct.
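The contrast between convergent and divergent validation can be sketched numerically. The example below is an illustration under stated assumptions only: all scores are invented, the three test names are hypothetical, and a simple Pearson correlation is assumed as the measure of association.

```python
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return numerator / denominator


# Hypothetical scores for the same six students on three tests.
new_reading_test = [3, 5, 6, 7, 8, 9]
established_reading_test = [4, 5, 5, 7, 9, 9]  # assumed to measure the same construct
maths_test = [7, 4, 8, 5, 6, 6]                # assumed to measure a different construct

convergent_r = correlation(new_reading_test, established_reading_test)
divergent_r = correlation(new_reading_test, maths_test)

print(f"correlation with the same-construct test: {convergent_r:.2f}")
print(f"correlation with the different-construct test: {divergent_r:.2f}")
```

With these invented figures the first coefficient is high and the second is close to zero, which is the pattern that convergent and divergent validation look for.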
Factor analysis is a complex statistical procedure which is conducted for a variety of purposes, one of which is to assess the construct validity of a test or a number of tests.
Face validity
Hughes (1989) states that "a test is said to have face validity if it looks as if it measures what it is supposed to measure". Anastasi (1982: 136) pointed out that face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears superficially to measure.
Face validity is very closely related to content validity. While content validity depends on a theoretical basis for assuming that a test assesses all domains of a certain criterion, face validity relates to whether a test appears to be a good measure or not.
Criterion-related validity
Criterion-related validity is a concern for tests that are designed to predict someone's status on an external criterion measure. A test has criterion-related validity if it is useful for predicting a person's behavior in a specified situation. Criterion-related validity consists of two types (Davies, 1977): concurrent validity and predictive validity.
In concurrent validation, the predictor and criterion data are collected at or about the same time. This kind of validation is appropriate for tests designed to assess a person's current criterion status. It is good for diagnostic screening tests when you want to diagnose
In predictive validation, the predictor scores are collected first and criterion data are collected at some later/future point. This is appropriate for tests designed to assess a person's future status on a criterion.
Practicality refers to how easy a test is to construct, administer, score and interpret. A test must be carefully organized well in advance. How long will the test take? What special arrangements have to be made (for example, what happens to the rest of the class while individual speaking tests take place)? Is any equipment needed (tape recorder, language lab, overhead projector)? How is marking the work handled? How are tests stored between sittings? All of these questions are practical since they help ensure the success of a test and of testing (Heaton, 1988; Hughes, 1997; Carroll & Hall, 1985).
The last important factor in testing is backwash, or the washback effect. Washback is the effect of testing on the teaching and learning processes. Washback can be harmful or beneficial. If a test is regarded as important, then preparation for it can dominate all teaching and learning activities, negatively or positively. If the test content and testing techniques are at variance with the objectives of the course, then there is likely to be harmful washback. If the skill of writing, for example, is tested only by multiple choice items, then there is pressure to practise such items rather than practise the skill of writing itself. This harmful washback is clearly undesirable. An example that often comes up is the effect of the university entrance examinations in Vietnam on high school language teaching and learning. However, washback need not always be harmful: indeed, it can be positively beneficial. If an English test for first-year undergraduate students is designed on the basis of an analysis of the English language needs of these students, includes tasks as similar as possible to those which they would have to perform as undergraduates (reading textbooks, taking notes during lectures, etc.), and is administered instead of one which was entirely multiple choice, then beneficial washback can be achieved. There will be an immediate effect on teaching and learning: the syllabus will be redesigned, new books will be selected, classes will be conducted differently and students' way of learning will change to reflect the demands of the new test.
In a nutshell, the author has just given a general overview of achievement tests and the characteristics of a good EGP achievement test so that readers can understand how to evaluate a good final EGP achievement test.
2.3.3 Characteristics of a good ESP test
Nowadays, ESP teaching and research has achieved tremendous improvement at home and abroad. In the aspect of teaching, it has formed the system of Vocational English (VE: Business English, Tourism English, Hotel English, Medical English) and English for Academic Purposes.
"ESP is not a matter of teaching specialized varieties of English. The fact that language is used for a specific purpose does not imply that it is a special form of language, different in kind from other forms. Though the content of learning may vary, there is no reason to suppose that processes of learning should be any different for the ESP learner than for the general English learner" (Hutchinson, 1987).
From the above view, we can draw two points. One is that ESP is one kind of English, with its specific language characteristics, which is not applied to teach some particular items, and the similarity between ESP and EGP is more notable than their difference. The other is that there is no difference in essence in the teaching principles and procedure between ESP and EGP. In other words, EGP is the primary stage for ESP, and ESP is the advanced stage of EGP teaching. The testing and evaluation for ESP should be carried out in accordance with the teaching contents and objectives. Therefore, only with efficient principles and suitable teaching methods and modes can ESP be useful for stimulating the students' motivation for language learning, arousing their enthusiasm for learning, and contributing to harmony between teachers and students. Clearly, good ESP tests share the qualities of all good EGP tests. This means that every ESP test involves the four components mentioned: reliability, validity, practicality and washback.
However, two aspects of ESP testing may be said to distinguish it from more general purpose language testing: authenticity of task and the interaction between language knowledge and specific purpose content knowledge.
Authenticity of task means that the ESP test tasks should share critical features of tasks in the target language use situation of interest to the test takers. The key to this assessment is to present learners with tasks that resemble in some ways what they may have to do with the language in real life. Therefore, the ESP approach in testing is based on the analysis of learners' target language use situations and specialized knowledge of using English for real communication.
The interaction between language knowledge and specific purpose content
purpose language ability
To sum up, EGP is the pre-stage for ESP. ESP will be taught when the students have had general English grammar and knowledge. ESP tests are similar to EGP tests but focus
exam has used passages, comprehension questions and grammar exercises taken directly from English 9. Students have prepared for the exam by memorizing the book. This year, the Foreign Language Specialist writes the exam using parallel texts and exercises, not taken directly from the book, without warning anyone. This test lacks face validity. Face validity is hardly a scientific concept, yet it is very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers. In favor of this view, McNamara (2000: 133) defines face validity as a degree
comment on the appearance of the language test, although there may be little attention paid to the content of test items. Analyzing face validity of an English test is thus an attempt to gather people’s opinions on whether the test looks valid as an English test or not.
2.4.2 Relationship between reliability and validity
We often think of reliability and validity as separate ideas but, in fact, they’re related to each other. Reliability and validity are the two vital characteristics that constitute a good test. However, validity and reliability have a complicated relationship.
If the test is not reliable, it cannot be valid at all. To be valid, according to Hughes (1988: 42), “a test must provide consistently accurate measurements. It must therefore be
must also be reliable. Therefore, reliability is a necessary but not sufficient condition for validity. To understand more, the author wants to show their relationship through the following figures. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.
Figure 3: Relationship between reliability and validity
The figure above shows three possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across the target. You seldom hit the center of the target. In this case, your measure is neither reliable nor valid. Finally, you consistently hit the center of the target. Your measure is both reliable and valid. In brief, reliability is a necessary but not sufficient condition for validity.
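The target analogy can be made concrete with a small simulation. The sketch below is only illustrative: it collapses the target to one dimension, treats the center as a true score of 0, and uses made-up bias and spread values. The function names and the hit tolerance of 0.5 are hypothetical and are not part of the study.

```python
import random
import statistics


def simulate_shots(bias, spread, n=1000, seed=0):
    """Return n simulated measurements around a true value of 0.

    bias   -- systematic offset from the center (a validity problem)
    spread -- random scatter of the shots (a reliability problem)
    Both values are illustrative assumptions, not data from the study.
    """
    rng = random.Random(seed)
    return [bias + rng.gauss(0, spread) for _ in range(n)]


def scatter(shots):
    """Standard deviation of the shots: small means consistent (reliable)."""
    return statistics.pstdev(shots)


def hit_rate(shots, tolerance=0.5):
    """Share of shots landing within `tolerance` of the center (valid hits)."""
    return sum(abs(s) <= tolerance for s in shots) / len(shots)


# The three situations described for the figure, with assumed parameters.
scenarios = {
    "reliable but not valid": simulate_shots(bias=3.0, spread=0.2),
    "neither reliable nor valid": simulate_shots(bias=2.0, spread=3.0),
    "reliable and valid": simulate_shots(bias=0.0, spread=0.2),
}

for name, shots in scenarios.items():
    print(f"{name}: scatter = {scatter(shots):.2f}, "
          f"hit rate = {hit_rate(shots):.2f}")
```

In this sketch, a low scatter stands for reliability and a high hit rate for validity; the first scenario shows that low scatter alone does not guarantee that the center is hit.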
2.4.3 Reasons for choosing face validity
As the relationship between reliability and validity shown above indicates, validity is an indispensable quality of all good tests. Hughes (1982: 22) says that “the greater a test’s content validity is, the more likely it is to be an accurate measure of what it is to measure”. Therefore, from the outset of test construction, test validity should be the most essential part of all.
Validity of a language test has four facets, namely face validity, content validity, construct validity and criterion-referenced validity. However, the author focuses on face validity for the following reasons.
Firstly, the latter three facets of validity (content validity, construct validity and criterion-referenced validity) are excluded from this research because of the limitation of time and sources. Anastasi (1982: 136), as cited by Weir (1990: 26), stated that “face validity is not validity in the technical sense”. Face validation is significant in that it concerns whether or not the test “looks valid” to those who deal with the test, so the researcher performs the analysis of face validation. Heaton (1988: 60) argued that “face validity can provide not only a quick and reasonable guide but also a balance to too great a concern with statistical analysis.” He points out that the students’ motivation is maintained if a test has good face validity. Face validity plays a certain role in any test and it is of great concern in this thesis. According to Anastasi & Urbina (1997: 114), content validity is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured”. Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. Obviously, content validity needs a representative sample test to analyze and compare. According to Bachman and Cohen (1998: 50), construct validation deals with the “judgmental and empirical justifications supporting the inferences made from test scores”. Bachman and Palmer (1996: 21) also mention that construct validation is related to the “meaningfulness and appropriateness” of the researcher’s interpretations relevant to the actual test scores. Bachman (1990: 248) mentions that criterion-referenced validity deals with demonstrating “a relationship between test scores and some criterion which is believed as an indicator of the ability tested”. However, this research is not provided with the actual test scores or a reliable sample test, and thus excludes the three validation processes from its investigation.
Secondly, face validity is chosen because of its importance in society. As Hughes (1989) says, face validity is hardly a scientific concept, yet it is very important, and a test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers. Agreeing with this view, Huong (2000: 69) points out that test appearance is an important consideration in test use. She supposed that useful information can be obtained to inform test development by investigating the test takers’ perception of the appropriateness of the test and the connection between test takers and relevant real-life tasks that test takers later encounter. Clearly, face validity of the test is very important in society's evaluation because the latter three facets of validity belong to the specialization of test designers.
This study only helps the author to give a test suitable for students’ ability at NUTE. Therefore, the reasons discussed here are regarded as a strong impetus that initiates this thesis into investigating the face validity of the achievement language test 12.
be redesigned (Alderson, Clapham & Wall, 1995). Therefore, it’s necessary to give measures to increase face validity regarding the administration and content of the test as follows:
- Test format is familiar and clear to the students;
- The quantity of questions designed in a test is suitable for the time allowance;
- Test conditions (space and atmosphere in the testing room in particular) are biased for best, that is, they bring out students’ best performance;
- Test has to be assured equally by collecting materials before testing and stricter in
- Test content reflects syllabus objectives and covers totally what the students have been taught in the course;
- Task types are familiar to students;
- The difficulty level of each task is appropriate to students’ language ability;
- Instructions are crystal clear.
factors to evaluate the final achievement test 12 in terms of face
CHAPTER 3: THE STUDY
3.1 English learning and teaching at Nam Dinh University of Technology Education
3.1.1 Students’ backgrounds
Students who have been learning at Nam Dinh University of Technology Education (NUTE) are of different levels of English because of their own backgrounds. Generally, students who are from small towns and mountainous areas have less chance of learning English than those from the big cities, where much attention is paid to foreign language learning. Moreover, there are some students who have never learned English before because in high school they learned other foreign languages such as Russian, French or Chinese; some have been learning English for only a few years. Besides, they enter the university easily because they don’t have to take any entrance exams. Instead, they only submit their dossiers to be considered and evaluated. Therefore, their attitude towards learning English in particular and other subjects in general is often poor.
3.1.2 The English teaching staff
Nam Dinh University of Technology Education is one of the main universities that train engineers and teachers. It has been a university for 5 years. Previously, the English subject belonged to the General Department. Since 2009, the English subject has come under the Faculty of Foreign Languages. Because it was founded recently, the foreign language department only focuses on English, not other foreign languages. The department includes 11 teachers; three of them are older (over 43 years old) and the rest are young (from 23 to 30 years old). They have to teach both English for General Purposes (EGP) and English for Specific Purposes (ESP) for majors in economics, computing, electronics, welding and automobiles. All the English teachers here have been well trained in Vietnam and none of them has studied abroad. Three of them obtained a Master's Degree in Managing Education, not a Master's Degree in English; six of them have been doing an M.A. English course. They prefer using Vietnamese in class as they find it easy to explain lessons in Vietnamese due to the limitation of students’ English ability. Furthermore, they are always fully aware of adapting suitable methods of teaching to their classes. This results in students’ high involvement in the lesson.
3.1.3 Objectives of the English course
The university has many technological fields, but electronics is a very big field of the university. So, seven of the eleven teachers taught English in the electronic field. The English for Specific Purposes (ESP) syllabus for Electronic-Electrical Engineering Department students is designed by teachers of the foreign language department and has been applied for 4 years. Before starting an ESP term, students have to learn two EGP terms with 60 periods each, covered by Headway (at Elementary and Pre-intermediate level), in which the students only pay attention to reading skill, vocabulary and grammar, while other skills such as listening, writing and speaking are ignored. After finishing two EGP terms, the students work with the Electronic-Electrical English textbook (edited by teachers of the Foreign Language Department at NUTE) with 30 periods consisting of 8 units practising reading skills such as skimming, scanning and detailed reading, translation ability (translating into Vietnamese or English) and providing ESP vocabulary, as shown in table 1.
Table 1: ESP syllabus content allocation
2 | Circuit Elements | 03 | Reading skill, vocabulary and translation
3 | The DC Motor | 03 | Reading skill, vocabulary and translation
4 | The Cathode Ray Tube | 03 | Reading skill, vocabulary and translation
6 | Electronics in the home | 03 | Reading skill, vocabulary and translation
7 | Semiconductor Diodes | 03 | Reading skill, vocabulary and translation
9 | Audio Recording System | 03 | Reading skill, vocabulary and translation
10 | Review | 03 | Reading skill, vocabulary and translation
This course book focuses on reading skills, vocabulary and translation ability, with a little attention to language focus. Therefore, the final test only intends to measure the reading skills, grammar abilities, and vocabulary but not listening and speaking skills. Besides, the
- In the 1st and 2nd terms, teachers revise students’ grammar knowledge and help them to practice skills with
- In the 3rd term, teachers consolidate students’ reading skills, provide ESP vocabulary and instruct them how to do translation tasks, which are very useful in their future job
special focus on reading skill in order to
However, the common goal covering the above objectives is to equip the students with general English grammar, vocabulary, some skills in reading comprehension and translation, and the general background of Electronic-Electrical English necessary for their future work.
3.1.4 Checklist of the course book
The checklist of language skills and sub-skills taught in the course will help the author easily compare the content areas with the current final test 12.
- Grammar:
  - simple present tense;
  - present perfect tense;
  - simple past tense;
  - linking words: and, or, but, however, therefore, because, although, etc.;
  - pronoun references and possessive adjectives.
- Translation: mainly sentences related to the reading text in each unit;
- Reading: topics of the reading texts are mentioned in table 1, focusing on the main reading skills as follows:
  - skimming skills;
  - scanning skills;
3.1.5 Objectives of the final test
The teachers give final achievement tests at the end of each semester to achieve the
following purposes:
- To assess students’ ability in reading comprehension, which is limited to understanding main ideas, extracting information, guessing words in context and making a little inference and
  - recognition and use of tenses;
  - recognition and use of voices;
  - recognition and use of modal verbs;
  - recognition and use of relative clauses.
- To assess students’ ability to translate the technical materials into English in such a way as follows:
  - express grammatically correct ideas within basic structures presented in the course book, express students’ technical vocabulary field;
  - link ideas using linking words “and, but, or, because, so, although, etc.”;
  - use reference pronouns and possessive adjectives.
- To assess students’ ability to translate the technical materials into Vietnamese in such a way as follows:
  - express students’ technical vocabulary field;
  - express students’ background and specialized knowledge.
- To check how much the students have acquired the target language skills and knowledge, and how far the objectives of the course have been achieved in the set timeframe;
- To help students to see what they have achieved during their learning process;
- To help teachers identify teaching method, syllabus and material in order to adjust and adapt to the students’ needs and capacities.
3.1.6 Difficulty level and discrimination of the final test
According to Bloom (1956), the cognitive domain involves knowledge and the development of intellectual skills. This includes the recall or recognition of specific facts, procedural patterns, and concepts that serve in the development of intellectual abilities and skills. There are six major categories, which are listed in order below, starting from the simplest behavior to the most complex. The categories can be thought of as degrees of difficulty. The first ones must normally be mastered before the next ones can take place.
- Knowledge: Recall data or information.
- Comprehension: Understand the meaning, translation, interpolation, and interpretation of instructions and problems. State a problem in one's own words.
- Application: Use a concept in a new situation or unprompted use of an abstraction. Applies what was learned in the classroom into novel situations in the work place.
- Analysis: Separate material or concepts into component parts so that its organizational structure may be understood. Distinguish between facts and inferences.
- Synthesis: Build a structure or pattern from diverse elements. Put parts together to form a whole, with emphasis on creating a new meaning or structure.
- Evaluation: Make judgments about the value of ideas or materials.
Based on the objective of the course book and the special objective of the test, the test aims at measuring the students’ knowledge and comprehension ability. Clearly, the difficulty level and discrimination of the test in the course are limited to the students’ knowledge and comprehension level. The current test does not check students’ ability to use English (application level) but checks whether students have carefully studied the textbook or not.
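The ordering of Bloom's categories, and the ceiling the current test reaches, can be written down as a small data structure. This is a minimal sketch for illustration only: the class name `BloomLevel`, the constant `TEST_CEILING` and both helper functions are hypothetical names introduced here, and the numeric values simply encode the order given in the list above.

```python
from enum import IntEnum


class BloomLevel(IntEnum):
    """Bloom's (1956) cognitive categories, from simplest to most complex."""
    KNOWLEDGE = 1
    COMPREHENSION = 2
    APPLICATION = 3
    ANALYSIS = 4
    SYNTHESIS = 5
    EVALUATION = 6


# Highest level the current test targets, as stated in the text above.
TEST_CEILING = BloomLevel.COMPREHENSION


def is_assessed(level):
    """True if the given level lies at or below the ceiling of the test."""
    return level <= TEST_CEILING


def prerequisites(level):
    """Levels that are normally mastered before the given one."""
    return [lower for lower in BloomLevel if lower < level]


for level in BloomLevel:
    status = "assessed" if is_assessed(level) else "not assessed"
    print(f"{level.name.title()}: {status}")
```

Representing the levels as ordered integers makes the statement that the test stops below the application level a simple comparison.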
3.2 English testing at Nam Dinh University of Technology Education
3.2.1 Testing situation
ESP tests for students at NUTE are designed by the teachers of the Foreign Language Department. Each teacher is responsible for making two tests for each class which they are teaching at the end of the semester. It means that in the 3rd semester there are fourteen ESP tests. All the fourteen tests are designed in the light of the syllabus-content approach and follow a common format. Then, the fourteen tests are collected and checked by the head of the ESP subject. Finally, the fourteen tests are sent to the educational testing and quality assurance department. This department is in charge of preparing the test by randomly choosing one of the fourteen tests and printing it out. Within the limited scope of the study, the author would like to focus on test 12, which was given to K3 students at the Electronic-Electrical Engineering Department in the final exam of semester 3.
3.2.2 The current final achievement test
ESP test 12 is a syllabus-based final achievement test in semester 3. The test consists of four parts. In the first part, the test requires the students to read an ESP text and then decide whether statements about the text are true or false. Part 2 is the grammar part, which aims at testing their knowledge of grammar. This part also helps students to improve their mark. Finally, the last two parts, on translation, are aimed at testing the students’ general understanding of their vocabulary and their use of language (see table 2).
Table 2: Specification of test 12
I | Reading comprehension | Narrative text relating to electric (related to topic in unit 1) | x5, deciding true or false | 2.5
II | Grammar | Sentences (passive | x5, putting the right | 2.5
translation (related to topics in unit 2, 4, 6, 7, 8)
translation | Vietnamese (related to topics in unit 2, 3, 4, 9)
From the above table, we can see that the content of the current test 12 totally aligns with what students have been taught, including a reading comprehension text with a topic in unit 1, a grammar task using passive voices and a translation task with the content of the sentences from unit 2 to unit 9 in the syllabus. The task types are also
3.3.2 Interview and informal discussion
The author collects more information by interviewing teachers of the foreign language department and students of classes DDT-3A and DDT-3B. The questions used are primarily based on the above-mentioned questionnaires, but the author focuses on the reasons for their choices. The results of the interviews are noted down or recorded to compare with those of the questionnaire so that any variance can be revealed and adjusted with other methods.
For more information, the author also holds some discussions with teachers of the foreign language department about the failure and success of language testing in general and test 12 in particular. The results are used as supportive data for the above-mentioned methods.
3.4 Data analysis of survey questionnaires and interviews
More than two hundred questionnaires (see the questionnaire in appendix 1) were administered to the second-year students of K3, but only 142 samples were collected, and 7 questionnaires (see the questionnaire in appendix 2) were given to teachers at the foreign language department at NUTE. The author intends to collect data to classify the similarities and differences in perceptions of the test between teachers and students at NUTE. The data was collected from the students’ and teachers’ survey questionnaires with 31-34 questions in total (31 questions for students and 34 questions for teachers), which are divided into two main parts. Part A consists of 10 questions asking for students’ and teachers’ comments on the administration of the test. Part B has four smaller parts with 21-24 questions asking for students’ and teachers’ comments on the face validity of the test: general opinion (6 questions for students and 7 questions for the teachers), reading comprehension task (6 questions for students and 8 questions for the teachers), grammar task (4 questions for students and teachers), and translation task (5 questions for students and teachers).
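The question counts above can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures from the paragraph; the dictionary layout and the function name `total_questions` are hypothetical and are not taken from the questionnaires themselves.

```python
# Question counts per questionnaire section, as described in the text above.
SECTIONS = {
    "Part A: administration": {"students": 10, "teachers": 10},
    "Part B: general opinion": {"students": 6, "teachers": 7},
    "Part B: reading comprehension": {"students": 6, "teachers": 8},
    "Part B: grammar": {"students": 4, "teachers": 4},
    "Part B: translation": {"students": 5, "teachers": 5},
}


def total_questions(group, prefix=""):
    """Sum the questions for one group over sections whose name starts with prefix."""
    return sum(
        counts[group]
        for name, counts in SECTIONS.items()
        if name.startswith(prefix)
    )


for group in ("students", "teachers"):
    print(
        f"{group}: Part A = {total_questions(group, 'Part A')}, "
        f"Part B = {total_questions(group, 'Part B')}, "
        f"total = {total_questions(group)}"
    )
```

The totals reproduce the 21-24 questions of Part B and the 31-34 questions overall that are reported in the paragraph.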
3.4.1 Data analysis of the administration of the test
3.4.1.1 Data analysis of the format of the test
The results of the survey questionnaires on the format of the test are analyzed in table 3 (see appendix 4). From table 3, we can find that there is no big difference between teachers’ and students’ perception of the format of the test. In the first question, 100% of the teachers and 69.7% of the students agree that the font Times New Roman at size 12 is suitable, while 22.5% of the students think that it is rather suitable and 7.8% of the students admit that it is not suitable for the test. When asked to give the reason why it is not suitable, most students want to have the test with a larger font if possible (font 14 as suggested). However, they can still read the test with font 12. It means that the font is OK in this test.
In the second question, 100% of the teachers and 69% of the students suppose that copies of the test are clear, while only 26.7% of the students think that they are rather clear and 4.3% of the students think that they are not clear. Few students suppose that they
of questions. When the students are interviewed about the reason why they cannot complete the test with the same quantity of questions as mid-term tests, they answer that in class they can complete the test because there are many students sitting in a room. Therefore, they can exchange the test easily. When sitting in a testing room, they have to do the test by themselves, so they find it difficult to finish the test with too many questions. It means that the difference in opinion results from the unfairness of the mid-term test and the students’ laziness.
Regarding the suggestions for the improvement of test format, 86 students (60.6%) and 2 teachers (28.6%) give suggestions. They suppose that sentences 3 and 4 in part IV of the test should be shortened and simplified because exercises on translating into English are difficult, although they are available in the course book. Meanwhile, 5 teachers (71.4%) and 56 students (39.4%) do not give any suggestions for the format of the test because they suppose that the test format is OK.
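The percentages in this paragraph follow directly from the raw counts and the number of returned questionnaires (142 students, 7 teachers). The sketch below shows the arithmetic; the helper name `percentage` and the variable names are hypothetical, and rounding to one decimal place is an assumption that matches the figures reported here.

```python
def percentage(count, total, digits=1):
    """Share of `count` in `total` as a percentage, rounded to `digits` places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, digits)


N_STUDENTS = 142  # student questionnaires collected
N_TEACHERS = 7    # teacher questionnaires

# (count, group size) pairs taken from the paragraph above.
responses = {
    "students giving suggestions": (86, N_STUDENTS),
    "students giving no suggestions": (56, N_STUDENTS),
    "teachers giving suggestions": (2, N_TEACHERS),
    "teachers giving no suggestions": (5, N_TEACHERS),
}

for label, (count, total) in responses.items():
    print(f"{label}: {count}/{total} = {percentage(count, total)}%")
```

The same calculation can be applied to the other questionnaire items, provided the raw counts are available.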
It can be deduced that in the first two questions, both students and teachers agree that the font size and copies of the test are good and do not affect the students’ test results. It means that the current test assures fairness in the exam. However, in the third question, students and teachers differ in their perception of the quantity of questions in the test, which is due to administering the mid-term test unequally. As a result, they give suggestions in the fourth question.
3.4.1.2 Data analysis of the logistics of the test
The results of the survey questionnaires on the logistics of the test are analyzed in table 4 (see appendix 4). In students’ opinion, time allowance for this test is not enough with 82 choices (57.7%), adequate with 50 choices (35.2%) and too much with 10 choices (7.1%). When interviewed to give the reasons, the students say that they cannot complete the test in such a short time because of their limited language abilities, whereas 28.6% of the teachers suppose that time allowance for this test is not enough and 71.4% think it is enough. Most teachers believe that students can do this test completely in a short time because they are informed about the time of the test and practice mid-term tests in that same time with the same format. The difference in opinion also results from not administering the mid-term test fairly and the students’ laziness.
To the question of whether the rooms for testing are big enough, 83.1% of the students and 100% of the teachers choose option “Yes”. Only 16.9% of the students, who choose “No”, believe that the testing rooms are not big enough. The reason for this choice