TRAN THUY QUYNH
THE CONTENT VALIDITY OF THE CURRENT ENGLISH ACHIEVEMENT TEST FOR SECOND YEAR
NON MAJOR STUDENTS AT PHUONG DONG
UNIVERSITY
(An evaluation of the content appropriateness of the end-of-term English test
for second-year non-major students
at Phuong Dong Private University)
M.A. MINOR THESIS
Field: ENGLISH TEACHING METHODOLOGY Code: 60-14-10
Course: 18 (2009-2011)
Supervisor: M.A. Kim Van Tat
HA NOI - SEPTEMBER 2011
List of tables and figures
The table of contents
2.1.2 The roles of language testing
2.1.3 Relationship between testing and teaching-learning
2.2 Major characteristics of a good test
2.2.1 Test validity
2.2.1.1 What is test validity?
2.2.1.2 Types of test validity
2.2.3 Relationship between reliability and validity
Chapter 3: The study
3.1 English learning, teaching and testing at Phuong Dong University
3.1.1 The students
Appendix 1: The content of the course book
Appendix 2: Survey questionnaires for students
Appendix 3: Survey questionnaires for teachers
Appendix 4: Answer key for reading task
Appendix 5: Answer key for the new final achievement test
LIST OF TABLES AND CHARTS
Table 1: Scores on test A (invented data) by Arthur Hughes
Table 2: Scores on test B (invented data) by Arthur Hughes
Table 3: The components of the final achievement test
Table 4: What students had been taught and what they had been checked on in parts I, II and III of the test
Table 5: What students had been taught and checked on in the writing part
Table 6: Paper specification grids for the final achievement test
Chart 1: Students' comments on the validity of the test
Chart 2: Students' comments on the time allowance of the test
Chart 3: Students' comments on the difficulty level of the test
Chart 4: The results of the test
Chart 5: The purpose of the test
ways of teaching and learning. In addition, testing is one effective way to evaluate teaching and learning. They are closely related: testing validates the teaching-learning process, while teaching and learning provide a great source of language materials for testing to exploit. Testing is therefore a matter of concern to all teachers.
During her teaching time at Phuong Dong University, the writer heard both teachers and students complaining that the English test often did not faithfully reflect the teaching and learning process; in other words, the test did not reflect what the students had learnt and what the teachers had taught. What was tested had not really been taught, and the test measured neither the achievement of the course objectives nor the expected skills and knowledge of students. This view of recent language testing is shared by some test researchers, such as Brown (1994: 373) and Hughes (1989: 1):
"A great deal of language testing is of very poor quality. Too often language testing has a harmful effect on teaching and learning, and too often they fail to measure accurately whatever it is they are intended to measure."
Another reason for the selection of this research topic lies in the fact that language testing at Phuong Dong University has not been paid enough attention. Classroom language tests were often written in a hurry because the teachers here could not find time to think carefully and plan the test. Sometimes, they did not have a clear idea of what they [...] design a good test to have exact, fair and effective evaluation of students' knowledge and [...] techniques and testing theories.
For all the above-mentioned reasons, the writer is encouraged to undertake this study entitled "Content validity of the current English achievement test for second-year non-major students of English at Phuong Dong University", which aims at finding out the strengths and weaknesses of this test in terms of content validity and at suggesting solutions, if any, for its improvement.
1.2 Scope of study
The scope of this thesis is limited to evaluating the final achievement test in terms of its content validity by comparing the objectives, the syllabus and the textbook allocation with the test contents. The study provides and analyzes data on the currently used test and proposes practical suggestions for its improvement.
Due to limitations of time, ability and conditions, it is impossible for the writer to cover all the tests. Only some suggestions for the improvement of the test are presented.
1.3 Aims of study
The study aims at checking the content validity of the final achievement test for second-year non-major students at Phuong Dong University. It places strong emphasis on analyzing the contents of the final achievement test.
The specific aims of this research are:
- To find out the strengths and weaknesses of the currently used test with reference to content validity.
- To suggest some improvements for the test.
1.4 Methods of study
In order to achieve the above-mentioned aims, a combination of methods was used.
Firstly, the writer drew on the theories and principles of language testing and the major characteristics of a good test, with a special focus on test content validity. From her own reading, many reference materials were gathered and analyzed to draw out a theoretical basis for evaluating the achievement test being used for second-year students in terms of its content validity. Based on what students had learnt in their first semester and the contents of this test, the writer examined its content validity.
In addition, qualitative methods involving data collected through survey questionnaires were employed. Two sets of questionnaires were administered to English teachers and students at Phuong Dong University to investigate their evaluative comments on the content validity of the final achievement test and their suggestions for its improvement.
1.5 Research questions
In this study, the writer tries to answer the following two questions:
Question 1: What are the strengths and weaknesses of the final achievement test for second-year non-major students at Phuong Dong University with reference to content validity?
Question 2: What are some suggested solutions for the improvement of the test?
1.6 Design of study
The thesis is organized into four major chapters:
1. Chapter 1 INTRODUCTION presents such basic information as the rationale, the aims, the methods, the research questions and the design of the study.
2. Chapter 2 LITERATURE REVIEW presents a review of related literature that provides the theoretical basis for evaluating and building a good language test. This review includes background on language testing, criteria of good tests and theoretical issues on test content validity.
3. Chapter 3 THE STUDY describes the methods used in the research and presents the detailed results of the surveys, including the questionnaires and the analysis of the final achievement test, in order to find out its problems with reference to content validity.
4. Chapter 4 RECOMMENDATIONS AND CONCLUSIONS. Recommendations provide some suggestions for the improvement of the final achievement test based on the theoretical and practical study mentioned above. Conclusions summarize the matters of research, its findings and its limitations.
This chapter provides an overview of the theoretical background of the study. It includes three main sections.
2.1 Language testing
2.1.1 Definition of language testing
Testing is an important part of every teaching and learning experience and has become one of the main aspects of methodology. The issue of language testing and its significant role has been discussed a great deal by many professionals and researchers worldwide. Different definitions of language testing have been given from various points of view.
According to Allen (1974: 313), testing is an instrument to ensure that students have a sense of competition rather than to know how good their performance is and under which conditions a test can take place. He says: "Test is a measuring device which we use when we want to compare an individual with other individuals who belong to the same group."
Carroll (1986: 46) stresses that a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. In other words, a test is a measurement instrument designed to elicit a particular behavior of each individual.
According to Bachman (1990: 20), what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior. This distinction is believed to be of great importance as it reflects the primary justification for the use of language tests and has implications for how we design, develop, and use them to best effect. Thus, language tests can provide the means for focusing more closely on the specific abilities of interest.
In the view of Ibe (1981: 1), a test is "a sample of behavior under the control of specified conditions" which "aims toward providing a basis for performing judgment." The term "a sample of behavior" used here is quite broad and means something other than the traditional paper-and-pencil types of test.
Yet Heaton (1988: 5) has a different opinion. In his view, tests are considered a means of assessing students' performance and of motivating them. He takes a positive view of tests, as many students are eager to take tests at the end of the semester to know how much knowledge they have gained. Importantly, he also points out the relationship between testing and teaching.
+ To assess a learner's proficiency in a language in relation to future language use; for example, to find out if a person's language is good enough for him to become a tourist guide. This concerns the future use of the language regardless of what language programs or materials the testee went through.
+ To diagnose a learner's strengths and weaknesses in a language and to attempt to explain why certain problems occur and what treatments could be used to tackle those problems.
+ To classify or place the testees in the appropriate language class.
+ To measure the testee's aptitude for learning a language.
+ To evaluate the effectiveness of a language program. This is often done by using experimental and control classes with the same educational objectives but using different methods and materials to achieve these objectives (Brown, 2000: 5).
From another angle, Rebecca M. Valette (1977: 3) comments that classroom tests play three important roles in a second language teaching program: defining course objectives, stimulating student progress and evaluating class achievement.
Firstly, classroom tests help us to define the course objectives. Students are quick to observe the types of tests given and to study accordingly. Thus, much as the teacher may emphasize oral fluency in the classroom, if all tests are written tests the students will soon concentrate on perfecting the skills of reading and writing.
Secondly, tests help stimulate student progress. As much as possible, the time given over to classroom testing should provide a rewarding experience. The test should furnish an opportunity for the students to show how well they can handle the specific elements of the target language. Gone are the days when the teacher designed a test to point [...] advance to permit the students to prepare adequately. If the students themselves are expected to demonstrate their abilities, it is only proper that they should learn as soon as [...] relation to the rest of the class.
2.1.3 Relationship between testing and teaching-learning
In the past, teaching and testing used to be separate both theoretically and practically. According to Williams (1983), a test is a necessary imposition from outside the classroom, but it is an unpleasant one for two main reasons. The first is that testing is concerned with competition rather than cooperation. Thus, while classroom activities may involve pair work and group work, such cooperation during a test is condemned as copying, and the individual is expected to work alone. Even if group tests were perfectly possible, the results of a group test would tell us very little about each individual in that group. In the same way, testing does not admit cooperation between teachers and learners. The teacher who helps and encourages the learners with their tasks and responds to their difficulties withdraws that cooperation in a test situation. The other reason, which follows from the first, is that there should be a winner and a loser in the test. To be sure, those who come close to winning do not feel too upset, but those who gain little from the experience may feel self-conscious.
Nowadays, a new trend and development with a remarkable emphasis on integrative and communicative tests has brought about many innovations in English testing techniques. Most researchers comment that teaching and testing are closely related. As Brown (1994) states: "Teaching and testing are so interwoven and interdependent that it is difficult to tear them apart." Tests are constructed primarily as devices to reinforce learning and to motivate the students, and as a means of assessing the students' performance in the language. In other words, a test is an extension of classroom work, providing teachers and students with useful information that can improve both the teaching and the learning process. In turn, teaching and learning provide a great source of language materials for testing to exploit.
A good test is a valuable teaching device for several reasons. Firstly, a test provides teachers with information on how effective their teaching has been. It helps to find out whether students are capable of performing the behavior taught, and from that we can learn the characteristics of an individual. Secondly, with the aid of tests, teachers can monitor and evaluate students' learning and diagnose strengths and weaknesses as they occur. Last but not least, based on the test results, teachers can evaluate the effectiveness of the syllabus as well as the methods and materials they are using.
However, testing can have either a harmful or a beneficial effect on teaching and learning. For example, if a test is regarded as important, then preparation for it can come to dominate all teaching and learning activities. If the end goal is to help students pass the test or examination, many teachers will focus their teaching on the content of the test only, so the teaching program may be distorted in many ways.
2.3 Major characteristics of a good test
Before writing a test, it is necessary to answer this question: "What are the major characteristics of a good test?" Harrison (1983: 10) claims that there are four basic characteristics of all good tests: validity, reliability, practicality and discrimination.
Validity is one of the most important characteristics of a good test. It has been a controversial issue for a long time. A recent trend in language testing discussion is to consider validity as a unitary concept, with what were previously treated as different types of validity now considered aspects of validity.
Henning (1987: 5) defines validity as follows:
"In general validity refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test, then, may be valid for some purposes, but not for others."
A test is considered valid when it specifically measures what it is supposed to measure. A listening test with written multiple-choice options may lack validity if the printed choices are so difficult to read that the exam actually measures reading comprehension as much as it does listening comprehension. It is least valid for students who are much better at listening than at reading. In other words, the test results must be interpretable as appropriate to the purposes of testing. That is, validity can be defined as the degree to which a test actually tests what it is intended to test. For example, if the purpose of a test is to test the ability to communicate in English, the test is valid if it does actually test the ability to communicate. When test validity is understood as the degree to which a test measures what it is supposed to measure, it has two very important aspects. The first is that validity is a matter of degree: some tests are more valid than others. The second is that tests are only valid or invalid in terms of their intended uses. If a test is intended to test reading ability but also tests writing, then it may not be valid for testing reading alone, though it may be valid for testing reading and writing together.
Validity refers to the appropriateness or correctness of the inferences and decisions made about individuals and groups from the test results. Validity must be considered in terms of the correctness of a particular inference about test takers. Therefore, validity is not always easy to measure.
2.3.1.2 Types of test validity
There are many types of validity, such as face validity, content validity, construct validity, concurrent validity and predictive validity. In this part, the writer will focus on only two main types: face validity and content validity.
2.3.1.2.1 Face validity
When considering face validity, we should ask this question: "Does the test, on the face of it, appear from the learners' perspective to test what it is designed to test?"
Face validity is almost always perceived in terms of content. If the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived. According to Arthur Hughes (1989: 40), a test is said to have face validity if it looks as if it measures what it is supposed to measure. For example, a test which pretended to measure pronunciation ability but which did not require the candidate to speak may be thought to lack face validity. Candidates, teachers and education authorities may not accept a test which does not have face validity. Face validity concerns the appeal of the test to popular or non-expert judgment, such as that of the candidates, the candidates' families and members of the public, and it is gauged by asking other teachers to give their opinions about the test.
However, with the advent of communicative language testing, there has been increased emphasis on face validity. It is important for a communicative language test to look like something one might do "in the real world" with language. Such appeals to "real life" are attributed to face validity. While the opinions of students about the test are not expert, they can be important because they are a kind of response that you can get from the people who are taking the test. If a test does not appear to be valid to the test takers, they may not try their best, so the perceptions of non-experts are useful.
In other words, face validity affects the response validity of the test. This critical view of face validity provides a useful method for language test validation.
Face validity can provide not only a quick and reasonable guide but also a balance to too great a concern with statistical analysis. Moreover, students' motivation is maintained if a test has good face validity. On the other hand, if the test appears to have little relevance in the eyes of the students, it will clearly lack face validity. It is possible for a test to include all the components of a particular teaching program being followed and yet at the same time lack face validity. The concept of face validity is far from new in language testing, but the emphasis now placed on it is relatively new. In the past, many test writers regarded face validity simply as a public relations exercise. Today, most designers of communicative tests [...]
In Read's opinion (1983: 6), the most relevant type of validity for classroom testing is content validity, which means that the contents of the test should reflect the contents and the objectives of the syllabus that is being followed. In other words, if we want to find out students' progress in what they have learnt, the test should contain a representative sample of the items, rules, skills or functions that they are supposed to achieve. Obviously, the test contents are the main concern if content validity is to be achieved.
Kerlinger (1973) defines content validity as the representativeness or sampling adequacy of the content, the substance, the matter and the topics of a measuring instrument.
In the same way, Harrison (1983: 11) defines content validity as:
"Content validity is concerned with what goes into the test. The content of a test should be decided by considering the purpose of the assessment, and then drawing up a list known as a content specification."
According to Cyril J. Weir (1990), the purpose of content validation is to examine whether the test is a good representation of the material that needs to be tested and to ensure the defensibility and fairness of interpretations based on test performance. It involves looking at empirical evidence, that is, the hard facts emerging from data from test trials or operational administrations, and is assessed by comparing the test with its course objectives. Last but not least, a test is said to be valid if it is relevant to the aims and purposes of the learning areas on which it is set.
The main distinction between face validity and content validity was pointed out by Alderson et al. (1995: 173) as follows:
"In face validation we do not necessarily accept the judgment of others, although we respect it, and appreciate that for those people it is real and important and may, therefore, influence behaviors. In content validation, we gather judgments from people we are prepared to believe."
In this case, whereas face validity is an appeal to lay observers such as students and administrators, content validity rests on the opinion of subject experts (i.e., teachers and test makers) as to whether a test is valid.
For Kelly (1978), content validity seems to be "an almost completely overlapping concept" with construct validity. And for Moller (1982: 68), "the distinction between construct and content validity in language testing is not always very marked, particularly for tests of general language proficiency." In these cases, particular attention must be paid to content validity in an attempt to ensure that the sample of activities included in a test is as representative of the target domain as possible.
To sum up, the writer is in favor of Read's idea that the most important characteristic of a good test is content validity, which means that the contents of the test should reflect the contents and the objectives of the syllabus that is being followed.
2.3.1.2.2.2 How to make the test more valid?
Firstly, in content validation, we should look at whether the test is representative of the skills it is trying to test. This means that we should look at the content of the test and compare it with a statement of what the content ought to be. This involves looking at the syllabus (in the case of an achievement test) and the test specifications, and deciding what the test was intended to test and whether it accomplishes what it is intended to do. In other words, content validity depends on the particular course objectives. In addition, the test would have content validity only if it included a proper sample of the relevant structures. Just what the relevant structures are will depend, of course, upon the purposes of the test. In order to judge whether a test has content validity or not, we need a specification of the skills or structures that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test, but it will provide the test constructor with the basis for making a principled selection of elements to include in the test. A comparison of test specifications and test contents is the basis for judgments of content validity.
However, how important is content validity? Arthur Hughes (1989) gave two reasons for its importance. First, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure. A test in which major areas identified in the specification are not represented at all is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect. Areas which are not tested are likely to become areas ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than what is important to test. The best safeguard against this is to write full test specifications and to ensure that the test content is a fair reflection of these. In other words, when embarking on the construction of a test, the test writer should first draw up a table of test specifications, describing in very clear and precise terms the particular language skills and areas to be included in the test. If the test or sub-test being constructed is a test of grammar, each of the grammatical areas should then be given a percentage weighting; for example, the future simple tense 10%, uncountable nouns 15%, relative pronouns 10%. If the test or sub-test concerns reading, then each of the reading sub-skills should be given a weighting in a similar way; for instance, deducing word meanings from contextual clues 20%, search-reading for specific information 30%, reading between the lines and inferring 12%, intensive reading comprehension 40%.
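The weighting idea can be illustrated with a small Python sketch. It is not part of the study: the function names and the tagging of items by sub-skill are hypothetical, introduced only for illustration. The sketch totals the weights in a specification and compares each area's share of the test items with its target weight. Run on the reading weights quoted above, the total comes to 102 rather than 100, the kind of slip such a check is meant to surface.

```python
from collections import Counter


def total_weight(spec):
    """Return the sum of the percentage weights in a specification."""
    return sum(spec.values())


def coverage_gaps(spec, item_areas):
    """Compare each area's share of test items with its target weight.

    spec: mapping of area name -> target weight in percent.
    item_areas: one area label per test item (hypothetical tagging).
    Returns area -> (actual percent - target percent), rounded to 1 decimal.
    """
    counts = Counter(item_areas)
    n_items = len(item_areas)
    return {
        area: round(100 * counts.get(area, 0) / n_items - target, 1)
        for area, target in spec.items()
    }


# Reading sub-skill weights as quoted in the text.
reading_spec = {
    "deducing word meanings": 20,
    "search-reading": 30,
    "inferring": 12,
    "intensive reading": 40,
}

# Hypothetical tagging of a 10-item reading sub-test (assumed, not from the study).
items = (
    ["deducing word meanings"] * 2
    + ["search-reading"] * 3
    + ["inferring"] * 1
    + ["intensive reading"] * 4
)

print(total_weight(reading_spec))          # 102
print(coverage_gaps(reading_spec, items))  # positive = over-represented area
```

A gap near zero for every area suggests the items sample the specification in roughly the intended proportions; a large negative gap marks an area the test under-represents.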
According to Heaton, J.B. (1982), the test writer thereby attempts to quantify and balance the test components, assigning a certain value to indicate the importance of each component in relation to the other components in the test. In this way, the test should achieve content validity and reflect the component skills and areas that the test writer wishes to include in the test.
Anastasi (1982: 131) defines content validity as "essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured." She provided a set of useful guidelines for establishing content validity:
1. The behavior domain to be tested must be systematically analyzed to make certain that all major aspects are covered by the test items, and in the correct proportions.
2. The domain under consideration should be fully described in advance, rather [...]
3. Content validity depends on the relevance of the individual's test responses to the behavior area under consideration, rather than on the apparent relevance of item content.
Brown (1994: 385) gives a list of factors necessary to improve test validity:
+ A carefully constructed, well thought-out format.
+ Items that are clear and uncomplicated.
+ Directions that are crystal clear.
+ Tasks that are familiar and relate to their course work.
+ A difficulty level that is appropriate to your students.
+ Test conditions that are biased for best, that is, that bring out students' best performances.
In the same way, Moore (1992: 11) stressed: "Content validity is established by determining whether the instrument's test items correspond to the content that the students are supposed to learn."
Correspondingly, to evaluate test content validity, the test items should be inspected for their correspondence to the teachers' stated objectives.
In short, test content validity is the most important characteristic of a good test. The basis for evaluating content validity is a comparison between the test specifications and the test contents.
2.3.2 Test reliability
Reliability is another necessary characteristic of any good test. A reliable test can be used as a measuring instrument. If the test is administered to the same students on different occasions (with no language practice work taking place between these occasions) and produces different results, it is not reliable. So a test is said to be reliable if it produces the same results when administered to the same students at different times.
There are two types of reliability. The first, test-retest reliability, refers to the ability of a test to produce consistent results from the same students whenever it is used. The other is inter-item consistency, which means that the test should be able to measure the same thing all the time.
Bachman (1990), a leading expert, describes reliability as "a quality of test scores".
We can look at the hypothetical data in Table 1. They present the scores obtained by 3 students who took a 100-item test A on a particular occasion and those that they would have obtained if they had taken it a day later. The most obvious way of checking this is simply to have people take the same test twice. We should note the size of the differences between the two scores for each student:
Table 1: Scores on test A (invented data) by Arthur Hughes (1989: 30)
Students    Score obtained    Score which would have been obtained on the following day
Now have a look at Table 2, which displays the same kind of information for a second 100-item test, B. Again, note the difference in score for each student:
Table 2: Scores on test B (invented data) by Arthur Hughes (1989: 30)
Students    Score obtained    Score which would have been obtained on the following day
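The comparison between two administrations of a test can be made concrete with a short Python sketch. The scores below are invented purely for illustration and are not Hughes's figures; the function names are likewise hypothetical. The sketch reports the mean absolute difference between the two sittings and the Pearson correlation between them, which is one common way of expressing test-retest reliability as a coefficient.

```python
from math import sqrt


def mean_abs_difference(first, second):
    """Average size of the score change between two administrations."""
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores out of 100 for five students (assumed data).
day_one = [60, 45, 30, 80, 50]
day_two_unstable = [75, 30, 45, 60, 65]  # large swings: a less reliable test
day_two_stable = [62, 44, 31, 79, 52]    # small swings: a more reliable test

for label, day_two in [("unstable", day_two_unstable), ("stable", day_two_stable)]:
    print(label,
          mean_abs_difference(day_one, day_two),
          round(pearson_r(day_one, day_two), 2))
```

Under these assumed scores, the test with small score changes yields the higher correlation, in line with the point that a reliable test produces similar results on both occasions.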
[...] validity is concerned with the contents of the sample, reliability is concerned with the [...] The larger the sample is, the greater the probability that the test is reliable. If there are very few items in the test, the test may rely too heavily on luck: weak candidates may score 50% or more on a short [...]
+ The second factor affecting test reliability is the administration of the test. If individual test items are too hard for everyone or too easy for everyone, then they are not reliable test items; they do not differentiate between strong and weak candidates. An important factor in deciding reliability is whether the same test is administered to different groups under different conditions or not.
+ The third is test instructions: are the various tasks expected of the testees made clear to all candidates in the rubrics?
+ Another factor that influences the reliability of a test is how much the test is based on passages and questions taken directly from a textbook and how much it is based on the syllabus within the textbook, not the book itself. An over-emphasis on "quoting" the textbook in a test will produce results that do not reveal the achievement or progress of the learners in terms of reading, writing, listening, speaking, vocabulary and grammar. The results will only reveal how well students have memorized the passages and the correct answers.
+ Last but not least, one of the most important factors affecting reliability is the scoring of the test. Sometimes a test can be unreliable because of the way it is marked. For example, if an average composition is marked immediately after a very good composition, the average composition may be given a mark that is actually below average. The marker's subconscious comparison of the two compositions will result in the average composition appearing worse than it really is. However, if the same average composition is marked immediately after a very poor composition, then it may appear above average and be awarded a higher mark than it deserves. In addition, different markers may award different marks to the same composition; for example, some markers may be very lenient and others may be unfairly strict.
To sum up, reliability is an undeniably important characteristic of a good test. If the test result is not reliable, the assessment based on it is not reliable either. In order to make the test more reliable, it is important for testers to consider the many influential factors, such as test administration (involving scoring, timing, testing conditions, and observation or control of test taking), the size of the test, test instructions and scoring methods, right from the outset of the test construction process.
2.3.3 Relationship between reliability and validity
Reliability and validity are essential measurement qualities of a good test. They are qualities that provide the major justification for using test scores and numbers as the basis for making inferences or decisions (Bachman et al., 1996: 19).
They have a complicated relationship. On the one hand, it is possible for a test to be reliable without being valid. That is, a test can give the same result time after time but not measure what it was intended to measure. On the other hand, if the test is not reliable, it cannot be valid at all. To be valid, according to Hughes (1988: 42), a test must provide consistently accurate measurements. It must therefore be reliable. A reliable test, however, may not be valid at all. For example, in a writing test, the candidates are required to translate a text of 500 words into their own language. This could well be a reliable test, but it is unlikely to be a valid test of writing. In our efforts to make tests reliable, we must be wary of reducing their validity.
The problem is that while one can have test reliability without test validity, a test can only be valid if it is also reliable. There is thus sometimes said to be a reliability-validity tension. This tension exists in the sense that it is sometimes essential to sacrifice a degree of reliability in order to enhance validity. However, if validity is lost to increase reliability, we finish up with a test which is a reliable measure of something other than what we wish to measure. Of the two concepts, if a choice has to be made, “validity, after all, is the more important one” (Guilford, 1965: 481).
Moller (1981: 67) comments that while it is understood that a valid test must be reliable, it would seem that in such a highly complex and personal behavior as using a language other than one’s mother tongue, validity could be claimed for measures that might have a lower than normally acceptable level of reliability. Reliability is something we should always try to achieve in our tests. Test reliability cannot be ignored without a harmful effect on the validity of the instrument.
Therefore, test validity and reliability are the two chief criteria for evaluating any test, and the ideal test should be both valid and reliable. However, as discussed above, efforts to increase the reliability of a test may reduce its validity.
2.3.4 Practicality
In addition to reliability and validity, practicality plays an important role in deciding whether a test is good or not. The main question of practicality is administrative. A test must be carefully organized well in advance: How long will the test take? What special arrangements have to be made (for example, what happens to the rest of the class while individual speaking tests take place)? Is any equipment needed (tape recorder, language lab, overhead projector)? How is marking the work handled? How are tests stored between sittings of the test? All of these questions are practical since they help ensure the success of a test and of testing (Heaton, 1988). Therefore, practicality includes financial limitations, time constraints, ease of administration, scoring and interpretation.
According to Brown (1994), a test which is prohibitively expensive, which takes a student ten hours to complete, or which takes a few minutes for students to do but several hours for teachers to evaluate, is impractical.
Another important aspect of practicality that we have to consider is that the test should have “instructional value” (Oller, 1979). The test should enhance the delivery of instruction to the students. The teachers need to provide clear and useful interpretation for students to understand and learn better. The instructions of the test should be clear and easy so that the students know what they have to do. Knowing what to do, they can get higher marks. In contrast, a test that is too complicated or too difficult may not be practical for the teachers and the students.
To sum up, in order to be useful and efficient, tests should be as economical as possible in terms of time and cost. In addition, the test’s instructions should be well written so that students know what they ought to do.
2.3.5 Discrimination
Discrimination is another important factor that test designers have to consider when writing a test. Heaton (1988) defines the discrimination of a test as its capacity to discriminate among the different students and to reflect the differences in the performances of the individuals in the group. The test cannot achieve discrimination if the test items are either too easy or too difficult. Therefore, the test items must be written in a range from “extremely easy items” to extremely difficult ones. The extent of the need to discriminate depends on the purposes of the test. For example, if a placement test is able to discriminate efficiently among students, it will be much easier to divide students into suitable groups. In many classroom tests, the teacher will be much more concerned with finding out how well the students have mastered the syllabus, so the teachers will hope for higher results from the students.
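How well a single item discriminates can be estimated with a simple calculation from classical item analysis. The sketch below is an illustration only and was not carried out in this study: the candidate numbers are invented, and the function names `discrimination_index` and `facility_value` are hypothetical. It assumes the common upper-lower group method, in which candidates are ranked by total score and split into an upper and a lower group of equal size.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Difference between the upper and lower groups, scaled to -1..+1.

    upper_correct / lower_correct: number of candidates in each group
    who answered the item correctly; group_size: candidates per group.
    """
    return (upper_correct - lower_correct) / group_size


def facility_value(total_correct, total_candidates):
    """Proportion of all candidates who answered the item correctly."""
    return total_correct / total_candidates


# Invented example: 20 candidates, split into two groups of 10.
# Item A: 8 of the upper group and 3 of the lower group answer correctly.
d_item_a = discrimination_index(upper_correct=8, lower_correct=3, group_size=10)
fv_item_a = facility_value(total_correct=11, total_candidates=20)

# Item B: every candidate answers correctly, so the item is too easy.
d_item_b = discrimination_index(upper_correct=10, lower_correct=10, group_size=10)
fv_item_b = facility_value(total_correct=20, total_candidates=20)

print(f"Item A: discrimination {d_item_a:.2f}, facility {fv_item_a:.2f}")
print(f"Item B: discrimination {d_item_b:.2f}, facility {fv_item_b:.2f}")
```

Item B shows the point made above: an item that everyone answers correctly has a discrimination index of zero and cannot separate strong from weak candidates.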
Summary
In this chapter, the writer has reviewed definitions and the roles of language testing, and four major characteristics of a good test, with the aim of placing the emphasis on content validity and on how to make the test more valid. In addition, the relationship between reliability and validity is also presented in order to have the ideal test.
Chapter 3: The study
3.1 English learning, teaching and testing at Phuong Dong University
3.1.1 The students
At Phuong Dong University, students come from different parts of the country. Most of these students did not spend much time learning English at high school as they had to devote most of their time to learning other subjects, for example mathematics, physics, chemistry and drawing, in order to pass the university entrance examination. Thus, they are real beginners of English when entering university, and are of different language proficiency levels.
3.1.2 The teachers
English teachers working with 2nd year students are of different ages. Half of them are aged from 45 to 55 and the rest from 25 to 38. They graduated from three educational institutions: Ha Noi National University, Ha Noi Foreign Language University and Phuong Dong University.
3.1.3 The course book: “New Headway Elementary- The third edition”
The book “New Headway Elementary- The third edition” has been used as the textbook to teach second year students at Phuong Dong University. This material is designed for students at elementary level.
It consists of 14 units, which combine harmoniously with a powerful lexical component to increase learners’ vocabulary and develop their awareness of English culture.
Each unit is divided into three parts, and each part focuses on grammar, function or vocabulary. Every unit provides students with opportunities to learn and develop their knowledge in the categories of grammar, vocabulary, communication skills and pronunciation through practice activities of listening, speaking, reading and writing (see Appendix 1).
3.1.4 Syllabus and its objectives
For the first semester of second year students, seven units from unit 7 to 14 are taught in 45 periods (50 minutes per period) and delivered within about 9 weeks. Students still work on the four areas of grammar, vocabulary, communication skills and pronunciation, and they have the chance to deal with different topics. The aims of the course are to help increase students’ basic knowledge of vocabulary and grammar, and also to give practice in the four basic language skills of listening, speaking, reading and writing in social situations.
3.1.5 The final achievement test for second year non major students
The final achievement test consists of the following parts:
Parts | Types | Items | Tasks | Marks
Part 2 | Guided sentence | 5 | Use the following sets of words to | 2
Part 3 | Correct mistakes | 5 | Find and correct one mistake in each sentence | 2
Part 4 | Write a paragraph | 1 | Write a paragraph of 100-120 words about your capital city | 4
Table 3: The components of the final achievement test
Looking at the marking criteria for the test, we can see that they have confused many teachers and worried students. It is very difficult for teachers to mark part 4 as there are no detailed marking criteria covering, for example, language, content and grammar.
3.2 Research method
In this study, both quantitative and qualitative methods are used: survey questionnaires and document analysis. However, given the scope and purposes of this study, document analysis is taken as the main method to find out the strengths and the weaknesses of the final achievement test with regard to content validity. In addition, survey questionnaires help the writer collect more information from both teachers and students about this test. Obviously, although each method helps to collect and confirm different kinds of data, it has its own unavoidable shortcomings.
3.2.1 The survey questionnaires
There are many ways to collect data, and the survey questionnaire is an effective one for several reasons. Firstly, questionnaires can be used to gather information about teachers’ and students’ attitudes, views and thoughts on the content validity of the end-of-term 1 test. Secondly, there is no confrontation between the person who conducts the survey and the informants because the questionnaire is simply a list of questions. Therefore, the informants can feel free to express their thoughts. Thirdly, most of the questions are closed ones, so it is easier for the writer to collect and analyze the data. Finally, questionnaires can gather a large number of responses.
3.2.2 Document analysis
Besides survey questionnaires, document analysis is considered the main method to evaluate the final achievement test in terms of content validity.
Firstly, the writer will analyze “New Headway Elementary- The third edition” to find out what the teachers have to teach and what the students ought to learn. Because the purpose of this study is to investigate the content validity of the final achievement test for second year students at Phuong Dong University, analyzing this test is an effective way to achieve this purpose. Based on the theories about testing, test design and the characteristics of a good test, the writer will analyze this test by comparing the course objectives and what the students had learnt with the test contents in order to find out the strengths and weaknesses of the test and then suggest some solutions for its improvement.
Last but not least, the writer will analyze the data from the survey questionnaires of both teachers and students to see what their comments about this test are.
Summary
Evidently, it is important to use several methodologies to compare the results received and to ensure the authenticity of the results. Of course, the informants’ real feelings and full views are expressed. Besides, document analysis is a rich source of information as the writer captures what the teachers and students, in fact, do. Therefore, using document analysis in combination with survey questionnaires helps the writer give objective and reliable results.
3.3 Data analysis
In this part, based on the final achievement test, the writer will compare the content of this test with what the students had learnt in the first semester in order to find out the strengths and weaknesses of the currently used test with reference to content validity. In addition, the students’ and teachers’ opinions gathered through survey questionnaires are also analyzed in order to evaluate the
PHUONG DONG UNIVERSITY
Foreign languages department
The final achievement test - No. 1
Time allowed: 60 minutes
Marker’s signature 2:
I. Rewrite each sentence, beginning as shown, so that the meaning stays the same
1 My watch is cheaper than yours
4 No one is more intelligent than Anna in her class
II. Guided sentence building: use the following sets of words and phrases to write complete sentences
1 ... chicken/and chips/main course
III. Find and correct ONE mistake in each of the following sentences
1 My brother can play badminton when he was five years old
The final achievement test is more formal than progress tests and is intended to measure achievement on a larger scale for all students. In addition, final achievement tests are based on what the students are presumed to have learnt, not necessarily on what they have actually learnt nor on what has actually been taught. The contents of these tests must be closely related to the teaching contents and the objectives concerned. Now let’s see what students had been taught about grammar and vocabulary in this semester and what they had been checked on in parts 1, 2 and 3.
What they have been taught
Past simple tense
Regular verbs
Irregular verbs
Time expression
Past simple 2
Negative - ago
Time expression
Count and uncount nouns
I like and I’d like
A and some
Much and many
Comparative and superlatives
Question 4 (part 1)
Question 2 (part 2)
Question 5 (part 2)
Infinitive of purpose
Adjectives and adverbs Question 3 (part 3)
Question 5 (part 3)
Unit 14 | Present perfect
Present perfect and past simple
Question 3 (part 2)
Question 4 (part 3)
Table 4: What students had been taught and what they had been checked in part I, II, III of the test
And what students have been taught in the writing part:
Unit 7 Describing a holiday
Unit 8 Writing about a friend
Unit 9 Filling in forms
Unit 10 Describing a place
Unit 11 Describing people
Unit 12 Writing a postcard
Unit 13 Writing a story
Table 5: What students had been taught and checked in the writing part
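The comparison made in these tables is essentially a coverage check: each point taught is looked up among the points tested. The sketch below shows one way such a check could be expressed. It is an illustration only: the function name `coverage` is hypothetical, and the two lists are invented examples that do not reproduce the actual mapping in Tables 4 and 5.

```python
def coverage(taught, tested):
    """Return the taught points that were tested, those that were not,
    and the proportion of taught points covered by the test."""
    taught_set = set(taught)
    if not taught_set:
        raise ValueError("the list of taught points must not be empty")
    tested_set = set(tested)
    covered = sorted(taught_set & tested_set)
    missing = sorted(taught_set - tested_set)
    ratio = len(covered) / len(taught_set)
    return covered, missing, ratio


# Invented example lists, for illustration only.
taught_points = [
    "past simple",
    "count and uncount nouns",
    "comparatives",
    "present perfect",
]
tested_points = ["past simple", "comparatives", "present perfect"]

covered, missing, ratio = coverage(taught_points, tested_points)
print(f"covered: {covered}")
print(f"not tested: {missing}")
print(f"coverage: {ratio:.0%}")
```

A list of untested points produced in this way only shows where the test content departs from the syllabus; judging whether the omission matters still depends on the course objectives.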
When analyzing the content of the test, you can see that the test is quite sufficient, with clear instructions and format. There are no words or grammar structures that are new to students; all of them have been taught in the semester. However, there are some problems here. Looking at the tables above, it is clear that some grammar points have not been checked in the test, for example the grammar part in Unit 8 (negative form of past simple) and Unit 11 (going to). In the writing part, this topic was closely related to what the students