VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
NGUYEN THI HOANG LAN
AN EVALUATION ON THE VALIDITY OF ENGLISH TESTS
USED FOR ENGLISH 10 AT SOME HIGHER SECONDARY
SCHOOLS IN THE MIDDLE AND NORTH OF VIET NAM,
FROM HA TINH TO HA NAM
ĐÁNH GIÁ TÍNH HIỆU LỰC CỦA CÁC BÀI KIỂM TRA TIẾNG ANH DÀNH CHO
HỌC SINH LỚP 10 Ở MỘT SỐ TRƯỜNG THPT MIỀN TRUNG VÀ MIỀN BẮC VIỆT
TABLE OF CONTENTS
INTRODUCTION:
1. Rationale of the study
2. Scope of the study
3. Aims of the study
4. Methods of the study
CHAPTER 1: LITERATURE REVIEW
1.1 Basic concepts of testing
1.2 Achievement Tests
1.2.1 Definition
1.2.1.1 Kinds of achievement tests
1.2.1.2 Final achievement tests
1.2.1.3 Progress achievement tests
1.3 Characteristics of a good language test
1.5 Syllabus Objectives on language components
CHAPTER 2: THE STUDY
2.3 Data Analysis and Discussion
2.3.1 Content validity of the tests used
2.3.1.1 Content validity of 45 minute written tests
2.3.1.2 Content validity of 15 minute written tests
2.3.1.3 Content validity of final written tests
2.3.2 Construct validity of the tests used
2.3.3 Face validity of the tests used
REFERENCES
LIST OF TABLES
Table 1: The test item types
Table 2: Syllabus Objectives
Table 3: Format of 45 minute tests and final tests
Table 4: Specification for forty-five minute tests
Table 5: The contents tested in the forty-five minute tests
Table 6: The contents tested in the fifteen minute tests
Table 7: Test Specification for final written tests
Table 8: The contents of language components in the final tests
Table 9: Specification for language components
Table 10: Teachers’ opinions on the investigated tests
Table 11: Testing techniques
INTRODUCTION
1. Rationale of the study
English has played an integral role in increasing the development of science, technology, culture and international relations. This fact has resulted in the growing demand for English language learning and teaching in many parts of the world. In addition, the world-wide globalization process has confirmed English as the most widely used means of international communication. The need to master English to access information and
To evaluate and assess the English learning and teaching process, testing is apparently employed as an important and powerful tool. This is because ever since language began to be taught in formal settings, the development of tests to assess the
both for the students who are being tested and for the professionals who are administering the tests. Through the tests students can learn from the work they do with the teacher and by themselves in preparation for the tests, from the opportunities arising during the tests for developing what they know and what they can do, and especially from the feedback which they receive after the tests, both from their own reflection and from professionals who have monitored their performances on the tests. The teacher can pinpoint strengths and weaknesses in the learned abilities of the students and gain information about the progress the students are making or what the students are likely to be able to do with the
explicitly and implicitly) about the target language. In general, a language test can be a "sample language behaviour and infer general ability in the language learnt" (Brown, D.H., 1994: 252). From the results of the tests, and depending on the different kinds of tests with their different purposes, the teacher can infer a certain level of language competence of his students in such different areas as grammar, vocabulary, pronunciation, or speaking, listening, writing and reading. Lauwerys & Scanlon (1969) contend in their book that "testing is an important tool in educational research and for programme evaluation, and may even throw light on both the nature of language proficiency and language learning".
"Language testing is a form of measurement. It is so closely related to teaching that we cannot work in testing without being constantly concerned with teaching" (Heaton, 1988:5). Therefore, it is undeniable that the most effective and fastest way to check students' understanding is testing. Besides, thanks to testing, teachers can evaluate the effectiveness of the syllabus in use, its contents, objectives and methods, and can identify and locate the difficult areas that their pupils are confronted with in the learning process.
For the past ten years or so, there have been a number of changes in the practice of English teaching in Viet Nam tertiary education. Some regard methodology, from the Grammar-Translation method to the Communicative approach. Some involve course books. Some are concerned with technology, from traditional tape recorders to modern LCD projectors. Some are related to testing. For example, at Higher Secondary Schools in recent years there has been a shift in testing from subjective tests to objective tests, which has had great effects on the teaching and learning process. Therefore, in testing pupils' progress, teachers tend to design more objective tests, and many mid-term or final tests consist of multiple choice questions. This is considered good preparation for students to perform well in the university entrance tests, which consist of multiple choice questions. However, the problem is that English 10 is one of the three new course books of the Ministry of Education which focus on improving the four skills (reading, writing, speaking and listening) and help students to consolidate their grammar in the Language Focus part. Thus, multiple choice questions seem to fail to test pupils' progress accurately. The question arising is whether the tests used at High Schools test what students are supposed to acquire according to the objectives of the textbook. This is also one of the major reasons why I carried out the research on validity.
In addition, test researchers and developers have admitted that validity is critical for tests and is referred to as an integral measurement quality, because this quality provides the major justification for using test scores as a basis for making inferences or decisions (Bachman and Palmer, 1996:19). From an educative perspective, both teachers and students should have their voice heard about instructional content, mode of syllabus delivery, and assessment. As analyzed above, validity is an indispensable quality of all good tests. Opinions from test takers and test raters, therefore, are essential and important to the process of test construction. More importantly, test writers may try in vain to increase the validity of a reliable test because of the features of the test items that make it up. From the outset of test construction, test validity should be the most essential focus of all. Heaton (1988:60) argued that "face validity can provide not only a quick and reasonable guide but also a balance to too great a concern with statistical analysis." He stated that the students' motivation is maintained if a test has good face validity and most students will try harder if the test looks sound. Thus, face validity plays a certain role in any test and it is also of great concern in this thesis. Moreover, the emphasis on test validity is also confirmed in Hughes (1989): "the greater a test's content validity is, the more likely it is to be an accurate measure of what it is to measure." To put it another way, if major areas in the test specification are not identified or not represented, the test is said to be inaccurate. Furthermore, such an inaccurate test is likely to have a harmful backwash effect because those areas that are not presented or tested will probably be ignored in teaching and learning. Bachman (1990: 289) also insists that "The most important quality to consider in the development, interpretation and use of language tests is validity, which has been described as a unitary concept related to the adequacy and appropriateness of the way we interpret and use test scores." In general, the reasons discussed here are regarded as a strong impetus that initiates this thesis into investigating the validity of the achievement tests at Higher Secondary Schools from Ha Tinh to Ha Nam.
Some studies have been done in some particular schools to design an English achievement test for 10th form pupils as a case study, such as the study by Ta Thi Minh Hien (2005). However, there has not been any study investigating and evaluating the tests used for 10th form pupils at High Schools in the Middle and in the North of Viet Nam, while it is undeniable that good evaluation of tests can help us measure the skills and knowledge of pupils more accurately. For example, test analysis can help us remove weak items even before we record the results of the tests.
Another reason for the selection of this research topic lies in the fact that language testing at Higher Secondary Schools has not been paid enough attention. As a teacher, I have been involved in designing, administering and marking many kinds of English tests. Yet I have witnessed neither comprehensive nor systematic evaluation nor research on the effectiveness and appropriateness of these tests. No formal discussions or seminars on test construction or test methods have been carried out. There is a lack of a language test item bank and of a professional testing committee which judges the quality of the tests and takes responsibility for the given tests.
For the above-mentioned reasons, as a learner, a teacher, and a beginning researcher of English, the author has been encouraged to conduct the study entitled "An evaluation on the validity of English tests used for English 10 at Higher Secondary Schools in the middle and north of Viet Nam, from Ha Tinh to Ha Nam" with a view to evaluating the validity of the tests used for pupils at Higher Secondary Schools. It is hoped that the study will benefit the author as well as teachers at Higher Secondary Schools and those who are concerned with language testing in general and English testing techniques at Higher Secondary Schools in particular.
2 SCOPE OF THE STUDY
In this study the author intends to focus mainly on the content validity, construct validity and face validity of progress achievement tests, including 15 minute tests and midterm tests, and of final achievement tests, consisting of final-term tests and final tests, in the school years of 2007-2008 and 2008-2009 at the 12 high schools in 6 provinces from Ha Tinh to Ha Nam. The results can be seen as the basis for providing some suggestions for test designers as well as raters.
3 AIMS OF THE STUDY
Parallel with the above reasons leading to the research are the following aims:
- To assess the validity of tests used for English 10 at Higher Secondary Schools from the Middle to the North of Viet Nam, focusing on content validity, construct validity and face validity.
- To suggest some implications for designing a written English test to better the teaching and learning of English at Higher Secondary Schools in Viet Nam.
achievement tests comprising term tests and final tests in the school years 2007-2008 and 2008-2009 in a number of Higher Secondary Schools from the Middle to the North of Viet Nam were analyzed. Content validity is evaluated by comparing the test specification, which relies on the syllabus objectives of English 10, with the content tested in the collected tests. Construct validity is assessed against the test specification, which is constructed on the basis of the theoretical background of testing and the syllabus of English 10. The survey questionnaire was administered to the teachers of the Upper Secondary Schools to investigate their evaluative comments on the face validity of the tests they designed.
Besides the use of critical reading, analysis and questionnaires for data collection, the study made use of other supporting methods such as interviews, informal discussions and opinion exchanges with teachers and students to gather necessary information about the learning, teaching and testing situations at High Schools.
5 RESEARCH QUESTIONS
This study is implemented to find the answers to the following research question:
- Do the achievement tests for Higher Secondary School pupils of grade 10 meet the following criteria: content validity, construct validity, face validity?
6 ORGANIZATION OF THE THESIS
This thesis comprises three parts.
Part one introduces the rationale of the study, the scope, the aims, the methods and the research questions.
Part two is the development of the thesis, which is divided into three chapters.
Chapter one reviews the literature related to language testing (basic concepts, roles,
content validity of the tests used, face validity of the tests used).
Part three presents the conclusion, comprising the main findings, implications and suggestions for further studies.
DEVELOPMENT
CHAPTER 1: LITERATURE REVIEW
This chapter reviews the theories and literature relevant to the topic under investigation in the present study. The chapter starts with basic concepts of testing, and then the definition and types of achievement tests are reviewed. A brief review of the major characteristics of a good language test is presented with a major focus on test validity, especially construct, content and face validity. Next, test items for phonetics, structures and vocabulary are discussed. Finally, the curriculum of English 10 is presented with the objectives and the content of English 10.
1.1 Basic concepts of testing
Testing is an essential part of every teaching and learning experience and has become one of the main aspects of methodology. Many researchers have offered definitions of testing from different points of view.
Allen (1974: 313) emphasizes testing as an instrument to ensure that students have a sense of competition rather than to know how good their performance is and in which condition a test can take place. He contends that "test is a measuring device which we use when we want to compare an individual with other individuals who belong to the same group."
Carroll (1968: 46) holds that a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. In other words, a test is a measurement instrument designed to elicit a particular behavior of each individual.
Besides, Ibe (1981: 1) points out that a test is "a sample of behavior under the control of specified conditions aims towards providing a basis for performing judgment". The term "a sample of behavior" used here is rather broad and means something other than the traditional paper-and-pencil types. Read (1983) shares the same idea in the sense that a sample of behavior suggests that language testing certainly includes listening and speaking skills as well as reading and writing ones.
However, Heaton (1988:5) looks at testing in a different way. In his opinion, tests are considered as a means of assessing the students' performance and of motivating the students. He looks at tests with a positive eye, as many students are eager to take tests at the end of the semester to know how much knowledge they have. One important thing is that he points out the relationship between testing and teaching.
Harrison (1986:1) notices that a test is a natural extension of classroom work, providing teachers and students with useful information that can serve each as a basis for improvement, rather than a necessary but unpleasant imposition from outside the classroom. That means a test is a useful tool to measure learners' ability in a certain situation, especially in the classroom.
According to Bachman (1990:20), what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior. This distinction is believed to be of great importance because it reflects the primary justification for the use of language tests and has implications for how we design, develop and use them to best effect. Thus, language tests can provide the means for focusing more closely on the specific abilities of interest.
Brown (1994:252) states that "A test, in plain or ordinary words, is a method of measuring a person's ability or knowledge in a given area". Moore (1992:138) proposes that evaluation is an essential tool for teachers because it gives them feedback concerning what the students have learned and indicates what should be done next in the learning process. Evaluation helps us to better understand students, their abilities, interests, attitudes, and needs so as to teach more effectively and motivate them. However, Brown (1994:373) stresses that tests are seen by learners as dark clouds hanging over their heads, upsetting them with thunderous anxiety as they anticipate the lightning bolts of questions they do not know and, worst of all, a flood of disappointment if they do not make the grade.
From the above descriptions, though different researchers hold different points of view on testing, in short testing is an effective means of measuring and assessing students' language knowledge and skills. It is of great use to both language teaching and learning.
1.2 Achievement tests
Just as there are many purposes for which language tests are developed, so there are many types of language tests. Some types of tests serve a variety of purposes while others are more restricted in their applicability. The tests collected were designed on the basis of the text book English 10 and were intended to assess pupils' progress; therefore, in this part the definition as well as the kinds of achievement tests are presented.
1.2.1 Definition
Achievement tests are defined differently depending on researchers' points of view. Hughes (1990:10) held that "achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives." Achievement tests are usually carried out after a course on a group of learners who take the course. Brown (1994:259) also suggests that "An achievement test is related directly to classroom lessons, units or even total curriculum." Achievement tests, in his point of view, "are limited to a particular material covered in a curriculum within a particular time frame". Another comment on achievement tests offered by Finocchiaro and Sako (1983:°5) is that achievement tests or attainment tests are widely employed in many language teaching institutions. They are used to measure the degree of control of discrete language and cultural items and of integrated language skills acquired by the students within a specific period of instruction in a specific course. Harrison (1983:7) demonstrates that "an achievement test looks back over a longer period of learning than the diagnostic test, for example, a year's work, or the whole course, or even a variety of different courses." He also states that achievement tests are intended to show the standard which the pupils have reached in relation to other pupils at the same level. In short, achievement tests are directly related to language courses. The purpose of this kind of test is to know how successful students, courses or the teaching itself have been in achieving the objectives stated beforehand (in the program of the course, for example).
In short, achievement tests play a crucial role in the school programs, especially in evaluating students' acquired language knowledge and skills during the course, and they are widely used at different school levels.
1.2.2 Kinds of achievement tests
Achievement tests can be subdivided into final achievement tests and progress achievement tests, classified according to the time of administration and the designed objectives.
1.2.2.1 Final achievement tests
Final achievement tests are administered at the end of a course and their purpose is to measure the achievement of the course as a whole. These tests may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. Obviously, the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is a matter of disagreement amongst language testers.
According to some testing experts, the content of a final achievement test should be based directly on a detailed course syllabus or on the books and other materials used. This is known as the syllabus-content approach. Such a test has an obvious appeal, for it only contains what it is thought that the pupils have actually encountered and therefore can be considered, in this respect at least, a fair test. However, this test has the disadvantage that if the syllabus is badly designed, or the books and other materials are badly chosen, then the results of the test can be very misleading. Successful performance on the test may not truly indicate successful achievement of course objectives.
The alternative approach is to base the test content directly on the objectives of the course, which has a variety of advantages. First, it forces course designers to be explicit about course objectives. This in turn puts pressure on those who are responsible for the syllabus and the selection of books and materials to ensure that these are consistent with the course objectives. Tests based on course objectives work against the perpetuation of poor teaching practice, something which course-content-based tests, almost as if part of a conspiracy, fail to do. I strongly believe that test content based on course objectives is much preferable, as it provides more accurate information about individual and group achievement and is likely to promote a more beneficial backwash effect on teaching.
1.2.2.2 Progress achievement tests
Progress achievement tests are intended to measure the progress students are making in order to plan future work (including remedial work). They are usually administered at the end of a specific unit or lesson. Obviously, these tests should be related to the course objectives. They should make a clear progression towards the final achievement test based on course objectives. Then if the syllabus and teaching methods are appropriate to these objectives, progress tests based on short-term objectives will fit well with what has been taught. If not, there will be pressure to create a better fit. If it is the syllabus that is at fault, it is the tester's responsibility to make clear that it is there that change is needed, not in the tests.
Moreover, while more formal achievement tests require careful preparation, teachers could feel free to set their own ways of making a rough check on pupils' progress to keep pupils on their toes. Since such tests will not form part of formal assessment procedures, their construction and scoring need not be directed purely towards the intermediate objectives on which the more formal progress achievement tests are based. However, they can reflect a particular "route" that an individual teacher is taking towards the achievement of objectives.
1.3 Characteristics of a good test
In order to make a good test, teachers have to take various factors into consideration, such as the purpose of a test, the content of the syllabus, the pupils' background and so on. In addition to these factors, test characteristics play a very important role in constructing a good test. According to a number of leading scholars in
Moreover, further details are given as follows.
1.3.1 Reliability
Reliability is a necessary characteristic of any good test. It is of primary importance in the use of proficiency tests for both public achievement and classroom tests. An appreciation of the various factors affecting reliability is important for the teacher at the very outset, since many teachers tend to regard tests as infallible measuring instruments and fail to realize that even the best test is indeed a somewhat imprecise instrument with which to measure language skills.
A fundamental criterion against which any language test has to be judged is its reliability. The concern here is with how far we can depend on the results that a test produces. Three aspects of reliability are usually taken into account. The first concerns the consistency of scoring among different markers. The second is the concern of the tester with how to enhance the agreement between markers by establishing, and maintaining adherence to, explicit guidelines for the conduct of this marking. The third aspect of reliability is that of parallel-forms reliability, the requirements of which have to be borne in mind when future alternative forms of a test have to be devised.
The concept of reliability is particularly important when considering language tests within the communicative paradigm. Moreover, Davies (1968) stresses that reliability is the first essential for any test, but for certain kinds of language tests it may be very difficult to achieve.
1.3.2 Discrimination
Another important feature of a test is its capacity to discriminate among the different candidates and to reflect the differences in the performances of the individuals in the group. This is true for both teacher-made tests and standardized tests. The extent of the need to discriminate will vary depending on the purpose of the test. In classroom tests, for example, the teacher will be much more concerned with finding out how well the pupils have mastered the syllabus and will hope for a cluster of marks around the 80 percent and 90 percent brackets. Nevertheless, there may be occasions on which the teacher may require a test to discriminate to some degree in order to assess relative abilities and locate areas of difficulty. Here the items should be spread over a wide range of difficulty levels in the test:
- extremely easy items
- very easy items
- easy items
- fairly easy items
- items below average difficulty level
- items of average difficulty level
- items above average difficulty level
- fairly difficult items
- difficult items
- very difficult items
- extremely difficult items
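
The spread of item difficulty and the capacity of an item to discriminate can be checked numerically. The following is only a minimal sketch and is not taken from the thesis: it assumes each item is scored 1 (correct) or 0 (incorrect), uses the common facility value (proportion of correct answers) and an upper-group/lower-group discrimination index, and takes the top and bottom thirds of candidates as the two groups. The function names and the choice of thirds are illustrative assumptions.

```python
def facility_value(item_responses):
    """Proportion of candidates who answered the item correctly (0.0 to 1.0).

    item_responses: list of 1 (correct) / 0 (incorrect), one entry per candidate.
    A high value suggests an easy item, a low value a difficult one.
    """
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, group_size=None):
    """Upper-lower discrimination index for one item.

    Candidates are ordered by their total test score; the index is
    (correct answers in the upper group - correct answers in the lower group)
    divided by the size of one group. By default each group is one third of
    the candidates (an assumption; halves or 27% groups are also used).
    """
    n = len(total_scores)
    if group_size is None:
        group_size = max(1, n // 3)
    # Indices of candidates from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group_size]
    lower = order[-group_size:]
    correct_upper = sum(item_responses[i] for i in upper)
    correct_lower = sum(item_responses[i] for i in lower)
    return (correct_upper - correct_lower) / group_size


# Hypothetical data: six candidates, their total scores and answers to one item.
totals = [9, 8, 7, 6, 5, 4]
item = [1, 1, 1, 0, 1, 0]
print(facility_value(item))
print(discrimination_index(item, totals))
```

An index close to 1 would suggest that stronger candidates answer the item correctly far more often than weaker ones, while an index near 0 or below would flag the item for review.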
1.3.3 Practicability
A test must be practical; in other words, it must be fairly straightforward to administer. The most obvious practical considerations concerning the test are often overlooked. Firstly, the length of time available for the administration of the test is frequently misjudged even by experienced test writers, especially if the complete test consists of a number of subtests. Another practical consideration concerns the answer sheets and the stationery used. The use of answer sheets, however, greatly facilitates marking and is strongly recommended when large numbers of pupils are being tested. The question of practicability is not confined solely to oral tests: such written tests as situational composition and controlled writing tests depend not only on the availability of qualified markers who can make valid judgments concerning the use of language, etc., but also on the length of time available for the scoring of the test. A final point concerns the presentation of the test paper itself: where possible, it should be tidy and aesthetically pleasing.
1.3.4.1 Content validity
"A test is said to have content validity if its content constitutes a representative sample of the language skills, structures etc. with which it is meant to be concerned." (Hughes, A., 1989:22) This kind of validity depends on careful analysis of the language being tested and of the particular course objectives. It is obvious that a grammar test, for instance, must be made up of items testing knowledge or control of grammar. But this in itself does not ensure content validity. The test would have content validity only if it included a proper sample of the relevant structures. Just what the relevant structures are will depend, of course, upon the purpose of the test. Therefore, in order to judge whether or not a test has content validity, we need a specification of the skills or structures etc. that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test; there may simply be too many things for all of them to appear in a single test. But it will provide the test constructor with the basis for making a principled selection of elements for inclusion in the test. A comparison of test specification and test content is the basis for judgments as to content validity.
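
The comparison of test specification and test content described above can be illustrated with a short sketch. It is not the procedure used in this thesis: it assumes, purely for illustration, that the specification is expressed as a target share of items per content area and that every item in a collected test has been labelled with one area. The area names, the weights and the 5% tolerance are hypothetical.

```python
from collections import Counter


def content_coverage(specification, item_areas):
    """Compare the intended weight of each area with its observed share of items.

    specification: dict mapping area name -> target proportion of the test.
    item_areas: list with the area label of every item in the test.
    Returns a dict: area -> {"target", "observed", "gap"} (gap = observed - target).
    """
    counts = Counter(item_areas)
    total = len(item_areas)
    report = {}
    for area, target in specification.items():
        observed = counts[area] / total if total else 0.0
        report[area] = {"target": target, "observed": observed, "gap": observed - target}
    return report


def under_represented(report, tolerance=0.05):
    """Areas whose observed share falls short of the target by more than `tolerance`."""
    return sorted(area for area, row in report.items() if row["gap"] < -tolerance)


def off_specification(specification, item_areas):
    """Areas that appear in the test but are not listed in the specification."""
    return sorted(set(item_areas) - set(specification))


# Hypothetical specification and a hypothetical 20-item test.
spec = {"reading": 0.25, "writing": 0.25, "listening": 0.25, "language focus": 0.25}
items = ["reading"] * 10 + ["language focus"] * 8 + ["writing"] * 2

report = content_coverage(spec, items)
for area, row in report.items():
    print(area, row)
print("under-represented:", under_represented(report))
```

In this made-up example the test contains no listening items and few writing items, so both areas would be reported as under-represented, which is the kind of mismatch that weakens content validity.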
What is important about content validity? First, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure. A test in which major areas identified in the specification are under-represented, or not represented at all, is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect. Areas which are not tested are likely to become areas ignored in teaching and learning.
Anastasi (1982:131) defined content validity as "essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured". She offers a set of useful guidelines for establishing content validity:
- The behavior domain to be tested must be systematically analyzed to make certain that all major aspects are covered by the test items, and in the correct proportions.
- The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared.
- Content validity depends on the relevance of the individual's test responses to the behavior area under consideration, rather than on the apparent relevance of item content.
The more a test simulates the dimensions of observable performance and accords with what is known about that performance, the more likely it is to have content and construct validity. According to Kelly (1978:8), content validity seems "an almost completely overlapping concept" with construct validity, and for Moller (1982:68), "the distinction between construct and content validity [...] language proficiency."
1.3.4.2 Construct validity
Construct validity is defined by Anastasi (1982:144) as "the extent to which the test may be said to measure a theoretical construct or trait. Each construct is developed to explain and organize observed response consistencies. It derives from established inter-relationships among behavioral measures. Focusing on a broader, more enduring and more abstract kind of behavioral description, construct validation requires the gradual accumulation of information from a variety of sources. Any data throwing light on the nature of the trait under consideration and the conditions affecting its development and manifestations are grist for this validity mill."
Construct validity is viewed from a purely statistical perspective in much of the recent American literature (Bachman and Palmer, 1981a). It is seen principally as a matter of the a posteriori statistical validation of whether a test has measured a construct that has a reality independent of other constructs.
According to Hughes (1989:26), a test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure. The word "construct" refers to any underlying ability (or trait) which is hypothesised in a theory of language ability. For example, it can be argued that the reading of short passages relates closely to the ability to read a book quickly and efficiently and is a proven factor in reading ability.
1.3.4.3 Criterion-related validity
Another approach to test validity is to see how far results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is therefore the criterion measure against which the test is validated. Criterion-related validity consists of two types, concurrent validity and predictive validity.
Concurrent validity is the degree to which a test correlates with other tests testing the same thing. In other words, if a test is valid it should give a similar result to other measures that are valid for the same purpose. When considering concurrent validity, there are several concerns.
First, the measure that is being used for comparison with the test in question must be valid. If the measure is not valid, there is no point in testing another test's validity against it. For instance, teachers' rankings might be used to test validity, but a teacher's ranking may be affected by a number of factors that are not related to the students' actual proficiency. One possible solution is to average the rankings of several teachers to make up for this.
Second, the measure must be valid for the same purpose as the test whose validity is being considered. A reading test cannot be used to test the concurrent validity of a grammar test. In addition, if teachers' rankings are being used, it is essential to make sure that they understand on what basis they are expected to rank the students. If the test being considered is a grammar test, then the teachers should be asked to rank the students according to their grammar proficiency, not their overall English language ability.
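The two concerns above can be illustrated with a small calculation. The following is only a sketch, assuming that each teacher ranks the same pupils (1 = strongest) and that a rank correlation (Spearman's) is an acceptable index of agreement; the function names and the data layout are hypothetical and are not taken from the sources cited in this chapter.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks (1 = smallest value); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def concurrent_validity(test_scores, teacher_rankings):
    """Spearman correlation between test scores and pooled teacher rankings.

    teacher_rankings holds one list per teacher, each ranking the same pupils
    in the same order as test_scores (hypothetical layout, 1 = strongest).
    """
    # Average the rankings of several teachers to dampen individual bias.
    pooled = [mean(column) for column in zip(*teacher_rankings)]
    # Rank 1 means "best", so negate to make larger values mean stronger pupils.
    criterion = [-p for p in pooled]
    return pearson(average_ranks(test_scores), average_ranks(criterion))
```

A coefficient close to 1 would suggest that the test orders pupils in much the same way as the pooled teacher judgement; it says nothing about whether the teachers' rankings are themselves valid for the purpose in question.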
It is said that predictive validity is different from concurrent validity in that "instead of collecting the external measures at the same time as the administration of the experimental test, the external measures will only be gathered some time after the test has been given" (Alderson et al., 1995). To put it simply, predictive validity is the extent to which the test in question can be used to make predictions about future performance. For example, does a test of English ability accurately predict how well students will get along in a university in an English-speaking country? There are numerous problems with attempting to answer such questions. Measures used to know how well a student does at a university are sometimes employed to measure predictive validity, but the problem is that there are many factors other than English proficiency involved in academic success. Furthermore, it is not possible to know how the students who scored low on the tests, and therefore did not get to go to university, would have done if they had been allowed to go. However, it is undeniable that prediction is an important and justifiable use of language tests.
In short, information about criterion relatedness, concurrent or predictive, is by itself insufficient evidence for validation (Bachman 1990: 253). That is one of the reasons why, in this thesis, the author does not evaluate the criterion-related validity of the tests.
1.3.4.4 Face validity
Anastasi (1982:136) points out that face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears superficially to measure. Face validity pertains to whether the test "looks valid" to the examinees who take it, the administrative personnel who decide on its use, and other technically untrained observers. Fundamentally, the question of face validity concerns rapport and public relations. Lado (1961), Davies (1968), Ingram (1977) and Palmer (1981) have all discounted the value of face validity. If a test does not have face validity, though, it may not be acceptable to the students taking it or the teachers using it. If the students do not accept it as valid, their adverse reaction to it may mean that they do not perform in a way that truly reflects their ability. Anastasi (1982:136) takes a similar position: "Certainly if test content appears irrelevant, inappropriate, silly or childish, the result will be poor co-operation, regardless of the actual validity of the test. Especially in adult testing, it is not sufficient for a test to be objectively valid. It also needs face validity to function effectively in practical situations."
In short, a test is said to have face validity if it looks as if it measures what it is supposed to measure. For example, a test which was intended to measure pronunciation ability but which did not require the candidate to speak (and there have been some) might be thought to lack face validity. Face validity is hardly a scientific concept, yet it is very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers. It may simply not be used; and if it is used, the candidates' reaction to it may mean that they do not perform on it in a way that truly reflects their ability. Face validity can be judged by teachers or pupils.
1.3.4.5 Backwash validity
Language teachers operating in a communicative framework normally attempt to equip students with skills that are judged relevant to present or future needs. To the extent that tests are designed to reflect these, the closer the relationship between the test and the teaching that precedes it, the more the test is likely to enhance construct validity. A suitable criterion for judging communicative tests in the future might well be the degree to which they satisfy pupils, teachers and future users of test results, as judged by some systematic attempt to gather data on the perceived validity of the test. If the first stage, with its emphasis on construct, content, face and backwash validity, is bypassed, the procedures do not suit the purpose for which the test was intended.
On balance, special attention must be paid to the validity of a test when one constructs it. Although there are many kinds of validity, according to Harrison's conclusion, face validity and content validity are the most vital for the teacher setting his own tests. This view of validity provides a specific and useful framework for language test evaluation and is also adopted in this thesis.
1.3.4.6 Sources of invalidity
Hemmings (1987) believes that the following considerations bring about a reduction of validity:
Invalid application of tests: Invalidity arises from misapplication of tests. A test may be valid for specific purposes but become invalid because of the manner in which it is used.
Inappropriate selection of content: Invalidity occurs when items do not match the objectives or the content of instruction, or when the test items are not comprehensive in the sense of reflecting all of the major points of the instruction program, etc. Elaborate specifications and expert opinion must be used to ensure that a test exhibits validity.
Imperfect cooperation of the examinee: This might occur if the examinees are insincere, misinformed, or hostile with regard to the test or the testing situation. For example, students may give wrong answers to a question just because of unclear test instructions.
Inappropriate referent or norming population: Referent means a distinct population of subjects for which the test is developed. In TOEFL tests, for example, the population targeted is applicants to American universities from diverse national and linguistic backgrounds.
Poor criterion selection: Validity can be established only in terms of specified criteria. If the criteria selected are the wrong ones, then the fact that the test is valid is of little practical significance. This is particularly important in the case of criterion-related validity. If the criterion measure itself has low reliability or validity as a measure of the target competence, the validity coefficients obtained by this procedure will tend to underestimate true validity.
Sample truncation: Sample truncation is the artificial restriction of the range of ability represented in the examinees, which will result in underestimation of both reliability and validity.
Use of invalid construct: Tests are said to be valid in so far as they measure the constructs they are purported to measure. If we want to determine the validity of intelligence tests, we must know what kind of intelligence is being measured and whether or not the items accurately reflect that kind of intelligence.
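The effect of sample truncation listed above can be made concrete with a small simulation. This is a sketch under stated assumptions, not a procedure taken from the cited source: ability is assumed to be normally distributed, the test and the criterion are each modelled as ability plus independent random error, and only examinees scoring above a hypothetical cutoff are retained. All names and parameter values are illustrative.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_truncation(n=5000, cutoff=0.5, seed=0):
    """Compare the test-criterion correlation before and after truncation.

    Assumed model (illustrative only): score = ability + measurement error.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    test = [a + rng.gauss(0, 0.5) for a in ability]
    criterion = [a + rng.gauss(0, 0.5) for a in ability]

    # Validity coefficient estimated on the full range of ability.
    r_full = pearson(test, criterion)

    # Keep only examinees whose test score exceeds the cutoff,
    # for example only those admitted on the basis of the test.
    kept = [(t, c) for t, c in zip(test, criterion) if t > cutoff]
    r_truncated = pearson([t for t, _ in kept], [c for _, c in kept])
    return r_full, r_truncated


if __name__ == "__main__":
    full, truncated = simulate_truncation()
    print(f"full sample: {full:.2f}  truncated sample: {truncated:.2f}")
```

Under these assumptions the coefficient computed on the truncated group comes out lower than the one computed on the full group, even though the test itself is unchanged, which is the underestimation described above.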
1.4 Test items for phonetics, structures and vocabulary
1.4.1 Test items
Tests usually consist of a series of items. Cohen (1992: 488) defines an item as "a specific task to perform, [which] can test one or more points or objectives. For example, an item may test one point, such as the meaning of a given word, or several points, such as an item that tests the ability to obtain facts from a passage and then make inferences based on the facts." He also suggests that "sometimes an integrative item is really more a procedure than an item, as in the case of a free composition which could have a number of objectives." Furthermore, he stresses that the objectivity of an item is determined by the way it is scored. A multiple-choice item, for example, is objective in that there is only one right answer. He also points out that a free composition may be more subjective in nature if the scorer does not look for any one right answer but rather for a series of factors, namely creative style, cohesion and coherence, grammar and mechanics.
Item types for testing phonetics are ordering tasks, note-taking, multiple-choice and matching items. Item types for testing structures are multiple-choice items, completion items, matching items, transformation exercises, error-recognition items and rearrangement items. Item types for testing vocabulary are composition and essay, multiple-choice items, cloze items, matching items, completion items and word transformation.
... together in phrases, clauses and sentences to build well-formed sentences. Moreover, the third, semantics, deals with the way we assign meaning to a certain unit of a language in order to communicate. Each of these has additional levels: phonology is supplemented by phonetics, the study of the physical characteristics of sound; syntax by morphology, the study of the structures of words; and lexis is the study of vocabulary. The language components we focus on in this thesis are grammar (structures), vocabulary and phonology: grammar belongs to syntax, vocabulary belongs to lexis, and phonology belongs to phonetics.
1.4.3 The test item types used to evaluate language components
The following table describes the test item types which are often used to evaluate phonetics, structures and vocabulary:
Table 1: The test item types
Phonetics | Structures | Vocabulary
- Re-ordering | - Multiple-choice items | - Multiple-choice items
- Multiple-choice | - Rearrangement items | - Matching
- Pairing and matching | - Completion items | - Word formation
 | - Transformation items | - Items involving synonyms
 | - Constructing items | - Reordering
 | - Error-recognition multiple-choice items | - Definitions (explaining the meaning of each word)
 | - Broken sentence items | - Sentence completion
 | - Pairing and matching items | - Gap filling
1.5 Syllabus Objectives on language components
English 10 is the first course book of the set used for Higher Secondary Schools in Viet Nam. English 10 is the continuation of English 6, English 7, English 8 and English 9 at Secondary Schools. English 10 was designed on the basis of themes/topics which are familiar to pupils in daily life. The course book consists of 16 units, each of which has a topic/theme. School leavers are supposed to be able to use English as a means of communication in speaking, listening, writing and reading at the pre-intermediate level. Pupils can also gain basic knowledge about English-speaking people and countries.
According to the content of the course book, after studying 16 units covering 6 themes, pupils are expected to be able to grasp the following content on phonetics, structures and vocabulary:
Table 2: Syllabus Objectives
Phonetics:
- Distinguish the following vowels: [...]
- [...] consonants: /p/ - [...]
- [...] correctly in the individual words
Structures:
- Use the following tenses correctly: simple present; simple past; past perfect; present perfect; present progressive; be going to (expressing prediction); simple future (will: making predictions, [...])
- Use the following connectors appropriately: because / because of; despite / in spite of / although; which [...]
- Remember some verbs followed by [...]
- Turn active sentences into passive ones, and direct statements into indirect ones
- Use the + adj as a noun
- [...] and distinguish defining and non-[...]
- Use attitudinal adjectives formed from V-ing / V-ed correctly
- Use comparatives and superlatives correctly
- Master the use of articles (a / an / the); the structure "was not until ... that"; Wh[...]
Vocabulary:
- Understand and use the words relating to the following topics appropriately: School talks; Daily activities; Technology; School outdoor activities; The media; Life in the [...]; nature; National park; Music, theatre and film; [...] English cities, [...]
CHAPTER 2: THE STUDY
In this chapter, the writer provides information about the research questions, data description, analytical framework, data analysis and discussion.
2.1 Research Questions
For the purpose of the thesis, and based on the theoretical background and given context, four research questions are proposed with a focus on construct, content and face validity. Those questions can serve as milestones for the analysis.
Q1: Do the achievement tests for Higher Secondary School pupils meet the following criteria: content validity, construct validity, face validity?
Question 1 is further specified by minor questions evaluating each type of validity (construct, content and face validity) respectively, as follows.
Q1.1: Can the testing techniques used in the tests correctly measure pupils' ability to remember, understand and produce the learnt language components?
... the ones situated in the cities and districts. So in total, thirty achievement tests were analyzed to evaluate their content, construct and face validity. Based on experience in teaching English and short interviews with teachers, the author has learned that classroom teachers may design tests themselves, or the teachers of the English group may design the tests together. English has recently become a compulsory subject in the final school examinations; therefore teachers of the English section regard the class progress tests as a reliable means to estimate the pupils' results in learning as well as to reinforce their knowledge and motivate their learning. Every two or three lessons, pupils have to take a written test. In a term, there are eight lessons, so pupils are supposed to have five written tests including the final term test. At the examinations, different tests about the same content are given to different classes in order to avoid cheating. Fifteen minute tests are supposed to test one of the three skills (phonetics, structures, vocabulary and listening or reading or writing). The choice of the skill to be tested is based on the teaching and testing situation of each school. However, teachers tend to test reading or writing skills, or the vocabulary and grammar that students learn from the previous lesson. In general, the level of difficulty depends on the level of pupils at different schools. The forty-five minute written tests, which are administered at the end of every two or three lessons, serve to test pupils' reading, listening and writing skills, phonetics, structures and vocabulary. The term and final term tests are usually structured like forty-five minute tests. This kind of test is administered to test grammar, vocabulary and reading, writing and listening skills relating to the themes that pupils learn during the term or the school year. However, as stated earlier, only language components in the tests were involved in the study.
... used for Upper Secondary School with five parts as below:
Table 3: Format of 45 minute tests and final tests
- Narrative or factual or | - Verb form story text ( approx 100-
... week. The test can test only structures and phonetics, or vocabulary, or all three language components.
However, the author only focuses on the language components, that is, structures, vocabulary and phonetics, in the collected tests. The author collected the tests mainly from teachers; others were obtained from pupils. Both qualitative and quantitative methods have been applied in this study. All comments, remarks, assumptions and conclusions of the study are based on the analysis of 30 tests from those schools and on the survey questionnaire asking teachers who designed the collected tests about face validity.
2.3 Analytical framework for data analysis
In order to find out whether the available English tests designed and used at the Higher Secondary Schools under investigation (from the Middle to the North of Viet Nam) are in accordance with the syllabus and objectives of the 10th form, both objective and subjective methods were used, mainly based on the analysis of random samples of progress achievement tests and final achievement tests at the given Higher Secondary Schools. The author mainly concentrated on analyzing the content validity, construct validity and face validity of the tests. Specifically, the author only focused on assessing the language components (phonetics, vocabulary and grammar) tested in the tests.
2.2.3.1 Content validity
To investigate the content validity, the author analyzed and compared the tests' content (language components: phonetics, vocabulary and grammar) with the test specification based on the objectives and syllabus of the textbook. As mentioned earlier, after every two or three lessons, pupils have to take a 45 minute written test. Therefore, pupils are supposed to take four 45 minute written tests in total and two final tests.
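The comparison between test content and test specification described above can be pictured as a simple coverage check. The sketch below is an illustration only and is not the instrument used in this study: it assumes that both the specification and a test can be reduced to lists of labelled language points, and the function name and labels are hypothetical.

```python
def content_coverage(specification, tested_items):
    """Compare the language points a test covers with its specification.

    Both arguments are iterables of labels (hypothetical), for example
    "present perfect" or "passive voice".
    """
    spec = set(specification)
    tested = set(tested_items)
    covered = spec & tested
    return {
        # Share of specified points that the test actually samples.
        "coverage": len(covered) / len(spec) if spec else 0.0,
        # Specified points the test leaves out.
        "missing": sorted(spec - tested),
        # Tested points that are not in the specification.
        "off_syllabus": sorted(tested - spec),
    }


if __name__ == "__main__":
    # Illustrative labels only, not data from the collected tests.
    spec = ["present perfect", "passive voice", "because / because of"]
    test = ["present perfect", "passive voice", "conditional type 3"]
    print(content_coverage(spec, test))
```

A high coverage figure with few off-syllabus points would be one indication, though not a sufficient one, that a test reflects the objectives it was written for; the weighting of items and the quality of each item still have to be judged separately.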
Firstly, the author investigated the content validity of the written tests collected randomly from Upper High Schools from Ha Tinh to Ha Nam on the basis of test specifications.