FACULTY OF POST-GRADUATE STUDIES
……… ***………
TRẦN THÚY QUỲNH
THE CONTENT VALIDITY OF THE CURRENT ENGLISH ACHIEVEMENT TEST FOR SECOND YEAR NON MAJOR STUDENTS AT PHUONG DONG
UNIVERSITY
(Evaluating the content validity of the end-of-term English test for second-year non-major students at Phuong Dong Private University)
M.A MINOR THESIS
Field: ENGLISH TEACHING METHODOLOGY Code: 60-14-10
Course: 18 (2009-2011) Supervisor: M.A Kim Van Tat
HA NOI- SEPTEMBER 2011
TABLE OF CONTENTS
Acknowledgement
Abstract
List of tables and figures
The table of contents
Chapter 1: Introduction
1.1 Rationale
1.2 Scope of study
1.3 Aims of study
1.4 Methods of study
1.5 Research questions
1.6 Design of study
Chapter 2: Literature review
2.1 Language testing
2.1.1 Definition of language testing
2.1.2 The roles of language testing
2.1.3 Relationship between testing and teaching-learning
2.2 Major characteristics of a good test
2.2.1 Test validity
2.2.1.1 What is test validity?
2.2.1.2 Types of test validity
2.2.1.2.1 Face validity
2.2.1.2.2 Content validity
2.2.1.2.2.1 What is content validity?
2.2.1.2.2.2 How to make the test more valid?
2.2.2 Test reliability
2.2.3 Relationship between reliability and validity
2.2.4 Practicality
2.2.5 Discrimination
Chapter 3: The study
3.1 English learning, teaching and testing at Phuong Dong University
3.1.1 The students
3.1.2 The teachers
3.1.3 The course book “New Headway Elementary - The third edition”
3.1.4 Syllabus and its objectives
3.1.5 The final achievement test
3.2 Research method
3.2.1 The survey questionnaires
3.2.2 Document analysis
3.3 Data analysis
3.3.1 Analysis of the final achievement test
3.3.2 Analysis of the survey questionnaire for students
3.3.3 Analysis of the survey questionnaire for teachers
3.4 Results
Chapter 4: Recommendations and conclusions
4.1 Recommendations
4.2 Conclusion
4.3 Limitations
References
Appendixes
Appendix 1: The content of the course book
Appendix 2: Survey questionnaires for students
Appendix 3: Survey questionnaires for teachers
Appendix 4: Answer key for reading task
Appendix 5: Answer key for the new final achievement test
LIST OF TABLES AND CHARTS
1 Table 1: Scores on test A (invented data) by Arthur Hughes
2 Table 2: Scores on test B (invented data) by Arthur Hughes
3 Table 3: The components of the final achievement test
4 Table 4: What students had been taught and what they had been checked in parts I, II, III of the test
5 Table 5: What students had been taught and checked in the writing part
6 Table 6: Paper specification grids for the final achievement test
7 Chart 1: Students' comment on validity of the test
8 Chart 2: Students' comment on time allowance of the test
9 Chart 3: Students' comment on difficulty level of the test
10 Chart 4: The result of the test
11 Chart 5: The purpose of the test
Chapter 1: Introduction
1.1 Rationale
These days, the need to learn English has become greater and greater. In our country, Viet Nam, having recognized its importance, the Ministry of Education and Training (MOET) has recently decided that English is a compulsory subject in most high schools and universities. This decision requires both teachers and students to alter their ways of teaching and learning. In addition, testing is one effective way to evaluate teaching and learning, and the two are closely related. Testing validates the teaching-learning process, while teaching and learning provide a great source of language materials for testing to exploit. Testing is therefore a matter of concern to all teachers.
While teaching at Phuong Dong University, the writer heard both teachers and students complaining that the English test often did not faithfully reflect the teaching and learning process; in other words, the test did not reflect what the students had learnt and what the teachers had taught. What was tested was not really taught, and the test measured neither the achievement of the course objectives nor the expected skills and knowledge of students. This concern is shared by test researchers such as Brown (1994: 373) and Hughes (1989: 1), who write of recent language testing:
“A great deal of language testing is of very poor quality. Too often language testing has a harmful effect on teaching and learning and too often they fail to measure accurately whatever it is they are intended to measure.”
Another reason for the selection of this research topic lies in the fact that language testing at Phuong Dong University has not been paid enough attention. Classroom language tests were often written in a hurry because the teachers could not find time to think carefully and plan the test. Sometimes, they did not have a clear idea of what they were testing students for and why. They were busy mixing various question types, and as a result many students got low marks.
Due to its close relationship with language teaching and learning, testing deserves proper attention from teachers and students so that it has a positive backwash on the teachers' teaching and on students' satisfaction and encouragement in their study. In order to design a good test that gives an exact, fair and effective evaluation of students' knowledge and performance of English, teachers are supposed to have good knowledge of test writing techniques and testing theories.
For all the above-mentioned reasons, the writer was encouraged to undertake this study, entitled “Content validity of the current English achievement test for second-year non-major students of English at Phuong Dong University”, with the aim of finding out the strengths and weaknesses of this test in terms of content validity and suggesting solutions, if any, for its improvement.
1.2 Scope of study
The scope of this thesis is limited to evaluating the final achievement test in terms of its content validity by comparing the objectives, the syllabus and the textbook allocation with the test contents. The study provides and analyzes data on the currently used test and proposes practical suggestions for its improvement.
Due to the limitations of time, ability and conditions, it is impossible for the writer to cover all the tests. Only some suggestions for the improvement of the test are presented.
1.3 Aims of study
The study aims at checking the content validity of the final achievement test for second-year non-major students at Phuong Dong University. It places high emphasis on analyzing the contents of the final achievement test. The specific aims of this research are:
- To find out the strengths and weaknesses of the currently used test with reference to the content validity
- To suggest some improvements for the test
1.4 Methods of study
… in terms of its content validity. Based on what students had learnt in their first semester and the contents of this test, the writer would examine its content validity.
In addition, qualitative methods involving data collected through survey questionnaires were employed. Two sets of questionnaires were administered to English teachers and students at Phuong Dong University to investigate their evaluative comments on the content validity of the final achievement test and their suggestions for its improvement.
1.5 Research questions
In this study, the writer tries to answer the following two questions:
Question 1: What are the strengths and weaknesses of the final achievement test for second-year non-major students at Phuong Dong University with reference to content validity?
Question 2: What are some suggested solutions for the improvement of the test?
1.6 Design of study
The thesis is organized into four major chapters:
1 Chapter 1 INTRODUCTION presents such basic information as the rationale, the aims, the methods, the research questions and the design of the study.
2 Chapter 2 LITERATURE REVIEW presents a review of related literature that provides the theoretical basis for evaluating and building a good language test. This review includes background on language testing, criteria of good tests and theoretical issues on test content validity.
3 Chapter 3 THE STUDY describes the methods used in the research and presents the detailed results of the surveys, including the questionnaires, and the analysis of the final achievement test in order to find out its problems with reference to content validity.
4 Chapter 4 RECOMMENDATIONS AND CONCLUSIONS provides some suggestions for the improvement of the final achievement test based on the theoretical and practical study mentioned above. The conclusions summarize the matters of research, its findings and its limitations.
Chapter 2: Literature review
This chapter provides an overview of the theoretical background of the study. It includes three main sections.
2.1 Language testing
2.1.1 Definition of language testing
Testing is an important part of every teaching and learning experience and has become one of the main aspects of methodology. The issue of language testing and its significant role has been discussed a great deal by many professionals and researchers worldwide. Different definitions of language testing have been given from various points of view.
According to Allen (1974:313), testing is an instrument to ensure that students have a sense of competition rather than to know how good their performance is and in which condition a test can take place. He says: “Test is a measuring device which we use when we want to compare an individual with other individuals who belongs to the same group.”
Carroll (1986:46) stresses that a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. In other words, a test is a measurement instrument designed to elicit a particular behavior from each individual.
According to Bachman (1990:20), what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior. This distinction is believed to be of great importance as it reflects the primary justification for the use of language tests and has implications for how we design, develop, and use them to best effect. Thus, language tests can provide the means for focusing more closely on the specific abilities of interest.
In the view of Ibe (1981:1), a test is “a sample of behavior under the control of specified conditions aims toward providing a basis for performing judgment.” The term “a sample of behavior” used here is quite broad and means something other than the traditional types of paper-and-pencil tests.
Yet Heaton (1988:5) has a different opinion. In his view, tests are considered a means of assessing the students' performance and of motivating the students. He looks at tests positively, as many students are eager to take tests at the end of the semester to know how much knowledge they have. One important point is that he points out the relationship between testing and teaching.
2.1.2 The roles of language testing
Language testing is a form of measurement. It helps the teachers:
+ To assess the learner's achievement in a language program, for example, to evaluate the testee's language knowledge in relation to a given curriculum or material which the testee has gone through in a given course.
+ To assess a learner's proficiency in a language in relation to future language use, for example, to find out if a person's language is good enough for him to become a tourist guide. This concerns the future use of the language regardless of what language programs or materials the testee went through.
+ To diagnose a learner's strengths and weaknesses in a language and to attempt to explain why certain problems occur and what treatments could be used to tackle these problems.
+ To classify or place the testees in the appropriate language classes.
+ To measure the testee's aptitude for learning a language.
+ To evaluate the effectiveness of a language program. This is often done by using experimental and control classes with the same educational objectives but different methods and materials to achieve these objectives (Brown, 2000:5).
Alternatively, Rebecca M. Valette (1977:3) comments that classroom tests play three important roles in a second language teaching program: defining course objectives, stimulating student progress and evaluating class achievement.
Firstly, classroom tests help us to define the course objectives. Students are quick to observe the types of tests given and to study accordingly. Thus, much as the teacher may emphasize oral fluency in the classroom, if the tests are written tests the students will soon concentrate on perfecting the skills of reading and writing.
Secondly, tests help stimulate student progress. As much as possible, the time given over to classroom testing should provide a rewarding experience. The test should furnish an opportunity for the students to show how well they can handle the specific elements of the target language; gone are the days when the teacher designed a test to point up the students' ignorance or lack of application. Tests should be clearly announced in advance to permit the students to prepare adequately. If the students are expected to demonstrate their abilities, it is only proper that they should learn as soon as possible after the test how well they did. The test best fulfills its function as a part of the learning process if correct performance is immediately confirmed and errors are pointed out.
The last role of testing is evaluating class achievement. Through frequent testing, the teacher can determine which aspects of the program are presenting difficulties for individual students and for the class as a whole. By analyzing the mistakes made on a given test, the teacher can determine where to concentrate extra class drills and how best to assist each student. At the same time, testing enables the teacher to discover whether the class objectives are being met. Through tests, the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of new materials. The most familiar role of the classroom test is to furnish an objective evaluation of each student's progress: his or her attainment of course objectives and his or her performance in relation to the rest of the class.
2.1.3 Relationship between testing and teaching-learning
In the past, teaching and testing used to be separated both theoretically and practically. According to Williams (1983), a test is a necessary imposition from outside the classroom, but an unpleasant one, for two main reasons. The first is that testing is concerned with competition rather than cooperation. Thus, while classroom activities may involve pair work and group work, such cooperation during a test is condemned as copying, and the individual is expected to work alone. Even if cooperation were perfectly possible, the results of a group test would tell us very little about each individual in that group. In the same way, testing does not admit cooperation between teachers and learners. The teacher who helps and encourages the learners with their tasks and responds to their difficulties withdraws that cooperation in a test situation. The other reason, which follows from the first, is that there must be winners and losers in a test. To be sure, those who come close to winning do not feel too upset, but those who gain little from the experience may feel self-conscious.
Nowadays, a new trend with a remarkable emphasis on integrative and communicative tests has brought about many innovations in English testing techniques. Most researchers comment that teaching and testing are closely related. As Brown (1994) states: “Teaching and testing are so interwoven and interdependent that it is difficult to tear them apart”. Tests are constructed primarily as devices to reinforce learning and to motivate the students, and as a means of assessing the student's performance in the language. In other words, a test is an extension of classroom work, providing teachers and students with useful information that can improve both teaching and learning. In turn, teaching and learning provide a great source of language materials for testing to exploit.
A good test is a valuable teaching device for several reasons. Firstly, a test provides the teachers with information on how effective teaching has been. It helps teachers find out whether students are capable of performing the expected behavior, and from that we can learn the characteristics of an individual. Secondly, with the aid of tests, teachers can monitor and evaluate students' learning and diagnose strengths and weaknesses as they occur. Last but not least, based on the test results, the teachers can evaluate the effectiveness of the syllabus as well as the methods and materials they are using.
However, testing can have harmful as well as beneficial effects on teaching and learning. For example, if a test is regarded as important, then preparation for it can come to dominate all teaching and learning activities. If the end goal is to help students pass the test or examination, many teachers will focus their teaching on the content of the test only, so the teaching program may be distorted in many ways.
2.3 Major characteristics of a good test
Before writing a test, it is necessary to answer this question: “What are the major characteristics of a good test?” Harrison (1983: 10) claims that there are four basic characteristics of all good tests: validity, reliability, practicality and discrimination.
2.3.1 Test validity
2.3.1.1 What is test validity?
Validity is one of the most important characteristics of a good test. It has been a controversial issue for a long time. A recent trend in language testing discussion is to consider validity as a unitary concept, with what were once treated as different types of validity now considered aspects of validity.
Henning (1987:5) defines validity as follows:
“In general validity refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test then may be valid for some purposes, but not for others.”
A test is considered valid when it specifically measures what it is supposed to measure. A listening test with written multiple-choice options may lack validity if the printed choices are so difficult to read that the exam actually measures reading comprehension as much as it does listening comprehension. It is least valid for students who are much better at listening than at reading. In other words, a test is valid when its results can be interpreted appropriately for the purposes of testing. That is, validity can be defined as the degree to which a test actually tests what it is intended to test. For example, if the purpose of a test is to test the ability to communicate in English, the test is valid if it does actually test the ability to communicate. When test validity is considered as the degree to which a test measures what it is supposed to measure, it has two very important aspects. The first is that it is a matter of degree: there are degrees of validity, and some tests are more valid than others. The second is that tests are only valid or invalid in terms of their intended uses. If a test is intended to test reading ability but it also tests writing, then it may not be valid for testing reading alone, although it may be valid for testing reading and writing together.
Validity refers to the appropriateness or correctness of the inferences and decisions made about individuals and groups from the test results. Validity must be considered in terms of the correctness of a particular inference about test takers. Therefore, validity is not always easy to measure.
2.3.1.2 Types of test validity
There are many types of validity, such as face validity, content validity, construct validity, concurrent validity and predictive validity. In this part, the writer will focus on only two main types: face validity and content validity.
2.3.1.2.1 Face validity
When discussing face validity, we should consider this question: “Does the test on the face of it appear from the learners' perspective to test what it is designed to test?” Face validity is almost always perceived in terms of content. If the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived. According to Arthur Hughes (1989:40), a test is said to have face validity if it looks as if it measures what it is supposed to measure. For example, a test which pretended to measure pronunciation ability but which did not require the candidate to speak may be thought to lack face validity. Candidates, teachers and education authorities may not accept a test which does not have face validity. Face validity concerns the appeal of the test to popular or non-expert judgment, such as that of the candidates, the candidates' families and members of the public, and it is gauged by asking other teachers to give their opinions about the test.
However, with the advent of communicative language testing, there has been increased emphasis on face validity. It is important for a communicative language test to look like something one might do “in the real world” with language, and such appeals to “real life” are attributed to face validity. While the opinions of students about the test are not expert, they can be important because they are one kind of response that you can get from the people who are taking the test. If a test does not appear to be valid to the test takers, they may not try their best, so the perceptions of non-experts are useful.
In other words, face validity affects the response validity of the test. This critical view of face validity provides a useful method for language test validation.
Face validity can provide not only a quick and reasonable guide but also a balance to too great a concern with statistical analysis. Moreover, students' motivation is maintained if a test has good face validity. On the other hand, if the test appears to have little relevance in the eyes of the students, it will clearly lack face validity. It is possible for a test to include all the components of a particular teaching program being followed and yet at the same time lack face validity. The concept of face validity is far from new in language testing, but the emphasis now placed on it is relatively new. In the past, many test writers regarded face validity simply as a public relations exercise. Today, most designers of communicative tests regard face validity as the most important of all types of test validity.
2.3.1.2.2 Content validity
2.3.1.2.2.1 What is content validity?
Among the several kinds of validity, the simplest and most important one for language teachers is content validity.
In Read's opinion (1983:6), the most relevant type of validity for classroom testing is content validity, which means that the contents of the test should reflect the contents and the objectives of the syllabus that is being followed. In other words, if we want to find out students' progress in what they have learnt, the test should contain a representative sample of the items, rules, skills or functions that they are supposed to achieve. Obviously, the test contents are the main concern if content validity is to be achieved.
Kerlinger (1973) defines content validity as the representativeness or sampling adequacy of the content (the substance, the matter and the topics) of a measuring instrument.
In the same way, Harrison (1983: 11) defines content validity as follows:
“Content validity is concerned with what goes into the test. The content of a test should be decided by considering the purpose of the assessment, and then drawing up a list known as a content specification.”
According to Cyril J. Weir (1990), the purpose of content validity is to examine whether the test is a good representation of the material that needs to be tested and to ensure the defensibility and fairness of interpretations based on test performances. It involves looking at empirical evidence (the hard facts emerging from data from test trials or operational administrations) and is assessed by comparing the test with its course objectives. Last but not least, a test is said to be valid if it is relevant to the aims and purposes of the learning areas on which it is set.
The main distinction between face validity and content validity was pointed out by Alderson et al. (1995: 173) as follows:
“In face validation, we do not necessarily accept the judgment of others, although we respect it, and appreciate that for those people it is real and important and may, therefore, influence behaviors. In content validation, we gather judgments from people we are prepared to believe.”
In this case, while face validity is an appeal to lay observers such as students and administrators, content validity is the opinion of subject experts (i.e., teachers, test makers) as to whether a test is valid.
For Kelly (1978), content validity seems “an almost completely overlapping concept” with construct validity. And for Moller (1982: 68), “The distinction between construct and content validity in language testing is not always very marked, particularly for tests of general language proficiency.” In these cases, particular attention must be paid to content validity in an attempt to ensure that the sample of activities included in a test is as representative of the target domain as possible.
To sum up, the writer favors Read's idea that the most important characteristic of a good test is content validity, which means the contents of the test should reflect the contents and the objectives of the syllabus that is being followed.
2.3.1.2.2.2 How to make the test more valid?
Firstly, in content validation, we should look at whether the test is representative of the skills it is trying to test. This means that we should look at the content of the test and compare it with a statement of what the content ought to be. This involves looking at the syllabus (in the case of an achievement test) and the test specifications, and deciding what the test was intended to test and whether it accomplishes what it was intended to do. In other words, content validity depends on the particular course objectives. In addition, a test would have content validity only if it included a proper sample of the relevant structures. Just what the relevant structures are will depend, of course, upon the purposes of the test. In order to judge whether a test has content validity or not, we need a specification of the skills or structures that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test, but it will provide the test constructor with the basis for making a principled selection of elements for inclusion in the test. A comparison of test specifications and test contents is the basis for judgments of content validity.
However, how important is content validity? Arthur Hughes (1989) gave two reasons for its importance. First, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure. A test in which major areas identified in the specification are not represented at all is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect. Areas which are not tested are likely to become areas ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than what is important to test. The best safeguard against this is to write full test specifications and to ensure that the test content is a fair reflection of these. In other words, when embarking on the construction of a test, the test writer should first draw up a table of test specifications, describing in very clear and precise terms the particular language skills and areas to be included in the test. If the test or sub-test being constructed is a test of grammar, each of the grammatical areas should then be given a percentage weighting, for example, the future simple tense 10%, uncountable nouns 15%, relative pronouns 10%. If the test or sub-test concerns reading, then each of the reading sub-skills should be given a weighting in a similar way, for instance, deducing word meanings from contextual clues 20%, search-reading for specific information 30%, reading between the lines and inferring 12%, intensive reading comprehension 40%.
According to J. B. Heaton (1982), in doing so the test writer attempts to quantify and balance the test components, assigning a certain value to indicate the importance of each component in relation to the other components in the test. In this way, the test should achieve content validity and reflect the component skills and areas that the test writer wishes to include in the test.
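The weighting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `total_weight` and `allocate_items`, the 50-item test length and the rounding rule are hypothetical and are not part of Heaton's or Hughes's procedure. The weightings are the reading sub-skill percentages quoted above, so the sketch also checks whether they add up to 100%.

```python
# Hypothetical sketch: turn the percentage weightings of a test
# specification into a number of items per area.

def total_weight(spec):
    """Return the sum of the percentage weightings in a specification."""
    return sum(spec.values())


def allocate_items(spec, n_items):
    """Share n_items among the areas in proportion to their weightings.

    Weightings are normalised by their total, so the shares stay
    proportional even when the percentages do not add up to exactly 100.
    Because of rounding, the counts may not sum to exactly n_items.
    """
    total = total_weight(spec)
    return {area: round(n_items * weight / total) for area, weight in spec.items()}


# Reading sub-skill weightings as quoted in the text (percent).
reading_spec = {
    "deducing word meanings from contextual clues": 20,
    "search-reading for specific information": 30,
    "reading between the lines and inferring": 12,
    "intensive reading comprehension": 40,
}

if __name__ == "__main__":
    total = total_weight(reading_spec)
    if total != 100:
        print(f"Warning: weightings sum to {total}%, not 100%")
    # The test length of 50 items is an assumption made for illustration.
    for area, count in allocate_items(reading_spec, n_items=50).items():
        print(f"{count:2d} items: {area}")
```

A check of this kind is useful when a specification is drawn up by hand, since weightings that do not total 100% make it unclear how much of the test each area should occupy.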
Anastasi (1982:131) defines content validity as “essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.” She provided a set of useful guidelines for establishing content validity:
1 The behavior domain to be tested must be systematically analyzed to make certain that all major aspects are covered by the test items, and in the correct proportions.
2 The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared.
3 Content validity depends on the relevance of the individual's test responses to the behavior area under consideration, rather than on the apparent relevance of item content.
Brown (1994: 385) gives a list of factors necessary to improve test validity:
+ A carefully constructed, well-thought-out format
+ Items that are clear and uncomplicated
+ Directions that are crystal clear
+ Tasks that are familiar and relate to their course work
+ A difficulty level that is appropriate to your students
+ Test conditions that are biased for best, that is, that bring out students' best performances
In the same way, Moore (1992: 11) stressed: “Content validity is established by determining whether the instrument's test items correspond to the content that the students are supposed to learn.”
Correspondingly, to evaluate test content validity, the test items should be inspected for their correspondence to the teachers' stated objectives.
In short, test content validity is the most important characteristic of a good test. The basis for evaluating content validity is a comparison between the test specifications and the test contents.
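The comparison between test specifications and test contents can be illustrated with a small Python sketch. It is not a procedure taken from the sources cited above: the function name `compare_coverage` and all the data are invented for illustration, and the sketch assumes that a teacher or other expert has already judged which area each test item addresses.

```python
# Hypothetical sketch: compare what a specification lists with what
# the test items actually cover.

def compare_coverage(spec_areas, item_areas):
    """Return (missing, unspecified, coverage).

    missing     -- specified areas that no test item addresses
    unspecified -- areas tested although they are not in the specification
    coverage    -- share of specified areas addressed by at least one item
    """
    spec = set(spec_areas)
    tested = set(item_areas)
    missing = sorted(spec - tested)
    unspecified = sorted(tested - spec)
    coverage = len(spec & tested) / len(spec) if spec else 0.0
    return missing, unspecified, coverage


# Invented example data: the areas named in a specification, and the
# area each test item was judged to address (one entry per item).
spec_areas = ["future simple", "uncountable nouns", "relative pronouns", "present perfect"]
item_areas = ["future simple", "future simple", "relative pronouns", "passive voice"]

missing, unspecified, coverage = compare_coverage(spec_areas, item_areas)
print("Specified but not tested:", missing)
print("Tested but not specified:", unspecified)
print(f"Specified areas covered: {coverage:.0%}")
```

The two lists correspond to the two problems discussed in this section: specified areas that the test ignores, and test content that was never taught or specified.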
2.3.2 Test reliability
Reliability is another necessary characteristic of any good test. A reliable test can be used as a measuring instrument. If the test is administered to the same students on different occasions (with no language practice work taking place between these occasions) and produces different results, it is not reliable. So a test is said to be reliable if it produces the same results when administered to the same students at different times.
There are two types of reliability. The first, namely test-retest reliability, refers to the ability of a test to produce consistent results from the same students whenever it is used. The other type is inter-item consistency, which means that the test should measure the same thing all the time.
Bachman (1990), a leading expert, describes reliability as “a quality of test score”. We can look at the hypothetical data in Table 1. They present the scores obtained by 5 students who took a 100-item test, A, on a particular occasion and those that they would have obtained if they had taken it a day later. The most obvious way to obtain such data is simply to have people take the same test twice. We should note the size of the differences between the two scores for each student:
Table 1: Scores on test A (invented data), from Arthur Hughes (1989: 30)
[Table body not recovered; the final column gives the score obtained on the following day]
Now have a look at table 2, which displays the same kind of information for a second 100-item test, test B. Again, note the difference in scores for each student:
Table 2: Scores on test B (invented data), from Arthur Hughes (1989: 30)
[Table body not recovered; the final column gives the score obtained on the following day]
items in the test, the test may rely too heavily on luck: weak candidates may score 50% or more on a short test.
+ The second factor that affects test reliability is the administration of the test. If individual test items are too hard for everyone or too easy for everyone, then they are not reliable test items: they do not differentiate between the strong and the weak candidates. An important factor in deciding reliability is whether the same test is administered to different groups under different conditions or not.
+ The third one is test instructions: are the various tasks expected from the testees made clear to all candidates in the rubrics?
+ Another factor that influences the reliability of a test is how much the test is based on passages and questions taken directly from a textbook and how much it is based on the syllabus within the textbook, not the book itself. An over-emphasis on “quoting” the textbook in a test will produce results that do not reveal the achievement or professional progress of the learners in terms of reading, writing, listening, speaking, vocabulary and grammar. The results will only reveal how well students have memorized the passages and the correct answers.
+ Last but not least, one of the most important factors affecting reliability is the scoring of the test. Sometimes, a test can be unreliable because of the way it is marked. For example, if an average composition is marked immediately after a very good composition, the average composition may be given a mark that is actually below average. The marker's subconscious comparison of the two compositions will result in the average composition appearing worse than it really is. However, if the same average composition is marked immediately after a very poor composition, then it may appear above average and be awarded a higher mark than it deserves. In addition, different markers may award different marks to the same composition; for example, some markers may be very lenient and others unfairly strict.
To sum up, reliability is an undeniably important characteristic of a good test. If the test result is not reliable, the assessment based on it is not reliable either. In order to make the test more reliable, it is important for testers to consider many influential factors, such as test administration (involving scoring, timing, testing conditions, and observation or control of test taking), the size of the test, test instructions and scoring methods, right from the outset of the test construction process.
2.3.3 Relationship between reliability and validity
Reliability and validity are essential measurement qualities of a good test. They are qualities that provide major justification for using test scores and numbers as the basis for making inferences or decisions (Bachman et al., 1996: 19).
They have a complicated relationship. On the one hand, it is possible for a test to be reliable without being valid. That is, a test can give the same result time after time but not measure what it was intended to measure. On the other hand, if the test is not reliable, it cannot be valid at all. To be valid, according to Hughes (1988:42), a test must provide consistently accurate measurements. It must therefore be reliable. A reliable test, however, may not be valid at all. For example, in a writing test, the candidates are required to translate a text of 500 words into their own language. This could well be a reliable test, but it is unlikely to be a valid test of writing. In our efforts to make tests reliable, we must be wary of reducing their validity.
The problem is that while one can have test reliability without test validity, a test can only be valid if it is also reliable. There is thus sometimes said to be a reliability-validity tension. This tension exists in the sense that it is sometimes essential to sacrifice a degree of reliability in order to enhance validity. However, if validity is lost to increase reliability, we finish up with a test which is a reliable measure of something other than what we wish to measure. If a choice has to be made between the two concepts, “validity after all, is the more important one” (Guilford, 1965: 481).
Moller (1981:67) comments that while it is understood that a valid test must be reliable, it would seem that in such a highly complex and personal behavior as using a language other than one's mother tongue, validity could be claimed for measures that might have a lower than normally acceptable level of reliability. Reliability is something we should always try to achieve in our tests. Test reliability cannot be ignored without a harmful effect on the validity of the instrument.
Therefore, test validity and reliability are the two chief criteria for evaluating any test, and the ideal test should be both valid and reliable. However, efforts to maximize the reliability of a test can come at the cost of some of its validity.
2.3.4 Practicality
In addition to reliability and validity, practicality plays an important role in deciding whether a test is good or not. The main question of practicality is administrative. A test must be carefully organized well in advance: How long will the test take? What special arrangements have to be made (for example, what happens to the rest of the class while individual speaking tests take place)? Is any equipment needed (tape recorder, language lab, overhead projector)? How is marking of the work handled? How are tests stored between sittings of the test? All of these questions are practical, since they help ensure the success of a test and of testing (Heaton, 1988). Therefore, practicality includes financial limitations, time constraints, and ease of administration, scoring and interpretation.
According to Brown (1994), a test that is prohibitively expensive, that takes a student ten hours to complete, or that takes students a few minutes to do but teachers several hours to evaluate is impractical.
Another important aspect of practicality we have to consider is that the test should have “instructional value” (Oller, 1979). The test should enhance the delivery of instruction to the students. Teachers need to provide clear and useful interpretation so that students understand and learn better. The instructions of the test should be clear and easy so that students know what they have to do; knowing what to do, they can get higher marks. In contrast, a test that is too complicated or too difficult may not be practical for either the teachers or the students.
To sum up, in order to be useful and efficient, tests should be as economical as possible in terms of time and cost. In addition, the test's instructions should be well written so that students know what they ought to do.
2.3.5 Discrimination
Discrimination is another important factor that test designers have to consider when writing a test. Heaton (1988) defines the discrimination of a test as its capacity to discriminate among different students and to reflect the differences in the performances of the individuals in a group. A test cannot discriminate if its items are either too easy or too difficult. Therefore, the test items must range from “extremely easy items” to “extremely difficult items”. Similarly, Harrison (1994: 14) defines discrimination as: “The extent to which a test separates the students from each other.” Discrimination tells us whether the test can differentiate between the more proficient students and the less proficient ones. The extent of the need for discrimination depends on the purposes of the test. For example, if a placement test is able to discriminate efficiently among students, it will be much easier to divide students into suitable groups. In many classroom tests, the teacher will be much more concerned with finding out how well the students have mastered the syllabus, so the teacher will hope for higher results from the students.
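The idea that an item discriminates when stronger students answer it correctly more often than weaker ones can be expressed as a simple index. The Python sketch below assumes one common item-analysis approach, the upper-lower group method, in which the index is the difference between the numbers of correct answers in the top and bottom groups divided by the group size. The function name, the 27% group fraction and all scores are assumptions made for illustration only.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index for a single item.

    total_scores: total test score of each student.
    item_correct: 1 if the student answered this item correctly, else 0.
    fraction:     share of students placed in each of the two groups.
    """
    n = len(total_scores)
    group = max(1, round(n * fraction))
    # Student positions ordered from highest to lowest total score
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group]
    lower = order[-group:]
    correct_upper = sum(item_correct[i] for i in upper)
    correct_lower = sum(item_correct[i] for i in lower)
    return (correct_upper - correct_lower) / group


# Invented data for ten students, for illustration only
totals = [90, 85, 70, 60, 55, 40, 35, 20, 15, 10]
mixed_item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
easy_item = [1] * 10  # everyone answers correctly

print(discrimination_index(totals, mixed_item))
print(discrimination_index(totals, easy_item))
```

An item that everyone answers correctly (or incorrectly) yields an index of zero, which matches the point above that items which are too easy or too difficult cannot discriminate.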
Chapter 3: The study
3.1 English learning, teaching and testing at Phuong Dong University
3.1.1 The students
At Phuong Dong University, students come from different parts of the country. Most of these students did not spend much time learning English at high school, as they had to devote most of their time to other subjects (for example, mathematics, physics, chemistry and drawing) in order to pass the university entrance examination. Thus, most are real beginners in English when entering university, though their language proficiency levels differ.
3.1.2 The teachers
English teachers working with 2nd year students are of different ages. Half of them are aged from 45 to 55 and the rest from 25 to 38. They graduated from three educational institutions: Ha Noi National University, Ha Noi Foreign Language University and Phuong Dong University.
3.1.3 The course book: “New Headway Elementary- The third edition”
The book “New Headway-Elementary- The third edition” has been used as the textbook to teach the second year students at Phuong Dong University. This material is designed for students at elementary level.
It consists of 14 units, designed as a harmonious combination with a strong lexical component to increase learners' vocabulary and develop awareness of English culture. Each unit is divided into three parts, and each part focuses on grammar, function or vocabulary. Every unit provides students with opportunities to learn and develop their knowledge of grammar, vocabulary, communication skills and pronunciation through practice activities in listening, speaking, reading and writing (see Appendix 1, page I).
3.1.4 Syllabus and its objectives
For the first semester of the second year, eight units (units 7 to 14) are taught in 45 periods (50 minutes per period), delivered over about 9 weeks. Students continue to work on the four areas of grammar, vocabulary, communication skills and pronunciation, and they have the chance to deal with different topics. The aims of the course are to increase students' basic knowledge of vocabulary and grammar and to give practice in the four basic language skills of listening, speaking, reading and writing in social situations.
3.1.5 The final achievement test for second year non major students
The final achievement test consists of the following parts, types, items and tasks:

Part    Type                   Items  Task                                                        Marks
Part 1  Rewrite the sentences  5      Rewrite sentences so that there is no change of meaning     2
Part 2  Guided sentence        5      Use the following sets of words to …                        2
Part 3  Correct mistakes       5      Find and correct one mistake in each sentence               2
Part 4  Write a paragraph      1      Write a paragraph of 100-120 words about your capital city  4

Table 3: The components of the final achievement test
Looking at the marking criteria for the test, we can see that they have confused many teachers and worried students. It is very difficult for teachers to mark part 4, as there are no detailed marking criteria for aspects such as language, content and grammar.
3.2 Research method
In this study, both quantitative and qualitative methods are used: survey questionnaires and document analysis. However, given the scope and purposes of this study, document analysis is taken as the main method to find out the strengths and weaknesses of the final achievement test with regard to content validity. In addition, survey questionnaires help the writer collect more information from both teachers and students about this test. Although each method helps to collect and confirm different kinds of data, each has its own unavoidable shortcomings.
3.2.1 The survey questionnaires
There are many ways to collect data, and the survey questionnaire is an effective one for several reasons. Firstly, it can be used to gather information about teachers' and students' attitudes, views and thoughts on the content validity of the end-of-term 1 test. Secondly, there is no confrontation between the person who conducts the survey and the informants, because the questionnaire is simply a list of questions. Therefore, the informants can feel free to express their thoughts. Thirdly, most of the questions are closed ones, so it is easier for the writer to collect and analyze the data. Finally, it can gather a large number of responses.
3.2.2 Document analysis
Besides survey questionnaires, document analysis is considered the main method to evaluate the final achievement test in terms of content validity.
Firstly, the writer will analyze “New Headway Elementary- The third edition” to find out what the teachers have to teach and what the students ought to learn. Because the purpose of this study is to investigate the content validity of the final achievement test for second year students at Phuong Dong University, analyzing this test is one effective way to achieve this purpose. Based on the theories of testing, test design and the characteristics of a good test, the writer will analyze this test by comparing the course objectives and what the students had learnt with the test contents, in order to find out the strengths and weaknesses of the test and then suggest some solutions for its improvement.
Last but not least, the writer will analyze the survey questionnaire data from both teachers and students to see what their comments on this test are.
Summary
Evidently, it is important to use several methods in order to compare the results received and to ensure their authenticity. The questionnaires allow the informants' real feelings and full views to be expressed. Besides, document analysis is a rich source of information, as the writer captures what the teachers and students in fact do. Therefore, using document analysis in combination with survey questionnaires helps the writer give objective and reliable results.
3.3 Data analysis
In this part, based on the final achievement test, the writer will compare the content of this test with what the students had learnt in the first semester, in order to find out the strengths and weaknesses of the currently used test with reference to content validity. In addition, the students' and teachers' opinions gathered through the survey questionnaires are also analyzed in order to evaluate the content validity of the test in more depth.
3.3.1 Analysis of the final achievement test
It is necessary to examine the layout as well as the content of the final achievement test for second year non major students. The test includes 4 main parts, which can be represented as follows:
Phuong Dong University
Foreign languages department
The final achievement test - No 1
Time allowed: 60 minutes
Marker's signature 2:
I. Rewrite each sentence, beginning as shown, so that the meaning stays the same
1 My watch is cheaper than yours
4 No one is more intelligent than Anna in her class.
Anna is………
5 Do you want some fish and chips?
Would……….?
II. Guided sentence building: use the following sets of words and phrases to write complete sentences
1 I'd/ chicken/ and chips/ main course
III. Find and correct ONE mistake in each of the following sentences
1 My brother can play badminton when he was five years old
A and some
Much and many
Unit 10 Comparatives and superlatives Question 1, 4 (part 1)
Question 5 (part 2)
Unit 11 Present continuous Question 2 (part 3)
Whose is it?
Possessive pronoun
Unit 12 Going to
Infinitive of purpose
Question 5 (part 3)
Unit 14 Present perfect
Question 3 (part 2)
Present perfect and past simple
Question 4 (part 3)
Table 4: What students had been taught and what was checked in parts I, II and III of the test
The following is what students had been taught in the writing part:
Unit 7 Describing a holiday
Unit 8 Writing about a friend
Unit 9 Filling in forms
Unit 10 Describing a place
Unit 11 Describing people
Unit 12 Writing a postcard
Unit 13 Writing a story
Unit 14 Writing an email
Table 5: What students had been taught and checked in the writing part.
When analyzing the content of the test, we can see that the test is quite adequate, with clear instructions and format. There are no words or grammar structures that are new to the students; all of them were taught during the semester. However, there are some problems here. Looking at the tables above, it is clear that some grammar points have not been checked in the test, for example, the grammar in Unit 8 (negative form of the past simple) and Unit 11 (going to). In the writing part, this topic was closely related to what the students