The Content Validity of the Current English Achievement Test for Second-Year Non-Major Students at Phuong Dong University



FACULTY OF POST-GRADUATE STUDIES


TRẦN THÚY QUỲNH

THE CONTENT VALIDITY OF THE CURRENT ENGLISH ACHIEVEMENT TEST FOR SECOND YEAR NON MAJOR STUDENTS AT PHUONG DONG UNIVERSITY

(An evaluation of the content appropriateness of the end-of-term English test for second-year non-English-major students at Phuong Dong Private University)

M.A. MINOR THESIS

Field: ENGLISH TEACHING METHODOLOGY

Code: 60-14-10

Course: 18 (2009-2011)

Supervisor: M.A. Kim Van Tat

HA NOI, SEPTEMBER 2011


TABLE OF CONTENTS

Acknowledgement

Abstract

List of tables and figures

Table of contents

Chapter 1: Introduction

1.1 Rationale

1.2 Scope of study

1.3 Aims of study

1.4 Methods of study

1.5 Research questions

1.6 Design of study

Chapter 2: Literature review

2.1 Language testing

2.1.1 Definition of language testing

2.1.2 The roles of language testing

2.1.3 Relationship between testing and teaching-learning

2.2 Major characteristics of a good test

2.2.1 Test validity

2.2.1.1 What is test validity?

2.2.1.2 Types of test validity

2.2.1.2.1 Face validity

2.2.1.2.2 Content validity

2.2.1.2.2.1 What is content validity?

2.2.1.2.2.2 How to make the test more valid?

2.2.2 Test reliability

2.2.3 Relationship between reliability and validity

2.2.4 Practicality

2.2.5 Discrimination

Chapter 3: The study

3.1 English learning, teaching and testing at Phuong Dong University

3.1.1 The students

3.1.2 The teachers

3.1.3 The course book “New Headway Elementary - The third edition”

3.1.4 Syllabus and its objectives

3.1.5 The final achievement test

3.2 Research method

3.2.1 The survey questionnaires

3.2.2 Document analysis

3.3 Data analysis

3.3.1 Analysis of the final achievement test

3.3.2 Analysis of the survey questionnaire for students

3.3.3 Analysis of the survey questionnaire for teachers

3.4 Results

Chapter 4: Recommendations and conclusions

4.1 Recommendations

4.2 Conclusion

4.3 Limitations

References

Appendices

Appendix 1: The content of the course book

Appendix 2: Survey questionnaires for students

Appendix 3: Survey questionnaires for teachers

Appendix 4: Answer key for reading task

Appendix 5: Answer key for the new final achievement test


LIST OF TABLES AND CHARTS

1 Table 1: Scores on test A (invented data) by Arthur Hughes

2 Table 2: Scores on test B (invented data) by Arthur Hughes

3 Table 3: The components of the final achievement test

4 Table 4: What students had been taught and what had been checked in Parts I, II and III of the test

5 Table 5: What students had been taught and what had been checked in the writing part

6 Table 6: Paper specification grids for the final achievement test

7 Chart 1: Students' comments on the validity of the test

8 Chart 2: Students' comments on the time allowance of the test

9 Chart 3: Students' comments on the difficulty level of the test

10 Chart 4: The result of the test

11 Chart 5: The purpose of the test


Chapter 1: Introduction

1.1 Rationale

These days, the need to learn English has become greater and greater. In our country, Viet Nam, having recognized its importance, the Ministry of Education and Training (MOET) has recently decided that English is a compulsory subject in most high schools and universities. This decision requires both teachers and students to alter their ways of teaching and learning. In addition, testing is one effective way to evaluate teaching and learning, and the two are closely related: testing validates the teaching-learning process, while teaching and learning provide a great source of language materials for testing to exploit. Testing is therefore a matter of concern to all teachers.

During the writer's teaching time at Phuong Dong University, the writer heard both teachers and students complaining that the English test often did not faithfully reflect the teaching and learning process; in other words, the test did not reflect what the students learnt and what the teachers taught. What was tested had not really been taught, and the test measured neither the achievement of the course objectives nor the expected skills and knowledge of the students. This view is shared by testing researchers such as Brown (1994: 373) and Hughes (1989: 1), writing on recent language testing:

“A great deal of language testing is of very poor quality. Too often language testing has a harmful effect on teaching and learning and too often they fail to measure accurately whatever it is they are intended to measure.”

Another reason for the selection of this research topic lies in the fact that language testing at Phuong Dong University has not been paid enough attention. Classroom language tests were often written in a hurry because the teachers could not find time to think carefully about and plan the test. Sometimes they did not have a clear idea of what they were testing students for and why. They were busy mixing various question types, and as a result many students got low marks.

Due to its close relationship with language teaching and learning, testing deserves proper attention from teachers and students so that it has a positive backwash on the teachers' teaching and brings students satisfaction and encouragement in their study. In order to design a good test that gives an exact, fair and effective evaluation of students' knowledge and performance in English, teachers are supposed to have good knowledge of test-writing techniques and testing theories.

For all the above-mentioned reasons, the writer was encouraged to undertake this study entitled “Content validity of the current English achievement test for second-year-non-major students of English at Phuong Dong University”, with the aims of finding out the strengths and weaknesses of this test in terms of content validity and suggesting some solutions, if any, for its improvement.

1.2 Scope of study

The scope of this thesis is limited to evaluating the final achievement test in terms of its content validity by comparing the objectives, the syllabus and the textbook allocation with the test contents. The study provides investigated and analyzed data on the currently used test and proposes practical suggestions for improving this test.

Due to limitations of time, ability and conditions, it is impossible for the writer to cover all the tests. Only some suggestions for improving the test are presented.

1.3 Aims of study

The study aims at checking the content validity of the final achievement test for second-year non-major students at Phuong Dong University. It places high emphasis on analyzing the contents of the final achievement test. The specific aims of this research are:

- To find out the strengths and weaknesses of the currently used test with reference to the content validity

- To suggest some improvements for the test


in terms of its content validity. Based on what students had learnt in their first semester and the contents of this test, the writer would examine its content validity.

In addition, qualitative methodologies involving data collected through survey questionnaires were employed. Two sets of questionnaires were administered to English teachers and students at Phuong Dong University to investigate their evaluative comments on the content validity of the final achievement test and their suggestions for its improvement.

1.5 Research questions

In this study, the writer tries to answer the following two questions:

Question 1: What are the strengths and weaknesses of the final achievement test for second-year non-major students at Phuong Dong University with reference to content validity?

Question 2: What solutions can be suggested for improving the test?

1.6 Design of study

The thesis is organized into four major chapters:

1. Chapter 1, INTRODUCTION, presents such basic information as the rationale, the aims, the methods, the research questions and the design of the study.

2. Chapter 2, LITERATURE REVIEW, presents a review of related literature that provides the theoretical basis for evaluating and building a good language test. This review includes background on language testing, the criteria of good tests and theoretical issues on test content validity.

3. Chapter 3, THE STUDY, describes the methods used in the research and presents the detailed results of the surveys, including the questionnaires and the analysis of the final achievement test, in order to find out its problems with reference to content validity.

4. Chapter 4, RECOMMENDATIONS AND CONCLUSIONS: the recommendations provide some suggestions for improving the final achievement test based on the theoretical and practical study above, and the conclusions summarize the research, its findings and its limitations.


Chapter 2: Literature review

This chapter provides an overview of the theoretical background of the study. It includes three main sections.

2.1 Language testing

2.1.1 Definition of language testing

Testing is an important part of every teaching and learning experience and has become one of the main aspects of methodology. The issue of language testing and its significant role have been discussed a great deal by many professionals and researchers worldwide, and different definitions of language testing have been given from various points of view.

According to Allen (1974: 313), testing is an instrument for giving students a sense of competition rather than for letting them know how good their performance is and under which conditions a test can take place. He says: “Test is a measuring device which we use when we want to compare an individual with other individuals who belongs to the same group.”

Carroll (1986: 46) stresses that a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. In other words, a test is a measurement instrument designed to elicit a particular behavior from each individual.

According to Bachman (1990: 20), what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior. This distinction is believed to be of great importance, as it reflects the primary justification for the use of language tests and has implications for how we design, develop and use them to their best advantage. Thus, language tests can provide the means for focusing more closely on the specific language abilities of interest.

In the view of Ibe (1981: 1), a test is “a sample of behavior under the control of specified conditions aims toward providing a basis for performing judgment.” The term “a sample of behavior” used here is quite broad, and it means something other than the traditional paper-and-pencil types of test.

Heaton (1988: 5), however, holds a different opinion. In his view, tests are a means of assessing students' performance and of motivating students. He looks at tests in a positive light, as many students are eager to take tests at the end of the semester to know how much knowledge they have. Importantly, he also points out the relationship between testing and teaching.

2.1.2 The roles of language testing

Language testing is a form of measurement. It helps teachers:

+ To assess the learner's achievement in a language program; for example, to evaluate the testee's language knowledge in relation to a given curriculum or material which the testee has gone through in a given course

+ To assess a learner's proficiency in language in relation to future language use; for example, to find out whether a person's language is good enough for him to become a tourist guide. This concerns the future use of the language, regardless of what language programs or materials the testee went through

+ To diagnose a learner's strengths and weaknesses in a language and to attempt to explain why certain problems occur and what treatments could be used to tackle these problems

+ To classify or place the testees in the appropriate language classes

+ To measure the testee's aptitude for learning a language

+ To evaluate the effectiveness of a language program. This is often done by using experimental and control classes with the same educational objectives but different methods and materials to achieve these objectives (Brown, 2000: 5)

Similarly, Rebecca M. Valette (1977: 3) comments that classroom tests play three important roles in a second language teaching program: defining course objectives, stimulating student progress and evaluating class achievement.

Firstly, classroom tests help us to define the course objectives. Students are quick to observe the types of tests given and to study accordingly. Thus, however much the teacher may emphasize oral fluency in the classroom, if all tests are written tests, the students will soon concentrate on perfecting the skills of reading and writing.

Secondly, tests help to stimulate student progress. As much as possible, the time given over to classroom testing should provide a rewarding experience. The test should furnish an opportunity for the students to show how well they can handle the specific elements of the target language; gone are the days when the teacher designed a test to point up the students' ignorance or lack of application. Tests should be clearly announced in advance to permit the students to prepare adequately. If the students themselves are expected to demonstrate their abilities, it is only proper that they should learn as soon as possible after the test how well they did. The test best fulfills its function as part of the learning process if correct performance is immediately confirmed and errors are pointed out.

The last role of testing is evaluating class achievement. Through frequent testing, the teacher can determine which aspects of the program are presenting difficulties for individual students and for the class as a whole. By analyzing the mistakes made on a given test, the teacher can determine where to concentrate extra class drills and how best to assist each student. At the same time, testing enables the teacher to discover whether the class objectives are being met. Through tests, the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of new materials. The most familiar role of the classroom test is to furnish an objective evaluation of each student's progress: his or her attainment of course objectives and his or her performance in relation to the rest of the class.

2.1.3 Relationship between testing and teaching-learning

In the past, teaching and testing used to be separated both theoretically and practically. According to Williams (1983), a test is a necessary but unpleasant imposition from outside the classroom, for two main reasons. The first is that testing is concerned with competition rather than cooperation. Thus, while classroom activities may involve pair work and group work, such cooperation during a test is condemned as copying, and the individual is expected to work alone. Even if group testing were perfectly possible, the results of a group test might tell us very little about each individual in that group. In the same way, testing does not admit cooperation between teachers and learners: the teacher who helps and encourages the learners with their tasks and responds to their difficulties withdraws that cooperation in a test situation. The second reason, which follows from the first, is that a test must have winners and losers. To be sure, those who come close to winning do not feel too upset, but those who gain little from the experience may feel self-conscious.


Nowadays, new trends and developments with a remarkable emphasis on integrative and communicative tests have brought about many innovations in English testing techniques. Most researchers agree that teaching and testing are closely related. As Brown (1994) states: “Teaching and testing are so interwoven and interdependent that it is difficult to tear them apart.” Tests are constructed primarily as devices to reinforce learning and to motivate students, and as a means of assessing students' performance in the language. In other words, a test is an extension of classroom work, providing teachers and students with useful information that can improve both the teaching and the learning process. In turn, teaching and learning provide a great source of language materials for testing to exploit.

A good test is a valuable teaching device for several reasons. Firstly, a test provides teachers with information on how effective teaching has been. It helps the teaching process by finding out whether students are capable of performing the target behavior, and from that we can learn about the characteristics of each individual. Secondly, with the aid of tests, teachers can monitor and evaluate students' learning and diagnose strengths and weaknesses as they occur. Last but not least, based on the test results, teachers can evaluate the effectiveness of the syllabus as well as the methods and materials they are using.

However, testing can have a harmful as well as a beneficial effect on teaching and learning. For example, if a test is regarded as important, preparation for it can come to dominate all teaching and learning activities. If the end goal is to help students pass the test or examination, many teachers will focus their teaching on the content of the test only, so the teaching program may be distorted in many ways.

2.3 Major characteristics of a good test

Before writing a test, it is necessary to answer this question: “What are the major characteristics of a good test?” Harrison (1983: 10) claims that there are four basic characteristics of all good tests: validity, reliability, practicality and discrimination.


2.3.1 Test validity

2.3.1.1 What is test validity?

Validity is one of the most important characteristics of a good test, and it has been a controversial issue for a long time. A recent trend in language testing discussion is to consider validity as a unitary concept: what used to be called different types of validity are now considered aspects of validity.

Henning (1987:5) defines validity as follows:

“In general validity refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test, then, may be valid for some purposes, but not for others.”

A test is considered valid when it specifically measures what it is supposed to measure. A listening test with written multiple-choice options may lack validity if the printed choices are so difficult to read that the exam actually measures reading comprehension as much as it does listening comprehension; it is least valid for students who are much better at listening than at reading. In other words, validity concerns whether the test results can be interpreted as appropriate to the purposes of testing. That is, validity can be defined as the degree to which a test actually tests what it is intended to test. For example, if the purpose of a test is to test the ability to communicate in English, the test is valid if it does actually test the ability to communicate. If test validity is the degree to which a test measures what it is supposed to measure, this definition has two very important aspects. The first is that validity is a matter of degree: some tests are more valid than others. The second is that tests are only valid or invalid in terms of their intended uses. If a test is intended to test reading ability but also tests writing, it may not be valid as a test of reading, though it may be valid as a test of reading and writing together.

Validity refers to the appropriateness or correctness of the inferences and decisions made about individuals and groups from the test results. Validity must be considered in terms of the correctness of a particular inference about test takers; therefore, validity is not always easy to measure.


2.3.1.2 Types of test validity

There are many types of validity, such as face validity, content validity, construct validity, concurrent validity and predictive validity. In this part, the writer focuses on only two main types: face validity and content validity.

2.3.1.2.1 Face validity

When considering face validity, we should ask this question: “Does the test, on the face of it, appear from the learners' perspective to test what it is designed to test?” Face validity is almost always perceived in terms of content: if the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived. According to Arthur Hughes (1989: 40), a test is said to have face validity if it looks as if it measures what it is supposed to measure. For example, a test which claimed to measure pronunciation ability but did not require the candidate to speak might be thought to lack face validity. Candidates, teachers and education authorities may not accept a test which does not have face validity. Face validity concerns the appeal of the test to popular or non-expert judgment, such as that of the candidates, the candidates' families and members of the public, and it is assessed by asking other teachers to give their opinions about the test.

However, with the advent of communicative language testing, there has been increased emphasis on face validity. It is important for a communicative language test to look like something one might do “in the real world” with language, and such appeals to “real life” are attributed to face validity. While students' opinions about the test are not expert, they can be important because they are one kind of response that you can get from the people who are taking the test. If a test does not appear to be valid to the test takers, they may not try their best, so the perceptions of non-experts are useful.

In other words, face validity affects the response validity of the test. This critical view of face validity provides a useful method for language test validation.

Face validity can provide not only a quick and reasonable guide but also a balance to too great a concern with statistical analysis. Moreover, students' motivation is maintained if a test has good face validity. On the other hand, if the test appears to have little relevance in the eyes of the students, it will clearly lack face validity. It is possible for a test to include all the components of a particular teaching program being followed and yet at the same time lack face validity. The concept of face validity is far from new in language testing, but the emphasis now placed on it is relatively new. In the past, many test writers regarded face validity simply as a public relations exercise. Today, most designers of communicative tests regard face validity as the most important of all types of test validity.

2.3.1.2.2 Content validity

2.3.1.2.2.1 What is content validity?

Among the several kinds of validity, the simplest and most important one for language teachers is content validity.

In Read's opinion (1983: 6), the most relevant type of validity for classroom testing is content validity, which means that the contents of the test should reflect the contents and the objectives of the syllabus being followed. In other words, if we want to find out students' progress in what they have learnt, the test should contain a representative sample of the items, rules, skills or functions that they are supposed to have achieved. Obviously, the test contents are the main concern if content validity is to be achieved.

Kerlinger (1973) defines content validity as the representativeness or sampling adequacy of the content (the substance, the matter and the topics) of a measuring instrument.

In the same way, Harrison (1983: 11) defines content validity as:

"Content validity is concerned with what goes into the test. The content of a test should be decided by considering the purpose of the assessment, and then drawing up a list known as a content specification."

According to Cyril J. Weir (1990), the purpose of content validity is to examine whether the test is a good representation of the material that needs to be tested, and to ensure the defensibility and fairness of interpretations based on test performance. It involves looking at empirical evidence (the hard facts emerging from data from test trials or operational administrations) and is established by comparing the test with its course objectives. Last but not least, a test is said to be valid if it is relevant to the aims and purposes of the learning areas on which it is set.

The most important distinction between face validity and content validity was pointed out by Alderson et al. (1995: 173) as follows:

"In face validation, we do not necessarily accept the judgment of others, although we respect it, and appreciate that for those people it is real and important and may, therefore, influence behaviors. In content validation, we gather judgments from people we are prepared to believe."

In this sense, while face validity is an appeal to lay observers such as students and administrators, content validity is the opinion of subject experts (i.e., teachers and test makers) as to whether a test is valid.

For Kelly (1978), content validity seems to be “an almost completely overlapping concept” with construct validity. And for Moller (1982: 68), “The distinction between construct and content validity in language testing is not always very marked, particularly for tests of general language proficiency.” In these cases, particular attention must be paid to content validity in an attempt to ensure that the sample of activities included in a test is as representative of the target domain as possible.

To sum up, the writer is in favor of Read's view that the most important characteristic of a good test is content validity, which means that the contents of the test should reflect the contents and the objectives of the syllabus being followed.

2.3.1.2.2.2 How to make the test more valid?

Firstly, in content validation, we should look at whether the test is representative of the skills it is trying to test. This means that we should look at the content of the test and compare it with a statement of what the content ought to be. This involves looking at the syllabus (in the case of an achievement test) and the test specifications, and deciding what the test was intended to test and whether it accomplishes what it is intended to do. In other words, content validity depends on the particular course objectives. In addition, a test would have content validity only if it included a proper sample of the relevant structures. Just what the relevant structures are will depend, of course, upon the purposes of the test. In order to judge whether a test has content validity or not, we need a specification of the skills or structures that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test, but the specification will provide the test constructor with the basis for making a principled selection of elements to include in the test. A comparison of test specifications and test contents is the basis for judgments of content validity.


How important, then, is content validity? Arthur Hughes (1989) gives two reasons. First, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure. A test in which major areas identified in the specification are not represented at all is unlikely to be accurate. Secondly, a test with low content validity is likely to have a harmful backwash effect: areas which are not tested are likely to become areas ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than what is important to test. The best safeguard against this is to write full test specifications and to ensure that the test content is a fair reflection of them. In other words, when embarking on the construction of a test, the test writer should first draw up a table of test specifications, describing in very clear and precise terms the particular language skills and areas to be included in the test. If the test or sub-test being constructed is a test of grammar, each of the grammatical areas should then be given a percentage weighting, for example, the future simple tense 10%, uncountable nouns 15% and relative pronouns 10%. If the test or sub-test concerns reading, then each of the reading sub-skills should be given a weighting in a similar way, for instance, deducing word meanings from contextual clues 20%, search-reading for specific information 30%, reading between the lines and inferring 12%, and intensive reading comprehension 40%.
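To make the weighting step concrete, the short Python sketch below converts a table of percentage weightings into a number of items for each area. It is only an illustration under stated assumptions: the function name `allocate_items`, the largest-remainder rounding rule and the example weightings are hypothetical and are not taken from Hughes, Heaton or the Phuong Dong test. It also rejects weightings that do not add up to 100%, on the assumption that a complete specification grid should account for the whole test.

```python
from __future__ import annotations


def allocate_items(weights: dict[str, float], total_items: int) -> dict[str, int]:
    """Convert percentage weightings into whole-number item counts.

    weights maps each language area or sub-skill (hypothetical labels) to its
    percentage weighting. Counts are rounded with the largest-remainder method
    so that they always add up to total_items.
    """
    weight_sum = sum(weights.values())
    if abs(weight_sum - 100) > 1e-9:
        raise ValueError(f"weightings add up to {weight_sum}%, not 100%")
    # Exact (possibly fractional) number of items each area deserves.
    exact = {area: total_items * w / 100 for area, w in weights.items()}
    # Round every area down first ...
    counts = {area: int(share) for area, share in exact.items()}
    # ... then give the leftover items to the areas with the largest remainders.
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda area: exact[area] - counts[area], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


# Hypothetical 40-item grammar sub-test (illustrative weightings only).
grammar_weights = {"future simple": 30, "uncountable nouns": 45, "relative pronouns": 25}
print(allocate_items(grammar_weights, 40))
```

With these invented weightings the three areas receive 12, 18 and 10 items. Rounding down and then distributing the remaining items is one simple way to keep the total fixed; other rounding rules would serve equally well.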

According to J. B. Heaton (1982), in this way the test writer attempts to quantify and balance the test components, assigning a certain value to indicate the importance of each component in relation to the other components in the test. In this way, the test should achieve content validity and reflect the component skills and areas that the test writer wishes to include.

Anastasi (1982: 131) defines content validity as “essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.” She provides a set of useful guidelines for establishing content validity:

1. The behavior domain to be tested must be systematically analyzed to make certain that all major aspects are covered by the test items, and in the correct proportions.

2. The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared.

3. Content validity depends on the relevance of the individual's test responses to the behavior area under consideration, rather than on the apparent relevance of item content.

Brown (1994: 385) gives a list of factors necessary to improve test validity:

+ A carefully constructed, well-thought-out format

+ Items that are clear and uncomplicated

+ Directions that are crystal clear

+ Tasks that are familiar and relate to the students' course work

+ A difficulty level that is appropriate to your students

+ Test conditions that are biased for best, bringing out students' best performances

In the same way, Moore (1992: 11) stresses: “Content validity is established by determining whether the instrument's test items correspond to the content that the students are supposed to learn.”

Correspondingly, to evaluate test content validity, the test items should be inspected for their correspondence to the teachers' stated objectives.

In short, content validity is the most important characteristic of a good test. The basis for evaluating content validity is a comparison between the test specifications and the test contents.
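The comparison between test specifications and test contents can also be sketched in a simple, hypothetical form. The Python sketch below assumes that each test item has already been labelled by hand with the syllabus area it assesses (that labelling is the expert judgment described above); the function name `content_coverage` and all area labels are invented for illustration. It reports what share of the specified areas the test covers, which specified areas are never tested, and what share of the items assesses material outside the specification.

```python
from __future__ import annotations


def content_coverage(specified: set[str], item_areas: list[str]) -> dict[str, object]:
    """Compare a content specification with the areas actually tested.

    specified:  areas listed in the syllabus / test specification.
    item_areas: one label per test item, naming the area that item assesses.
    """
    tested = set(item_areas)
    covered = specified & tested
    untested = specified - tested
    off_spec_items = [area for area in item_areas if area not in specified]
    return {
        # Share of specified areas that appear at least once in the test.
        "coverage": len(covered) / len(specified) if specified else 0.0,
        # Specified areas with no items at all (a risk of harmful backwash).
        "untested_areas": sorted(untested),
        # Share of items testing something the specification does not list.
        "off_spec_share": len(off_spec_items) / len(item_areas) if item_areas else 0.0,
    }


# Hypothetical labels, not the real Phuong Dong specification or test.
spec = {"present simple", "past simple", "can/can't", "food vocabulary"}
items = ["present simple", "present simple", "past simple", "present perfect"]
print(content_coverage(spec, items))
```

In this invented example, half of the specified areas are covered, two specified areas are never tested, and one item in four falls outside the specification. Such figures only summarize the comparison; deciding whether the balance is acceptable remains a judgment for the test writer.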

2.3.2 Test reliability

Reliability is another necessary characteristic of any good test. A reliable test can be used as a measuring instrument. If a test is administered to the same students on different occasions (with no language practice taking place between these occasions) and produces different results, it is not reliable. So a test is said to be reliable if it produces the same results when administered to the same students at different times.

There are two types of reliability. The first, test-retest reliability, refers to the ability of a test to produce consistent results from the same students whenever it is used. The second is inter-item consistency, which means that all the items in the test should consistently measure the same thing.
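The text does not name a formula for inter-item consistency. One widely used index for items scored simply right or wrong is the Kuder-Richardson formula 20 (KR-20), shown here as a swapped-in illustration rather than as a method used in this study: KR-20 = k/(k - 1) × (1 - Σpq / σ²), where k is the number of items, p and q are the proportions of students answering each item right and wrong, and σ² is the (population) variance of the total scores. The function name `kr20` and the answer data below are hypothetical.

```python
from __future__ import annotations

from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for items scored 1 (right) or 0 (wrong).

    responses[s][i] is student s's score on item i.
    """
    n_students = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # population variance of total scores
    if k < 2 or total_var == 0:
        raise ValueError("need at least two items and some spread in total scores")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_students  # proportion right
        sum_pq += p * (1 - p)                               # p * q
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical answers of four students to three items.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(answers), 2))  # 0.75
```

Values closer to 1 suggest that the items hang together as a measure of one ability; a very short test such as this invented three-item example is shown only to keep the arithmetic easy to follow.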

Bachman (1990), a leading expert, describes reliability as "a quality of test score".

Consider the hypothetical data in Table 1. They present the scores obtained by 5 students who took a 100-item test (Test A) on a particular occasion, and the scores they would have obtained if they had taken it a day later. The most obvious way to check reliability is simply to have people take the same test twice. Note the size of the difference between the two scores for each student:

Table 1: Scores on Test A (invented data), from Arthur Hughes (1989: 30)

[Score data not preserved in this copy; one column gave the scores obtained on the following day.]

Now look at Table 2, which displays the same kind of information for a second 100-item test, Test B. Again, note the difference in score for each student:

Table 2: Scores on Test B (invented data), from Arthur Hughes (1989: 30)

[Score data not preserved in this copy; one column gave the scores obtained on the following day.]


items in the test, the test may rely too heavily on luck: weak candidates may score 50% or more on a short test.

+ The second factor affecting test reliability is the administration of the test. If individual test items are too hard or too easy for everyone, they are not reliable test items: they do not differentiate between strong and weak candidates. An important factor in deciding reliability is whether or not the same test is administered to different groups under different conditions.

+ The third factor is the test instructions: are the tasks expected of the candidates made clear to all of them in the rubrics?

+ Another factor that influences the reliability of a test is how far the test is based on passages and questions taken directly from a textbook, and how far it is based on the syllabus underlying the textbook rather than on the book itself. An over-emphasis on “quoting” the textbook in a test will produce results that do not reveal the learners' real achievement or progress in reading, writing, listening, speaking, vocabulary and grammar. The results will only reveal how well students have memorized the passages and the correct answers.

+ Last but not least, one of the most important factors affecting reliability is the scoring of the test. Sometimes a test can be unreliable because of the way it is marked. For example, if an average composition is marked immediately after a very good composition, the average composition may be given a mark that is actually below average: the marker's subconscious comparison of the two compositions makes the average composition appear worse than it really is. However, if the same average composition is marked immediately after a very poor composition, it may appear above average and be awarded a higher mark than it deserves. In addition, different markers may award different marks to the same composition; for example, some markers may be very lenient and others unfairly strict.
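The marker effects described above can be checked roughly by having two markers score the same scripts. The following Python sketch, with invented marks and a hypothetical tolerance of half a mark, reports whether one marker is systematically more lenient and how often the two agree; it is a simple descriptive check, not a formal inter-rater reliability statistic.

```python
def compare_markers(marks_a, marks_b, tolerance=0.5):
    """Compare two markers' marks for the same set of compositions.

    Returns (mean_diff, agreement):
      mean_diff -- average of (A - B); positive means marker A is more lenient;
      agreement -- share of scripts on which the two marks differ by no more
                   than `tolerance` marks.
    """
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("need two equal-length, non-empty lists of marks")
    diffs = [a - b for a, b in zip(marks_a, marks_b)]
    mean_diff = sum(diffs) / len(diffs)
    agreement = sum(abs(d) <= tolerance for d in diffs) / len(diffs)
    return mean_diff, agreement


# Invented marks (out of 4) given by two markers to the same four compositions.
print(compare_markers([3.0, 2.5, 4.0, 2.0], [2.0, 2.5, 3.5, 2.0]))  # (0.375, 0.75)
```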

To sum up, reliability is an undeniably important characteristic of a good test. If the test results are not reliable, the assessment based on them is not reliable either. To make a test more reliable, testers need to consider many influential factors right from the outset of the test construction process, such as test administration (including scoring, timing, testing conditions, and the supervision of candidates while they take the test), the length of the test, the test instructions and the scoring methods.

2.3.3 Relationship between reliability and validity

Reliability and validity are essential measurement qualities of a good test. They are the qualities that provide the major justification for using test scores as the basis for making inferences or decisions (Bachman et al., 1996: 19).

They have a complicated relationship. On the one hand, it is possible for a test to be reliable without being valid: a test can give the same result time after time but not measure what it was intended to measure. On the other hand, if a test is not reliable, it cannot be valid at all. To be valid, according to Hughes (1988: 42), a test must provide consistently accurate measurements; it must therefore be reliable. A reliable test, however, may not be valid at all. For example, in a writing test, the candidates might be required to translate a text of 500 words into their own language. This could well be a reliable test, but it is unlikely to be a valid test of writing. In our efforts to make tests reliable, we must be wary of reducing their validity.

The problem is that while one can have test reliability without test validity, a test can only be valid if it is also reliable. There is thus sometimes said to be a reliability-validity tension, in the sense that it is sometimes essential to sacrifice a degree of reliability in order to enhance validity. However, if validity is lost to increase reliability, we end up with a test which is a reliable measure of something other than what we wish to measure. If a choice has to be made between the two concepts, “validity after all, is the more important one” (Guilford, 1965: 481).
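Classical test theory offers a standard way of stating this one-way relationship, which the sources above describe only in words: a test's correlation with any criterion (its validity coefficient) cannot exceed the square root of its reliability coefficient. The bound below is a general textbook result, not one derived in the works cited here:

```latex
r_{xy} \;\le\; \sqrt{r_{xx'}\, r_{yy'}} \;\le\; \sqrt{r_{xx'}}
```

Here r_xy is the correlation between test X and criterion Y, and r_xx' and r_yy' are their reliability coefficients; the second inequality holds because r_yy' cannot exceed 1. For example, a test with reliability 0.64 cannot correlate with any criterion more strongly than the square root of 0.64, that is 0.8. The converse does not hold: a high r_xx' sets no lower limit on r_xy, which is why a reliable test may still be invalid.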

Moller (1981: 67) comments that while it is understood that a valid test must be reliable, it would seem that in such a highly complex and personal behavior as using a language other than one's mother tongue, validity could be claimed for measures that have a lower than normally acceptable level of reliability. Reliability is something we should always try to achieve in our tests: test reliability cannot be ignored without a harmful effect on the validity of the instrument.


Therefore, validity and reliability are the two chief criteria for evaluating any test, and the ideal test should be both valid and reliable. However, efforts to increase the reliability of a test can sometimes reduce its validity.

2.3.4 Practicality

In addition to reliability and validity, practicality plays an important role in deciding whether a test is good or not. The main questions of practicality are administrative, and a test must be carefully organized well in advance. How long will the test take? What special arrangements have to be made (for example, what happens to the rest of the class while individual speaking tests take place)? Is any equipment needed (tape recorder, language lab, overhead projector)? How is the marking handled? How are the test papers stored between sittings? All of these questions are practical, since answering them helps ensure the success of a test and of testing (Heaton, 1988). Practicality therefore covers financial limitations, time constraints, and ease of administration, scoring and interpretation.

According to Brown (1994), a test that is prohibitively expensive, that takes a student ten hours to complete, or that takes students a few minutes to do but teachers several hours to evaluate is impractical.

Another important aspect of practicality that we have to consider is that the test should have “instructional value” (Oller, 1979). The test should support the delivery of instruction to the students, and the teachers need to interpret the results clearly and usefully so that students can understand them and learn better. The instructions of the test should be clear and simple, so that students know what they have to do; knowing what to do, they can get higher marks. By contrast, an overly complicated or difficult test may not be practical for either teachers or students.

To sum up, in order to be useful and efficient, tests should be as economical as possible in terms of time and cost. In addition, the test's instructions should be well written so that students know what they ought to do.

2.3.5 Discrimination

Discrimination is another important factor that test designers have to consider when writing a test. Heaton (1988) defines the discrimination of a test as its capacity to discriminate among different students and to reflect the differences in the performances of individuals in a group. A test cannot discriminate if its items are either too easy or too difficult. Therefore, the test items should range from “extremely easy items” to “extremely difficult items”. Similarly, Harrison (1994: 14) defines discrimination as: “The extent to which a test separates the students from each other.” Discrimination tells us whether the test can differentiate between the more proficient students and the less proficient ones. How much discrimination is needed depends on the purpose of the test. For example, if a placement test discriminates efficiently among students, it will be much easier to divide them into suitable groups. In many classroom tests, however, the teacher is much more concerned with finding out how well the students have mastered the syllabus, so the teacher will hope for high results from the students.
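Item facility and discrimination are often checked with a simple upper-lower group analysis. The Python sketch below assumes right/wrong (1/0) item scores and compares the top and bottom thirds of students; the group size, the function name and the data are assumptions for illustration, not procedures taken from Heaton or Harrison.

```python
def item_analysis(responses, group_fraction=1/3):
    """Facility value and upper-lower discrimination index for each item.

    responses      -- one row per student of 0/1 item scores.
    group_fraction -- share of students in each of the top and bottom groups
                      (1/3 here; 27% is another common choice).
    Returns a list of (facility, discrimination) pairs, one per item:
      facility       -- proportion of all students answering correctly;
      discrimination -- (correct in top group - correct in bottom group)
                        divided by the group size, ranging from -1 to 1.
    """
    n = len(responses)
    group = max(1, round(n * group_fraction))
    ranked = sorted(responses, key=sum, reverse=True)  # strongest first
    top, bottom = ranked[:group], ranked[-group:]
    results = []
    for i in range(len(responses[0])):
        facility = sum(row[i] for row in responses) / n
        discrimination = (sum(row[i] for row in top)
                          - sum(row[i] for row in bottom)) / group
        results.append((facility, discrimination))
    return results


# Invented answers of three students to three items.
for item in item_analysis([[1, 1, 1], [1, 1, 0], [1, 0, 0]]):
    print(item)
```

In the invented data, the first item is answered correctly by every student, so its discrimination index is 0: like the extremely easy items mentioned above, it does not separate stronger from weaker students.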


Chapter 3: The study

3.1 English learning, teaching and testing at Phuong Dong University

3.1.1 The students

At Phuong Dong University, students come from different parts of the country. Most of them did not spend much time learning English at high school, as they had to devote most of their time to other subjects, for example mathematics, physics, chemistry and drawing, in order to pass the university entrance examination. Thus, they are real beginners in English when they enter university, and they have different levels of language proficiency.

3.1.2 The teachers

English teachers working with 2nd year students are of different ages: half of them are aged from 45 to 55 and the rest from 25 to 38. They graduated from three educational institutions: Ha Noi National University, Ha Noi Foreign Language University and Phuong Dong University.

3.1.3 The course book: “New Headway Elementary- The third edition”

The book “New Headway Elementary - The third edition” has been used as the textbook for second year students at Phuong Dong University. This material is designed for students at elementary level.

It consists of 14 units, which combine grammar with a strong lexical syllabus designed to increase learners' vocabulary and develop their awareness of English culture. Each unit is divided into three parts, and each part focuses on grammar, functions or vocabulary. Every unit gives students opportunities to learn and develop their knowledge of grammar, vocabulary, communication skills and pronunciation through practice activities in listening, speaking, reading and writing (see Appendix 1, page I).


3.1.4 Syllabus and its objectives

In the first semester of the second year, students are taught eight units (Units 7 to 14) in 45 periods (50 minutes per period), delivered over about 9 weeks. Students still work on four areas (grammar, vocabulary, communication skills and pronunciation), and they have the chance to deal with different topics. The aims of the course are to increase students' basic knowledge of vocabulary and grammar and to practice the four basic language skills of listening, speaking, reading and writing in social situations.

3.1.5 The final achievement test for second year non major students

The final achievement test consists of the following parts:

Part | Type | Items | Task | Marks
Part 1 | Rewrite the sentences | 5 | Rewrite sentences so that there is no change of meaning | 2
Part 2 | Guided sentence building | 5 | Use the following sets of words to write complete sentences | 2
Part 3 | Correct mistakes | 5 | Find and correct one mistake in each sentence | 2
Part 4 | Write a paragraph | 1 | Write a paragraph of 100-120 words about your capital city | 4

Table 3: The components of the final achievement test

Looking at the marking criteria for the test, we can see that they have confused many teachers and worried students. It is very difficult for teachers to mark Part 4, as there are no detailed marking criteria covering, for example, language, content and grammar.

3.2 Research method

In this study, both quantitative and qualitative methods are used, namely survey questionnaires and document analysis. However, given the scope and purposes of this study, document analysis is taken as the main method to find out the strengths and weaknesses of the final achievement test with regard to content validity. In addition, the survey questionnaires help the writer collect more information about both teachers' and students' views of this test. Each method helps to collect and confirm different kinds of data, but each also has its own unavoidable shortcomings.

3.2.1 The survey questionnaires

There are many ways to collect data, and the survey questionnaire is an effective one for several reasons. Firstly, questionnaires can be used to gather information about teachers' and students' attitudes, views and thoughts on the content validity of the end-of-term 1 test. Secondly, there is no confrontation between the researcher and the informants, because a questionnaire is simply a list of questions; therefore, the informants can feel free to express their thoughts. Thirdly, most of the questions are closed, so it is easier for the writer to collect and analyze the data. Finally, a questionnaire can gather a large number of responses.
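Because most questions are closed, their answers can be tallied mechanically into percentages. A minimal Python sketch, assuming one list of chosen options per question; the function name, option labels and answers are invented for illustration.

```python
from collections import Counter


def tally(responses, options):
    """Percentage of informants choosing each option of a closed question.

    responses -- the option each informant chose for one question.
    options   -- the answer options, in the order they should be reported.
    """
    if not responses:
        raise ValueError("no responses to tally")
    counts = Counter(responses)
    return {opt: round(100 * counts[opt] / len(responses), 1) for opt in options}


# Invented answers from eight informants to one closed question.
answers = ["agree", "agree", "neutral", "disagree", "agree", "agree", "neutral", "agree"]
print(tally(answers, ["agree", "neutral", "disagree"]))
# {'agree': 62.5, 'neutral': 25.0, 'disagree': 12.5}
```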

3.2.2 Document analysis

Besides survey questionnaires, document analysis is considered the main method to evaluate the final achievement test in terms of content validity.

Firstly, the writer will analyze “New Headway Elementary - The third edition” to find out what the teachers have to teach and what the students ought to learn. Because the purpose of this study is to investigate the content validity of the final achievement test for second year students at Phuong Dong University, analyzing this test is an effective way to achieve this purpose. Based on theories of testing, test design and the characteristics of a good test, the writer will analyze the test by comparing the course objectives and what the students had learnt with the test contents, in order to find out the strengths and weaknesses of the test and then suggest some solutions for its improvement.

Last but not least, the writer will analyze the survey questionnaire data from both teachers and students to see what they think of this test.

Summary

Evidently, it is important to use several methods in order to compare the results received and to ensure their authenticity. Through the questionnaires, the informants can express their real feelings and full views. Besides, document analysis is a rich source of information, as the writer captures what the teachers and students, in fact, do. Therefore, using document analysis in combination with survey questionnaires helps the writer to give objective and reliable results.

3.3 Data analysis

In this part, based on the final achievement test, the writer will compare the content of this test with what the students had learnt in the first semester, in order to find out the strengths and weaknesses of the currently used test with reference to content validity. In addition, the students' and teachers' opinions gathered through the survey questionnaires are analyzed in order to evaluate the content validity of the test in more depth.

3.3.1 Analysis of the final achievement test

It is necessary to examine the layout as well as the content of the final achievement test for second year non major students. The test includes 4 main parts, which can be represented as follows:

Phuong Dong University

Foreign languages department

The final achievement test - No. 1

Time allowed: 60 minutes

Marker’s signature 2:

I. Rewrite each sentence, beginning as shown, so that the meaning stays the same

1. My watch is cheaper than yours.


4. No one is more intelligent than Anna in her class.

Anna is………

5 Do you want some fish and chips?

Would……….?

II. Guided sentence building: use the following sets of words and phrases to write complete sentences

1. I'd / chicken / and chips / main course

III. Find and correct ONE mistake in each of the following sentences.

1. My brother can play badminton when he was five years old.


Unit | Grammar taught | Checked in the test
 | A and some |
 | Much and many |
Unit 10 | Comparatives and superlatives | Questions 1, 4 (Part 1); Question 5 (Part 2)
Unit 11 | Present continuous | Question 2 (Part 3)
 | Whose is it? |
 | Possessive pronouns |
Unit 12 | Going to |
 | Infinitive of purpose | Question 5 (Part 3)
Unit 14 | Present perfect | Question 3 (Part 2)
 | Present perfect and past simple | Question 4 (Part 3)

Table 4: What students had been taught and what was checked in Parts I, II and III of the test

What students had been taught in the writing part:

Unit 7 Describing a holiday

Unit 8 Writing about a friend

Unit 9 Filling in forms

Unit 10 Describing a place

Unit 11 Describing people

Unit 12 Writing a postcard

Unit 13 Writing a story

Unit 14 Writing an email

Table 5: What students had been taught and checked in the writing part.

An analysis of the test content shows that the test is quite well designed, with clear instructions and a clear format. It contains no words or grammar structures that are new to the students: all of them were taught during the semester. However, there are some problems. The tables above show that some grammar points were not checked in the test, for example, the grammar in Unit 8 (negative form of the past simple) and Unit 12 (going to). In the writing part, this topic was closely related to what the students
