FACULTY OF POST-GRADUATE STUDIES
(Phuong Dong Private University)
M.A MINOR THESIS
Field: ENGLISH TEACHING METHODOLOGY Code: 60-14-10
Course: 18 (2009-2011) Supervisor: M.A Kim Van Tat
HA NOI- SEPTEMBER 2011
TABLE OF CONTENTS
Acknowledgement
Abstract
List of tables and figures
Table of contents
Chapter 1: Introduction
1.1 Rationale
1.2 Scope of study
1.3 Aims of study
1.4 Methods of study
1.5 Research questions
1.6 Design of study
Chapter 2: Literature review
2.1 Language testing
2.1.1 Definition of language testing
2.1.2 The roles of language testing
2.1.3 Relationship between testing and teaching-learning
2.2 Major characteristics of a good test
2.2.1 Test validity
2.2.1.1 What is test validity?
2.2.1.2 Types of test validity
2.2.1.2.1 Face validity
2.2.1.2.2 Content validity
2.2.1.2.2.1 What is content validity?
2.2.1.2.2.2 How to make the test more valid?
2.2.2 Test reliability
2.2.3 Relationship between reliability and validity
2.2.4 Practicality
2.2.5 Discrimination
Chapter 3: The study
3.1 English learning, teaching and testing at Phuong Dong University
3.1.1 The students
3.1.2 The teachers
3.1.3 The course book “New Headway Elementary - The third edition”
3.1.4 Syllabus and its objectives
3.1.5 The final achievement test
3.2 Research method
3.2.1 The survey questionnaires
3.2.2 Document analysis
3.3 Data analysis
3.3.1 Analysis of the final achievement test
3.3.2 Analysis of the survey questionnaire for students
3.3.3 Analysis of the survey questionnaire for teachers
3.4 Results
Chapter 4: Recommendations and conclusions
4.1 Recommendations
4.2 Conclusion
4.3 Limitations
References
Appendices
Appendix 1: The content of the course book
Appendix 2: Survey questionnaires for students
Appendix 3: Survey questionnaires for teachers
Appendix 4: Answer key for reading task
Appendix 5: Answer key for the new final achievement test
LIST OF TABLES AND CHARTS
1 Table 1: Scores on test A (invented data) by Arthur Hughes
2 Table 2: Scores on test B (invented data) by Arthur Hughes
3 Table 3: The components of the final achievement test
4 Table 4: What students had been taught and what they had been checked on in parts I, II and III of the test
5 Table 5: What students had been taught and checked in the writing part
6 Table 6: Paper specification grids for the final achievement test
7 Chart 1: Students' comments on the validity of the test
8 Chart 2: Students' comments on the time allowance of the test
9 Chart 3: Students' comments on the difficulty level of the test
10 Chart 4: The result of the test
11 Chart 5: The purpose of the test
Chapter 1: Introduction
1.1 Rationale
These days, the need to learn English has become greater and greater. In our country, Viet Nam, having recognized its importance, the Ministry of Education and Training (MOET) has recently decided that English is a compulsory subject in most high schools and universities. This decision requires both teachers and students to alter their ways of teaching and learning. In addition, testing is one effective way to evaluate teaching and learning, and the two are very closely related: testing validates the teaching-learning process, while teaching and learning provide a great source of language material for testing to exploit. Testing is therefore a matter of concern to all teachers.
During the teaching time at Phuong Dong University, the writer heard both teachers and students complaining that the English test often did not faithfully reflect the teaching and learning process; in other words, the test did not reflect what the students learnt and what the teachers taught. What was tested was not really taught, and the test measured neither the achievement of the course objectives nor the expected skills and knowledge of students. This view of recent language testing is shared by test researchers such as Brown (1994: 373) and Hughes (1989: 1):
“A great deal of language testing is of very poor quality. Too often language tests have a harmful effect on teaching and learning, and too often they fail to measure accurately whatever it is they are intended to measure.”
Another reason for selecting this research topic lies in the fact that language testing at Phuong Dong University has not been paid enough attention. Classroom language tests were often written in a hurry because the teachers could not find time to think carefully and plan the test. Sometimes they did not have a clear idea of what they were testing the students for and why. They simply combined a number of different question types, and as a result many students got low marks.
Due to its close relationship with language teaching and learning, testing deserves proper attention from teachers and students so that it can have a positive backwash effect on teaching and can satisfy and encourage students in their study. In order to design a good test that gives an exact, fair and effective evaluation of students' knowledge and performance in English, teachers need a good knowledge of test-writing techniques and testing theory.
For all the above-mentioned reasons, the writer was encouraged to undertake this study entitled “Content validity of the current English achievement test for second-year-non-major students of English at Phuong Dong University”, which aims to find out the strengths and weaknesses of this test in terms of content validity and to suggest solutions, if any, for its improvement.
1.2 Scope of study
The scope of this thesis is limited to evaluating the final achievement test in terms of its content validity by comparing the objectives, the syllabus and the textbook allocation with the test contents. The study provides investigated and analyzed data on the currently used test and proposes practical suggestions for improving it.
Due to limitations of time, ability and conditions, it is impossible for the writer to cover all the tests. Only some suggestions for improving the test are presented.
1.3 Aims of study
The study aims to check the content validity of the final achievement test for second-year non-English-major students at Phuong Dong University. It places strong emphasis on analyzing the contents of the final achievement test.
The specific aims of this research are:
- To find out the strengths and weaknesses of the currently used test with reference to the content validity
- To suggest some improvements for the test
in terms of its content validity. Based on what students had learnt in their first semester and on the contents of this test, the writer examined its content validity.
In addition, qualitative methods involving data collected through survey questionnaires were employed. Two sets of questionnaires were administered to English teachers and students at Phuong Dong University to investigate their evaluative comments on the content validity of the final achievement test and their suggestions for its improvement.
1.5 Research questions
In this study, the writer tries to answer the following two questions:
Question 1: What are the strengths and weaknesses of the final achievement test for second-year non-English-major students at Phuong Dong University with reference to content validity?
Question 2: What solutions can be suggested for improving the test?
1.6 Design of study
The thesis is organized into four major chapters:
1. Chapter 1, INTRODUCTION, presents basic information such as the rationale, the aims, the methods, the research questions and the design of the study.
2. Chapter 2, LITERATURE REVIEW, reviews the related literature that provides the theoretical basis for evaluating and building a good language test. This review covers background on language testing, the criteria of good tests and theoretical issues in test content validity.
3. Chapter 3, THE STUDY, describes the methods used in the research and presents the detailed results of the surveys, including the questionnaires and the analysis of the final achievement test, in order to identify its problems with reference to content validity.
4. Chapter 4, RECOMMENDATIONS AND CONCLUSIONS, has two parts. The recommendations offer suggestions for improving the final achievement test based on the theoretical and practical study. The conclusions summarize the research matters, the findings and the limitations of the study.
Chapter 2: Literature review
This chapter provides an overview of the theoretical background of the study. It includes three main sections.
2.1 Language testing
2.1.1 Definition of language testing
Testing is an important part of every teaching and learning experience and has become one of the main aspects of methodology. The issue of language testing and its significant role has been discussed a great deal by many professionals and researchers worldwide, and different definitions of language testing have been given from various points of view.
According to Allen (1974: 313), testing is an instrument for giving students a sense of competition rather than for showing them how good their performance is or under what conditions a test takes place. He says: “Test is a measuring device which we use when we want to compare an individual with other individuals who belongs to the same group.”
Carroll (1986: 46) stresses that a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. In other words, a test is a measurement instrument designed to elicit a particular behavior from each individual.
According to Bachman (1990: 20), what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior. This distinction is believed to be of great importance because it reflects the primary justification for the use of language tests and has implications for how we design, develop and use them to best effect. Thus, language tests can provide the means for focusing more carefully on the specific abilities of interest.
In the view of Ibe (1981: 1), a test is “a sample of behavior under the control of specified conditions aims toward providing a basis for performing judgment.” The term “sample of behavior” used here is quite broad, and it covers something other than the traditional paper-and-pencil test.
Heaton (1988: 5), however, holds a different opinion. In his view, tests are a means of assessing the students' performance and of motivating the students. He looks at tests positively, since many students are eager to take tests at the end of the semester to know how much knowledge they have gained. Importantly, he points out the relationship between testing and teaching.
2.1.2 The roles of language testing
Language testing is a form of measurement. It helps teachers:
+ To assess the learner's achievement in a language program, for example, to evaluate the testee's language knowledge in relation to a given curriculum or material that the testee has gone through in a given course.
+ To assess a learner's proficiency in the language in relation to future language use, for example, to find out whether a person's language is good enough for him to become a tourist guide. This concerns the future use of the language, regardless of the language programs or materials the testee has gone through.
+ To diagnose a learner's strengths and weaknesses in a language and to attempt to explain why certain problems occur and what treatments could be used to tackle them.
+ To classify or place testees in the appropriate language classes.
+ To measure the testee's aptitude for learning a language.
+ To evaluate the effectiveness of a language program. This is often done by using experimental and control classes with the same educational objectives but different methods and materials to achieve these objectives (Brown, 2000: 5).
Rebecca M. Valette (1977: 3), in turn, comments that classroom tests play three important roles in a second language teaching program: defining course objectives, stimulating student progress and evaluating class achievement.
Firstly, classroom tests help to define the course objectives. Students are quick to observe the types of tests given and to study accordingly. Thus, however much the teacher may emphasize oral fluency in the classroom, if all the tests are written tests, the students will soon concentrate on perfecting the skills of reading and writing.
Secondly, tests help to stimulate student progress. As much as possible, the time given over to classroom testing should provide a rewarding experience. The test should furnish an opportunity for the students to show how well they can handle the specific elements of the target language; gone are the days when the teacher designed a test to point up the students' ignorance or lack of application. Tests should be clearly announced in advance to permit the students to prepare adequately. If students are expected to demonstrate their abilities, it is only proper that they should learn as soon as possible after the test how well they did. The test best fulfills its function as part of the learning process if correct performance is immediately confirmed and errors are pointed out.
The last role of testing is evaluating class achievement. Through frequent testing, the teacher can determine which aspects of the program are presenting difficulties for individual students and for the class as a whole. By analyzing the mistakes made on a given test, the teacher can determine where to concentrate extra class drills and how best to assist each student. At the same time, testing enables the teacher to discover whether the class objectives are being met. Through tests, the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of new materials. The most familiar role of the classroom test is to furnish an objective evaluation of each student's progress: his or her attainment of course objectives and his or her performance in relation to the rest of the class.
2.1.3 Relationship between testing and teaching-learning
In the past, teaching and testing used to be separated both theoretically and practically. According to Williams (1983), a test is a necessary imposition from outside the classroom, but an unpleasant one, for two main reasons. The first is that testing is concerned with competition rather than cooperation. Thus, while classroom activities may involve pair work and group work, such cooperation during a test is condemned as copying, and the individual is expected to work alone. Even if group testing were perfectly possible, the results of a group test might tell us very little about each individual in that group. In the same way, testing does not admit cooperation between teachers and learners: the teacher who helps and encourages the learners with their tasks and responds to their difficulties withdraws that cooperation in a test situation. The second reason, which follows from the first, is that a test must have winners and losers. To be sure, those who come close to winning do not feel too upset, but those who gain little from the experience may feel self-conscious.
Nowadays, a new trend with a remarkable emphasis on integrative and communicative tests has brought about many innovations in English testing techniques. Most researchers comment that teaching and testing are very closely related. As Brown (1994) states: “Teaching and testing are so interwoven and interdependent that it is difficult to tear them apart.” Tests are constructed primarily as devices to reinforce learning and to motivate the students, and as a means of assessing the students' performance in the language. In other words, a test is an extension of classroom work, providing teachers and students with useful information that can improve both the teaching and the learning process. In turn, teaching and learning provide a great source of language material for testing to exploit.
A good test is a valuable teaching device for several reasons. Firstly, a test gives teachers information on how effective their teaching has been. It helps teachers find out whether students are capable of performing the target behavior, and from that we can learn about the characteristics of each individual. Secondly, with the aid of tests, teachers can monitor and evaluate students' learning and diagnose strengths and weaknesses as they occur. Last but not least, based on the test results, teachers can evaluate the effectiveness of the syllabus as well as the methods and materials they are using.
However, testing can have harmful as well as beneficial effects on teaching and learning. For example, if a test is regarded as important, preparation for it can come to dominate all teaching and learning activities. If the end goal is to help students pass the test or examination, many teachers will focus their teaching on the content of the test only, and the teaching program may be distorted in many ways.
2.3 Major characteristics of a good test
Before writing a test, it is necessary to answer the question: “What are the major characteristics of a good test?” Harrison (1983: 10) claims that all good tests have four basic characteristics: validity, reliability, practicality and discrimination.
2.3.1 Test validity
2.3.1.1 What is test validity?
Validity is one of the most important characteristics of a good test. It has been a controversial issue for a long time. A recent trend in discussions of language testing is to treat validity as a unitary concept, so that what used to be called different types of validity are now regarded as aspects of validity.
Henning (1987:5) defines validity as follows:
“In general validity refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test, then, may be valid for some purposes, but not for others.”
A test is considered valid when it specifically measures what it is supposed to measure. A listening test with written multiple-choice options may lack validity if the printed choices are so difficult to read that the exam actually measures reading comprehension as much as listening comprehension; it is least valid for students who are much better at listening than at reading. In other words, the results of a valid test can be interpreted as appropriate to the purposes of testing. That is, validity can be defined as the degree to which a test actually tests what it is intended to test. For example, if the purpose of a test is to test the ability to communicate in English, the test is valid if it does actually test the ability to communicate. This definition, the degree to which a test measures what it is supposed to measure, has two very important aspects. The first is that validity is a matter of degree: some tests are more valid than others. The second is that tests are only valid or invalid in terms of their intended uses. If a test is intended to test reading ability but also tests writing, it may not be valid as a test of reading, although it may be valid as a test of reading and writing together.
Validity refers to the appropriateness or correctness of the inferences and decisions made about individuals and groups from the test results. Validity must therefore be considered in terms of the correctness of a particular inference about test takers, which is why it is not always easy to measure.
2.3.1.2 Types of test validity
There are many types of validity, such as face validity, content validity, construct validity, concurrent validity and predictive validity. In this part, the writer focuses on only two main types: face validity and content validity.
2.3.1.2.1 Face validity
When discussing face validity, we should consider this question: “Does the test, on the face of it, appear from the learners' perspective to test what it is designed to test?” Face validity is almost always perceived in terms of content: if the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived. According to Arthur Hughes (1989: 40), a test is said to have face validity if it looks as if it measures what it is supposed to measure. For example, a test that claimed to measure pronunciation ability but did not require the candidate to speak might be thought to lack face validity. Candidates, teachers and education authorities may not accept a test that does not have face validity. Face validity concerns the appeal of the test to popular or non-expert judgment, such as that of the candidates, the candidates' families and members of the public, and it is judged by asking other teachers to give their opinions about the test.
However, with the advent of communicative language testing, there has been increased emphasis on face validity. It is important for a communicative language test to look like something one might do “in the real world” with language, and such appeals to “real life” are attributed to face validity. Although students' opinions about the test are not expert, they can be important because they are one kind of response you can get from the people who are taking the test. If a test does not appear valid to the test takers, they may not try their best, so the perceptions of non-experts are useful.
In other words, face validity affects the response validity of the test. This critical view of face validity provides a useful method for language test validation.
Face validity can provide not only a quick and reasonable guide but also a balance to an excessive concern with statistical analysis. Moreover, students' motivation is maintained if a test has good face validity. On the other hand, if the test appears to have little relevance in the eyes of the students, it will clearly lack face validity. It is possible for a test to include all the components of the teaching program being followed and yet at the same time lack face validity. The concept of face validity is far from new in language testing, but the emphasis now placed on it is relatively new. In the past, many test writers regarded face validity simply as a public relations exercise. Today, most designers of communicative tests regard face validity as the most important of all types of test validity.
2.3.1.2.2 Content validity
2.3.1.2.2.1 What is content validity?
Among the several kinds of validity, the simplest and most important one for language teachers is content validity.
In Read's opinion (1983: 6), the most relevant type of validity for classroom testing is content validity, which means that the contents of the test should reflect the contents and objectives of the syllabus being followed. In other words, if we want to find out students' progress in what they have learnt, the test should contain a representative sample of the items, rules, skills or functions that they are supposed to master. Obviously, the test contents are the main concern if content validity is to be achieved.
Kerlinger (1973) defines content validity as the representativeness or sampling adequacy of the content (the substance, the matter and the topics) of a measuring instrument.
In the same way, Harrison (1983: 11) defines content validity as follows:
"Content validity is concerned with what goes into the test. The content of a test should be decided by considering the purpose of the assessment, and then drawing up a list known as a content specification."
According to Cyril J. Weir (1990), the purpose of content validity is to examine whether the test is a good representation of the material that needs to be tested and to ensure the defensibility and fairness of interpretations based on test performance. It involves looking at empirical evidence, the hard facts emerging from data from test trials or operational administrations, and it is established by comparing the test with its course objectives. Last but not least, a test is said to be valid if it is relevant to the aims and purposes of the learning areas on which it is set.
The most important distinction between face validity and content validity was pointed out by Alderson et al. (1995: 173) as follows:
"In face validation, we do not necessarily accept the judgment of others, although we respect it, and appreciate that for those people it is real and important and may, therefore, influence behaviors. In content validation, we gather judgments from people we are prepared to believe."
In this sense, whereas face validity is an appeal to lay observers such as students and administrators, content validity rests on the opinion of subject experts (i.e., teachers and test makers) as to whether a test is valid.
For Kelly (1978), content validity seems to be “an almost completely overlapping concept" with construct validity. And for Moller (1982: 68), “The distinction between construct and content validity in language testing is not always very marked, particularly for tests of general language proficiency." In these cases, particular attention must be paid to content validity in an attempt to ensure that the sample of activities included in a test is as representative of the target domain as possible.
To sum up, the writer is in favor of Read's ideas: the most important characteristic of a good test is content validity, which means that the contents of the test should reflect the contents and objectives of the syllabus being followed.
2.3.1.2.2.2 How to make the test more valid?
Firstly, in content validation, we should look at whether the test is representative of the skills it is trying to test. This means looking at the content of the test and comparing it with a statement of what the content ought to be. It involves looking at the syllabus (in the case of an achievement test) and the test specifications, deciding what the test was intended to test, and judging whether it accomplishes what it is intended to do. In other words, content validity depends on the particular course objectives. In addition, a test would have content validity only if it included a proper sample of the relevant structures, and just what the relevant structures are will depend, of course, on the purposes of the test. In order to judge whether a test has content validity, we need a specification of the skills or structures that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test, but the specification will provide the test constructor with the basis for making a principled selection of elements to include in the test. A comparison of test specifications and test contents is the basis for judgments of content validity.
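The comparison between specification and test content described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the content areas and item tags below are hypothetical and are not taken from the Phuong Dong syllabus or its final achievement test. Each test item is tagged with the area it checks, and set operations then show which specified areas are covered, which were taught but never tested, and which items test something outside the specification.

```python
# Hypothetical specification: content areas the course is assumed to have taught.
SPECIFICATION = {
    "present simple",
    "past simple",
    "comparatives",
    "food vocabulary",
    "reading for specific information",
}

# Hypothetical test: each item number is tagged with the content area it checks.
TEST_ITEMS = {
    1: "present simple",
    2: "present simple",
    3: "past simple",
    4: "phrasal verbs",  # not in the specification
    5: "reading for specific information",
}


def compare_spec_with_test(spec, items):
    """Return (covered, missing, off_spec) sets of content areas."""
    tested = set(items.values())
    covered = spec & tested   # taught and tested
    missing = spec - tested   # taught but never tested
    off_spec = tested - spec  # tested but not in the specification
    return covered, missing, off_spec


covered, missing, off_spec = compare_spec_with_test(SPECIFICATION, TEST_ITEMS)
coverage = len(covered) / len(SPECIFICATION)
print(f"Coverage of specification: {coverage:.0%}")
print("Taught but not tested:", sorted(missing))
print("Tested but not taught:", sorted(off_spec))
```

A low coverage figure or a long "taught but not tested" list would point to weak content validity, while items outside the specification would suggest that the test checks something the course did not teach.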
How important, then, is content validity? Arthur Hughes (1989) gives two reasons. First, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure; a test in which major areas identified in the specification are not represented at all is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect, because areas that are not tested are likely to become areas ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than what is important to test. The best safeguard against this is to write full test specifications and to ensure that the test content is a fair reflection of them. In other words, when embarking on the construction of a test, the test writer should first draw up a table of test specifications, describing in very clear and precise terms the particular language skills and areas to be included in the test. If the test or sub-test being constructed is a test of grammar, each of the grammatical areas should then be given a percentage weighting, for example, the future simple tense 10%, uncountable nouns 15% and relative pronouns 10%. If the test or sub-test concerns reading, each of the reading sub-skills should be given a weighting in a similar way, for instance, deducing word meanings from contextual clues 20%, search-reading for specific information 30%, reading between the lines and inferring 12%, and intensive reading comprehension 40%.
According to Heaton (1982), the test writer thus attempts to quantify and balance the test components, assigning a certain value to indicate the importance of each component in relation to the other components in the test. In this way, the test should achieve content validity and reflect the component skills and areas that the test writer wishes to include.
Anastasi (1982: 131) defines content validity as “essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.” She provides a set of useful guidelines for establishing content validity:
1. The behavior domain to be tested must be systematically analyzed to make certain that all major aspects are covered by the test items, and in the correct proportions.
2. The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared.
3. Content validity depends on the relevance of the individual's test responses to the behavior area under consideration, rather than on the apparent relevance of item content.
Brown (1994: 385) gives a list of factors necessary to improve test validity:
+ A carefully constructed, well-thought-out format
+ Items that are clear and uncomplicated
+ Directions that are crystal clear
+ Tasks that are familiar and relate to the students' course work
+ A difficulty level that is appropriate to your students
+ Test conditions that are biased for best, that is, that bring out students' best performance
In the same way, Moore (1992: 11) stressed: “Content validity is established by determining whether the instrument's test items correspond to the content that the students are supposed to learn."
Correspondingly, to evaluate test content validity, the test items should be inspected for their correspondence to the teachers' stated objectives.
In short, content validity is the most important characteristic of a good test, and the basis for evaluating it is a comparison between the test specifications and the test contents.
2.3.2 Test reliability
Reliability is another necessary characteristic of any good test. A reliable test can be used as a measuring instrument. If a test administered to the same students on different occasions (with no language practice taking place between these occasions) produces different results, it is not reliable. A test is therefore said to be reliable if it produces the same results when administered to the same students at different times.
There are two types of reliability. The first, test-retest reliability, refers to the ability of a test to produce consistent results from the same students whenever it is used. The second is inter-item consistency, which means that the items of the test should consistently measure the same thing.
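Both types of reliability are usually reported as a coefficient between 0 and 1. The Python sketch below is illustrative only: the scores are hypothetical and are not the data in Hughes's tables. Test-retest reliability is estimated as the Pearson correlation between two sittings of the same test, and inter-item consistency is estimated with Cronbach's alpha, one common index that the text above does not itself name.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Internal consistency; item_scores[s][i] is student s's score on item i."""
    k = len(item_scores[0])              # number of items
    items = list(zip(*item_scores))      # one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical scores of the same five students on two sittings of a test.
first_sitting = [58, 46, 23, 70, 81]
second_sitting = [60, 44, 27, 68, 79]
print(f"Test-retest reliability r = {pearson_r(first_sitting, second_sitting):.2f}")

# Hypothetical right (1) / wrong (0) answers of five students on four items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"Inter-item consistency (alpha) = {cronbach_alpha(answers):.2f}")
```

Values close to 1 suggest that the test ranks students in much the same way on both occasions, or that its items consistently measure the same thing.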
Bachman (1990), a leading expert, describes reliability as "a quality of test scores".
We can look at the hypothetical data in Table 1, which presents the scores obtained by 5 students who took a 100-item test (Test A) on a particular occasion and the scores they would have obtained if they had taken it a day later. The most obvious way to check reliability is simply to have people take the same test twice. We should note the size of the difference between the two scores for each student:
Table 1: Scores on test A (invented data) by Arthur Hughes (1989: 30)
Students | Score obtained | Score which would have been obtained on the following day
Table 2: Scores on test B (invented data) by Arthur Hughes (1989: 30)
Students | Score obtained | Score which would have been obtained on the following day
items in the test, the test may rely too heavily on luck: weak candidates may score 50% or more on a short
+ The second factor that affects test reliability is the administration of the test. If individual test items are too hard or too easy for everyone, they are not reliable test items, because they do not differentiate between strong and weak candidates. An important factor in deciding reliability is whether or not the same test is administered to different groups under different conditions.
+ The third factor is the test instructions: are the various tasks expected of the candidates made clear to all of them in the rubrics?
+ Another factor that influences the reliability of a test is how far the test is based on passages and questions taken directly from a textbook, and how far it is based on the syllabus behind the textbook rather than on the book itself. An over-emphasis on “quoting” the textbook in a test will produce results that do not reveal the learners' achievement or progress in reading, writing, listening, speaking, vocabulary and grammar. The results will only reveal how well students have memorized the passages and the correct answers.
+ Last but not least, one of the most important factors affecting reliability is the scoring of the test. Sometimes a test can be unreliable because of the way it is marked. For example, if an average composition is marked immediately after a very good one, it may be given a mark that is actually below average: the marker's subconscious comparison of the two compositions makes the average composition appear worse than it really is. However, if the same average composition is marked immediately after a very poor one, it may appear above average and be awarded a higher mark than it deserves. In addition, different markers may award different marks to the same composition; for example, some markers may be very lenient and others unfairly strict.
To sum up, reliability is an undeniably important characteristic of a good test. If test results are not reliable, any assessment based on them cannot be reliable either. To make a test more reliable, testers need to consider many influential factors right from the outset of the test construction process, such as test administration (timing, testing conditions, and the observation or control of candidates while they do the test), the length of the test, test instructions and scoring methods.
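The test-retest idea described above (the same students taking the same test twice and comparing the scores) can be summarised numerically. The minimal Python sketch below reports each student's score difference and a Pearson correlation between the two administrations, one common way of expressing a reliability coefficient. The scores are invented for illustration and are not Hughes's data, and the function name `pearson_r` is our own.

```python
# Minimal sketch of a test-retest check (invented scores, hypothetical names).
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary on both occasions")
    return cov / sqrt(var_x * var_y)


# Invented scores for five students on a 100-item test taken on two days.
first_day = [60, 72, 45, 88, 53]
second_day = [62, 70, 49, 85, 55]

# Size of the difference between the two scores for each student.
differences = [abs(a - b) for a, b in zip(first_day, second_day)]
print("Score differences:", differences)
print(f"Test-retest correlation: {pearson_r(first_day, second_day):.2f}")
```

A coefficient close to 1 means that students keep roughly the same rank order on both occasions. Since a correlation ignores a uniform rise or fall in scores, it is worth reading it together with the individual differences, as the text suggests.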
2.3.3 Relationship between reliability and validity
Reliability and validity are essential measurement qualities of a good test. They are the qualities that provide the major justification for using test scores as the basis for making inferences or decisions (Bachman et al., 1996: 19).
The two have a complicated relationship. On the one hand, it is possible for a test to be reliable without being valid; that is, a test can give the same result time after time but not measure what it was intended to measure. On the other hand, if a test is not reliable, it cannot be valid at all. According to Hughes (1988: 42), to be valid a test must provide consistently accurate measurements, so it must be reliable. A reliable test, however, may not be valid at all. For example, in a writing test the candidates might be required to translate a 500-word text into their own language. This could well be a reliable test, but it is unlikely to be a valid test of writing. In our efforts to make tests reliable, we must be wary of reducing their validity.
The problem is that while a test can be reliable without being valid, it can only be valid if it is also reliable. There is thus sometimes said to be a reliability-validity tension, in the sense that it is sometimes necessary to sacrifice a degree of reliability in order to enhance validity. However, if validity is sacrificed to increase reliability, we end up with a test that is a reliable measure of something other than what we wish to measure. If a choice has to be made between the two concepts, “validity after all, is the more important one” (Guilford, 1965: 481).
Moller (1981: 67) comments that although a valid test must be reliable, in such a highly complex and personal behavior as using a language other than one's mother tongue, validity could be claimed for measures whose reliability is lower than the normally acceptable level. Nevertheless, reliability is something we should always try to achieve in our tests: it cannot be ignored without a harmful effect on the validity of the instrument.
Therefore, validity and reliability are the two chief criteria for evaluating any test, and the ideal test should be both valid and reliable. However, efforts to increase the reliability of a test may reduce its validity.
2.3.4 Practicality
In addition to reliability and validity, practicality plays an important role in deciding whether a test is good. The main questions of practicality are administrative, and a test must be carefully organized well in advance. How long will the test take? What special arrangements have to be made (for example, what happens to the rest of the class while individual speaking tests take place)? Is any equipment needed (tape recorder, language lab, overhead projector)? How will the marking be handled? How are tests stored between administrations? All of these questions are practical, and answering them helps to ensure the success of a test and of testing (Heaton, 1988). Practicality therefore includes financial limitations, time constraints, ease of administration, and ease of scoring and interpretation.
According to Brown (1994), a test is impractical if it is prohibitively expensive, if it takes a student ten hours to complete, or if it takes students a few minutes to do but takes teachers several hours to evaluate.
Another important aspect of practicality to consider is that a test should have “instructional value” (Oller, 1979): it should support teaching and learning. Teachers need to be able to interpret the results clearly and usefully so that students understand their performance and learn better. The test instructions should also be clear and easy to follow, so that students know what they have to do; when they know what to do, they are better able to show what they can do and obtain higher marks. By contrast, an overly complicated or difficult test may be impractical for both teachers and students.
To sum up, to be useful and efficient, tests should be as economical as possible in terms of time and cost. In addition, the instructions should be well written so that students know what they are expected to do.
2.3.5 Discrimination
Discrimination is another important factor that test designers have to consider when writing a test. Heaton (1988) defines the discrimination of a test as its capacity to discriminate between different students and to reflect the differences in the performance of individuals in a group. A test cannot discriminate if its items are all either too easy or too difficult, so the items should range from “extremely easy items” to “extremely difficult items”. Similarly, Harrison (1994: 14) defines discrimination as: “The extent to which a test separates the students from each other." Discrimination tells us whether the test can differentiate between the more proficient students and the less proficient ones. How much discrimination is needed depends on the purpose of the test. For example, if a placement test discriminates efficiently among students, it will be much easier to divide them into suitable groups. In many classroom tests, by contrast, the teacher is much more concerned with finding out how well the students have mastered the syllabus and will hope for high results from most students, so discrimination matters less.
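Heaton's and Harrison's point can be made concrete at the level of a single item. One widely used classroom procedure, assumed here rather than taken from either author, ranks students by total score and compares how many students in the top and bottom groups answered the item correctly. The minimal Python sketch below does this; the function name `discrimination_index`, the one-third group size and the data are hypothetical.

```python
# Minimal sketch of an upper-lower item discrimination index (hypothetical data).


def discrimination_index(results, fraction=1 / 3):
    """Return D = (upper-group correct - lower-group correct) / group size.

    results:  list of (total_score, item_correct) pairs, one per student.
    fraction: share of students in each of the upper and lower groups
              (one third here; 27% is another common choice).
    """
    if len(results) < 2:
        raise ValueError("need at least two students")
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    # At least one student per group, and the two groups must not overlap.
    group_size = min(max(1, round(len(ranked) * fraction)), len(ranked) // 2)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(1 for _, correct in upper if correct)
    lower_correct = sum(1 for _, correct in lower if correct)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data for nine students: (total test score, answered this item correctly?)
results = [
    (9, True), (8, True), (8, True),
    (6, True), (5, False), (5, True),
    (4, False), (3, False), (2, True),
]
# An item that every student answers correctly.
easy_item = [(9, True), (6, True), (4, True), (2, True)]

print(f"D = {discrimination_index(results):.2f}")
print(f"D (too easy) = {discrimination_index(easy_item):.2f}")
```

Values near 1 mean the item separates stronger from weaker students well. Values near 0, as with an item everyone answers correctly, mean it does not discriminate, which is the situation the text describes for items that are too easy or too difficult. Negative values flag items that weaker students answer correctly more often than stronger ones.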
Chapter 3: The study
3.1 English learning, teaching and testing at Phuong Dong University
3.1.1 The students
At Phuong Dong University, students come from different parts of the country. Most of them did not spend much time learning English at high school, as they had to devote most of their time to other subjects, for example mathematics, physics, chemistry and drawing, in order to pass the university entrance examination. Thus, many of them are practically beginners of English when they enter university, and their proficiency levels differ.
3.1.2 The teachers
English teachers working with second-year students vary in age: half of them are aged from 45 to 55 and the rest from 25 to 38. They graduated from three educational institutions: Ha Noi National University, Ha Noi Foreign Language University and Phuong Dong University.
3.1.3 The course book: “New Headway Elementary- The third edition”
The book “New Headway Elementary - The third edition” has been used as the textbook for second-year students at Phuong Dong University. This material is designed for students at elementary level.
It consists of 14 units, which combine grammar with a strong lexical focus in order to increase learners' vocabulary and develop awareness of English culture.
Each unit is divided into three parts, and each part focuses on grammar, function or vocabulary. Every unit gives students opportunities to learn and develop their knowledge of grammar, vocabulary, communication skills and pronunciation through listening, speaking, reading and writing practice activities.
(see Appendix 1- page I)
3.1.4 Syllabus and its objectives
In the first semester, second-year students are taught eight units, from Unit 7 to Unit 14, in 45 periods (50 minutes per period) over about 9 weeks. Students continue to work on four areas (grammar, vocabulary, communication skills and pronunciation) and have the chance to deal with different topics. The aims of the course are to help students increase their basic knowledge of vocabulary and grammar and to practise the four basic language skills of listening, speaking, reading and writing in social situations.
3.1.5 The final achievement test for second-year non-major students
The final achievement test consists of the following parts:
Part | Type | Items | Task | Marks
Part 1 | Rewrite the sentences | 5 | Rewrite sentences so that there is no change of meaning | 2
Part 2 | Guided sentence building | 5 | Use the following sets of words to write complete sentences | 2
Part 3 | Correct mistakes | 5 | Find and correct one mistake in each sentence | 2
Part 4 | Write a paragraph | 1 | Write a paragraph of 100-120 words about your capital city | 4
Table 3: The components of the final achievement test
The marking criteria for the test have confused many teachers and worried students. In particular, Part 4 is very difficult to mark because there are no detailed marking criteria for aspects such as language, content and grammar.
3.2 Research method
In this study, both quantitative and qualitative methods are used: survey questionnaires and document analysis. Given the scope and purposes of this study, document analysis is the main method used to find out the strengths and weaknesses of the final achievement test with regard to content validity. In addition, survey questionnaires help the writer collect more information about teachers' and students' views of this test. Each method helps to collect and confirm different kinds of data, but each also has its own unavoidable shortcomings.
3.2.1 The survey questionnaires
There are many ways to collect data, and survey questionnaires are an effective one for several reasons. Firstly, they can be used to gather information about teachers' and students' attitudes, views and thoughts on the content validity of the end-of-term 1 test. Secondly, there is no face-to-face confrontation between the person conducting the survey and the informants, because a questionnaire is simply a list of questions; the informants can therefore feel free to express their thoughts. Thirdly, most of the questions are closed, so it is easier for the writer to collect and analyze the data. Finally, questionnaires can gather a large number of responses.
3.2.2 Document analysis
Besides survey questionnaires, document analysis is used as the main method to evaluate the final achievement test in terms of content validity.
Firstly, the writer will analyze “New Headway Elementary - The third edition” to find out what the teachers have to teach and what the students ought to learn. Because the purpose of this study is to investigate the content validity of the final achievement test for second-year students at Phuong Dong University, analyzing this test is an effective way to achieve this purpose. Based on theories of testing, test design and the characteristics of a good test, the writer will analyze the test by comparing the course objectives and what the students had learnt with the test contents, in order to find out its strengths and weaknesses and then suggest some solutions for its improvement.
Last but not least, the writer will analyze the survey questionnaire data from both teachers and students to see what they think of this test.
Summary
Evidently, it is important to use several methods so that the results can be compared and their authenticity ensured. The questionnaires allow the informants to express their real feelings and full views. In addition, document analysis is a rich source of information, as it captures what the teachers and students actually do. Therefore, using document analysis in combination with survey questionnaires helps the writer obtain objective and reliable results.
3.3 Data analysis
In this part, based on the final achievement test, the writer will compare the content of the test with what the students had learnt in the first semester in order to find out the strengths and weaknesses of the test currently in use with reference to content validity. In addition, the opinions of students and teachers collected through the survey questionnaires are analyzed in order to evaluate the content validity of the test in more depth.
3.3.1 Analysis of the final achievement test
It is necessary to examine the layout as well as the content of the final achievement test for second-year non-major students. The test includes four main parts, which can be represented as follows:
Phuong Dong University
Foreign Languages Department
The final achievement test - No. 1
Time allowed: 60 minutes
Total score:
Marker's signature 1:
Marker's signature 2:
Code:
I Rewrite each sentence, beginning as shown, so that the meaning stays the same
1 My watch is cheaper than yours
4 No one is more intelligent than Anna in her class
Anna is………
5 Do you want some fish and chips?
Would……….?
II Guided sentence building: use the following sets of words and phrases to write complete sentences
1 I'd/ chicken/and chips/main course
III Find and correct ONE mistake in each of the following sentences
1 My brother can play badminton when he was five years old
Unit | What students have been taught | What they have been tested on
Unit 7 | Past simple tense: regular verbs, irregular verbs, time expressions | Question 1 (part 3), Question 3 (part 3)
Unit 8 | Past simple 2: negatives, ago, time expressions | (not tested)
Unit 9 | Count and uncount nouns; I like and I'd like; a and some; much and many | Question 5 (part 1), Question 1 (part 2)
Unit 10 | Comparatives and superlatives; have got | Questions 1 and 4 (part 1), Question 2 (part 2), Question 5 (part 2)
Unit 11 | Present continuous | (not tested)
Unit 13 | Question forms; adjectives and adverbs | Question 2 (part 1), Question 3 (part 3), Question 5 (part 3)
Unit 14 | Present perfect; present perfect and past simple | Question 3 (part 2), Question 4 (part 3)
Table 4: What students had been taught and what was checked in parts I, II and III of the test
What students had been taught in the writing part:
Unit 7 Describing a holiday
Unit 8 Writing about a friend
Unit 9 Filling in forms
Unit 10 Describing a place
Unit 11 Describing people
Unit 12 Writing a postcard
Unit 13 Writing a story
Unit 14 Writing an email
Table 5: What students had been taught and checked in the writing part
When analyzing the content of the test, we can see that it is generally adequate, with clear instructions and a clear format. It contains no words or grammar structures that are new to students: all of them were taught during the semester. However, there are some problems. The tables above show that some grammar points are not checked in the test, for example the grammar of Unit 8 (negative form of the past simple) and Unit 11 (going to). In the writing part, this topic was closely related to what the students