VIETNAM NATIONAL UNIVERSITY, HA NOI
UNIVERSITY OF LANGUAGES & INTERNATIONAL STUDIES
FACULTY OF POST-GRADUATE STUDIES
VŨ THANH HÒA
AN EVALUATIVE STUDY ON THE CURRENT FINAL ACHIEVEMENT TESTS FOR NON-ENGLISH MAJORS AT QUANG NINH TEACHER TRAINING COLLEGE
ĐÁNH GIÁ BÀI KIỂM TRA CUỐI KỲ CHO SINH VIÊN KHÔNG CHUYÊN NGỮ TRƯỜNG CAO ĐẲNG SƯ PHẠM QUẢNG NINH
M.A MINOR PROGRAMME THESIS
Field: English Teaching Methodology
Code: 60 140 111
HANOI – 2016
Supervisor: Đỗ Thị Thanh Hà, Ph.D
CANDIDATE’S STATEMENT
I, Vu Thanh Hoa, hereby certify that this minor thesis entitled
AN EVALUATIVE STUDY ON THE CURRENT FINAL ACHIEVEMENT TESTS FOR NON-ENGLISH MAJORS AT QUANG NINH TEACHER TRAINING COLLEGE
is completely the result of my own work for the Degree of Master at University of Languages and International Studies, Vietnam National University, Hanoi, and that this thesis has not been submitted for any degree at any other university or institution.
Second, I would also like to acknowledge my debt of gratitude to the staff members of the Faculty of Post-Graduate Studies and the lecturers at University of Languages and International Studies, Vietnam National University, Hanoi for their valuable lectures, which laid the foundation for this thesis, and for their knowledge as well as their sympathy.
Third, special thanks also go to the teachers and the non-English majors at Quang Ninh Teacher Training College who took part in the research. Without their participation and cooperation I would not have been able to complete this research paper.
Fourth, I am grateful to the librarians at ULIS for their constant help, thanks to which I was able to access all the materials needed to accomplish the thesis.
Finally but importantly, I would like to express my appreciation to my family and my friends, who have continuously given me a lot of support and encouragement for the fulfillment of this challenging work.
Hanoi, 2016
ABSTRACT
The study was intended to give an evaluation of the current final achievement tests for non-English majors at QNTTC from the perspectives of the teachers and non-English majors at QNTTC. In addition, this study also investigated how the current final achievement tests for non-English majors at QNTTC aligned to the CEFR. The study was carried out by means of two sets of survey questionnaires and an analysis of the current final achievement tests at QNTTC and the CEFR, using several software tools.
From the perspective of the students, some test items in these tests, such as phonetics, vocabulary and grammatical structures, were too difficult for them to do.
The teachers found that the current final achievement tests at QNTTC were not very reasonable because they lacked two skills, speaking and listening, and because the writing section did not test some useful skills such as writing letters, writing cards and creating stories.
The analysis of the alignment between the current final achievement tests at QNTTC and the CEFR showed that most of the vocabulary and test items in these tests ranged from level A2 to level B1, and some of them were at level A2 and did not reach the target level, B1. Moreover, the current final achievement tests at QNTTC differed from other international tests (PET) in terms of length and constructs.
The study will hopefully contribute to test making at QNTTC by showing an example of an evaluation of the current final achievement tests for non-English majors at QNTTC and of the alignment between these tests and the CEFR.
TABLE OF CONTENTS
ABSTRACT
LIST OF ABBREVIATIONS
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Rationale of the study
1.2 Aims of the study
1.3 Research questions
1.4 Scope of the study
1.5 Significance of the study
1.6 Methodology
1.7 Outline of the thesis
CHAPTER 2 LITERATURE REVIEW
2.1 Basic concepts of testing/ Language testing
2.2 The role of testing in teaching and learning
2.3 Types of tests according to test purpose
2.3.1 Diagnostic tests
2.3.2 Placement tests
2.3.3 Proficiency tests
2.3.4 Achievement tests
2.4 Criteria of a good test
2.4.1 Validity
2.4.2 Reliability
2.4.3 Practicality
2.4.4 Discrimination
2.4.4.1 Item difficulty
2.4.4.2 Item discrimination
2.5 The CEFR
2.5.1 What is the CEFR?
2.5.2 Levels of the CEFR
2.6 Target level for the non-English majors
2.7 Review of related studies
2.8 Summary of Chapter 2
CHAPTER 3 METHODOLOGY
3.1 Setting of the study
3.1.1 English teaching and learning of non-English majors at QNTTC
3.1.2 Brief description of the materials used for non-English majors at QNTTC
3.1.3 The testing practice at QNTTC
3.2 Informants
3.3 Data collection instruments
3.4 The alignment framework
3.5 Data collection and data analysis procedure
3.6 Summary of Chapter 3
CHAPTER 4 FINDINGS AND DISCUSSION
4.1 The current tests at QNTTC
4.1.1 Students’ comments on the existing tests
4.1.2 Students’ opinions towards the improvement of the tests
4.1.3 Teachers’ comments on the existing tests
4.1.4 Teachers’ opinions towards the improvement of the tests
4.2 The alignment between the current tests at QNTTC and the tests according to the CEFR
4.2.1 In terms of their constructs
4.2.2 In terms of contents
4.3 Summary of Chapter 4
CHAPTER 5 CONCLUSION
5.1 Summary of the study
5.2 Concluding remarks
5.3 Limitations and suggestions for further study
REFERENCES
APPENDIXES
APPENDIX 1: KEY POINTS FOR THIRD SEMESTER
APPENDIX 2: QUESTIONNAIRE FOR NON-ENGLISH MAJORS AT QNTTC
APPENDIX 2 (VIETNAMESE VERSION): SURVEY QUESTIONNAIRE FOR NON-ENGLISH MAJORS AT QUANG NINH TEACHER TRAINING COLLEGE
APPENDIX 3: QUESTIONNAIRE FOR TEACHERS AT QNTTC
APPENDIX 4: FINAL ACHIEVEMENT TEST FOR NON-ENGLISH MAJORS
APPENDIX 5: FINAL ACHIEVEMENT TEST FOR NON-ENGLISH MAJORS
LIST OF ABBREVIATIONS
1 QNTTC Quang Ninh Teacher Training College
3 CEFR The Common European Framework of Reference for Languages: Learning, Teaching, Assessment
4 TOEFL Test of English as a Foreign Language
5 IELTS International English Language Testing System
6 TOEIC Test of English for International Communication
8 RMM Pearson Reading Maturity Metric
LIST OF TABLES
Table 2.1: Common European Framework of Reference (CEFR)
Table 4.1: Analysis of the degree of difficulty of the test items in phonetics in Test number 1
Table 4.2: Analysis of the degree of difficulty of the test items in phonetics in Test number 2
Table 4.3: Analysis of the degree of difficulty of the test items in grammar and vocabulary in Test number 1
Table 4.4: Analysis of the degree of difficulty of the test items in grammar and vocabulary in Test number 2
Table 4.5: Analysis of the degree of difficulty of the reading texts in Test number 1
Table 4.6: Analysis of the degree of difficulty of the reading texts in Test number 2
Table 4.7: The analysis of question items of the reading texts of Tests number 1 and 2
Table 4.8: The comparison between Tests number 1 and 2
Table 4.9: Comparison of the length between the current tests at QNTTC and the reading-writing tests of PET
LIST OF FIGURES
Figure 4.1: Students’ accomplishment of a test
Figure 4.2: Students’ difficulty/ difficulties when doing the test
Figure 4.3: Students’ interests in test items
Figure 4.4: Students’ comments and suggestions
Figure 4.5: Teachers’ making the English achievement test
Figure 4.6: Teachers’ attitudes toward the current test
Figure 4.7: Teachers’ comments on the current test
Figure 4.8: Changes?
CHAPTER 1 INTRODUCTION
1.1 Rationale of the study
Nowadays English has become increasingly important as a means of global communication. In the process of global integration, Vietnam has realized the importance of English language learning and teaching. Thus, English has been widely used in many fields and has become a compulsory subject at many schools and universities.
Quang Ninh Teacher Training College (QNTTC) was established in 1959 and is now considered the oldest institution providing undergraduate teacher education in Quang Ninh. In 1991, the organization was restructured from four provincial teacher training institutions (TTI): Quang Ninh Early Childhood TTI, Quang Ninh Primary TTI, Quang Ninh Education Management TTI and Quang Ninh Low Secondary TTI. Aware of the importance of English, the college authorities have paid due attention to improving the quality of English teaching and learning.
In teaching and learning in general, and in foreign language teaching and learning in particular, testing and assessment play a significant role. The importance of language testing is recognised by virtually all professionals in language education. Teachers should not be confined to imparting teaching and learning with testing. Testing is of special importance in an educational system that is highly competitive, as testing is not only an indirect stimulus to learning but also plays a crucial role in determining the success or failure of an individual’s career, with direct implications for his or her future. In the World Yearbook of Education 1969, Lauwerys and Seaton state: “Thus, testing is an important tool in educational research and for programme evaluation, and may even throw light on both the nature of language proficiency and language learning.”
Nga (1997:1) shares the same idea: “Tests are assumed to be powerful determiners of what happens in classrooms and it is commonly claimed that they affect teaching and learning activities both directly and indirectly”.
It cannot be denied that testing is an important part of the teaching and learning process, but has it been paid enough attention yet? Having taught English to students at a high school and then at QNTTC for 5 years, the author of this study has designed tests for both English majors and non-English majors. She has also administered and marked these tests. Her teaching experience shows that there still remain some problems that need to be solved, such as the test content, the gap between what is tested and what is taught, and the reuse of tests from year to year and from class to class. As a result, tests may lack validity and reliability. Hughes (1990:1) also gives another comment on recent language testing: “It cannot be denied that a great deal of language testing is of very poor quality. Too often language tests have a harmful effect on teaching and learning and too often they fail to measure accurately whatever it is they are intended to measure”. Moreover, teachers frequently lack formal training in educational measurement techniques and they tend to be alienated from the testing process.
A well-designed test is necessary for all language learners even though they are at different levels. On the grounds of the problems already mentioned, it is thought that achievement tests for the non-English majors at QNTTC should be designed to assure accuracy and fairness for all students, so that they can produce good backwash in the teachers’ teaching and give students satisfaction and encouragement in their study. The reasons above encouraged me to conduct the study “An evaluative study on the current final achievement tests for non-English majors at Quang Ninh Teacher Training College”.
1.2 Aims of the study
The study aims at evaluating the current final achievement tests at QNTTC. To achieve this aim, the following objectives are established:
1. To evaluate the current final achievement tests for non-English majors from the perspectives of the teachers and non-English majors at QNTTC.
2. To investigate the alignment of the current final achievement tests at QNTTC to The Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR).
1.3 Research questions
2. How does the current test align to the CEFR?
1.4 Scope of the study
As the title “An evaluative study on the current final achievement tests for non-English majors at Quang Ninh Teacher Training College” suggests, this study is intended to touch upon the following issues:
- This study is only aimed at evaluating the existing testing situation at QNTTC from the viewpoint of two stakeholders, the teachers and the students.
- This study is limited to evaluating the final achievement tests for non-English majors.
- This study focuses on evaluating the constructs of the final achievement tests at QNTTC and the tests based on the CEFR (PET).
- This study is a detailed survey at QNTTC. Therefore, the findings of the study are not intended to be generalized to other school contexts. Indeed, the findings may not apply beyond the actual participants in this particular study.
1.5 Significance of the study
The findings of the thesis serve as a backup for the improvement of the tests for non-English majors at QNTTC. Practically, the findings are beneficial for both teachers and learners at QNTTC through the experience of reflection. It is also hoped that the thesis will contribute to the development of the testing situation at QNTTC in general and the testing situation for non-English majors at QNTTC in particular.
1.6 Methodology
The above-given aims are to be achieved by means of:
(1) A survey questionnaire carried out on 30 non-English majors at QNTTC to investigate their comments on the existing final achievement tests for non-English majors, and to get their evaluation as well as their suggestions for improving the testing situation and language tests at QNTTC.
(2) A survey questionnaire carried out on 10 teachers of the English Faculty of QNTTC about their comments on the existing final achievement tests for non-English majors and their suggestions to improve the situation.
(3) Analysis of the contents and constructs of the current final achievement tests at QNTTC to find out the alignment of these tests to the CEFR.
Besides the survey and analysis, more information and data needed for the study were gathered by other methods such as formal and informal discussions with students and teachers as well as critical reading. Moreover, the study employed a combination of qualitative and quantitative methodology that includes cross-tabulation of data and statistical analysis of the results of the survey questionnaires, and the analysis of the degree of difficulty of the current final achievement tests at QNTTC in accordance with the CEFR.
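As an illustration only, the cross-tabulation of questionnaire data mentioned above might be sketched as follows. The respondent groups, answer labels and the helper name cross_tabulate are hypothetical and are not taken from the questionnaires or software used in this study.

```python
from collections import Counter


def cross_tabulate(responses, row_key, col_key):
    """Count how often each (row, column) pair of answers occurs.

    `responses` is a list of dicts, one per questionnaire respondent.
    Returns a nested dict: table[row_value][col_value] -> count.
    """
    counts = Counter((r[row_key], r[col_key]) for r in responses)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


# Hypothetical questionnaire answers (illustrative, not the study's data).
responses = [
    {"group": "student", "difficulty": "too difficult"},
    {"group": "student", "difficulty": "too difficult"},
    {"group": "student", "difficulty": "reasonable"},
    {"group": "teacher", "difficulty": "reasonable"},
]

table = cross_tabulate(responses, "group", "difficulty")
for group, row in table.items():
    print(group, row)
```

Each cell of the resulting table counts the respondents in one group who chose one answer, which is the basic form of a cross-tabulated survey result.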
1.7 Outline of the thesis
The author divided this study into five chapters:
- Chapter 1: Introduction. This chapter provides the author’s reasons for choosing the topic, together with the aims, research questions, scope, significance, methodology and outline of the study.
- Chapter 2: Literature review. This chapter, the most theoretical one, looks at the background knowledge on language testing, such as the basic concepts of language testing, the role of testing, types of tests according to test purpose and criteria of a good test, as well as the CEFR and the target level for non-English majors.
- Chapter 3: Methodology. This chapter discusses the methodology and presents an in-depth analysis of the setting, including English teaching and learning at QNTTC, a brief description of the materials used for non-English majors and the current testing situation at QNTTC; the informants; the data collection instruments; and the data collection and data analysis procedure.
- Chapter 4: Findings and discussion. This chapter discusses the major findings of the thesis: a brief discussion of the actual English teaching and learning context and the current tests at QNTTC, and the alignment between these tests and the tests according to the CEFR.
- Chapter 5: Conclusion. The author reviews the study and suggests further research.
CHAPTER 2 LITERATURE REVIEW
2.1 Basic concepts of testing/ Language testing
The importance of language testing cannot be denied and is recognized by all professionals. Language tests are considered valuable tools in providing information concerning language teaching. They provide evidence for the results of learning and instruction and for the effectiveness of teaching, as well as information for both teachers and students to make decisions.
For these reasons, testing should be part of language teaching and one of the main aspects of methodology. Many definitions of testing from different points of view have been given.
According to Allen (1974:313), a test is a measuring device which we use when we want to compare an individual with other individuals who belong to the same group. Carroll (1968:40) defines: “A psychological and educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual”. Brown (1971:8) takes a different point of view, defining a test as “a systematic procedure for measuring an individual’s behavior”. Penny Ur (1996:33) provides the following definition of a test: “Test is an activity whose main purpose is to convey (usually to the tester) how well the testees know or can do something”. Moore (1992:138) proposes: “evaluation is an essential tool for teachers because it gives them feedback concerning what the students have learned and indicates what should be done next in the learning process. Evaluation helps us to understand students better, their abilities, interests, attitudes, and needs in order to better teach and motivate them”. However, Brown (1994a:373) stresses that tests are seen by learners as dark clouds hanging over their heads, upsetting them with thunderous anxiety as they anticipate the lightning bolts of questions they do not know and, worst of all, a flood of disappointment if they do not make the grade. Read (1983:3) shares the idea, saying that a language test is a sample of linguistic performance or a demonstration of language proficiency. Nga (1999:2) also states that “Test most commonly refers to a set of items or questions designed to be presented to one or more students under specified conditions”. Broughton (1990:1) thinks the word “test” is much more complicated, with at least three quite distinct meanings. The first meaning refers to a carefully prepared measuring instrument. The second one refers to what is usually “a short quick teacher-devised activity” carried out in the classroom and used by the teacher as the basis of an on-going assessment. Assessment is the process of documenting knowledge, skills, attitudes and beliefs, usually in measurable terms. The goal of assessment is to make improvements, as opposed to simply being judged. In an educational context, assessment is the process of describing, collecting, recording, scoring, and interpreting information about learning. It may include a test, but also includes methods such as observations, interviews, behavior monitoring, etc. The last one is that “of an item within a larger test, part of a test battery, or even sometimes what is often called a question in an examination”. Harrison (1983a:1) notes that a test can be a natural extension of classroom work, providing teachers and students with useful information that can serve as a basis for improvement, or a necessary but unpleasant imposition from outside the classroom. That means a test is a useful tool to measure learners’ ability in a certain situation, especially in a classroom.
In short, testing is an effective means of measuring and assessing students’ language knowledge and skills. The term “testing” is defined differently by test researchers. It can be understood as the use of means requiring students to respond to questions or tests that are designed to focus on particular aspects of learning, and it is also perceived rather broadly as a process of assessment consisting of different stages such as preparation, data collection and evaluation.
2.2 The role of testing in teaching and learning
In the past, testing and teaching tended to be separated. Many applied linguistic researchers and professional designers have shared the idea that language testing plays a decisive part in language teaching in general and language learning in particular.
Heaton (1988:5) states that “teaching and testing in some ways are so interwoven and interdependent that it is very difficult to tease apart. Both testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the other”.
Heaton (1988:5) also emphasizes that tests may be constructed primarily as devices to reinforce learning and motivate the students or as a means of assessing the students’ performance in the language. In the former case, testing is geared to the teaching, whereas in the latter case, teaching is often geared largely to the testing.
However, testing has both good and bad effects on teaching. Hughes (1989:1) shares this point of view: “Backwash can be harmful or beneficial”. He states that if the content of the test is in accordance with the content of teaching and the method of the course being followed, the test can have a beneficial effect on the teaching process. Otherwise, it is likely to have a bad effect.
In short, testing and teaching activities cannot be separated from each other, from the programme or from the objectives of the course. Testing may influence teaching in either good or bad ways.
2.3 Types of tests according to test purpose
Language tests are developed for so many purposes that there are many types of language tests. Since language tests have different purposes and the information obtained from tests is used for different types of decisions, let us consider a brief description of some types of tests according to test purpose.
2.3.1 Diagnostic tests
Hughes (1990:13) states: “Diagnostic tests are used to identify students’ strengths and weaknesses. They are intended primarily to ascertain what further teaching is necessary”.
Brown, H.D. (1994b:112) shares this point of view by noting that “diagnostic tests are focused on the strengths and weaknesses of each individual, the instructional objectives for purposes of correcting deficiencies ‘before it is too late’”.
In addition, Brown (1994b:259) gives another comment on this type of test as follows: “A diagnostic test is designed to diagnose a particular aspect of a particular language.” Moreover, Harrison (1983b) also states that this kind of test is used, for example, at the end of a unit in the course-book after a lesson designed to teach one particular point.
From these definitions, it is clear that the main purpose of diagnostic tests is to identify test-takers’ strengths and weaknesses in the language, as well as to explain the problems and to indicate what treatment can be assigned to foster achievement by promoting strengths and eliminating weaknesses.
2.3.2 Placement tests
According to Hughes (1990:14): “placement tests are intended to provide information which will help to place students at the stage of the teaching program most appropriate to their abilities. Typically, they are used to assign students to classes at different levels.” In other words, they are used to assign students to classes according to their abilities so that they can start a course at approximately the same level as the other students in the class. So as a rule, the results of placement tests are needed quickly so that teaching may begin (Harrison, 1983b:4).
2.3.3 Proficiency tests
According to Brown (1995), proficiency tests originate from the wish to determine how much of a given language students have learned and retained; they focus on overall language ability without reference to any particular programme (and its objectives, teaching and materials). Likewise, a proficiency test looks to the future situation of language use without necessarily any reference to the previous process of teaching (McNamara 2000:7).
Hughes (1990:9) states that “Proficiency tests are designed to measure people’s ability in language regardless of any training they may have had in that language.” That is to say, the content of a proficiency test is not based on the content or objectives of any language course test takers may have followed. It is rather based on a specification of what they have to be able to do in the language to meet the requirements of their future aims.
Other test specialists, such as Carroll and Hall (1985), Harrison (1983a) and Henning (1987), share the view that a proficiency test helps both teachers and learners know whether the learners are able to follow a particular course or whether they have to take some pre-departure training. Popular tests such as TOEFL and IELTS are used to test students’ proficiency for their study in English-speaking countries. In Vietnam, proficiency tests were formerly set at levels A, B and C, and are now set at levels A1, A2, B1, B2, C1 and C2 according to the CEFR or Vietnam’s English competence framework.
2.3.4 Achievement tests
According to Hughes (1990:10), “in contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives”. Achievement tests are commonly used at schools of all levels and are of great importance in evaluating the language knowledge and skills students have acquired during the English teaching and learning process.
McNamara (2000:6) states that “achievement tests are associated with the process of instruction. Achievement tests accumulate evidence during, or at the end of a course of study in order to see whether and where progress has been made in terms of the goals of learning. Achievement tests should support the teaching to which they relate. An achievement test may be self-enclosed in the sense that it may not bear any direct relationship to language use in the world outside the classroom (it may focus on knowledge of particular points of grammar or vocabulary, for example).” Brown (1994b:259) shares McNamara’s viewpoint: “an achievement test is related directly to classroom lessons, units or even a total curriculum.”
Achievement tests are divided into two basic types according to the time of administration, namely progress achievement tests and final achievement tests.
(1) Progress achievement tests
Progress achievement tests (criterion-referenced or objective-referenced), as the name suggests, are intended to measure the progress that learners are making. Since “progress” means progress in achieving course objectives, these tests should be related to those objectives. They should make a clear progression towards the final achievement test based on course objectives. They are usually carried out to measure the extent to which students have mastered what has been taught in the classroom.
Thanks to the results of these achievement tests, teachers will be able to find out and diagnose areas not properly mastered by students during the course, which need remedial action. Moreover, these tests also give students a good chance to stimulate their learning and to perform the target language they have learnt in a positive and effective manner, with confidence. This is also considered a preparatory step that makes students familiar with the test.
(2) Final achievement tests
Final achievement tests are given at the end of the course. They may be written or administered by ministries of education, official examining boards, or by members of teaching institutions. Clearly, the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is still a matter of disagreement amongst language testers. It is a good chance for teachers to judge the degree of success of their teaching and identify students’ weaknesses.
Hughes (1990:10) divided them into two kinds depending on the approach used. The syllabus-content test is one whose content is based directly on a detailed course syllabus or on the books and other materials used, whereas the syllabus-objective test is used to test objectives, so it is good for measuring students’ ability to meet course objectives. However, it has the drawback that such tests may work against the teaching, because this approach copes with testing problems rather than with what students have achieved.
2.4 Criteria of a good test
As mentioned before, testing may have good or bad effects on teaching, so before making tests, test designers often ask themselves these questions: How do we design a test that can test all the language skills? Who is it for? Is it suitable for all of them? What is it meant to test? How do we know that it is a good one? Does this test reach the target level?
In order to construct a good test, teachers have to take into consideration various factors such as the purpose of the test, the course content and, above all, students’ background. In addition to these factors, good tests must possess certain characteristics, namely validity, reliability, practicality and discrimination. According to a number of leading scholars in testing, such as Valette (1977), Harrison (1983a), Carroll and Hall (1985), Henning (1987), Heaton (1988), Hughes (1990) and Brown (1994a), all good tests possess these four characteristics. These characteristics are critically reviewed below.
2.4.1 Validity
Validity is certainly the most important single characteristic of a test. If it is not valid, even a reliable test is not worth much. Carmen (1995) states that “a test is valid if it measures what you want to measure”. Hughes (1989) shares the same idea: “a test is said to have validity if it measures accurately what it is intended to be measured”. In Aik’s opinion (1983:2), “a test is said to be valid if it is relevant to the aims and purposes of the areas of learning on which it is set”. In this sense, the validity of the test and the purposes of the course syllabus are closely related.
There are different kinds of validity, such as face validity, content validity, criterion-related validity, construct validity, empirical validity, predictive validity, etc., but among them content validity, face validity and criterion-related validity are the most important.
Content validity refers to the correspondence between the content of the test and the content of the materials to be tested. Of course, a test cannot include all the elements of the content to be tested. Nevertheless, the content of the test should be a reasonable and representative sample of the total content to be tested. In Read’s opinion (1983:6), the most relevant type of validity for classroom testing is content validity, which means that the content of the test should reflect the content and objectives of the syllabus that is being followed. Anastasi (1982:131) defines content validity as “essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.” She offers several useful guidelines for establishing content validity:
- The behavior domain to be tested must be systematically analyzed to make certain that major aspects are covered by the test items with correct proportions;
- The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared;
- Content validity depends on the relevance of the individual’s test responses to the behavior area under consideration, rather than on the apparent relevance of item content.
From the above concepts, it is obvious that the contents of a test are the main concern in achieving its content validity.
Face validity, by contrast, refers to the extent to which the physical appearance of the test corresponds to what it is claimed to measure. Anastasi (1982:136) points out that face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears to measure in the eyes of those who take it, the administrative personnel who decide on its use and other technically untrained observers. Face validity is supported by the judgment that a test is appealing to laymen such as students, administrators, etc. Hughes (1990), in “Testing for Language Teachers”, states: “a test is said to have face validity if it looks as if it measures what it is supposed to measure”. In other words, tests should be based on the course content and methodological teaching approaches.
Criterion-related validity refers to the correspondence between the results of the test in question and the results obtained from an outside criterion. The outside criterion is usually a measurement device for which the validity is already established. In contrast to face validity and content validity, which are determined subjectively, criterion-related validity is established quite objectively.
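As a minimal sketch of how such a correspondence might be quantified, the example below computes a correlation coefficient between scores on the test in question and scores on an outside criterion. The source does not say which statistic is used; the Pearson product-moment coefficient (the statistic, not the software tool of the same name), the function name `pearson_r` and all score values are assumptions made purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical scores (0-10 scale) for six students; not data from this study.
test_scores = [4.5, 6.0, 7.5, 5.0, 8.5, 6.5]        # test under evaluation
criterion_scores = [5.0, 6.5, 7.0, 5.5, 9.0, 6.0]   # already validated measure

r = pearson_r(test_scores, criterion_scores)
print(f"criterion-related correlation: {r:.2f}")
```

A coefficient close to 1 would suggest that the two measures rank the students similarly; how high it must be to count as adequate evidence is a judgment the coefficient itself does not settle.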
In short, validity is a “must” for testers to take into consideration when they construct a language test.
2.4.2 Reliability
Reliability is one of the most important characteristics of all tests in general, and language tests in particular. In fact, an unreliable test is of little value. Reliability is of primary importance in the use of public achievement and proficiency tests as well as classroom tests. An appreciation of the various factors affecting reliability is important for teachers at the very outset, since many teachers tend to regard tests as infallible measuring instruments and fail to realize that even the best test is indeed a somewhat imprecise instrument with which to measure skills.
Two things need to be considered about reliability: the consistency of candidates’ performance and the consistency of scoring. The former is affected by several factors such as the number of questions, test administration and test instructions. Moore (1992:110) defines it as follows: “reliability refers to the consistency with which a measurement device measures some target behavior or trait. To put it another way, it means the dependability or trustworthiness of the measurement device”. Likewise, Bachman (1990:24) describes reliability as “a quality of test score”. For instance, if a multiple-choice test yielded markedly different scores from one administration to another, it would be extremely unreliable. Moreover, administration, that is, the circumstances under which a test is taken, strongly affects test results. It involves matters such as timing, testing conditions, observation or control of testees doing the test, scoring, etc.
Finally, it should be noted that a test can be reliable without being valid. However, reliability alone is clearly inadequate if a test does not succeed in measuring what it is supposed to measure.
2.4.3 Practicality
Test constructors should not consider a test’s validity and reliability separately from its practicality. Practicality refers to the facilities available to test developers regarding both the administration and the scoring procedures of a test. In Harrison’s opinion (1983a:13): “a valid and reliable test is of little use if it does not prove to be a practical one. A test ought to be practical - in the sense of financial limitations, time constraints, ease of administration, scoring and interpretation. A test is impractical in case it is prohibitively expensive and it takes much time to construct”. Brown (1994b:253) gives some useful guidelines: if a test is prohibitively expensive, it is impractical, and if it takes ten hours to complete, it is also impractical.
Bachman and Palmer (1996:39) define practicality as “the relationship between the resources that will be required in the design, development, and use of the test and the resources that will be available for these activities”. They link practicality to “the ways in which the test will be implemented in a given situation or whether the test will be used at all.”
In conclusion, a test is practical if it does not require excessive time or money to construct, administer and score.
2.4.4 Discrimination
Another important feature of a test is its capacity to discriminate among the different candidates and to reflect the differences in the performances of the individuals in the group. Generally speaking, all assessment is based on comparison, either between one student and another (norm-referenced comparison) or between the student as he/she is now and as he/she was earlier (Harrison, 1983b). This is true for both teacher-made tests and standardized tests. A good language test should be able to discriminate between a student and others taking the same test. So if a test is either too easy or too difficult, it cannot fulfil its purpose of discriminating between candidates. According to Heaton (1988:165), a score of 70% means nothing at all unless all the other scores obtained in the test are known. Furthermore, tests on which almost all the candidates score 70% clearly fail to discriminate between the various students.
In order to discriminate, a test must contain items on a scale ranging from extremely easy to extremely difficult (extremely easy items, very easy items, easy items, fairly easy items, items below average difficulty, items of average difficulty, items above average difficulty, fairly difficult items, difficult items, very difficult items and extremely difficult items).
2.4.4.1 Item difficulty
According to Hai (1999:26), the difficulty level shows how easy or difficult an item is from the point of view of the students who took the test. It is important because test items that are too easy (those that all students answer correctly) can tell us nothing about differences within the test population. Similarly, Henning (1987:49) states that perhaps the single most important characteristic of an item to be accurately determined is its difficulty.
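A rough illustration of item difficulty is sketched below, assuming the common convention of expressing it as the proportion of test takers who answer the item correctly (often called the facility value). The source does not give a formula, so this convention, the function name `item_difficulty` and the response data are all hypothetical.

```python
def item_difficulty(item_responses):
    """Proportion of correct answers for one item (1 = correct, 0 = incorrect)."""
    if not item_responses:
        raise ValueError("no responses recorded for this item")
    return sum(item_responses) / len(item_responses)


# Hypothetical responses of eight students to three items; not data from this study.
responses_by_item = {
    "item 1": [1, 1, 1, 1, 1, 1, 1, 1],   # everyone correct
    "item 2": [1, 0, 1, 1, 0, 1, 0, 1],   # mixed results
    "item 3": [0, 0, 0, 1, 0, 0, 0, 0],   # almost everyone wrong
}

for name, item_responses in responses_by_item.items():
    print(f"{name}: difficulty index = {item_difficulty(item_responses):.2f}")
```

Under this convention a value near 1 marks a very easy item and a value near 0 a very difficult one; an item that every student answers correctly, like the first one here, reveals nothing about differences within the group.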
Another argument for including items of varying difficulty levels is student motivation. It has been assumed that while difficult items are necessarily included in the test in order to motivate the good students, “the inclusion of very easy items will encourage and motivate the poor students” (Heaton, 1988:179).
2.4.4.2 Item discrimination
Another important characteristic of a test item is how well it discriminates between weak and strong candidates in the ability being tested. Difficulty alone is not sufficient information upon which to base the decision to accept or reject a given item (Henning, 1987:51).
When answering an item, some candidates will respond correctly and some incorrectly. Once the results are available, the candidates can be divided into two groups: a weak group with fewer correct answers and a strong group with more correct answers. For each item we can then count how many students in each group answered correctly. If more high scorers than low scorers answered correctly, then the item distinguishes between strong and weak candidates and is said to be a good discriminator. If the numbers are the same, or if more low scorers responded correctly, then the item is suspect and may need to be changed (Baker, 1989).
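The comparison of strong and weak groups described above can be sketched as follows. Splitting the candidates into two equal halves by total score and dividing the difference in correct answers by the group size are assumptions made for illustration (the source only describes comparing the two counts); the function name `discrimination_index` and the response matrix are hypothetical.

```python
def discrimination_index(score_matrix, item):
    """Compare correct answers on one item between the stronger and weaker half.

    score_matrix: one row per candidate, one column per item (1 = correct, 0 = incorrect).
    Returns a value from -1.0 to 1.0; positive values favour the strong group.
    """
    # Rank candidates by total score, highest first (ties keep their input order).
    ranked = sorted(score_matrix, key=sum, reverse=True)
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least two candidates")
    strong_group = ranked[:half]
    weak_group = ranked[-half:]   # with an odd number, the middle candidate is left out
    strong_correct = sum(row[item] for row in strong_group)
    weak_correct = sum(row[item] for row in weak_group)
    return (strong_correct - weak_correct) / half


# Hypothetical answers of four candidates to four items; not data from this study.
answers = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]

for item in range(4):
    print(f"item {item + 1}: index = {discrimination_index(answers, item):+.2f}")
```

In this sketch a positive index corresponds to the "good discriminator" case, while an index of zero or below corresponds to the suspect case in which the weak group does as well as, or better than, the strong group.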
2.5 The CEFR
2.5.1 What is the CEFR?
The Common European Framework of Reference (CEFR) is a framework, published by the Council of Europe (2001), which describes language learners’ ability in terms of speaking, reading, listening and writing at six reference levels.
2.5.2 Levels of the CEFR
In November 2001, a European Union Council Resolution recommended using the CEFR to set up systems of validation of language ability. The six reference levels (A1, A2, B1, B2, C1 and C2) are becoming widely accepted as the European standard for grading an individual's language proficiency.
The CEFR divides learners into three broad divisions (Basic User, Independent User and Proficient User), which are subdivided into the six levels mentioned above. For each level, it describes what a learner is supposed to be able to do in reading, listening, speaking and writing.
2.6 Target level for the non-English majors
According to Decision 1400/QĐ-TTg dated September 30th, 2008, non-English majors at colleges and universities must attain KNLNN level 3 (B1) in English in order to graduate. Based on the CEFR assessment (CEFR, 2001), students at level B1:
- Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc.
- Can understand the main point of many radio or TV programmes on current affairs or topics of personal or professional interest when the delivery is relatively slow and clear.
- Can understand texts that consist mainly of high frequency everyday or job-related language.
- Can understand the description of events, feelings and wishes in personal letters.
- Can deal with most situations likely to arise while traveling in an area where the language is spoken.
- Can produce simple connected text on topics that are familiar or of personal interest.
- Can enter unprepared into conversation on topics that are familiar, of personal interest or pertinent to everyday life (e.g. family, hobbies, work, travel and current events).
- Can connect phrases in a simple way in order to describe experiences and events, dreams, hopes and ambitions.
- Can briefly give reasons and explanations for opinions and plans. Can narrate a story or relate the plot of a book or film and describe reactions.
- Can describe experiences and events, dreams, hopes and ambitions and briefly give reasons and explanations for opinions and plans.
- Can write personal letters describing experiences and impressions.
- Can write simple connected text on topics which are familiar or of personal interest.
The six levels of the CEFR, aligned with international English tests, can be summarized as follows:
Table 2.1: Common European Framework of Reference (CEFR)
2.7 Review of related studies
Specifying the relationship between a test product and the CEFR is challenging because, in order to function as a framework, the CEFR is deliberately underspecified (Davidson & Fulcher, 2007; Milanovic, 2009; Weir, 2005). Establishing the relationship is also not a one-off activity, but rather involves the accumulation of evidence over time (e.g. it needs to be shown that test quality and standards are maintained).
As a result, so far not many studies have been conducted to investigate the alignment of different kinds of tests with the CEFR. In the research memorandum “The Association between TOEFL iBT Test Scores and the Common European Framework of Reference (CEFR) Levels”, Spiros Papageorgiou, Richard J. Tannenbaum, Brent Bridgeman and Yeonsuk Cho (2015) noted the content alignment of the TOEFL iBT with an external language framework such as the CEFR.
2.8 Summary of Chapter 2
Chapter 2 has briefly discussed the basic concepts of language testing. The chapter has dealt with issues relating to different test types according to test purpose. In addition, the author has introduced the characteristics of a good test. Finally, the CEFR, the target level for the non-English majors and studies related to the CEFR have also been introduced.
CHAPTER 3 METHODOLOGY
This chapter aims to provide a detailed description of the research carried out to obtain the results of this study. It begins with an introduction covering the setting of the study, the informants and the data collection instruments. It then gives a detailed account of the data collection and data analysis procedures used in this study.
3.1 Setting of the study
3.1.1 English teaching and learning of non-English majors at QNTTC
QNTTC is considered the oldest institution providing undergraduate teacher education in Quang Ninh. English has been taught here to both English majors and non-English majors since 1982.
During the non-English majors’ English course (three semesters with 105 lesson periods or 10 credits in total), they are required to take three tests, one at the end of each of the first, second and third semesters. After finishing this course, the students are expected to reach level B1. To meet this demand, several course books have been used. First, Headway was chosen as the course book, followed by New Headway. Some time later, Lifelines replaced New Headway, and now New Cutting Edge, Pre-Intermediate is used. Besides the materials and the teaching and learning conditions, the teaching staff has also improved in both quality and quantity.
3.1.2 Brief description of the materials used for non-English majors at QNTTC
As mentioned earlier, to reach Level B1 the students have to take an English course consisting of 3 semesters. The course books “New Cutting Edge - Elementary” and “New Cutting Edge - Pre-Intermediate” are currently used as the main materials. According to the authors (Sarah Cunningham, Peter Moor and Jane Comyns Carr), “New Cutting Edge - Elementary” takes students from A1 to A2 level and “New Cutting Edge - Pre-Intermediate” takes them from A2 to B1 level of the CEFR. With a task-based learning approach, the main objective is for students to use the language that they know in order to achieve a particular communication goal. These books have a comprehensive syllabus with thorough grammar, vocabulary and skills work, systematic vocabulary building which focuses on high-frequency, useful words and phrases, and clearly structured tasks to encourage students’ fluency and confidence. They are intended to provide a stepping stone that enables students to move from a knowledge of General English to a position where they can handle the sort of textbooks and instructions they will meet at college and in their future careers.
The key points of the 8 modules (New Cutting Edge - Pre-Intermediate) for the third semester are described in Appendix 1.
3.1.3 The testing practice at QNTTC
English is a compulsory subject in the whole educational programme; therefore, testing activities are given considerable attention. In the examinations, the students are seated in alphabetical order and each one is given a different test paper, so that they can hardly copy from one another and must do the test individually.
Besides this, objective tests such as multiple choice, mistake correction, sentence building and questions and answers are used in order to achieve the highest reliability and discrimination among the test takers. In general, QNTTC English tests look good and reasonable for students. In addition, to make it easier for teachers to score the examination papers, separate answer sheets are provided for the test takers to write down their answers, and the parts of the answer sheets bearing their names are cut off so that all papers are marked on equal terms.
The non-English majors at QNTTC have to take 3 tests. Test 1, with 50 items to be completed in 90 minutes, is given at the end of the first semester, when students have covered 13 modules of New Cutting Edge – Elementary. Test 2, with 40 items to be completed in 60 minutes, is given at the end of the second semester, after they have finished the next 2 modules of New Cutting Edge – Elementary and 7 modules of New Cutting Edge – Pre-Intermediate. After the last 8 modules of New Cutting Edge – Pre-Intermediate, students take the final achievement test (Test 3), with 40 items in 60 minutes.
3.2 Informants
The informants of the study were selected from the student and teacher population of QNTTC. The students were selected because they were the target learners of the course who took the final achievement tests. They had taken all three achievement tests by the time they were asked to answer the questionnaires. The teachers selected as informants had at least three years of teaching experience and had written final achievement tests for non-English majors.
* The students
The student informants are non-English majors at the college, aged 19 to 21. Most of them (77%) have been learning English for 7 to 10 years, while just 3% have studied English for 4 to 6 years and none for less than 4 years; 20% have been learning English for more than 10 years. This means that most of them started learning English in primary school. They all come from urban areas; therefore, their English proficiency is somewhat better than that of students from rural areas, owing to the opportunities to attend part-time English courses and to have contact with English-speaking people. However, as non-English majors, they are not really interested in learning English and lack motivation. Although they tend to regard English as less important than other subjects and study it mainly in order to pass the exams, 40% think that English is quite important. However, only 5 students (17%) fully realize the importance of English; they think that English is an international language that can be used in communication and in their lives. Up to 37% consider English not very important, or even not important at all (7%), because they do not need it in their future jobs.
* The teachers
There are 12 teachers of English currently working at QNTTC, and 10 of them took part in this study. Of these, three fourths hold an MA degree or are studying for one. In general, they are not very young (from 27 to 50 years old); half of them have been working as teachers for more than 10 years, and only 10% have been teaching for 1 to 3 years. 20% have 4 to 6 years of teaching experience, and the same proportion have spent 7 to 10 years teaching English.
3.3 Data collection instruments
This research was conducted using surveys, the current final achievement tests for non-English majors at QNTTC, the CEFR and software tools such as Pearson and Estim.
To examine the actual English testing situation at QNTTC, survey questionnaires were used. They served as the main instruments for collecting data in this study because questionnaires allow researchers to collect information quickly from large numbers of respondents. To find out the alignment between the current final achievement tests for non-English majors at QNTTC and the CEFR, the constructs and contents of the tests were also taken into consideration to make the research more reliable.
Two sets of survey questionnaires were administered with the participation of 10 teachers of the English Faculty and 30 non-English majors.
The first questionnaire, with 12 questions, was administered to 30 second-year non-English majors at QNTTC:
- The first question was written to find out how long the students had been learning English.
- The second question was to investigate whether the students think English is important to their future career or not.
- The third question aimed at finding out how well they complete a test.
- The fourth question was to examine the difficulty/difficulties the students have when they do a test.
- Questions 5 - 7 aimed at eliciting their interest in some test items (reading and answering questions, making up sentences and correcting the mistakes).
- Questions 8, 9 and 11 were written to investigate their attitudes toward the current test.
- Question 10 was used to get the students’ opinions about correcting their work right after they have done the test.
- Question 12 was written to find out other opinions of the students about the current test.
The second survey questionnaire, with 12 questions, was administered to 10 teachers of the English Faculty at QNTTC:
- Question 1 was to investigate the teachers’ teaching experience.
- Question 2 was written to find out the necessity of an English achievement test at the end of each semester.
- Questions 3 - 5 aimed at investigating the teachers’ test making as well as the reasons for their answers.
- Questions 6 - 8 were written to investigate the teachers’ opinions about the content, the marking scale and the time allowance of the current English achievement test for non-English majors, together with their reasons.
- Questions 9 - 11 were used to get their opinions on, and reasons for, changing the construction of the current test (the content, the marking scale, the time allowance).
- Question 12 was to elicit the teachers’ further comments and suggestions for the improvement of the current test.
These two questionnaires can be found in Appendixes 2 and 3.
The two final achievement tests used for this thesis were chosen at random and are assumed to be the latest and equivalent to other final achievement tests for non-English majors at QNTTC. These tests consist of 4 sections as follows:
Section 1: Phonetics (1.25 points)
- Item format: Multiple choice questions
- Number of items: 5
- Scores: 0.25 points for each item
Section 2: Grammar and vocabulary (5 points)
Part 1: Choose the best answer
- Item format: Multiple choice questions
- Number of items: 15
- Scores: 0.25 points for each item
- Item format: Q & A
- Number of items: 5
- Scores: 0.4 points for each item
Section 3: Reading comprehension (2.5 points)
- Scores: 0.25 points for each item
Section 4: Writing (1.25 points)
Use the set of words and phrases to make meaningful sentences
- Item format: Q & A
- Number of items: 5
- Scores: 0.25 points for each item
The chosen tests can be seen in Appendixes 4 and 5
The software Englishprofile, Pearson and Estim
The software tools Englishprofile, Pearson and Estim were used to make the evaluation more reliable. The author used these tools as instruments to evaluate the vocabulary, grammar, reading and writing items and to find out how they are aligned with the CEFR. The author looked up the vocabulary and grammar items in Englishprofile to find out the levels they were at. With the help of Pearson and Estim, the reading texts were also checked for their degree of difficulty, as were the questions.
To investigate the alignment of the reading texts with the CEFR, the author, with the help of other English teachers, analyzed the test items to see whether or not they test the skills described in the CEFR descriptors (see Appendix 6).
3.4 The alignment framework
The alignment between the current final achievement tests at QNTTC and the
CEFR (tests at Level B1 – PET) can be evaluated in terms of:
- The constructs of the two tests
- The contents of the two tests
- The length of the two tests
- The degree of difficulty of the final achievement tests at QNTTC based on the CEFR.
3.5 Data collection and data analysis procedure
To accomplish the purpose of the study, the following procedures were followed. First, two sets of survey questionnaires were given to 10 teachers of English at the English Faculty and 30 non-English majors at QNTTC. The questionnaire for the teachers was administered during the break of the English group’s weekly meeting. For the students, it was administered at the end of a class. At the time they were asked to answer the questions, the students had taken the current final achievement tests for non-English majors at QNTTC.
Before the instruments were administered, the purposes and the importance of the study were explained to the participants. They also received oral instructions on how to complete the surveys. Each survey was collected 30 minutes after it was handed out. The data obtained from these two surveys were entered into a computer and processed in Excel. The data were then subjected to some descriptive and inferential statistics. For accurate and effective interpretation of the data, the author used frequency counts and sorting to find the percentage of respondents choosing each option, which indicates the emphasis given to each item.
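The frequency-and-percentage step described above was carried out in Excel; an equivalent calculation is sketched below in Python purely for illustration. The function name `percentage_table` is hypothetical, and the answer counts are invented so as to be consistent with the percentages reported for the 30 student informants in Section 3.2; they are not the study's raw data.

```python
from collections import Counter


def percentage_table(answers):
    """Count each answer option and convert the counts to whole-number percentages.

    Options are returned from the most to the least frequently chosen.
    """
    if not answers:
        raise ValueError("no answers to summarise")
    counts = Counter(answers)
    total = len(answers)
    return {option: round(100 * count / total)
            for option, count in counts.most_common()}


# Illustrative answers to the question on time spent learning English
# (30 respondents; the split into 23 / 6 / 1 is an assumption).
answers = (["7-10 years"] * 23
           + ["more than 10 years"] * 6
           + ["4-6 years"] * 1)

for option, percent in percentage_table(answers).items():
    print(f"{option}: {percent}%")
```

Because each percentage is rounded separately, the figures for one question may add up to slightly more or less than 100%.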
Finally, to find out the alignment between the final achievement tests and the CEFR, these tests were examined in terms of their constructs, contents and length. To make the evaluation more reliable and valid, expert judgments were used. Two English teachers who were considered to be good at testing and assessment were asked to help the author. These teachers used their experience and the software tools Englishprofile, Pearson and Estim to evaluate the tests by checking the level each item (vocabulary, grammar) was at as well as the degree of difficulty of the reading text. The alignment of the vocabulary and grammar items was checked using the Englishprofile vocabulary and grammar resources. The author also used