SUBMITTED BY: TRẦN THỊ THU HƯƠNG
A thesis submitted in partial fulfillment of the requirements
for the degree of Master of Arts
EVALUATING THE VALIDITY OF THE FINAL ACHIEVEMENT TEST FOR SECOND – YEAR NON – MAJOR STUDENTS AT ELECTRONIC – ELECTRICAL ENGINEERING DEPARTMENT, NAM DINH UNIVERSITY OF TECHNOLOGY EDUCATION
(Đánh giá độ giá trị của bài kiểm tra cuối kỳ cho sinh viên không chuyên tiếng Anh năm thứ hai tại khoa Điện – Điện tử, Trường
Đại học Sư phạm Kỹ thuật Nam Định)
M.A MINOR THESIS
Field: Language Teaching Methodology
Code: 60 14 10
HANOI, 2011
Supervisor: Phạm Lan Anh, M.A
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES, TABLES AND CHARTS
LIST OF ABBREVIATIONS
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Rationale
1.2 Scope of the study
1.3 Aims of the study
1.4 Methods of the study
1.5 Research questions
1.6 Design of the study
CHAPTER 2: LITERATURE REVIEW
2.1 Relationship between teaching, learning and assessment
2.2 Purposes of formative and summative assessments
2.3 Achievement tests and their characteristics
2.3.1 Achievement tests
2.3.2 Characteristics of a good EGP test
2.3.3 Characteristics of a good ESP test
2.4 Face validity
2.4.1 Definition
2.4.2 Relationship between reliability and validity
2.4.3 Reasons for choosing face validity
2.5 Some measures to increase face validity
3.1.1 Students’ backgrounds
3.1.2 The English teaching staff
3.1.3 Objectives of the English course
3.1.4 Checklist of the course book
3.1.5 Objectives of the final test
3.1.6 Difficulty level and discrimination of the final test
3.2 English testing at Nam Dinh University of Technology Education
3.2.1 Testing situation
3.2.2 The current final achievement test
3.3 Research methods
3.3.1 Survey questionnaire
3.3.2 Interview and informal discussion
3.4 Data analysis of survey questionnaires and interviews
3.4.1 Data analysis of the administration of the test
3.4.1.1 Data analysis of the format of the test
3.4.1.2 Data analysis of the logistics of the test
3.4.2 Data analysis of face validity of the test
3.4.2.1 Data analysis of general opinion about the test
3.4.2.2 Data analysis of reading comprehension task
3.4.2.3 Data analysis of grammar knowledge task
3.4.2.4 Data analysis of translation task
3.5 Discussion and findings
3.5.1 Similarities in teachers and students’ perception
3.5.1.1 Test administration
3.5.1.2.2 Reading comprehension task
3.5.1.2.3 Grammar task
3.5.1.2.4 Translation task
3.5.2 Differences in teachers and students’ perception
3.5.2.1 Test administration
3.5.2.2 Face validity
3.5.2.2.1 Grammar task
3.5.2.2.2 Translation task
3.6 Suggestions to improve the final achievement test
CHAPTER 4: CONCLUSION
REFERENCES
APPENDICES
APPENDIX 1
APPENDIX 2
APPENDIX 3
APPENDIX 4
LIST OF FIGURES, TABLES AND CHARTS
Figures
Figure 1: Three Considerations for Test Choice
Figure 2: The Scope of Impact of Language Tests
Figure 3: Relationship between reliability and validity
Tables
Table 1: ESP syllabus content allocation
Table 2: Specification of test 12
Table 3: Teachers and students’ comments on the format of the test
Table 4: Teachers and students’ comments on the administration of the test
Table 5: Teachers and students’ comment on the whole test
Table 6: Teachers and students’ comment on students’ reading comprehension ability, theme and instruction of the reading comprehension task
Table 7: Teachers and students’ comment on the grammar task
Table 8: Teachers and students’ comment on the translation task
Chart 3: Percentage of the results which students get in the test
Chart 4: Percentage of what test tasks students cannot do
Chart 5: Percentage of teachers and students’ comment on the length of the reading text
Chart 6: Students’ comment on whether or not the reading text is difficult
Chart 7: Teachers’ comment on which types of reading skills are expressed in the reading comprehension task
Chart 8: Teachers and students’ comment on which students’ ability this translation task requires
CHAPTER 1: INTRODUCTION
1.1 Rationale
Learning English in Vietnam is popular, and its popularity is increasing day by day. This is because Vietnam has recently adopted an open-door policy that encourages broadening and improving its relationships and cooperation with other countries in many areas, such as diplomacy, the economy, culture, science and technology. English is not only a means of communication but also a key to gaining access to the latest scientific and technological achievements for a developing country like Vietnam, where modern science and technology are badly needed.
Recognizing the necessity of this global language, most schools, colleges and universities in Vietnam treat English as a main, compulsory subject that students must learn. In the English language learning and teaching process, evaluating a test is significant. Testing is a part of the teaching and learning process because it provides feedback about the achievement of teaching and learning objectives to those involved in the education system. Moreover, a person’s language ability is typically established through a language test. In education, especially at the Faculty of Foreign Languages at Nam Dinh University of Technology Education (NUTE), testing the students’ achievement against teaching objectives is needed. Without an achievement test, it is difficult to see how rational educational decisions can be made.
NUTE is a technological university and students’ English learning ability is really low. Evaluating a test is also one of the ways to improve students’ learning process and results. Although many theses have addressed this problem, at NUTE it is still new and very necessary, because how to evaluate the test after each semester still receives little attention and, up to now, the process of test analysis after each examination has not been fully investigated. Consequently, students’ results are getting worse and worse. As a teacher myself, I see that we, the teachers at NUTE, stop at the experience-based level of test making, test administration and test marking during and after the examination. We do not take account of evaluations of the test from other teachers and students, so the test results have still not improved.
Moreover, examining a test’s validity can be seen as an attempt to improve the test’s quality. As a measure of students’ achievement of learning objectives, final examinations must be valid. Validity is one of the characteristics of a good test. Therefore, “Evaluating the validity of the final achievement test for second – year non – major students at Electronic – Electrical Engineering Department, Nam Dinh University of Technology Education” is chosen with the hope that the study will be helpful to the author, the teachers, the test-takers and everyone who is concerned with language testing in general and the validity of an achievement test in particular. Due to the limited time for collecting students’ scores, this study differs from previous studies: the author focuses only on the face validity of this test. The author hopes that the results of the study can then be applied to improve the current test and to create a new, truly reliable item bank. The study is also intended to encourage both teachers and learners in their teaching and learning.
1.2 Scope of the study
The scope of this thesis is limited to research on teachers’ and test-takers’ evaluation of the existing achievement test, in terms of its face validity, for second-year non-English-major students at the Electronic – Electrical Engineering Department, NUTE, owing to limitations in time, ability and availability of data. Moreover, it is impossible for the author to cover all the final achievement tests that have been used or to design a sample achievement test for second-year students. Instead, only a test specification for test 12 in semester 3 is presented.
1.3 Aims of the study
Following the scope of the research above, the aims of this research are:
1. To identify the English teachers’ and students’ evaluation of the existing final achievement test (test 12) at NUTE in terms of face validity.
2. To provide suggestions for test designers.
1.4 Methods of the study
In order to achieve the above aims, the study has been carried out as follows:
First, the author consulted library sources on the theory of assessment and testing, on achievement tests and the characteristics of a good achievement test, and on test validity, with a special focus on face validity and some measures to increase it. Through critical reading, many reference materials were gathered, analyzed and synthesized to build a theoretical basis for evaluating the test currently used for 3rd-semester students in terms of its face validity.
Then, qualitative methods were employed: data were collected from both teachers and students at NUTE through survey questionnaires and interviews.
1.5 Research questions
This study is implemented to find answers to the following research questions:
1. What are the teachers’ and test takers’ (students’) perceptions of the final 3rd-semester English achievement test at NUTE in terms of its face validity?
2. What are the suggestions for improving the face validity of the final 3rd-semester English achievement test at NUTE?
1.6 Design of the study
The thesis is divided into four major chapters:
Chapter 1: Introduction presents basic information: the rationale, the scope, the aims, the methods, the research questions and the design of the study.
Chapter 2: Literature review covers the theoretical background to evaluating a test, which includes the relationship between teaching, learning and assessment, the purposes of formative and summative assessments, achievement tests, the characteristics of good EGP and ESP tests, face validity and some measures to increase face validity.
Chapter 3: The study is the main part of the thesis, showing the context of the study, the detailed results obtained from collected tests and the findings in response to the research questions. The author then gives some solutions to improve the final achievement test.
Chapter 4: Conclusion offers conclusions and proposes some suggestions for further research on the topic.
CHAPTER 2: LITERATURE REVIEW
This chapter provides an overview of the theoretical background of the study. It includes five main sections. Section 2.1 discusses the relationship between teaching, learning and assessment. Section 2.2 focuses on the purposes of formative assessment and summative assessment. Section 2.3 gives a brief description of achievement tests and the characteristics of a good EGP test and a good ESP test. It is followed by section 2.4, which focuses on face validity. Finally, section 2.5 suggests some measures to increase face validity.
2.1 Relationship between teaching, learning and assessment
In the relationship between teaching, learning and assessment, curriculum and content standards also play an important role. Curriculum is best characterized as what should take place in the classroom. It describes the topics, themes, units and questions contained within the content standards. Content standards are the framework for the curriculum. Curriculum can vary from program to program, as well as from instructor to instructor. Unlike content standards, curriculum focuses on delivering the “big” ideas and concepts that the content standards identify as necessary for the learner to understand and apply. Curriculum serves as a guide for instructors, addressing teaching techniques, recommending activities, scope and sequence, and the modes of presentation considered most effective. In addition, curriculum indicates the textbooks, materials, activities and equipment that best help learners achieve the content standards. In the teaching and learning process, assessment is a tool for specifying the nature of the evidence required to demonstrate that the content standards have been met. To ensure valid and reliable accountability, the assessment selected should test the state standards. Clearly, assessment, curriculum and content standards are closely related: assessment provides the evidence that the content standards have been met, and the curriculum is derived from the content standards.
The Longman Dictionary of Language Teaching and Applied Linguistics (3rd edition) (Richard et al., 2005) defines assessment as “a systematic approach to collecting information and making inferences about the ability of a student or the quality or success of a teaching course on the basis of various sources of evidence”.
Assessment is a critical link between teaching and learning, and it also plays a vital role in the process of curriculum design and teaching implementation. From the perspective of behavior research in classroom teaching, Richards & Nunan (1990) hold that assessment refers to the set of processes through which we make judgments about a learner’s level of skills and knowledge. Assessment should:
- Ensure reliability and validity;
- Provide for pre-, while- and post-testing;
- Be criterion – or standards – referenced;
- Inform instruction;
- Serve as an accountability measure;
- Be adaptable to a variety of instructional environments;
- Accommodate learners with special needs
Various assessment measures are well known, such as evaluation, examination, questionnaire, interview, discussion, observation and so on. Testing is the most readily available means of implementing assessment in the teaching process. In Brown’s (2001) view, a curriculum system comprises needs analysis, objectives, testing, materials, teaching and evaluation. Similarly, Richards (1990) says that language curriculum development involves needs analysis, goals and objectives, syllabus design, methodology, testing and evaluation. Both emphasize the importance of testing.
Bachman (1990: 20) defines the term “test” as “a measurement instrument designed to elicit a specific sample of an individual’s behavior”. This definition provides a basic, general view of tests. Oller (1979: 1) defines a language test as an instrument that attempts to measure the extent to which students have learned in a foreign language course. From the two definitions, this research takes a language test to be a set of instruments in the form of questions and problems whose function is to measure an individual student’s language abilities and knowledge in relation to a foreign language that he or she has learned.
A language test is a useful instrument with which educators can obtain reliable and valid information on their students’ language abilities. Teachers can monitor and evaluate student learning and identify students’ strengths and weaknesses to clarify what they really need to know. Students’ test results can provide important feedback on how well an English course has been taught or learned, and necessary feedforward for students at the beginning of an English course. Feedback and feedforward are very important in the teaching and learning process. The author illustrates the relationship between feedback and feedforward with the example of catching a ball. When we move to catch a ball, we must interpret our view of the ball’s movement to estimate its future trajectory. Our attempt to catch the ball incorporates this anticipation of the ball’s movement in determining our own movement. As the ball gets closer, or exhibits spin, we may find it departing from the expected trajectory, and we must adjust our movement accordingly. In the same way, feedforward helps teachers to point out, at the beginning of the course, the problems that students are likely to meet in the learning process, so that students can feel more confident, avoid those problems and study more effectively. Feedback, in turn, helps teachers to adjust their teaching methods so that students can get the best results. Feedback also helps the teacher to evaluate the effectiveness of the syllabus as well as the methods and materials he or she is using. Test results thus become feedback on the curriculum that has been developed and implemented.
In addition, testing may have many effects on teaching and learning. Hughes (1989: 01) calls the effect of testing on teaching and learning “backwash”. He recognizes the role of backwash in the teaching-learning process. Backwash can be harmful if the test content does not match the objectives of the course; this leads to the problem of teaching in one way and testing in another. However, backwash need not always be harmful; it can be positive, too. A test based directly on the needs of a specific group of learners will help them to perform in real life.
In view of the important role of language tests in the education system, Shohamy (2001: 2) emphasizes that “language tests need to be of high quality and follow careful rules of science of psychometrics.” In other words, a good language test must yield accurate results for the test takers with reference to the aspect of knowledge that it measures. Furthermore, a high-quality language test must be reliable and valid so as to give precise information on the test takers’ language ability. Language tests may differ according to the purposes of their design and how they are designed (see figure 1).
Figure 1: Three Considerations for Test Choice (Purpose, Justification, Method)
A test’s intended impacts are the effects that the test designer intends (see figure 2). Bachman and Palmer (1996) point out that the entities potentially affected by a test include individuals (students and teachers), language classes and programs, and society.
Figure 2: The Scope of Impact of Language Tests (impact ranges from narrow to broad)
Obviously, the importance of testing cannot be denied. Specifically, this research focuses on testing in English for Specific Purposes (ESP). ESP now plays an important role in English teaching and learning at universities. Since the early 1960s, ESP has grown to become one of the most prominent areas of English as a foreign language (EFL) teaching. This development is reflected in an increasing number of publications, conferences and journals dedicated to ESP. Similarly, more traditional general English courses have given way to courses aimed at specific areas, for example, English for Business Purposes. The emergence of ESP also created a strong need for testing specific groups of learners. As a result, the ESP testing movement has shown slow but definite growth over the past few years. Clearly, ESP testing and EFL testing are indispensable in the teaching and learning process.
Levels of impact shown in Figure 2, from narrow to broad: on an individual student; on students and teachers; on students, teachers, classes and programs; on students, teachers, programs and institutions; on students, teachers, classes, programs, institutions and society.
To sum up, teaching, learning and assessment are closely interrelated because testing, teaching and learning are not separate entities. A good test can be used as a valuable teaching and learning device. Teaching has always been a process of helping others to discover “new” ideas and “new” ways of organizing what they have learned. Whether this process takes place through systematic teaching, learning and testing, or through a discovery approach, testing was, and remains, an integral part of teaching and learning.
2.2 Purposes of formative and summative assessments
As noted above, assessment is the process of documenting and measuring knowledge, skills, attitudes and beliefs. Many kinds of assessment are used in a course, such as continuous assessment, formative assessment, summative assessment, peer-assessment, self-assessment and so on. However, in this research the author focuses on the relationship between two main kinds of assessment: formative assessment and summative assessment. "As coach and facilitator, the teacher uses formative assessment to help support and enhance student learning. As judge and jury, the teacher makes summative judgments about a student's achievement" (Atkin, Black & Coffey, 2001).
Formative assessment is designed to provide feedback and feedforward to students and instructors for the purpose of developing teaching and learning. From a student's perspective, formative assessment provides information on the student's performance, on how they are progressing with the skills and knowledge required by a particular course, and on the problems they are likely to have in the course. Generally, the results of formative assessment do not contribute to a student's final grade but serve purely to help students understand their strengths and weaknesses in order to improve their overall performance. From an instructor's perspective, formative assessment is a diagnostic tool that can be used to evaluate the effectiveness of course and curriculum design. Formative assessment has the potential to highlight areas in which teaching and curriculum design need to be improved, as well as areas where teaching methods have been very effective in improving student learning. Typical tests of this kind are the diagnostic test and the placement test. A placement test is used at the beginning of a course to identify a student’s level of language and find the best class for them. A diagnostic test is used to identify the problems that students have with the language. It helps the teacher to plan what to teach in future and to prepare students for anticipated problems and their solutions.
The purpose of summative assessment is to provide "a sampling of student achievements which lead to a meaningful statement of what they know, understand and can do" (Brown & Knight, 1999: 37). Generally, summative assessment occurs at the end of a topic or the end of a course in order to evaluate how well students have acquired the knowledge and skills presented in that section or during the complete course. The achievement test is a typical example of summative assessment.
Clearly, formative and summative assessment are closely linked through their purposes, and the teacher needs to use both to evaluate students’ ability and enhance the quality of teaching and learning. The teacher has to identify students’ problems in order to determine their level, adjust the teaching method and, finally, test how well students have acquired the lesson.
2.3 Achievement tests and their characteristics
Of the two kinds of assessment discussed above, this research uses only summative assessment because of its purpose. This research evaluates the final ESP test, so summative assessment, and the achievement test in particular, is the appropriate focus here.
2.3.1 Achievement tests
Achievement tests play an important role in school programs, especially in evaluating the language knowledge and skills that students have acquired during the course, and they are widely used at different school levels.
Sparatt (1985:145) holds that “an achievement test is one of the means available to teachers and students alike of assessing progress. It is the aim and content of an achievement test that distinguishes it from other kinds of test”.
David (1999: 2) similarly states that “achievement refers to the mastery of what has been learnt, what has been taught or what is in the syllabus, textbook, materials, etc. An achievement test therefore is an instrument designed to measure what a person has learnt within or up to a given time”.
Similarly, Brown (1994b:259) proposes that “An achievement test is related directly to classroom lesson, units or even a total curriculum. They are limited to particular materials covered in a curriculum within a particular time frame”. Unlike a progress test, an achievement test should attempt to cover as much of the syllabus as possible. If we confine our test to only part of the syllabus, the contents of the test will not reflect all that has been learned.
There are two kinds of achievement tests: final achievement tests and progress achievement tests.
Progress achievement tests (short-term achievement tests) are administered during the course, after a chapter or a term, and are often written by the teacher. These tests are, of course, based on the teaching program. Hughes (1990:12) claims that “these tests are intended to measure the progress that students are making”. In other words, progress achievement tests are supposed to help teachers to judge the degree of success of their teaching and to find out how much students have gained from what has been taught. Accordingly, teachers can identify the weaknesses of the learners or diagnose the areas not properly achieved during the course of study. On the other hand, for students, this kind of test can be regarded as a useful device that gives them a good chance to use the target language in a positive and effective manner and to gain additional confidence in doing so. It can be good preparation and support for the final achievement test, because students become familiar with the tests and the strategies for doing them.
Final achievement tests (longer-term achievement tests) are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. They are used to check how well learners have done after a whole course in terms of the objectives and content of the course. According to Hughes (1990:11), there are two kinds of final achievement tests: those following the syllabus-content approach and those following the syllabus-objective approach.
The syllabus-content approach is based directly on a detailed course syllabus or on the books and other materials used. The test only contains what it is thought that the students have actually encountered, and thus can be considered, in this respect at least, a fair test. The disadvantage of this type is that if the syllabus is badly designed, or the books and other materials are badly chosen, then the results of a test can be very misleading. Successful performance on the test may not truly indicate successful achievement of course objectives.
The syllabus-objective approach is the one in which the test content is based directly on the objectives of the course. This approach has some benefits. First, it forces course designers to make course objectives explicit. Second, students’ performance on the test shows how far they have achieved those objectives. This in turn puts pressure on those who are responsible for the syllabus and for the selection of books and materials to ensure that these are consistent with the course objectives. Tests based on course objectives work against the perpetuation of poor teaching practice. The author believes that test content based on course objectives is much preferable: it provides more accurate information about individual and group achievement and is likely to promote a more beneficial backwash effect on teaching.
2.3.2 Characteristics of a good EGP test
In order to make a well-designed test, teachers have to take into account a variety of factors such as the purpose of the test, the content of the syllabus, the students’ background, the goals of administrators and so forth. Moreover, test characteristics play a very important role in constructing good English for General Purposes (EGP) tests. The most important quality of a test is its usefulness. Usefulness generally consists of four main components: reliability, validity, practicality and washback.
Reliability has been defined in different ways by different authors. Berkowitz, Wolkowitz, Fitch and Kopriva (2000) define reliability as “the degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker”. Bachman (1990: 24) considers test reliability “a quality of test score”. Clearly, both views refer to the consistency of the scores obtained on a test. Every test should be reliable. If a group of students were to take the same test on two occasions, their results should be roughly the same, provided that nothing has happened in the interval. Thus, if the students’ results are very different, the test cannot be described as reliable.
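The idea of giving the same test on two occasions can be illustrated with a small calculation. The following is a minimal sketch and not part of the original study: the scores are invented for illustration, and the function name `pearson_r` is hypothetical. It estimates reliability as the Pearson correlation between the scores of the same students on two sittings of the same test; a value close to 1 suggests consistent results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Invented scores (out of 10) for the same six students on two sittings.
first_sitting = [4.0, 5.5, 6.0, 7.0, 8.5, 9.0]
second_sitting = [4.5, 5.0, 6.5, 7.0, 8.0, 9.5]

test_retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

In this sketch the students keep roughly the same rank order across the two sittings, so the coefficient is high. What counts as an acceptable value depends on the purpose of the test and is not fixed by this example.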
Validity refers to the degree to which a test actually measures what it was designed to measure. Validity is often discussed under the headings face, content, construct and criterion-related.
Content validity
This is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured” (Anastasi & Urbina, 1997: 114). A test has content validity built into it by careful selection of which items to include. Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcraft et al. (2004: 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts will be able to review the items and comment on whether they cover a representative sample of the behavior domain.
Construct validity
A test has construct validity if it accurately measures a theoretical, non-observable construct or trait. The construct validity of a test is worked out over a period of time on the basis of an accumulation of evidence. There are a number of ways to establish construct validity. Two methods of establishing a test’s construct validity are convergent/divergent validation and factor analysis.
A test has convergent validity if it has a high correlation with another test that measures the same construct. By contrast, a test’s divergent validity is demonstrated through a low correlation with a test that measures a different construct.
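The contrast between convergent and divergent validation can be made concrete with two correlations. The sketch below is only an illustration under stated assumptions: all three score lists are invented, the test names are hypothetical, and `pearson_r` is a hand-written helper rather than anything used in the study. A new reading test is compared with an established reading test (same construct) and with an unrelated measure (different construct).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Invented scores for the same six students on three hypothetical measures.
new_reading_test = [3, 5, 6, 7, 8, 9]          # the test being validated
established_reading_test = [4, 5, 5, 7, 9, 9]  # assumed to measure the same construct
unrelated_measure = [6, 3, 8, 5, 7, 4]         # assumed to measure a different construct

convergent_r = pearson_r(new_reading_test, established_reading_test)
divergent_r = pearson_r(new_reading_test, unrelated_measure)

print(f"convergent evidence (same construct): r = {convergent_r:.2f}")
print(f"divergent evidence (different construct): r = {divergent_r:.2f}")
```

With these invented numbers the first correlation is high and the second is close to zero, which is the pattern the paragraph above describes. Real validation would need far more test takers than six.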
Factor analysis is a complex statistical procedure which is conducted for a variety of purposes, one of which is to assess the construct validity of a test or a number of tests.
Criterion-related validity
In concurrent validation, the predictor and criterion data are collected at or about the same time. This kind of validation is appropriate for tests designed to assess a person’s current criterion status. It is well suited to diagnostic screening tests.
In predictive validation, the predictor scores are collected first and the criterion data are collected at some later point. This is appropriate for tests designed to assess a person’s future status on a criterion.
Practicality is the degree to which a test is easy to construct, administer, score and interpret. A test must be carefully organized well in advance. How long will the test take? What special arrangements have to be made (for example, what happens to the rest of the class while individual speaking tests take place)? Is any equipment needed (tape recorder, language lab, overhead projector)? How is the marking handled? How are tests stored between sittings? All of these questions are practical, since they help ensure the success of a test and of testing (Heaton, 1988; Hughes, 1997; Carroll & Hall, 1985).
The last important factor in testing is the backwash or washback effect. Washback is the effect of testing on the teaching and learning processes. Washback can be harmful or beneficial. If a test is regarded as important, then preparation for it can dominate all teaching and learning activities, negatively or positively. If the test content and testing techniques are at variance with the objectives of the course, then there is likely to be harmful washback. If the skill of writing, for example, is tested only by multiple choice items, then there is pressure to practise such items rather than the skill of writing itself. This harmful washback is clearly undesirable. An example that often comes up is the effect of the university entrance examinations in Vietnam on high school language teaching and learning. However, washback need not always be harmful; indeed, it can be positively beneficial. Suppose an English test for first-year undergraduate students is designed on the basis of an analysis of these students’ English language needs, includes tasks as similar as possible to those which they will have to perform as undergraduates (reading textbooks, taking notes during lectures, etc.), and is administered instead of one which is entirely multiple choice. Beneficial washback can then be achieved. There will be an immediate effect on teaching and learning: the syllabus will be redesigned, new books will be selected, classes will be conducted differently and students’ way of learning will change to reflect the demands of the new test.
In a nutshell, the author has given a general overview of achievement tests and of the characteristics of a good EGP achievement test so that readers can understand how to evaluate a final EGP achievement test.
2.3.3 Characteristics of a good ESP test
Nowadays, ESP teaching and research have made tremendous progress at home and abroad. In terms of teaching, ESP has formed a system comprising Vocational English (VE: Business English, Tourism English, Hotel English, Medical English…) and English for Academic Purposes.
“ESP is not a matter of teaching specialized varieties of English. The fact that language is used for a specific purpose does not imply that it is a special form of language, different in kind from other forms. Though the content of learning may vary, there is no reason to suppose that processes of learning should be any different for the ESP learner than for the general English learner” (Hutchinson, 1987).
From the above view, two points emerge. First, ESP is one kind of English with its own specific language characteristics; it is not a matter of teaching particular items, and the similarities between ESP and EGP outweigh their differences. Second, there is no essential difference in teaching principles and procedures between ESP and EGP. In other words, EGP is the preliminary stage for ESP, and ESP is the advanced stage of EGP teaching. The testing and evaluation of ESP should be carried out in accordance with the teaching contents and objectives. Only with sound principles and appropriate teaching methods and modes can ESP stimulate students’ motivation for language learning, arouse their enthusiasm, and contribute to a harmonious relationship between teachers and students. Clearly, good ESP tests share the qualities of all good EGP tests: every ESP test involves the four components mentioned above, namely reliability, validity, practicality and washback.
However, two aspects of ESP testing may be said to distinguish it from more general purpose language testing: authenticity of task and the interaction between language knowledge and specific purpose content knowledge.
Authenticity of task means that ESP test tasks should share critical features of tasks in the target language use situation of interest to the test takers. The key to this assessment is to present learners with tasks that resemble in some ways what they may have to do with the language in real life. Therefore, the ESP approach to testing is based on the analysis of learners’ target language use situations and of the specialized knowledge needed to use English for real communication.
The interaction between language knowledge and specific purpose content knowledge is perhaps the clearest defining feature of ESP testing. In more general purpose language testing, the factor of background knowledge is usually seen as a confounding variable, contributing to measurement error and to be minimized as much as possible. In ESP testing, background knowledge is a necessary, integral part of the concept of specific purpose language ability.
To sum up, EGP is the preliminary stage for ESP: ESP is taught once students have acquired general English grammar and knowledge. ESP tests are similar to EGP tests but focus on the specific purposes for which English is used in target language use situations for real communication.
2.4 Face validity
2.4.1 Definition
Hughes (1989) states that “a test is said to have face validity if it looks as if it measures what it is supposed to measure”. The look of the test is thus what face validity refers to. It concerns the appeal of the test to popular (non-expert) judgment, typically that of the candidate, the candidate’s family and members of the public. A test with face validity is what students and parents expect, and it looks familiar to them. For example, for the past 8 years the Grade 9 exam has used passages, comprehension questions and grammar exercises taken directly from English 9. Students have prepared for the exam by memorizing the book. This year, the Foreign Language Specialist writes the exam using parallel texts and exercises not taken directly from the book, without warning anyone. This test lacks face validity. Face validity is hardly a scientific concept, yet it is very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers. In support of this view, McNamara (2000: 133) defines face validity as the degree of acceptability of a language test to those who are involved in its design and use. A language test is said to be face valid only if it satisfies their expectations. Ingram (1977: 18), as cited by Anderson et al. (1995: 289), also agrees that face validity is “surface credibility
comment on the appearance of the language test, although there may be little attention paid to the content of test items. Analyzing the face validity of an English test is thus an attempt to gather people’s opinions on whether or not the test looks valid as an English test.
2.4.2 Relationship between reliability and validity
We often think of reliability and validity as separate ideas but, in fact, they are related to each other. Reliability and validity are the two vital characteristics that constitute a good test. However, they have a complicated relationship.
If a test is not reliable, it cannot be valid at all. To be valid, according to Hughes (1988: 42), “a test must provide consistently accurate measurements. It must therefore be reliable. However, a reliable test may not be valid at all”. For example, suppose that in a writing test candidates are required to translate a text of 500 words into their own language. This could well be a reliable test, but it cannot be a valid test of writing. Conversely, if a test is valid, it must also be reliable. Therefore, reliability is a necessary but not sufficient condition for validity. To illustrate this relationship, the author presents the following figure. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you hit the center of the target. If you do not, you miss the center. The more you are off for that person, the further you are from the center.
Figure 3: Relationship between reliability and validity
The figure above shows three possible situations. In the first, you hit the target consistently but miss the center. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable but not valid (that is, it is consistent but wrong). The second shows hits that are randomly spread across the target. You seldom hit the center, so your measure is neither reliable nor valid. In the third, you consistently hit the center of the target, and your measure is both reliable and valid. In brief, reliability is a necessary but not sufficient condition for validity.
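The target analogy can be made concrete with a small numerical sketch. This is only an illustration under stated assumptions: reliability is approximated by a small spread of repeated measurements, validity by a small distance between their mean and the true value, and the thresholds, data and the function name `classify` are all hypothetical.

```python
from statistics import mean, pstdev

def classify(measurements, true_value, max_spread=2.0, max_bias=2.0):
    """Return (reliable, valid) for repeated measurements of one true value.

    The two thresholds are arbitrary illustrative assumptions.
    """
    reliable = pstdev(measurements) <= max_spread
    # Reliability is treated as a precondition: scattered shots are never valid.
    valid = reliable and abs(mean(measurements) - true_value) <= max_bias
    return reliable, valid

TRUE_SCORE = 50  # hypothetical "center of the target"
situations = {
    "consistent but off-center": [60, 61, 59, 60, 60],
    "randomly scattered": [30, 72, 45, 66, 38],
    "consistent and on target": [50, 51, 49, 50, 50],
}

for name, shots in situations.items():
    reliable, valid = classify(shots, TRUE_SCORE)
    print(f"{name}: reliable={reliable}, valid={valid}")
```

Making `valid` depend on `reliable` encodes the point that reliability is necessary but not sufficient for validity: the first situation passes the spread check and still fails the bias check.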
2.4.3 Reasons for choosing face validity
As the relationship between reliability and validity above shows, validity is an indispensable quality of all good tests. Hughes (1982: 22) says that “the greater a test’s content validity is, the more likely it is to be an accurate measure of what it is to measure”. Therefore, from the outset of test construction, test validity should be the most essential consideration of all.
The validity of a language test has four facets, namely face validity, content validity, construct validity and criterion-referenced validity. However, the author focuses on face validity for the following reasons.
Firstly, the latter three facets of validity (content validity, construct validity and criterion-referenced validity) are excluded from this research because of limited time and resources. Anastasi (1982: 136), as cited by Weir (1990: 26), stated that “face validity is not validity in the technical sense”. Nevertheless, face validation is significant in that it concerns whether or not the test “looks valid” to those who deal with it, so the researcher analyzes face validity. Heaton (1988: 60) added that “face validity can provide not only a quick and reasonable guide but also a balance to too great of concern with statistical analysis.” He points out that students’ motivation is maintained if a test has good face validity. Face validity thus plays a role in any test and is of great concern in this thesis. According to Anastasi & Urbina (1997: 114), content validity is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured”. Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. Content validation therefore requires a representative sample test to analyze and compare. According to Bachman and Cohen (1998: 50), construct validation deals with the “judgmental and empirical justifications supporting the inferences made from test scores”. Bachman and Palmer (1996: 21) also mention that construct validation is related to the “meaningfulness and appropriateness” of the researcher’s interpretations of the actual test scores. Bachman (1990: 248) mentions that criterion-referenced validity deals with demonstrating “a relationship between test scores and some criterion which is believed as an indicator of the ability tested”. However, this research has access to neither the actual test scores nor a reliable sample test, and thus excludes these three validation processes from its investigation.
Secondly, face validity is chosen because of its importance in society. As Hughes (1989) says, face validity is hardly a scientific concept, yet it is very important, and a test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers. Agreeing with this view, Huong (2000: 69) points out that test appearance is an important consideration in test use. She suggests that useful information for test development can be obtained by investigating test takers’ perceptions of the appropriateness of the test and of the connection between test tasks and the real-life tasks that test takers will later encounter. Clearly, face validity is very important in society’s evaluation of a test, whereas the latter three facets of validity belong to the specialization of test designers.
This study is intended only to help the author design a test suited to students’ ability at NUTE. The reasons discussed here are therefore regarded as a strong impetus for this thesis to investigate the face validity of the achievement language test 12 at NUTE.
2.5 Some measures to increase face validity
Face validity is an important aspect of a test; it relates to the question of whether non-professional testers such as parents and students think the test is appropriate. If these non-specialists do not think the test is testing candidates’ knowledge in a suitable manner, they may complain vociferously and the candidates may not tackle the test with the required zeal. If the test lacks face validity, it may not work as it should and may have to be redesigned (Alderson, Clapham & Wall, 1995). Therefore, it is necessary to propose measures to increase face validity in terms of the administration and content of the test, as follows:
- The test format is familiar and clear to the students;
- The quantity of questions in the test is suitable for the time allowance;
- Test conditions (space and atmosphere in the testing room in particular) are “biased for best”, bringing out students’ best performance;
- The test is administered fairly, with materials collected before testing and a stricter testing process;
- The test content reflects the syllabus objectives and fully covers what the students have been taught in the course;
- Task types are familiar to students;
- The difficulty level of the tasks is appropriate to students’ language ability;
- Instructions are crystal clear.
The author uses these factors to evaluate the face validity of the final achievement test 12 from the perceptions of teachers and test takers at NUTE. The author hopes that evaluating these factors will help her to make suggestions for improving this test.
Summary
In this chapter, I have reviewed the relationship between teaching, learning and assessment, different types of assessment, achievement tests, the characteristics of good EGP and ESP achievement tests, face validity and some measures to increase face validity, because of their importance in evaluating test validity in the process of teaching and learning in a course.
CHAPTER 3: THE STUDY
3.1 English learning and teaching at Nam Dinh University of Technology Education
3.1.1 Students’ backgrounds
Students at Nam Dinh University of Technology Education (NUTE) have different levels of English because of their backgrounds. Generally, students from small towns and mountainous areas have had less chance to learn English than those from the big cities, where much attention is paid to foreign language learning. Moreover, some students have never learned English before because in high school they learned other foreign languages such as Russian, French or Chinese; some have been learning English for only a few years. In addition, they enter the university easily because they do not have to take any entrance exams. Instead, they only submit their dossiers to be considered and evaluated. Consequently, their attitude towards learning English in particular, and other subjects in general, is not highly rated.
3.1.2 The English teaching staff
Nam Dinh University of Technology Education is one of the main universities that train engineers and teachers. It has been a university for 5 years. Previously, the English subject belonged to the General Department; since 2009, it has been a faculty of foreign languages in its own right. Because it was founded only recently, the department focuses solely on English, not other foreign languages. The department has 11 teachers, three of whom are over 45 years old, while the rest are between 23 and 30. They have to teach both English for General Purposes (EGP) and English for Specific Purposes (ESP) to students majoring in economics, computing, electronics, welding and automobiles. All the English teachers here have been well trained in Vietnam, and none of them has studied abroad. Three of them hold a Master’s degree in Educational Management rather than in English; six of them are taking an M.A. course in English. They prefer using Vietnamese in class, as they find it easier to explain lessons in Vietnamese given the students’ limited English ability. Furthermore, they are always aware of the need to adopt suitable teaching methods for their classes, which results in students’ high involvement in the lessons.
3.1.3 Objectives of the English course
The university has many technological fields, but electronics is the largest, so seven of the eleven teachers teach English in the electronics field. The English for Specific Purposes (ESP) syllabus for Electronic – Electrical Engineering Department students was designed by teachers of the foreign language department and has been applied for 4 years. Before starting the ESP term, students have to complete two EGP terms of 60 periods each, covered by Headway (at Elementary and Pre-intermediate levels), in which the students pay attention only to reading skills, vocabulary and grammar; other skills such as listening, writing and speaking are ignored. After finishing the two EGP terms, the students work with the Electronic – Electrical English textbook (edited by teachers of the Foreign Language Department at NUTE) over 30 periods. It consists of 8 units that practise reading skills such as skimming, scanning and detailed reading, develop translation ability (into both Vietnamese and English) and provide ESP vocabulary, as shown in table 1.
Table 1: ESP syllabus content allocation
1   Conductors, Insulators and Semiconductor   03   Reading skill, vocabulary and translation
2   Circuit Elements   03   Reading skill, vocabulary and translation
3   The DC Motor   03   Reading skill, vocabulary and translation
4   The Cathode Ray Tube   03   Reading skill, vocabulary and translation
6   Electronics in the home   03   Reading skill, vocabulary and translation
7   Semiconductor Diodes   03   Reading skill, vocabulary and translation
9   Audio Recording System   03   Reading skill, vocabulary and translation
This course book focuses on reading skills, vocabulary and translation ability, with a little attention to language focus. Therefore, the final test is intended to measure only reading skills, grammar and vocabulary, not listening and speaking skills. The reading texts are meaningful and useful to the students because they first revise language items and then supply the students with background knowledge and vocabulary relating to their Electronic – Electrical specialization. The reading texts are also taken and edited mainly from electronic-electrical textbooks written by Vietnamese authors. In a nutshell, through the three English terms, students can achieve the following objectives:
+ In the 1st and 2nd terms, teachers revise students’ grammar knowledge and help them to practise skills, with a special focus on reading, in order to prepare for the next semester.
+ In the 3rd term, teachers consolidate students’ reading skills, provide ESP vocabulary and instruct students in translation tasks, which are very useful for their future jobs.
However, the common goal covering the above objectives is to equip the students with general English grammar and vocabulary, some reading comprehension skills, translation ability and a general background in Electronic – Electrical English necessary for their future work.
3.1.4 Checklist of the course book
The checklist of language skills and sub-skills taught in the course will help the author to compare the content areas with the current final test 12.
- Grammar:
+ simple present tense;
+ present perfect tense;
+ simple past tense;
+ will-future tense;
+ modal verbs;
+ relative clauses;
+ passive voice;
+ linking words: and, or, but, however, therefore, because, although, etc;
+ pronoun references and possessive adjectives;
- Translation: mainly sentences related to reading text in each unit;
- Reading: topics of the reading texts are listed in table 1, with a focus on the following main reading skills:
+ skimming skills;
+ scanning skills;
3.1.5 Objectives of the final test
The teachers give final achievement tests at the end of each semester to achieve the following purposes:
- To assess students’ ability in reading comprehension, which is limited to understanding main ideas, extracting information, guessing words in context, making simple inferences and understanding opinions;
- To assess students' ability to use general grammar knowledge by using correctly such grammatical input as below:
+ recognition and use of tenses;
+ recognition and use of voices;
+ recognition and use of modal verbs;
+ recognition and use of relative clauses
- To assess students’ ability to translate technical materials into English in such a way as follows:
+ express grammatically correct ideas within basic structures presented in the course book;
+ express students’ technical vocabulary field;
+ link ideas using linking words “and, but, or, because, so, although, etc”;
+ use reference pronouns and possessive adjectives
- To assess students’ ability to translate technical materials into Vietnamese in such a way as follows:
+ express students’ technical vocabulary field;
+ express students’ background and specialized knowledge;
- To check how much of the target language skills and knowledge the students have acquired, and how far the objectives of the course have been achieved in the set timeframe;
- To help students to see what they have achieved during their learning process;
- To help teachers identify teaching methods, syllabus and materials that need adjusting and adapting to the students’ needs and capacities.
3.1.6 Difficulty level and discrimination of the final test
According to Bloom (1956), the cognitive domain involves knowledge and the development of intellectual skills. This includes the recall or recognition of specific facts, procedural patterns, and concepts that serve in the development of intellectual abilities and skills. There are six major categories, which are listed in order below, starting from the simplest behavior to the most complex. The categories can be thought of as degrees of difficulty. The first ones must normally be mastered before the next ones can take place.
- Knowledge: Recall data or information.
- Comprehension: Understand the meaning, translation, interpolation, and interpretation of instructions and problems. State a problem in one's own words.
- Application: Use a concept in a new situation or unprompted use of an abstraction. Apply what was learned in the classroom to novel situations in the workplace.
- Analysis: Separate material or concepts into component parts so that its organizational structure may be understood. Distinguish between facts and inferences.
- Synthesis: Build a structure or pattern from diverse elements. Put parts together to form a whole, with emphasis on creating a new meaning or structure.
- Evaluation: Make judgments about the value of ideas or materials.
Based on the objectives of the course book and the specific objectives of the test, the test aims at measuring the students’ knowledge and comprehension. Clearly, the difficulty level and discrimination of the test are limited to the knowledge and comprehension levels. The current test does not check students’ ability to use English (the application level) but checks whether students have carefully studied the textbook.
teaching at the end of the semester. This means that in the 3rd semester there are fourteen ESP tests. All fourteen tests are designed in the light of the syllabus-content approach and follow a common format. The fourteen tests are then collected and checked by the head of the ESP subject. Finally, they are sent to the educational testing and quality assurance department, which is in charge of preparing the test by randomly choosing one of the fourteen tests and printing it. Within the limited scope of the study, the author focuses on test 12, which was given to K3 students at the Electronic – Electrical Engineering Department in the final exam of semester 3.
3.2.2 The current final achievement test
ESP test 12 is a syllabus-based final achievement test in semester 3. The test consists of four parts. In the first part, students read an ESP text and then decide whether statements about the text are true or false. Part 2 is a grammar part that tests their knowledge of grammar; it also helps students to improve their mark. Finally, the last two parts, on translation, test the students’ general understanding of vocabulary and their use of language (see table 2).
Table 2: Specification of test 12
I   Reading comprehension   Narrative text relating to electric (related to topic in unit 1)
2, 4, 6, 7, 8)
IV   Vietnamese – English translation   Sentences in Vietnamese (related to topics in unit 2, 3, 4, 9)
From the above table, we can see that the content of the current test 12 aligns fully with what students have been taught, including a reading comprehension text on a topic from unit 1, a grammar task on the passive voice and translation tasks with sentences drawn from unit 2 to unit 9 of the syllabus. The task types, including multiple choice, short answer and full answer, are also familiar to the students. Moreover, the current test reflects the students’ knowledge and comprehension levels, although the grammar task concentrates on only one area of grammar and the reading task does not cover all the basic reading skills.
3.3 Research methods
3.3.1 Survey questionnaire
The study used a qualitative method, applied to analyze the data collected through a survey questionnaire administered to 239 second-year students and 7 teachers of the foreign language department at NUTE. The questionnaire was given to students and teachers to investigate the face validity of the test and to gather their suggestions for improvement.
3.3.2 Interview and informal discussion
The author collects more information by interviewing teachers of the foreign language department and students of classes DDT-3A and DDT-3B. The questions used are primarily based on the above-mentioned questionnaires, but the author focuses on the reasons for the respondents’ choices. The results of the interviews are noted down or recorded for comparison with those of the questionnaire, so that any variance can be revealed and checked against other methods.
For more information, the author also holds discussions with teachers of the foreign language department about the failures and successes of language testing in general and of test 12 in particular. The results are used as supporting data for the above-mentioned methods.
3.4 Data analysis of survey questionnaires and interviews
More than two hundred questionnaires (see the questionnaire in appendix 1) were administered to the second-year students of K3, but only 142 were returned, and 7 questionnaires (see the questionnaire in appendix 2) were given to teachers at the foreign language department at NUTE. The author intends to use the data to identify the similarities and differences in perceptions of the test between teachers and students at NUTE. The data was collected from the students’ and teachers’ survey questionnaires, with 31-34 questions in total (31 for students and 34 for teachers), divided into two main parts. Part A consists of 10 questions asking for students’ and teachers’ comments on the administration of the test. Part B has four sections with 21-24 questions asking for their comments on the face validity of the test: general opinion (6 questions for students and 7 for teachers), reading comprehension task (6 questions for students and 8 for teachers), grammar task (4 questions for both students and teachers), and translation task (5 questions for both students and teachers).
3.4.1 Data analysis of the administration of the test
3.4.1.1 Data analysis of the format of the test
The results of the survey questionnaires on the format of the test are analyzed in table 3 (see appendix 4). Table 3 shows that there is no big difference between teachers’ and students’ perceptions of the format of the test. In the first question, 100% of the teachers and 69.7% of the students agree that the Times New Roman font at size 12 is suitable, while 22.5% of the students think that it is rather suitable and 7.8% of the students say that it is not suitable for the test. When asked why it is not suitable, most of these students say they would like a larger font if possible (size 14 was suggested). However, they can still read the test at size 12. This means that the font is acceptable in this test.
In the second question, 100% of the teachers and 69% of the students think that the copies of the test are clear, while 26.7% of the students think that they are rather clear and 4.3% think that they are not clear. When asked for their reasons, a few students said that they had received bad copies during the test. This means that only a few copies were not good.
Considering the quantity of questions in the test, 86 out of 142 students (60.6%) think that there are too many questions, whereas only 2 out of 7 teachers (28.6%) think alike. Their reason is that sentences 3 and 4 in part IV of the test are too long and difficult. While 30.3% of the students agree that the quantity of questions is adequate, 5 out of 7 teachers (71.4%) find it about right. Only 9.1% of the students think the quantity of questions is not adequate. Clearly, there are big differences between the perceptions of teachers and students about the quantity of questions in the test. The teachers assume the students are familiar with the quantity of questions because the students have done mid-term tests with the same quantity of questions. When the students were asked in interviews why they could not complete a test with the same quantity of questions as the mid-term tests, they answered that in class they can complete the test because many students sit in one room and can therefore exchange answers easily. In the testing room they have to do the test by themselves, so they find it difficult to finish a test with so many questions. This suggests that the difference in opinion stems from the unfair administration of the mid-term tests and from the students’ laziness.
Regarding suggestions for improving the test format, 86 students (60.6%) and 2 teachers (28.6%) give suggestions. They think that sentences 3 and 4 in part IV of the test should be shortened and simplified because exercises on translating into English are difficult, even though they are available in the course book. By contrast, 5 teachers (71.4%) and 56 students (39.4%) do not give any suggestions because they consider the test format acceptable.
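The percentages reported for the questions on test format follow directly from the response counts. As a minimal sketch (the helper name `percent` is hypothetical; the counts and totals are the ones reported above), they can be recomputed as follows:

```python
def percent(count, total):
    """Share of respondents as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

STUDENTS, TEACHERS = 142, 7  # returned questionnaires reported in the text

# Respondents who gave suggestions for improving the test format.
gave_suggestions = {
    "students": percent(86, STUDENTS),
    "teachers": percent(2, TEACHERS),
}
# Respondents who gave no suggestions.
no_suggestions = {
    "students": percent(56, STUDENTS),
    "teachers": percent(5, TEACHERS),
}
print(gave_suggestions, no_suggestions)
```

Because there are only 7 teachers, each teacher accounts for roughly 14 percentage points, so differences between teacher and student percentages should be read with caution.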
It can be deduced from the first two questions that both students and teachers agree that the font size and copies of the test are good and do not affect the students’ test results. This means that the current test assures fairness in the exam. However, in the third question students and teachers differ in their perception of the quantity of questions in the test, owing to the unequal administration of the mid-term tests. As a result, they give suggestions in the fourth question.
3.4.1.2 Data analysis of the logistics of the test
The results of the survey questionnaires on the logistics of the test are analyzed in table 4 (see appendix 4). In the students’ opinion, the time allowance for this test is not enough (82 choices, 57.7%), adequate (50 choices, 35.2%) or too much (10 choices, 7.1%). When asked for their reasons in interviews, the students say that they cannot complete the test in such a short time because of their limited language abilities. By contrast, 28.6% of the teachers think that the time allowance for this test is not enough and 71.4% think it is enough. Most teachers believe that students can complete this test in a short time because they are informed about the test time and practise mid-term tests in the same time with the same format. This difference in opinion also results from the unfair administration of the mid-term tests and from the students’ laziness.
To the question of whether the rooms for testing are big enough, 83.1% of the students and 100% of the teachers choose option “Yes”. Only 16.9% of the students, who choose “No”, believe that the testing rooms are not big enough. The reason for this choice