HOANG HONG TRANG
A STUDY ON VALIDITY OF 45 MINUTE TESTS FOR THE 11TH GRADE
(NGHIÊN CỨU TÍNH GIÁ TRỊ CỦA BÀI KIỂM TRA 45 PHÚT TIẾNG ANH LỚP 11)
M.A. COMBINED PROGRAMME THESIS
Major: Methodology
Major code: 60.14.10
HANOI - 2009
HOANG HONG TRANG
A STUDY ON VALIDITY OF 45 MINUTE TESTS FOR THE 11TH GRADE
(NGHIÊN CỨU TÍNH GIÁ TRỊ CỦA BÀI KIỂM TRA 45 PHÚT TIẾNG ANH LỚP 11)
M.A. COMBINED PROGRAMME THESIS
Major: Methodology
Major code: 60.14.10
Supervisor: ASSOC. PROF. DR. VO BAT QUANG
HANOI - 2009
TABLE OF CONTENTS
INTRODUCTION
1. Rationale for the study
1.1.1 Language testing: a brief history and its characteristics
1.1.2 Purposes of language testing
1.1.3 Validity in language testing
1.1.3.1 Definition and types of validity
1.2.2 Class progress tests as a type of achievement tests
1.3 TESTING TECHNIQUES
CHAPTER 3: THE STUDY
3.1 THE CONTEXT OF TEACHING AND TESTING ENGLISH AT HIGH SCHOOLS
3.2 AN OVERVIEW OF THE TEACHING AND TESTING OF ENGLISH LANGUAGE
3.2.1 English textbook for the 11th grade
3.2.2 Syllabus for 11th grade English language subject
CHAPTER 4: MAJOR FINDINGS
4.1 Phonetics section in 45-minute tests
4.1.1 Data concerning construct validity
4.1.2 Data concerning content validity
4.2 Grammar section in 45-minute tests
4.2.1 Data concerning construct validity
4.2.2 Data concerning content validity
4.3 Vocabulary section in 45-minute tests
4.3.1 Data concerning construct validity
4.3.2 Data concerning content validity
LIST OF TABLES
Table 1: Bookmap of the English 11 textbook
Table 2: Recommended structure of a 45 minute test
Table 3: Number of pronunciation test items having their underlined parts dissimilar in letter format
Table 4: No correct answer
Table 5: Apparent correct answer
Table 6: Underlined letter(s) not corresponding to the sounds tested
Table 7: Content validity of phonetics section in Group 1 tests
Table 8: Content validity of phonetics section in Group 2 tests
Table 9: Content validity of phonetics section in Group 3 tests
Table 10: Content validity of phonetics section in Group 4 tests
Table 11: Summary of content validity of pronunciation test items of 4 test groups
Table 12: Summary of techniques for grammar testing in 30 tests
Table 13: Construct validity of grammar items of Group 1 tests
Table 14: Construct validity of grammar items of Group 2 tests
Table 15: Construct validity of grammar items of Group 3 tests
Table 16: Construct validity of grammar items of Group 4 tests
Table 17: Content of grammar component of Group 1 tests compared to the syllabus
Table 18: Content of grammar component of Group 2 tests compared to the syllabus
Table 19: Content of grammar component of Group 3 tests compared to the syllabus
Table 20: Content of grammar component of Group 4 tests compared to the syllabus
Table 23: Content validity of vocabulary test items of Group 1 tests
Table 24: Content validity of vocabulary test items of Group 2 tests
Table 25: Content validity of vocabulary test items of Group 3 tests
Table 26: Content validity of vocabulary test items of Group 4 tests
[...] development within the last forty (nearly fifty) years in terms of professionalization, internationalization, cooperation and collaboration (Stansfield, 2008, p. 319). Along the process of its development, validity, together with fairness, has become a matter of increasing concern, and it is predicted that research into validity will form “the prominent paradigm for language testing in the next 20 years” (Bachman, 2000, p. 25).
In discussions of validity, much has been said about the validation of standardised tests, especially large-scale EFL tests such as TOEFL, IELTS and TOEIC (Stoynoff, 2009; Bachman et al., 1995, cited in Stansfield, 2008), since decisions based on the scores of these tests are usually considered of prime importance to test takers in both their career and life prospects. Teacher-produced tests, on the contrary, receive much less attention. Studies [...] (Davidson and Lynch, 2002, p. 65, cited in Coniam, 2009, p. 22), since in a language test “language is both the instrument and the object of measurement” (Bachman, 1990) (which means difficulty regarding the careful choice of linguistic elements in a language test), and due to teachers’ lack of time and resources (Popham, 1999, p. 200, cited in Coniam, 2009, p. 227). Also, teachers are “unlikely to be skilled in test construction techniques” (Popham, 2001, p. 26, cited in Coniam, 2009, p. 227). That explains why the test item quality of teacher-produced tests is often lower than that of standardised tests in terms of reliability (Cunningham, 1998, p. 171, cited in Coniam, 2009, p. 227), and this leads to low validity of test score interpretations as well.
Nevertheless, however inferior teacher-produced tests are said to be compared to standardised tests in terms of quality (according to several studies), little factual evidence has been found to support this (Coniam, 2009, p. 227). Soranastaporn et al. (2005) (cited in Coniam, 2009) attempted to compare concurrent validity between achievement tests designed by Thai language teachers and standardised tests like TOEFL and IELTS and found low correlations between the two. Another study, conducted by Coniam into the reliability and validity of teacher-produced tests for EFL students at a university in Hong Kong, reported [...]
In the Vietnamese context of educational reform, textbooks at primary and secondary level have all been redesigned in structure and content to keep pace with current changes and developments in society as well as in pedagogy. English language textbooks, following the trend, started to be replaced in 2004, and the replacement process was only completed in the school year 2008-2009. Despite the fact that techniques and guidelines for assessment have been provided in the new textbook set, there has not been any investigation into the quality of the actual tests that teachers produce and use for their students at school, or into whether teachers follow these guidelines closely. This situation calls for research into the quality of English language tests used at secondary schools so as to obtain a clearer and more accurate picture of language testing in Vietnam.
2. Significance of the study
English is being learnt by over 90% of school pupils and university students in Vietnam, not counting the number of people learning English outside schools and universities. Therefore, assessment of the quality of teacher-produced tests will lay the foundation for a valid interpretation of the quality of language education at schools, which in turn helps form directions and guidelines for further instruction and assessment at tertiary level and at other language education centers and institutions.
On a narrower scale, the results of the quality assessment of school tests will assist in improving test item quality, creating more reliable and valid tests.
3. Aims of the study
Within the small scope of an MA thesis, this study only aims at investigating two aspects of the validity of a common type of English test used in schools in Vietnam. In particular, this research tries to investigate the content and construct validity of the language components of English forty-five-minute tests used for the 11th grade in some high schools in northern Vietnam.
[...] north of Vietnam. No other types of tests or other grades were investigated. The language used in those tests is English, so all the findings and discussions are restricted to the English language only. However, the suggestions are useful to the teaching of other foreign languages. Furthermore, the scope of an MA thesis could only allow for an investigation into two types of validity, that is, content and construct validity, and the area chosen for investigation is the language components in the tests collected.
5. Research questions
In short, this research aims at answering the following questions:
1.1 How valid is the construct of the language components in 45 minute English tests for the 11th grade?
1.2 How valid is the content of the language components in 45 minute English tests for the 11th grade?
To put it in other words, the research will focus on finding out (1) whether the content of the language components of those 45 minute tests follows the English 11 syllabus closely and (2) whether the test items of the language components can really measure what they are purported to measure. That is, this research investigates the content validity and construct validity of the language components of forty-five-minute tests.
6. Organization of the study
This research report is divided into four main parts. After the introduction with an [...] the first part, which revie[...] findings are relevant and beneficial to this one. The second part discusses the methodology of this study, including the research approach, methods of data collection and data analysis.
The third part presents the study in detail, including the context of teaching and testing English at high schools at the time of this research, the syllabus of the 11th grade and information on forty-five-minute tests. The fourth part reports all findings and their discussions as well as recommendations. Finally, the report ends with the conclusion, which summarizes the main points of the research.
1.1.1 Language testing: a brief history and its characteristics
Language testing, as I usually think of it, involves testing the examinee’s level of understanding and using the language. However, the main functions that language testing serves vary according to different approaches and different periods in its history.
As Stansfield (2008) reviewed in his article in Language Testing 25, Spolsky (1978) divided language testing history, up to his time, into three periods or stages, i.e., pre-scientific, psychometric-structuralist, and integrative-sociolinguistic. In the first period, language experts were involved in the development of language tests and, because of their presence, they claimed their tests to be reliable and valid. This stage corresponds to the first approach to language testing: the essay-translation approach, in which “subjective judgement of the teacher” is of utmost importance, rather than “skill or expertise” in testing (Heaton, 1988). Popular components of a language test in this stage are essay writing, translation, and grammatical analysis (Heaton, 1988).
The second period saw the dominance of structural linguistics, which explains why test items in this stage were designed to test discrete language elements (such as sounds, words, and structures) in isolation from context (Stansfield, 2008, p. 312). This came to be known as discrete point testing, and was named the structuralist approach to language testing. Also, the emphasis of this approach regarding the quality of a language test was put on reliability and objectivity (Heaton, 1988, p. 16).
The third period, the integrative-sociolinguistic stage, witnessed a more scientific appearance of language testing compared to the previous stages, as statistics started to be utilized in the examination of tests. John Oller, an outstanding author of this period, proclaimed that there was “a general factor” constituting language proficiency, and he called it “a grammar of expectancies”, which could be “directly tested through the cloze test” (Oller, 1972, 1973, 1975; cited in Stansfield, 2008). Cloze tests and dictation, together with oral interviews, translation and essay writing, are present in most integrative tests, and this was called the integrative approach to language testing.
[...] which, according to Stansfield, “brought US and European testing specialists much closer together by the early 1990s.”
[...]d the shift of concern from reliability to validity (Stansfield, p. 318).
Throughout a nearly 50 year history, from the 60s of the twentieth century up to now, language testing has undergone several changes in its characteristics. Its nature has developed to become “less impositional, more humanistic”, “conceived not so much to catch people out on what they do not know, but as a more neutral assessment of what they do” (McNamara, 2000, p. 4). Also, the computerisation of language tests enables one test to be carried out almost anywhere in the world, by examinees of any nation and any race, as long as there is a computer connected to the internet; computers may also help tailor the content of the test to the particular abilities of candidates (in the case of computer-based tests such as TOEFL CBT). The limited number of assessors or automatic scoring somehow makes language tests fairer and the interpretation of test scores more reliable and valid.
1.1.2 Purposes of language testing
[...] students have mastered what they have been taught are called progress tests, and these are usually regarded as “the most important kind of tests for teachers” (Heaton, 1997).
2. Encouraging students: Learning a language is unique in the sense that students at certain levels of proficiency do not realize that they are making progress, which will, of course, disappoint them. That is why a good test can help show students that they actually are moving forward, thus encouraging them to continue making efforts in their language [...]
5. Placing students: Tests are sometimes also given to categorize students into different groups based on their ability. Language tests are often divided into several levels of language proficiency such as KET, PET, FCE, CAE, CPE (as in the Cambridge rankings), or A-, B-, C-level in the Vietnamese language education system, and so on.
6. Selecting students: After the purpose of finding out about students’ ability, strengths and weaknesses comes the task of selecting students for a job or a course. Categorizing students is inevitably one part of identifying and selecting them.
7. Finding out about proficiency: This purpose of language tests relates closely to two other purposes mentioned above, that is, placing and selecting. Actually, finding out about students’ language proficiency is just one step towards making decisions concerning students’ future education or future life (migration, for example). If language tests [...] serving [...]
[...]ment of attitudes and sociopsychological differences” (Henning, 1987).
1.1.3 Validity in language testing
1.1.3.1 Definition and types of validity
Validity refers to “the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure” (Henning, 1987). Validity is “the most important consideration in test evaluation” according to the Standards for Educational and Psychological Testing (1985, p. 9, cited in Wright & Stone, 1999).
Traditionally, the Standards discussed three types of validity: content related, criterion related and construct related, which were considered the “related facets of a single problem” (Wright & Stone, 1999).
In modern times, validity is still considered a unitary concept made up of several components, the validity of each of which contributes to the overall validity of test application and use.
Additionally, validity can be seen from both qualitative and quantitative aspects. Qualitatively, validity includes content and construct. “These two forms of validity explain the organization and construction of items and their use in eliciting manifestations of the variable” (Wright & Stone, 1999). The quantitative aspects of validity, however, have no relation to text or content; they are rather statistical and numerical. Criterion-related validity falls into this category.
Besides, we can also talk about empirical and non-empirical kinds of validity, which correspond respectively to the quantitative and qualitative aspects mentioned above. Examples of non-empirical validity are face/content validity and response validity, while those of empirical validity are concurrent and predictive validity, namely criterion-related validity.
[...] from a specified procedure” (Cronbach, 1971, p. 147; cited in McNamara & Roever).
1.1.3.2 Content validity
It is generally assumed that content validity deals with the representativeness and comprehensiveness of the content of the test, so that the test is a valid measure of what it is supposed to measure (Henning, 1987; Borg & Gall, 1974; Bachman, 1990; McNamara, 2000; Heaton, 1998). Therefore, in order to assess the content validity of a test, we have to look at two aspects of its content, that is, representativeness and comprehensiveness, or in other words, content relevance and content coverage (Bachman, 1990, p. 244).
With regard to content relevance, Messick (1980, p. 1017) (cited in Bachman, 1990, p. 244) suggested that the investigation of content relevance requires “the specification of the behavioral domain in question and the attendant specification of the task or test domain”. This can be understood to mean that not only the content of the test is a matter of content validity but also the setting in which the test is given, or the measurement procedure. Popham (1978) (cited in Bachman, 1990, p. 245) specifies the elements in test design: “what it is that the test measures”, “the attributes of the stimuli that will be presented to the test taker”, and “the nature of the responses that the test taker is expected to make”. Hambleton (1984) relates these three elements to content validity (in Bachman).
Concerning content coverage, test developers need to closely analyse the language tested and the course objectives (Heaton, 1998) so that there is always an apparent correspondence between the two. This is especially true of achievement tests, while things are not that easy in the case of proficiency tests, for test designers in this context have to rely on their knowledge, experience and research results to decide which content to choose.
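As a rough illustration of these two aspects, the sketch below treats a test and a syllabus as collections of language points and computes two proportions: the share of test items that target syllabus points (content relevance) and the share of syllabus points that the test touches (content coverage). The function names, the sample language points and the idea of reducing each aspect to a single proportion are assumptions made for illustration only; they are not taken from Bachman (1990) or from the procedure used in this study.

```python
def content_relevance(test_items, syllabus):
    """Share of test items whose target language point is in the syllabus."""
    if not test_items:
        return 0.0
    on_syllabus = [item for item in test_items if item in syllabus]
    return len(on_syllabus) / len(test_items)


def content_coverage(test_items, syllabus):
    """Share of syllabus points targeted by at least one test item."""
    if not syllabus:
        return 0.0
    return len(set(test_items) & set(syllabus)) / len(set(syllabus))


# Hypothetical data: each test item is labelled with the point it targets.
syllabus = {"gerund", "passive infinitive", "reported speech", "conditional type 3"}
test_items = ["gerund", "gerund", "reported speech", "relative clause"]

relevance = content_relevance(test_items, syllabus)  # 3 of 4 items are on the syllabus
coverage = content_coverage(test_items, syllabus)    # 2 of 4 syllabus points are tested
print(f"relevance = {relevance:.2f}, coverage = {coverage:.2f}")
```

A test can score well on one proportion and poorly on the other, which is why the two aspects are worth keeping apart.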
Content validity is one component of qualitative validity as mentioned above, and it plays a central role in developing language tests for specific purposes, for which content [...]
[...] to be able to do with language in non-test contexts (in proficiency tests), construct validity is concerned with the relationship between “performance on tests” and “a theory of abilities, or constructs” (Bachman, 1990, p. 255). A test which shows considerable correspondence between the two is said to have construct validity.
Construct validity has increasingly been viewed as a unified concept which is formed by three other aspects of validity: content validity, criterion-related validity and construct validity. This way of understanding construct validity was first proposed by Messick (1980) (cited in Bachman, 1990, p. 241).
Construct validity is indeed the unifying concept that integrates criterion and content considerations into a common framework for testing rational hypotheses about theoretically relevant relationships (Messick, 1980, p. 1015) (cited in Bachman, 1990, p. 236).
In order to assess construct validity, the construct to be measured has to be defined first. Bachman (1997) noted that investigating construct validity needs to take into consideration both the construct definition and the characteristics of the test task.
Furthermore, according to Brown (2000), construct validity can be demonstrated via either an experimental study or an intervention study. In an experimental study, two groups are compared based on their performance: one group has the construct and the other does not. If the group with the construct performs better than the group without it, then the test is said to have construct validity. In an intervention study, a group weak in the construct is tested, then taught the construct and later re-tested. If there is a significant difference between the results of the pre-test and the post-test, it may mean that the test has construct validity.
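The intervention design described above can be sketched numerically. The snippet below compares pre-test and post-test scores of the same students and computes the mean gain together with a paired t statistic. The scores are invented, the function name is hypothetical, and the choice of a paired t statistic as the check for a "significant difference" is an assumption of this sketch rather than a procedure prescribed by Brown (2000).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(pre_scores, post_scores):
    """Return (mean gain, t statistic) for paired pre-test/post-test scores."""
    if len(pre_scores) != len(post_scores) or len(pre_scores) < 2:
        raise ValueError("need at least two paired scores of equal length")
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_gain = mean(gains)
    # Standard error of the mean gain (sample standard deviation / sqrt(n)).
    standard_error = stdev(gains) / sqrt(len(gains))
    if standard_error == 0:
        raise ValueError("all gains are identical; t is undefined")
    return mean_gain, mean_gain / standard_error


# Invented scores for five students, before and after being taught the construct.
pre = [4, 5, 3, 6, 5]
post = [6, 7, 5, 7, 7]

gain, t_value = paired_t_statistic(pre, post)
print(f"mean gain = {gain:.1f}, t = {t_value:.2f} (df = {len(pre) - 1})")
```

The t value would then be compared with a critical value for the given degrees of freedom; a large value only suggests, and does not prove, that the test is sensitive to the construct taught.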
Additionally, in “Language Test Construction and Evaluation”, Alderson, Clapham and Wall (2001) present several approaches to construct validation, including comparison with theory, internal correlations, comparisons with students’ biodata and psychological characteristics, and multitrait-multimethod analysis, of which multitrait-multimethod analysis is the most complicated method. This study used “comparison with theory”, together with an assessment of the characteristics of testing techniques, as the method to evaluate the construct validity of tests.
1.2 CLASS PROGRESS TESTS
1.2.1 Language tests: definition and types
Language tests can be simply understood as the tests that evaluate examinees’ language ability (which may include “language competence”, “strategic competence”, and “psychophysiological competence” according to the communicative approach to language testing (Weir, 1990)). Bachman (1990) mentioned five features to categorize language tests, and each criterion results in different test types. According to purpose or use, there are selection, entrance, and readiness tests (related to admission decisions); placement and diagnostic tests (regarding specific areas which need instruction); and progress, achievement, attainment, or mastery tests (in terms of how well students achieve the objectives of the study program, or how students should “proceed with the program”). With regard to the content of the test, we can have theory-based tests like proficiency tests and syllabus-based tests like achievement tests. Regarding frame of reference, there are norm-referenced and criterion-referenced tests; based on scoring procedure, there are subjective versus objective tests; and considering the testing methods used in a test, there are multiple choice, completion, dictation, cloze tests, and so on. Also based on testing methods, McNamara divided tests into paper-and-pencil and performance tests.
Generally, according to Heaton (1998), most testing specialists divide tests into achievement/attainment, proficiency, aptitude and diagnostic tests.
1.2.2 Class progress tests as a type of achievement tests
According to Henning (1987), achievement tests “are used to measure the extent of learning in a prescribed content domain, often in accordance with explicitly stated objectives of a learning program”. While proficiency tests are knowledge-based, achievement tests are syllabus-based; therefore, if a test is not based on a specific syllabus, it is no longer an achievement test. Syllabus content and objectives are the first and foremost criteria on which achievement tests are based and assessed.
Class progress tests are a subtype of achievement tests, often referred to as progress achievement tests, alongside final achievement tests, and they are also the most popular test type, commonly designed by teachers in and for a specific situation (Heaton, 1998). In order to design a class progress test, a teacher often has to draw on his/her knowledge of students’ ability, the objectives of the program he/she is teaching, the content of the specific part of the program that he/she is hoping to incorporate into the test, and his/her available source of test tasks.
With a view to evaluating the extent to which students have mastered what they have been taught in the program, a class progress test also provides students with a chance to show their progress, thus encouraging them to learn and to make continuous efforts in their study. It is similar to a teaching device which stimulates learning and reinforces what has been taught (Heaton, 1998).
Via progress tests, students realize whether they have mastered the essential knowledge, how much of it they have mastered, and which language areas they should review and pay more attention to.
Unlike achievement tests, which are usually given at the end of a semester or a course, progress tests are conducted throughout the course/semester, focusing on the very recent, important items that students need to acquire. Without progress tests, certain quite important items may be ignored, since they are important at the unit level but not important enough at the program level to be included in the achievement test. Progress tests accommodate them all and are therefore a better and more comprehensive reflection of students’ understanding and progress.
As advocates of continuous, formative assessment continue to grow in number, progress tests have long retained a central role in any educational program.
1.3 TESTING TECHNIQUES
In order to test students’ language skills or language areas, test designers have to rely on different testing techniques or test methods. Testing techniques can be simply understood as “means of eliciting behaviour from candidates which will tell us about their language abilities” (Hughes, 1989, p. 59). According to Hughes, the ideal testing techniques will have to satisfy four requirements:
1. will elicit behaviour which is a reliable and valid indicator of the ability in which we are interested;
2. will elicit behaviour which can be reliably scored;
3. are as economical of time and effort as possible;
4. will have a beneficial backwash effect.
Regarding categorization, common testing techniques may be divided in terms of the language areas or skills they are applied to, for example, techniques to test grammar, vocabulary, reading, listening, writing, and speaking. Besides, we also have objective and subjective testing techniques according to whether the test items will be graded objectively or subjectively.
To serve the objectives of this study, this section will first discuss the differences between objective and subjective testing. Then common types of objective and subjective testing techniques will be presented.
To begin with, subjective and objective here refer to the scoring of tests, not the construction of tests or performance on tests. Every stage in devising a test requires teachers/test designers to make subjective judgements in selecting what to test and how to test. As for students, they also have to make subjective judgements when doing the tests. The only objective thing here is how teachers/markers grade the tests. If the tests will be scored the same no matter who grades them, they are objective. Otherwise, they must be subjective. [...]
However, objective testing is often criticised on the ground that it does not allow for real communicative ability to be tested. Instead, students are tested on their ability to manipulate language in situations that never happen in everyday language use. Besides, objective testing gives room to wild guessing and chance. Even though most students base their guesses on partial knowledge (Heaton, 1998, p. 27), it is quite possible that some do not know anything at all and just do the test with simple, uneducated guesses. Chances are that they will often have a 25% chance of getting the correct [...] performance, they will continue to occupy a stable and firm position in language testing.
Because each of these two types of testing has its own advantages and disadvantages, it is recommended that a good test should include both subjective and objective test items.
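To make the effect of blind guessing mentioned above more concrete, the following sketch assumes multiple-choice items with four options each, so that a pure guess succeeds with probability 0.25, and assumes that guesses are independent. Under these assumptions the number of correct guesses follows a binomial distribution. The function names and the ten-item test are hypothetical and serve only as an illustration.

```python
from math import comb


def expected_correct(n_items, n_options=4):
    """Expected number of items answered correctly by blind guessing."""
    return n_items / n_options


def prob_at_least(k, n_items, n_options=4):
    """Probability of guessing at least k of n_items correctly (binomial)."""
    p = 1 / n_options
    return sum(
        comb(n_items, correct) * p**correct * (1 - p) ** (n_items - correct)
        for correct in range(k, n_items + 1)
    )


# Hypothetical 10-item section with four options per item.
n_items = 10
expected = expected_correct(n_items)
chance_of_half = prob_at_least(5, n_items)
print(f"expected correct by guessing: {expected}")
print(f"chance of at least 5/10 by guessing: {chance_of_half:.3f}")
```

On such a section a student who guesses every item can expect 2.5 correct answers, while reaching half marks purely by chance is much less likely (under 10% in this setting).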
* Multiple-choice questions: the most common objective testing technique
Multiple-choice questions are those in which there is only one correct answer, called the key or answer, among several options. The incorrect options are distractors, which aim at distracting students from the key.
Reliable, rapid and economical scoring is the most striking characteristic of multiple-choice questions, which explains why multiple-choice questions (MCQs) are favoured in many cases (Hughes, 1989; Cohen, 1994). However, there are several disadvantages of MCQs that Hughes (1989) and Weir (1990) have revealed:
1. The technique tests only recognition knowledge.
2. Guessing may have a considerable but unknowable effect on test scores.
3. The technique severely restricts what can be tested.
4. It is very difficult to write successful items (common problem areas include: more than one correct answer, no correct answer, clues in the options as to which is correct, ineffective distractors).
5. Backwash may be harmful.
6. Cheating may be facilitated.
7. There is considerable doubt about their validity as measures of language ability. Answering MCQs is an unreal task, with distractors presenting choices that otherwise might not have been thought of.
* Gap-filling:
Gap-filling is “the test in which the candidate is given a short passage in which some words or phrases have been deleted. The candidate’s task is to restore the missing words” (Alderson, Clapham, Wall, 1995). Gap-filling is a modified form of the cloze test and it has managed to avoid the cloze test’s weakness; Weir (1990) named it “selective deletion gap-filling”. Gap-filling has been very useful in testing grammar, reading comprehension, or vocabulary, since test writers are able to focus on the things that are considered important by selecting them to be deleted. The difficulty in using this testing technique is to ensure that students are led to write the expected words in the gaps. It would be ideal if there were only one correct answer for each gap; however, this is difficult to achieve. Therefore, in order to achieve marking reliability, it is essential that the number of alternative answers be reduced to the minimum and that every possible answer be listed in the answer key.
A banked gap-filling task can be the solution to this (Alderson, Clapham, Wall, 1995). In a banked gap-filling task, the missing words and phrases are provided, together with some distracting words, which means that there are more words/phrases than necessary. Students’ task is just to select the correct word for each gap.
According to Weir (1990), this technique “restricts to sampling a much more limited range of enabling skills than do the short answer and multiple-choice formats”.
Sometimes the deleted word does not affect the sentence at all, that is, the sentence is equally good with or without the deleted word. Such cases should be avoided because they confuse students.
* Sentence transformation items:
This type of item is very useful for testing the ability to produce structures, so it can test grammatical production. It is the objective item type which “comes closest to measuring some of the skills tested in composition writing”, although transforming sentences and producing sentences are not alike.
There are two common types of sentence transformation. In the first type, there is often at least one word given at the beginning of the new sentence, and the candidate’s job is to finish the sentence with exactly the same meaning as the original one. In the second type, the candidate is given one word to include in the new sentence, and he can put it anywhere in the new sentence as long as the word is not changed in form and the new sentence retains the meaning of the original one.
This type of test format is somewhat similar to completion items in the sense that there is often more than one correct answer. However, test designers can still be aware of all possible correct answers, and of the specific area they are testing.
This item type is more suitable for use in intermediate and advanced tests than in tests at an elementary level (Heaton, 1997, p 10T), maybe due to the fact that the elementary level often involves too few and too simple structures to allow different ways of expressing the same thing.
According to Heaton, the major shortcoming of this item type is the lack of context: “It is practically impossible to provide a context for items involving the rewriting of sentences”.
Although this item type is often used in the writing section of the test, and some people refer to it as a kind of controlled writing, I still have the feeling that it involves more grammatical knowledge than writing skills and is more like testing grammar production.
* Besides the above-mentioned techniques, other techniques may include true/false items (a modification of multiple-choice questions), error recognition (either in multiple-choice format just like multiple-choice questions, or having no options so that students have to find the mistakes themselves), sentence building (which is to some extent like sentence [...]
CHAPTER 2: METHODOLOGY OF THE STUDY
2.1 TYPE OF RESEARCH: A QUALITATIVE RESEARCH
This research is conducted qualitatively in the sense that it does not aim at testing hypotheses or at generalization, but is rather “exploratory” and “discovery-oriented” (Nunan, 1992), as qualitative research “is not set out to test hypothesis” (Larsen, 1999).
Burnes (1999) defines qualitative research as the one conducted “to draw conclusions from the data collected to make sense of how human behaviours, situations and experiences construct realities”. When one carries out qualitative research, one wants to find out what is going on “from the actor’s own frame of reference” (Nunan), that is, from the points of view of those being investigated. Besides, qualitative researchers view each individual as a unique entity, so there is no point in generalization because there is no theory that fits all and is true to all. Because there is no generalization, the number of samples in qualitative research is often restricted and underplayed. While quantitative data are usually gathered using probability sampling, that is, each unit in the population stands some chance of being selected through some form of random selection, qualitative research mostly relies on non-probability sampling for data collection. Non-probability sampling does not involve random selection, and does not “depend on the rationale of probability theory” (Trochim). Also, each researcher is a unique individual. He brings his viewpoints into his research, so each piece of research is actually biased by its researcher(s)’s individual perceptions (Trochim); thus, establishing external validity or objectivity in any research, according to qualitative researchers, is just pointless.
Additionally, while many researchers claim that there would be no numbers (quantification) in qualitative data, Trochim (2006) argues that “all qualitative data can be coded quantitatively” or “anything that is qualitative can be assigned meaningful numerical values”. Indeed, “qualitative” data are usually categorized in the analysis process and the act of categorizing is quantitative in itself, which many people fail to realize (Trochim, 2006). Trochim furthers his statement by saying that “all quantitative data is based on qualitative judgement” and he believes that without qualitative judgement, quantitative data is just valueless.
provinces investigated does not allow for any generalization, but hypotheses instead. And hypotheses only appear as a result of the investigation, not at the beginning of the research, since the subjects of the research are forty-five-minute English written tests for the new English 11 and there has been no research into those prior to this one, so there has been nothing for the researcher to hypothesize about. With a view to performing a close examination of the content and construct of those tests, the qualitative approach proves to be relevant and effective since “data obtained from qualitative research is usually detailed, rich and deep” (Burnes, 1999, p 22-23) and it provides in-depth understanding of the issue.
Furthermore, in this study teachers’ practice of test designing at some high schools will partly be revealed through their tests, which have all been utilized for their students. There is no variable to be controlled here. Everything is investigated just as it is in real life, in “naturalistic [...] setting without controlling variables” (Burnes).
2.2 TECHNIQUES
2.2.1 Data type and data collection
Data for this research was taken from 30 forty-five-minute tests collected from ten high schools in five provinces in the north of Vietnam. It included pronunciation, grammar and vocabulary test items, all in words, so the data collected existed in only the qualitative form. The tests collected contain from 45 to 60 questions of mostly objective types, the length of which depe
in schools nation-wide and has been approved by Vietnamese Mi
number of the researcher’s acquaintances in other provinces is limited, and due to the fact that teachers are often hesitant and unwilling to provide their designed tests for assessment for fear of losing face and having troubles later on, collecting data for this research turned out to be really hard. That explains the fact that even with convenience sampling, the researcher managed to get samples from only 10 schools in 5 out of 26 northern provinces. As a result, the initial purpose of generalizing the results of this study appeared impossible, and this study was switched into a preliminary one, putting forward a gross estimate of some aspects of the current testing practice instead.
With convenience sampling, the researcher took advantage of as many relationships as possible to contact teachers in northern provinces and finally got responses from ten people who are currently teaching English at high schools. They either sent the soft version of their tests to the researcher’s email address, or scanned the hard copies and sent the files via email, or faxed the hard copies to the researcher. Sometimes, if this could not be done, the researcher had to travel directly to the provinces to collect data.
Regarding the data collected, as the investigation focused on the new English textbook and the new English 11 has just been taught for two school years (2007-2008 and 2008-2009), with four forty-five-minute tests each year, in general there have been approximately eight forty-five-minute tests used in each high school all over Vietnam at most (if not to consider the situation when the tests are re-used by teachers). Thus, in order to achieve the highest reliability, I tried my best to collect as many 45-minute tests as possible in each school. In some provinces, I managed to get over 10 tests (about 5 tests each school), but that number was just impossible to reach in other provinces. That was why finally I had to settle with 3 tests each school, which meant 6 tests each province and 30 tests in total. All those tests were of course randomly selected as long as they clearly identify which units in the textbook they are testing on.
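The sample size stated above can be re-derived with a few lines of arithmetic. The following is only a minimal sketch: the variable names are illustrative, and the even spread of two schools per province is an assumption implied by the figure of 6 tests each province rather than stated outright.

```python
# Figures reported in the text above.
provinces = 5
schools = 10
tests_per_school = 3

# Assumption: schools are spread evenly over provinces
# (implied by "6 tests each province").
schools_per_province = schools // provinces
tests_per_province = schools_per_province * tests_per_school
total_tests = provinces * tests_per_province

# Upper bound per school: 2 school years x 4 tests per year
# (ignoring the case where tests are re-used).
max_tests_per_school = 2 * 4
share_collected = tests_per_school / max_tests_per_school

print(tests_per_province, total_tests, share_collected)
```

Under these assumptions the sketch reproduces the 6 tests per province and 30 tests in total, and suggests that roughly three out of at most eight available tests were collected from each school.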
2.2.2 Data analysis
After the tests were gathered, they were first coded A, B, C, D, E according to the provinces they are from and then 1, 2, 3, 4, 5, 6 to put them in order in their province groups for the sake of easy identification and analysis. Names of either the schools or the provinces will not be mentioned in the analysis so as to protect the face of the teachers and the schools willing to provide data for this study (if necessary). Pronunciation test items were taken from the “phonetics section” while grammar and vocabulary items for assessment were mainly extracted from the “vocabulary and grammar” section. Sometimes test items from reading, writing, and listening sections were also investigated provided that they focus mainly on testing students’ sentence-level grammatical knowledge. Questions testing text-based grammar were ignored, as they involve too many types of knowledge (understandings of grammar, vocabulary, discourse, and so on) in order to answer them, and thus it is difficult to analyse which type of knowledge they are emphasizing.
As for writing questions, almost all of them are sentence transformation and sentence building items, which in fact require students’ production of grammar knowledge rather than their writing skills. Such questions, therefore, were counted as grammar test items and included in the investigation as well.
Each language component (phonetics, grammar and vocabulary) was analysed separately by first examining the testing techniques used to test it, then evaluating those techniques and the actual items designed in order to arrive at some rough estimate of construct validity. After that, the content of those test items was compared to the content of the corresponding units in the textbook to find out the extent to which the test contained relevant phonological, grammatical and lexical points. This stage was performed to discover the content validity of the language components of every test, which also answered the second research question. Results of this stage were expressed in numbers, as each content-relevant item was given one point and then those content-relevant items in each
instead of “yes” and “no” just for the sake of convenience in calculation.
Results of the data analysis process were presented in table format and then reported using the descriptive method.
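The coding and scoring procedure described in this section can be illustrated with a short sketch. It is not the instrument actually used in the study: the function names, the sample grammar points and the choice of 0 for an item that is not content-relevant are all assumptions made for illustration only.

```python
from string import ascii_uppercase


def build_labels(n_provinces=5, tests_per_province=6):
    """Build labels such as 'A1' ... 'E6' (province letter + test number)."""
    return [
        f"{ascii_uppercase[p]}{t}"
        for p in range(n_provinces)
        for t in range(1, tests_per_province + 1)
    ]


def content_relevance_score(item_points, unit_points):
    """Give 1 point to each item whose tested point appears in the units.

    Assumption: items that are not content-relevant receive 0.
    Returns (number of relevant items, number of items).
    """
    marks = [1 if point in unit_points else 0 for point in item_points]
    return sum(marks), len(marks)


labels = build_labels()

# Hypothetical data for one test, for illustration only.
unit_points = {"gerund", "present participle", "reported speech with infinitive"}
item_points = ["gerund", "passive voice", "reported speech with infinitive", "gerund"]

relevant, total = content_relevance_score(item_points, unit_points)
print(labels[0], labels[-1], f"{relevant}/{total}")
```

Counting points rather than recording “yes” and “no” makes it straightforward to add up the content-relevant items of each test and to compare tests of different lengths.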
CHAPTER 3: THE STUDY
3.1 THE CONTEXT OF TEACHING AND TESTING ENGLISH AT HIGH SCHOOLS
curriculum in Vietnam
After over 20 years, English education in Vietnam has recorded several achievements such as the increasing number of qualified teachers of English and the regular holding of the English national examination for gifted high school students. However, several problems still remain. For example, the simultaneous implementation of two sets of textbooks results in curriculum inconsistency and knowledge waste, since students who have already learnt English from grade 6 may have to start all over again in grade 10 no matter how much knowledge of English they have acquired. Moreover, the number of teachers of English who tend to lack knowledge of English and ELT (English Language Teaching) training, as they previously were teachers of Russian who learnt English for a short period of time to change jobs, is still high. Those teachers are often not able to communicate in English with native speakers and are mostly very weak at listening and speaking skills (Hoang et al., 2004).
Regarding methodology, several studies have shown that most English teachers in Vietnam still follow the grammar-translation method, focussing on lecturing to their students and practising one-way teaching with students just taking notes and repeating what their teachers have just said (cited in Hoang et al.). The grammar-translation approach still prevails in Vietnam although in the world it has long been replaced by the communicative and then the task-based approach.
Such a situation calls for a comprehensive reform in various aspects of English language education in Vietnam. The Vietnamese Ministry of Education and Training started the innovative plan by the program of redesigning all English textbooks from 6th grade
3.1.2 The testing innovation
Changes in content and method also lead to changes in assessment and testing. Previously (before the application of the new set of textbooks), classroom tests focused mainly on grammar, vocabulary, reading and sometimes writing skills. There were neither speaking nor listening tests, and there was no part to test pronunciation, either. However, with the new textbooks and their emphasis on communicative competence (according to the General Education Program for English subject by the Ministry of Education and Training), listening and speaking, besides other skills and linguistic knowledge like grammar, vocabulary and phonetics, are included in the textbook and are therefore present in forty-five-minute tests and/or fifteen-minute tests. This has never happened before in English language education in Vietnamese schools.
Moreover, with the practice of both continuous and final assessment and the emphasis on the former, English tests have been directed more to monitoring students’ progress, strengths and weaknesses, and encouraging students to attain higher learning goals, instead of judging them based on a fixed list of course objectives. This really motivates students since it is worth their learning efforts. Most of what they have learnt will be tested, not just a small part of it as it used to be with final assessment. As for teachers, their teaching will be continually given feedback through the tasks that they frequently assign their students to do. Teachers will have an opportunity to follow closely their teaching activities as well as their students’ progress. In brief, continuous assessment makes teachers and students more involved in their teaching and learning respectively. For
Additionally, as the new textbooks are organized according to themes, every test is to be developed based on a specific theme in the book. This requires teachers to spend more time on test designing, choosing and modifying materials. More importantly, this practice of test designing brings teaching and testing together, forming a complete process, the common ultimate goal of which is to develop students’ linguistic and communicative competence in dealing with everyday life situations.
Last but not least, the inclusion of both subjective and objective test items brings variety to test tasks and helps reduce each type’s weaknesses and take full advantage of their strengths. Objective test items are said to test recognition more than production while subjective items appear to take the latter as their focus. The combination of the two types in one test certainly increases test reliability and validity.
3.2 AN OVERVIEW OF THE TEACHING AND TESTING OF ENGLISH LANGUAGE IN THE 11TH GRADE
3.2.1 English textbook for the 11th grade
English 11, also following the theme-based approach, is a continuation of English 10, with 16 units and 6 review lessons named “Test yourself”. Each unit corresponds to a topic and includes the following parts:
A. Reading: includes one or several paragraphs containing 240-270 words, aiming at getting students familiar with the topic, providing them with information and linguistic materials, and developing their reading skill.
B. Speaking: includes speaking activities which are developed from the linguistic materials in the unit, ba
D. Writing: includes tasks or activities to develop students’ writing skills based on genres such as personal letters, invitation letters, chart and table description, and so on.
E. Language focus: there are two parts in this section: Pronunciation, and Grammar and Vocabulary. The Pronunciation part trains students to pronounce difficult sound pairs and consonant groups, while the Grammar and Vocabulary part considers the main grammatical and lexical issues of the unit. Those two parts are designed into exercises or activities for
writing, and Language focus
3.2.2 Syllabus for 11th grade English language subject
According to the Ministry of Education and Training’s General Education Program for English subject, students in grade 11 have three lessons of English per week and a school year usually lasts thirty-five weeks, so 11th grade students have 105 English lessons a school year in total. The following table presents details of the English 11 textbook in terms of the specific goals that students are supposed to attain.
Table 1: Book map of the English 11 textbook
Themes/Topics:
- Parties, friendship

Attainment targets:
- Talk about a close friend
- Talk about past experiences and how they affected one’s life
- Talk about a party and how to plan
Listening
Listen to a monologue/dialogue of 150-180 words for general or specific information
Read a passage of 240-270 words for general or specific information
or idea prompts
- Write about a friend within 120-130 words based on suggested word cues or idea prompts

Language focus:
- Tense review: past simple, past progressive, and past perfect
- Present tense indicating past time
- Infinitive with/without “to”
- Perfect infinitive
- Passive infinitive
- Infinitive and gerund
Vocabulary:
- Words to describe physical and friendship
- Words to express
Themes/Topics:
- Volunteer work
- Illiteracy
- Competitions

Attainment targets:
- Talk about types of volunteer work
- Talk about literacy problems and offer solutions
- Ask for and give information about types of competitions
- Describe a recent competition or contest
Listening
Listen to a monologue/dialogue of 150-180 words for general or specific information
Reading
Read a passage of 240-270 words for general or specific information
Writing
- Write a letter of gratitude within 120-130 words based on suggested word cues or idea prompts
- Write a letter to ask for and give information about competitions within 120-130 words based on suggested word cues or idea prompts
- Describe information from a table within 120-130 words based on a suggested outline or word cues

Language focus:
- Gerund and present participle
- Perfect gerund and perfect participle
- Reported speech with infinitive
- Reported speech with gerund
Vocabulary
- Words to talk about volunteer work: types, organization, charity activities, donation, attitude
- Words relating to illiteracy problems and solutions: literacy/illiteracy, problems, solutions
- Words to talk about competitions at school and in the society: types, activities, performances and results
Attainment targets:
Listening
Listen to a monologue/dialogue of 150-180 words for general or specific information
- Describe population development of 120-130 words based on a suggested outline or idea cues
- Write about celebration activities of 120-130 words based on a suggested outline or idea cues
- Write a letter to express satisfaction or dissatisfaction towards postal services of 120-130 words based on suggested word cues or idea prompts

Language focus:
- Conditional sentences: type 1, type 2 and type 3
- Conditional in reported speech
- Pronouns: one(s), someone, anyone, no one, everyone
- Defining relative clauses (revision)
- Non-defining relative clauses
Vocabulary
- Words to talk about population problems: situation, causes and solutions
- Words to describe the celebration of Tet in Vietnam and other festivals’ activities in
Attainment targets:
Speaking
- Talk about endangered nature
- Talk about measures for protecting endangered nature
- Talk about types and sources of energy
- Talk about advantages and disadvantages of each type of energy sources
Listening
Listen to a monologue/dialogue of 150-180 words for general or specific information
- Interpret and describe information from a chart of 120-130 words using suggested word cues or idea prompts

Language focus:
- Words to talk about endangered nature: types of endangered species, problems, causes and measures for protecting endangered nature
- Words to talk about sources of energy: types, consumption, advantages/disadvantages
Themes/Topics:
- Entertainment

Attainment targets:
- Talk about collections
- Express agreement and disagreement and explain reasons
Listening
Listen to a monologue/dialogue of 150-180 words for general or specific information
Reading
Read a passage of 240-270 words for general or specific information
Writing
- Write about the preparations for the coming Asian Games of 120-130 words based on suggested word cues or idea prompts
- Write about a collection of 120-130 words based on suggested word cues or idea prompts
- Write about holiday activities of 120-130 words based on suggested word cues or idea prompts

Language focus:
- Cleft sentences (subject focus, object focus, adverbial focus)
- Cleft sentences in the passive
- both ... and; not only ... but also; either ... or; neither ... nor
- Words to talk about hobbies and entertainments: collection, participation, effects
Themes/Topics:
6 People and places
- Space conquest
- Wonders of the world

Attainment targets:
Speaking
- Talk about possibilities of events
- Talk about historical events in the space conquest
- Talk about features of man-made places
- Distinguish facts and opinions
Read a passage of 240-270 words for general or specific information
Writing
- Write a report of 120-130 words on a visit to a man-made/popular place based on suggested word cues or idea prompts
- Write a biography of 120-130 words based on suggested word cues

Language focus:
- was/were able to + infinitive
- could have + past participle
achievements and progress
- Words about wonders of the world/man-made places: history, special features,
- Words to describe facts and to express opinions
continuous assessment tool, aiming at evaluating students’ progress and achievements. Like most other test types (fifteen-minute or oral tests), forty-five-minute tests are created by teachers for their own students. Based on the specified program, every semester 11th grade students will have two forty-five-minute tests, which means that they will have four forty-five-minute tests of English per school year.
As exemplified in the Test Yourself section, forty-five-minute tests have a recommended four-part structure including listening, reading, writing, and linguistic knowledge (pronunciation, grammar and vocabulary). Each part focuses on one different aspect of a theme and has a different length, which is usually shorter than a fifteen-minute test on separate language skills/knowledge. Besides, the content of the test has to meet the requirement of assessing three levels of Bloom’s taxonomy, that is, knowledge, comprehension and application, in both types of test items (subjective and objective), with a preference for the latter. The structure of a forty-five-minute test is recommended (though not required) as follows:
Table 2: Recommended structure of a forty-five-minute test
Question 1: Listening
Text completion, Gap-fill (10-15 items)
Question 2: Grammar and Vocabulary
1 | Vocabulary and grammar | Gapped sentences | Multiple choice (25 items) | 2.5 | 25%
Question 3: Reading
1 | Reading for gist and details | Two short texts: 80-200 words | Comprehension question, Gap fill, Matching, True/false, Multiple choice (15-20 items) | 2.5 | 25%
Question 4: Writing
1 | Semi-controlled writing | Prompts, questions, word(s)/picture cues, guidelines | Guided writing | 2.5 | 25%
(extracted from Vu (2006))
CHAPTER 4: MAJOR FINDINGS
4.1 PHONETICS SECTION IN 45 MINUTE TESTS
So far phonetics test items in Vietnamese language tests have been on pronunciation (testing on sounds) and/or word stress, in only one format (that is, multiple-choice, with generally four options), with the test instruction looking somewhat like “pick out the word whose underlined part is pronounced differently/whose stress is different from that of the other words by circling the corresponding letter A, B, C or D right next to the word you choose”.
With regard to word stress, designing test items to test on it does not usually involve much of a problem, except for one possible mistake that teachers at times can make, that is, letting two types of stress co-exist in the distracting options, resulting in no correct answer in the end. Due to its lack of problematicity, and to the fact that word stress is not mentioned in the content of English 11, the researcher would like to put aside matters relating to word stress, and focus on investigating the pronunciation part in the phonetics section instead. Therefore, the data synthesized and analyzed below will be of the pronunciation part only.
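The word stress mistake mentioned above, where two types of stress co-exist among the distractors so that no single option stands out, can be expressed as a simple check. This is only an illustrative sketch: the function name is hypothetical and the sample words are not taken from the tests collected.

```python
from collections import Counter


def has_single_odd_option(stress_positions):
    """Return True if exactly one option differs and all the others agree.

    stress_positions lists the stressed syllable (1, 2, ...) of each option.
    """
    if len(stress_positions) < 3:
        return False
    counts = Counter(stress_positions)
    return len(counts) == 2 and min(counts.values()) == 1


# Hypothetical items: teacher (1), student (1), lesson (1), begin (2)
well_formed = [1, 1, 1, 2]
# Hypothetical flawed item: teacher (1), student (1), begin (2), banana (2)
flawed = [1, 1, 2, 2]

print(has_single_odd_option(well_formed), has_single_odd_option(flawed))
```

An item passes the check only when three options share one stress pattern and the remaining option differs, which is the condition for the instruction quoted earlier to have exactly one correct answer.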
4.1.1 Data concerning construct validity
As construct validity of a test deals with whether the test measures what it is purported to measure, data concerning construct validity bears close relation to the test methods used and item writing.
Since pronunciation items aim at testing students’ ability to recognize different sounds through their letter representation, the multiple-choice format proves to be effective and suitable for that objective. Therefore, examination was done on item writing. After thorough examination of the phonetics section in every test collected, the researcher has found several issues/problems which can be placed into four categories.
Issue 1: One test item testing both the pronunciation of letter(s) in a word and the representation of a sound in letter(s)