A study on evaluating the reliability of the final written test of semester I for second-year English major students at Nghe An Junior Teacher Training College, with some suggested changes



It is obvious that the teacher plays a very important role in the process of assessment and measurement, which is conducted through testing. It is said that “language testing is a form of measurement. It is so closely related to teaching that we cannot work in testing without being constantly concerned with teaching” (Heaton, 1988:5).

There are various types of tests, which serve different purposes in foreign language teaching and learning. Among them, writing tests are said to be less reliable from the point of view of both scorer and testee. This situation can be seen clearly at Nghe An Junior Teacher Training College. For many years, teachers there have considered English writing the most difficult skill to test. They have found it difficult to mark the achievement writing tests accurately, compositions in particular, and they attribute this to the lack of a rating scale for scoring compositions or to a provided rating scale that is too general. In addition, many students are still worried about the results of the writing achievement tests, especially the composition task, as they wonder whether their writing is accurately evaluated by raters.

That is the reason for choosing the topic of the research: A study on the reliability of the achievement writing test for the second year English major students at N.A. JTTC. It is hoped that the study will be helpful to the author, to the teachers at the English department of N.A. JTTC, and to those who are concerned with language testing in general and the study of the reliability of writing achievement tests in particular.

2 AIMS OF THE STUDY

The major aims of this study are:

- to explore the relevant notions of language testing

- to analyze the achievement writing test for the second year English major students on the basis of the syllabus, the purposes of teaching and testing, and available data such as test scores and scores of sample compositions, looking for evidence of its validity and reliability, with a focus on reliability

- to provide some suggestions for test designers as well as raters

3 SCOPE OF THE STUDY

Evaluating an achievement writing test consists of complex procedures and requires a number of criteria to be set up. However, due to the availability of data and the limitation of time, this study focuses mainly on the reliability of the achievement writing test for the second year English major students at N.A. JTTC. The results can be seen as the basis for providing some suggestions for test designers as well as raters.

4 METHODS OF THE STUDY

The practical base for the study is an analysis of the teaching aims and syllabus for the second-year English major students, as well as the content of the writing test (term 1). On this basis, the quantitative method is used to measure the reliability of the test, focusing on the test scores of 156 second year students and the scores of 15 sample compositions collected randomly.


5 DESIGN OF THE STUDY

The study is comprised of three parts:

Part I: Introduction

provides information on the rationale for choosing the topic, the aims, the scope, and the methods of the study

Part II: Development (divided into 3 chapters)

* Chapter one: reviews the literature related to language testing (definitions, approaches, roles, purposes and relationships between teaching, learning and testing), testing writing (types of writing, criteria of testing writing and problems in testing writing) and the criteria of a good test, with a focus on reliability, covering factors affecting language test scores, methods to determine the reliability of a language test and measures for improving test reliability

* Chapter two: presents an overview of the teaching, learning and testing situations at Nghe An Junior Teacher Training College, including a description of the second year English teaching aims, the writing syllabus/course book and the content of the writing achievement test

* Chapter three: presents the methodology of the analysis of the format of the writing achievement test and of the data collected from the test scores of 156 second year students and the scores of 15 random sample compositions, in order to find out “to what extent is the marking of the writing achievement test reliable?”

Part III: Conclusion

presents a summary and some recommendations/suggestions for test designers and raters


DEVELOPMENT

CHAPTER 1: LITERATURE REVIEW

In this chapter, the theoretical background for the study is established. Firstly, the term ‘language testing’ is explored, including the approaches, roles and purposes of language testing and the relationships between language testing, teaching and learning. Then the testing of writing is discussed, followed by an examination of the criteria of a good language test with a focus on reliability.

1.1 Language testing

1.1.1 Definitions of language testing

Testing is an important part of every teaching and learning experience and has become one of the main aspects of methodology. Many researchers have offered definitions of testing from different points of view.

Allen (1974:313) emphasizes testing as an instrument to ensure that students have a sense of competition rather than to know how good their performance is and in which condition a test can take place. He says, ‘test is a measuring device which we use when we want to compare an individual with other individuals who belong to the same group.’

According to Carroll (1968:46), a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual. In other words, a test is a measurement instrument designed to elicit a particular behavior of each individual.

According to Bachman (1990:20), what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior. This distinction is believed to be of great importance because it reflects the primary justification for the use of language tests and has implications for how we design, develop and use them to best effect. Thus, language tests can provide the means for focusing more closely on the specific abilities of interest.

Besides, Ibe (1981:1) points out that “a sample of behavior under the control of specified conditions aims towards providing a basis for performing judgment.” The term “a sample of behavior” used here is rather broad and covers more than the traditional paper-and-pencil test types. Read (1983) shares the same idea with Ibe in the sense that a sample of behavior suggests that language testing certainly includes listening and speaking skills as well as reading and writing ones.

However, Heaton (1988:5) looks at testing in a different way. In his opinion, tests are a means of assessing the students’ performance and of motivating the students. He views tests positively, as many students are eager to take tests at the end of the semester to know how much knowledge they have. Importantly, he also points out the relationship between testing and teaching.

In short, from the above descriptions, testing is an effective means of measuring and assessing students’ language knowledge and skills. It is of great use to both language teaching and learning. In order to understand more about language testing, we should have a look at the different approaches to language testing in the following part.

1.1.2 Approaches to language testing

According to Heaton (1988:15), there are four main approaches to testing: (i) the essay-translation approach, (ii) the structuralist approach, (iii) the integrative approach, and (iv) the communicative approach.

(i) The essay-translation approach is considered an old method in which tests often focus on essay writing, translation and grammatical analysis. This approach requires no special skills or expertise in testing, but the subjective judgment of the teacher is considered to be of paramount importance.

(ii) The structuralist approach places the main focus on testing concrete skills without relating them to the context. The skills of listening, speaking, reading and writing are separated from one another, as it is considered essential to test one at a time. The learners’ mastery of the separate elements of the target language (phonology, vocabulary and grammar) is also tested using words and sentences completely out of context, so that a large number of samples of language forms can be covered in the test in a comparatively short time. Thus, the results of the students’ tests depend completely on the accurate forms of the separate language aspects or skills tested rather than on the total meaning of the discourse or the ability to use the language appropriately and effectively.


However, this approach is still valid for some types of tests for specific purposes, as it is considered to be objective, precise, reliable and scientific. That is the reason why the typical type of test following this approach, multiple choice, is still widely used nowadays, although multiple choice items have only a limited use in many communicative tests.

(iii) The integrative approach, in contrast, involves the testing of language in context and is thus concerned primarily with meaning and the total communicative effect of discourse (Heaton, 1988). Integrative tests, instead of separating the language into different aspects, are designed to test two or more skills at the same time (especially reading and listening) or language components in integration (such as grammar and vocabulary). In other words, this type of test is concerned with students’ global proficiency, not their mastery of separate elements or skills. The typical types of tests following this approach are cloze tests, dictation, oral interviews, translation and essay writing. However, according to Heaton (1988:16), integrative testing involves ‘functional language’ but not the use of functional language, and thus such tests are weak in communication.

(iv) The communicative approach holds that tests should be interactive, purposive, authentic and contextualized, and that performance should be assessed in terms of behavioral outcomes. Although both the integrative and the communicative approaches emphasize the importance of the meaning of utterances rather than their form and structure, communicative tests are concerned primarily with how language is used in communication (Heaton, 1988:19). The communicative approach emphasizes the evaluation of language use rather than usage (‘use’ is concerned with how people actually use language for different purposes, while ‘usage’ concerns the formal patterns of language). However, the communicative approach is claimed to be less reliable because real-life situations vary across areas and countries. According to Heaton (1988), in order to increase reliability, especially in scoring, very carefully drawn-up and well-established criteria must be designed.

In brief, each approach to language testing has its weak points as well as its strong points. Therefore, in Heaton’s view, a useful test will generally incorporate features of several of these approaches (Heaton, 1988:15).


1.1.3 The roles of language testing

According to McNamara (2003:4), language tests play a powerful role in many people’s lives, acting as gateways at important transitional moments in education, in employment, and in moving from one country to another. Language testing is used for assessment, employment and selection, and raters regard it as a means of placing students on particular courses. Moreover, language tests are also used as a criterion for evaluating the language proficiency of researchers who want to carry out research in language study.

Bachman (1990:2) also mentions the importance of language testing. Language tests can be valuable sources of information about the effectiveness of learning and teaching: language teachers regularly use tests to help diagnose students’ strengths and weaknesses, to assess students’ progress, and to assist in evaluating students’ achievements. Language tests are also frequently used as sources of information in evaluating the effectiveness of different approaches to language teaching and as sources of feedback on learning and teaching. Thus language tests can provide useful input into the process of language teaching. Conversely, insights gained from language learning and teaching can provide valuable information for designing and developing more useful tests.

1.1.4 The purposes of language testing

Language tests usually differ according to their purposes. Hughes (1989:7) points out the different purposes of testing, corresponding to different kinds of tests:

- to measure language proficiency regardless of any language courses that candidates may have followed

- to discover how far students have achieved the objectives of a course of study

- to diagnose students’ strengths and weaknesses, to identify what they know and what they do not know

- to assist placement of students by identifying the stage or part of a teaching program most appropriate to their ability

According to Henning (1987:1-3), there are six major purposes of language testing, which can be represented as follows:

(i) Diagnosis and Feedback (in order to find out students’ strengths and weaknesses)

(ii) Screening and Selection (in order to decide who should be allowed to participate in a particular program of study)

(iii) Placement (in order to identify a particular level of the student and place him or her in a particular program of study)

(iv) Program Evaluation (in order to provide information about the effectiveness of the programs)

(v) Providing Research Criteria (in order to provide a standard of judgment in a variety of other research contexts)

(vi) Assessment of Attitudes and Sociopsychological Differences

1.1.5 Relationship between Language Testing and Language Teaching & Learning

Though a large number of examinations and tests in the past tended to separate testing from teaching, Heaton (1988:5) emphasizes that teaching and testing are so interwoven and interdependent that it is very difficult to tease them apart: “Both testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the other.” Heaton (1988:5) also notes: “Tests may be constructed primarily as devices to reinforce learning and motivate the students or as a means of assessing the students’ performance in the language.” In the former case, testing is geared to the teaching, whereas in the latter case, teaching is often geared largely to the testing.

According to Hughes (1989:1), the effect of testing on teaching and learning is known as backwash, which can be harmful or beneficial. He puts more focus on harmful backwash. If the test content does not match the objectives of the course, the backwash can be really harmful, leading to the problem of teaching in one way but testing in another.

In his view, some language tests have harmful effects on teaching and often fail to measure accurately whatever they are intended to measure. A writing test made up only of multiple choice items illustrates this: learners concentrate on practicing such items rather than on practicing the skill of writing itself.

In general, testing and the teaching-learning process have a very close relationship. Tests can benefit both teaching and learning, and vice versa. Therefore, good tests are useful and desirable and can be used as a valuable teaching device. For educational purposes, educators should improve both language tests and language teaching methods in order to achieve beneficial backwash.

1.2.2 Criteria of testing writing

There is a wide variety of writing tests, which is needed to test the many kinds of writing tasks that we engage in. According to Madsen (1983:101), there are three main types of writing tasks, and of tests as well: controlled, guided and free writing, each with its corresponding testing criteria. He points out the advantages and limitations of each type, noting that scoring in guided and free writing is considered to be subjective. Because of this problem of subjectivity, McNamara (2000:38) says, writing skills were in the past assessed indirectly through examinations of control over the grammatical system and knowledge of vocabulary. Nowadays, however, educators place more emphasis on teaching and testing learners’ communicative language abilities. That is the reason why there is a current trend of shifting from testing isolated items to testing written compositions, which makes managing the rating process an urgent necessity. John Boker indicates the disadvantages of testing written compositions, such as the limited amount of content sampled, time-consuming scoring, and low reliability in scoring. Therefore, in order to evaluate learners’ writing ability accurately, raters have to be carefully trained in rating learners’ writing and pay close attention to the micro skills involved in the writing process that learners need to acquire. Heaton (1988:135) describes those micro skills as follows:

* language use: the ability to write correct and appropriate sentences;

* mechanical skills: the ability to use correctly those conventions peculiar to the written language (e.g. punctuation, spelling);

* treatment of content: the ability to think creatively and develop thoughts, excluding all irrelevant information;

* stylistic skills: the ability to manipulate sentences and paragraphs, and use language effectively;

* judgment skills: the ability to write in an appropriate manner for a particular purpose with a particular audience in mind, together with an ability to select, organize and order relevant information

He also presents the minimum criteria that learners’ writing needs to meet at different levels. These are as follows:

* Basic level: No confusing errors of grammar or vocabulary; a piece of writing legible and readily intelligible; able to produce simple unsophisticated sentences

* Intermediate level: Accurate grammar, vocabulary and spelling, though possibly with some mistakes which do not destroy communication; handwriting generally legible; expression clear and appropriate, using a fair range of language; able to link themes and points coherently

* Advanced level: Extremely high standards of grammar, vocabulary and spelling; easily legible handwriting; no obvious limitations on range of language candidate is able to use accurately and appropriately; ability to produce organized, coherent writing, displaying considerable sophistication


1.2.3 Problems in testing writing

Many problems may occur in testing writing; Hughes (1989:75) presents the main considerations, which can also be seen as the main problems. They involve setting writing tasks that are properly representative of the population of tasks, that truly represent the students’ abilities, and that can be scored reliably.

As mentioned above, subjectivity in scoring writing can make the test unreliable, and this is the major problem in testing writing. In this section, we discuss the causes that lead to this problem. According to McNamara (2000:37), the rating given to a candidate in testing writing is a reflection not only of the quality of the candidate’s performance, but also of the qualities as a rater of the person who has judged it. In fact, different raters may assign different scores to the same test, and even the grade that one rater gives to the same test may vary from day to day. Thus, the subjectivity of the writing test is mostly caused by different raters interpreting and applying the same rating scale in different ways. Besides, raters may be affected by random factors such as health, climate, and so on. Sako (1983:219) also describes the personal bias which tends to affect the scores:

“The personal bias may stem from a variety of factors, such as the appearance of the essay, poor spelling, paucity of ideas, or what has been called the ‘halo effect’, that is, a higher or lower grade resulting from previous pleasant or unpleasant experiences with the writer.” Moreover, the unreliability is possibly the result of vague criteria set for the evaluation of writing tests. Test designers find it difficult to write a good, well-designed rating scale, and this in turn causes difficulties in the scoring process.

In short, in order to obtain reliability in scoring writing tests, test designers and raters should be careful in designing, interpreting and applying the rating scale.

1.3 Criteria of a good language test

Before making a test, test designers often ask themselves many questions, such as: How do we design a test that is a true indicator of students’ communicative ability? How do we know that it is a good one? To write a good test, Harrison (1983:10) claims that it is essential for test designers to consider the four basic characteristics of all good tests, namely validity, reliability, practicality and discrimination. Among those, two of the qualities, validity and reliability, are considered to be critical for tests and are sometimes referred to as essential measurement qualities. This is because these are the qualities that provide the major justification for using test scores (numbers) as a basis for making inferences or decisions (Bachman, 1996:19). In this section, we discuss briefly what those qualities are and how they must be considered.

1.3.1 Validity

Validity is the quality that most affects the value of a test and the most critical factor to be judged in the total program of language testing. According to Harrison (1983:10), the validity of a test is the extent to which the test measures what it is intended to measure. Two questions must be considered when determining the validity of a language test: What aspects of the language is the test designed to measure, and how well does it, in fact, measure the global skills or the discrete elements of the language? (Sako, 1983:24) The validity of a language test is determined by comparing the results it gives with some outside or independent criterion. Sako also presents five primary validity concepts:

(i) content validity (in order to ascertain how thoroughly a language test samples the instructional objectives or ‘the universe of criterion behavior’)

(ii) concurrent validity (by correlating the language test scores with the criterion behavior or language performance established)

(iii) predictive validity (in order to determine how well the test scores predict language learning)

(iv) construct validity (by examining correspondence between the content of the test and the various components of the skill being measured)

(v) face validity (a judgment about a test based on the way the test looks to educators, students and the general public)

In short, validity is a “must” for testers to take into consideration when they construct a language test.


1.3.2 Practicality

It would be unwise for test designers to consider a test’s validity and reliability apart from its practicality. According to Harrison (1991:13), a valid and reliable test is of little use if it does not prove to be a practical one. Test practicality refers to financial limitations, time constraints, and ease of administration, scoring and interpretation. A test is impractical if it is prohibitively expensive or takes too much time to construct.

In conclusion, a test has practicality if it does not involve much time or money in constructing, implementing and scoring it.

1.3.3 Discrimination

An evaluation of a test is said to be incomplete without considering its discrimination. According to Heaton (1988:165), an important feature of a test is its capacity to discriminate among the different candidates and to reflect the differences in the performances of the individuals in the group. This is true for both teacher-made tests and standardized tests. In order to have this feature, a test must have a scale ranging from extremely easy items to extremely difficult items. However, the extent of the need to discriminate varies depending on the purpose of the test. The more efficiently a test discriminates among students, the easier it is to divide them into suitable groups and the more clearly it shows the level of each individual in the group.

1.3.4 Reliability

Reliability is a necessary characteristic of any good test. According to Sako (1983:28), the reliability of a language test is concerned with the degree to which it can be trusted to produce the same result upon repeated administration to the same individual, or to give consistent information about the value of a learning variable being measured. Sako presents several external factors which may affect the reliability of a test: variation in testing conditions (lighting, temperature, distraction, or noise), differences in administrative instructions, test compromise, inaccuracy in scoring, inadequate sampling of test items, and lack of motivation (fatigue or illness). On the other hand, he gives some factors that can improve the reliability of language tests: standardizing and optimizing the testing conditions, using a uniform procedure in administering the test, increasing the


In general, those are the criteria for a good test that test designers should consider when designing tests. However, in this study we place more emphasis on reliability, which is treated as a measure of the consistency of language test scores and is examined by analyzing students’ test scores and comparing the scores given to students’ compositions. Therefore, in this section, three issues concerning the reliability of a language test are discussed in more detail: (i) factors affecting language test scores, (ii) methods to determine the reliability of a language test, and (iii) measures for improving test reliability.

1.3.4.1 Factors affecting language test scores

As mentioned above, language test scores are affected not only by test takers’ language knowledge and skills but also by other factors, which have received much attention and are discussed here. William Wiersma (1989:290) divides these factors into two major categories: personal factors associated with the individuals being tested, and factors associated with the test setting, called test administration factors. According to Bachman (1990:163), the factors which can affect test scores, external to the language knowledge and skills of the students, can be grouped into three broad categories: (i) test method facets, (ii) personal attributes, and (iii) random factors. They are presented as follows:

(i) Test method facets can be seen as the characteristics of the test method which affect test performance. Bachman (1990:118) presents five categories of test method facets: the test environment, the test rubric, the nature of the input the test taker receives, the nature of the expected response to that input, and the relationship between input and response.


(ii) Personal attributes which are not related to language ability include individual characteristics such as cognitive style and knowledge of particular content areas, and group characteristics such as sex, race and ethnic background. These attributes are considered to be systematic in the sense that they are likely to affect a given individual’s test performance regularly.

(iii) Random factors include unpredictable and largely temporary conditions, such as the test taker’s mental alertness or emotional state, and uncontrolled differences in test method facets (changes in the test environment from one day to the next, or idiosyncratic differences in the way different test administrators carry out their responsibilities).

In a nutshell, numerous factors can affect test scores, and those factors are rather complicated. Test administrators should therefore find ways to control and minimize the effects of those factors in order to obtain high test reliability.

1.3.4.2 Methods to determine the reliability of a language test

According to Hughes (1989:31), it is possible to quantify the reliability of tests in the form of reliability coefficients, which allow us to compare the reliability of different tests. He points out that the ideal reliability coefficient is 1: a test with a reliability coefficient of 1 is one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered. According to Hughes (1989), Bachman (1990), Heaton (1989), and Weir (2005:24), there are four methods (or types of scoring validity) to evaluate the reliability of a language test: test-retest, alternative or parallel/equivalent forms, split half (internal consistency), and rater consistency (intra-rater, inter-rater). They are described as follows:

(i) Test-retest method: according to Hughes (1989:32), in order to obtain test-retest reliability, it is necessary to get a group of subjects to take the same test twice and then compare the two sets of scores. However, he points out the drawbacks of this method: “If the second administration of the test is too soon after the first, the subjects are likely to recall the items and responses. If there is too long a gap between administrations, learning or forgetting will take place and the subjects’ motivations may be reduced.” Although Henning (1987) makes a suggestion for the period of time between two administrations: “The second administration should be organized after no more than 2


no learning practice should be done between two occasions. However, in practice, it is very difficult to obtain high reliability based on the parallel form method, because students may get unequal results on tasks of the same level in parallel forms (they may find some tasks familiar, or their results may be affected by random factors such as health, mood, climate, etc.). Moreover, alternative forms are often simply not available. Because of the problems associated with the parallel form and test-retest methods, another method, known as split half, which focuses on the consistency of a test’s internal elements with each other, is presented as follows.

(iii) Split half method, according to Bachman (1990:172), is the method used to examine the internal consistency of a test, that is, how consistent test takers’ performances on the different parts of the test are with each other. We divide the test into two equivalent halves and determine the extent to which scores on these two halves are consistent with each other. According to Bachman, there are two ways of splitting a test into two halves: dividing it into the first and second halves, or into the odd and even items. The odd-even method is particularly applicable to tests in which the items are ordered in terms of difficulty, are designed to measure the same ability, and are independent of each other. Bachman also recommends that in situations where it is not possible to retest individuals or to administer equivalent forms, and some estimate of internal consistency is the only means of examining the reliability of the test, we must make every attempt to split the test into halves in such a way as to maximize their equivalence and their independence. The reliability coefficient is calculated using what is known as the Spearman-Brown Prophecy formula as follows:

\[
\text{Reliability of scores on total test} = \frac{2 \times \text{reliability for 1/2 test}}{1 + \text{reliability for 1/2 test}}
\]
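To make the calculation concrete, the following is a minimal sketch of the odd-even split followed by the Spearman-Brown correction. It assumes that each test taker's item scores are available as a list and that the reliability of the half test is estimated with a Pearson correlation between the two half scores; the function names and the sample data are hypothetical and serve only as illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(half_reliability):
    """Step the reliability of a half test up to the full test length."""
    return 2 * half_reliability / (1 + half_reliability)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability.

    item_scores: one list of item scores per test taker
    (an assumed data layout, not taken from the study).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_totals, even_totals))


if __name__ == "__main__":
    # Hypothetical scores of five test takers on a six-item test (1 = correct).
    scores = [
        [1, 1, 1, 1, 1, 0],
        [1, 1, 0, 1, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 0],
    ]
    print(round(split_half_reliability(scores), 2))
```

Because the correction is applied to the correlation between the two halves, the quality of the estimate depends on how equivalent and independent the halves are, which is the point Bachman stresses above.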

* Intra-rater reliability is defined as the extent to which the same rater is consistent in his or her ratings from one occasion to another (Shohamy, 1985). If the rater applies the same set of criteria consistently in rating the language performance of different individuals, this will yield a reliable set of ratings. However, in some cases, inconsistency may occur in the rating criteria themselves or in the way in which they are applied. For instance, in scoring compositions, some raters pay more attention to grammatical errors, which affects their ratings; in oral tests, raters may sometimes unconsciously relax their ratings. According to Bachman (1990:179), in order to examine the reliability of ratings of a single rater, we need to obtain at least two independent ratings from this rater for each individual’s language sample. He also recommends two approaches to calculating the reliability of the ratings: computing the appropriate correlation coefficient between the two sets of ratings, or computing a coefficient alpha.
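The two approaches can be sketched as follows. This is only an illustration under stated assumptions: the correlation is taken to be the Pearson coefficient, coefficient alpha is computed with the usual formula k/(k-1) × (1 − sum of the variances of the separate ratings / variance of the summed ratings), and the function names and marks are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two sets of ratings of the same scripts."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def coefficient_alpha(ratings):
    """Coefficient alpha for k sets of ratings.

    ratings: a list of k lists; each inner list holds the marks given to
    the same scripts, in the same order, on one rating occasion.
    """
    k = len(ratings)
    # Sum the k marks that each script received.
    totals = [sum(marks) for marks in zip(*ratings)]
    sum_of_variances = sum(pvariance(r) for r in ratings)
    return k / (k - 1) * (1 - sum_of_variances / pvariance(totals))


if __name__ == "__main__":
    # Hypothetical marks given by one rater to six compositions on two occasions.
    first_rating = [6, 7, 5, 8, 4, 7]
    second_rating = [6, 8, 5, 7, 5, 7]
    print(round(pearson(first_rating, second_rating), 2))
    print(round(coefficient_alpha([first_rating, second_rating]), 2))
```

The same two functions can be applied to the marks of two different raters when inter-rater consistency is of interest, since both only require paired ratings of the same scripts.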

* Inter-rater reliability is the extent to which different raters are consistent in their ratings. This is a major concern in tests of writing and speaking proficiency, which are subjectively scored. According to Bachman (1990:180), ratings given by different raters can also vary as a function of inconsistency in the criteria used to rate and in the way in which these criteria are applied. He shows that, even with the same rating scale, different raters yield different results for the same tests since different raters may form their own focuses (on

1.3.4.3 Measures for improving test reliability

In addition to the factors for improving the reliability of a language test given by Sako (1983:28), mentioned above, Weir (2005:117) restates Hughes’ guidelines for making the test task itself more likely to produce reliable scores as follows:

(i) take enough samples of behavior;

(ii) do not allow candidates too much freedom;

(iii) write unambiguous items;

(iv) provide clear and explicit instructions;

(v) ensure that tests are well laid out and perfectly legible;

(vi) make candidates familiar with format and testing techniques;

(vii) provide uniform and non-distracting conditions of administration;

(viii) use items that permit scoring which is as objective as possible;

(ix) make comparisons between candidates as direct as possible.

And in relation to scoring of test performance itself:

(i) provide a detailed scoring key;

(ii) train scorers;

(iii) agree on acceptable responses and appropriate scores at the outset of scoring;

(iv) exclude items which do not discriminate well between weaker and stronger students;

(v) identify candidates by number, not name;

(vi) employ multiple, independent scoring.

In short, following these guidelines requires the efforts of test designers, test administrators and test raters.

1.4 Summary

In this chapter, we have attempted to establish the theoretical framework for the thesis, consisting of three main issues: language testing (definitions, approaches, roles, purposes and relationships between testing and teaching-learning); testing writing (types of writing, criteria of testing writing and problems in testing writing); and criteria of a good test with a focus on reliability (factors affecting language test scores, methods to determine test reliability, and measures for improving test reliability).

From these issues, we draw the main principles for our research in the next chapter. They are presented as follows:

(i) Reliability is considered one of the criteria of a good test; unreliability is especially common in the scoring of compositions, as they are scored subjectively.

(ii) One of the useful methods for measuring the reliability of a writing test, especially a composition, is inter-rater reliability.

(iii) The reliability coefficient is calculated by the following formula presented by

2.1 The students, teaching staff and the English teaching & learning situation at N.A Junior Teacher Training College

2.1.1 The students and their backgrounds

Nghe An Junior Teacher Training College has been training English teachers for junior high schools in the province since 1999. In general, most of the students who enter N.A.JTTC are aged 18-20, and 90% are female. Each course consists of about 3 or 4 classes with 30-40 students in each. A large proportion of the students come from the rural and remote areas of Nghe An province, so their levels of English proficiency are low and, in particular, they are not used to learning the four skills (speaking, writing, reading and listening), of which writing is considered to be one of the most difficult. They are good only at controlled writing and are not familiar with guided and free writing; teachers are especially worried about testing and evaluating their writing.

2.1.2 The English teaching staff

The English Department is staffed with 40 teachers aged between 26 and 55, 6 of whom are “reserved” Russian teachers. The proportion of teachers who have M.A. degrees is 4/40. Six others are now studying for M.A. degrees at VNU, one at a university in America, and five at Dalarna University in Sweden. In general, they are well trained and rather professionally experienced, with at least 2 years of teaching English.

2.1.3 The English teaching & learning situation

As mentioned above, the Department of Foreign Languages of N.A.JTTC has been re-established for over 10 years with a staff of young and enthusiastic teachers. At the beginning, they had to face a lot of difficulties, such as seeking appropriate course books and materials, finding suitable teaching methods, and teaching with poor facilities. After several years of trial and selection, the teachers have collected suitable materials and finally either adopted teacher-made course books or chosen books which are considered to meet students’ demands as well as the training objectives. Classrooms, which usually hold 30-40 students, are well equipped and rather comfortable for group or pair activities. Moreover, thanks to short-term teaching methodology courses (VAT in Hanoi and communicative courses in Danang), teachers’ teaching methods have gradually been improved.

2.2 Teaching aims, syllabus and materials used for the second-year students

Based on the subjects of the course and the training objectives, we have chosen a main course book, then selected the appropriate units and drawn up the writing syllabus for the second-year English major students. It is described as follows:

Aims: The aim of the course is to familiarize the students with typical English text types such as making a speech, describing a festival, writing an advertisement, a narrative, a biography and a book review.

Objectives: Students are expected to

- understand the features of English text types

- have sufficient vocabulary and be able to use newly learnt structures to express themselves

- apply them successfully in the tasks given

- write in genres such as a speech, a description of a festival, an advertisement, a narrative, a biography and a book review

Syllabus Content:
