An Investigation on the Written Test of the National English Certificate – Level B at An Giang University Center for Foreign Languages



NGUYỄN HOÀNG PHƯƠNG TRANG

AN INVESTIGATION ON THE WRITTEN TEST OF

THE NATIONAL ENGLISH CERTIFICATE –

LEVEL B - AT AN GIANG UNIVERSITY -

CENTER FOR FOREIGN LANGUAGES


CERTIFICATE OF ORIGINALITY

I certify my authorship of the thesis submitted today entitled:

“AN INVESTIGATION ON THE WRITTEN TEST OF THE NATIONAL ENGLISH CERTIFICATE – LEVEL B AT AN GIANG UNIVERSITY CENTER FOR FOREIGN LANGUAGES”

in terms of the Statement of Requirements for Theses in Master’s Programs issued by the Higher Degree Committee.

This thesis has not been submitted for the award of any degree or diploma in any other institution.

Ho Chi Minh City, August 31, 2010

Nguyễn Hoàng Phương Trang


RETENTION AND USE OF THESES

I hereby state that I, NGUYEN HOANG PHUONG TRANG, being a candidate for the degree of Master of Arts (TESOL), accept the requirements of the University relating to the retention and use of Master’s Theses deposited in the Library.

In terms of these conditions, I agree that the original of my thesis deposited in the Library should be accessible for purposes of study and research, in accordance with the normal conditions established by the Library for the care, loan or reproduction of theses.


ACKNOWLEDGMENTS

First of all, I wish to express my profound gratitude to my supervisor, Associate Professor Dr Dinh Dien, Lecturer of the Department of Comparative Linguistics, HCMC University of Social Sciences and Humanities, and Lecturer of the Department of Computing Science, HCMC University of Natural Sciences, for his invaluable guidance, assistance and encouragement during the preparation and completion of this thesis.

I would also like to thank my colleagues at the An Giang University Center for Foreign Languages for their kind support in collecting and providing data related to my subject.

On a personal level, I am grateful to my family, without whom my thesis would not have been possible.

Last but not least, this thesis could never have been completed without all those who lent me a helping hand in the research project, especially my colleagues at the Center for Foreign Languages, An Giang University, for their precious remarks and suggestions.


ABSTRACT

The relationship between vocabulary knowledge and text readability is a robust one. Vocabulary knowledge has consistently been found to be the foremost predictor of a text’s difficulty. However, the relationship is a complex one.

Procedures used by readability formulae to assess the vocabulary factor can over- or underestimate text difficulty. In general, it is not the mechanical count of easy or difficult words in a text that makes the text easy or difficult, but what the reader knows about those words.

Vocabulary knowledge is strongly correlated with reading comprehension. The correlations have been found to be robust almost regardless of the measures used or the populations tested. Although these measures are of reading comprehension, similar correlations would be expected between vocabulary and language comprehension and production more generally. Knowledge of word meanings affects every aspect of language knowledge.

Readability measurement is a research tradition that goes back to the beginning of the 20th century. Readability research generally produces formulae that purport to estimate the relative difficulty of a passage from a combination of factors. In these formulae, word difficulty is assumed to cause comprehension difficulty, although what we consider to be word difficulty may actually reflect something else.


TABLE OF CONTENTS

CERTIFICATE OF ORIGINALITY
RETENTION AND USE OF THESES
ACKNOWLEDGMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
Chapter 1: INTRODUCTION
1.1 Rationales
1.2 Practical background of the study
1.3 Aims and scope of the study
1.4 Research questions
1.5 Significance of the study
1.6 Overview of the thesis
Chapter 2: LITERATURE REVIEW
2.1 A review of Readability and use of Readability Analyzer
2.1.1 The Development of Readability Formulas in Short
2.1.2 Advantages and Limitations of Readability Formulas
2.1.3 A Focus on Two Readability Formulas
2.1.3.1 Flesch Reading Ease
2.1.3.2 Flesch-Kincaid Readability Formula
2.1.4 Overview of some factors/variables in text difficulty
2.2 The vocabulary – reading connection
2.2.1 Vocabulary size, levels and lexical coverage
2.2.2 The relationship between vocabulary knowledge and success in reading comprehension
Chapter 3: METHODOLOGY
3.1 Research method
3.1.1 Materials
3.1.2 The instruments
3.1.2.1 Readability formulas
3.1.2.2 Software tools
3.1.2.2.1 The Vocabulary Statistic Worksheet
3.1.2.2.2 The Concordance program (V3.2)
3.1.3 Procedure
3.2 Summary
Chapter 4: RESULTS AND DISCUSSIONS
4.1 Results
4.1.1 Readability progress of texts within years
4.1.2 The relationship between word frequencies and text difficulty
4.1.2.1 The frequency of words in the test texts within years
(a) The first 3000 MFWs in the Brown Corpus list
(b) The first 5000 MFWs in the Brown Corpus list

(e) The frequency of word processing
4.1.2 The relationship between the difficulty of the reading texts and the students’ reading comprehension scores
4.2 Discussion
Chapter 5: RECOMMENDATIONS AND IMPLICATIONS
5.1 Recommendations
5.1.1 Recommendations for AGU CFL “Reading comprehension” testing practices
5.1.1.1 Recommendations for the AGU development process of reading comprehension tests
5.1.1.1.1 A general criterion in selecting test texts
5.1.1.1.2 A supportive tool for selecting test texts
5.1.1.1.3 A recommended wordlist for learning and teaching at AGU CFL
5.1.2 Recommendations for AGU CFL staff
5.2 Implications
5.2.1 For learners
5.2.2 For teachers
5.3 Limitations
5.4 Conclusion


LIST OF FIGURES

Figure 3.1: The Vocabulary Statistic Worksheet
Figure 3.2: The Concordance software (Concordance V3.2)
Figure 4.1: Average Flesch Reading Ease Grade Levels of the texts
Figure 4.2: The first 3000 MFWs in the Brown Corpus wordlist
Figure 4.3: The percentage rate of the MFWs in the first 5000 common words in the Brown Corpus word list
Figure 4.4: The percentage rate of MFWs in the first out of 5000 words in the BrC wordlist
Figure 4.5: The percentage rate of MFWs in the Paul Nation 1000 list


LIST OF TABLES

Table 3.1: Interpretation Table for Flesch Reading Ease Scores
Table 4.1: The frequency of processed words in texts
Table 4.2: Correlation between the difficulty of the reading texts and students’ reading comprehension scores


LIST OF ABBREVIATIONS


CHAPTER 1: INTRODUCTION

This thesis reports research on the “reading sub-tests”, part of the written paper tests of the National English Certificate Level B examinations designed at the Center for Foreign Languages of An Giang University.

Accordingly, this introductory chapter presents (1) the rationales and the practical background of the study, which relate to the assessment of text difficulty, (2) the aims of the study, (3) the scope of the study, (4) the research questions formulated to address the problem articulated above, (5) the significance of the study, and (6) the structure of the thesis.

1.1 RATIONALES

Readability of text means the ease with which a reading passage can be read. According to Webster’s Dictionary, “readable” describes a text that is “fit to be read, interesting, agreeable and attractive in style; and enjoyable.” In The Literacy Dictionary (Harris & Hodges, 1995), readability is defined as “the ease of comprehension because of style of writing” (p. 203). In a common sense, it is the “ease of reading words and sentences” (Hargis et al., 1998). In the classroom, readability is often associated with an objective numerical score obtained by applying a readability formula. Readability in the sense of language comprehensibility is also concerned with the factors that affect students’ success in reading and understanding.

Since reading involves readers comprehending the text, the first consideration of readability is usually whether the text is easy or difficult for the reader to read. Readability, then, is a matter of matching instructional materials to the student (Fry, 1977), or of choosing the best textbooks for a group of students. In more technical terms, it is concerned with relating the reading level of the text to the reading ability of the students.


Readability, or “text difficulty”, has been an area of concern for all those who need to establish the appropriateness of a given text for a pedagogic purpose. Establishing text difficulty is therefore relevant to teachers and syllabus designers who wish to select appropriate materials for learners at a variety of ability levels, and to test developers selecting reading texts at appropriate levels for inclusion in the reading sub-tests of examinations. Writers of texts for various audiences also need guidance on the range of factors that make texts more or less accessible. In all these cases, however, decisions are still made very much on intuitive grounds. At the most basic level, reading materials that are too difficult may damage the learning process and demotivate students as well.

Typically, text difficulty has been assessed with readability formulas. Studies that use readability formulae for this purpose often claim increased accuracy of match between text and reader (Jones, 1995), and the use of readability formulae in this field may encourage the naive application of simplistic calculations in the automatic advice given to novice writers. Readability formulae have also been used in the field of testing, most notably as a “control” on the difficulty of text levels in reading tests (Davies and Irvine, 1996); how text difficulty is interpreted therefore bears directly on the quality of an examination.

Generally, most studies focus on a single factor contributing to readability for a given intended audience. The use of rare words or technical terminology, for example, can make a text difficult to read for certain audience types (Collins-Thompson and Callan, 2004; Schwarm and Ostendorf, 2005). Syntactic complexity is associated with delayed processing time in understanding (Gibson, 1998) and is another factor that can decrease readability. Text organization


(1996) suggest that linguistic characteristics (vocabulary and sentence structure and variety), as well as the concepts presented, text organization, and the background knowledge required of readers, all need to be considered in determining the appropriateness of a text for a given grade level. It should be noted that readability formulas cannot evaluate all these features that promote readability. Readability formulas measure only those features of a text that can be subjected to mathematical calculation. These formulas are usually based on one semantic factor (the difficulty of words according to their length in characters or syllables) and one syntactic factor (the difficulty of sentences according to their length in characters or words). Thus, not all features that promote readability can be measured mathematically, and these equations cannot measure comprehension directly. Readability formulas are therefore best regarded as predictions of reading ease rather than the only method for determining readability, and they do not tell us how well the reader will understand the ideas in the text.

Vocabulary and sentence structure are the two factors considered in readability formulas and the ones most often used to assess text difficulty; they are, furthermore, precisely the facets of a text that have been used in attempts to predict its difficulty. According to Armbruster (1984), “readability or ‘text difficulty’ was measured by lots of factors such as the number of syllables in the words and the number of words in the sentence, and the absence or presence of these factors then determines the extent to which a given text can be considered ‘considerate’ (to enable readers with minimal effort) or ‘inconsiderate’ (text requiring much greater effort).”

In fact, vocabulary is one of the most easily identifiable characteristics suggesting text difficulty, and it is also a very influential factor. A substantial body of research testifies to the fact that texts containing many difficult words are likely to be difficult texts. However, this does not mean that texts can necessarily be simplified by replacing difficult words with easier ones. It appears that vocabulary is an excellent predictor of difficulty because vocabulary reflects difficulty: a difficult or unfamiliar topic frequently needs to be conveyed using the difficult and unfamiliar vocabulary that is inherent to the topic (Anderson & Freebody, 1981). Because of this, simply replacing difficult words with easier ones may do little to simplify a text; in fact, it can even make a text more difficult. If, for example, the intended meaning is “petrified”, the simpler substitutes “afraid” or “scared” do not convey the same meaning. These latter terms do not describe a fear so great that the person becomes immobilized and cannot react. This is another reason why replacing less frequent words with more frequent ones often fails to simplify a text: the replacement words frequently do not mean quite the same thing and do not fit the context quite as well.

It is also noticeable that a few difficult words are unlikely to pose serious barriers to comprehension. Research has shown that it takes a substantial proportion of difficult words to affect students’ comprehension (Freebody & Anderson, 1983). Additionally, if students read only texts in which all the words are familiar, they will be denied a major opportunity to enlarge their vocabularies. Wide reading of texts that include varied and novel words is in fact the main route to vocabulary growth.


1.2 PRACTICAL BACKGROUND OF THE STUDY

Readability formulae work by using quantifiable textual aspects in order to estimate the ‘difficulty’ inherent in a text. Commonly, the key factors considered in readability measures are word length and sentence length, or variations on these constructs. These aspects are founded in early readability studies (Dale and Chall, 1945). Since the introduction of computer-based textual analysis, newer factors such as word frequency can be included in readability formulae. The frequency of words, as derived from large reference corpora, is a viable factor in estimating readability, since more common words are likely to be familiar to more readers. Thus, a text composed mainly of highly common words is likely to prove more readable (more comprehensible). The logic underlying a focus on word frequency as an influential factor in readability also extends to the frequency of word sequences. For this reason, our research on readability also considers the impact of word sequences. In our work, we use texts from the reading sub-tests included in the “Written test paper” of the National Language Certificate Examinations of the Center for Foreign Languages at An Giang University. The practical application of such an investigation needs to be addressed in terms of the designers of texts and the guidelines that could be formulated to help them in their task. The research involves an analysis of a corpus of texts and examines whether there is a relationship between the frequency of words and the difficulty of the texts compiled by AGU CFL.
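The frequency-based reasoning above can be illustrated with a simple lexical coverage measure: the share of running words in a text that fall within the first N entries of a ranked word list (for example, a most-frequent-words list derived from a reference corpus). The sketch below is a minimal illustration under stated assumptions, not the procedure used in this study: `TOY_FREQUENCY_LIST` is a hypothetical ten-word stand-in for a real corpus list such as the Brown Corpus list, and tokens are matched as lower-cased word forms without lemmatization or word-family grouping.

```python
import re

# Hypothetical ranked list (most frequent first); a real analysis would load
# a corpus-derived list instead of this toy stand-in.
TOY_FREQUENCY_LIST = ["the", "of", "and", "to", "a", "in", "is", "it", "that", "was"]


def tokenize(text):
    """Split text into lower-cased word forms (no lemmatization)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(text, frequency_list, top_n):
    """Return the proportion of tokens in `text` that appear among the
    first `top_n` entries of the ranked `frequency_list`."""
    known = set(frequency_list[:top_n])
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in known)
    return hits / len(tokens)


sample = "The cat sat in the hat."
print(f"{coverage(sample, TOY_FREQUENCY_LIST, 10):.0%}")  # 3 of 6 tokens -> 50%
```

A higher coverage value means that more of the running words come from the most frequent part of the list, which, on the reasoning above, suggests a more readable text. Raising `top_n` (for instance from 3000 to 5000 on a real list) can only keep coverage the same or increase it.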

1.3 AIMS AND SCOPE OF THE STUDY

This study compares texts taken from the “reading sub-tests”, part of the written paper tests of the National English Certificate Level B examinations designed at the Center for Foreign Languages at An Giang University, with regard to their readability.

The population of my study is the written paper tests designed for the National English Certificate Level B examinations of AGU CFL. Normally, this test includes two parts: the speaking task and the written paper, which consists of three sections: (1) Listening comprehension, (2) Reading comprehension and vocabulary, and (3) Use of language. However, only the reading sub-test passages in the second section of the written papers from 2003 to 2007 were examined. Readability was calculated with the Flesch Reading Ease formula and the Flesch-Kincaid Readability formula, which are based on word and sentence length. In addition, since computer-based textual analysis allows word frequency, as derived from large reference corpora, to be included as a viable factor in estimating readability, the frequency of words in the texts was also considered. Other textual features and reader aspects that may affect text comprehension have not been included in the study.

1.4 RESEARCH QUESTIONS

This study sets out to answer the following question:

“What are the difficulty levels of the Level B reading texts published by AGU CFL in terms of vocabulary?”

1.5 SIGNIFICANCE OF THE STUDY

The study responds to the need to apply readability analysis in finding texts of a suitable level of difficulty, and it emphasizes vocabulary level as a fundamental consideration in assessing reading difficulty. An awareness of

materials and ensure that the level and complexity of different texts used in parallel tests are shown to be equivalent.

1.6 OVERVIEW OF THE THESIS

This thesis has two main aims: first, to investigate the current practices of the Reading Comprehension Test of the English Certificate Level B at AGU; and second, to make suggestions for improvement. The author recommends a standardized scale to measure the difficulty of reading texts in Level B certificate tests. It is hoped that these recommendations can serve as guidelines for developing reading ability tests at different levels at AGU CFL.

The thesis consists of five chapters as follows:

Chapter 1: identifies the problem and provides an overview of the thesis

Chapter 2: reviews the literature related to major issues in the vocabulary – reading connection and text readability

Chapter 3: describes the research method employed in the study

Chapter 4: presents the results of the study, and analyses the results to point out findings observed in the study

Chapter 5: makes practical recommendations for standardizing the reading comprehension tests of the English Certificate Level B at AGU, and provides a summary of the main points of the thesis, ending with a conclusion


CHAPTER 2: LITERATURE REVIEW

The aim of this chapter is to lay a foundation from the literature for investigating how content (message-carrying) words and their frequency of occurrence contribute to readability. The chapter consists of two main, distinct parts:

2.1 A review of Readability and use of Readability Analyzer

2.1.1 The Development of Readability Formulas

Standard readability studies started in the late 19th century (DuBay, 2004: 10), and the first formula to measure readability was published in 1923 (Fry, 2002: 286; Klare, 1988: 15). Since then, more than 200 different readability formulas and more than 1000 studies in the field have been published (DuBay, 2004: 2). However, of these formulas, only 12 at most are widely used (Gunning, 2003: 176).

Readability formulas generally measure certain linguistic features of a text that are associated with difficulty and that can be quantified or subjected to mathematical calculation. The features most often used are word length, word frequency and sentence length. Some formulas use one or more of these features, usually words or word syllables, to establish an index of difficulty for the text; the index is calibrated by having a group of students take a test on the text, either a cloze procedure or a multiple-choice test. When more than half of the students of a certain age group pass the test, the text is judged suitable for that group or grade of students. The index produced by a formula thus indicates the grade of students capable of reading the text. Some formulas, however, use a scale of 0 to 100, ranging from extremely difficult to very easy reading.


(Chall, 1988: 6). Starting in the late 1920s, the focus shifted towards examining numerous aspects that were believed to be possible variables of text difficulty (Chall, 1988: 6). Over the years these variables have been reduced to semantic and syntactic factors, leaving stylistic factors aside (Klare, 1988: 16). Still today, the majority of established readability formulas assess the comprehensibility of a text by combining only two components, syntactic and semantic difficulty: the former is usually measured by average sentence length and the latter by word length (counting letters or syllables) or by the frequency of unfamiliar words (Davison & Green, 1988: 2; Fry, 2002: 287; Gilliland, 1972: 84; Gunning, 2003: 176). These variables were already suggested as predictors of text difficulty by Sherman at the very beginning of readability research (DuBay, 2006: 2). Out of numerous statistically measurable factors, they are also the two that have correlated best with readers’ understanding of texts in studies (DuBay, 2006: 42; Gray & Leary, 1972: 115-116; Gunning, 2003: 175).

In the last decade, the focus in schools has been on leveling systems, which are based on more aspects of the text than the language itself (Stein Dzaldov & Peterson, 2005: 222). However, readability formulas are still alive and offer a more objective alternative, as they can be calculated by computers (Fry, 2002: 287-289).

2.1.2 Advantages and Limitations of Readability Formulas

The main strength of readability formulas is that they are relatively easy to use, an applicability that has increased with the development of computerized programs (Burns, 2006). Another strength is that the formulas have been highly validated through many studies (Fry, 1977, cited in Fry, 2002: 291). They are also objective (DuBay, 2004; Fry, 2002). It is worth noting, however, that “different methods used by different computer programs to count sentences, words, and syllables can also cause discrepancies - even though they use the same formula” (DuBay, 2004: 56).
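The point about counting discrepancies can be made concrete. The sketch below applies the usual Flesch Reading Ease equation (206.835 − 1.015 × words per sentence − 84.6 × syllables per word) to the same ten-word sentence twice, changing only the syllable-counting rule. Both counters are hypothetical, simplified heuristics chosen for illustration; they are not the rules of any particular program, and real tools use larger rule sets or pronunciation dictionaries.

```python
import re


def syllables_vowel_groups(word):
    """Heuristic A: every run of vowels (a, e, i, o, u, y) counts as one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def syllables_silent_e(word):
    """Heuristic B: as A, but a final 'e' is treated as silent (except '-le')."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le"):
        count -= 1
    return max(1, count)


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease computed from raw counts."""
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))


words = "we make the cake and take the time to bake".split()  # one sentence
syl_a = sum(syllables_vowel_groups(w) for w in words)
syl_b = sum(syllables_silent_e(w) for w in words)
score_a = flesch_reading_ease(len(words), 1, syl_a)
score_b = flesch_reading_ease(len(words), 1, syl_b)
print(syl_a, round(score_a, 1))  # 15 69.8
print(syl_b, round(score_b, 1))  # 10 112.1
```

The same formula gives scores more than 40 points apart purely because of the syllable rule. The second score also shows that the equation is not bounded at 100: very short, simple texts can score above it.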

Although the most common readability formulas correlate well with each other, they occasionally disagree by as much as three grade levels (Gunning, 2003: 183). This inconsistency between formulas is partly explained by their different starting points (Klare, 1988: 31). However, even though formulas may not provide exact difficulty levels for individual texts, they are better at indicating the progression of difficulty between texts (Gunning, 2003: 181). Therefore, some researchers argue that readability formulas are precarious for matching a specific text with an individual reader and that they should be used more generally (Anderson & Davison, 1988: 23; Bruce & Rubin, 1988: 19).

It is imperative to stress that readability formulas cannot “measure all the ingredients essential to comprehension” (Gilliland, 1972: 84). If they were to do so, they would be too complicated and neither objective nor easy to use. Some critics, such as Gilliland, state that “the accuracy of a measure decreases with its ease of application” (1972: 84). However, others claim that there is scientific evidence “that the addition of attributes does not increase the reliability of the formulas” (Binkley, 1988: 117). Klare states that a formula with more than two variables “usually increases effort more than predictiveness and that formulas with two variables thereby are sufficient for rough screening” (1988: 31).

That readability formulas have always been limited is a fact known to all readability researchers (Davison & Green, 1988: 2; Fry, 2002: 289; Gunning, 2003: 180). Even L.A. Sherman, who is considered to have started the classic readability studies in the late 19th century, stated that the readability of a text depends on the reader (DuBay, 2006: 3). Bruce & Rubin (1988: 20) agree that the ultimate judge of readability is the reader, not a formula.

Readability formulas cannot, nor are they designed to, assign exact values of comprehensibility; instead they offer numerical approximations of text difficulty

methods in the process of choosing appropriate texts (Gunning, 2003: 182; Klare, 1988: 32). Readability formulas also “become poorer predictors of difficulty at high grade levels (especially college) where content weighs more heavily” (Klare, 1988: 31). Furthermore, readability formulas imply that “the reader’s skill in dealing with increasingly difficult words rises in the same proportion as his skill in dealing with increasingly difficult sentences”, which need not be the case (Gilliland, 1972: 98).

One limitation of formulas is that they focus only on text features and ignore the cognitive process (Zakaluk & Samuels, 1988: 122). Also excluded from formulas are specific internal factors such as the reader’s social and cultural background (Bruce & Rubin, 1988: 19), together with motivation, interests and prior knowledge (Afflerbach & Johnston, 1986, cited in Stein Dzaldov & Peterson, 2005: 223; Bruce & Rubin, 1988: 8; DuBay, 2004). Such factors cannot easily be integrated into the formulas (Klare, 1988: 30).

There are also many external, textual factors excluded from formulas, such as text layout and the possible presence of visual aids (Burns, 2006), writing style, organization and exposition (Davison, 1988: 38), and “typographical factors” (Gilliland, 1972: 96). These might be easier to integrate into a formula. However, many of them are difficult to measure statistically, which means that the objectivity of the formulas would be lost.

Another limitation of readability formulas is that they do not take deeper textual structures into account. It is important to remember that “a low readability score is no guarantee of true ease of reading” (Bruce & Rubin, 1988: 12). Instead, the text might have other comprehensibility problems. For instance, formulas generate the same score regardless of whether the word order within a sentence is changed (Chall, 1988: 10-11). Furthermore, an overuse of short words and sentences would produce a low readability score but might result in an incoherent text (Bruce & Rubin, 1988: 12-13). Moreover, the lack of connectives may well result in a confusing text, but this would not be shown in the readability scores (Anderson & Davison, 1988: 32-33).
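The word-order limitation follows directly from how such formulas are computed: they use only counts of sentences, words and syllables, so any rearrangement that preserves those counts leaves the score unchanged. The minimal sketch below demonstrates this with the Flesch Reading Ease equation; the regular-expression sentence splitter and the vowel-group syllable counter are simplifying assumptions, not the procedure of any particular program.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (simplifying heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease computed only from sentence, word and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


original = "The students read the passage carefully before the test."
# Reverse the word order: the sentence becomes ungrammatical,
# but its sentence, word and syllable counts are unchanged.
scrambled = " ".join(reversed(original.rstrip(".").split())) + "."
print(scrambled)  # test the before carefully passage the read students The.
print(flesch_reading_ease(original) == flesch_reading_ease(scrambled))  # True
```

The scrambled sentence is plainly harder to understand, yet the formula cannot tell the two apart.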

It is also essential to point out that longer words are not necessarily harder words (Anderson & Davison, 1988: 28; Gilliland, 1972: 96; Gunning, 2003: 176). Similarly, although there is in fact a “correlation in English between long sentences and complex sentences” (Davison, 1988: 43), a short sentence may be more complex than a longer one and thereby harder to understand (Davison & Green, 1988: 4; Fry, 1988: 8; Gilliland, 1972: 96).

2.1.3 A Focus on Two Readability Formulas

Depending on the variables measured, conventional readability formulas can be grouped into three major categories:

1) Those that use only the word as a variable; the US FORCAST formula and McLaughlin’s “SMOG” formula are good examples.
2) Those that use both hard words and sentences as variables, such as the Dale-Chall Readability Formula (1948) and the New Dale-Chall Formula (1995).
3) Those that use word length (number of syllables) and sentence length as variables, represented by the Gunning FOG Index, the Coleman-Liau Grade Level and the Flesch-Kincaid Formula.

The resulting indices are expressed either as a school grade, showing the years of education needed to comprehend the material, or on a scale of 100 to 0 showing the level of difficulty, with a higher score indicating easier text.

Among them, the Flesch Reading Ease and the Flesch-Kincaid, both of which are included in the Spelling & Grammar feature of Microsoft Word (Microsoft Office), are said to be the most widely used of all measures.

2.1.3.1 Flesch Reading Ease


his new Reading Ease formula in 1948 (republished as Flesch, 2006). According to that study, the new formula was only slightly less correlated with the criterion used for both formulas, namely 75% comprehension of the McCall-Crabbs Standard Test Lessons in Reading (Flesch, 2006: 100-104). In other words, comprehension was interpreted as getting 75% of the answers right on these tests of written texts. The formulas were intended to match a student’s typical grade level with such comprehension of texts with given individual readability scores (Flesch, 2006).

Flesch’s new formula, the Reading Ease formula, uses only two variables. The first, average sentence length in words, remains from the original formula and had, according to earlier studies, been shown to measure sentence complexity indirectly. Similarly, other studies had shown that the second variable, average word length in syllables, indirectly measures word complexity (Flesch, 2006). Flesch (2006: 104) also concluded that this new second variable correlated well (.87) with the second variable used in his original formula (number of affixes) and was easier to apply.
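The two variables combine into the Reading Ease equation as it is usually published: RE = 206.835 − 1.015 × ASL − 84.6 × ASW, where ASL is the average sentence length in words and ASW the average number of syllables per word; higher scores mean easier text. The sketch below computes it and maps the score onto the commonly cited Flesch interpretation bands (band labels vary slightly between sources). The regular-expression tokenization and the vowel-group syllable count are simplifying assumptions, so the results may differ from hand counts based on pronunciation or from other software.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (simplifying heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """RE = 206.835 - 1.015 * ASL - 84.6 * ASW."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    asl = len(words) / len(sentences)                           # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)   # syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


# Commonly cited interpretation bands: (lower bound, label).
BANDS = [
    (90, "Very easy"),
    (80, "Easy"),
    (70, "Fairly easy"),
    (60, "Standard"),
    (50, "Fairly difficult"),
    (30, "Difficult"),
    (float("-inf"), "Very difficult"),
]


def interpret(score):
    """Return the band label for a Reading Ease score."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label


score = flesch_reading_ease("The cat sat on the mat. It was happy.")
print(round(score, 1), interpret(score))  # 108.3 Very easy
```

Because the score falls as either sentence length or syllables per word rises, the two variables pull in the same direction; only their weights differ.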

Eventually, the Flesch Reading Ease became the most common formula, at least for purposes other than purely educational ones (Klare, 1988: 20). Studies have also established it to be “one of the most tested and reliable” (DuBay, 2006: 97). However, one study indicates that readability formulas based on syllable counts underrate “nonfiction” texts and that this may depend on the particular terminology used (Gunning, 2003: 178). The Flesch Reading Ease has been shown to correlate very well (.98) with the Dale-Chall Readability Formula (Gilliland, 1972: 92), which in turn has been carefully validated and was the most common formula in schools for a long time (Klare, 1988: 18). Flesch’s formula has also been validated against other formulas and against “expert judgment” (with correlations of .61-.84) (Gilliland, 1972: 92).


2.1.3.2 Flesch-Kincaid Readability Formula

In 1976, the Flesch formula was revised once again, this time so that it would directly generate a grade level. The study was commissioned by the U.S. Navy and did not include Flesch himself (DuBay, 2006: 97). The new formula is called the Flesch-Kincaid readability formula and is one of the Navy Readability Indexes (DuBay, 2004: 50). It is also called the Flesch Grade-Scale formula and the Kincaid formula (DuBay, 2006: 97). It is “widely used in industry” (Fry, 2002: 290).

A study by Klare shows a high level of agreement between the Flesch Reading Ease and the Flesch-Kincaid; they do not vary by more than two grades and usually agree within one grade (Klare, 1988: 24-25). Flesch-Kincaid uses the same variables as the Flesch Reading Ease, but the relationship between them has been altered.
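Because both formulas take the same two inputs, the difference lies entirely in the weights. The sketch below uses the coefficients usually published for the Flesch-Kincaid Grade Level (0.39 × ASL + 11.8 × ASW − 15.59) alongside the Reading Ease equation; the two text profiles are hypothetical input values chosen for illustration, not data from this study. Note the opposite direction of the two scales: a higher Reading Ease means an easier text, whereas a higher Flesch-Kincaid value means a higher (harder) grade level.

```python
def flesch_reading_ease(asl, asw):
    """Reading Ease: higher = easier."""
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_kincaid_grade(asl, asw):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * asl + 11.8 * asw - 15.59


# Hypothetical profiles: (average sentence length, average syllables per word)
profiles = {"simpler text": (15, 1.5), "harder text": (20, 1.7)}
for name, (asl, asw) in profiles.items():
    print(f"{name}: RE = {flesch_reading_ease(asl, asw):.1f}, "
          f"grade = {flesch_kincaid_grade(asl, asw):.1f}")
# simpler text: RE = 64.7, grade = 8.0
# harder text: RE = 42.7, grade = 12.3
```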

2.1.4 Overview of some factors/variables in text difficulty

Reading is a process that involves readers and reading material. At one end of the interaction is the reader, who varies in motivation, knowledge and interest while reading a passage; at the other end is the material, whose readability is to be determined. For a group of readers or students of the same grade to read efficiently, educators must select suitable textbooks, that is, material that is neither too difficult nor too easy. Putting the student variable aside, educators find that knowing what affects the difficulty level of reading material is the first problem to be solved.

Many studies have shown the importance of matching students with texts suited to their individual levels in order to facilitate and enhance their learning and even to motivate them (Gunning, 2003: 175). Such a match is supposed to enable students’ optimal learning gains (Zakaluk & Samuels, 1988: 140). Gilliland (1972:


(Adapted from Burns, 2006: 34)

According to Burns (2006: 34) and Gunning (2003: 182), students need material at both the independent and the instructional levels for use in different situations. However, students’ motivation is imperative for understanding (Fry, 1988: 87) and can enable a student to understand a text well above his or her normal capacity (Fry, 2002: 289). This is why Gunning (2003: 184) stresses that it is mainly the materials students are required to read that need matching.

Many variables are also involved in determining the difficulty level. Early research found, and assumed, that the difficulty of reading material is determined mainly by two linguistic factors: semantic (word) and syntactic (sentence) structures. Chall and Dale (1995, p. 84) indicate that vocabulary is “a strong predictor of text difficulty.” The word factor includes the number of characters/letters and syllables, and word frequency, while the sentence factor involves sentence length in the passage. Most readability formulas incorporate these two factors in measuring the level of difficulty.

It is generally agreed that word length is related to lexical difficulty (Chall, 1958; Harrison, 1980). Word length is measured in syllables, and words with more syllables are more difficult to comprehend. Put another way, words with more letters or characters are more difficult than words with fewer letters or characters; for instance, “happiness” has three syllables and nine letters, while “good” has only one syllable and four letters. But this does not hold all the time, because some polysyllabic words may not be as difficult as monosyllabic ones, for instance “together” vs. “din” or “phlegm”, as Stahl has pointed out (Stahl, 2003, p. 142). Word frequency should therefore be taken into account too.

The word frequency approach has a long history, tracing back to 1921, when Thorndike first compiled a list of the most frequently used words, the Teacher’s Word Book, as an aid for teachers to measure the difficulty of words and texts. The idea behind this was that “familiarity breeds understanding”: words in frequent use become more familiar and easier to understand. There is evidence that the first 100 most frequent words on the list make up almost half of college students’ writing (Fry et al., 1993), and that the 2,000 most frequent word families of English make up roughly 80% of the individual words in any English text, as found in the Brown corpus (Nation & Waring, 1997). It is therefore now easy to determine the ease or difficulty of each word in a text by checking it against a word frequency list, such as the Brown lists or Nation’s Academic Word List (Nation, 1990). Other factors being constant, a passage with more high-frequency words would be less difficult than one with more low-frequency words.
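To make the frequency-list idea concrete, here is a minimal Python sketch, assuming that a small reference corpus stands in for a real frequency list such as Thorndike’s or the Brown corpus lists. It ranks word types by frequency and reports the share of a passage’s running words that fall within a chosen rank cutoff. The function names, the regular-expression tokenizer and the cutoff value are illustrative assumptions; a real analysis would also group inflected forms into word families, which this sketch does not do.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens (letters, with an optional apostrophe part)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def build_rank_list(corpus_text):
    """Rank word types by corpus frequency: rank 1 is the most frequent.
    Counter.most_common keeps first-seen order among equal counts."""
    counts = Counter(tokenize(corpus_text))
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def high_frequency_share(passage, ranks, cutoff):
    """Percentage of running words whose rank is within `cutoff`.
    Words missing from the reference list are treated as low frequency."""
    tokens = tokenize(passage)
    if not tokens:
        return 0.0
    high = sum(1 for t in tokens if ranks.get(t, cutoff + 1) <= cutoff)
    return 100.0 * high / len(tokens)

# Toy reference corpus standing in for a real frequency list (hypothetical).
ranks = build_rank_list("the cat sat on the mat the dog sat on the log")
print(high_frequency_share("the cat sat on the mat", ranks, cutoff=3))
```

With a cutoff of 3, only “the”, “sat” and “on” count as high frequency, so four of the six running words in the sample passage qualify.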

Another important linguistic factor that determines the ease of a text is syntactic structure. The most frequently cited measure is sentence length, because it is easy to measure and understand. Sentence length is measured as the average number of words per sentence in a passage, or in a sample of a long passage, for instance the first 100 words. A passage with a longer average sentence length (ASL) is said to be more difficult than one with a shorter ASL. A number of studies show that, on average, text difficulty is a function of sentence length: as a sentence increases in length, it increases in difficulty (Coleman, 1962;


The clear relevance of research into text readability is that we need to consider the difficulty of reading test texts, and should only use texts that are appropriate in difficulty for the population being tested. Besides, teachers need to be aware that tampering with genuine texts to make them ‘easier’, or more amenable to test questions or assessment tasks, might have the unwanted effect of actually making them harder to process. Thus, different texts used in parallel tests have to be shown to be of equivalent difficulty. Readability formulas, in the narrow sense, are typically designed and used to estimate difficulty from two surface-level features of language (i.e., vocabulary and syntax) in texts written in paragraph form. They therefore offer only limited assistance, principally in helping to establish the difficulty of test directions and, where used in a test, of test passages, with reference to fairly broad grade or age ranges.

So far we have considered readability and the use of readability analyzers to estimate reading difficulty, but intuitively it would seem that vocabulary knowledge influences the reading process as well. Reading itself requires the ability to recognize words, know their meanings, read quickly and fluently, and ultimately comprehend the intended meaning. The following sections turn to vocabulary, a fundamental consideration in assessing difficulty.

2.2 The vocabulary – reading connection

2.2.1 Vocabulary size, levels and lexical coverage

The acquisition of vocabulary is an indispensable component of learning a language. For instance, a rich vocabulary makes reading easier. More specifically, the ability to read depends in the first instance on lexical and then on linguistic knowledge. The breadth and depth of a learner’s vocabulary have a direct impact on reading comprehension, and limited vocabulary may be a major source of difficulty in reading an English text. According to Goulden, Nation and Read (1990), a well-educated adult native speaker of English has a vocabulary of around 17,000 words. This very large number of English words, however, is a learning goal far beyond the reach of foreign language learners like ours.

Vocabulary knowledge plays an important role in reading comprehension. Researchers tend to agree that vocabulary knowledge is a major prerequisite and causal factor in comprehension, and that there is a relationship between vocabulary size and reading comprehension. Some studies have investigated this relationship and used vocabulary size as a predictor variable for reading comprehension (Bernhardt & Kamil, 1995; Hu & Nation, 2000; Laufer, 1989, 1992; Liu Na & Nation, 1985; Nation, 2001, 2006; Qian, 1992, 2002; Ulijn & Strother, 1990). Fortunately, not all English words are equally important: for different purposes and at different stages of learning, some words deserve more attention and effort than others.

Nation (2001) divided vocabulary into four categories:

(1) high-frequency or general service vocabulary


families of English (3,372 word types) accounts for approximately 75% of the running words in non-fiction texts (Hwang, 1989) and around 90% of the running words in fiction (Hirsh, 1993). Academic vocabulary, also called sub-technical vocabulary (Cowan, 1974) or semi-technical vocabulary (Farrell, 1990), is a class of words lying between technical and non-technical words, usually carrying both technical and non-technical meanings. Technical words are those used in a specialized field, and they differ considerably from subject to subject. About 5% of the words in an academic text are technical vocabulary, with each subject containing roughly 1,000 word families (Nation, 2001).

As reading is a crucial aid in learning a second language, it is necessary to ensure that learners have sufficient vocabulary to read well (Grabe, 2009; Hudson, 2007; Koda, 2005). Earlier studies estimated the percentage of known vocabulary that L2 learners need to understand written texts as between 95% (Laufer, 1989) and 98% (Hu and Nation, 2000). Whereas earlier research suggested that around 3,000 word families provided the lexical resources to read authentic materials independently (Laufer, 1992), Nation argues that in fact 8,000-9,000 word families are necessary. The key factor behind these widely varying estimates is the percentage of vocabulary in a text that one needs to know in order to comprehend it. Laufer’s (1989) study concluded that around 95% coverage was sufficient for this purpose. However, Hu and Nation (2000) reported that their participants needed to know 98%-99% of the words in texts before adequate comprehension was possible. In 2006, Nation used the updated figure of 98% in his analysis, which led to the 8,000-9,000 word family estimate. There is, however, a very large difference between learning 3,000 and 9,000 word families, and this has major implications for teaching methodology. Since the instructional implications of vocabulary size hinge so directly on the coverage figure, it is important to establish the relationship between vocabulary coverage and reading comprehension more firmly.

Nation and Waring (1997) pointed out that beginning learners of English should focus on the 2,000 most frequently occurring word families of English in the GSL, while for intermediate or advanced learners, who usually study English for academic purposes, command of the top 2,000 frequent words may no longer be a concern, and the priority of their vocabulary learning may shift to the next level of vocabulary, i.e. sub-technical/academic vocabulary. In academic settings, ESP students do not see technical terms as a problem, because these terms are usually the focus of specialist textbooks. Low-frequency words are rarely used terms. Academic vocabulary, with a medium frequency of occurrence across texts of various disciplines (somewhere between high-frequency words and technical words), has some rhetorical functions, and acquiring these sub-technical words seems essential when learners are preparing for English for Academic Purposes. In other words, vocabulary based on Nation’s (2001) four divisions can be learned in a systematic order: students should first learn the 2,000 general words of English, followed by a set of academic words common to all academic disciplines. In line with Nation and Waring (1997), Coxhead (2000) compiled a corpus of around 3.5 million running words from university textbooks and materials in four academic areas (arts, commerce, law and science) and identified 570 academic word families (the Academic Word List, AWL), which were claimed to cover almost 10% of the total words in a general academic text. Her research suggested that, for learners with academic goals, the AWL contains the next set of vocabulary to learn after the top 2,000-word level. Put concretely, more lexical coverage is gained by moving on to the 570 academic words (10% coverage) than by continuing with the next 1,000 words (“3–5%” coverage for the 3rd 1,000; Nation, 2006, p. 79) after the top 2,000 word families on a frequency list. In Nation’s (2006) BNC lists, it has been shown that the most frequent 1,000 word families


8,000 word families (enabling wide reading) entails knowing 34,660 individual word forms, although some of these family members are low-frequency items

Lexical/text coverage refers to “the percentage of running words in the text known by the readers” (Nation, 2006, p. 61). Technically, it is calculated here as “the number of the words known in a text, multiplied by 100 and then divided by the total number of running words, i.e. tokens in the text” (Nation, 2001, p. 145). The assumption behind lexical coverage is that there is a lexical knowledge threshold which marks the boundary between having and not having sufficient vocabulary knowledge for adequate reading comprehension. In other words, how much unknown vocabulary can be tolerated in a text before it interferes with comprehension? Some researchers regard one unknown word in every twenty words, roughly one in every two lines of text, as the limit beyond which readers cannot be expected to read an authentic text successfully (Laufer, 1989; Read, 2000; Schmitt & McCarthy, 1997). That is, one needs to know enough different words (types) to account for 95% of the running words in a text. Laufer (1989) noted that reading comprehension at an academic level requires at least 95% lexical coverage. Unfamiliarity with more than 5% of the running words in a text (more than one unknown word in every 20) can make reading a formidable task.
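Nation’s calculation can be applied directly. The Python sketch below assumes that the set of words a reader knows is supplied separately (for example, from a vocabulary test), which is a simplifying assumption; it computes coverage as known tokens × 100 / total tokens and labels the result against the 95% and 98% thresholds discussed in this section. The function names and the sample text are illustrative only.

```python
import re

def lexical_coverage(text, known_words):
    """Known running words x 100 / total running words (tokens), following
    the calculation quoted from Nation (2001, p. 145)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known * 100 / len(tokens)

def coverage_band(coverage):
    """Compare a coverage figure with the 95% (Laufer, 1989) and
    98% (Hu & Nation, 2000) thresholds discussed in this section."""
    if coverage >= 98:
        return "98% or more"
    if coverage >= 95:
        return "95% to under 98%"
    return "below 95%"

# Hypothetical 20-word text containing exactly one unknown word.
text = " ".join(["word"] * 19 + ["phlegm"])
cov = lexical_coverage(text, known_words={"word"})
print(cov, coverage_band(cov))  # prints: 95.0 95% to under 98%
```

One unknown word in every 20 running words gives exactly 95% coverage, the minimum Laufer (1989) proposes.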

2.2.2 The relationship between the vocabulary knowledge and success in reading comprehension

Vocabulary knowledge and its role in reading comprehension have been one of the main areas of focus in second language research for many years. Vocabulary knowledge and reading comprehension are closely related, and the relationship is not one-directional: vocabulary knowledge can help the learner comprehend written texts, and reading can contribute to vocabulary growth (Chall, 1987; Nation, 2001; Stahl, 1990).

Some researchers regard vocabulary as the most crucial factor in reading comprehension. Cooper (1984) described vocabulary as the key ingredient of successful reading, while other researchers argue that “no text comprehension is possible, either in one’s native language or in a foreign language, without understanding the text’s vocabulary” (Laufer, 1997, p. 20). They maintain that as the percentage of unknown vocabulary in a given text increases, the likelihood of comprehending the text decreases (Hirsh & Nation, 1992; Hu & Nation, 2000; Laufer, 1989, 1992, 1997). Laufer (1989) was more specific about the importance of sufficient vocabulary for reading comprehension, claiming that a reader whose vocabulary is insufficient to cover at least 95% of the words in a passage will not be guaranteed comprehension. Readers themselves also consider vocabulary knowledge to be the main obstacle to second language reading comprehension.

From a pedagogical perspective, it is useful to know how much vocabulary instruction is needed before learners reach the vocabulary threshold necessary for comprehending written texts. As already mentioned, it is assumed that, to comprehend a text, readers need to be familiar with 95% of its words (Hirsh and Nation, 1992).

Studies have also been conducted on the relationship between L2 vocabulary knowledge and success in reading comprehension. Typically, vocabulary knowledge and reading performance correlate strongly: .50-.75 (Laufer, 1992); .78-.82 (Qian, 1999); .73-.77 (Qian, 2002). The studies surveyed above relate reading comprehension scores to learners’ lexical coverage (Laufer, 1989; Hu & Nation, 2000),


factors: coverage, vocabulary knowledge, and reading comprehension are via extrapolation. Laufer (1989) also found that at 95% coverage most participants could achieve a score of 55% on the reading test. In 1992, she found that a vocabulary level of 3,000 word families could assure this reading score, but that to achieve a score of 70%, learners would need to know 5,000 word families. Hu and Nation (2000) suggest that 98% coverage is required for “adequate” comprehension, which is set at 71%, the average of the two comprehension tests. The corpus data in Nation (2006) show that it is possible to reach 98% coverage with 5,000 word families plus proper nouns, and 95% coverage with 3,000 word families plus proper nouns.

Vocabulary knowledge, which is strongly related to reading comprehension (Schmitt, 2000), has been considered a predictor of overall reading ability (Nation, 1990). In fact, second/foreign language readers often cite lack of adequate vocabulary as one of the obstacles to text comprehension; that is to say, the more unknown lexical items a reading passage contains, the more difficult it is for students to read it with comprehension.

As mentioned above, the difficulty of reading material is determined mainly by two linguistic factors: semantic (word) and syntactic (sentence) structures. Vocabulary load can thus be one of the most significant predictors of text difficulty (Chall and Dale, 1995). Generally, the word factor includes the number of characters/letters and syllables, and word frequency, while the sentence factor involves sentence length in the passage; most readability formulas incorporate the two factors in measuring the level of difficulty.

In the present study, based on the readability findings and on the proportions of low- and high-frequency words in each text, the aim was to find out the relationship between the difficulty of the reading test texts and the frequency of the words occurring in the materials used for the English Level B examination, as well as its impact on learners’ reading comprehension scores. Of particular importance to us was to identify texts of an appropriate difficulty level with respect to vocabulary.

CHAPTER 3: METHODOLOGY

Chapter 2 reviewed major issues in the literature on reading comprehension tests in order to provide an adequate theoretical understanding, which serves as the basis for investigating current practice in the Reading Comprehension Test of the English Certificate of Level B at AGU. In particular, the discussion of text readability for test development provides the basis for evaluating reading test practices at AGU CFL and for suggesting ways to remedy their drawbacks.

This chapter discusses the research design, with a description of the subjects, instruments, and data collection procedures.

3.1 - RESEARCH METHOD

3.1.1 Materials

The materials chosen for the study consisted of twenty-four reading passages from the English Written Test of the National English Certificate Level B held at AGU CFL for the years 2003 and 2007. These twenty-four passages were subjected to item analysis with an established level of difficulty, as shown in Table 1. Since the passages had no titles, for the purpose of the study I used a short form for each: P-1 for the first reading passage, P-2 for the second, and so on. For a list of all analyzed texts, see Appendix 3.

3.1.2 The instruments

3.1.2.1 Readability formulas

Note that Flesch Reading Ease and Flesch-Kincaid scores depend on the programming behind them, and different applications may generate different scores for these formulas on the same text, as a study by Mailloux et al. (1995, cited in Hall, 2006: 69) has shown. It was assumed, however, that any implementation of a readability formula, in spite of small variations, should still be adequate for this purpose. Today, both formulas are built into Microsoft Word’s Spelling & Grammar checker and are therefore easily calculated. (For detailed information about how to enable the readability statistics in Microsoft Office Word 2003, see Appendix 4; for other versions of Microsoft Word, visit Microsoft Office Online.)

It was also assumed that the implementation in Word is more accurate than most other applications available on the Internet.

Flesch Reading Ease computes readability based on the average number of syllables per word and the average number of words per sentence. The Reading Ease formula can be used to analyze either a complete text or appropriate samples (Flesch, 2006).

Reading Ease Score = 206.835 - (1.015 x ASL) - (84.6 x ASW)

ASL = average sentence length (number of words divided by number of sentences)

ASW = average number of syllables per word (number of syllables divided by number of words)

(DuBay, 2006: 97; Microsoft Office Online)
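Because the formula needs only three counts (words, sentences and syllables), it can be reproduced outside Word. The Python sketch below is a minimal illustration, not the algorithm Word uses: the syllable counter is an assumed vowel-group heuristic, and sentences are split on “.”, “!” and “?”, so its scores may differ from Word’s, as noted above for different applications. The function names are hypothetical.

```python
import re

def count_syllables(word):
    """Approximate syllables: count vowel groups (a, e, i, o, u, y), drop a
    silent final 'e' (but not '-le'), and never return fewer than one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Reading Ease Score = 206.835 - (1.015 x ASL) - (84.6 x ASW)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)  # average sentence length
    asw = syllables / len(words)       # average number of syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw
```

For example, `flesch_reading_ease("Happiness is good.")` counts 3 words, 1 sentence and 5 syllables, so ASL = 3 and ASW = 5/3, giving a score of about 62.79.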

The interpretation of the readability scores is shown in the following table (Table 1). As seen in Table 1, Reading Ease scores range from 0 (hard) to 100 (easy), and standard English documents average approximately 60 to 70. The higher the score, the greater the number of people who would easily be able to understand the writing (Flesch, 2006: 107).

Table 1. Interpretation Table for Flesch Reading Ease Scores

Reading Ease Score | Style Description | Estimated US Reading Grade | Average Sentence Length (words) | Average No. of Syllables per 100 Words
90-100 | Very easy | 5th grade | 8 or less | 123 or less
0-30 | Very difficult | College grade | 29 or more | 192 or more

(Adapted from Flesch, cited in Klare, 1988: 21)

The Flesch-Kincaid Grade Scale uses the same basic approach, calculating reading difficulty from the average number of syllables per word and the average number of words per sentence. In this case, though, the score indicates a grade level. For example, a score of 10.0 means that an average tenth grader would be able to read and understand the document.
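Since both scores are built from the same two averages, a short sketch can make the altered relationship between the variables concrete. The Flesch-Kincaid weights below (0.39, 11.8 and 15.59) are the commonly published ones and are not given in this thesis, so treat them as an assumption; Word’s own counts of words, sentences and syllables may also differ from any hand count. The function names are hypothetical.

```python
def averages(words, sentences, syllables):
    """ASL (words per sentence) and ASW (syllables per word) from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    return words / sentences, syllables / words

def reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    asl, asw = averages(words, sentences, syllables)
    return 206.835 - 1.015 * asl - 84.6 * asw

def grade_level(words, sentences, syllables):
    """Flesch-Kincaid grade: same variables, different weights.
    The coefficients are the commonly published values (assumption)."""
    asl, asw = averages(words, sentences, syllables)
    return 0.39 * asl + 11.8 * asw - 15.59
```

For a 100-word sample with 5 sentences and 150 syllables (ASL = 20, ASW = 1.5), `reading_ease` gives 59.635 and `grade_level` gives 9.91. Longer sentences and longer words lower the ease score but raise the grade level, which is why the two measures tend to agree once the scales are matched.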

3.1.2.2 Software tools

3.1.2.2.1 The Vocabulary Statistic Worksheet

Built in Microsoft Excel, the worksheet was used to measure vocabulary levels by comparing the word lists made from each target text with the University Word List (UWL, 1,000 word families) and the Brown Corpus High Frequency Word List (BC HFWL, 1st-5th 1,000), which is based on the frequency of occurrence of English words. The worksheet compares a text against base word lists to show which words in the text are and are not in those lists. This enabled us to calculate the proportions of low- and high-frequency words in each text based on the UWL and the BC HFWL. The words were ranked into bands from the first 1,000 words up to the first 5,000 words, plus words beyond the 5,000 level (the purely frequency-based list, BC HFWL), and into the 1,000 most frequent word families and the 1,000 academic word families (Paul Nation’s Word List, PNWL).

Since much of the English vocabulary encountered in tertiary education lies beyond the 3,000-word level, this study examined the vocabulary of each text up to the 3,000-word and 5,000-word levels. Considering the sequencing of vocabulary learning in relation to frequency, we made a series of ranked word lists (e.g. 1,000-word, 2,000-word and 3,000-word rankings) to count how many words from each list occur in the different texts.
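The kind of comparison the worksheet performs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Excel worksheet itself: the band lists passed in are hypothetical placeholders for the BC HFWL or UWL bands, tokens are matched as exact lower-case forms (no grouping into word families), and each token is counted in the first band that contains it.

```python
import re
from collections import Counter

def band_profile(text, bands):
    """Assign each running word to the first band (e.g. 1st 1,000, 2nd 1,000)
    whose word set contains it; everything else is 'off-list'.
    Returns (band label, % of tokens, cumulative %) for each band in order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return []
    counts = Counter()
    for token in tokens:
        for label, words in bands:
            if token in words:
                counts[label] += 1
                break
        else:  # no band matched this token
            counts["off-list"] += 1
    rows, cumulative = [], 0.0
    for label in [name for name, _ in bands] + ["off-list"]:
        percent = 100 * counts[label] / len(tokens)
        cumulative += percent
        rows.append((label, percent, cumulative))
    return rows

# Hypothetical mini "bands"; real bands would hold about 1,000 entries each.
bands = [("1st 1,000", {"the", "cat", "sat", "on"}), ("2nd 1,000", {"mat"})]
for row in band_profile("The cat sat on the mat with phlegm", bands):
    print(row)
```

In this toy example, five of the eight running words fall in the first band (62.5%), one in the second (cumulative 75.0%), and two are off-list. Reading the cumulative column against the 95% and 98% thresholds is one way to relate a band profile to lexical coverage.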

3.1.2.2.2 The Concordance program (V 3.2)

Figure 3.1: The Vocabulary Statistic Worksheet


A concordance program is software used to analyze corpora and list the results (Daniel Krieger, 2003). “The concordance can find selected word and list sentences containing that word, called key word in context (KWIC). It can also identify collocations or words most often found together in the key word” (Garry N. Dyckn, 1999), providing students with information on patterns in sample sentences of real language. Linguists and applied linguistics researchers are not the only group who can benefit from concordance programs (i.e., as a means of exploring the meanings and uses of words in their authentic contexts; see Aston, 1997; Tribble, 1997): language teachers can also use a concordance program to support their teaching of the lexical, syntactic, semantic and stylistic patterns of a language.

For the analyses in this study, Concordance V 3.2 was chosen to process the monolingual corpus and compile the data because of its compatibility with Microsoft Excel.

This software could list all of the investigated words as concordance lines showing the contexts in which they occur. The “word list” for each text, moreover, would be displayed in descending order of their frequency of

Figure 3.2: The Concordance software, Concordance V 3.2
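The key-word-in-context (KWIC) display described above can be illustrated with a short sketch. It is not Concordance V 3.2’s algorithm: it assumes a simple regular-expression tokenizer that drops punctuation and a fixed window of words on each side of the key word, and the function name is hypothetical.

```python
import re

def kwic(text, keyword, width=3):
    """Key-word-in-context lines: up to `width` words either side of each
    case-insensitive occurrence of `keyword`, with the key word in brackets."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

sample = "Reading is a process. Reading requires vocabulary."
for line in kwic(sample, "reading", width=2):
    print(line)
```

Run on the sample, this prints “[Reading] is a” and “a process [Reading] requires vocabulary”. Because the window ignores sentence boundaries, the second line borrows words from the preceding sentence, much as fixed-width concordance lines do.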
