Academic Vocabulary in Learner Writing
Series editors: Wolfgang Teubert, University of Birmingham, and Michaela Mahlberg, University of Liverpool
Editorial Board: Paul Baker (Lancaster), František Čermák (Prague), Susan Conrad (Portland), Geoffrey Leech (Lancaster), Dominique Maingueneau (Paris XII), Christian Mair (Freiburg), Alan Partington (Bologna), Elena Tognini-Bonelli (Siena and TWC), Ruth Wodak (Lancaster), Feng Zhiwei (Beijing)
Corpus linguistics provides the methodology to extract meaning from texts. Taking as its starting point the fact that language is not a mirror of reality but lets us share what we know, believe and think about reality, it focuses on language as a social phenomenon, and makes visible the attitudes and beliefs expressed by the members of a discourse community.
Consisting of both spoken and written language, discourse always has historical, social, functional, and regional dimensions. Discourse can be monolingual or multilingual, interconnected by translations. Discourse is where language and social studies meet.
The Corpus and Discourse series consists of two strands. The first, Research in Corpus and Discourse, features innovative contributions to various aspects of corpus linguistics and a wide range of applications, from language technology via the teaching of a second language to a history of mentalities. The second strand, Studies in Corpus and Discourse, is comprised of key texts bridging the gap between social studies and linguistics. Although equally academically rigorous, this strand will be aimed at a wider audience of academics and postgraduate students working
Corpus-Based Approaches to English Language Teaching
Edited by Mari Carmen Campoy, Begoña Bellés-Fortuño and Mª Lluïsa Gea-Valor
Corpus Linguistics and World Englishes
An Analysis of Xhosa English
Vivian de Klerk
Evaluation and Stance in War News
A Linguistic Analysis of American, British and Italian television news reporting of the 2003 Iraqi war
Edited by Louann Haarman and Linda Lombardo
Evaluation in Media Discourse
Analysis of a Newspaper Corpus
Monika Bednarek
Historical Corpus Stylistics
Media, Technology and Change
Patrick Studer
Idioms and Collocations
Corpus-based Linguistic and Lexicographic Studies
Edited by Christiane Fellbaum
Working with Spanish Corpora
Edited by Giovanni Parodi
Studies in Corpus and Discourse
Corpus Linguistics and The Study of Literature
Stylistics In Jane Austen’s Novels
Bettina Starcke
English Collocation Studies
The OSTI Report
John Sinclair, Susan Jones and Robert Daley
Edited by Ramesh Krishnamurthy
With an introduction by Wolfgang Teubert
Text, Discourse, and Corpora
Theory and Analysis
Michael Hoey, Michaela Mahlberg, Michael Stubbs and Wolfgang Teubert
With an introduction by John Sinclair
Academic Vocabulary in Learner Writing
From Extraction to Analysis
Magali Paquot
The Tower Building 80 Maiden Lane
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-1-4411-3036-5 (hardcover)
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
Typeset by Newgen Imaging Systems Pvt Ltd, Chennai, India
Printed and bound in Great Britain by the MPG Books Group
Contents
Acknowledgements
List of abbreviations
List of figures
List of tables
Introduction
Part I: Academic vocabulary
1.1 Academic vocabulary vs core vocabulary and technical terms
1.2 Academic vocabulary and sub-technical vocabulary
1.3 Vocabulary and the organization of academic texts
Chapter 2 A data-driven approach to the selection of
2.4 The Academic Keyword List
Part II: Learners’ use of academic vocabulary
3.1 The International Corpus of Learner English
Chapter 4 Rhetorical functions in expert academic writing
4.1 The Academic Keyword List and rhetorical functions
4.2.1 Using prepositions, adverbs and adverbial
Chapter 5 Academic vocabulary in the International Corpus of Learner English
5.1 A bird’s-eye view of exemplification in learner writing
5.2 Academic vocabulary and general interlanguage features
5.2.3 The phraseology of academic vocabulary
Part III: Pedagogical implications and conclusions
7.2 Learner corpora, interlanguage and second
Appendix 1: Expressing cause and effect
Appendix 2: Comparing and contrasting
Notes
References
Author index
Subject index
Acknowledgements
There are several people without whom this book would never have been written. First and foremost, I want to express my deepest and most sincere gratitude to my PhD supervisor, Professor Sylviane Granger, for her infectious enthusiasm, her intellectual perceptiveness and her unfailing expert guidance. I am greatly indebted to you, Sylviane, for giving me the opportunity to join the renowned Centre for English Corpus Linguistics seven years ago now! I have been lucky enough to undertake research in an environment where writing a PhD also means collaborating with many fellow researchers on up-and-coming projects, attending thought-provoking conferences, organizing seminars, conferences and summer schools, as well as lecturing and offering guidance to undergraduate students.
I am also very grateful to my colleagues and friends at the Centre for English Corpus Linguistics – Céline, Claire, Fanny, Gaëtanelle, Jennifer, Marie-Aude, Suzanne and Sylvie – for making the Centre for English Corpus Linguistics such an inspiring and intellectually stimulating research centre. I also wish to thank them for their moral and intellectual support and for all the entertaining lunchtimes we spent together talking about everyday life and work.
I am indebted to a great number of colleagues not only for supplying me with corpora, corpus-handling tools and references, but also for providing helpful comments on earlier versions and stimulating ideas for my research. I would like to thank Yves Bestgen, Liesbet Degand, Jean Heiderscheidt, Sebastian Hoffmann, Scott Jarvis, Jean-René Klein, Fanny Meunier, Hilary Nesi, John Osborne and JoAnne Neff van Aertselaer. I am also grateful to an anonymous reviewer for recommendations on the first draft of the text.
I gratefully acknowledge the support of both the Communauté française de Belgique, which funded my doctoral dissertation out of which this book has grown, and the Belgian National Fund for Scientific Research (F.N.R.S.).
On a more personal note, I would like to express my deepest thanks to my parents and friends for everything they have done to help me while I was working on this book. And last, but not least, Arnaud: thank you for making it all worthwhile.
Magali Paquot
Louvain-la-Neuve
November, 2009
List of abbreviations
Corpus
BNC-AC-HUM British National Corpus – academic sub-corpus (discipline: humanities and arts)
catholique de Louvain
system
et al., 2002)
ICLEv2 International Corpus of Learner English (version 2)
LOCNESS Louvain Corpus of Native Speaker Essays
Language, Lancaster University
List of figures
Figure 2.3: Distribution of the words example and law in the
Figure 2.4: WordSmith Tools Detailed Consistency Analysis
Figure 3.1: ICLE task and learner variables (Granger et al.,
Figure 3.2: Contrastive Interlanguage Analysis (Granger 1996a)
Figure 4.2: The distribution of the adverb ‘notably’
Figure 4.5: The distribution of the verbs ‘illustrate’ and
Figure 4.6: The phraseology of rhetorical functions
Figure 5.4: Distribution of the adverbials ‘for example’ and
Figure 5.5: The treatment of ‘namely’ on websites devoted
Figure 5.6: The use of ‘despite’ and ‘in spite of’ in
Figure 5.7: The frequency of speech-like lexical items in expert academic writing, learner writing and speech
Figure 5.8: Phraseological cascades with ‘in conclusion’ and
Figure 5.10: A possible rationale for the use of ‘according to me’
Figure 5.11: A possible rationale for the use of ‘let us in
Figure 5.12: Features of novice writing - Frequency in expert academic writing, native-speaker and EFL novices’ writing and native speech (per million words of
Figure 6.1: Connectives: contrast and concession
Figure 6.2: Comparing and contrasting: using nouns such as ‘resemblance’ and ‘similarity’ (Gilquin et al.,
Figure 6.3: Reformulation: Explaining and defining: using ‘i.e.’, ‘that is’ and ‘that is to say’ (Gilquin
Figure 6.4: Expressing cause and effect: ‘Be careful’ note on
List of tables
Table 1.1: Composition of the Academic Corpus
Table 1.2: Chung and Nation’s (2003: 105) rating scale for finding technical terms, as applied to the field of anatomy
Table 2.1: The corpora of professional academic writing
Table 2.2: The re-categorization of data from the professional
Table 2.4: Examples of essay topics in the BAWE pilot corpus
Table 2.7: CLAWS horizontal output [lemma + simplified
Table 2.9: CLAWS tagging of the complex preposition
Table 2.10: Semantic fields of the UCREL Semantic
Table 2.15: Automatic semantic analysis of potential
Table 2.16: Distribution of grammatical categories in the
Table 2.18: The distribution of AKL words in the GSL
Table 3.1: Breakdown of ICLE essays
Table 3.2: BNC Index – Breakdown of written BNC genres
Table 4.1: Ways of expressing exemplification found in the BNC-AC-HUM
Table 4.2: The use of ‘for example’ and ‘for instance’ in the
Table 4.6: The use of the lemma ‘illustrate’ in the BNC-AC-HUM
Table 4.7: The use of the lemma ‘exemplify’ in the BNC-AC-HUM
Table 4.8: The use of imperatives in academic writing (based
Table 4.14: Co-occurrents of verbs expressing possibility and
Table 5.1: A comparison of exemplifiers based on the total
Table 5.2: A comparison of exemplifiers based on the total
Table 5.3: Two methods of comparing the use of exemplifiers
Table 5.4: Significant adjective co-occurrents of the noun
Table 5.5: Adjectives co-occurrents of the noun ‘example’
Table 5.6: Significant verb co-occurrents of the noun ‘example’
Table 5.7: Verb co-occurrent types of the noun ‘example’
Table 5.8: The distribution of ‘example’ and ‘be’ in the ICLE
Table 5.9: The distribution of ‘there + BE + example’ in ICLE
Table 5.11: Examples of AKL words which are overused and
Table 5.12: Two ways of comparing the use of cause and effect
Table 5.13: The over- and underuse by EFL learners of specific devices to express cause and effect (based on
Table 5.14: The over- and underuse by EFL learners of specific devices to express comparison and
Table 5.15: Speech-like overused lexical items per
Table 5.16: The frequency of ‘maybe’ in learner corpora
Table 5.17: The frequency of ‘I think’ in learner corpora
Table 5.18: Examples of overused and underused clusters
Table 5.19: Clusters of words including AKL verbs which are over- and underused in learners’ writing,
Table 5.20: Examples of overused clusters in learner writing
Table 5.21: Verb co-occurrents of the noun conclusion
Table 5.22: Adjective co-occurrents of the noun conclusion
Table 5.23: The frequency of sentence-initial position of
Table 5.24: Sentence-final position of connectors in the ICLE
Table 5.25: Jarvis’s (2000) three effects of potential L1 influence
Table 5.26: Jarvis’s (2000) unified framework applied to
Table 5.27: A comparison of the use of the English verb ‘illustrate’ and the French verb ‘illustrer’
Table 5.29: The transfer of frequency of the first person plural imperative between French and English writing
Table 6.1: Le Robert & Collins CD-Rom (2003–2004):
Introduction
That English has become the major international language for research and publication is beyond dispute. As a result, university students need to have good receptive command of English if they want to have access to the literature pertaining to their discipline. As a large number of them are also required to write academic texts (e.g. essays, reports, MA dissertations, PhD theses, etc.), they also need to have a productive knowledge of academic language. As noted by Biber, ‘students who are beginning university studies face a bewildering range of obstacles and adjustments, and many of these difficulties involve learning to use language in new ways’ (2006: 1). Several studies have shown that the distinctive, highly routinized, nature of academic prose is problematic for many novice native-speaker writers (e.g. Cortes, 2002), but poses an even greater challenge to students for whom English is a second (e.g. Hinkel, 2002) or foreign language (e.g. Gilquin et al., 2007b).
Studies in second language writing have established that learning to write second-language (L2) academic prose requires an advanced linguistic competence, without which learners simply do not have the range of lexical and grammatical skills required for academic writing (Jordan, 1997; Nation and Waring, 1997; Hinkel, 2002; 2004; Reynolds, 2005). A questionnaire survey of almost 5,000 undergraduates showed that students from all 26 departments at the Hong Kong Polytechnic University experienced difficulties with the writing skills necessary for studying content subjects through the medium of English (Evans and Green, 2006). Almost 50 per cent of the students reported that they encountered difficulties in using appropriate academic style, expressing ideas in correct English and linking sentences smoothly. Mastering the subtleties of academic prose is, however, not only a problem for novice writers. International refereed journal articles are regarded as the most important vehicle for publishing research findings, and non-native academics who want to publish their work in those top journals often find their articles rejected, partly because of language problems. These problems include the fact that they have less facility of expression and a poorer vocabulary; they find it difficult to ‘hedge’ appropriately and the structure of their texts may be influenced by their first language (see Flowerdew, 1999).
Because it causes major difficulties to students and scholars alike, academic discourse has become a major object of study in applied linguistics. Flowerdew (2002) identified four major research paradigms for investigating academic discourse, namely (Swalesian) genre analysis, contrastive rhetoric, ethnographic approaches and corpus-based analysis. While the first three approaches to English for Academic Purposes (EAP) emphasize the situational or cultural context of academic discourse, corpus-linguistic methods focus more on the co-text of selected lexical items in academic texts.
Corpus linguistics is concerned with the collection in electronic format and the analysis of large amounts of naturally occurring spoken or written data ‘selected according to external criteria to represent, as far as possible, a language or language variety as a source of linguistic research’ (Sinclair, 2005: 16). Computer corpora are analysed with the help of software packages such as WordSmith Tools 4 (Scott, 2004), which includes a number of text-handling tools to support quantitative and qualitative textual data analysis. Wordlists give information on the frequency and distribution of the vocabulary – single words but also word sequences – used in one or more corpora. Wordlists for two corpora can be compared automatically so as to highlight the vocabulary that is particularly salient in a given corpus, i.e., its keywords. Concordances are used to analyse the co-text of a linguistic feature, in other words its linguistic environment in terms of preferred co-occurrences and grammatical structures. The research paradigm of corpus linguistics is ideally suited for studying the linguistic features of academic discourse as it can highlight which words, phrases or structures are most typical of the genre and how they are generally used.
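The comparison of wordlists and the concordance view described above can be sketched in a few lines of Python. This is only an illustrative sketch and not a description of how WordSmith Tools works internally: the log-likelihood statistic is one commonly used keyness measure and is an assumption here, and the function names (tokenize, log_likelihood, keywords, concordance) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) score for one word observed in two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_tokens, ref_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        if freq / n_study > ref[word] / n_ref:
            score = log_likelihood(freq, n_study, ref[word], n_ref)
            scored.append((word, score))
    # Highest score first; ties are broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top]


def concordance(tokens, node, span=3):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines
```

With two tokenized corpora, `keywords(study, reference)` returns the most salient word forms of the first corpus, and `concordance(study, "illustrate")` lists each occurrence of a word with a few words of co-text on either side.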
Corpus-based studies have already shed light on a number of distinctive linguistic features of academic discourse as compared with other genres. Biber’s (1988) study of variation across speech and writing has shown that academic texts typically have an informational and non-narrative focus; they require highly explicit, text-internal reference and deal with abstract, conceptual or technical subject matter (Biber, 1988: 121–60). The Longman Grammar of Spoken and Written English (Biber et al., 1999) provides a comprehensive description of the range of distinctive grammatical and lexical features of academic prose, compared to conversation, fiction and newspaper reportage. Common features of this genre include a high rate of occurrence of nouns, nominalizations, noun phrases with modifiers, attributive adjectives, derived adjectives, activity verbs, verbs with inanimate subjects, agentless passive structures and linking adverbials. By contrast, first and second person pronouns, private verbs, that-deletions and contractions occur very rarely in academic texts.
In addition, studies of vocabulary have emphasized the importance of a ‘sub-technical’ or ‘academic’ vocabulary alongside core words and technical terms in academic discourse (Nation, 2001: 187–216). Hinkel (2002: 257–65) argues that the exclusive use of a process-writing approach, the relative absence of direct and focused grammar instruction, and the lack of academic vocabulary development contribute to a situation in which non-native students are simply not prepared to write academic texts. She provides a list of priorities in curriculum design and writes that, among the top priorities, ‘NNSs [non-native students] need to learn more contextualized and advanced academic vocabulary, as well as idioms and collocations to develop a substantial lexical arsenal to improve their writing in English’ (Hinkel, 2002: 247). The Academic Word List (Coxhead, 2000) was compiled on the basis of corpus data to meet the specific vocabulary needs of students in higher education settings.
But what is ‘academic vocabulary’? Despite its widespread use, the term has been used in various ways to refer to different (but often overlapping) vocabulary categories. This book aims to provide a better description of the notion of ‘academic vocabulary’. It takes the reader full circle, from the extraction of potential academic words through their linguistic analysis in expert and learner corpus data, to the pedagogical implications that can be drawn from the results. Recent corpus-based studies have emphasized the specificity of different academic disciplines and genres. As a result, researchers such as Hyland and Tse (2007) question the widely held assumption that students need a common core vocabulary for academic study. They argue that the different disciplinary literacies undermine the usefulness of such lists and recommend that lecturers help students develop a discipline-based lexical repertoire.
This book is an attempt to resolve the tension between the particularizing trend which advocates the teaching of a more restricted, discipline-based vocabulary syllabus, and the generalizing trend which recognizes the existence of a common core ‘academic vocabulary’ that can be taught to a large number of learners in many disciplines. I first argue that, to resolve this tension, the concept of ‘academic vocabulary’ must be revisited. I demonstrate, on the basis of corpus data, that, as well as discipline-specific vocabulary, there is a wide range of words and phraseological patterns that are used to refer to activities which are characteristic of academic discourse, and more generally, of scientific knowledge, or to perform important discourse-organizing or rhetorical functions in academic writing.
A large proportion of this lexical repertoire consists of core vocabulary, a category which has so far been largely neglected in EAP courses but which is usually not fully mastered by English as a foreign language (EFL) learners, even those at the high-intermediate or advanced levels. I make use of Granger’s (1996a) Contrastive Interlanguage Analysis to test the working hypothesis that upper-intermediate to advanced EFL learners, irrespective of their mother tongue background, share a number of linguistic features that characterize their use of academic vocabulary. The learner corpus used is the first edition of the International Corpus of Learner English (ICLE), which is among the largest non-commercial learner corpora in existence. It contains texts written by learners with different mother tongue backgrounds. Ten ICLE sub-corpora representing different mother tongue backgrounds (Czech, Dutch, Finnish, French, German, Italian, Polish, Russian, Spanish, Swedish) are compared with a subset of the academic component of the British National Corpus (texts written by specialists in the Humanities) to identify ways in which learners’ use of academic vocabulary differs from that of more expert writers. A comparison of the ten sub-corpora then makes it possible to identify linguistic features that are shared by learners from a wide range of mother tongue backgrounds, and therefore possibly developmental. The EFL learners are all learning how to write in a foreign language, and they are often novice writers in their mother tongue as well.
However, not all learner-specific features can be attributed to developmental factors. The comparison of several ICLE sub-corpora helps to pinpoint a number of patterns that are characteristic of learners who share the same first language, and which may therefore be transfer-related. I made use of Jarvis’s (2000) unified framework to investigate the potential influence of the first language on French learners’ use of academic vocabulary in English.
The book is organized in three sections. The first scrutinizes the concept of ‘academic vocabulary’, reviewing the many definitions of the term and arguing that, for productive purposes, academic vocabulary is more usefully defined as a set of options to refer to those activities that characterize academic work, organize scientific discourse, and build the rhetoric of academic texts. It then proposes a data-driven procedure based on the criteria of keyness, range, and evenness of distribution, to select academic words that could be part of a common core academic vocabulary syllabus. The resulting list, called the Academic Keyword List (AKL), comprises a set of 930 potential academic words. One important feature of the methodology is that, unlike Coxhead’s (2000) Academic Word List, the AKL includes the 2,000 most frequent words of English, thus making it possible to appreciate the paramount importance of core English words in academic prose.
The AKL is used in Section 2 to explore the importance of academic vocabulary in expert writing and to analyse EFL learners’ use of lexical devices that perform rhetorical or organizational functions in academic writing. This section offers a thorough analysis of these lexical devices as they appear in the International Corpus of Learner English, describing the factors that account for learners’ difficulties in academic writing. These factors include a limited lexical repertoire, lack of register awareness, infelicitous word combinations, semantic misuse, sentence-initial positioning of adverbs and transfer effects.
The final section briefly comments on the pedagogical implications of these results, summarizes the major findings, and points the way forward to further research in the area.
Part I Academic vocabulary
‘Academic vocabulary’ is a term that is widely used in textbooks on English for academic purposes and Second Language Acquisition (SLA) reference books. Nevertheless, it can be understood in a variety of ways and used to indicate different categories of vocabulary. In this section, my objectives are to clarify the meaning of ‘academic vocabulary’ by critically examining its many uses, and to build a list of words that fit my own definition of the term. Chapter 1 therefore tries to identify the key features of academic vocabulary and to clear up the confusion between academic words and other vocabulary. Chapter 2 proposes a data-driven methodology based on the criteria of keyness, range and evenness of distribution, and uses this to build a new list of potential academic words, viz. the Academic Keyword List (AKL). This list is very different from Coxhead’s Academic Word List and has already been used to inform the writing sections in the second edition of the Macmillan English Dictionary for Advanced Learners (see Gilquin et al., 2007b). The AKL is used in Section 2 to analyse EFL learners’ use of lexical devices that perform rhetorical or organizational functions in academic writing.
Chapter 1 What is academic vocabulary?
Academic vocabulary is in fashion, as witnessed by the increasing number of textbooks on the topic. Recent titles include Essential Academic Vocabulary: Mastering the Complete Academic Word List (Huntley, 2006) and Academic Vocabulary in Use (McCarthy and O’Dell, 2008). But what is academic vocabulary? The term often refers to a set of lexical items that are not core words but which are relatively frequent in academic texts. Examples of academic words include adult, chemical, colleague, consist, contrast, equivalent, likewise, parallel, transport and volunteer (cf. Coxhead, 2000). Unlike technical terms, they appear in a large proportion of academic texts, regardless of the discipline. Academic vocabulary is also sometimes used as a synonym for sub-technical vocabulary (e.g. mouse, bug, nuclear, solution) or discourse-organizing vocabulary (e.g. cause, compare, differ, feature, hypothetical, and identify). In this chapter, I set out to review the many definitions of academic vocabulary that have been given and to clear up the confusion between academic words, core words, technical terms, sub-technical words and discourse-organizing words. I will show why a definition of academic vocabulary that excludes the top 2,000 words of English is not very useful for productive purposes in higher education settings and argue for a function-based definition of the term. The very existence of academic words has recently been challenged by several researchers in English for Specific Purposes (ESP) who advocate that teachers help students develop a more restricted, discipline-specific lexical repertoire. I will round off this chapter by situating the book in ongoing debates over generality vs disciplinary specificity in teaching vocabulary for academic purposes.
1.1 Academic vocabulary vs core vocabulary and technical terms
Numerous second language acquisition studies have investigated whether there is a threshold which marks the point at which vocabulary knowledge becomes sufficient for adequate reading comprehension. Laufer (1989; 1992) has shown that at least 95 per cent coverage is needed to ensure reasonable comprehension of a text. To achieve this coverage, it is commonly believed that students in higher education settings need to master three lists of vocabulary: a core vocabulary of 2,000 high-frequency words, plus some academic words, and technical terms. Some researchers, however, do not agree that vocabulary categories can be described as if they were clearly separable. In this section, the notions of core vocabulary, academic vocabulary and technical terms are described and illustrated. The criticisms levelled at the division of vocabulary into mutually exclusive lists are then reviewed.
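The notion of coverage used above can be made concrete with a small sketch: coverage is the percentage of running words (tokens) in a text that belong to a given word list. The code below is a minimal illustration under stated assumptions; the function names and the tiny word lists in the usage note are hypothetical, and real studies work with word families or lemmas rather than raw word forms.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, known_words):
    """Percentage of running words that belong to `known_words`."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return 100.0 * known / len(tokens)


def cumulative_coverage(tokens, word_lists):
    """Coverage after adding each list in turn.

    `word_lists` is a sequence of (name, set_of_words) pairs, for instance
    a core list, then an academic list, then a technical list.
    """
    seen = set()
    results = []
    for name, words in word_lists:
        seen |= words
        results.append((name, coverage(tokens, seen)))
    return results


def meets_threshold(tokens, known_words, threshold=95.0):
    """True if coverage reaches the threshold (95 per cent by default)."""
    return coverage(tokens, known_words) >= threshold
```

For example, `cumulative_coverage(tokenize(text), [("core", core), ("academic", academic), ("technical", technical)])` shows how much each additional list contributes to the coverage of a text, and `meets_threshold` checks the result against the 95 per cent figure cited above.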
1.1.1 Core vocabulary
A core (or basic or nuclear) vocabulary consists of words that are of high frequency in most uses of the language. It comprises the most useful function words (e.g. a, about, be, by, do, he, I, some and to) and content words like bag, lesson, person, put and suggest. Stubbs describes nuclear words as an essential common core of ‘pragmatically neutral words’ (1986: 104) and lists five main reasons for their pragmatic neutrality:
1 Nuclear words have a ‘purely conceptual, cognitive, logical or propositional meaning, with no necessary attitudinal, emotional or evaluative connotations’ (ibid.).
2 They have no cultural or geographical associations.
3 They give no indication of the field of discourse from which a text is taken, i.e. its domain of experience and social settings.
4 They are also neutral with respect to tenor and mode of discourse: they are not restricted to formal or informal usage or to a specific medium of communication, e.g. written or spoken language.
5 They are used in preference to non-nuclear words in summarizing tasks.
The best-known list of core words is West’s (1953) General Service List of English Words (GSL), which was created from a five-million word corpus of written English and contains around 2,000 word families. Percentage figures are given for different word meanings and parts of speech of each headword. In a variety of studies, the GSL provided coverage of up to 92 per cent of fiction texts (e.g. Hirsh and Nation, 1992), and up to 76 per cent of academic texts (Coxhead, 2000). Next to frequency and coverage, other criteria such as learning ease, necessity and style were also used in making the selection (West 1953: ix–x). West also wanted the list to include words that are often used in the classroom or that would be useful for understanding definitions of vocabulary outside the list. The GSL has had a wide influence for many years and served as a resource for writing graded readers and other material.
A number of criticisms have, however, been levelled at the GSL, most particularly at its coverage and age. Engels (1968) criticized the low coverage of the second 1,000 word families. While the first 1,000 word families covered between 68 and 74 per cent of the words in the ten texts of 1,000 running words he analysed, the second set of word families in the GSL provided coverage of less than 10 per cent. In addition, because of changes in the English language and culture, the GSL includes many words that are considered to be of limited utility today (e.g. crown, coal, ornament and vessel) but does not contain very common words such as computer, astronaut and television (see Nation and Hwang, 1995: 35–6; Leech et al., 2001: ix–x; Carter, 1998: 207). However, several researchers have pointed out that, for educational purposes, it still remains the best of the available lists because of ‘its information on frequency of each word’s various meanings, and West’s careful application of criteria other than frequency and range’ (Nation and Waring, 1997: 13).
1.1.2 Academic vocabulary
A number of academic word lists have been compiled to meet the specific vocabulary needs of students in higher education settings (e.g. Campion and Elley, 1971; Praninskas, 1972; Lynn, 1973; Ghadessy, 1979; Xue and Nation, 1984). The Academic Word List (Coxhead, 2000) is the most widely used today in language teaching, testing and the development of pedagogical material. It is now included in vocabulary textbooks (e.g. Schmitt and Schmitt, 2005; Huntley, 2006), vocabulary tests (e.g. Schmitt et al., 2001), computer-assisted language learning (CALL) materials, and dictionaries (e.g. Major, 2006).
The Academic Word List (AWL) was created from a corpus of 414 academic texts by more than 400 authors and totals around 3.5 million words. The Academic Corpus includes journal articles, chapters from university textbooks and laboratory manuals. It is divided into four sub-corpora of approximately 875,000 words representing broad academic disciplines: arts, commerce, law and science. Each sub-corpus is further subdivided into seven subject areas as shown in Table 1.1.
Like the General Service List, the Academic Word List is made up of word families. Each family consists of a headword and its closely related affixed forms according to Level 6 of Bauer and Nation’s (1993) scale, which includes all the inflections and the most frequent and productive derivational affixes. For example, the words presumably, presume, presumed, presumes, presuming, presumption, presumptions and presumptuous are all members of the same family.
Table 1.1 Composition of the Academic Corpus (Coxhead 2000: 220)
Discipline  Running words  Texts  Subject areas
Arts        883,214        122    education; history; linguistics; philosophy; politics; psychology; sociology
Commerce    879,547        107    accounting; economics; finance; industrial relations; management; marketing; public policy
Law         874,723        72     constitutional law; criminal law; family law and medico-legal; international law; pure commercial law; quasi-commercial law; rights and remedies
Science     875,846        113    biology; chemistry; computer science; geography; geology; mathematics; physics

Coxhead (2000) selected word families to be included in the AWL on the basis of three criteria:
1. Specialized occurrence: a word family could not be in the first 2,000 most frequent words of English as listed in West’s (1953) General Service List.
2. Range: a word family had to occur in all four academic disciplines with a frequency of at least 10 in each sub-corpus and in 15 or more of the 28 subject areas.
[...] word families included in Sublist 1 are headed by the word forms analyse, benefit, context, environment, formula, issue, labour, research, significant and vary. Examples of the least frequent word families in Sublist 10 are assemble, colleague, depress, enormous, likewise, persist and undergo.
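The specialized-occurrence and range criteria can be expressed as simple checks on frequency counts. The following is a minimal sketch and not Coxhead's actual procedure: the function names, the toy counts and the subject-area labels are hypothetical, and only the thresholds (at least 10 occurrences in each of the four disciplines, at least 15 subject areas) are taken from the description above.

```python
from typing import Dict, Set

# The four disciplines of the Academic Corpus, as named in the text.
DISCIPLINES = ("arts", "commerce", "law", "science")


def meets_range(freq_by_discipline: Dict[str, int],
                subject_areas: Set[str],
                min_per_discipline: int = 10,
                min_subject_areas: int = 15) -> bool:
    """Range criterion: frequent enough in every discipline and attested
    in enough subject areas."""
    in_all_disciplines = all(
        freq_by_discipline.get(d, 0) >= min_per_discipline for d in DISCIPLINES
    )
    return in_all_disciplines and len(subject_areas) >= min_subject_areas


def is_candidate(headword: str,
                 gsl_headwords: Set[str],
                 freq_by_discipline: Dict[str, int],
                 subject_areas: Set[str]) -> bool:
    """Specialized occurrence (not in the GSL) combined with range."""
    return headword not in gsl_headwords and meets_range(
        freq_by_discipline, subject_areas
    )


# Hypothetical toy data, for illustration only.
gsl = {"use", "show", "find"}
counts = {"arts": 40, "commerce": 55, "law": 12, "science": 80}
areas = {f"area_{i}" for i in range(18)}  # attested in 18 subject areas

print(is_candidate("analyse", gsl, counts, areas))  # True
print(is_candidate("use", gsl, counts, areas))      # False
```

Any further selection criterion could be added as another predicate alongside `meets_range`.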
Academic words are likely to be problematic for native as well as non-native students as a large proportion of them are Graeco-Latin in origin and refer to abstract ideas and processes, thus introducing additional propositional density to a text (cf. Corson, 1997). Scarcella and Zimmerman (2005: 127) have also shown that mastery of derivative forms makes academic words particularly difficult for foreign language learners who often fail to analyse the different parts of complex words.
1.1.3 Technical terms
Domain-specific or technical terms are words whose meaning requires scientific knowledge. They are typically characterized by semantic specialization, resistance to semantic change and absence of exact synonyms (cf. Mudraya, 2006: 238–9). As explained by Nation (2001: 203), some practitioners consider that it is not the English teacher’s job to teach technical terms. These words are best learned through the study of the body of knowledge that they are attached to. Language teachers are not specialists in chemistry, computer science, law or economics and may have a great deal of difficulty with technical words. By contrast, learners who specialize in the field may have little difficulty in understanding these words (Strevens, 1973: 228).
Since technical terms are highly subject-specific, it is possible to identify them on the basis of their frequencies of occurrence, range and distribution (see Section 2.3) and to use them as a way of characterizing text types (Yang, 1986). Technical terms occur with very high or at least moderate frequency within a very limited range of texts (Nation and Hwang, 1995). In biology, for example, we find words such as alleles, genotype, chromatid, cytoplasm and abiotic. These words are very unlikely to occur in texts from other disciplines or subject areas. Technical vocabulary is difficult to quantify. According to Coxhead and Nation (2001), technical dictionaries contain probably 1,000 headwords or less per subject area. Research suggests that knowledge of domain-specific or technical terms allows learners to understand an additional 5 per cent of academic texts in a specific discipline.
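The idea that technical terms combine high frequency with a very narrow range can be illustrated with a small sketch. It is not a method taken from the studies cited above: the function name, the thresholds (`min_freq`, `max_range`) and the miniature corpus are all hypothetical, and range is simplified here to the number of disciplines in which a word occurs at least once.

```python
from collections import Counter
from typing import Dict, List, Set


def candidate_technical_terms(corpus: Dict[str, List[str]],
                              min_freq: int = 3,
                              max_range: int = 1) -> Dict[str, Set[str]]:
    """Return, per discipline, words that are frequent there but occur in
    at most `max_range` disciplines overall."""
    counts = {name: Counter(tokens) for name, tokens in corpus.items()}
    # Range: number of disciplines in which each word occurs at least once.
    word_range = Counter(word for c in counts.values() for word in c)
    return {
        name: {w for w, n in c.items()
               if n >= min_freq and word_range[w] <= max_range}
        for name, c in counts.items()
    }


# Hypothetical miniature corpus (tokens already lower-cased).
corpus = {
    "biology": "genotype result genotype cytoplasm genotype result result".split(),
    "law": "result statute result statute result statute".split(),
}
print(candidate_technical_terms(corpus))
```

In this toy corpus, result is frequent in both disciplines and is therefore excluded, while cytoplasm has a narrow range but falls below the frequency threshold. Any such threshold is of course an arbitrary cut-off.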
1.1.4 Fuzzy vocabulary categories
Although core words, academic words and technical terms are described as if they were clearly separable, the boundaries between them are fuzzy (cf. Yang, 1986; Mudraya, 2006; Beheydt, 2005). As Nation and Hwang remark, ‘any division is based on an arbitrary decision on what numbers represent high, moderate or low frequency, or wide or narrow range, because vocabulary frequency, coverage and range figures for any text or group of texts occur along a continuum’ (1995: 37). Chung and Nation (2003) investigate what kinds of words make up technical vocabulary in anatomy and applied linguistics texts. They classify technical terms on a four-level scale designed to measure the strength of the relationship of a word to a particular specialized field. Results for vocabulary in anatomy texts are given in Table 1.2. Chung and Nation consider items at Steps 3 and 4 to be technical terms, but not items at Steps 1 and 2. A large proportion of technical words belong to the 2,000 most frequent word families of English as given in the GSL or to the AWL. In the anatomy texts, 16.3 per cent of the word types at Step 3 are from the GSL or AWL (e.g. cage, chest, neck, shoulder). This increases to 50.5 per cent in the applied linguistics texts (e.g. acquisition, input, interaction, meaning, review). A major result of this study is that a word can only be described as general service, academic or technical in context.
Table 1.2 Chung and Nation’s (2003: 105) rating scale for finding technical terms, as applied to the field of anatomy
Step 1
Words such as function words that have a meaning that has no particular relationship with the field of anatomy, that is, words independent of the subject matter. Examples are: the, is, between, it, by, adjacent, amounts, common, commonly, directly, constantly, early and especially.
Step 2
Words that have a meaning that is minimally related to the field of anatomy in that they describe the positions, movements, or features of the body. Examples are: superior, part, forms, pairs, structures, surrounds, supports, associated, lodges, protects.
Step 3
Words that have a meaning that is closely related to the field of anatomy. They refer to parts, structures and functions of the body, such as the regions of the body and systems of the body. Such words are also used in general language. The words may have some restrictions of usage depending on the subject field. Examples are: chest, trunk, neck, abdomen, ribs, breast, cage, cavity, shoulder, skin, muscles, wall, heart, lungs, organs, liver, bony, abdominal, breathing. Words in this category may be technical terms in a specific field like anatomy and yet may occur with the same meaning in other fields where they are not technical terms.
Step 4
Words that have a specific meaning to the field of anatomy and are not likely to be used in general language. They refer to structures and functions of the body. These words have clear restrictions of usage depending on the subject field. Examples are: thorax, sternum, costal, vertebrae, pectoral, fascia, trachea, mammary, periosteum, hematopoietic, pectoralis, viscera, intervertebral, demifacets, pedicle.
Similarly, it has been shown that the GSL contains words that appear with particularly high range and frequency in academic texts (e.g. example, reason, argument, result, use, find, show) (cf. Martínez et al., 2009: 192). These words may be used differently in academic discourse. For example, Partington (1998: 98) has shown that a claim in academic or argumentative texts is not the same as in news reporting or a legal report. On the other hand, the AWL includes words that are extremely common outside academia (e.g. adult, drama, sex, tape) (Paquot, 2007a). Hancioğlu et al. argue that ‘the assumption that any high frequency word outside the GSL coverage in the academic corpus would be a de facto academic item perhaps accounts for the distinctly “un-academic” texture of some of the items on the list’ (Hancioğlu et al., 2008: 462). They also comment that the fact that ‘items such as study appear in the GSL (but not in the AWL) and items such as drama in the AWL (but not in the GSL), suggests that the division of vocabulary into mutually exclusive lists is likely to be an activity that for all its initial convenience may prove inherently problematic in the long run’ (ibid.: 463).
Originating from research on vocabulary needs for reading comprehension and text coverage, the division between core words and academic words is very practical for assessing text difficulty and targeting words that are worthy of explanation when reading an academic text in the classroom. Most English for Academic Purposes (EAP) students recognize core words but are not familiar with the meaning of academic words such as amend, concept, implement, normalize, panel, policy, principle and rationalize, which are not very common in everyday English. These words are, however, relatively frequent in academic texts and students will most probably encounter them quite often while reading. They should therefore be the focus of an academic reading course.
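A rough lexical profile of a text shows how such a division can be used to assess difficulty and to pick out words worth explaining. The sketch below is only illustrative: the tiny word sets stand in for the GSL and the AWL, they contain word forms and not full word families, and the tokenizer is a deliberately crude regular expression.

```python
import re
from collections import Counter
from typing import Dict, Set


def lexical_profile(text: str, core: Set[str], academic: Set[str]) -> Dict[str, float]:
    """Share of running words (tokens) that fall in each band."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {"core": 0.0, "academic": 0.0, "off-list": 0.0}
    bands = Counter(
        "core" if t in core else "academic" if t in academic else "off-list"
        for t in tokens
    )
    return {band: bands[band] / len(tokens)
            for band in ("core", "academic", "off-list")}


# Hypothetical stand-ins for the two lists.
core_words = {"the", "of", "a", "is", "new", "was"}
academic_words = {"policy", "concept", "implement"}

profile = lexical_profile("The concept of a new policy was sound.",
                          core_words, academic_words)
print(profile)  # {'core': 0.625, 'academic': 0.25, 'off-list': 0.125}
```

The tokens that land in the academic band are the ones a reading course would single out for explanation.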
The division of vocabulary into three mutually exclusive lists becomes problematic, however, when it is transposed to academic writing courses and the need arises to distinguish between knowing a word for receptive and productive purposes. As early as 1937, West argued that ‘both as regards Selection and still more as regards detailed Itemization, there is a need of a divorce between receptive and productive work’ (West, 1937: 437) and regretted that teachers were giving
composite lessons aiming at teaching reading and speaking simultaneously, whereas reading and speaking are the Hare and the Tortoise. Reading and speech bear the same relation to each other as musical appreciation and actual execution on the piano. The one is Recognition of a lot; the other is Skill in using a little. (ibid.)
Learning vocabulary for productive purposes has been found to be much more difficult than learning for receptive uses. Knowing a word productively involves, for example, being able to pronounce and/or spell it correctly, produce it to express the intended meaning in the appropriate context, and use it with words that commonly occur with it (Nation, 2001: 27–8). Selection is thus a key issue in teaching vocabulary for academic writing and speaking. It is questionable whether all the words from the AWL should be the focus of productive learning. And yet this strategy lies at the heart of several recent textbooks (e.g. Schmitt and Schmitt, 2005; Huntley, 2006) and CALL materials (see, e.g., Gillett’s website about vocabulary in EAP <http://www.uefap.com/vocab/vocfram.htm>; Luton’s Exercises for the Academic Word List <http://www.academicvocabularyexercises.com> and Haywood’s AWL Gapmaker <).
Several scholars have suggested replacing separate lists of general service words, academic vocabulary and technical terms by a single list, either a more specialized list or a larger common core vocabulary. Ward (1999), for example, built an engineering word list of 2,000 word families which contains both technical terms and all the general words necessary for reading comprehension, and showed that it provides 95 per cent coverage of many basic engineering texts (see also Mudraya, 2006). Others, by contrast, have tried to revise the General Service List, to ensure maximum utility for any learner, regardless of specialization. Billuroğlu and Neufeld (2007) combined into one list all the words from: (1) the GSL, (2) the AWL, (3) the first 2,000 words of the Brown corpus, (4) the first 5,000 words of the British National Corpus, (5) the revised version of the GSL, (6) the Longman Wordwise of commonly used words and (7) the Longman Defining Vocabulary. The resulting Billuroğlu-Neufeld-List (BNL) consists of 2,709 word families categorized according to the number of lists in which they were represented. This procedure led to the emergence of only 176 word families that were not in either the GSL or the AWL, thus confirming that ‘if the GSL was enlarged by even a relatively small degree, [...] much of the AWL would be absorbed into it’ (Hancioğlu et al., 2008: 466). See Stein (2008) for a similar approach.
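Categorizing word families by the number of lists in which they are represented amounts to a simple tally. The sketch below illustrates that idea only. It is not Billuroğlu and Neufeld's implementation, and the four miniature lists and their labels are hypothetical stand-ins for the seven sources.

```python
from collections import Counter
from typing import Dict, Set


def merge_lists(lists: Dict[str, Set[str]]) -> Counter:
    """Count, for each headword, how many source lists contain it."""
    return Counter(word for members in lists.values() for word in members)


# Hypothetical miniature lists standing in for the seven sources.
lists = {
    "GSL": {"use", "study", "result"},
    "AWL": {"analyse", "drama", "policy"},
    "BNC_top": {"use", "study", "policy", "computer"},
    "Brown_top": {"use", "result", "computer"},
}
membership = merge_lists(lists)

# Headwords contributed only by sources other than the GSL and the AWL.
outside_gsl_awl = set(membership) - lists["GSL"] - lists["AWL"]

print(membership["use"])  # 3
print(outside_gsl_awl)    # {'computer'}
```

Sorting the tally by count gives a ranking from the most widely shared families to those attested in a single source.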
A final criticism that can be levelled at the AWL is related to the notion of a word family. The AWL, as well as most word lists for learners of English, groups words into families. Other examples include the GSL, the University Word List (Xue and Nation, 1984) and recent domain-specific lists such as those developed by Ward (1999) and Mudraya (2006). Coxhead (2000: 218) argues that this practice is supported by psycholinguistic evidence suggesting that morphological relations between words are represented in the mental lexicon. This may well be true and may justify the use of word families for receptive purposes. However, not all members of a word family are likely to be equally helpful in academic writing. For example, under the headword item, which has a relative frequency of 134.29 occurrences per million words in the academic part of the British National Corpus (see Section 3.3), we find the noun itemisation and word forms of the verb itemise. However, these two lemmas are quite rare in academic writing, with relative frequencies of 0.06 and 1.17 occurrences per million words respectively. A related problem is that parts-of-speech are not differentiated. Table 1.3 shows several word families taken from the AWL: the only information provided is that the words in italics are the most frequent form of their family. This, however, does not tell us whether the word forms issue and issues (under the headword issue) are more often used as nouns or verbs in EAP.
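Relative frequencies of this kind are obtained by scaling a raw count to a corpus of one million words, and they make it easy to see how unevenly a family's frequency is distributed over its members. In the sketch below, the raw count and corpus size passed to `per_million` are hypothetical, whereas the three relative frequencies are the ones quoted above for item, itemisation and itemise.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Relative frequency: occurrences per million running words."""
    return raw_count * 1_000_000 / corpus_size


# Hypothetical raw count and corpus size, for illustration only.
print(per_million(50, 2_000_000))  # 25.0

# Relative frequencies quoted in the text (per million words).
family = {"item": 134.29, "itemisation": 0.06, "itemise": 1.17}
total = sum(family.values())
share = {lemma: freq / total for lemma, freq in family.items()}

# Proportion of the three lemmas' combined frequency due to 'item'.
print(round(share["item"], 3))
```

On these figures the lemma item accounts for roughly 99 per cent of the combined frequency, so itemisation and itemise contribute very little.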
Table 1.3 Word families in the AWL
itemisation, itemise, itemised, itemises, itemising, items
stressed, stresses, stressful, stressing, unstressed
utilisation, utilised, utilises, utilising, utiliser, utilisers, utility, utilities, utilization, utilize, utilized, utilizes, utilizing

1.2 Academic vocabulary and sub-technical vocabulary
Like Coxhead (2000), Nation (2001: 187–96) uses the term ‘academic vocabulary’ to refer to words that are not in the top 2,000 words of English but which occur reasonably frequently in a wide range of academic texts. Unlike Coxhead, however, he also uses it to label a whole set of lexical items also known as ‘sub-technical vocabulary’ (Cowan 1974; Yang, 1986; Baker, 1988; Mudraya, 2006), ‘semi-technical vocabulary’ (Farrell, 1990), ‘non-technical terms’ (Goodman and Payne, 1981), and ‘specialised non-technical lexis’ (Cohen et al., 1988). However, all these terms have been used quite differently in the literature. Cowan defines sub-technical vocabulary as ‘context independent words which occur with high frequency across disciplines’ and comments that,
Clearly some of what I am calling sub-technical vocabulary would be encompassed in the existing word frequency counts like Thorndike Lorge, Michael West’s General Service List and the recent one million word computer analysis by Henry Kučera and Nelson Francis. (Cowan, 1974: 391)
Cowan’s definition of sub-technical vocabulary applies to those words that have the same meaning in several disciplines. Trimble (1985) extends Cowan’s (1974) usage to include ‘those words that have one or more “general” English meanings and which in technical contexts take on extended meanings’ (Trimble 1985: 129). Trimble’s definition thus encompasses words such as junction, circuit, wage and cage that would be categorized as technical terms according to Chung and Nation’s (2003) four-level rating scale of technicality or field-specificity (see Table 1.2) (see also Farrell, 1990: 37).
Cohen et al. (1988) regard the extended meanings of what they call ‘non-technical’ words as a major area of difficulty for non-native readers who may only be aware of one of their meanings. In biology, for example, the adjective specific may also be used with reference to the genetic notion of specificity, which is a characteristic of enzymes. A second area of difficulty arises because non-technical words may be used in contextual paraphrases to refer to the same concept (e.g. repair processes and repair mechanism in a genetics text), thus causing problems of lexical cohesion at the level of synonymy. Cohen et al. (1988) identify a subset of non-technical vocabulary as a third area of difficulty, viz. ‘specialized non-technical lexis’. They do not offer a precise definition of the term, but explain that this lexis includes vocabulary items indicating, for example, time sequence, measurement, or truth validity. They show that a large proportion of vocabulary items which indicate time sequence or frequency in a genetics text are unknown to their informants (e.g. ensuing, alternatively, consecutively, intermittently, subsequent and successive).
In Li and Pemberton’s (1994) view, sub-technical vocabulary as defined by Trimble (1985) is an important subset of academic vocabulary. They showed that first-year computer science students are better able to recognize the technical meanings of sub-technical words than their non-technical meanings. For example, they are quite familiar with the technical meaning of the verb compile in computer science and tend to interpret it as ‘convert or translate a language into a machine code’ or ‘translate’ regardless of the context in which the word occurs. This is problematic as the non-technical meaning of a sub-technical word is often more common than its technical meaning (see Mudraya, 2006). For example, the word solution is more frequently used in its non-technical sense in engineering textbooks, even in a chemical engineering thermodynamics textbook.
Baker (1988) has argued that this middle area between core and technical vocabulary is itself made up of several different types of vocabulary:
1. Items which express notions shared by all or several specialized disciplines. Examples include factor, method and function.
2. Items which have a specialized meaning in a particular field, in addition to a different meaning in general language (e.g. bug in computer science, solution in mathematics and chemistry).
3. Items which are not used in general language but which have different technical meanings in different disciplines (e.g. morphological in linguistics, botany and biology).
4. General language items which have restricted meanings in one or more disciplines. In botany, ‘genes which are expressed have observable effects, i.e. are more apparent physically, as opposed to being masked. Expressed in botany is therefore not associated with emotional or verbal behaviour as is the case in general language’ (Baker, 1988: 92).
5. General language items which are used, in preference to other semantically equivalent items, to describe or comment on technical processes and functions. For example, an examination of biology textbooks showed that photosynthesis does not happen but takes place or occasionally occurs. Baker thus comments that take place and occur can be regarded as sub-technical words.
6. Items which are used in academic texts to perform specific rhetorical functions. These are ‘items which signal the writer’s intentions or his evaluation of the material presented’ (Baker, 1988: 92).
Martin uses the term academic vocabulary as a synonym for sub-technical vocabulary to refer to words that ‘have in common a focus on research, analysis and evaluation – those activities which characterize academic work’ (1976: 92). The vocabulary of the research process consists primarily of verbs, nouns and their co-occurrences (e.g. state the hypothesis and expected results; present the methodology; plan or design the experiment; develop a model).
The vocabulary of analysis includes high-frequency verbs and two-word verbs that are ‘often overlooked in teaching English to foreign students but