A Corpus-based Study on Collocations of Keywords in English Business Articles on the European Debt Crisis
Đào Thị Ngọc Nguyên
Trường Đại học Ngoại ngữ (University of Foreign Languages)
Master's thesis, major: English Language; Code: 60 22 15
Supervisor: Dr. Phạm Thị Thanh Thủy
Year of defense: 2012
Abstract: One of the most problematic areas in dealing with vocabulary is collocation. It is often seen as arbitrary and overwhelming, a seemingly insurmountable obstacle to the attainment of native-like fluency. This work presents a study of the collocations of keywords within a 20,000-word corpus of English business articles about the 2011 European debt crisis. The aim of the study is to identify the high-frequency words used within the corpus and, above all, to examine the collocation patterns of the keywords that distinguish the business genre of the selected texts. Concordance Program 3.3 is the main tool employed throughout the study for data collection and analysis. The major findings are a good number of striking collocation patterns that some of the most recurrent keywords possess. These findings form the basis for pedagogical implications and for suggestions on raising students' awareness of English collocation acquisition.
Keywords: Language; English; Vocabulary
Content
I INTRODUCTION
I.1 Statement of the problem and rationale of the study
However convinced learners of English may be, in principle, of the importance of vocabulary, vocabulary acquisition poses enormous difficulties for them. One of the most complicated problems that arises in dealing with vocabulary is how to combine and use words appropriately in accordance with cultural or linguistic conventions, an ability often referred to as “collocation competence” (Hill, 1999).
Collocations are usually defined as words that typically occur in association with other words; in reality, they run through the whole of the English language and are as old as the language itself. No piece of natural spoken or written English is totally free of collocations. Because of their widespread use, the role that collocations play in the language is undeniable.
For learners of English in general, collocation competence means the ability to combine lexical (and grammatical) chunks in order to produce fluent, accurate, and semantically and stylistically appropriate utterances. For business English learners in particular, a good knowledge of collocation patterns in English is also of great importance. The most important characteristics of business English, as opposed to general English, are a sense of purpose, an intercultural dimension, and a need for clear, straightforward and concise communication (Ellis & Johnson, 1994). To help learners achieve these broad objectives, teachers have to find the best ways to teach business performance skills such as socializing, telephoning, meetings, presentations, and report writing. In all these situations, collocation competence is essential.
With the rise of computing power and the acceptance of corpus linguistics since the 1990s, collocations have received serious treatment. The dramatic rise in the processing power of computers now makes it possible to compile frequency lists for the lexical items of a large corpus quickly. At the same time, a large number of software programs have been developed for extracting keywords and collocations from corpus data. Such software packages have made it easier to investigate the typical lexical items of any particular text genre and their collocations.
Motivated by the writer's personal interest in collocations as a researcher and by observations, made as a tutor of business learners, of students' difficulties in dealing with collocations in business discourse, this thesis presents a study of the collocations of keywords in a variety of business articles written about a topic of current interest to business learners, the European debt crisis. The thesis is carried out in the hope that it may be of some help to business learners of English as well as to those interested in English semantics and collocation-related issues.
I.2 Aims of the study
The aim of this research is to conduct a close investigation into the collocations of keywords in a corpus of business articles written about the European debt crisis. Specifically, it identifies words with a high frequency of occurrence within the chosen corpus and examines their collocations. The research is therefore carried out to answer the following research questions:
What are the top high-frequency words in the corpus of written articles about the European debt crisis?
What are significant patterns and features of collocations of such keywords?
I.3 Scope of the study
This study discusses keywords and their collocations in 15 written articles about the European debt crisis. The corpus of over 20,000 words is drawn from online business articles on reputable websites such as The Washington Post and CNNmoney.com, among others. The keywords chosen for the analysis of significant collocation patterns are those that distinguish the business genre of the selected articles.
I.4 Structure of the thesis
The study is organized as follows. Chapter I, Introduction, briefly states the rationale, aims, scope and organization of the study. Chapter II, Literature Review, deals with the literature that sets the background for the study. Chapter III, Research Methodology, presents the research design, data collection procedures and analytical framework of the study. Chapter IV, Results and Discussion, gives a detailed discussion of the collocations of keywords in the selected corpus. Chapter V, Conclusion, presents the major findings of the study together with pedagogical implications and suggestions.
II LITERATURE REVIEW
II.1 Corpus linguistics
Corpus linguistics (hereafter CL) deals with the principles and practice of using corpora in language study. As a branch of linguistics, it differs from traditional linguistics in that it studies authentic examples of language (Sinclair, 1997). The main focus of CL is to discover patterns of authentic language use in order to verify hypotheses about language, for example, to determine how the usage of a particular sound, word, or syntactic construction varies. This, in turn, allows learners and researchers to ascertain the linguistic patterns and structures relevant to the goals of their research.
II.2 Sense and sense relations
In Nguyen Hoa's words (2000:56), "sense is a philosophical term for meaning". Meaning and sense are closely related; however, sense is sometimes distinguished from meaning. The meaning of a word is seen as part of the language system, whereas sense is the realization of this meaning in speech. According to John Lyons (1995:80), the sense of an expression may be defined as the set, or network, of sense relations that hold between it and other expressions of the same language.
Sense relations are the relationships between vocabulary items when they are arranged in texts, spoken or written: how they are related to one another in terms of meaning, how they may or may not substitute for one another, how similar or different they are, and so on.
II.3 Transference of meaning
In English, there are basically two types of meaning transference, namely metaphor and metonymy.
II.3.1 Metaphor
According to Nguyen Hoa (2004:105), "metaphor is the transference of meaning from one object to another based on the similarity between these two objects". Traditionally, metaphors have been viewed as implicit comparisons.
II.3.2 Metonymy
According to Nguyen Hoa (2004:112), metonymy can be defined as "the substitution of one word for another with which it is associated". Thus, metonymy works by contiguity rather than similarity: instead of the name of one object or notion, we use the name of another because the two are associated or closely related.
According to Lyons (1995:314), body parts are favourite sources of metonymy, and many such expressions have been incorporated into the language, with words like hand, heart, head, as in have a hand in, bare one's heart, or keep your head.
II.3.3 Other types of meaning transference
Besides metaphor and metonymy, there are other types of meaning transference, including hyperbole, litotes, irony, and euphemism.
II.4 Collocation
II.4.1 Definition of collocation
Different linguists define collocation differently. Moira Runcie, in the Oxford Collocation Dictionary, gives a general definition in which collocation is the way words combine in a language to produce natural-sounding speech and writing. To a native speaker, these combinations are highly predictable; to a learner, they are anything but. More specifically, Chitra Fernando, Richards and others (1996:62) state that collocation refers to the restrictions on how words can be used together, for example which prepositions are used with particular verbs, or which verbs and nouns are used together. In Kjellmer (1994:xiv & xxxiii), collocation is "such recurring sequences of items as are grammatically well formed". Kathleen R. McKeown and Dragomir R. Radev, in their paper on collocations, regard collocations as word pairs and phrases that are commonly used in language with no general syntactic or semantic rules applied.
Additionally, many linguists have tried to define collocation by describing its functions. Halliday (1966) and Sinclair (1966) introduced the notion that patterns of collocation can form the basis for a lexical analysis of language that is alternative to, and independent of, grammatical analysis. They regarded the two levels of analysis as complementary, with neither being subsumed by the other. Holding a similar view, McIntosh (1961:328) and Mitchell (1971) presented the lexical and grammatical analyses as interdependent: "Collocations are to be studied within grammatical matrices which in turn depend for their recognition on the observation of collocation similarities" (Mitchell, 1971:65). Halliday (1966:151 & 157) also argued that the collocation patterns of lexical items can lead to generalization at the lexical level. Sinclair (1966:412 & 1974:16) proposed that a lexical item can be defined from its collocation pattern.
II.4.2 Properties of collocation
II.4.2.1 Collocation is arbitrary
In the first place, collocation is typically characterized as arbitrary, which means that words are often combined with each other for no particular reason.
II.4.2.2 Collocation is language-specific
Secondly, collocation is language-specific, as its arbitrary nature persists across languages. As Larson (1984:141) points out, every language interprets the physical world in its own way. For instance, in French, the phrase régler la circulation is used to refer to a policeman who directs traffic, the English collocation. In Russian and German, the direct translation of regulate is used; only in English is direct used in place of regulate. Similarly, American and British English exhibit differences in similar phrases. Thus, in American English one says set the table and make a decision, whereas in British English the corresponding phrases are lay the table and take a decision.
II.4.2.3 Collocation is recurrent in context
While the two properties mentioned above indicate the difficulty of determining what is an acceptable collocation, on the positive side it is clear that collocations occur frequently in similar contexts. It is therefore possible to observe collocations in samples of language. Generally, collocations are those word pairs which occur frequently together in the same environment, but which do not include lexical items that have a high overall frequency in the language. This property has in fact been exploited by many researchers in natural language processing to identify collocations automatically.
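The idea of identifying collocations automatically from recurrence can be illustrated with a small Python sketch. It is not taken from the thesis: the association measure used here (pointwise mutual information, PMI), the function names, the window size and the sample sentence are all assumptions chosen for illustration. PMI rewards pairs that co-occur more often than their individual frequencies would predict, so words with a high overall frequency do not rank highly merely because they are common.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def cooccurrence_pmi(tokens, node, window=4, min_count=2):
    """Rank the words found near `node` by a simplified PMI score.

    `window` is the number of tokens inspected on each side of the node;
    pairs seen fewer than `min_count` times are ignored (both values are
    illustrative assumptions, not settings from the thesis).
    """
    total = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(total, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_freq[tokens[j]] += 1
    scores = {}
    for word, joint in pair_freq.items():
        if joint < min_count:
            continue
        # PMI = log2( P(node, word) / (P(node) * P(word)) )
        scores[word] = math.log2(joint * total / (word_freq[node] * word_freq[word]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Called on a tokenized text with a node word such as "crisis", the function returns candidate collocates in descending order of score; on a corpus as small as a few articles, the scores should be read only as a rough guide.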
II.4.3 Classifications of collocation
By examining a large number of collocates of the same syntactic category, Kathleen R. McKeown and Dragomir R. Radev, in their paper on collocations, identify similarities and differences in their behavior. A distinction is made between grammatical collocations and semantic collocations. In their view, grammatical collocations often contain prepositions, including paired syntactic categories such as verb + preposition, adjective + preposition, and noun + preposition. In these cases, the open-class word is called the base and determines the words it can collocate with, the collocation indicators. Semantic collocations are lexically restricted word pairs, where only a subset of the synonyms of the collocation indicator can be used in the same lexical context.
In the Oxford Collocations Dictionary (Moira Runcie, 2002), collocation is classified in terms of both grammatical pattern and strength of collocation. Firstly, according to grammatical pattern, there are thirteen types of collocation: adjective + noun, quantifier + noun, verb + noun, noun + verb, noun + noun, preposition + noun, noun + preposition, adverb + verb, verb + verb, verb + preposition, verb + adjective, adverb + adjective, and adjective + preposition.
Secondly, according to strength, collocations are categorized into four types: unique collocations, strong collocations, medium-strength collocations, and weak collocations.
III RESEARCH METHODOLOGY
III.1 Data collecting instruments
III.1.1 Construction of Corpus
Since the study is primarily a corpus-based analysis of collocations, its findings come from a linguistic analysis of a substantial number of written articles. The corpus of the study is constructed from 15 articles extracted from the databases described below.
III.1.1.1 Database
The database in this thesis refers to the set of publications from which the articles for analysis were extracted. It consists of the following publications: the New York Times, Washington Post, The Guardian, CNNmoney.com, and Bloomberg.com. These publications were chosen as the database for the study because of their reliability, their reputation for well-known authors, their prestige, and their worldwide readership in the field of economics.
III.1.1.2 Extracted business articles
As mentioned above, 15 articles were extracted from the sample publications with a view to identifying, and then examining, keywords with a high frequency of occurrence and their collocation patterns. The selected articles are all about the European debt crisis in 2011, providing readers with up-to-date features and critical, systematic analysis of crisis-related aspects. The following table summarizes the corpus used in the study, including the databases and the extracted texts. Detailed references for each selected text can be found in the Appendix.
Table 1: List of the selected articles
Database | Information of the Articles (Author, Date of publication, Title) | Average Text Length
Washington Post | Ezra Klein (8 May 2011) Everything You Need to Know about the European Debt Crisis in One Post | 1392 words
 | Louis Cooper (3 Aug 2011) Debt Crisis in Europe: Worries Grow of Spread to Larger Economies of Italy, Spain | 527 words
 | Alex Witt (05 Feb 2011) Debt Crisis Unsettles European Economy | 1024 words
Money.cnn | Ben Rooney (26 Nov 2011) Europe’s Debt Crisis: Five Things You Need to Know | 1420 words
 | Ben Rooney (1 Feb 2011) Europe’s Debt Crisis: Where |
Bloomberg | Simon Johnson (23 Jan 2011) Europe’s Debt Crisis is Still Likely to End Badly |
 | Larry Elliot, Heather Stewards and John Hooper (9 Nov 2011) European Debt Crisis Spiraling Out of Control | 1383 words
 | (9 Aug 2011) Debt Crisis: A Default in Europe Could Benefit Poor Countries |
 | Donna Rogers (02 Dec 2011) An Overview of the European Debt Crisis | 806 words
 | Hannelore Foerster (26 Aug 2011) European Debt Crisis | 5748 words
Total Corpus Length | 21,083 words
III.1.2 Concordance Program
The Concordance Program is a computer program that is helpful to the corpus linguist. It is used to create word lists, count word frequencies, compare different usages of a word, analyze keywords, and find phrases and idioms. It is a general-purpose working tool for the study of text, whether the text is literary, linguistic, historical, philosophical, legal, commercial, political, or of any other kind. In this study, Concordance Program 3.3 was used to search for high-frequency words and their collocations in the corpus of business articles.
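To give a sense of what a concordance looks like, the following Python sketch produces simple keyword-in-context (KWIC) lines. It is only an illustration of the general technique and does not reproduce Concordance Program 3.3; the function name kwic, the context width and the sample sentences are hypothetical.

```python
import re


def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` words on each side of a hit."""
    # Keep hyphenated and apostrophised forms (e.g. full-blown, nation's) as one token.
    tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text, not taken from the thesis corpus.
sample = "The sovereign debt crisis spread. A full-blown crisis followed in Greece."
for line in kwic(sample, "crisis", width=2):
    print(line)
```

Reading the words immediately to the left and right of the bracketed keyword is the manual counterpart of the collocation analysis described in this study.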
III.2 Data collecting procedures
The research was conducted in the following steps. Firstly, articles written about the European debt crisis in 2011 were copied from the websites of the selected newspapers and journals and saved as plain text. Next, the dates, titles, and authors' names were deleted from the text, so that only the bodies of the articles were left for analysis. The corpus was then assembled from the completed plain-text file. Finally, Concordance Program 3.3 was used to investigate the constructed corpus. The results and findings of the research were drawn from the resulting full concordance for analysis.
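The last step, turning the plain-text corpus into a ranked word list, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually performed by Concordance Program 3.3: the function name frequency_list, the tokenizing pattern and the rounding to three decimal places are choices made for the example.

```python
import re
from collections import Counter


def frequency_list(text, top_n=100):
    """Return (rank, word, frequency, percent) rows for the most frequent words."""
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    total = len(tokens)
    rows = []
    for rank, (word, freq) in enumerate(Counter(tokens).most_common(top_n), start=1):
        # Relative frequency is the share of all running words, as a percentage.
        rows.append((rank, word, freq, round(100 * freq / total, 3)))
    return rows
```

Each row mirrors the columns of a typical word-frequency table (rank, word, raw frequency, percentage), so the output can be compared directly with a list produced by a concordancer.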
IV RESULTS AND DISCUSSION
The data of the study are interpreted in the following steps. To begin with, the corpus is analysed using Concordance Program 3.3, available at www.concordancesoftware.co.uk. To obtain the quantitative results, the 100 words with the highest percentage of occurrence are listed in tables with their rank and relative frequency. Of those 100 lexical items, the top 25 content words are selected, from which keywords are chosen for full analysis. A keyword is one which has unusually high, or low, frequency in comparison with a base reference corpus (Berber Sardinha, 1999) and thus may characterize a text or a genre (Scott, 2009). Within this study, keywords are words that are recurrent and can differentiate the business genre of the chosen articles. However, as this is a corpus-based study of collocations, frequency alone may not be adequate; some measure of collocation strength is also required. Thanks to the relatively small size of the corpus, a close reading of the texts could be undertaken both manually and by computer. Therefore, in the next step, the concordance of the keywords is scanned in order to gain an overview of their collocation patterns. On that basis, the target words for analysis are those with a wide and remarkable range of collocations.
Once the target keywords are identified, an in-depth investigation into the different collocates of the words is carried out. The investigation is diversified, as collocations are examined with regard to every possible semantic and syntactic feature. For example, the various senses of a word in different collocations can be interpreted through careful definition of the phrases it is involved in, through comparison with words that convey the same meaning, or through the researcher's illustrative example sentences or contexts.
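The definition of a keyword as a word with unusually high or low frequency relative to a reference corpus can be made concrete with a keyness statistic. The Python sketch below uses the log-likelihood measure, which is one common choice in corpus linguistics; the thesis does not state which statistic, if any, was applied, so the measure, the function name and the figures in the usage note are assumptions for illustration only.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word across a study and a reference corpus.

    freq_* are the word's raw frequencies; size_* are the corpus sizes in
    running words. A value of 0 means the word is equally frequent, in
    relative terms, in both corpora.
    """
    combined = freq_study + freq_ref
    expected_study = size_study * combined / (size_study + size_ref)
    expected_ref = size_ref * combined / (size_study + size_ref)
    ll = 0.0
    # A zero frequency contributes nothing to the sum (0 * log 0 is taken as 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll
```

For instance, a word occurring 50 times in a 1,000-word study corpus but only 10 times in a 10,000-word reference corpus (hypothetical figures) receives a large positive score, whereas a word with the same relative frequency in both corpora scores 0. The statistic measures only the size of the difference; whether the word is over- or under-used has to be read from the relative frequencies themselves.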
IV.1 Quantitative Results
Research question 1: What are the top high-frequency words in the corpus of written
business articles about the European debt crisis 2011?
Table 2 below lists the frequencies of the first 100 words in the corpus of well over 20,000 words from the 15 selected articles about the debt crisis in Europe in 2011.
Table 2: Top 100 high-frequency words from the constructed corpus
N Word Freq % N Word Freq %
As the table shows, and as in most written English texts, the most frequent items in the corpus are function (or grammatical) words such as the, to, of, and, a, that, for. From the 8th item in the word list onwards, the key (or content) words that distinguish the business genre of the corpus start to appear. Among these are debt, European, Greece, crisis, Euro, countries, financial, bailout and so on. Table 3 below shows the first 25 keywords from the high-frequency word list of the corpus.
Table 3: First 25 keywords from the corpus
N Word Freq % N Word Freq %
The top 25 keywords from the corpus, as shown in Table 3, are notable for the large number of geographical names for the zone and countries in which the debt crisis occurred, reflecting the fact that Greece, Italy, and Spain were among the countries hit hardest by the crisis.
IV.2 Collocation analysis of content keywords
Research question 2: What are significant patterns of collocations of the content keywords?
IV.2.1 DEBT and CRISIS
DEBT and CRISIS are the highest-frequency content words of the business genre among all the words in the selected articles, with frequencies of 179 (0.852%) and 96 (0.457%) respectively. Both words take a wide range of collocates within the corpus. They are analysed together in the same section because they frequently occur together throughout the articles and their collocates share a good number of common features.
At first glance, it is noteworthy that almost all of the adjectives shown in Table 4 below are used attributively within the corpus, coming before the noun CRISIS that they modify (only continuing, looming, imminent, unshakable are excluded). Semantically, these are general adjectives susceptible to objective measure, since they describe the existence and development of the debt crisis. This should be a signal feature of the corpus genre: objectivity and concision must be guaranteed in the provision of factual information in business articles so that the issues are reported accurately.
Table 4: Adjectives collocating with CRISIS
financial immediate looming possible
ongoing underlying economic sovereign
mounting full-blown continuing growing
potential imminent slow-moving unshakable
full-on European
Adjectives in combination with DEBT, as shown in Table 5, on the other hand, are remarkable for the predominance of words indicating the 'debt owner', such as European, Greek, Italian, Irish, or nation's and country's (nouns in the possessive case functioning as adjectives). This tells readers which countries suffered the most in the stories told.
Table 5: Adjectives collocating with DEBT
existing sustainable nation's country's
Spanish Jamaica's French
Both groups of adjectival collocations are also distinctive for a significant number of -ing adjectives from the same word families as verbs describing trends, which indicate the status of the debt crisis at the time of writing. Some examples are continuing, looming, rising, growing, ongoing and so on.
While most of the adjectives in Table 4 and Table 5 are widely used, some of them deserve particular attention when in collocation with CRISIS and DEBT, such as immediate, unshakable, or bad, since they may convey meanings that cause confusion among learners.
A look at the nominal collocations of CRISIS in Table 8 and DEBT in Table 9 above reveals various compound nouns formed with these words in the corpus, including debt crisis, future crisis, crisis management, crisis victims, debt burden, debt payment, housing debt, public debt and so on.