VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF POST-GRADUATE
Triệu Tuấn Anh
Title: A corpus-based analysis of the collocates of the word
“homeland” in the 1990s, 2000s and 2010s.
(A study of the collocates of the word “homeland” across the 1990s, 2000s and 2010s on the basis of corpus linguistics)
Major: English Linguistics
Code: 60.22.15
Supervisor: Assoc. Prof. Tran Xuan Diep
Hanoi, Sep 2013
My sincere thanks also go to all the lecturers at University of Foreign Languages and International Studies, Vietnam National University (Hanoi), who helped me build up a solid background in theory and research methods through their invaluable lessons.
Furthermore, I wish to express my thanks to all of my colleagues at the Faculty of English, Hanoi National University of Education, for supporting me in my job during the time I took this course. Without their support, I could not have fully concentrated on my study.
This study describes a corpus-based analysis of the collocates of the word “homeland”. The data for the analysis were taken from two popular corpora, the Corpus of Contemporary American English and the Time Magazine Corpus. The analysis shows how frequently the word is used and how similarly or differently it is used over periods of time. Both qualitative and quantitative methods were employed to analyze the data. Two significant conclusions were drawn: (1) There was a great increase in the use of the word “homeland” over the 1990s, 2000s and 2010s, and this upward trend appears to be continuing in the current decade. (2) There was a shift in the use of the word “homeland”. In the 1990s it was used almost exclusively as a noun referring to the geographic space associated with a particular group, whereas in the 2000s and 2010s it was mainly used as a noun or adjective modifying the word “security” or referring to a government department.
TABLE OF CONTENTS
Part I: Introduction
1 Rationale of the study
2 Objectives of the study
3 Scope of the study
4 Design of the study
Part II: Development
Chapter I: Literature review
Chapter II: Theoretical background
Chapter III: Methodology
1 Subjects of the study
2 Research methodology
3 Data collection instrument
Chapter IV: Findings and Discussion
1 The frequency of the use of the word “homeland”
2 The meanings of the word “homeland”
PART I:
INTRODUCTION
I Rationale of the study
It is clear that the homeland is a common topic in people’s conversations; in particular, it is an endless source of inspiration for authors and writers. Although one may travel or have to live in different places all over the world, the homeland still plays an important part in one’s life and influences one’s appearance and character, and it is the place where one often feels most comfortable. This may explain why it is commonly said that “One’s homeland is even greater than the heaven.”
To Americans, the homeland is of great importance because Americans have different values from people from other countries, and they seem to be proud of their country. This importance is normally expressed through language. Language, in turn, is recorded in corpora, where a significant number of written and spoken texts in various forms are stored electronically. Studying the linguistic features of texts discloses the writers’ and speakers’ intentions. The Corpus of Contemporary American English and the Time Magazine Corpus are among the biggest corpora covering authentic language use in different fields and over time. These two corpora can reveal how the language has been used and how it has changed over periods of time.
In linguistic research, a huge number of works use a corpus-based approach; nonetheless, only a few scientific studies related to homeland have been conducted. Consequently, this study, employing authentic texts and exploring the topic “homeland”, is carried out with the aim of filling that gap.
Another important factor is that corpus linguistics, despite its long history, is quite new to me and has attracted my attention. Many problems can be dealt with using a corpus-based approach, one of which is collocation analysis. This personal interest, together with the reasons stated above, has inspired me to conduct this paper, entitled “A corpus-based analysis of the collocates of the word “homeland” in the 1990s, 2000s and 2010s”. The corpora used for the analysis are the Corpus of Contemporary American English (COCA) and the Time Magazine Corpus.
II Objectives of the study
The study is carried out with two purposes. The first is to explore the words which collocate most frequently with the target word “homeland” over three decades in the COCA and the Time Magazine Corpus. The second is to find out whether the use of that word changed or remained unchanged over those three decades.
In order to achieve the objectives, two specific research questions were raised:
1. Which words collocate most frequently with the word “homeland” in the 1990s, 2000s and 2010s in the COCA and the Time Magazine Corpus?
2. How has the use of the word “homeland” changed during the last three decades?
III Scope of the study
As the title of this paper suggests, the aim of the research is to explore the collocates of the word “homeland” over three periods of time. So many corpora exist in the world now that the writer of this paper has no intention of employing all of them. Instead, he analyzes the collocates based only on the data in two selected corpora, namely COCA and the Time Magazine Corpus. The data of these two corpora are gathered from both spoken and written language through different sources.
Furthermore, the use of a word may stay unchanged, or it may change over time. However, the writer of this paper does not wish to look at the trend over many periods of time; only the use of the word in the 1990s, 2000s and 2010s is explored.
IV Design of the study
The study includes three parts which are as follows:
1. Part I: Introduction. This part provides the readers with basic information including the rationale, the objectives of the study, the scope of the study and its design.
2. Part II: Development:
Chapter 1: Literature review. This chapter presents what other linguists have done before in the field.
Chapter 2: Theoretical background. This chapter provides the theory for the study, with attention to corpus linguistics and collocation analysis.
Chapter 3: Methodology. This chapter introduces the subjects of the study, the research approach, the data collection instrument and the procedures implemented in the study.
Chapter 4: Findings and Discussion. This is considered the most important part of any research. This chapter shows which words collocate most frequently with the word “homeland” in the COCA and the Time Magazine Corpus. It also confirms whether or not there is a shift in the use of the word “homeland”.
3. Part III: Conclusion. This part summarizes all the important points discussed in the research and gives some suggestions for further research.
PART II: DEVELOPMENT
CHAPTER 1: LITERATURE REVIEW
In this part, the writer of this paper reviews what other linguists have done in the field of corpus linguistics.
Corpus-based techniques have been employed in many studies which have attempted to investigate differences in language use.
Pearce (2008) carried out a study using a corpus-based approach. He looked at collocates of the lemmas “man” and “woman”, using the corpus analysis tool Sketch Engine to identify which verbs tend to co-occur with “man” and “woman”. He concluded that “woman” tended to occur as the object of verbs denoting sexual violence, coercion and observation, such as “rape”, “categorize”, “monitor” and “define”, and as the subject of verbs which constructed women as irritating: “fuss”, “annoy” or “nag”. In contrast, “man” occurred as both the object and the subject of verbs of non-sexual violence, normally collocating with words like “oppress”, “betray” or “raid”.
Baker (2010) conducted a study named “Will Ms ever be as frequent as Mr?” with the aim of exploring the frequency and context of usage of gender-marked language. In this study, he collected the data from four equal-sized and equivalently sampled corpora of British English in a range of written genres (press, fiction, general prose and learned writing) from 1931, 1961, 1991 and 2006. He investigated male and female pronouns, the terms man, woman, boy and girl, gender-related profession and role nouns such as chairman, spokesperson and policewoman, and terms of address such as Mr and Ms. He concluded that there were some reductions in the frequencies of male terms, particularly male pronouns and Mr. It was also found that while some gender stereotypes had weakened, others were being maintained (such as a lack of adjectives associated with women’s success or power). Additionally, the term “girl” was still more likely than the term “boy” to refer to adults, and it was often used in a sexual way.
Fang (2008) discussed the meaning of the text segment “international community” in two different discourse communities, represented by GuCorpus (British) and PdCorpus (Chinese), which are to some extent typical of discourse communities in Western and Asian countries. By exploring the different collocates and grammatical structures within each community, he could identify the different ways in which the phrase was used.
The studies mentioned above have produced notable findings which again confirm that the meaning of a word can only be understood and interpreted through its collocations, as gathered from a corpus of authentic data.
The writer of this paper has found that despite the availability of a huge number of research papers employing a corpus linguistics approach, no corpus-based study concerned with “homeland” has been conducted before. This paper, hence, is carried out with the aim of filling that gap. The data used for the analysis are taken from authentic sources.
CHAPTER II: THEORETICAL BACKGROUND
Sinclair (1991) states that “a corpus is a collection of naturally occurring texts, chosen to characterize a state or variety of a language”. Similarly, Reppen (2010) states that “a corpus is a large, principled collection of naturally occurring texts (written or spoken) stored electronically”. He then clarifies the terms used in his definition:
- “Naturally occurring texts”: language that comes from actual language situations, such as friends chatting, meetings, letters, class assignments and books, rather than surveys, questionnaires or made-up language.
- “A principled collection”: the design of the corpus must be principled. The texts in the corpus need to represent the type of language that the corpus is intended to capture. For example, if a corpus is to be representative of written language, then the corpus designer would need to make a comprehensive list of the different written language situations.
- “Stored electronically”: the corpus can be saved in text format, rich text format or web-based format.
Although scholars define the corpus differently, many of their definitions share the following characteristics:
- The language must be authentic rather than made-up.
- The collection of data must be principled.
- The corpus is stored electronically.
2. Notable corpora
There are a huge number of corpora thanks to the development of science and technology. Wynne and Prytz (2012) illustrate some types of corpora and some famous English examples, as shown in the following table:
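[Table: types of corpora and English examples (Wynne and Prytz, 2012). Recoverable entries: monitor corpora, to which texts are added to “monitor” language change; parallel corpora, with the same texts in more languages; corpora covering more languages or language varieties; corpora covering different periods, preferably comparable.]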
McEnery and Wilson (1996) define corpus linguistics as “the study of language based on examples of real life language use”. However, unlike purely qualitative approaches to research, corpus linguistics uses bodies of electronically encoded text and applies a more quantitative method.
Bennett (2010) provides a simpler definition: “corpus linguistics approaches the study of language in use through corpora. A corpus is a large, principled collection of naturally occurring examples of language stored electronically”. Bennett also states that corpus linguistics, in short, serves to answer two fundamental research questions:
- What particular patterns are associated with lexical and grammatical features?
- How do these patterns differ within varieties and registers?
Biber, Conrad and Reppen (1998) identify four main features of corpus linguistics as follows:
- It is empirical, analyzing the actual patterns of language use in natural texts.
- It utilizes a large and principled collection of natural texts as the basis for analysis.
- It makes extensive use of computers for analysis.
- It depends on both quantitative and qualitative analytical techniques.
4. Corpus linguistics and Discourse analysis
A corpus-based approach is of great value since it can be applied to a number of areas of linguistics, one of which is discourse analysis. Conrad (2002) points out four major approaches through which corpus linguistics can address discourse-level phenomena:
- investigating characteristics associated with the use of a language feature, for example, analyzing the factors that affect the omission or retention of “that” in complement clauses;
- examining the realizations of a particular function of language, such as describing all the constructions used in English to express stance;
- characterizing a variety of language, for example, conducting a multi-dimensional analysis to investigate relationships among the registers used in different settings at universities;
- mapping the occurrences of a feature through entire texts, for example, tracing how writers refer to themselves and their audience as they construct authority in memos.
5. Collocations
Phraseology, the study of phrases, is regarded as a central element of corpus linguistics. Sinclair (1991) determined that the meaning of a word is found through several words in a sequence, that is, through phrases. Phraseology includes the study of collocations, lexical bundles, and language occurring in preferred sequences. This paper focuses only on collocation.
To date, the term “collocation” has proved difficult to define precisely, and linguists hold different points of view about its definition; thus, the term is still controversial:
Firth (1957) states that “collocations of a given word are statements of the habitual or customary places of that word.”
According to Manning (1999), a collocation is “an expression consisting of two or more words that correspond to some conventional way of saying things”. Likewise, Lewis (2000) states that “a collocation is two or more words that tend to occur together.”
Although linguists have different viewpoints, they all agree that a collocation is a regular combination of lexical items. Benson (1985) points out that lexical collocations include:
- Verb + noun (e.g. to do homework)
- Adjective + noun (e.g. a big deal)
- Noun + verb (e.g. alarms go off)
- Noun of noun (e.g. a bar of chocolate)
- Adverb + adjective (e.g. terribly sorry)
- Verb + adverb (e.g. affect deeply)
6. Collocation analysis
Baker (2006) sets out a clear step-by-step guide to collocation analysis:
1. Build or obtain access to a corpus.
2. Decide on a search term, bearing in mind that the term can be expanded to include plurals or other forms, euphemisms, anaphora or relevant proper nouns.
3. Obtain a list of collocates.
4. Decide how many collocates you want to look at.
5. Can the collocates be grouped semantically, thematically or grammatically? Use this as a basis for the order in which you analyze the words in more detail.
6. Obtain concordances of the collocates and look for patterns within the context. This should enable you to uncover dominant discourses surrounding the subject.
7. Consider contesting discourses: concordance lines which go against or question the dominant reading of a term.
8. Look at concordance lines of the search term that do not contain collocates. What discourse prosodies are present there? Do they support or contradict those found in the analysis of the collocates?
9. How do the collocates relate to each other?
10. Attempt to explain why particular discourse patterns appear around collocates and relate this to issues of text production and reception and/or etymologies of particular words.
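Steps 2, 3 and 6 of the guide above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample text is invented, the function names (`tokenize`, `collocates`, `concordance`) are hypothetical, and the simple tokenizer ignores sentence boundaries, so a word at the end of one sentence can be counted as a collocate of a word starting the next. It is not the procedure built into any corpus interface.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, left=1, right=1):
    """Count the words found within `left`/`right` positions of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        # Words before the node plus words after it, excluding the node itself.
        counts.update(tokens[start:i] + tokens[i + 1:i + 1 + right])
    return counts

def concordance(tokens, node, width=4):
    """Return one key-word-in-context line per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            before = " ".join(tokens[max(0, i - width):i])
            after = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{before} [{tok}] {after}")
    return lines

# Invented sample text, for illustration only (not taken from any corpus).
sample = ("They returned to their homeland after the war. "
          "The homeland security budget grew. "
          "Homeland security officials met today.")
tokens = tokenize(sample)
counts = collocates(tokens, "homeland")
kwic = concordance(tokens, "homeland")

for word, freq in counts.most_common(3):
    print(word, freq)
for line in kwic:
    print(line)
```

Setting `left=1` and `right=1` corresponds to looking at one word on either side of the search term; wider spans can be tried by changing those two arguments.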
CHAPTER III:
METHODOLOGY
I Subjects of the study
The subjects of the study are the language materials stored online in two of the biggest corpora, namely COCA (Corpus of Contemporary American English, at americancorpus.org) and the Time Magazine Corpus (at corpus.byu.edu/time).
The following are descriptions of each corpus:
The COCA is an online searchable corpus of American English, consisting of more than 400 million words, and it is evenly divided by register, including news, spoken and academic texts. The corpus contains texts from 1990 onwards, and more texts are added regularly. It differs from many other corpora in allowing users to search by part of speech. Additionally, because of its design, this corpus is suitable for looking at how language has changed over a period of time. The texts in this corpus come from various sources:
- Spoken (95 million words): Transcripts of unscripted conversations from more than 150 different TV and radio programs (examples: All Things Considered, Newshour, Good Morning America, Today Show, 60 Minutes, Hannity and Colmes or Jerry Springer).
- Fiction (90 million words): Short stories and plays from literary magazines, children’s magazines, popular magazines, first chapters of first edition books from 1990 to present, and movie scripts.
- Popular magazines (95 million words): Nearly 100 different magazines, with a good mix (overall and by year) between specific domains (news, health, home and gardening, women, financial, religion, sports). A few examples are Time, Men’s Health and Good Housekeeping.
- Newspapers (92 million words): Ten newspapers from across the US, including USA Today, New York Times and Atlanta Journal Constitution. In most cases, there is a good mix between different sections of the newspaper, such as local news, opinion, sports and financial.
- Academic journals (91 million words): Nearly 100 different peer-reviewed journals. They were selected to cover the entire range of the Library of Congress classification system.
The Time Magazine Corpus consists of more than 100 million words of American English from 1923 to the present, as found in Time Magazine. The Time Magazine Corpus allows users to easily look at:
- The overall frequency over time of words and phrases related to changes in society and culture or to historical events, such as new age, politically correct, email and global warming.
- Changes in the language itself, such as the rise and fall of words and phrases like beauteous, nifty or freak out. Changes in grammatical constructions like going to V, phrasal verbs with up, or the use of whom can also be found.
- Parts of words, which show how word roots, prefixes and suffixes are being used over time in other words.
- Words that were used more in one period of time than in another, even when the users do not know what the specific words might be.
- How the meanings of words have changed over time, by looking at the changes in collocates. For example, the collocates of chip, engine or web have changed recently, due to changes in technology; consequently, the meanings of these words have also changed.
(corpus.byu.edu/time)
The writer of this paper does not wish to collect the data from all periods of time in the two corpora; instead, he gathers the data from the two corpora in three periods (1990s, 2000s and 2010s) to see whether and how the meaning of the word “homeland” has changed.
II Research methodology:
This study employs both quantitative and qualitative research methods.
As the title of this study suggests, this research paper employs the corpus approach, which uses authentic materials to identify words co-occurring with a target word. The quantitative method, therefore, is used to identify the top words that collocate with “homeland”.
The qualitative method, at the same time, is used for the discourse analysis, to show which words collocate with “homeland” in each period of time and hence to propose how the meaning of the word “homeland” has changed over time.
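When raw frequencies from different decades are compared, the sections of a corpus may differ in size, so counts are commonly normalized to a rate per million words. The sketch below shows this arithmetic in Python. It is an assumption added for illustration: the thesis does not state that normalization was applied, the function name `per_million` is hypothetical, and all figures are invented placeholders rather than results from COCA or the Time Magazine Corpus.

```python
def per_million(hits, corpus_size):
    """Convert a raw hit count into occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size

# Invented placeholder figures, for illustration only.
decades = {
    "1990s": {"hits": 500, "size": 100_000_000},
    "2000s": {"hits": 4000, "size": 200_000_000},
    "2010s": {"hits": 1500, "size": 50_000_000},
}

rates = {name: per_million(d["hits"], d["size"]) for name, d in decades.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per million words")
```

In this invented example the 2000s section has the most raw hits, but the 2010s section has the highest rate once size is taken into account, which is why the normalized figure is the safer basis for comparison.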
III Data collection instrument
The data for the research were gathered from the two corpora presented above (COCA and the Time Magazine Corpus). In order to collect the data, these stages were followed:
1. Collect data from the website americancorpus.org:
- In the DISPLAY section, tick the box KWIC (key word in context).
- In the SEARCH STRING section, type the word “homeland” in the box WORD.
- In the box COLLOCATE, enter the numbers 1 and 1, which means one word before and one word after “homeland” will be highlighted for easier analysis.
- In the box SECTION, choose 1990s, 2000s and 2010s respectively, which means the collocates of the word “homeland” in these periods of time will be displayed.
- Finally, press the button “search”; the data are then displayed in the form of a table.
2. Similar steps were conducted in the Time Magazine Corpus at www.corpus.byu.edu/time.
3. After all the data from the two corpora had been collected, the top collocates in each corpus were analyzed through texts, and the results from the two corpora were compared. The research then concluded with how the meaning of the word “homeland” had changed through the three selected periods of time.
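The comparison described in stage 3 can be sketched in Python as a set comparison of the top collocates from one decade to the next. This is a minimal sketch under stated assumptions: the function name `compare_decades` is hypothetical, and the collocate lists are invented placeholders, not the lists obtained from either corpus.

```python
def compare_decades(top_by_decade):
    """For each pair of consecutive decades, list collocates kept, gained and lost."""
    decades = list(top_by_decade)  # relies on insertion order (Python 3.7+)
    changes = []
    for earlier, later in zip(decades, decades[1:]):
        before = set(top_by_decade[earlier])
        after = set(top_by_decade[later])
        changes.append({
            "from": earlier,
            "to": later,
            "kept": sorted(before & after),
            "gained": sorted(after - before),
            "lost": sorted(before - after),
        })
    return changes

# Invented placeholder lists, for illustration only.
top_collocates = {
    "1990s": ["their", "ancestral", "return"],
    "2000s": ["security", "their", "department"],
    "2010s": ["security", "department", "secretary"],
}

changes = compare_decades(top_collocates)
for change in changes:
    print(change)
```

The same function can be run once per corpus, and the two outputs can then be read side by side before the concordance lines are examined qualitatively.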