

VIETNAM NATIONAL UNIVERSITY, HANOI

UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES

FACULTY OF POST-GRADUATE

Triệu Tuấn Anh

Title: A corpus-based analysis of the collocates of the word

“homeland” in the 1990s, 2000s and 2010s.

(A study of the collocations of the word “homeland” across the 1990s, 2000s and

2010s on the basis of corpus linguistics)

Major: English Linguistics

Code: 60.22.15

Supervisor: Assoc. Prof. Tran Xuan Diep

Hanoi, Sep 2013


My sincere thanks also go to all the lecturers at the University of Foreign Languages and International Studies, Vietnam National University (Hanoi), who helped me build a solid background in theory and research methods through their invaluable lessons.

Furthermore, I wish to express my thanks to all my colleagues at the Faculty of English, Hanoi National University of Education, for supporting me in my job while I was taking this course. Without their support, I could not have concentrated fully on my studies.


This study presents a corpus-based analysis of the collocates of the word “homeland”. The data for the analysis were taken from two popular corpora: the Corpus of Contemporary American English and the Time Magazine Corpus. The analysis examines how frequently the word is used and shows how similarly or differently it has been used over time. To analyze the data, both qualitative and quantitative methods were employed. Two significant conclusions were drawn: (1) the use of the word “homeland” increased greatly over the 1990s, 2000s and 2010s, and this upward trend appears to be continuing in the current decade; (2) there was a shift in the use of the word “homeland”. In the 1990s it was used almost exclusively as a noun referring to the geographic space associated with a particular group, whereas in the 2000s and 2010s it was mainly used, as a noun or adjective, to modify the word “security” or to refer to a government department.


TABLE OF CONTENTS

Part I: Introduction

1 Rationale of the study

2 Objectives of the study

3 Scope of the study

4 Design of the study

Part II: Development

Chapter I: Literature review

Chapter II: Theoretical background

Chapter III: Methodology

1 Subjects of the study

2 Research methodology

3 Data collection instrument

Chapter IV: Findings and Discussion

1 The frequency of the use of the word “homeland”

2 The meanings of the word “homeland”


PART I:

INTRODUCTION

I. Rationale of the study

It is clear that the homeland is a common topic in people’s conversations; in particular, it is an endless source of inspiration for authors and writers. Although one may travel or have to live in different places all over the world, one’s homeland still plays an important part in one’s life, influencing such things as appearance and character, and it is the place where one often feels most comfortable. This may explain why it is commonly said that “One’s homeland is even greater than heaven.”

To Americans, the homeland is of great importance: Americans hold values that differ from those of people in other countries, and they seem to be proud of their country. This importance is normally expressed through language, and language in turn is captured in corpora, which electronically store not only various forms of language but also a significant number of written and spoken texts. Studying the linguistic features of texts can reveal the intentions of writers and speakers. The Corpus of Contemporary American English and the Time Magazine Corpus are among the biggest corpora covering authentic language use in different fields and over time. Using these two corpora, it is possible to reveal how the language has been used and how it has changed over time.

In the field of linguistic research, there are a huge number of works using a corpus-based approach; nonetheless, only a few scientific studies related to the homeland have been conducted. Consequently, this study, which employs authentic texts to explore the topic “homeland”, is carried out with the aim of filling that gap.

Another important factor is that corpus linguistics, despite its long history, seems quite new to me and has attracted my attention. Many problems can be dealt with using a corpus-based approach, one of which is collocation analysis. This personal interest, together with the reasons stated above, has inspired me to conduct this paper, entitled “A corpus-based analysis of the collocates of the word ‘homeland’ in the 1990s, 2000s and 2010s”. The corpora used for the analysis are the Corpus of Contemporary American English (COCA) and the Time Magazine Corpus.

II. Objectives of the study

The study is carried out with two purposes. The first is to explore the words which collocate most frequently with the target word “homeland” over three decades in the COCA and the Time Magazine Corpus. The second is to find out whether the use of that word remained unchanged or changed over those three decades.

In order to achieve the objectives, two specific research questions were raised:

1. Which words collocate most frequently with the word “homeland” in the 1990s, 2000s and 2010s in the COCA and the Time Magazine Corpus?

2. How has the use of the word “homeland” changed during the last three decades?

III. Scope of the study

As the title of this paper suggests, the aim of the research is to explore the collocates of the word “homeland” over three periods of time. A great many corpora exist in the world today, and the writer of this paper has no intention of employing all the corpora available. Instead, he analyzes the collocates based only on the data in two selected corpora, namely COCA and the Time Magazine Corpus. The data in these two corpora are gathered from both spoken and written language through different sources.

Furthermore, the use of a word may stay unchanged over time, or it may change. However, the writer of this paper does not wish to look at the trend over many periods of time; only the use of the word in the 1990s, 2000s and 2010s is explored.

IV. Design of the study

The study includes three parts which are as follows:

1. Part I: Introduction. This part provides readers with basic information, including the rationale, objectives, scope and design of the study.

2. Part II: Development:

- Chapter 1: Literature review. This chapter reviews previous work by other linguists in the field.

- Chapter 2: Theoretical background. This chapter provides the theoretical basis of the study, focusing on corpus linguistics and collocation analysis.

- Chapter 3: Methodology. This chapter introduces the subjects of the study, the research approach, the data collection instrument and the procedures implemented in the study.

- Chapter 4: Findings and Discussion. This is considered the most important part of any research. This chapter shows which words collocate most frequently with the word “homeland” in the COCA and the Time Magazine Corpus, and confirms whether or not there has been a shift in the use of the word “homeland”.

3. Part III: Conclusion. This part summarizes the main points discussed in the research and gives some suggestions for further research.


PART II: DEVELOPMENT

CHAPTER I: LITERATURE REVIEW

In this chapter, the writer of this paper reviews previous work by other linguists in the field of corpus linguistics.

Corpus-based techniques have been employed in many studies which have attempted to investigate differences in language use.

Pearce (2008) carried out a study using a corpus-based approach, looking at the collocates of the lemmas “man” and “woman”. He used the corpus analysis tool Sketch Engine to identify which verbs tend to co-occur with “man” and “woman”. He concluded that women tended to be the object of verbs denoting sexual violence, coercion and observation, such as “rape”, “categorize”, “monitor” and “define”, and that women occurred as the subject of verbs which constructed them as irritating, such as “fuss”, “annoy” or “nag”. In contrast, men were both the object and the subject of verbs of non-sexual violence, with “man” typically collocating with verbs such as “oppress”, “betray” or “raid”.

Baker (2010) conducted a study entitled “Will Ms ever be as frequent as Mr?”, which aimed to explore the frequency and context of usage of gender-marked language. In this study, he collected data from four equal-sized and equivalently sampled corpora of British English covering a range of written genres (press, fiction, general prose and learned writing) from 1931, 1961, 1991 and 2006. He investigated male and female pronouns; the terms man, woman, boy and girl; gender-related professions and role nouns such as chairman, spokesperson and policewoman; and terms of address such as Mr and Ms. He concluded that there were some reductions in the frequency of male terms, particularly decreases in male pronouns and Mr. It was also found that while some gender stereotypes had weakened, others were being maintained (such as a lack of adjectives associated with women’s success or power). Additionally, the term “girl” was still more likely than the term “boy” to refer to adults, and it was often used in a sexual way.

Fang (2008) investigated the meaning of the phrase “international community” in two different discourse communities, represented by GuCorpus (British) and PdCorpus (Chinese), which are broadly typical of discourse communities in Western and Asian countries respectively. By exploring the different collocates and grammatical structures within each community, he identified the different ways in which the phrase was used.

The studies mentioned above have produced notable findings which confirm that the meaning of a word can only be fully understood and interpreted through its collocates, as found in a corpus of authentic data.

The writer of this paper has found that, despite the availability of a huge number of research papers employing a corpus linguistics approach, no corpus-based study of the word “homeland” has been conducted before. This paper therefore aims to fill that gap. The data used for the analysis are taken from authentic texts.


Sinclair (1991) states that “a corpus is a collection of naturally occurring texts, chosen to characterize a state or variety of a language”. Similarly, Reppen (2010) defines a corpus as “a large, principled collection of naturally occurring texts (written or spoken) stored electronically”. Reppen then clarifies the terms used in this definition:

- “Naturally occurring texts”: language from actual language situations, such as friends chatting, meetings, letters, class assignments and books, rather than surveys, questionnaires or made-up language.

- “A principled collection”: the design of the corpus must be principled. The texts in the corpus need to represent the type of language that the corpus is intended to capture. For example, if a corpus is to be representative of written language, the corpus designer would need to make a comprehensive list of the different written language situations.

- “Stored electronically”: the corpus can be saved in plain text, rich text or web-based formats.

Although scholars define the corpus differently, many of them agree on the following characteristics:

- The language must be authentic rather than made-up

- The collection of data must be principled

- The corpus is electronically saved


2. Notable corpora

A huge number of corpora now exist thanks to developments in science and technology. Wynne and Prytz (2012) illustrate some types of corpora, with well-known English examples, in the following table:

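[Table: Types of corpora, with well-known English examples (Wynne and Prytz, 2012). The table was largely lost in extraction; the surviving fragments mention corpora to which texts are added to “monitor” language change, parallel corpora containing the same texts in more than one language, corpora covering several languages or language varieties, and corpora covering different periods, preferably comparable.]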

McEnery and Wilson (1996) define corpus linguistics as “the study of language based on examples of real life language use”. However, unlike qualitative approaches to research, corpus linguistics uses bodies of electronically encoded text and implements a more quantitative method.

Bennett (2010) provides a simpler definition: “corpus linguistics approaches the study of language in use through corpora. A corpus is a large, principled collection of naturally occurring examples of language stored electronically.” Bennett also states that corpus linguistics, in short, serves to answer two fundamental research questions:

- What particular patterns are associated with lexical and grammatical features?

- How do these patterns differ within varieties and registers?

Biber, Conrad and Reppen (1998) identify four main features of corpus linguistics as follows:

- It is empirical, analyzing the actual patterns of language use in natural texts.

- It utilizes a large and principled collection of natural texts as the basis for analysis.

- It makes extensive use of computers for analysis.

- It depends on both quantitative and qualitative analytical techniques.
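Because corpus sections differ in size (the COCA sub-corpora described in Chapter III, for instance, contain different numbers of words), raw counts are usually normalized before frequencies are compared across registers or periods. This is one of the simplest quantitative techniques of the kind mentioned above. The following is a minimal Python sketch of normalization to a rate per million words; the function name and the counts in the example are hypothetical and are not taken from either corpus.

```python
def per_million(raw_count: int, section_size: int) -> float:
    """Return the frequency of an item per one million running words.

    raw_count: number of hits for the search term in a corpus section.
    section_size: total number of words in that section.
    """
    if section_size <= 0:
        raise ValueError("section_size must be a positive number of words")
    return raw_count * 1_000_000 / section_size


# Hypothetical counts: 50 hits in a 2-million-word section and
# 90 hits in a 6-million-word section.
print(per_million(50, 2_000_000))  # 25.0
print(per_million(90, 6_000_000))  # 15.0
```

Although the second section has more raw hits, the item is relatively less frequent there, which is why normalized rates rather than raw counts are compared.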

4. Corpus linguistics and Discourse analysis

The corpus-based approach is of great value since it can be applied to a number of areas of linguistics, one of which is discourse analysis. Conrad (2002) points out four major ways in which corpus linguistics can address discourse-level phenomena:

- investigating characteristics associated with the use of a language feature, for example, analyzing the factors that affect the omission or retention of “that” in complement clauses;

- examining the realizations of a particular function of language, such as describing all the constructions used in English to express stance;

- characterizing a variety of language, for example, conducting a multi-dimensional analysis to investigate relationships among the registers used in different settings at universities;

- mapping the occurrences of a feature through entire texts, for example, tracing how writers refer to themselves and their audience as they construct authority in memos.

5. Collocations

Phraseology, the study of phrases, is regarded as a central element of corpus linguistics. Sinclair (1991) determined that the meaning of a word is found through several words in a sequence, that is, through phrases. Phraseology includes the study of collocations, lexical bundles and language occurring in preferred sequences. This paper focuses only on collocation.

Even today, the term “collocation” is difficult to define clearly, and linguists hold different views on its definition; the term therefore remains controversial:

Firth (1957) states that “collocations of a given word are statements of the habitual or customary places of that word.”

According to Manning (1999), a collocation is “an expression consisting of two or more words that correspond to some conventional way of saying things”. Likewise, Lewis (2000) states that “a collocation is two or more words that tend to occur together.”
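The definitions above share the idea of words that habitually occur near one another. One common way to operationalize this in software is to count every word that appears within a fixed window (span) around the node word. The Python sketch below is illustrative only: it assumes plain whitespace tokenization, the default span of four words on each side is an assumption rather than a fixed standard, and the sentence is invented.

```python
from collections import Counter


def window_collocates(tokens, node, span=4):
    """Count the words that occur within `span` positions left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented sentence, lower-cased and split on whitespace.
tokens = "they returned to their homeland after the war and the homeland welcomed them".split()
print(window_collocates(tokens, "homeland", span=1))
```

Widening the span captures more distant co-occurrences but also more incidental ones, so the choice of span affects which words end up being treated as collocates.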

Although linguists hold different viewpoints, they all agree that a collocation is a regular combination of lexical items. Benson (1985) points out that lexical collocations include:

- Verb + noun (e.g. to do homework)

- Adjective + noun (e.g. a big deal)

- Noun + verb (e.g. alarms go off)

- Noun + of + noun (e.g. a bar of chocolate)

- Adverb + adjective (e.g. terribly sorry)

- Verb + adverb (e.g. affect deeply)
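Benson's patterns can be located mechanically once a text has been tagged for part of speech. The Python sketch below assumes hand-assigned tags in the style of the Universal Dependencies tag set (ADJ, NOUN, VERB, ADV, ...); in practice the tags would come from a tagger or from the corpus interface. The pattern table, the function names and the example sentence are hypothetical, and adjacency alone will also produce some false positives, so the output still needs manual checking.

```python
# Hypothetical mapping from adjacent tag pairs to Benson's pattern labels.
PATTERNS = {
    ("VERB", "NOUN"): "verb + noun",
    ("ADJ", "NOUN"): "adjective + noun",
    ("NOUN", "VERB"): "noun + verb",
    ("ADV", "ADJ"): "adverb + adjective",
    ("VERB", "ADV"): "verb + adverb",
}


def lexical_pairs(tagged):
    """Return (pattern label, 'word1 word2') for adjacent pairs that match PATTERNS."""
    found = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        label = PATTERNS.get((t1, t2))
        if label:
            found.append((label, f"{w1} {w2}"))
    return found


def noun_of_noun(tagged):
    """Return 'noun of noun' sequences, which span three tokens."""
    return [
        f"{a} of {c}"
        for (a, ta), (b, _), (c, tc) in zip(tagged, tagged[1:], tagged[2:])
        if ta == "NOUN" and b.lower() == "of" and tc == "NOUN"
    ]


tagged = [("a", "DET"), ("big", "ADJ"), ("deal", "NOUN"), ("and", "CCONJ"),
          ("a", "DET"), ("bar", "NOUN"), ("of", "ADP"), ("chocolate", "NOUN")]
print(lexical_pairs(tagged))
print(noun_of_noun(tagged))
```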


6. Collocation analysis

Baker (2006) provides a clear step-by-step guide to collocation analysis:

1. Build or obtain access to a corpus

2. Decide on a search term, bearing in mind that the term can be expanded to include plurals or other forms, euphemisms, anaphora or relevant proper nouns.

3. Obtain a list of collocates

4. Decide how many collocates you want to look at

5. Can the collocates be grouped semantically, thematically or grammatically? Use this as a basis for the order in which you analyze the words in more detail

6. Obtain concordances of the collocates and look for patterns within the context. This should enable you to uncover dominant discourses surrounding the subject.

7. Consider contesting discourses: concordance lines which go against or question the dominant reading of a term.

8. Look at concordance lines of the search term that do not contain collocates. What discourse prosodies are present there? Do they support or contradict those found in the analysis of the collocates?

9. How do the collocates relate to each other?

10. Attempt to explain why particular discourse patterns appear around collocates, and relate this to issues of text production and reception and/or the etymologies of particular words.
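Steps 6 to 8 of Baker's procedure rely on reading concordance lines. A key-word-in-context (KWIC) display can be produced with a few lines of Python; the sketch below is a minimal, hypothetical helper, not the COCA or Time Magazine Corpus interface. It aligns a fixed number of words on either side of each occurrence of the node word, and the example sentence is invented.

```python
def kwic(tokens, node, width=3):
    """Return KWIC lines showing `width` words to the left and right of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so that the node words line up.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


tokens = "they returned to their homeland after the war and the homeland welcomed them".split()
for line in kwic(tokens, "homeland", width=2):
    print(line)
```

For step 8, the same lines can be filtered to keep only those containing none of the selected collocates, for example `[line for line in lines if not {"security"} & set(line.split())]`.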


CHAPTER III:

METHODOLOGY

I. Subjects of the study

The subjects of the study are the language materials stored online in two of the biggest corpora, namely COCA (the Corpus of Contemporary American English, at americancorpus.org) and the Time Magazine Corpus (at corpus.byu.edu/time).

The following are descriptions of each corpus:

COCA is an online searchable corpus of American English consisting of more than 400 million words, divided roughly equally by register (spoken, fiction, popular magazines, newspapers and academic texts). The corpus contains texts from 1990 onwards, and new texts are added regularly. Unlike many other corpora, this site allows users to search by part of speech. Additionally, because of its design, the corpus is well suited to examining how language has changed over time. The texts in this corpus come from various sources:

- Spoken (95 million words): transcripts of unscripted conversation from more than 150 different TV and radio programs (for example, All Things Considered, Newshour, Good Morning America, Today Show, 60 Minutes, Hannity and Colmes, and Jerry Springer).

- Fiction (90 million words): short stories and plays from literary magazines, children’s magazines and popular magazines, first chapters of first-edition books from 1990 to the present, and movie scripts.

- Popular magazines (95 million words): nearly 100 different magazines, with a good mix (overall and by year) between specific domains (news, health, home and gardening, women, finance, religion, sports). A few examples are Time, Men’s Health and Good Housekeeping.

- Newspapers (92 million words): ten newspapers from across the US, including USA Today, the New York Times and the Atlanta Journal Constitution. In most cases, there is a good mix between different sections of the newspaper, such as local news, opinion, sports and finance.

- Academic journals (91 million words): nearly 100 different peer-reviewed journals, selected to cover the entire range of the Library of Congress classification system.

The Time Magazine Corpus consists of more than 100 million words of American English from 1923 to the present, as found in Time Magazine. It allows users to easily look at:

- The overall frequency over time of words and phrases related to changes in society and culture or to historical events, such as new age, politically correct, email and global warming.

- Changes in the language itself, such as the rise and fall of words and phrases like beauteous, nifty or freak out. Changes in grammatical constructions, such as going to V, phrasal verbs with up or the use of whom, can also be found.

- Parts of words, showing how word roots, prefixes and suffixes have been used over time.

- Words that were used more in one period of time than in another, even when users do not know in advance what those words might be.

- How the meanings of words have changed over time, by looking at changes in their collocates. For example, the collocates of chip, engine or web have changed recently due to changes in technology; consequently, the meanings of these words have also changed.

(corpus.byu.edu/time)
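The last point, tracing changes in meaning through changes in collocates, is essentially a comparison of collocate lists from two periods. A minimal Python sketch follows, assuming that collocate counts for each period have already been collected into `Counter` objects. The function names are hypothetical, and the counts in the example are invented for illustration; they are not results from either corpus.

```python
from collections import Counter


def top_collocates(counts, n):
    """Return the set of the n most frequent collocates.

    Ties at the cut-off are resolved by Counter.most_common, which keeps
    equal counts in the order first encountered.
    """
    return {word for word, _ in counts.most_common(n)}


def collocate_shift(earlier, later, n=10):
    """Compare the top-n collocates of an earlier and a later period."""
    old, new = top_collocates(earlier, n), top_collocates(later, n)
    return {
        "kept": sorted(old & new),
        "dropped": sorted(old - new),
        "gained": sorted(new - old),
    }


# Invented counts for illustration only.
earlier = Counter({"their": 5, "return": 3, "native": 2})
later = Counter({"security": 9, "their": 4, "department": 3})
print(collocate_shift(earlier, later, n=3))
```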

The writer of this paper does not wish to collect data from all periods of time in the two corpora; instead, he gathers data from the two corpora in three periods (1990s, 2000s and 2010s) to see whether and how the meaning of the word “homeland” has changed.

II. Research methodology

This study employs both quantitative and qualitative research methods.

As the title of this study may suggest, this research paper employs the corpus approach, which uses authentic materials to identify the words co-occurring with a target word. The quantitative method, therefore, is used to identify the top words that collocate with “homeland”.
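One widely used way to decide which co-occurring words count as the "top" collocates is to rank them by an association measure rather than by raw frequency. The Python sketch below uses pointwise mutual information, MI = log2( f(n,c) × N / (f(n) × f(c)) ), where f(n,c) is the co-occurrence frequency, f(n) and f(c) are the frequencies of the node and the collocate, and N is the corpus size. This is a swapped-in illustration rather than the procedure reported in this study, and the MI figures shown by online corpus interfaces may be computed with a slightly different formula (for example, one that adjusts for span size). Because MI tends to favor very rare words, a minimum-frequency threshold is also applied; its default value of 3 is an assumption, and all frequencies in the example are invented.

```python
import math


def mutual_information(f_node_coll, f_node, f_coll, corpus_size):
    """Pointwise MI (log base 2) of a node-collocate pair.

    Compares the observed co-occurrence count with the count expected if the
    two words were distributed independently: f_node * f_coll / corpus_size.
    No adjustment for window size is made here.
    """
    if min(f_node_coll, f_node, f_coll, corpus_size) <= 0:
        raise ValueError("all frequencies must be positive")
    expected = f_node * f_coll / corpus_size
    return math.log2(f_node_coll / expected)


def rank_by_mi(cooccurrence, node_freq, word_freq, corpus_size, min_freq=3):
    """Rank collocates by MI, ignoring pairs that co-occur fewer than min_freq times."""
    scores = {
        word: mutual_information(f, node_freq, word_freq[word], corpus_size)
        for word, f in cooccurrence.items()
        if f >= min_freq
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented frequencies in a hypothetical 1-million-word corpus.
print(rank_by_mi(
    cooccurrence={"security": 8, "the": 40, "rare": 2},
    node_freq=1_000,
    word_freq={"security": 1_000, "the": 60_000, "rare": 10},
    corpus_size=1_000_000,
))
```

In this toy example "the" co-occurs more often than "security" but receives a negative score, because its co-occurrence count falls below what its overall frequency would predict; "rare" is excluded by the frequency threshold.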

The qualitative method, meanwhile, is used for discourse analysis to show which words collocate with “homeland” in each period of time, and hence to propose how the meaning of the word “homeland” has changed over time.

III. Data collection instrument

The data for the research were gathered from the two corpora presented above (COCA and the Time Magazine Corpus). To collect the data, the following stages were followed:

1. Collect data from the website americancorpus.org:

- In the DISPLAY section, tick the KWIC (key word in context) box.

- In the SEARCH STRING section, type the word “homeland” in the WORD box.

- In the COLLOCATE box, enter 1 and 1, which means that one word before and one word after “homeland” will be highlighted for easier analysis.

- In the SECTION box, choose 1990s, 2000s and 2010s in turn, so that the collocates of the word “homeland” in each of these periods are displayed.

- Finally, press the “search” button; the data are then displayed in the form of a table.


2. Similar steps were carried out in the Time Magazine Corpus at www.corpus.byu.edu/time.

3. After all the data from the two corpora had been collected, the top collocates in each corpus were analyzed in context, and the results from the two corpora were compared. Finally, the research concludes with how the meaning of the word “homeland” has changed across the three selected periods of time.
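The search settings in step 1 (one word to the left and one to the right of “homeland”, with results separated by decade) can be approximated offline if a set of dated texts is available. The Python sketch below is hypothetical: it assumes input as (year, text) pairs and simple whitespace tokenization, the function names are illustrative, the example texts are invented, and it does not reproduce the COCA interface, which applies its own tokenization and tagging.

```python
from collections import Counter, defaultdict


def decade_of(year):
    """Map a year such as 1997 to a decade label such as '1990s'."""
    return f"{year // 10 * 10}s"


def l1_r1_by_decade(documents, node="homeland"):
    """Count the words directly left (L1) and right (R1) of `node`, per decade.

    `documents` is a hypothetical iterable of (year, text) pairs. Tokens are
    produced by lower-casing and splitting on whitespace, so punctuation stays
    attached to words.
    """
    by_decade = defaultdict(Counter)
    for year, text in documents:
        counts = by_decade[decade_of(year)]
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            if i > 0:
                counts[tokens[i - 1]] += 1  # L1 collocate
            if i + 1 < len(tokens):
                counts[tokens[i + 1]] += 1  # R1 collocate
    return by_decade


# Invented example texts.
docs = [
    (1995, "they longed for their homeland"),
    (2004, "the homeland security department"),
    (2012, "homeland security funding"),
]
for decade, counts in sorted(l1_r1_by_decade(docs).items()):
    print(decade, dict(counts))
```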
