

VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES

FACULTY OF POST-GRADUATE

Triệu Tuấn Anh

Title: A corpus-based analysis of the collocates of the word “homeland” in the 1990s, 2000s and 2010s

(Nghiên cứu đồng định vị của từ “homeland” qua các thập niên 1990, 2000 và 2010 trên cơ sở ngôn ngữ học khối liệu)

Major: English Linguistics

Code: 60.22.15

Supervisor: Assoc. Prof. Tran Xuan Diep

Hanoi, Sep 2013


My sincere thanks also go to all the lecturers at University of Foreign Languages and International Studies, Vietnam National University (Hanoi), who helped me build up a solid background in theory and research methods through their invaluable lessons.

Furthermore, I wish to express my thanks to all of my colleagues at the Faculty of English, Hanoi National University of Education, for supporting me in my job during the time I took this course. Without their support, I could not have fully concentrated on my study.


ABSTRACT

This study presents a corpus-based analysis of the collocates of the word “homeland”. The data for the analysis were taken from two popular corpora, the Corpus of Contemporary American English and the Time Magazine Corpus. The analysis shows how frequently the word is used and how similarly or differently it is used over periods of time. Both qualitative and quantitative methods were employed to analyze the data. Two significant conclusions were drawn: (1) There was a great increase in the use of the word “homeland” over the 1990s, 2000s and 2010s, and this upward trend appears to be continuing in the current decade. (2) There was a shift in the use of the word “homeland”. In the 1990s it was used almost exclusively as a noun referring to the geographic space related to a particular group, whereas in the 2000s and 2010s it was mainly used as a noun or adjective modifying the word “security”, or to refer to a political department.


TABLE OF CONTENTS

Part I: Introduction

Part II: Development

Chapter II: Theoretical background

Chapter III: Methodology

Chapter IV: Findings and Discussion

1. The frequency of the use of the word “homeland”

2. The meanings of the word “homeland”

PART I: INTRODUCTION

I. Rationale of the study

The homeland is a common topic in people’s conversations and an endless source of inspiration for authors. Although one may travel or have to live in different places all over the world, the homeland still plays an important part in one’s life and shapes such things as one’s appearance and character; it is also the place where one often feels the most comfortable. This may explain the common saying that “One’s homeland is even greater than the heaven.”

To Americans, the homeland has great importance because Americans hold values different from those of people in other countries, and they seem to be proud of their country. This importance is normally expressed through language. Language, in turn, is recorded in corpora, where a significant number of written and spoken texts in various forms are stored electronically. Studying the linguistic features of texts discloses the writers’ and speakers’ intentions. The Corpus of Contemporary American English and the Time Magazine Corpus are among the biggest corpora that cover authentic language use in different fields and over time. By using these two corpora, one can reveal how the language has been used and how it has changed over periods of time.

In the field of linguistic research, a huge number of works use a corpus-based approach; nonetheless, only a few scientific studies related to homeland have been conducted. Consequently, this study, employing authentic texts and exploring the topic “homeland”, is carried out with the aim of filling that gap.

Another important factor is that corpus linguistics, despite its long history, is quite new to me and has attracted my attention. Many problems can be dealt with by using a corpus-based approach, one of which is collocation analysis. This personal interest, together with the above-stated reasons, has inspired me to conduct this paper, entitled “A corpus-based analysis of the collocates of the word ‘homeland’ in the 1990s, 2000s and 2010s”. The corpora used for the analysis are the Corpus of Contemporary American English (COCA) and the Time Magazine Corpus.

II. Objectives of the study

The study is carried out with two purposes. The first is to explore the words which collocate most frequently with the target word “homeland” over three decades in the COCA and the Time Magazine Corpus. The second is to find out whether the use of that word remained unchanged or changed over those three decades.

In order to achieve the objectives, two specific research questions were raised:

1. What are the words collocating most frequently with the word “homeland” in the 1990s, 2000s and 2010s in the COCA and the Time Magazine Corpus?

2. How has the use of the word “homeland” changed during the last three decades?

III. Scope of the study

As the title of this paper suggests, the aim of the research is to explore the collocates of the word “homeland” over three periods of time. Many corpora exist in the world now; the writer of this paper therefore has no intention of employing all the corpora available. Instead, he analyzes the collocates based only on the data in two selected corpora, namely COCA and the Time Magazine Corpus. The data of these two corpora are gathered from both spoken and written language through different sources.

Furthermore, the use of a word may stay unchanged or may change over time. However, the writer of this paper does not wish to look at the trend over many periods of time; only the use of the word in the 1990s, 2000s and 2010s is explored.

IV. Design of the study

The study includes three parts, which are as follows:

1. Part I: Introduction. This part provides the readers with basic information, including the rationale, the objectives of the study, the scope of the study and its design.

2. Part II: Development

• Chapter 1: Literature review. This chapter presents what other linguists have done before in the field.

• Chapter 2: Theoretical background. This chapter provides the theory behind the study, focusing on corpus linguistics and collocation analysis.

• Chapter 3: Methodology. This chapter introduces the subjects of the study, the research approach, the instrument of data collection and the procedures implemented in the study.

• Chapter 4: Findings and Discussion. This is considered the most important part of any research. This chapter shows which words collocate most frequently with the word “homeland” in the COCA and the Time Magazine Corpus. It also confirms whether or not there is a shift in the use of the word “homeland”.

3. Part III: Conclusion. This part summarizes all the important points discussed in the research and gives some suggestions for further research.

PART II: DEVELOPMENT

CHAPTER 1: LITERATURE REVIEW

In this part, the writer of this paper reviews what other linguists have done in the field of corpus linguistics.

Corpus-based techniques have been employed in many studies which have attempted to investigate differences in language use.

Pearce (2008) carried out a study using a corpus-based approach. He looked at the collocates of the lemmas “man” and “woman”, using the corpus analysis tool Sketch Engine to identify which verbs tend to co-occur with “man” and “woman”. He concluded that women tended to be the object of verbs denoting sexual violence, coercion and observation, such as “rape”, “categorize”, “monitor” and “define”, and that women co-occurred as the subject of verbs which constructed them as irritating: “fuss”, “annoy” or “nag”. In contrast, men were both the object and the subject of verbs of non-sexual violence; “man” normally collocated with words like “oppress”, “betray” or “raid”.

Baker (2010) conducted a study named “Will Ms ever be as frequent as Mr?” with the aim of exploring the frequency and context of usage of gender-marked language. In this study, he collected data from four equal-sized and equivalently sampled corpora of British English in a range of written genres (press, fiction, general prose and learned writing) from 1931, 1961, 1991 and 2006. He investigated male and female pronouns; the terms man, woman, boy and girl; gender-related profession and role nouns such as chairman, spokesperson and policewoman; and terms of address such as Mr and Ms. He concluded that there were some reductions in the frequencies of male terms, particularly male pronouns and Mr. It was also found that while some gender stereotypes were reduced, others were being maintained (such as a lack of adjectives associated with women’s success or power). Additionally, the term “girl” was still more likely than the term “boy” to refer to adults, and it was often used in a sexual way.

Fang (2008) discussed the meaning of the text segment “international community” in two different discourse communities, GuCorpus (British) and PdCorpus (Chinese), which are to some extent typical of discourse communities in Western and Asian countries. By exploring the different collocates and grammatical structures within each community, he could identify the different ways in which the phrase was used.

The studies mentioned above have produced notable findings which again support the view that the meaning of a word can only be understood and interpreted through its collocations, as observed in a corpus of authentic data.

The writer of this paper has found that despite the availability of a huge number of research papers employing a corpus linguistics approach, no corpus-based study of “homeland” has been conducted before. This paper, hence, is carried out with the aim of filling that gap. The data used for analysis will be authentic data.



Sinclair (1991) states that “a corpus is a collection of naturally occurring texts, chosen to characterize a state or variety of a language”. Similarly, Reppen (2010) states that “a corpus is a large, principled collection of naturally occurring texts (written or spoken) stored electronically”. Reppen then clarifies the terms used in this definition:

- “Naturally occurring texts”: language that comes from actual language situations, such as friends chatting, meetings, letters, class assignments and books, rather than from surveys, questionnaires or made-up language.

- “A principled collection”: the design of the corpus must be principled. The texts in the corpus need to represent the type of language that the corpus is intended to capture. For example, if a corpus is to be representative of written language, the corpus designer would need to make a comprehensive list of the different written language situations.

- “Stored electronically”: the corpus can be saved in text format, rich text format or web-based format.

Although each scholar defines the corpus differently, many of them agree on the following characteristics:

- The language must be authentic rather than made-up.

- The collection of data must be principled.

- The corpus is electronically saved.


2. Notable corpora

There are a huge number of corpora thanks to the development of science and technology. Wynne and Prytz (2012) illustrate some types of corpora and some famous English examples, as shown in the following table:

Types of corpora, their features, and examples

Corpora built to mirror a particular language or language variety:

- Brown family (http://khnt.aksis,uib.no/icame/manuals/)
  - Written; 1 million words, 15 text categories, 500 texts
  - American English: Brown (1961), Frown (1992)
  - British English: LOB (1961), FLob (1991)
  - Indian (Kolhapur), NZ (Wellington), Australian (ACE)

- BNC: British National Corpus (http://natcorp.ox.ac.uk)
  - 100 million words, 10% spoken
  - Carefully composed to be balanced

Monitor corpora (new texts added by and by to “monitor” language change):

- BoE: Bank of English (http://collins.co.uk/)
  - Written and spoken, much newspaper and media language
  - Different varieties and text categories
  - Part can be searched online

- COCA: Corpus of Contemporary American English (http://corpus.byu.edu/coca)
  - Currently 385 million words
  - 5 genres (one spoken)
  - Searchable online

Parallel corpora (same texts):

  - Originally English and Norwegian originals with Norwegian and English translations, now also German, Dutch and Portuguese
  - 50 text extracts in each direction, fiction and non-fiction

Further entries:

  - Material produced by language learners in different countries
  - Five genres of similar size, 20% spoken; part of Brigham Young University corpus collection

- Air Traffic Control Speech Corpus (http://eurocontrol.int/eec/public/standard_page/EEC_News_2008_1_ATCOSIM.html)

- Lampeter Corpus of Early Modern English Tracts (http://khnt.hit.uib.no/icame/manuals/LAMPETER/LAMPHOME.HTM)
  - Historical, written
  - Tracts published between 1640 and 1740
  - Six domains, ten decades
  - 120 different texts, 1.1 million words

Table: Types of corpora and some famous English examples

3. Corpus linguistics

McEnery and Wilson (1996) define corpus linguistics as “the study of language based on examples of real life language use”. However, unlike qualitative approaches to research, corpus linguistics uses bodies of electronically encoded text and implements a more quantitative method.

Bennett (2010) provides a simpler definition of corpus linguistics: “corpus linguistics approaches the study of language in use through corpora. A corpus is a large, principled collection of naturally occurring examples of language stored electronically”. Bennett also states that corpus linguistics, in short, serves to answer two fundamental research questions:

• What particular patterns are associated with lexical and grammatical features?

• How do these patterns differ within varieties and registers?

Biber, Conrad and Reppen (1998) identify four main features of corpus linguistics as follows:

• It is empirical, analyzing the actual patterns of language use in natural texts.

• It utilizes a large and principled collection of natural texts as the basis for analysis.

• It makes extensive use of computers for analysis.

• It depends on both quantitative and qualitative analytical techniques.

4. Corpus linguistics and Discourse analysis

The corpus-based approach is of great value since it can be applied to a number of areas of linguistics, one of which is discourse analysis. Conrad (2002) points out four major approaches through which corpus linguistics can address discourse-level phenomena:

• investigating characteristics associated with the use of a language feature, for example, analyzing the factors that affect the omission or retention of “that” in [...]

[...]

• mapping the occurrences of a feature through entire texts, for example, tracing how writers refer to themselves and their audience as they construct authority in memos

5. Collocations

Phraseology, the study of phrases, is regarded as a central element of corpus linguistics. Sinclair (1991) argued that the meaning of a word is found through several words in a sequence, that is, through phrases. Phraseology includes the study of collocations, lexical bundles, and language occurring in preferred sequences. This paper focuses only on collocation.

To this day, the term “collocation” remains difficult to define clearly, and linguists hold different views of its definition; the term is thus still controversial.

Firth (1957) states that “collocations of a given word are statements of the habitual or customary places of that word.”

According to Manning (1999), a collocation is “an expression consisting of two or more words that correspond to some conventional way of saying things”. Likewise, Lewis (2000) states that “a collocation is two or more words that tend to occur together.”

Although linguists have different viewpoints, they all agree that a collocation is a regular combination of lexical items. Benson (1985) points out that lexical collocations include:

• Verb + noun (e.g., to do homework)

• Adjective + noun (e.g., a big deal)

• Noun + verb (e.g., alarms go off)

• Noun of noun (e.g., a bar of chocolate)

• Adverb + adjective (e.g., terribly sorry)

• Verb + adverb (e.g., affect deeply)


6. Collocation analysis

Baker (2006) sets out a clear step-by-step guide to collocation analysis:

1. Build or obtain access to a corpus.

2. Decide on a search term, bearing in mind that the term can be expanded to include plurals or other forms, euphemisms, anaphora or relevant proper nouns.

3. Obtain a list of collocates.

4. Decide how many collocates you want to look at.

5. Can the collocates be grouped semantically, thematically or grammatically? Use this as a basis for the order in which you analyze the words in more detail.

6. Obtain concordances of the collocates and look for patterns within the context. This should enable you to uncover dominant discourses surrounding the subject.

7. Consider contesting discourses: concordance lines which go against or question the dominant reading of a term.

8. Look at concordance lines of the search term that do not contain collocates. What discourse prosodies are present there? Do they support or contradict those found in the analysis of the collocates?

9. How do the collocates relate to each other?

10. Attempt to explain why particular discourse patterns appear around collocates and relate this to issues of text production and reception and/or etymologies of particular words.
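Steps 3, 4 and 6 of this guide can be illustrated with a minimal Python sketch. It is not Baker's own procedure or any corpus tool's interface: the function names (`tokenize`, `collocates`, `concordance`) and the toy text are invented for illustration, and the window simply counts neighbouring tokens without respecting sentence boundaries.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, left=1, right=1):
    """Count the words found within `left` tokens before and
    `right` tokens after every occurrence of `node` (step 3)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

def concordance(tokens, node, width=3):
    """Return key-word-in-context (KWIC) lines for every hit (step 6)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            before = " ".join(tokens[max(0, i - width):i])
            after = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{before} [{tok}] {after}".strip())
    return lines

# Invented toy text; a real study would query an actual corpus instead.
sample = ("The refugees longed for their homeland. "
          "The Department of Homeland Security issued a warning. "
          "Homeland security funding rose again.")
tokens = tokenize(sample)

# Step 4: decide how many collocates to inspect (here the top 3).
top = collocates(tokens, "homeland").most_common(3)
kwic = concordance(tokens, "homeland")
```

In this toy text, “security” is the most frequent neighbour of “homeland”. The remaining steps (grouping collocates and interpreting discourses) are qualitative and are not captured by the sketch.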

CHAPTER III: METHODOLOGY

I. Subjects of the study

The subjects of the study are the language materials stored online in two of the biggest corpora, namely COCA (Corpus of Contemporary American English, at americancorpus.org) and the Time Magazine Corpus (at corpus.byu.edu/time).

The following are descriptions of each corpus.

The COCA is an online searchable corpus of American English consisting of more than 400 million words, divided evenly across registers, including news, spoken and academic texts. The corpus contains texts from 1990 onwards, and more texts are added regularly. It differs from other corpora in allowing users to search by part of speech. Additionally, because of its design, the corpus is suitable for looking at how language has changed over a period of time. The texts in this corpus come from various sources:

• Spoken (95 million words): transcripts of unscripted conversations from more than 150 different TV and radio programs (examples: All Things Considered, Newshour, Good Morning America, Today Show, 60 Minutes, Hannity and Colmes or Jerry Springer).

• Fiction (90 million words): short stories and plays from literary magazines, children’s magazines, popular magazines, first chapters of first edition books from 1990 to present, and movie scripts.

• Popular magazines (95 million words): nearly 100 different magazines, with a good mix (overall and by year) between specific domains (news, health, home and gardening, women, financial, religion, sports). A few examples are Time, Men’s Health and Good Housekeeping.

• Newspapers (92 million words): ten newspapers from across the US, including USA Today, New York Times and Atlanta Journal Constitution. In most cases, there is a good mix between different sections of the newspaper, such as local news, opinion, sports and financial.

• Academic journals (91 million words): nearly 100 different peer-reviewed journals, selected to cover the entire range of the Library of Congress classification system.

The Time Magazine Corpus consists of more than 100 million words of American English from 1923 to the present, as found in Time Magazine. It allows users to easily look at:

• The overall frequency over time of words and phrases related to changes in society and culture or to historical events, such as new age, politically correct, email or global warming.

• Changes in the language itself, such as the rise and fall of words and phrases like beauteous, nifty or freak out. Changes in grammatical constructions, like going to V, phrasal verbs with up, or the use of whom, can also be found.

• Parts of words, which show how word roots, prefixes and suffixes are used over time in other words.

• Words that were used more in one period of time than in another, even when users do not know what the specific words might be.

• How the meanings of words have changed over time, by looking at changes in collocates. For example, the collocates of chip, engine or web have changed recently due to changes in technology; consequently, the meanings of these words have also changed.

(corpus.byu.edu/time)

The writer of this paper does not wish to collect data from all periods of time in the two corpora; instead, he gathers data from the two corpora in three periods (the 1990s, 2000s and 2010s) to see whether and how the meaning of the word “homeland” has changed.

II. Research methodology

This study employs both quantitative and qualitative methods.

As the title of this study suggests, this research paper employs the corpus approach, which uses authentic materials to identify words co-occurring with a target word. The quantitative method, therefore, is used to identify the top words that collocate with “homeland”.

The qualitative method, in turn, is used in the discourse analysis to show which words collocate with “homeland” in each period of time, and hence to propose how the meaning of the word “homeland” has changed over time.

III. Data collection instrument

The data for the research were gathered from the two corpora presented above (COCA and the Time Magazine Corpus). In order to collect the data, these stages were followed:

1. Collect data from the website americancorpus.org:

• In the DISPLAY section, tick the box KWIC (key word in context).

• In the SEARCH STRING section, type the word “homeland” in the box WORD.

• In the box COLLOCATE, enter the numbers 1 and 1, which means that one word before and one word after “homeland” will be highlighted for easier analysis.

• In the box SECTION, choose 1990s, 2000s and 2010s in turn, which means that the collocates of the word “homeland” in these periods of time will be displayed.

• Finally, press the button “search”; the data are then displayed in the form of a table.

2. Similar steps were conducted in the Time Magazine Corpus at www.corpus.byu.edu/time.

3. After all the data from the two corpora had been collected, the top collocates in each corpus were analyzed in context, and the results from the two corpora were compared. The research then concluded with how the meaning of the word “homeland” had changed through the three selected periods of time.
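Once the KWIC tables have been copied out of the web interface, the per-decade comparison in stage 3 could be tabulated along the following lines. This is only a sketch under stated assumptions: the rows, the helper names `top_collocates` and `per_million`, and the corpus size used in the usage note are all invented for illustration and are not figures from COCA or the Time Magazine Corpus.

```python
from collections import Counter

# Hypothetical rows, as one might transcribe them from a KWIC table:
# (decade, word before "homeland", word after "homeland").
rows = [
    ("1990s", "their", "in"),
    ("1990s", "ancestral", "and"),
    ("1990s", "their", "was"),
    ("2000s", "of", "security"),
    ("2000s", "the", "security"),
    ("2010s", "of", "security"),
]

def top_collocates(rows, n=2):
    """Return the n most frequent +/-1 collocates for each decade."""
    per_decade = {}
    for decade, left, right in rows:
        per_decade.setdefault(decade, Counter()).update([left, right])
    return {decade: counts.most_common(n)
            for decade, counts in per_decade.items()}

def per_million(hits, corpus_size):
    """Normalise a raw hit count to occurrences per million words,
    so that periods of different size can be compared."""
    return hits * 1_000_000 / corpus_size

result = top_collocates(rows)
```

With these invented rows, `result["1990s"]` is led by “their” and `result["2000s"]` by “security”, which is the kind of contrast the qualitative analysis would then examine in context. `per_million(50, 100_000_000)` gives 0.5, i.e. 50 hits in a hypothetical 100-million-word section amount to 0.5 occurrences per million words.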

