
A Corpus-Based Study of the Linguistic Features and Processes Which Influence the Way Collocations Are Formed: Some Implications for the Learning of Collocations



I show that it is possible to explain many of these collocations by considering the linguistic features and processes which have influenced the way they have been formed. My contention is that, if the learner is encouraged to look for an explanation, it makes the process of learning collocations more memorable.

doi: 10.5054/tq.2011.247710

The subject of collocation has received considerable attention in the field of language teaching over recent years. A number of authors (Lewis, 1993, 1997, 2000; McCarthy, 1990; Nation, 2001; Thornbury, 2002; Woolard, 2000) have represented collocations as being either partially or fully arbitrary, and several studies (Benson, 1989; Nesselhauf, 2003, 2005; Smadja & McKeown, 1991) have even used arbitrariness as part of their definition of what constitutes a collocation. Lewis claimed that “collocation is an arbitrary linguistic phenomenon” (Lewis, 1997, p. 32), and, as a consequence, teachers are urged not to attempt to explain collocations to their learners.

If collocations are simply arbitrary combinations of words, it means that the foreign language learner has little option but to memorise large numbers of collocations with very little in the way of explanation or any other help in memorising them. The learner is liable to become very dependent on a dictionary, especially a collocational dictionary, checking whether a particular combination is acceptable or not before using it in his or her writing. If, on the other hand, there is some sort of explanation as to why a particular word is frequently found in the company of one or more others, it means that the foreign language learner is able to understand how and why a particular combination is frequently used by native speakers. Instead of trying to remember large numbers of collocations, the learner would be able to produce some of these combinations by using his or her understanding of the linguistic features and processes which influenced the way they were formed.

More recently there have been a few publications (Crowther, Dignen, & Lea, 2002; McCarthy & O’Dell, 2005) which have taken the position that not all collocations are arbitrary and have started to present collocations in such a way that students can begin to understand why one particular word is frequently found in the company of another. Unfortunately, there is very little research so far to support this position. Although Kennedy (2003) did not go into the question directly, his corpus-based research concerning the collocational behaviour of adverbs of degree or amplifiers (e.g., absolutely, completely, utterly, rather, about, somewhat) seems to show that the collocations they form are not that arbitrary. Liu (2010) is one of the few studies which critically examined the accepted definition of collocation and found that many collocations can be explained using a combination of techniques drawn from the disciplines of corpus linguistics and cognitive linguistics.

The aim of the current study is to show that collocation is not simply an arbitrary phenomenon but is a process which can be partially explained by examining some of the linguistic features and processes which influence the way collocations are formed. In order to do this, the study uses a corpus-based methodology to investigate the collocational behaviour of groups of semantically related nouns and verbs taken from the domain of business English. The study found that the process of collocation is influenced by, for example, the precise meaning or meanings of a particular lexical item, the use of metaphor, and any phraseological behaviour or semantic prosody associated with the item.

COLLOCATION

In this article the term a collocation (countable noun) is used to refer to a combination of two or more words which occur together or in close proximity to each other in both written and spoken discourse, whereas the term collocation (uncountable noun) is used in a more general sense to refer to “the habitual co-occurrence of individual lexical items” (Crystal, 2003, p. 82).

It is clear from the literature that a collocation is defined in a variety of ways, and that these different definitions reflect differences in approach, the only common denominator being that the term is used to refer to some kind of syntagmatic relationship between words. However, it is possible to group the different definitions into two broad categories, those which use what I call a lexical approach to collocation (Carter, 1987; Cowie, 1998; Howarth, 1996, 1998) and those which use a frequency or statistically based approach (Moon, 1998; Nesselhauf, 2003, 2005; Sinclair, 1991). Studies which follow a lexical approach use lexical criteria to decide whether a particular combination can be classified as a collocation or not. According to this approach a collocation will typically exhibit a degree of fixedness and/or a lack of transparency in meaning. There is a tendency with this type of approach to create categories (e.g., unrestricted, semirestricted, familiar, and restricted collocations; Carter, 1987, p. 63) based on the lexical characteristics exhibited by different combinations.

Studies which use a frequency or statistically based approach generally consider a collocation to be a co-occurrence of words within a certain distance of each other. Collocations are seen as being co-occurrences that are “more frequent than could be expected if words combined randomly in a language” (Nesselhauf, 2005, pp. 11–12). Frequency-based approaches are often associated with the work of Sinclair, whose own approach to collocation was, in turn, influenced by the work of Firth (1957, 1968). Collocations are viewed more in terms of probability, where the strength of a particular collocation is assessed on the basis of how frequently it appears in a large representative sample of discourse. According to Halliday, “the native speaker’s knowledge of his language will not take the form of his accepting or rejecting a given collocation: he will react to something as more acceptable or less acceptable on a scale of acceptability” (1966, p. 159). In other words, the question is not whether something is a collocation or not but rather whether a particular collocation is more or less acceptable.

This means that there are virtually no impossible collocations, but that some collocations are much more likely to occur than others. However, as Halliday has pointed out, there is a need for at least one cutoff point in order to eliminate combinations which are simply the result of a random distribution of items within the discourse. Sinclair, writing in the Office of Scientific and Technical Information (OSTI) report (Krishnamurthy, 2004) first circulated in 1970,¹ used the term significant collocations to refer to combinations which co-occur more frequently than “their respective frequencies and the length of the text in which they appear would predict” (Sinclair, Jones, & Daley, 1970, p. 10). Sinclair (1966) also used three very useful terms for any discussion of collocation.

¹ The original OSTI report (1970) only had a limited distribution but has recently been republished. This new edition, entitled English Collocation Studies, is edited by Ramesh Krishnamurthy (2004).

We may use the term node to refer to an item whose collocations we are studying, and we may define a span as the number of lexical items on each side of a node that we consider relevant to that node. Items in the environment set by the span we will call collocates. (Sinclair, 1966, p. 415)

Writing in the OSTI report, Sinclair went on to explain that there is essentially no difference in status between the node and a collocate:

if word A is a node and word B one of its collocates, when B is studied as a node, word A will be one of its collocates. In practice, however, it is convenient to examine the behaviour of one item at a time and the use of the two terms enables a useful distinction to be made when describing results. (Sinclair et al., 1970, p. 10)

Sinclair and Jones (1974) proposed a span of four words on either side of the node word. The following nomenclature is normally used to describe the positions in the span: node –1 to –4 describe the four positions to the left of the node and node +1 to +4 describe the positions to the right, as can be seen in the example below:

the         long        and         painful     process     of          rebuilding  this        Country
node –4     node –3     node –2     node –1     node        node +1     node +2     node +3     node +4

Although there is some statistical basis for using a span of four words (Mason, 1997, 1999), the distance between a collocate and a node will depend on both lexical and grammatical elements. For example, the distance between the node and the collocate(s) will normally be greater in the case of verb/noun collocations compared with adjective/noun or noun/noun combinations, and consequently it may be necessary to use a wider span when verb/noun collocations are being examined.

Arguably, a frequency or statistical approach is more suited to a corpus-based methodology, because it enables large quantities of spoken or written discourse stored on a computer to be analysed by software programmes (concordancing packages) which can extract the most frequent, or the most statistically significant, collocates associated with a particular node. These programmes can be used to rank collocates according to frequency or statistical significance for each of the different positions within the span. It is also possible to specify a cutoff point, as proposed by Halliday (1966), in order to eliminate combinations which may simply be the result of random distribution. The approach to collocation used in the current study has been influenced by both the lexical and frequency or statistically based approaches.
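The notions of node, span, and positional ranking described in this section can be illustrated with a minimal Python sketch. It is not the concordancing software used in the study: the function name positional_collocates, the whitespace tokenisation, and the lower-casing are assumptions made purely for illustration, and the only data used is the nine-word example phrase given above.

```python
from collections import Counter, defaultdict


def positional_collocates(tokens, node, span=4):
    """Count the collocates of `node` at each position from -span to +span.

    Returns a mapping {offset: Counter}, where offset -1 is "node -1",
    offset +1 is "node +1", and so on. (Hypothetical helper.)
    """
    counts = defaultdict(Counter)
    node = node.lower()
    for i, token in enumerate(tokens):
        if token.lower() != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            # Skip the node itself and positions outside the text.
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            counts[offset][tokens[j].lower()] += 1
    return counts


tokens = "the long and painful process of rebuilding this Country".split()
profile = positional_collocates(tokens, "process")

# Print the most frequent collocates for each position within the span.
for offset in sorted(profile):
    print(f"node {offset:+d}: {profile[offset].most_common(3)}")
```

Run over a large corpus rather than a single phrase, the per-position counters could then be filtered with a frequency or significance cutoff of the kind Halliday proposed.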

SEMANTIC PROSODY

I would like to briefly discuss semantic prosody here in the introductory section, because the concept is referred to a number of times later in the article. The term semantic prosody² was first used by Louw in an article published in 1993, where he credits Sinclair with having provided him with both the idea and the term in a personal communication. Sinclair (1991) examined the collocational behaviour of the phrasal verb set in and found that most of the subjects associated with it referred to “unpleasant states of affairs” (Sinclair, 1991, p. 74). Louw suggested that semantic prosody is the result of a diachronic process, whereby meaning has been transferred from one word or words to another, and defined semantic prosody as being a “consistent aura of meaning with which a form is imbued by its collocates” (1993, p. 157). The term semantic prosody is also used by some writers (Nelson, 2006; Sinclair, 1996, 2004a, 2004b; Stubbs, 2001, 2009) in a wider sense to describe the way in which a lexical item can develop one of a range of different prosodies such as “‘something nasty’ or ‘something worrying’ or ‘disturbing’ [...] ‘something magnificent’, ‘socially appropriate’ ‘positively constructive’ etc.” (Sinclair, 2004b, p. 173). However, it can be argued that when the term is used in this wider sense, it is simply reflecting the rather complex and multifaceted nature of the meaning of a lexical item. I have chosen to limit the use of the term in this article to Louw’s original notion of a lexical item having either a positive or negative prosody, depending on whether it is frequently associated with collocates which refer to desirable or undesirable items or events.

METHODOLOGY

The main corpus used in the current study was the Bank of English (BoE),³ which is a large corpus of general English consisting of 450 million words. A second more specialised corpus of business English was also used in order to check that the results obtained from the corpus of general English are valid in the domain of business English. The second corpus, which was made up of commercial and financial data files from the British National Corpus,⁴ contains 6.3 million words. In the current study this second more specialised corpus is referred to as the British National Commercial Corpus (BNCc).

² For a comprehensive account of semantic prosody, please refer to Stewart (2009).

³ The Bank of English (BoE) corpus is jointly owned by HarperCollins Publishers and the University of Birmingham. During 2003 to 2006, when most of the research for this study was carried out, the corpus contained 450 million words. http://www.titania.bham.ac.uk

The lexical items selected for study (i.e., the nodes), referred to as the selected items, were chosen for two reasons. First, because they are all high-frequency items in the BNCc, and, second, because each item within a particular group is a partial or close synonym of the other (e.g., process was chosen because it is a close synonym of procedure and system). Experience gained from the pilot studies showed that it was more fruitful to establish the collocational behaviour of a particular selected item by comparing its collocational behaviour with that of a synonym or near synonym. It was therefore decided to study the collocational behaviour of groups of synonyms or near synonyms rather than of individual items. Table 1 shows the four groups of items selected for study (the table does not include plural forms, which were also studied).

Synonymy, near-synonymy, and frequency were not the only criteria used when selecting the items. Some items were chosen because they are particularly important within the context of teaching business English (e.g., RUN,⁵ HEAD, MANAGE), whereas others were selected because, in my experience,⁶ learners frequently have difficulties using the item or items appropriately (e.g., issue, aspect, factor). These difficulties are frequently caused by cross-linguistic factors, such as a level of semantic incongruency between items in the learner’s first language and the target language or the fact that the item already exists as a loan word in the first language.

As already mentioned, it is often difficult to attach a precise level of significance to a list of collocates ranked solely according to the number of times they occur (raw frequency) together with the node. For this reason statistical measures such as t-score⁷ are used in order to assign a more precise level of significance to each co-occurrence. For example, any collocate with a t-score of 2.00 or above can be regarded as significant (Barnbrook, 1996, p. 98); that is, the way that it combines with the node is not simply the result of random distribution. The t-score value usually reflects how frequently a particular combination occurs in the corpus, that is, the more frequent the collocation, the higher the t-score. Given that there have been a number of reservations expressed about the use of statistical measures in corpus research (Clear, 1993; Stubbs, 1995), both t-score and raw frequency data are included in the current study.

⁴ The British National Corpus (BNC) is a 100 million word corpus developed in the 1980s. It is maintained and distributed by the Oxford University Computer Service (OUCS). http://www.natcorp.ox.ac.uk

⁵ Capital letters are used to indicate that reference is being made to all members of a lemma. RUN, for example, refers to run, ran, runs, running. In this study the lemma is only used with verb forms, that is, the members of Group 3.

⁶ I spent 18 years in Germany teaching business English in large organisations such as Bosch GmbH, Audi AG, Siemens AG, and Deutsche Bank AG.

⁷ The t-score is a statistical instrument which is used to measure distribution, or more specifically how the distribution of something deviates from what is standard. For more information regarding t-score, please refer to Barnbrook (1996) and Hunston (2002).
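The article does not give a formula for the t-score, so the following Python sketch rests on an assumption: it uses the formulation that is common in corpus linguistics, t = (O − E) / √O, where O is the observed number of co-occurrences at a given position and E is the number expected if node and collocate were distributed independently. The function name t_score and all counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def t_score(observed, node_freq, collocate_freq, corpus_size):
    """t-score of a collocate at one span position (assumed formulation).

    observed       -- times the collocate occurs with the node at this position
    node_freq      -- total occurrences of the node in the corpus
    collocate_freq -- total occurrences of the collocate in the corpus
    corpus_size    -- number of running words in the corpus
    """
    if observed == 0:
        return 0.0
    # Co-occurrences expected under independent (random) distribution.
    expected = node_freq * collocate_freq / corpus_size
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts, for illustration only.
score = t_score(observed=50, node_freq=20_000, collocate_freq=9_000,
                corpus_size=450_000_000)
print(f"t = {score:.2f}; significant at the 2.00 cutoff: {score >= 2.0}")
```

Under this formulation the expected value is very small for most word pairs in a large corpus, so the score is dominated by the square root of the observed frequency. This is consistent with the remark that more frequent collocations receive higher t-scores, and with the figures reported later for sensitive issue (frequency 316, t-score 17.77), since the square root of 316 is about 17.78.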

The first stage of the research consisted of establishing a collocational profile for each of the selected items using a corpus of general English, in this case the BoE. This involved identifying the most frequent collocates for each of the positions within a span of four words to the left and right of the node. The easiest way to do this with the BoE is to use the picture function, which identifies and ranks the most frequent collocates for each of the positions within the span. Table 2 shows a t-picture for the node word aspect where the collocates are ranked according to their t-score values, with the highest (i.e., the most significant collocates) at the top of each of the four columns to the left and to the right of the node.

TABLE 1

The Four Groups of Selected Items

Group   Selected items                          Form
1       issue, aspect, factor                   Noun forms
2       aim, objective, target, goal*           Noun forms
3       RUN, HEAD, MANAGE, DEAL with, HANDLE    Verb forms
4       system, process, procedure              Noun forms

TABLE 2

T-Picture for the Node Word Aspect

node –4     node –3     node –2     node –1     node        node +1     node +2     node +3     node +4
it          about       one         another     aspect      computing   your        is          was
but         to          only        any         aspect      Call        life        business    work
has         or          about       other       aspect      however     it          job         and
,p          this        every       some        aspect      though      her         policy      system
focus       was         this        particular  aspect      to          my          relations   write

During the first stage of the research, all the relevant information from the collocational profile was carefully recorded. This involved listing each of the 20 most frequent collocates together with its t-score value and raw frequency; that is, the number of times the collocate was found to occur with the node in this particular position. If one takes the first group of selected items (issue, aspect, factor) as an example, an examination of the corpus data reveals both shared collocates, that is, collocates which are frequently associated with all the selected items in the group, and characteristic collocates, that is, collocates which are more frequently associated with one item in the group. It is significant that, as a general rule, shared collocates are more frequent than characteristic ones.

Once the data had been recorded, they could then be manually examined for characteristic collocations (e.g., controversial issue, worrying aspect, growth factor), which reflect the precise meaning of individual items within the group. The data were also examined for fixed or semifixed phrases (e.g., every aspect of, take issue with), for collocations which reflect either different polysemous or homonymous forms (e.g., the latest issue of, a controversial issue, a share issue), for signs of a particular semantic prosody (e.g., a long and difficult process) and for the use of metaphor (e.g., meet the 3% target).

The aim of the second stage of the research was to establish a collocational profile for each selected item using a corpus of business English. By comparing the two profiles (i.e., the profile obtained from the BNCc and the profile obtained using the BoE) it was possible to establish whether there are any significant differences in the way the selected items are used in a business domain compared with a more general one. In this particular case, the results showed that there were very few differences in the way the selected items are used in the two domains and, as a result, much of the data used in this article have been taken from the BoE because, as the larger of the two corpora, it is liable to yield more reliable results.

RESULTS AND DISCUSSION

Results from the current study show that there are a number of linguistic features and processes which influence the way in which collocations are formed. The first of these is concerned with semantic and pragmatic features associated with the selected item (i.e., the node word) itself.

Semantics and Usage

The corpus data show how items such as issue, aspect, or factor are frequently used as cohesive devices in both spoken and written discourse. Halliday and Hasan (1976) used the term general noun⁸ to refer to a class of nouns (and noun phrases) which are frequently used as cohesive devices in text. They are part of the system of deixis in English and function as proforms, which typically refer to either individual items (e.g., place, man, woman, boy) or to whole stretches of discourse (e.g., situation, question of, issue). Seven out of the ten most frequent node –1 collocates associated with issue, aspect, and factor belong to a group of evaluative adjectives (important, key, main, major, crucial, critical, vital) which seem to have the same semantic function: that of attributing a level of importance to the node word. These shared collocates were found to be associated mainly with the way in which issue, aspect, and factor are used as general nouns. Here is one example taken from the BoE data, where issue refers forward to what the writer regards as being the most important issue in the presidential election.

By far the most important issue in the campaign was the state of the national economy. Clinton won because he presented himself as a competent, moderate alternative to a president who was perceived as having failed to manage the economy.

⁸ Francis (1994, pp. 83–88) used the terms “advance labels” and “retrospective labels” to refer to nouns or noun groups which are frequently used to label stretches of text. Partington (1998) also examined the way in which these general nouns function as cohesive devices.

Although the shared collocates are generally the most frequent collocates, it can be argued that the characteristic collocates, which normally occur slightly lower down in any list of collocates ordered by t-score or frequency, are more useful to learners as they highlight slight but significant differences in the way that the selected items from a particular group are used. In the case of the items from Group 1, for example, an issue is frequently seen as something which is contentious and controversial, whereas an aspect is something which can be worrying or disturbing (Table 3). Factor, on the other hand, was found to be frequently associated with more technical usages (e.g., growth factor) but is also used in a kind of pseudotechnical way (e.g., feel-good factor), which may be an attempt to bring a sense of objectivity to something that can only really be measured by more subjective means. Table 3 shows the most frequent node –1 collocates associated with issue, aspect, and factor. In the collocation sensitive issue, for example, issue is the node and sensitive is the collocate which appears in the node –1 position. The values in Table 3 show that this collocation occurs 316 times in the BoE, and the t-score value of 17.77 is a measure of the statistical significance of this combination; that is, the collocation is not simply the result of chance, as the t-score is well above 2.00.
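The distinction between shared and characteristic collocates can be made concrete with a small Python sketch that uses the raw frequencies reported in Table 3. In the study itself the characteristic collocates were identified by manual examination, so the rule below is only an illustrative assumption: a collocate is labelled with a node when that node accounts for at least 70% of its occurrences across the group. The function name classify, the 0.7 cutoff, and the labels "shared" and "unattested" are all hypothetical, and the rule ignores the fact that the three nodes have different overall frequencies.

```python
# Raw node -1 frequencies as reported in Table 3 (Bank of English).
TABLE_3_FREQ = {
    "sensitive":     {"issue": 316, "aspect": 4,  "factor": 1},
    "contentious":   {"issue": 239, "aspect": 6,  "factor": 2},
    "controversial": {"issue": 229, "aspect": 40, "factor": 1},
    "worrying":      {"issue": 7,   "aspect": 71, "factor": 13},
    "disturbing":    {"issue": 4,   "aspect": 41, "factor": 10},
    "pleasing":      {"issue": 0,   "aspect": 28, "factor": 5},
    "risk":          {"issue": 1,   "aspect": 0,  "factor": 386},
    "feel-good":     {"issue": 0,   "aspect": 0,  "factor": 329},
    "growth":        {"issue": 4,   "aspect": 0,  "factor": 189},
}


def classify(freqs, min_share=0.7):
    """Return the dominant node if it accounts for at least `min_share`
    of all occurrences, otherwise 'shared'. The cutoff is hypothetical."""
    total = sum(freqs.values())
    if total == 0:
        return "unattested"
    node, top = max(freqs.items(), key=lambda item: item[1])
    return node if top / total >= min_share else "shared"


labels = {collocate: classify(freqs) for collocate, freqs in TABLE_3_FREQ.items()}
for collocate, label in labels.items():
    print(f"{collocate:14s} -> {label}")
```

Every collocate in Table 3 is dominated by a single node under this rule, which is what one would expect of a table that lists characteristic collocates only; an evaluative adjective spread evenly across issue, aspect, and factor would instead come out as shared.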

Although it is clear from the data that all three Group 1 items are frequently used as cohesive devices, it is also clear from the characteristic collocates that they do not all have the same meaning and associations. By choosing one item over another, the user is obviously making some form of evaluation and is not simply referring to another item or stretch of discourse in a neutral manner. It is precisely these slight but significant differences in usage, and therefore in meaning, that learners need to be aware of in order to use the target language effectively.

Polysemy and Homonymy

Where a word has a number of different senses it is normally the collocates in the surrounding cotext which can be used to disambiguate the item. It is possible, for example, to identify three different senses of issue in the corpus data, each associated with different characteristic collocates (e.g., contentious issue, latest issue, share issue). The values in Table 4 show, for example, that there are 535 occurrences of the collocation political issue, 452 of latest issue, and 1,024 of rights issue in the BoE.

The corpus data from the current study show that one of the most significant features which influences the way collocations are formed is the semantics of the selected item, and, where the item has two or more distinct senses, each of them is generally associated with a different set of characteristic collocates. Hoey (2005) argued that, where ambiguity is possible, speakers deliberately avoid collocates that increase this ambiguity and generally choose ones which decrease it.
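The idea that collocates disambiguate a polysemous node can be sketched as a simple lookup in Python. The collocate sets below are the node –1 collocates listed under the three meanings of issue in Table 4; the sense glosses, the function name guess_sense, and the restriction to the word immediately left of the node are assumptions made for illustration, not the article's own procedure.

```python
# Node -1 collocates listed under each meaning of "issue" in Table 4.
# The sense glosses are hypothetical labels, not the article's wording.
SENSE_COLLOCATES = {
    "topic or problem": {"political", "palestinian", "contentious",
                         "controversial", "thorny"},
    "edition of a publication": {"latest", "current", "next",
                                 "special", "last"},
    "release of shares or bonds": {"rights", "bond", "share",
                                   "stock", "currency"},
}


def guess_sense(tokens, node="issue"):
    """Pair each word found immediately left of `node` with the sense it
    suggests, or None if the word is not in any of the collocate sets."""
    guesses = []
    for i, token in enumerate(tokens):
        if token.lower() == node and i > 0:
            left = tokens[i - 1].lower()
            sense = next((label for label, collocates in SENSE_COLLOCATES.items()
                          if left in collocates), None)
            guesses.append((left, sense))
    return guesses


print(guess_sense("the latest issue of the magazine".split()))
print(guess_sense("a highly controversial issue".split()))
```

A lookup on the node –1 position alone would assign human rights issue to the financial sense, which illustrates why the note to Table 4 excludes such combinations from the counts for rights issue and why a wider span is sometimes needed.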

However, it was not always so easy to discern a clear number of different senses for a particular selected item from the corpus data. An examination of the most frequent node –1 collocates for system, for example, shows how it is used to refer to a variety of different types of system and that it is possible to group these collocates according to the type of system they refer to (Table 5). In this case the collocates have been grouped together to reveal seven different types of system, but this is based on my own rather subjective judgement, and the number of different types of system would seem to vary according to who is doing the grouping. The Collins COBUILD Advanced Learner’s Dictionary (Sinclair, 2006), for example, lists six different types of system, whereas the Oxford Advanced Learner’s Dictionary (Wehmeier, 2005) and the Longman Dictionary of Contemporary English (Summers, 2003) only list three and four different types, respectively.

TABLE 3

The Most Frequent Characteristic Collocates Associated With Issue, Aspect, and Factor

                Issue                Aspect               Factor
Node –1         t-score  Frequency   t-score  Frequency   t-score  Frequency
sensitive       17.77    316         2.00     4           1.00     1
contentious     15.45    239         2.45     6           1.41     2
controversial   15.09    229         6.32     40          1.00     1
worrying        2.64     7           8.42     71          3.60     13
disturbing      1.99     4           6.41     41          3.16     10
pleasing        0.00     0           5.29     28          2.23     5
risk            1.00     1           0.00     0           19.61    386
feel-good       0.00     0           0.00     0           18.11    329
growth          2.00     4           0.00     0           13.69    189

Note. Data are from the Bank of English.

The corpus data also contained a number of verbal collocates which were associated with specific types of system. For example, verbs such as DEPRESS and STIMULATE were found to be associated with biological systems, INSTALL and ASSEMBLE with technical systems, and REFORM and RESTRUCTURE with social systems. The following concordance lines taken from the BoE serve to illustrate some of these associations.

TABLE 4

The Most Frequent Collocates Associated With Three Meanings of Issue

Node –1         t-score  Frequency

Issue (meaning 1)
political       22.05    535
Palestinian     18.09    332
contentious     15.43    239
controversial   14.96    229
thorny          12.59    159

Issue (meaning 2)
latest          20.80    452
current         18.57    368
next            16.78    282
special         14.45    240
last            14.02    197

Issue (meaning 3)
rightsᵃ         33.48    1,024
bond            19.41    377
share           18.66    349
stock           6.95     49
currency        6.76     46

Note. Data are from the BoE.
ᵃ Collocations such as human rights issue or civil rights issue are not included in the values for frequency or t-score.

TABLE 5

The Most Frequent Node –1 Collocates Associated With Seven Different Types of System

Node –1         t-score  Frequency

social systems
legal           39.31    1,554
education       36.90    1,380

political systems
capitalist      18.78    354
democratic      18.31    341

business systems
management      18.78    461
accounting      10.56    112

technical systems
computer        40.11    1,610
telephone       17.16    305

transport systems
transport       23.62    565
rail            14.99    225

biological systems
immune          55.43    3,074
nervous         46.67    2,200

geographical systems
solar           40.17    1,615
river           11.04    122

Note. Data are from the Bank of English.

