8 The convergence of corpus linguistics, psycholinguistics and functionalist linguistics
As we have seen in Chapter 7, functionalist linguistics in the broad sense (including cognitive linguistics) is increasingly making use of corpus-based methods, and in turn informing the analyses of corpus linguists. In this chapter, we will show that this phenomenon extends as well to experimental psycholinguistics. We will also discuss the implications of the rapprochement of functionalist linguistics and psycholinguistics with corpus linguistics with regard to the neo-Firthian school of thought which we surveyed in Chapter 6; we will argue that in the neo-Firthian school, this rapprochement with functional linguistics has taken a very different form.
As we saw in Chapter 6, one of the bases of the neo-Firthian or so-called ‘corpus-driven’ approach is a rejection of non-corpus-derived theoretical frameworks. To explicitly adopt a functionalist theory as the basis for a corpus-driven study would be distinctly peculiar from the neo-Firthian perspective. Indeed, some of the stronger forms of the neo-Firthian position – such as that espoused by Teubert, for instance – explicitly reject the notion of a convergence of neo-Firthian corpus linguistics and functional or cognitive linguistics, with Teubert (2005: 2) claiming that corpus linguistics ‘offers a perspective on language that sets it apart from received views or the views of cognitive linguistics, both relying heavily on categories gained from introspection rather than from the data itself’. Nevertheless, we wish to argue that such a convergence is in fact taking place, stemming on the neo-Firthian side from work by Sinclair and others from the 1990s onwards. Our basis for making this case is that, when we closely examine the findings of the most extensively developed neo-Firthian theories – in particular, Pattern Grammar and Lexical Priming – we will find that many of these conclusions have also been arrived at by one or more branches of functional linguistics or psycholinguistics. These congruent conclusions stem from wildly different sets of evidence and are, of course, expressed using very different descriptive apparatus. But certain fundamental insights – namely, the inseparability of lexis and grammar, and the nature of grammar as secondary to, and emergent from, lexis – have been arrived at by both functional linguists and neo-Firthian corpus linguists, largely independently of one another.
In this chapter, then, we have two main topics. Firstly, in section 8.2 we will consider the role of corpora in experimental psycholinguistics, as we considered their role in functionalism in Chapter 7. Psycholinguistics as a discipline is methodologically rather different to functionalist theoretical linguistics, but it shows signs of a similar trend with regard to corpus methods – that is, that over recent years there has been more and more use of corpus data within psycholinguistic research, and a convergence or rapprochement between the findings of psycholinguistic experiments and of corpus investigations.
Secondly, section 8.3 discusses the convergence of findings, regarding in particular the ontological status of grammar, lexis and language itself, between neo-Firthian corpus linguistics, functional linguistics and psycholinguistics.
8.2 Corpus methods and psycholinguistics
Overlapping cognitive linguistics (which we discussed in the previous chapter), but in many ways distinct from it, is the field of psycholinguistics – and in particular that branch of psycholinguistics whose methodology is mainly experimental. In the latter approach, the primary source of data is various types of laboratory tests on human subjects (or, as we will see later, computer models). While experimental psycholinguistics is not usually considered a branch of functional-cognitive linguistics, its fundamental methodological assumption – that the nature of language in the brain or mind can be investigated in much the same way that experimental psychology in general looks at other aspects of the nature of thought – is in accordance with the general tenet of functionalism that there is no absolute divide between form and function, between language and non-linguistic cognition.
Psycholinguistics is a very broad field, and there is absolutely no room here for a full review of it – nor even to treat comprehensively all research which has linked psycholinguistics with corpus data and methods. We must therefore confine ourselves to an extremely brief and purely indicative survey. To characterise psycholinguistics in very broad terms, we might say that it is focused on two primary issues (which are closely interrelated, as Ellis 2002 illustrates): language learning and language processing. There are other topics of interest of course, such as the evolution of the language faculty. However, we will limit ourselves here to looking at how corpora have been used in some psycholinguistic investigations into first language acquisition, second language acquisition and language processing.
8.2.1 Corpus data in experiments on language processing
Language processing has been investigated experimentally in a number of ways. Two that are reasonably common are self-paced reading experiments and eye-tracking experiments. Both are means of investigating the speed with which particular segments of language are processed. In a self-paced reading experiment, participants work at a computer running a specially designed program. The computer shows one word of a sentence at a time to the participant, who presses a button to get the next word once they have read the word currently on screen. The program records the time for each button-press, so that the relative speed of reading for each word is known. Typically, after each sentence participants have to answer a (very easy) question about the content of the sentence – this prevents participants from just clicking through sentences without actually reading for meaning. The results of such an experiment can be used to infer what elements (morphological, syntactic or semantic) are processed easily, and which are more difficult and thus require more processing time. This in turn can give indications about what is actually happening in the brain. Although useful, self-paced reading experiments may potentially be misleading in that fluent readers do not typically read one word at a time, in sequence, without ever going back in the text. In fact, it is known that a reader’s journey through a sentence of printed text can be quite complex, with multiple movements back and forth. This type of evidence is gathered in eye-tracking experiments (see Rayner 1998 for a review). Again, participants are given the task of reading sentences presented on a computer screen, but this time an entire sentence is presented at one time, and specialised video equipment records the movements of one of the participant’s eyes as it looks at different positions in the sentence immediately after the sentence appears on screen. The resulting data is much richer, but correspondingly rather more difficult to interpret, than self-paced reading data.
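The way a self-paced reading program turns button-presses into per-word reading times, as described above, can be illustrated with a minimal sketch. It is not the software used in any study cited here: the function name, the example sentence and the timestamps are all hypothetical, invented purely to show the arithmetic.

```python
from typing import List, Tuple


def per_word_reading_times(
    words: List[str], press_times_ms: List[float], onset_ms: float = 0.0
) -> List[Tuple[str, float]]:
    """Pair each word with the time it stayed on screen.

    press_times_ms[i] is the (hypothetical) moment at which the participant
    pressed the button to dismiss words[i], measured from the same clock as
    onset_ms, the moment at which the first word appeared.
    """
    if len(words) != len(press_times_ms):
        raise ValueError("need exactly one button-press per word")
    times = []
    previous = onset_ms
    for word, press in zip(words, press_times_ms):
        # Reading time = this press minus the moment the word appeared,
        # which is the previous press (or the sentence onset).
        times.append((word, press - previous))
        previous = press
    return times


# Invented illustration only: these timestamps are NOT experimental data.
sentence = ["The", "general", "planned", "the", "campaign"]
presses = [310.0, 705.0, 1120.0, 1390.0, 1985.0]

reading_times = per_word_reading_times(sentence, presses)
slowest = max(reading_times, key=lambda pair: pair[1])

for word, ms in reading_times:
    print(f"{word:10s} {ms:7.1f} ms")
```

In a real experiment these differences would be averaged over many participants and items before any inference about processing difficulty was drawn; the sketch shows only how a single trial is reduced to one figure per word.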
These kinds of experiments may seem remote from the concerns of corpus linguistics. However, there are at least two ways in which corpus data can play an important role in the design and interpretation of such experiments. Firstly, corpus data can be used as a check on the naturalness of the language task that the experiment sets its participants. For instance, Frisson and Pickering (2001) summarise the results of a series of eye-tracking experiments aimed at investigating the processing of words which are ambiguous between a literal and a metaphorical meaning, when the part of the sentence prior to the ambiguous word does not provide sufficient cues to indicate which meaning is intended. But Deignan (2005: 114–17), in a review of this study, points out that in fact, such cases almost never occur in corpora of real usage: in all the examples she looks at, some aspect of the preceding context – possibly in an earlier sentence – indicates which meaning is intended. So, for instance, the word campaign literally relates to warfare and metaphorically relates to politics. In any given real example of campaign from a corpus, the prior context is overwhelmingly likely to give some indication whether a military campaign or a political campaign is intended; so by the time the reader gets up to campaign, it is already effectively disambiguated. On this basis, Deignan argues that if an experiment presents participants with a word such as campaign without any indication in the foregoing text as to whether it is literal or metaphorical, as Frisson and Pickering’s experiment did, then that experiment is actually ‘forcing participants to tackle problems that are not faced in normal discourse’ (Deignan 2005: 117). If this is the case, then it may be argued that while such an experiment may indeed tell us something interesting about the processing of ambiguously metaphorical words, it cannot tell us about the normal processing of language in use. We can see, then, that a corpus-derived awareness of how words (and other linguistic items) are actually used can serve as a useful anchoring-point for psycholinguistic experimentation. This is not to say that unnatural language should never be used in an experiment – there are cases where non-idiomatic language may itself be the object of study, for instance Millar’s (2011) study of how errors in collocation, of the type made by non-native speakers of English, can affect processing speed in self-paced reading. What is undesirable is a situation where experimental tasks include highly unnatural language without the experimenter being aware that this is the case.
Secondly, corpus data can be used as a source of frequency data in the construction of test sentences in self-paced reading or eye-tracking experiments. Often, the test sentences used will not be drawn directly from corpus data, because the analysis of the resulting data may require certain aspects of the sentences to be controlled across different examples. For instance, if we are primarily interested in the time taken to process (say) the verb in a sentence, then we might well wish to control the length and syntactic structure of the preverbal elements (as well as, potentially, that of the rest of the sentence). We are unlikely to find such controlled sentences in a corpus! But even when invented example sentences are used, it is entirely possible for the creation of the sentences to be informed by frequency data of various sorts extracted from a corpus. The study by Millar (2011) which we cited above uses this approach: Millar’s test sentences are all fabricated, but each is built around an observed non-idiomatic collocation extracted from a learner corpus.
A perhaps more straightforward use of frequency data drawn from corpora is exemplified by the eye-tracking experiments of McDonald and Shillcock (2003a, 2003b). They investigate whether the co-occurrence frequency of a pair of words (as established in a large corpus, in this case the BNC) can predict the ease of processing of the second word in that pair. The co-occurrence frequencies are expressed, in this case, as transition probabilities; that is, given that the first word in the pair is X, what is the probability that the second word is Y? In this case, the probability is equal to the number of times the sequence X-then-Y occurs in the BNC, divided by the total number of instances of word X – this is fundamentally very similar to a collocation calculation. McDonald and Shillcock (2003a) look at the processing of verb–object pairs, contrasting pairs where the object is probable, given the verb – e.g. avoid confusion – and pairs where it is less probable – e.g. avoid discovery. The frequencies of these bigrams in the BNC are 50 and 2 respectively, relative to 7,823 instances of the wordform avoid in total.
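To make the arithmetic concrete, the transition probability defined above can be computed directly from the three BNC counts just quoted. The following is only a minimal sketch: the function and variable names are our own illustrative choices, not part of McDonald and Shillcock's procedure, and the code simply divides the bigram frequency by the frequency of the first word.

```python
def transition_probability(bigram_count: int, first_word_count: int) -> float:
    """P(Y | X): frequency of the sequence X-then-Y over the frequency of X."""
    if first_word_count <= 0:
        raise ValueError("the first word must occur at least once")
    if not 0 <= bigram_count <= first_word_count:
        raise ValueError("a bigram cannot be more frequent than its first word")
    return bigram_count / first_word_count


# Counts quoted in the text for the BNC.
AVOID_TOTAL = 7823       # instances of the wordform "avoid"
AVOID_CONFUSION = 50     # instances of "avoid confusion"
AVOID_DISCOVERY = 2      # instances of "avoid discovery"

p_confusion = transition_probability(AVOID_CONFUSION, AVOID_TOTAL)
p_discovery = transition_probability(AVOID_DISCOVERY, AVOID_TOTAL)

print(f"P(confusion | avoid) = {p_confusion:.5f}")
print(f"P(discovery | avoid) = {p_discovery:.5f}")
```

On these counts the high-probability transition comes out at roughly 0.0064 and the low-probability one at roughly 0.00026. Both figures are small in absolute terms, so it is the relative difference between the two transitions (a factor of 25) that matters for the comparison.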
McDonald and Shillcock’s eye-tracking data showed that participants’ eyes fixed on the object noun for a shorter time when they were reading a high-probability transition than when reading a low-probability transition. This suggests that the