Language Learning in the Full
or, Why the stimulus might not be so poor after all
Morten H. Christiansen
Philosophy-Neuroscience-Psychology Program
Department of Philosophy
Washington University in St. Louis
Campus Box 1073
St. Louis, MO 63130
morten@twinearth.wustl.edu

January 6, 1994
Abstract
Language acquisition is often said to require a massive innate body of language-specific knowledge in order to overcome the poverty of the stimulus. In this picture, language learning merely amounts to setting a number of parameters in an internal Universal Grammar. But is the primary linguistic evidence really so poor that it warrants such an extreme nativism? Is there no room for a more empiricist-oriented approach to language acquisition? In this paper, I argue against the extreme nativist position, discussing recent results from psycholinguistic and connectionist research on natural language. Specifically, I eschew the competence/performance distinction traditionally presupposed in linguistics, advocating a close relationship between the representation of grammar and the processing mechanism. Abandoning infinite grammatical competence allows us to focus on regular languages, which significantly restricts the computational complexity that such models face in both learning and processing. The learning task is further facilitated by an inversion of the perspective on the relation between natural language and the human learning mechanism, suggesting that language is an artifact heavily constrained by the human learning mechanism. This is supplemented by evidence concerning the way maturational development alleviates language learning. Construing natural language as a maturationally constrained artifact promises to overcome the predicament following a classic formal language learning proof, viz., that no interesting language can be learned from positive examples alone. Together, these arguments strongly suggest that some sort of non-trivial empiricist language learning is possible after all, because, given the right assumptions, the stimulus is not really that poor.
1 Introduction

Universal Grammar (UG), understood as a substantial innate endowment of language-specific knowledge, is widely regarded as a necessary component in accounts of language learning. In this framework, language "learning" is simply a matter of setting a number of parameters (specific to a particular language) in an innate database consisting of universal language-specific principles. Chomsky, the principal architect of UG, has argued that when it comes to the explanation of language acquisition, the postulation of a universal set of linguistic principles (constraints or rules) is the only game in town. In other words, UG is held to be the best explanation of human linguistic behavior.

That there are internal constraints on the acquisition of language is hardly controversial, but the nature and extent of these constraints is the focus of much debate. The main argument against empiricist approaches to language learning stems from observations about the poverty of the stimulus that language learners are exposed to. Without substantial constraints, it seems that the fragment of language available to a child is too limited and (perhaps) too degenerate to allow the induction of the appropriate grammar underlying a specific language. In addition, Gold's (1967) proof that context-free languages, and even regular languages, cannot be reliably learned from a finite set of positive examples has been taken as further evidence against empiricist approaches to language acquisition.
Yet, there are a number of holes in the arguments from the poverty of the stimulus to the extreme nativist position of UG. These lacunae center around (1) the appeal to a competence/performance distinction; (2) the perceived relation between language and the human language learning mechanism; (3) a static conception of the learning mechanism; and (4) an over-interpretation of Gold's results. Together, these problems pose a strong challenge to traditional Chomskyan linguistics regarding the psychological reality of its grammars. In this paper, I seek to advance this challenge, presenting arguments based on recent connectionist and psycholinguistic research.
However, before we start, a few clarifications are in order to avoid terminological confusion. In particular, we need to be clear about what is meant by 'grammar' and 'psychological reality'. Regarding the latter, Peacocke (1989) has suggested "that for a rule of grammar to be psychologically real for a given subject is for it to specify the information drawn upon by the relevant mechanisms or algorithms in that subject" (p. 114). Given the connectionist spirit of the present paper, we can arrive at a more suitable notion of grammar, and of its psychological reality, by dropping the qualification of grammars as essentially rule-based. Thus, a grammar consists of a body of information organized in such a way that it can account for a given set of linguistic data. Notice that the mode of organization is not predetermined, so the above notion of grammar can be applied in the discussion of both rule-based and connectionist natural language systems. Rephrasing Peacocke, I suggest that a grammar is psychologically real if it specifies the body of information which, when drawn upon by a particular mechanism or algorithm, is necessary for the causal explanation of a given collection of linguistic and psycholinguistic evidence.
In what follows, I start out by eschewing the distinction between competence and performance typically presupposed in linguistics, stressing performance data as the basis for models of natural language learning and processing. Abandoning infinite grammatical competence allows us to focus on regular languages, which significantly limits the complexity that our models face. Next, the language learning task is further reduced by observing the way maturational constraints interact with language learning, the former strongly constraining the latter. Finally, suggestions are made that circumvent the predicament following Gold's formal language learning results without succumbing to the extreme nativism of UG. In the conclusion, I suggest that we can get much further than previously assumed using simple statistical and connectionist language models incorporating certain maturational constraints.
2 Abandoning the Distinction Between Competence and Performance
In modern linguistics, the paradigmatic method of obtaining data is through intuitive grammaticality judgements. However, it is generally accepted that the greater the length and complexity of a particular utterance, the less sure people are in their judgement of it. To explain this phenomenon, a distinction is made between an infinite linguistic competence and a limited performance. In contrast to the idealized grammatical competence, the performance of a particular individual is limited by memory limitations, attention span, lack of concentration, and so on.
This methodological separation of the unbounded linguistic competence from the limited performance of observable natural language behavior has been strongly advocated by Chomsky:
One common fallacy is to assume that if some experimental result provides counter-evidence to a theory of processing that includes a grammatical theory T and parsing procedure P (...), then it is T that is challenged and must be changed. The conclusion is particularly unreasonable in the light of the fact that in general there is independent (so-called "linguistic") evidence in support of T while there is no reason at all to believe that P is true. (Chomsky, 1981: p. 283)
The main methodological implication of this position is that it leads to what I have elsewhere called the 'Chomskyan paradox'.1 On the one hand, the competence/performance distinction (C/PD) makes T immune to all empirical falsification, since any falsifying evidence can always be dismissed as a consequence of a false P. On the other hand, all grammatical theories nevertheless rely on grammaticality judgements that (indirectly, via processing) display our knowledge of language. Consequently, it seems paradoxical that only certain kinds of empirical material, i.e., grammaticality judgements, are accepted, whereas other kinds are dismissed on what appear to be relatively arbitrary grounds. Indeed, Chomsky does not seem to care much about psycholinguistic results:
In the real world of actual research on language, it would be fair to say, I think, that principles based on evidence derived from informant judgment have proved to be deeper and more revealing than those based on evidence derived from experiments on processing and the like, although the future may be different in this regard. (Chomsky, 1980: p. 200)
1 For a more detailed discussion of this and related points, see Christiansen (1992).
In this light, the C/PD provides its proponents with a protective belt that surrounds their grammatical theories and makes them empirically impenetrable to psycholinguistic counter-evidence.

As long as the C/PD is upheld, potentially falsifying psycholinguistic evidence can always be explained away by reference to performance errors. This is methodologically unsound insofar as linguists want to claim that their grammars have psychological reality. But it is clear that Chomsky (1986) finds that linguistic grammars are psychologically real when he says that the standpoint of generative grammar "is that of individual psychology" (p. 3). Nevertheless, by invoking the distinction between grammatical competence and observable natural language behavior, thus disallowing negative empirical testing, linguists cannot hope to find anything other than speculative (or what Chomsky calls 'independent linguistic') support for their theories. In other words, if linguistic theory is to warrant psychological claims, then the C/PD must be abandoned.
In contrast, a connectionist perspective on natural language promises to eschew the C/PD, since it is not possible to isolate a network's representations from its processing. The relation between a grammar acquired through training and network processing is as direct as it can be. Instead of being a set of passive representations of declarative rules waiting to be manipulated by a central executive, a connectionist grammar is distributed over the network's memory as an ability to process language (Port & van Gelder, 1991). Notice also that although networks are generally "tailored" to fit the linguistic data, this does not imply that a network's failure to fit the data is simply passed onto the processing mechanism alone. Rather, when you tweak a network to fit a particular set of linguistic data, you are not only changing how it will process the data, but also what it will be able to learn. That is, any architectural modification will lead to a change in the overall constraints on the network, forcing it to adapt differently to the contingencies inherent in the data and, consequently, to acquire a different grammar. Thus, since the representation of the grammar is an inseparable and active part of a network's processing, it is impossible to separate a connectionist model's competence from its performance.
It is, furthermore, worth noticing that performance, but not competence, can be described in terms of regular languages produced by finite-state machines. This substantially reduces the complexity of the processing involved and, subsequently, of the learning task, too. This avenue has been pursued by Elman (1991a) and Christiansen & Chater (1994) via connectionist simulations of various aspects of natural language performance. In particular, these models exhibit psychologically realistic behavior when faced with sentences involving center-embedding and cross-dependency. Moreover, as we shall see next, the close fit between the kind of grammar that can be learned and the network configuration is also likely to be characteristic of the human language learning mechanism.
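To make the finite-state point concrete, here is a minimal sketch of my own (not a model from either paper): a recognizer for center-embedded noun/verb patterns whose only memory is a counter capped at a fixed depth. Because the cap leaves the machine with finitely many states, the accepted language is regular; and, like human readers, the machine fails on embeddings deeper than the bound. The depth bound and the N/V notation are illustrative assumptions.

```python
# Sketch: a bounded-depth recognizer for center-embedded "N ... V" patterns.
# The only memory is a counter capped at MAX_DEPTH, so the machine has
# finitely many states and the accepted language is regular.

MAX_DEPTH = 2  # assumed human-like bound on center-embedding

def accepts(sentence: str) -> bool:
    """Accept strings of the form N^k V^k for 1 <= k <= MAX_DEPTH + 1."""
    depth = 0
    seen_verb = False
    for word in sentence.split():
        if word == "N":
            if seen_verb or depth > MAX_DEPTH:
                return False  # noun after a verb, or nesting beyond the bound
            depth += 1
        elif word == "V":
            if depth == 0:
                return False  # a verb with no pending noun
            depth -= 1
            seen_verb = True
        else:
            return False  # unknown token
    return depth == 0 and seen_verb

print(accepts("N N V V"))          # True: one center-embedding
print(accepts("N N N N V V V V"))  # False: deeper than the performance bound
```

Nothing hangs on the details; the point is only that once performance limits are built in, both processing and learning concern regular languages.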
3 Language as a Maturationally Constrained Artifact
Cross-linguistic studies have revealed a number of universal patterns that can be found in all human languages. An example of such a universal pattern is the 'head parameter', which names the observation that the head is always positioned in the same way within the phrases of a given language. For a 'head-first' language such as English, this means that the head always comes first in a phrase; for instance, in verb phrases the verb is always first (as in "feeds the cat"). Together with the apparent poverty of the primary linguistic stimulus, this has been taken as evidence that the human language acquisition mechanism must be constrained in such a way that it induces only human languages. This seems to be a considerable feat, since the set of human languages is merely a small subset of the vast range of theoretically possible languages. The combination of innate constraints necessary for such a feat is therefore supposed to be substantial. Indeed, Chomsky has suggested "that in certain fundamental respects we do not really learn language; rather, grammar grows in the mind" (Chomsky, 1980: p. 134). Thus, UG is the proposed explanation of the prima facie surprising fact that humans only learn human languages, and not any of the many possible non-human languages.
However, I suggest that this is the wrong way to perceive the tight connection between language and the human learning mechanism. What we need is a Gestalt switch.2 Instead of saying that humans can only learn a small subset of a huge set of possible languages, we must invert our perspective, observing that natural languages exist only because humans can produce, learn, and process
2 Some of the ideas presented in this section were developed in discussion with Andy Clark.
them. Natural languages are human artifacts constrained by human learning and processing mechanisms. It is therefore not surprising that we are, after all, so good at learning them. Language is closely tailored to human learning, not vice versa. Notice, moreover, that the "evolutionary rate" of cultural artifacts such as language is much faster than that of a group of humans making up a specific speech community.3 The artifacts are therefore much more likely to adapt to their human producers than the other way round. Languages that are hard for humans to learn simply die out or, more likely, do not come into existence at all. So, in short, the human learning mechanism determines a number of a priori restrictions on the possible human languages; and the set of the latter is a small fraction of the set of theoretically possible languages.

That language has evolved in close relation to the development of the human language mechanism is a phylogenetic point, but it has ontogenetic plausibility, too. Based on evidence from studies of both first and second language learners, Newport (1990) has proposed a "Less is More" hypothesis, which suggests "paradoxically, that the more limited abilities of children may provide an advantage for tasks (like language learning) which involve componential analysis" (p. 24). Maturationally imposed limitations in perception and memory force children to focus on certain parts of language, depending on their stage of development. Interestingly, it turns out that these limitations make the learning task easier, because they help children acquire the building blocks necessary for further language learning. In contrast, the superior processing abilities of adults prevent them from picking up the building blocks directly; rather, these have to be found using complex computations, making language learning more difficult (hence the notion of a critical period in language learning). This means that "because of age differences in perceptual and memorial abilities, young children and adults exposed to similar linguistic environments may nevertheless have very different internal data bases on which to perform linguistic analysis" (Newport, 1990: p. 26).
In relation to morphology, Newport discusses whether a learner necessarily needs a priori knowledge akin to UG in order to segment language into the right units corresponding to morphemes. She finds that such a segmentation is, indeed, possible, "even without advance knowledge of the morphology, if the units of perceptual segmentation are (at least sometimes) the morphemes
3 This was pointed out to me by Nick Chater.
which natural languages have developed" (p. 25). Given the above discussion of language as a human artifact, I contend that this point can be applied not only to morphology, but also to syntax. Consequently, some of the universal principles and parameters of UG, or perhaps even all of them, need not be specified innately, but could instead be mere artifacts of a learning mechanism undergoing maturational development. Whether this conjecture will hold is an empirical matter that only future research into the human language learning mechanism can settle. However, recent results from connectionist and statistical language learning do seem to point in the right direction.
In a number of connectionist simulations involving simple recurrent networks, Elman (1991b) showed that a network without any 'maturational' development cannot learn to respond appropriately to sentences derived from a small phrase grammar of some linguistic complexity. However, when Elman introduced constraints, decreasing over time, on the number of words in a sentence that the network was able to 'pay attention to', the network was able to learn the task. This work has recently been extended by Christiansen & Chater (1994), who trained a network to deal with a substantially more complex grammar using the same approach. This practical evidence supports the idea that maturational constraints (of some sort) on the learning mechanism allow it to pick up relatively complex linguistic structure without presupposing any innate language-specific knowledge.
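To give a concrete sense of the mechanism, the following is a minimal sketch in the spirit of these simulations, with all particulars (vocabulary size, layer sizes, learning rate, toy corpus, and reset schedule) assumed for illustration rather than taken from Elman's or Christiansen & Chater's setups. A simple recurrent network is trained on next-word prediction while its context memory is wiped every few words, and the window widens across training phases:

```python
import numpy as np

# All sizes, rates, and the toy corpus below are illustrative assumptions.
rng = np.random.default_rng(0)
VOCAB, HIDDEN, LR = 8, 16, 0.1

W_ih = rng.normal(0, 0.1, (HIDDEN, VOCAB))   # input -> hidden
W_hh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # context -> hidden
W_ho = rng.normal(0, 0.1, (VOCAB, HIDDEN))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_sequence(seq, window):
    """One pass of next-word prediction over `seq`, wiping the context
    layer every `window` words (the maturational memory constraint).
    Gradients are truncated at one time step, treating the context units
    as ordinary input, as in Elman-style training."""
    global W_ih, W_hh, W_ho
    h = np.zeros(HIDDEN)
    for t in range(len(seq) - 1):
        if t % window == 0:
            h = np.zeros(HIDDEN)                    # maturational reset
        h_prev = h
        x = np.zeros(VOCAB)
        x[seq[t]] = 1.0                             # one-hot current word
        h = np.tanh(W_ih @ x + W_hh @ h_prev)       # new hidden state
        y = softmax(W_ho @ h)                       # next-word distribution
        d_out = y.copy()
        d_out[seq[t + 1]] -= 1.0                    # cross-entropy gradient
        d_hid = (W_ho.T @ d_out) * (1.0 - h ** 2)   # backprop through tanh
        W_ho -= LR * np.outer(d_out, h)
        W_ih -= LR * np.outer(d_hid, x)
        W_hh -= LR * np.outer(d_hid, h_prev)

# Toy corpus of random "sentences", standing in for a phrase-grammar corpus.
corpus = [rng.integers(0, VOCAB, size=12) for _ in range(50)]

# "Starting small": train in phases with a gradually widening window.
for window in (3, 5, 12):
    for _ in range(5):
        for sentence in corpus:
            train_sequence(sentence, window)
```

The widening window is the whole trick: early phases force the network to master short-range dependencies before longer ones ever enter its effective input.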
Still, it might be objected that these connectionist simulations only deal with small artificially generated corpora and will therefore not be able to scale up to noisy real-world data. This might be true, but recent research in statistical language learning suggests otherwise. Finch & Chater (1994) demonstrated that simple statistics, similar to what the above networks are sensitive to, can filter out noise and induce lexical categories and constituent phrases from a 40-million-word corpus extracted from Internet newsgroups. This positive outlook lends support to the idea of construing language as a maturationally constrained human artifact. In addition, the latter might, in turn, pave the way out of the predicament following Gold's learnability proof, as we shall see in the final section.
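The flavor of such distributional statistics can be conveyed in a few lines (a sketch under assumed details: the toy corpus and the nearest-neighbor comparison are mine, whereas Finch & Chater used far larger corpora and a more sophisticated clustering method). Each word is represented by the counts of its immediate left and right neighbors, and words are then compared by the similarity of these context vectors:

```python
from collections import defaultdict
import numpy as np

# Toy corpus standing in for a large newsgroup corpus.
corpus = ("the cat sees the dog . a dog chases a cat . "
          "the man sees a woman . a woman chases the man .").split()

vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Each word's context vector: left-neighbor counts, then right-neighbor counts.
contexts = defaultdict(lambda: np.zeros(2 * V))
for i, w in enumerate(corpus):
    if i > 0:
        contexts[w][index[corpus[i - 1]]] += 1
    if i < len(corpus) - 1:
        contexts[w][V + index[corpus[i + 1]]] += 1

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Each word's nearest distributional neighbor; words of the same lexical
# category (nouns, verbs, determiners) tend to pair up.
for w in vocab:
    best = max((v for v in vocab if v != w),
               key=lambda v: cosine(contexts[w], contexts[v]))
    print(f"{w:6s} -> {best}")
```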
4 Formal Language Learning Results in the Limit
In a now classic paper in the (formal) language learning literature, Gold (1967) proved that not even regular languages can be learned in finite time from a finite set of positive examples. Gold was aware that his proof, combined with the lack of observed negative input in primary linguistic data, leads to a predicament regarding human language learning. He therefore suggested that his finding must lead to at least one of the following three suppositions. Firstly, the learning mechanism might be equipped with information allowing it to constrain the search space dramatically. In other words, innate knowledge would impose strong restrictions on exactly what kind of grammars generate the proper projections from the input to (only) human languages. This is the approach which goes under the name of UG. Secondly, Gold proposes that children might receive negative input that we are simply not aware of. This would allow the correct projection to only human languages. However, see Pinker (1989) for an extensive discussion and subsequent dismissal of such a proposal (though the prediction task as applied in the previously mentioned language learning simulations by Christiansen & Chater [1994] and Elman [1991a, 1991b] might be construed as a kind of weak negative feedback). Thirdly, it could be the case that there are a priori restrictions on the way the training sequences are put together. For instance, the statistical distribution of words and sentence structures in a particular language could convey information about which sentences are acceptable and which are not (as suggested by, for instance, Finch & Chater, 1994). Regarding such an approach, Gold notes that distributional models are not suitable for this purpose because they lack sensitivity to the order of the training sequences.
So, it seems that prima facie UG is the only way to get language learning off the ground, even though learning then has to take second place to innate knowledge. Nevertheless, given our earlier discussions, it should be clear that this conclusion is far from inevitable. The way out of Gold's predicament, without buying into UG, starts from the definition of learnability on which the proof is based:
Given any language of the class and given any allowable training sequence for this language, the language will be identified in the limit [i.e., it is learnable in finite time from a finite set of examples]. (Gold, 1967: p. 449; my emphasis and comment)
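Unpacking the quoted criterion, it can be stated in a standard modern formalization (my notation, not Gold's): a text t for a language is an infinite sequence enumerating exactly the sentences of that language (positive examples only), and a learner L maps finite initial segments of texts to grammar guesses.

```latex
% Identification in the limit (Gold, 1967), for a class of languages C;
% \mathcal{L}(g) denotes the language generated by grammar guess g.
\[
  L \text{ identifies } C \text{ in the limit} \iff
  \forall \ell \in C \;\; \forall \text{ texts } t \text{ for } \ell \;\;
  \exists n \, \exists g \;\; \forall m \ge n :
  \big[\, L(t_1,\ldots,t_m) = g \;\wedge\; \mathcal{L}(g) = \ell \,\big].
\]
```

Note the two universal quantifiers, over every language in the class and over every text: these are exactly the two assumptions challenged below.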
First of all, Gold is considering all possible permutations of a finite alphabet (of words) into possible languages constrained by a certain language formalism (e.g., context-free or finite-state formalisms). Thus, he stresses that "identifiability (learnability) is a property of classes of languages, not of individual languages" (Gold, 1967: p. 450). This imposes a rather stringent restriction on candidate learning mechanisms, since they would have to be able to learn the whole class of languages that can be derived from the combination of an initial vocabulary and a given language formalism. Considering the above discussion of language as a human artifact, this seems an unnecessarily strong requirement to impose on a candidate for the human language learning mechanism. In particular, the set of human languages is much smaller than the class of possible languages that can be derived given a certain language formalism. So, all we need to require of a candidate learning mechanism is that it can learn all (and only) human languages, not the whole class of possible languages derivable given a certain language formalism.
Secondly, Gold's proof presupposes that the finite set of examples from which the grammatical knowledge is to be induced can be composed in an arbitrary way. However, if the learning mechanism is not fixed but is undergoing significant changes in terms of what kind of data it is sensitive to (as discussed above), then we have a completely different scenario. Specifically, even though the order of the environmentally presented input that a learning mechanism is exposed to might be arbitrary, the composition of the effective training set is not. That is, maturational constraints on the learning mechanism will essentially reconfigure the input in such a way that the training sequence always ends up having the same effective configuration (and this is, in effect, comparable with Gold's third suggestion). Importantly, this is done without imposing any restrictions on the publicly available language, i.e., the language that the child is exposed to.
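The order-independence of the effective training set can be illustrated with a toy sketch (my construction, not from the paper). Here the only maturational constraint is an uptake limit on sentence length that grows across developmental stages; whatever order the environment presents the sentences in, the learner effectively receives the short ones first:

```python
# Sentences, stages, and the de-duplication policy are all illustrative
# assumptions; only the order-invariance of the outcome matters.

def effective_sequence(presented, uptake_schedule):
    """Return the sentences the learner effectively takes in: at each stage
    it can only absorb sentences up to the current length limit, so the
    effective sequence is short-sentences-first regardless of how the
    environment happens to order its input."""
    effective = []
    for max_len in uptake_schedule:
        for sentence in presented:
            if len(sentence) <= max_len and sentence not in effective:
                effective.append(sentence)
    return effective

stream = [["the", "cat", "that", "the", "dog", "chased", "ran"],
          ["cats", "run"],
          ["the", "dog", "ran"]]

# Two opposite presentation orders yield the same effective sequence:
print(effective_sequence(stream, uptake_schedule=[2, 3, 7]))
print(effective_sequence(stream[::-1], uptake_schedule=[2, 3, 7]))
```

This is the sense in which maturation reconfigures arbitrarily ordered input into a canonical effective composition, without restricting the publicly available language itself.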
5 Conclusion

In this paper, I have provided evidence and arguments against the extreme nativist approach to language learning that follows from Chomskyan UG. Instead, I have advocated the role of empiricist language learning in accounts of natural