PART II
Words, Language, and Music
Among the many feats of learning that children showcase in their development, syntactic abilities appear long before many other skills, such as riding bikes, tying shoes, or playing a musical instrument. This is achieved with little or no direct instruction, making it both impressive and even puzzling, because mastering natural language syntax is one of the most difficult learning tasks that humans face. One reason for this difficulty is a "chicken-and-egg" problem involved in acquiring syntax. Syntactic knowledge can be characterized by constraints governing the relationship between grammatical categories of words (such as noun and verb) in a sentence. At the same time, the syntactic constraints presuppose the grammatical categories in terms of which they are defined; and the validity of grammatical categories depends on how they support those same syntactic constraints. A similar "bootstrapping" problem faces a student learning an academic subject such as physics: understanding momentum or force presupposes some understanding of the physical laws in which they figure; yet these laws presuppose these very concepts. The bootstrapping problem solved by very young children seems much more daunting, both because the constraints governing natural language are so intricate, and because these children do not have the intellectual capacity or explicit instruction present in conventional academic settings. Determining how children accomplish the astonishing feat of language acquisition remains a key question in cognitive science.
By 12 months, infants are attuned to the phonological and prosodic regularities of their native language (Jusczyk, 1997; Kuhl, 1999). This perceptual attunement may provide an essential scaffolding for later learning by biasing children toward aspects of language input that are particularly informative for acquiring grammatical knowledge. In this chapter, we hypothesize that integrating multiple probabilistic cues (phonological, prosodic, and distributional) by perceptually attuned general-purpose learning mechanisms may hold promise for explaining how children solve the bootstrapping problem. Multiple cues can provide reliable evidence about linguistic structure that is unavailable from any single source of information.
In the remainder of this chapter, we first review empirical evidence suggesting that infants may use a combination of phonological, prosodic, and distributional cues to bootstrap into syntax. We then report a series of simulations demonstrating the computational efficacy of multiple-cue integration within a connectionist framework (for modeling of other aspects of cognitive development, see the chapter by Mareschal & Westermann, this volume). Simulation 1 shows how multiple-cue integration results in better, faster, and more uniform learning. Simulation 2 uses this initial model to mimic the effect of grammatical and prosodic manipulations in a sentence comprehension study with 2-year-olds (Shady & Gerken, 1999). Simulation 3 uses an idealized representation of prenatal exposure to gross-level phonological and prosodic cues, leading to facilitation of postnatal learning of syntax by the model. Simulation 4 demonstrates that adding additional distracting cues, irrelevant to the syntactic acquisition task, does not hinder learning. Finally, Simulation 5 scales up these initial simulations, showing that connectionist models can acquire aspects of syntactic structure from cues present in actual child-directed speech.
THE NEED FOR MULTIPLE LANGUAGE-INTERNAL CUES
In this section, we identify three kinds of constraints that may serve to help the language learner solve the syntactic bootstrapping problem. First, innate constraints in the form of linguistic universals may be available for discovering to which grammatical category a word belongs, and how words function in syntactic rules. Second, language-external information, concerning observed semantic relationships between language and the world, could help map individual words onto their grammatical function. Finally, language-internal information, such as aspects of phonological, prosodic, and distributional patterns, may indicate the relation of various parts of language to each other, thus bootstrapping the child into the realm of syntactic relations. We discuss each of these potential constraints below, and conclude that some form of language-internal information is needed to break the circularity.
Although innate constraints likely play a role in language acquisition, they cannot solve the bootstrapping problem. Even with genetically prescribed abstract knowledge of grammatical categories and syntactic rules (e.g., Pinker, 1984), the problem remains: Innate knowledge requires building in universal mappings across languages, but the relationships between words and grammatical categories clearly differ cross-linguistically (e.g., the sound /su/ is a noun in French (sou) but a verb in English (sue)). Even with rich innate knowledge, children still must assign sound sequences to appropriate grammatical categories while determining the syntactic relations between these categories in their native language. Recently, a wealth of compelling experimental evidence has accumulated, suggesting that children do not initially use abstract linguistic categories. Instead, they seem to employ words at first as concrete individuals (rather than instances of abstract kinds), thereby challenging the usefulness of hypothesized innate grammatical categories (Tomasello, 2000). Whether we grant the presence of extensive innate knowledge or not, it seems clear that other sources of information are necessary to solve the bootstrapping problem.
Language-external information, such as correlations between the environment and semantic categories, may contribute to language acquisition by supplying a "semantic bootstrapping" solution (Pinker, 1984). However, because children learn linguistic distinctions that have no semantic basis (e.g., gender in French; Karmiloff-Smith, 1979), semantics cannot be the only source of information involved in solving the bootstrapping problem. Other sources of language-external constraints include cultural learning, indicated by a child's imitation of linguistic forms in socially conventional contexts (Tomasello, Kruger, & Ratner, 1993). For example, a child may perceive that the idiom "John let the cat out of the bag," used in the appropriate context, means that John has revealed some sort of secret, and not that he released a feline from captivity. Despite both of these important language-external sources, to break down the linguistic forms into relevant units, it appears that correlation and cultural learning must be coupled with language-internal information.
We do not challenge the important role that the two foregoing sources of information play in language acquisition. We would argue, however, that language-internal information is fundamental to bootstrapping the child into syntax. Because language-internal input is rich in potential cues to linguistic structure, we offer a requisite feature of this information for syntax acquisition: Cues may only be partially reliable individually, and a learner must integrate an array of these cues to solve the bootstrapping problem. For example, a learner could use the tendency for English nouns to be longer than verbs to conjecture that bonobo is a noun, but the same strategy would fail for ingratiate. Likewise, although speakers tend to pause at syntactic phrase boundaries in a sentence, pauses also occur elsewhere during normal language production. And although it is a good distributional bet that the definite article the will precede a noun, so might adjectives, such as silly. The child therefore needs to integrate a great diversity of probabilistic cues to language structure. Fortunately, as we review in the next section, there is now extensive evidence that multiple probabilistic cues are available in language-internal input, that children are sensitive to them, and that they facilitate learning through integration.
Bootstrapping through Multiple Language-Internal Cues
We explore three sources of language-internal cues: phonological, prosodic, and distributional. Phonological information includes stress, vowel quality, and duration, and may help distinguish grammatical function words (e.g., determiners, prepositions, and conjunctions) from content words (nouns, verbs, adjectives, and adverbs) in English (e.g., Cutler, 1993; Gleitman & Wanner, 1982; Monaghan, Chater, & Christiansen, 2005; Monaghan, Christiansen, & Chater, 2007; Morgan, Shi, & Allopenna, 1996; Shi, Morgan, & Allopenna, 1998). Phonological information may also help separate nouns and verbs (Monaghan, Chater, & Christiansen, 2005; Monaghan, Christiansen, & Chater, 2007; Onnis & Christiansen, 2008). For example, English disyllabic nouns tend to receive initial-syllable (trochaic) stress whereas disyllabic verbs tend to receive final-syllable (iambic) stress, and adults are sensitive to this distinction (Kelly, 1988). Acoustic analyses have also shown that disyllabic words that are noun–verb ambiguous and have the same stress placement can still be differentiated by syllable duration and amplitude cue differences (Sereno & Jongman, 1995). Even 3-year-old children are sensitive to this stress cue, despite the fact that few multisyllabic verbs occur in child-directed speech (Cassidy & Kelly, 1991, 2001). Additional noun/verb cues in English likely include differences in word duration, consonant voicing, and vowel types, and many of these cues may be cross-linguistically relevant (see Kelly, 1992; Monaghan & Christiansen, 2008, for reviews).
Prosodic cues help word and phrasal/clausal segmentation and may reveal syntactic structure (e.g., Gerken, Jusczyk, & Mandel, 1994; Gleitman & Wanner, 1982; Kemler-Nelson, Hirsh-Pasek, Jusczyk, & Wright Cassidy, 1989; Morgan, 1996). Acoustic analyses find that pause length, vowel duration, and pitch all mark phrasal boundaries in English and Japanese child-directed speech (Fisher & Tokura, 1996). Perhaps beginning in utero (Mehler et al., 1988) and continuing after birth, infants seem highly sensitive to such language-specific prosodic patterns (Gerken et al., 1994; Kemler-Nelson et al., 1989; for reviews, see Gerken, 1996; Jusczyk & Kemler-Nelson, 1996; Morgan, 1996). Prosodic information also improves sentence comprehension in 2-year-olds (Shady & Gerken, 1999). In experiments using adult participants, artificial language learning is facilitated in the presence of prosodic marking of syntactic phrase boundaries (Morgan, Meier, & Newport, 1987; Valian & Levitt, 1996). Neurophysiological evidence in the form of event-related brainwave potentials (ERPs) in adults shows that prosodic information has an immediate effect on syntactic processing (Steinhauer, Alter, & Friederici, 1999), suggesting a rapid, on-line role for this important cue. While prosody is influenced to some extent by a number of nonsyntactic factors, such as breathing patterns, resulting in an imperfect mapping between prosody and syntax (Fernald & McRoberts, 1996), infants' sensitivity to prosody argues for its likely contribution to syntax acquisition (Fisher & Tokura, 1996; Gerken, 1996; Morgan, 1996).
Distributional characteristics of linguistic fragments at or below the word level may also provide cues to grammatical category. Morphological patterns across words may be informative—e.g., English words that are observed to have both –ed and –s endings are likely to be verbs (Maratsos & Chalkley, 1980). In artificial language learning experiments, adults acquire grammatical categories more effectively when they are cued by such word-internal patterns (Brooks, Braine, Catalano, & Brody, 1993; Frigo & McDonald, 1998). Corpus analyses reveal that word co-occurrence also gives useful cues to grammatical categories in child-directed speech (e.g., Mintz, 2003; Monaghan et al., 2005, 2007; Redington, Chater, & Finch, 1998). Given that function words primarily occur at phrase boundaries (e.g., initially in English and French and finally in Japanese), they can also help the learner by signaling syntactic structure. This idea has received support from corpus analyses (Mintz, Newport, & Bever, 2002) and artificial language learning studies (Green, 1979; Morgan et al., 1987; Valian & Coulson, 1988). Finally, artificial language learning experiments indicate that duplication of morphological patterns across related items in a phrase (e.g., Spanish: Los Estados Unidos) facilitates learning (Meier & Bower, 1986; Morgan et al., 1987).
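The Maratsos and Chalkley suffix observation can be sketched as a tiny distributional heuristic. The corpus, function name, and stemming rule below are invented for illustration; this is a toy, not a serious morphological analyzer:

```python
from collections import defaultdict

def candidate_verbs(tokens):
    """Collect stems observed with both -ed and -s endings.

    Toy version of Maratsos & Chalkley (1980): English words that
    take both suffixes are likely verbs. Naive suffix stripping only.
    """
    suffixes = defaultdict(set)
    for tok in tokens:
        for suf in ("ed", "s"):
            if tok.endswith(suf) and len(tok) > len(suf) + 1:
                suffixes[tok[: -len(suf)]].add(suf)
    # Keep only stems attested with BOTH endings
    return {stem for stem, seen in suffixes.items() if seen == {"ed", "s"}}

corpus = "he walked she walks the cats walked dogs played he plays".split()
print(candidate_verbs(corpus))  # {'walk', 'play'}
```

Note how cats and dogs are excluded: each stem is attested with only the –s ending, so the heuristic (correctly) withholds the verb label.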
It is important to note that there is ample evidence that children are sensitive to these multiple sources of information. After just 1 year of language exposure, the perceptual attunement of children likely allows them to make use of language-internal probabilistic cues (for reviews, see Jusczyk, 1997, 1999; Kuhl, 1999; Pallier, Christophe, & Mehler, 1997; Werker & Tees, 1999). Through early learning experiences, infants already appear sensitive to the acoustic differences between function and content words (Shi, Werker, & Morgan, 1999) and the relationship between function words and prosody in speech (Shafer, D. W. Shucard, J. L. Shucard, & Gerken, 1998). Young infants are able to detect differences in syllable number among isolated words (Bijeljac, Bertoncini, & Mehler, 1993). In addition, infants exhibit rapid distributional learning (e.g., Gómez & Gerken, 1999; Saffran, Aslin, & Newport, 1996; see Gómez & Gerken, 2000; Saffran, 2003, for reviews), and, importantly, they are capable of multiple-cue integration (Mattys, Jusczyk, Luce, & Morgan, 1999; Morgan & Saffran, 1995). When facing the bootstrapping problem, children probably also benefit from characteristics of child-directed speech, such as the predominance of short sentences (Newport, Gleitman, & Gleitman, 1977) and exaggerated prosody (Kuhl et al., 1997).
In summary, phonological information helps to distinguish function words from content words and nouns from verbs. Prosodic information helps word and phrasal/clausal segmentation, thus serving to uncover syntactic structure. Distributional characteristics aid in labeling and segmentation, and may provide further cueing of syntactic relations. Despite the value of each source, none of these cues in isolation suffices to solve the bootstrapping problem. The learner must integrate these multiple cues to overcome the limited reliability of each individually. This review has indicated that a range of language-internal cues is available for language acquisition, that these cues affect learning and processing, and that mechanisms exist for multiple-cue integration. What is yet unknown is how far these cues can be combined to solve the bootstrapping problem (Fernald & McRoberts, 1996). Here we present connectionist simulations to demonstrate that efficient and robust computational mechanisms exist for multiple-cue integration (see also the chapters in this volume by Hannon, Kirkham, and Saffran for evidence from human infant learning).
SIMULATION 1: MULTIPLE-CUE INTEGRATION
Although the multiple-cue approach is gaining support in developmental psycholinguistics, its computational efficacy remains to be established. The simulations reported in this chapter are therefore intended as a first step toward a computational approach to multiple-cue integration, seeking to test its potential value in syntax acquisition. Based on our previous experience with modeling multiple-cue integration in speech segmentation (Christiansen, Allen, & Seidenberg, 1998), we used a simple recurrent network (SRN; Elman, 1990) to model the integration of multiple cues. The SRN is a feed-forward neural network equipped with an additional copy-back loop that permits the learning and processing of temporal regularities in the stimuli presented to it (see Figure 5.1). This makes it particularly suitable for exploring the acquisition of syntax, an inherently temporal phenomenon.
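The copy-back architecture can be sketched as follows. The unit counts, weight range, and learning rate match the Method description later in this chapter, but the training step itself is a minimal sketch under those assumptions, not the original implementation:

```python
import numpy as np

class SRN:
    """Minimal simple recurrent network (Elman, 1990): a feed-forward
    net whose hidden activations are copied back and fed in as context
    on the next time step."""

    def __init__(self, n_units, n_hidden=80, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.uniform(-0.1, 0.1, (n_hidden, n_units))   # input -> hidden
        self.W_ch = rng.uniform(-0.1, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.uniform(-0.1, 0.1, (n_units, n_hidden))   # hidden -> output
        self.context = np.zeros(n_hidden)
        self.lr = lr

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def step(self, x, target=None):
        """Process one word; optionally take one gradient step toward the
        next-word target, then copy the hidden layer back as context."""
        h = self._sigmoid(self.W_ih @ x + self.W_ch @ self.context)
        y = self._sigmoid(self.W_ho @ h)
        if target is not None:
            d_out = (y - target) * y * (1.0 - y)           # output delta
            d_hid = (self.W_ho.T @ d_out) * h * (1.0 - h)  # hidden delta
            self.W_ho -= self.lr * np.outer(d_out, h)
            self.W_ih -= self.lr * np.outer(d_hid, x)
            self.W_ch -= self.lr * np.outer(d_hid, self.context)
        self.context = h        # the copy-back loop
        return y
```

Training then amounts to presenting each word as input with the following word (plus its cue units) as target, one step at a time through the corpus.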
INSERT FIGURE 5.1 ABOUT HERE

The networks were trained on corpora of artificial child-directed speech generated by a grammar that includes three probabilistic cues to grammatical structure: word length, lexical stress, and pitch. The grammar (described further below) was motivated by considering frequent constructions in child-directed speech in the CHILDES database (MacWhinney, 2000).
Simulation 1 demonstrates how the integration of these three cues benefits the acquisition of syntactic structure by comparing performance across the eight possible cue combinations, ranging from the absence of cues to the presence of all three.
Method
Networks
Ten networks were trained per condition, with an initial randomization of network connections in the interval [–0.1, 0.1]. Learning rate was set to 0.1, and momentum to 0. Each input to the networks contained a localist representation of a word (one unit = one word) and a set of cue units depending on cue condition. Words were presented one by one, and networks were required to predict the next word in a sentence along with the corresponding cues for that word. With a total of 44 words (see below) and a pause marking boundaries between utterances, the networks had 45 input units. Networks in the condition with all available cues had an additional five input units. The number of input and output units thus varied between 45 and 50 across conditions. Each network had 80 hidden units and 80 context units.
Materials
We constructed an idealized but relatively complex grammar based on independent analyses of child-directed speech corpora (Bernstein-Ratner, 1984; Korman, 1984) and a study of child-directed speech by mother–daughter pairs (Fisher & Tokura, 1996). As illustrated in Table 5.1, the grammar included three primary sentence types: declarative, imperative, and interrogative sentences. Each type consisted of a variety of common utterances reflecting the child's exposure. For example, declarative sentences most frequently appeared as transitive or intransitive verb constructions (the boy chases the cat, the boy swims), but also included predication using be (the horse is pretty) and second-person pronominal constructions commonly found in child-directed corpora (you are a boy). Interrogative sentences were composed of wh-questions (where are the boys?, where do the boys swim?) and questions formed by using auxiliary verbs (do the boys walk?, are the cats pretty?). Imperatives were the simplest class of sentences, appearing as intransitive or transitive verb phrases (kiss the bunny, sleep). Subject–verb agreement was upheld in the grammar, along with appropriate determiners accompanying nouns (the cars vs. *a cars).
Each word was assigned a unit for input into the model, and we added a number of units to represent cues. Two basic cues were available to all networks. The fundamental distributional information inherent in the grammar could be exploited by all networks in this simulation. As a second basic cue, utterance-boundary pauses signaled grammatically distinct utterances with 92% reliability (Broen, 1972). This was encoded as a single unit that was activated at the end of all but 8% of the sentences. Other semireliable prosodic and phonological cues accompanied the phrase-structure grammar: word length, stress, and pitch. Network groups were constructed using different combinations of these three cues. Cassidy and Kelly (1991) demonstrated that syllable count is a cue available to English speakers for distinguishing nouns and verbs. They found that the probability that a monosyllabic word is a noun rather than a verb is 38%. This probability rises to 76% at two syllables, and 92% at three. We selected verb and noun tokens that exhibited this distinction, whereas the length of the remaining words was typical for their class (i.e., function words tended to be monosyllabic). Word length was represented in terms of three units using thermometer encoding—that is, one unit would be on for monosyllabic words, two for bisyllabic words, and three for trisyllabic words. Pitch change is a cue associated with syllables that precede pauses. Fisher and Tokura (1996) found that these pauses signaled grammatically distinct utterances with 96% accuracy in child-directed speech, allowing pitch to serve as a cue to grammatical structure. In the networks, this cue was a single unit that would be activated at the final word in an utterance. Finally, we used a single unit to encode lexical stress as a possible cue to distinguish stressed content words from the reduced, unstressed form of function words. This unit would be on for all content words.
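Assuming the unit counts just described (45 localist units for the 44 words plus the pause, followed by five cue units), a single input pattern might be assembled as follows; the ordering of the cue units is our assumption, since the chapter specifies only their types:

```python
import numpy as np

N_UNITS = 45  # 44 words plus the utterance-boundary pause unit
N_CUES = 5    # 3 thermometer length units + 1 stress + 1 pitch

def encode(word_index, syllables, stressed, utterance_final):
    """Build one input vector: a localist word (or pause) unit
    followed by the five cue units."""
    vec = np.zeros(N_UNITS + N_CUES)
    vec[word_index] = 1.0                           # localist word/pause unit
    vec[N_UNITS:N_UNITS + min(syllables, 3)] = 1.0  # thermometer-coded length
    if stressed:
        vec[N_UNITS + 3] = 1.0                      # stress: on for content words
    if utterance_final:
        vec[N_UNITS + 4] = 1.0                      # pitch change before a pause
    return vec

# A stressed bisyllabic content word, not utterance-final:
v = encode(word_index=3, syllables=2, stressed=True, utterance_final=False)
```

Thermometer encoding makes longer words strictly more active on the length units, so a bisyllabic word's pattern contains the monosyllabic pattern as a subset.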
INSERT TABLE 5.1 ABOUT HERE
Procedure
Eight groups of networks, one for each combination of cues (all cues, 2 cues, 1 cue, or none), were trained on corpora consisting of 10,000 sentences generated from the grammar. Each network within a group was trained on a different randomized training corpus. Training consisted of 200,000 input/output presentations (words), or approximately 5 passes through the training corpus. Each group of networks had cues added to its training corpus depending on cue condition. Networks were expected to predict the next word in a sentence, along with the appropriate cue values. A corpus consisting of 1,000 novel sentences was generated for testing. Performance was measured by assessing the networks' ability to predict the next set of grammatical items given prior context. Importantly, this measure did not include predictions of cue information, and all network conditions were thus evaluated by exactly the same performance measure.
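The eight cue conditions are simply the subsets of the three semireliable cues; a sketch of the enumeration (names are ours):

```python
from itertools import product

CUES = ("length", "stress", "pitch")

# Every on/off combination of the three cues, from the no-cue baseline
# to the full three-cue condition.
conditions = [tuple(cue for cue, on in zip(CUES, flags) if on)
              for flags in product((False, True), repeat=3)]

print(len(conditions))  # 8
```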
Results
After training, SRNs trained with localist output representations will produce a distributional pattern of activation closely corresponding to a probability distribution of possible next items. In order to assess the overall performance of the SRNs, we made comparisons between network output probabilities and the full conditional probabilities given the prior context. For example, the full conditional probabilities given the context of "The boy chases . . ." can be represented as a vector containing, for each of the 44 words in the vocabulary and the pause, the probability of its being the next item in this sentence. To ensure that our performance measure can deal with novel test sentences not seen during training, we estimate the prior conditional probabilities based on lexical categories rather than individual words (Christiansen & Chater, 1999). Suppose, in the example above, that every continuation of this sentence fragment in the training corpus involved the indefinite determiner "a" (as in "The boy chases a cat"). If we did not base our full conditional probability estimates on lexical categories, we would not be able to assess SRN performance on novel sentences in which the definite determiner "the" followed the example fragment (as in "The boy chases the cat").
Formally, we thus have the following Equation 5.1, with $c_i$ denoting the category of the $i$th word in the sentence:

$$P(c_p \mid c_1, c_2, \ldots, c_{p-1})$$ (5.1)

where the probability of getting some member of a given lexical category as the $p$th item, $c_p$, in a sentence is conditional on the previous $p-1$ lexical categories. Note that for the purpose of performance assessment, singular and plural nouns are assigned to separate lexical categories throughout Simulations 1–4, as are singular and plural verbs. Given that the choice of lexical items for each category is independent, and that each word in a category is equally frequent, the probability of encountering a particular word $w_n$, which is a member of a category $c_p$, is simply inversely proportional to the number of items, $C_p$, in that category. So, overall, we have the following equation:

$$P(w_n \mid c_1, \ldots, c_{p-1}) = \frac{P(c_p \mid c_1, c_2, \ldots, c_{p-1})}{C_p}$$ (5.2)
If the networks are performing optimally, then the vector of output unit activations should exactly match these probabilities. We evaluate the degree to which each network performs successfully by measuring the mean squared error between the vectors representing the network's output and the conditional probabilities (with 0 indicating optimal performance).
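Under the uniform-within-category assumption of Equation 5.2, this evaluation can be sketched on a toy corpus; the vocabulary, categories, and function names below are invented for illustration:

```python
import numpy as np
from collections import Counter

def category_probs(corpus_categories, context):
    """Estimate P(c_p | c_1..c_{p-1}) from a corpus of category sequences."""
    counts, total, k = Counter(), 0, len(context)
    for sent in corpus_categories:
        if tuple(sent[:k]) == tuple(context) and len(sent) > k:
            counts[sent[k]] += 1
            total += 1
    return {cat: n / total for cat, n in counts.items()}

def word_probs(cat_probs, members):
    """Eq. 5.2: spread each category's probability uniformly over
    its members (divide by C_p, the category size)."""
    return {w: p / len(members[c])
            for c, p in cat_probs.items() for w in members[c]}

def squared_error(output, targets, vocab):
    """Mean squared error between the network output vector and the
    full conditional probabilities; 0 indicates optimal performance."""
    t = np.array([targets.get(w, 0.0) for w in vocab])
    return float(np.mean((output - t) ** 2))

# Toy corpus: after DET N, the next category is always V.
corpus = [("DET", "N", "V", "DET", "N"), ("DET", "N", "V")]
cp = category_probs(corpus, ("DET", "N"))          # {'V': 1.0}
wp = word_probs(cp, {"V": ["chases", "sees"]})     # each verb gets 0.5
err = squared_error(np.array([0.5, 0.5, 0.0]), wp,
                    ["chases", "sees", "dog"])     # 0.0: optimal output
```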
All networks achieved better performance than the standard bigram/trigram models (p-values < .0001), suggesting that the networks had acquired knowledge of syntactic structure beyond the information associated with simple pairs or triples of words. Figure 5.2A illustrates the best performance achieved by the trigram model as well as SRNs provided with no cues (the baseline network), a single cue (length, stress, or prosody), and three cues. The nets provided with one or more phonological/prosodic cues achieved significantly better performance than baseline networks (p-values < .02). Using trigram performance as criterion, all multiple-cue networks surpassed this level of performance faster than the baseline networks, as shown in Figure 5.2B (p-values < .002). Moreover, the three-cue networks were significantly faster than the single-cue networks (p-values < .001). Finally, using Brown–Forsythe tests for variability in the final level of performance, we found that the three-cue networks also exhibited significantly more uniform learning than the baseline networks (F(1,18) = 5.14, p < .04), as depicted in Figure 5.2C.
INSERT FIGURE 5.2 ABOUT HERE
SIMULATION 2: SENTENCE COMPREHENSION IN 2-YEAR-OLDS
Simulation 1 provides evidence for the general feasibility of multiple-cue integration for supporting syntax learning. To further demonstrate the relevance of the model to language development, closer contact with human data is needed (Christiansen & Chater, 2001). In the current simulation, we demonstrate that the three-cue networks from Simulation 1 are able to accommodate experimental data showing that 2-year-olds can integrate grammatical markers (function words) and prosodic cues in sentence comprehension (Shady & Gerken, 1999: Experiment 1). In this study, children heard sentences, such as (1) below, in one of three prosodic conditions depending on pause location: early natural [e], late natural [l], and unnatural [u]. Each sentence moreover involved one of three grammatical markers: grammatical (the), ungrammatical (was), and nonsense (gub).

(1) Find [e] the/was/gub [u] dog [l] for me.

The child's task was to identify the correct picture corresponding to the target noun (dog). Children performed the task best when the pause location delimited a phrasal boundary (early/late), and with the grammatical marker the. Simulation 2 models these data by using comparable stimuli and assessing noun unit activations.
Method
Networks
Twelve three-cue networks with the same architecture and training as in Simulation 1 were used in each prosodic condition of the infant experiment. This number was chosen to match the number of infants in the Shady and Gerken (1999) experiment. An additional unit was added to the networks to encode the nonsense word (gub) in Shady and Gerken's experiment.
Materials
We constructed a sample set of sentences from our grammar that could be modified to match the stimuli in Shady and Gerken. Twelve sentences for each prosody condition (pause location) were constructed. Pauses were simulated by activating the utterance-boundary unit. Because these pauses probabilistically signal grammatically distinct utterances, the utterance-boundary unit provides an approximation of what the children in the experiment would experience. Finally, the nonsense word was added to the stimuli for the within-group condition (grammatical vs. ungrammatical vs. nonsense). Adjusting for vocabulary differences, the networks were tested on comparable sentences, such as (2):

(2) Where does [e] the/is/gub [u] dog [l] eat?
Procedure
Each group of networks was exposed to the set of sentences corresponding to its assigned pause location (early vs. late vs. unnatural). No learning took place, since the fully trained networks were used. To approximate the picture selection task in the experiment, we measured the degree to which the networks would activate the groups of nouns following the/is/gub. The two conditions were expected to affect the activation of the nouns.
Results
The human results for the prosody condition in Shady and Gerken (1999) are depicted in Figure 5.3A. They reported a significant effect of prosody on the picture selection task. The same was true for our networks (F(2,33) = 1,253.07, p < .0001), and the pattern of noun activations closely resembles that of the toddlers' correct picture choices, as evidenced by Figure 5.3B. The late natural condition elicited the highest noun activation, followed by the early natural condition, with the unnatural condition yielding the least activation. The experiment also revealed an effect of grammaticality, as can be seen from the human data shown in Figure 5.3C. We similarly obtained a significant grammaticality effect for our networks (F(2,70) = 69.85, p < .0001), which, as illustrated by Figure 5.3D, produced the highest noun activation following the determiner, followed by the nonsense word, and lastly the ungrammatical word. Again, the network results match the pattern observed for the toddlers. One slight discrepancy is that the networks produce higher noun activation following the nonsense word than following the ungrammatical marker. This result is, however, consistent with the results from a more sensitive picture selection task, showing that children were more likely to end up with a semantic representation of the target following nonsense syllables compared to incorrectly used morphemes (Carter & Gerken, 1996). Thus, the results suggest that the syntactic knowledge acquired by the networks mirrors the kind of sensitivity to syntactic relations and prosodic content observed in human children. Together with Simulation 1, the results also demonstrate that multiple-cue integration may both facilitate syntax acquisition and underlie some patterns of linguistic skill observed early on in human performance. In the next simulation, we show that the multiple-cue perspective can simulate possible prosodic scaffolding that occurs much earlier in development: prenatal attunement to prosody.
INSERT FIGURE 5.3 ABOUT HERE
SIMULATION 3: THE ROLE OF PRENATAL EXPOSURE
Studies of 4-day-old infants suggest that the attunement to prosodic information may begin prior to birth (Mehler et al., 1988). We suggest that this prenatal exposure to language may provide a scaffolding for later syntactic acquisition by initially focusing learning on certain aspects of prosody and gross-level properties of phonology (such as word length) that later will play an important role in postnatal multiple-cue integration. In the current simulation, we test this hypothesis using the connectionist model from Simulations 1 and 2. If this scaffolding hypothesis is correct, we would expect that prenatal exposure corresponding to what infants receive in the womb would result in improved acquisition of syntactic structure.
Procedure
The networks in the prenatal group were first trained on 100,000 filtered input/output presentations drawn from a corpus of 10,000 new sentences. Following this prenatal exposure, the nets were then trained on the full input patterns exactly as in Simulation 1. The nonprenatal group only received training on the postnatal corpora. As previously, networks were required to predict the following word and corresponding cues. Performance was again measured by the prediction of following words, ignoring the cue units.
Results
Both network groups exhibited significantly higher performance than the bigram/trigram models (F(1,18) = 25.32, p < .0001, for prenatal; F(1,18) = 12.03, p < .01, for nonprenatal), again indicating that the networks are acquiring complex grammatical regularities that go beyond simple adjacency relations. We compared the performance of the two network groups across different degrees of training using a two-way analysis of variance with training condition (prenatal vs. nonprenatal) as the between-network factor and amount of training as the within-network factor (levels of training measured in 20,000 input/output presentation intervals). There was a main effect of training condition (F(1,18) = 12.36, p < .01), suggesting that prenatal exposure significantly improved learning. A main effect of degrees of training (F(9,162) = 15.96, p < .001) reveals that both network groups benefited significantly from training. An interaction between training condition and degrees of training indicates that the prenatal networks learned significantly better than the postnatal networks (F(1,18) = 9.90, p < .01). Finally, as illustrated by Figure 5.4, prenatal input also resulted in faster learning (measured in terms of the amount of training needed to surpass the trigram model; F(1,18) = 9.90, p < .01). The exposure to prenatal input—void of any information about individual words—promotes better performance on the prediction task as well as faster learning overall. This provides computational support for the prenatal scaffolding hypothesis, derived as a prediction from the multiple-cue perspective on syntax acquisition.
INSERT FIGURE 5.4 ABOUT HERE
SIMULATION 4: MULTIPLE-CUE INTEGRATION WITH USEFUL AND DISTRACTING CUES
So far, the simulations have demonstrated the importance of cue integration in syntax acquisition, that integration can match data obtained in infant experiments, and that this perspective can provide novel predictions in language development. A possible objection to these simulations is that our networks succeed at multiple-cue integration because they are only provided with cues that are at least partially relevant for syntax acquisition. Consequently, performance might drop significantly if the networks themselves had to discover which cues were partially relevant and which were not. Simulation 4 therefore tests the robustness of our multiple-cue approach when faced with additional, uncorrelated distractor cues. Accordingly, we added three distractor cues to the previous three reliable cues. These new cues encoded the presence of word-initial vowels, word-final voicing, and relative (male/female) speaker pitch—all acoustically salient in speech, but none of which appears to cue syntactic structure.
Method
Networks
Networks, groups, and training details were the same as in Simulation 3, except for three additional input units encoding the distractor cues.