COPYING IN NATURAL LANGUAGES, CONTEXT-FREENESS, AND QUEUE GRAMMARS
Alexis Manaster-Ramer
University of Michigan
2236 Fuller Road #108
Ann Arbor, MI 48105
ABSTRACT
The documentation of (unbounded-length) copying and cross-serial constructions in a few languages in the recent literature is usually taken to mean that natural languages are slightly context-sensitive. However, this ignores those copying constructions which, while productive, cannot be easily shown to apply to infinite sublanguages. To allow such finite copying constructions to be taken into account in formal modeling, it is necessary to recognize that natural languages cannot be realistically represented by formal languages of the usual sort. Rather, they must be modeled as families of formal languages or as formal languages with indefinite vocabularies. Once this is done, we see copying as a truly pervasive and fundamental process in human language. Furthermore, the absence of mirror-image constructions in human languages means that it is not enough to extend Context-free Grammars in the direction of context-sensitivity. Instead, a class of grammars must be found which handles (context-sensitive) copying but not (context-free) mirror images. This suggests that human linguistic processes use queues rather than stacks, making imperative the development of a hierarchy of Queue Grammars as a counterweight to the Chomsky Grammars. A simple class of Context-free Queue Grammars is introduced and discussed.
Introduction
The claim that at least some human languages cannot be described by a Context-free Grammar no matter how large or complex has had an interesting career. In the late 1960's it might have seemed, given the arguments of Bar-Hillel and Shamir (1960) about respectively coordinations in English, Postal (1964) about reduplication-cum-incorporation of object noun stems in Mohawk, and Chomsky (1963) about English comparative deletion, that this claim was firmly established.
Potentially serious and at any rate embarrassing problems with both the formal and the linguistic aspects of these arguments kept popping up, however (Daly, 1974; Levelt, 1974), and the partial fixes provided by Brandt Corstius (as reported in Levelt, 1974) for the respectively arguments and by Langendoen (1977) for that as well as the Mohawk argument did not deter Pullum and Gazdar (1982) from claiming that "it seems reasonable to assume that the natural languages are a proper subset of the infinite-cardinality CFL's, until such time as they are validly shown not to be". Two new arguments, Higginbotham's (1984) one involving such that relativization and Postal and Langendoen's (1984) one about sluicing, were dismissed on grounds of descriptive inadequacy by Pullum (1984a), who, however, suggested that the Langendoen and Postal (1984) argument about the doubling relativization construction may be correct (all these arguments deal with English).
Pullum (1984b) likewise heaped scorn on my argument that English reshmuplicative constructions show non-CFness, but he accepted (1984a; 1984b) Culy's (1985) argument about noun reduplication in Bambara and Shieber's (1985) one about Swiss German cross-serial constructions of causative and perception verbs and their objects. Gazdar and Pullum (1985) also cite these two, as well as an argument by Carlson (1983) about verb phrase reduplication in Engenni. They also refer to my discovery of the X or no X construction in English and mention that "Alexis Manaster-Ramer in unpublished lectures finds reduplication constructions that appear to have no length bound in Polish, Turkish, and a number of other languages". While they do not refer to my 1983 reshmuplication argument, which they presumably still reject, the Turkish construction they allude to was cited in my 1983 paper and is similar to the English reshmuplication in form as well as function (see below).
In any case, the acceptance of even one case of non-CFness in one natural language by the only active advocates of the CF position would seem to suffice to remove the issue from the agenda. Any additional arguments, such as Kac (to appear), Kac, Manaster-Ramer, and Rounds (to appear), and Manaster-Ramer (to appear a; to appear b), may appear to be no more than flogging of dead horses. However, as I argued in Manaster-Ramer (1983) and as recent work (Manaster-Ramer, to appear a; Rounds, Manaster-Ramer, and Friedman, to appear) shows ever more clearly, this conception of the issue (viz., Is there one natural language that is weakly noncontext-free?) makes very little difference and not much sense.
First of all, if non-CFness is so hard to find, then it is presumably linguistically marginal. Second, weak generative arguments cannot be made to work for natural languages, because of their high degree of structural ambiguity and the great difficulty in excluding every conceivable interpretation on which an apparently ungrammatical string might turn out on reflection to be in the language. Third, weak generative capacity is in any case not a very interesting property of a formal grammar, especially from a linguistic point of view, since linguistic models are judged by other criteria (e.g., natural languages might well be regular without this making CFGs any the more attractive as models for them). Fourth, results about the place of natural languages in the Chomsky Hierarchy should be considered in light of the fact that there is no reason to take the Chomsky Hierarchy as the appropriate formal space in which to look for them. Fifth, models of natural languages that are actually in use in theoretical, computational, and descriptive linguistics are, and always have been, only remotely related to the Chomsky Grammars, which means that results about the latter may be of little relevance to linguistic models.
As I argued in 1983, we should go beyond piecemeal debunking of invalid arguments against CFGs, and by the same token it seems to me that we must go beyond piecemeal restatements of such arguments. Rather, we should focus on general issues and ones that have implications for the modeling of human languages. One such issue is, it seems to me, the kind of context-sensitivity found in natural languages. It appears that the counterexamples to context-freeness are all rather similar. Specifically, they all seem to involve some kind of cross-serial dependency, i.e., a dependency between the nth elements of two or more substrings. This, unlike the statement that natural languages are noncontext-free, might mean something if we knew what kinds of models were appropriate for cross-serial dependencies. Given that not every kind of context-sensitive construction is found in human languages, it should be clear that there is nothing to be gained by invoking the dubious slogan of context-sensitivity.
Another relevant question is the centrality or peripherality of these constructions in natural languages. The relevant literature makes it appear that they are somewhat marginal at best. This would explain the tortured history of the attempts to show that they exist at all. However, this appears to be wrong, at least when we consider copying constructions. The requirement of full or near identity of two or more subparts of a sentence (or a discourse) is a very widespread phenomenon. In this paper, I will focus on the copying constructions precisely because they are so common in human languages.
In addition to such questions, which appear to focus on the linguistic side of things, there are also the more mathematical and conceptual problems involved in the whole enterprise of modeling human languages in formal terms. My own belief is that both kinds of issues must be solved in tandem, since we cannot know what kind of formal models we want until we know what we are going to model, and we cannot know what human languages are or are not like until we know how to represent them and what to compare them to. This paper is intended as a contribution to this kind of work.
Copying Dependencies
The examples of copying (and other) constructions which have figured in the great context-freeness debate have all involved attempts to show that a whole (natural) language is noncontext-free. Now, while it is often easy to find a noncontext-free subset of such a language, it is not always possible to isolate that subset formally from the rest of the language in such a way as to show that the language as a whole is noncontext-free. There is so much ambiguity in natural languages that it is strictly speaking impossible to isolate any construction at the level of strings, thus invalidating all arguments against CFGs or even Regular Grammars that refer to weak generative capacity. However, the arguments can be reconstructed by making use of the notion of classificatory capacity of formal grammars, introduced in Manaster-Ramer (to appear a) and Manaster-Ramer and Rounds (to appear). The classificatory capacity is the set of languages generated by the various subgrammars of a grammar, and if we are willing to assume that linguists can tell which sentences in a language exemplify the same or different syntactic patterns, then we can usually simply demonstrate that, e.g., no CFG can have a subgrammar generating all and only the sentences of some particular construction if that construction involves reduplication. This will show the inadequacy of CFGs, even if the string set as a [...] approach holds that it is impossible to determine with any confidence that a particular string qua string is ungrammatical, but that it may be possible to tell one construction from another, and that the latter, and not the former, is the real basis of all linguistic work, theoretical, computational, and descriptive.
Finite Copying
The counterexamples to context-freeness in the literature have all been claimed to crucially involve expressions of unbounded length. This seemed necessary in view of the fact that an upper bound on length would imply finiteness of the subset of strings involved, which would as a result be of no formal language theoretic interest. However, it is often difficult to make a case for unbounded length, and the main result has been that, even though every linguist knows about reduplication, it seemed nearly impossible to find an instance of reduplication that could be used to make a formal argument against CFGs, even though no one would ever use a CFG to describe reduplication.
For, in addition to reduplications that can apply to unboundedly long expressions, there is a much better known class of reduplications exemplified by Indonesian pluralization of nouns. Here it is difficult to show that the reduplicated forms are infinite in number, because compound nouns are not pluralized in the same way, and ignoring compounding, it would seem that the number of nouns is finite. However, this number is very large and moreover it is probably not well defined. The class of noun stems is open, and can be enriched by borrowing from foreign languages and neologisms, and all of these spontaneously pluralize by reduplication.
Rounds, Manaster-Ramer, and Friedman (to appear) argue that facts like this mean that a natural language should not be modeled as a formal language but rather as a family of languages, each of which may be taken as an approximation to an ideal language. In the case before us, we could argue that each of the approximations has only a finite number of nouns, for example, but a different number in different approximations. This idea, related to the work of Yuri Gurevich on finite dynamic models of computation, allows us to state the argument that the existence of an open class of reduplications is sufficient to show the inadequacy of CFGs for that family of approximations. The basis of the argument is the observation that while each of the approximate languages could in principle have a CFG, each such CFG would differ from the next not only in the addition of a new lexical item but also in the addition of a new reduplication rule (for that particular item).
To capture what is really going on, we require a grammar that is the same for each approximation modulo the lexicon. This grammar in a sense generates the infinite ideal, but each actual approximate grammar has only a finite lexicon and hence generates only a finite number of reduplications. In order to model the flexibility of the natural language vocabulary, we assume that each member of the family has the same grammar modulo the terminal vocabulary and the rules which insert terminals.
Another way of stating this is that the lexicon of Indonesian is finite but of an indefinite size (what Gurevich calls "uncountably finite"). A CFG would still have to contain a separate rule for the plural of every noun and hence would have to be of an indefinite size. Thus, with [...] new rule. However, this would mean that the grammar at any given time can only form the plurals of nouns that have already been learned. Since speakers of the language know in advance how to pluralize unfamiliar nouns, this cannot be true. Rather, the grammar at any given time must be able to form plurals of nouns that have not yet been learned. This in turn means that an indefinite number of plurals can be formed by a grammar of a determinate finite size. Hence, in effect, the number of rules for plural formation must be smaller than the number of plural forms that can be generated, and this in turn means that there is no CFG of Indonesian.
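
The size argument can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the noun stems, the hyphenated spelling of the plural, and the function names are hypothetical and are not taken from any of the works cited. It contrasts a rule list that needs one reduplication rule per noun with a single copying operation that does not depend on the lexicon.

```python
def cfg_plural_rules(nouns):
    """Return one context-free rule per noun, of the form PluralN -> noun noun.

    The rule list grows with the lexicon, which is the point at issue.
    """
    return [("PluralN", (noun, noun)) for noun in nouns]


def copy_plural(noun):
    """Form a plural by copying the stem; no per-noun rule is consulted."""
    return f"{noun}-{noun}"


# Two successive approximations of the lexicon (illustrative stems only).
lexicon_before = ["buku", "orang"]
lexicon_after = lexicon_before + ["komputer"]  # a newly added noun

rules_before = cfg_plural_rules(lexicon_before)
rules_after = cfg_plural_rules(lexicon_after)
```

In this sketch the rule list for the larger lexicon has one more rule than the list for the smaller one, whereas copy_plural already covers the new stem without any change.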
This brings up a crucial issue, of which we are all presumably aware but which is usually lost sight of in practice, namely, that the way a mathematical model (in this case, formal language theory) is applied to a physical or mental domain (in this case, natural language) is a matter of utility and not itself subject to proof or disproof. Formal language theory deals with sets of strings over well-defined finite vocabularies (also often called alphabets) such as the hackneyed {a, b}. It has been all too easy to fall into the trap of equating the formal language theoretic notion of vocabulary (alphabet) with the linguistic notion of vocabulary and likewise to confuse the formal language theoretic notion of a string (word) over the vocabulary (alphabet) with the linguistic notion of sentence.
However, the fundamental fact about all known natural languages is the openness of at least some classes of words (e.g., nouns but perhaps not prepositions or, in some languages, verbs), which can acquire new members through borrowing or through various processes of new formation, many of them apparently not rule-governed, and which can also lose members, as words are forgotten. Thus, the well-defined finite vocabularies of formal language theory are not a very good model of the vocabularies of natural languages. Whether we decide to introduce the notion of families of languages or that of uncountably finite sets or whether we rather choose to say that the vocabulary of a natural language is really infinite (being the set of all strings over the sounds or letters of the language that could conceivably be or become lexical items in it), we end up having to conclude that any language which productively reduplicates some open word class to form some grammatical category cannot have a CFG.
Copying in English
It should now be noted that reduplications (and reiterations generally) are extremely common in natural languages. Just how common follows from an inspection of the bewildering variety of such constructions that are found in English. All the examples cited here are productive though they may be of bounded length.
Linguistics shminguistics
Linguistics or no linguistics, (I am going home)
A dog is a dog is a dog
Philosophize while the philosophizing is good!
Moral is as moral does
Is she beautiful or is she beautiful?
These are clause-level constructions, but we also find ones restricted to the phrase level.
(He) deliberates, deliberates, deliberates (all day long)
(He worked slowly) theorem by theorem
(They form) a church within a church
(He debunks) theory after theory
Also relevant are cases where a copying dependency extends across sentence boundaries, as in discourses like:
A: She is fat
B: She is fat, my foot
It is interesting that several of these types are productive even though they appear to be based on what originally must have been more restricted, idiomatic expressions. The pattern a X within a X, for example, is surely derived from the single example a state within a state, yet has become quite productive.
Many of these patterns have analogues in other languages. For example, the X after X construction appears to involve quantification and this may be related to the fact that, for example, Bambara uses reduplication to mean 'whatever' and Sanskrit to mean 'every' (Pāṇini 8.1.4). English reshmuplication has close analogues in many languages, including the whole Dravidian and Turkic language families. Tamil kiduplication (e.g., pustakam kistakam) and Turkish meduplication (e.g., kitap mitap) are instances of this, though the semantic range is somewhat different. In both of these, the sense is more like that of English books and things, books and such, i.e., a combination of deprecation and etceteraness rather than the purely derisive function of English books shmooks. The English X or no X pattern is very similar to a Polish construction consisting of the form X (nominative) X (instrumental) in its range of applications. The repetition of a verb or verbal phrase to deprecate excessive repetition or intensity of an action seems to be found in many languages as well.
I have not tried here to survey the uses to which copying constructions are put in different languages or even to document fully their wide incidence, though the examples cited should give some indication of both. It does appear that copying constructions are extremely common and pervasive, and this in turn suggests that they are central to man's linguistic faculties. When we consider such additional facts as the frequency of copying in child language, we may be tempted to take copying as one of the basic linguistic operations.
Copies vs. Mirror Images
The existence and the centrality of copying constructions pose interesting questions that go beyond the inadequacy of CFGs. For example, why should natural languages have reduplications when they lack mirror-image constructions, which are context-free? This asymmetry (first noted in Manaster-Ramer and Kac, 1985, and Rounds, Manaster-Ramer, and Friedman, op. cit.) argues that it is not enough to make a small concession to context-sensitivity, as the saying goes. Rather than grudgingly clambering up the Chomsky Hierarchy towards Context-sensitive Grammars, we should consider going back down to Regular Grammars and striking out in a different direction. The simplest alternative proposal is a class of grammars which intuitively have the same relation to queues that CFGs have to stacks. The idea, which I owe to Michael Kac, would be that human linguistic processes make little if any use of stacks and employ queues instead.
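
The contrast between the two storage disciplines can be illustrated with a minimal Python sketch. It is not a formal automaton: the function names are hypothetical, and the midpoint of the string is simply computed from its length, where a real machine would have to guess it. What it shows is that reading stored symbols back first-in-first-out checks a copy, while reading them back last-in-first-out checks a mirror image.

```python
from collections import deque


def is_copy(s):
    """Check that s has the form ww by replaying the first half from a queue."""
    if len(s) % 2:
        return False
    half = len(s) // 2
    queue = deque(s[:half])           # store the first half
    for ch in s[half:]:
        if queue.popleft() != ch:     # retrieve in the order stored (FIFO)
            return False
    return True


def is_mirror(s):
    """Check that s has the form w followed by its reversal, using a stack."""
    if len(s) % 2:
        return False
    half = len(s) // 2
    stack = list(s[:half])            # store the first half
    for ch in s[half:]:
        if stack.pop() != ch:         # retrieve in reverse order (LIFO)
            return False
    return True
```

On the string abab the queue-based check succeeds and the stack-based check fails; on abba it is the other way around.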
Queue Grammars
This suggests that CFGs are not just inadequate as models of natural languages but inadequate in a particularly damaging way. They are not even the right point of departure, since they not only undergenerate but also overgenerate. This leads to the idea of a hierarchy of grammars whose relation to queues is like that of the Chomsky Grammars to stacks. A queue-based analogue to CFG is being developed, under the name of Context-free Queue Grammar. The current version is allowed rules of the following form:
A --> a
A --> aB
A --> aB...b
A --> a...b
A --> ...B
Whatever appears to the right of the three dots is put at the end of the string being rewritten. Otherwise, all definitions are as in a corresponding restricted CFG. Thus, the grammar
S --> aS...a
S --> bS...b
S --> a...a
S --> b...b
will generate the copying language over {a,b} excluding the null string and define derivations like the following:
S --> aSa --> abSab --> abaaba
S --> bSb --> baSba --> baaSbaa --> baabSbaab
On the other hand, I conjecture that the corresponding xmi(x) language (each string x followed by its mirror image) cannot be generated by such a grammar. Even at this early stage of inquiry into these formalisms, then, we have some tangible promise of being able to explain why natural languages should have reduplications but not mirror-image constructions. Various xh(x) constructions such as the respectively ones and the cross-serial verb constructions can be handled in the same way as reduplications.
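
The rewriting step of the copying grammar above can be simulated in a few lines of Python. This is a sketch under stated assumptions rather than a definition of the formalism: encoding a rule as a triple (nonterminal, material written in place, material appended at the end of the string) is my own convention for the illustration, as are the function names. The sketch replays the first derivation given above and enumerates all terminal strings up to a length bound.

```python
# Each rule: (nonterminal, what replaces it in place, what goes to the end).
RULES = [
    ("S", "aS", "a"),   # S --> aS...a
    ("S", "bS", "b"),   # S --> bS...b
    ("S", "a", "a"),    # S --> a...a
    ("S", "b", "b"),    # S --> b...b
]


def apply_rule(form, rule):
    """Rewrite the nonterminal in place and append the tail at the end."""
    lhs, inline, tail = rule
    i = form.index(lhs)
    return form[:i] + inline + form[i + 1:] + tail


def derive(rules, start="S"):
    """Return the successive sentential forms obtained by applying the rules."""
    forms = [start]
    for rule in rules:
        forms.append(apply_rule(forms[-1], rule))
    return forms


def generate(max_len):
    """Enumerate every terminal string of length <= max_len."""
    results, frontier = set(), ["S"]
    while frontier:
        form = frontier.pop()
        for rule in RULES:
            new = apply_rule(form, rule)
            # A form that still contains S yields strings at least one symbol longer.
            shortest = len(new) + 1 if "S" in new else len(new)
            if shortest > max_len:
                continue
            if "S" in new:
                frontier.append(new)
            else:
                results.add(new)
    return results
```

Calling derive with the first, second, and third rules in that order reproduces the forms S, aSa, abSab, abaaba, and every string returned by generate consists of two identical halves.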
While the idea of taking queues as opposed to stacks as the principal nonfinite-state resource available to human linguistic processes would explain the prevalence of copying and the absence of mirror images, it does not explain the coexistence of center-embedded constructions with cross-serial ones or the relative scarcity of cross-serial constructions other than copying ones.
For this reason, if for no other, the CFQGs could not be an adequate model of natural language. In fact, there are [...] they fail is that they apparently can only generate two copies, or two cross-serially dependent substrings, whereas natural languages seem to allow more (as in Grammar is grammar is grammar). This is similar to the limitation of Head Grammars and Tree Adjoining Grammars to generating no more than four copies (Manaster-Ramer, to appear a). However, a more general class of Queue Grammars appears to be within reach which will generate an arbitrary number of copies.
Perhaps more serious is the fact that CFQGs apparently can only generate copying constructions at the cost of profligacy (as defined in Rounds, Manaster-Ramer, and Friedman, to appear). The repair of this defect is less obvious, but it appears that the fundamental idea of basing models of natural languages on queues rather than stacks is not undermined. Rather, what is at issue is the way in which information is entered into and retrieved from the queue. The CFQGs suggest a piecemeal process but the considerations cited here seem to argue for a global one. A number of formalisms with these properties are being explored.
On the other hand, it may be that something much like the simple CFQG is a natural way of capturing cross-serial dependencies in cases other than copying. To see exactly what is involved, consider the difference between copying and other cross-serial dependencies. This difference has little to do with the form of the strings. Rather, in the case of other cross-serial dependencies, there is a syntactic and semantic relation between the nth elements of two or more structures. For example, in a respectively construction involving a conjoined subject and a conjoined predicate, each conjunct of the former is semantically combined with the corresponding conjunct of the latter. In the case of copying constructions, there is nothing analogous. The corresponding parts of the two copies do not bear any relations to each other. Thus it makes some sense to build up the corresponding parts of a cross-serial construction in a piecemeal fashion, but this appears to be inapplicable in the case of copying constructions.
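
The piecemeal pairing described for respectively constructions can be pictured with a short Python sketch. The function name and the sample conjuncts are hypothetical, chosen only for illustration; the point is that holding the subject conjuncts in a queue pairs the nth subject with the nth predicate as each predicate is encountered.

```python
from collections import deque


def pair_respectively(subjects, predicates):
    """Pair the nth subject conjunct with the nth predicate conjunct."""
    if len(subjects) != len(predicates):
        raise ValueError("conjunct counts must match")
    pending = deque(subjects)   # subject conjuncts wait in order of arrival
    # Each predicate conjunct is combined with the oldest waiting subject.
    return [(pending.popleft(), predicate) for predicate in predicates]


# Hypothetical conjuncts, as in "John and Mary sing and dance, respectively".
pairs = pair_respectively(["John", "Mary"], ["sing", "dance"])
```

Nothing comparable is needed for a pure copy, since its corresponding parts stand in no such pairwise relation.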
In view of all these limitations, the CFQGs might seem to be a non-starter. However, their importance lies in the fact that they are the first step in reorienting our notions of the formal space for models of natural language. Any real success in the theoretical models of human language depends on the development of appropriate mathematical concepts and on closing the gap between formal language and natural language theory. One of the first steps in this direction must involve breaking the spell of CFGs and the Chomsky Hierarchy. The CFQGs seem to be cut out for this task. Moreover, the idea that queues rather than stacks are involved in human language appears to be correct, and this more general result is independent of the limitations of CFQGs. However, given my stated goals for formal models, it is necessary to develop models such as CFQGs before proceeding to more complex ones precisely in order to develop an appropriate notion of formal space within which we will have to work.
The other main point addressed in this paper, the need to model human languages as families of formal languages or as formal languages with indefinite terminal vocabularies, is intended in the same spirit. The allure of identifying formal language theoretic concepts with linguistic ones in the simplest possible way is hard to overcome, but it must be if we are to get any meaningful results about natural languages through the formal route. It will, again, be necessary to do more work on these concepts, but it is beginning to look as though we have found the right direction.
REFERENCES
[...] Constituents. Linguistic Categories (Frank Heny and Barry Richards, eds.), 1: Categories, 69-98. Dordrecht: Reidel.
Chomsky, Noam. 1963. Formal Properties of Grammars. Handbook of Mathematical Psychology (R. Duncan Luce et al., eds.), 2: 323-418. New York: Wiley.
Culy, Christopher. 1985. The Complexity of the Vocabulary of Bambara. Linguistics and Philosophy, 8: 345-351.
Daly, R. T. 1974. Applications of the Mathematical Theory of Linguistics. The Hague: Mouton.
Gazdar, Gerald, and Geoffrey K. Pullum. 1985. Computationally Relevant Properties of Natural Languages and Their Grammars. New Generation Computing, 3: 273-306.
Higginbotham, James. 1984. English is not a Context-free Language. Linguistic Inquiry, 15: 225-234.
Kac, Michael B. To appear. Surface Transitivity and Context-freeness.
Kac, Michael B., Alexis Manaster-Ramer, and William C. Rounds. To appear. Simultaneous-distributive Coordination and Context-freeness. Computational Linguistics.
Langendoen, D. Terence. 1977. On the Inadequacy of Type-3 and Type-2 Grammars for Human Languages. Studies in Descriptive and Historical Linguistics: Festschrift for Winfred P. Lehmann (Paul Hopper, ed.), 159-171. Amsterdam: Benjamins.
Langendoen, D. Terence, and Paul M. Postal. 1984. Comments on Pullum's Criticisms. CL, 8: 187-188.
Levelt, W. J. M. 1974. Formal Grammars in Linguistics and Psycholinguistics. The Hague: Mouton.
Manaster-Ramer, Alexis. 1983. The Soft Formal Underbelly of Theoretical Syntax. CLS, 19: 256-262.
Manaster-Ramer, Alexis. To appear a. Dutch as a Formal Language. Linguistics and Philosophy.
Manaster-Ramer, Alexis. To appear b. Subject-verb Agreement in Respective Coordinations in English.
Manaster-Ramer, Alexis, and Michael B. Kac. 1985. Formal Languages and Linguistic Universals. Paper read at the Milwaukee Symposium on Typology and Universals.
Postal, Paul M. 1964. Limitations of Phrase Structure Grammars. The Structure of Language: Readings in the Philosophy of Language (Jerry A. Fodor and Jerrold J. Katz, eds.), 137-151. Englewood Cliffs, NJ: Prentice-Hall.
Postal, Paul M., and D. Terence Langendoen. 1984. English and the Class of Context-free Languages. CL, 10: 177-181.
Pullum, Geoffrey K., and Gerald Gazdar. 1982. Natural Languages and Context-free Languages. Linguistics and Philosophy, 4: 471-504.
Pullum, Geoffrey K. 1984a. On Two Recent Attempts to Show that English is not a CFL. CL, 10: 182-186.
Pullum, Geoffrey K. 1984b. Syntactic and Semantic Parsability. Proceedings of COLING84, 112-122. Stanford, CA: ACL.
Rounds, William C., Alexis Manaster-Ramer, and Joyce Friedman. To appear. Finding Natural Languages a Home in Formal Language Theory. Mathematics of Language (Alexis Manaster-Ramer, ed.). Amsterdam: John Benjamins.
Shieber, Stuart M. 1985. Evidence against the Context-freeness of Natural Language. Linguistics and Philosophy, 8: 333-343.