3.7 Building on Existing Linguistic Knowledge

Data-driven methods of producing natural language resources are motivated by the difficulty of producing such resources manually. However, it would be wasteful not to draw on human linguistic knowledge when it is economical to do so. Brill argues that we should . . .

focus on ways of capitalizing on the relative strengths of people and machines, rather than simply viewing machine learning as another way to do the same thing. [6]

In this section we look at how this can be done in a logical learning framework, dividing the approaches into static (before learning) and active (during learning).

Static Incorporation of Linguistic Information A static approach to incorporating linguistic knowledge requires that the user present linguistic information to the learning system before the learning process begins. In ILP this information is given by (i) defining the hypothesis space, often using extra-logical constraints on acceptable hypotheses, and (ii) providing the background knowledge B, as in Table 1. It is well known amongst ILP practitioners that “getting the background knowledge right” is crucial to the success of an ILP application.

Logical learning techniques have most to offer where this background information (i.e. information other than data) has a logical representation. For example, in the case of learning grammars we can take whatever initial grammar we might have and add it to the background knowledge B, as in our reformulation of the CHILL system. This initial grammar will always be unsatisfactory, hence the need for learning to revise it in some way; but starting from the knowledge we do have is more efficient than ab initio techniques.

In [11] an initial grammar is provided as background knowledge, but the main focus is on constraining the hypothesis space. Although an inductive approach to grammar construction assumes that it is undesirable to do manual grammar writing, it is not unreasonable to expect a user to constrain the hypothesis space with general linguistic principles. In [11] the goal was to add sufficiently tight linguistic constraints that no linguistically implausible grammar rule or lexical item gets past the constraints to be evaluated against the data. Constraints on headedness and gap-threading proved particularly useful, not only in filtering out implausible rules, but also in constructing them. The great practical advantage of a logical approach here is that these constraints can be expressed declaratively (in Prolog) using a logical representation specifically devised to facilitate the expression of linguistic knowledge.
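To make this concrete, the following is a minimal sketch of such a declarative filter, restricted to a toy headedness check; the predicate names (acceptable_rule/1, projects/2, headed/2) and the categories are illustrative assumptions, not the actual constraints used in [11].

:- use_module(library(lists)).          % member/2

% A candidate rule rule(Mother, Daughters) is only evaluated against the
% data if it satisfies the declaratively stated linguistic constraints.
acceptable_rule(rule(Mother, Daughters)) :-
    headed(Mother, Daughters).

% Major-category projections: a phrase of category XP is headed by an X.
projects(vp, v).
projects(np, n).
projects(pp, p).

% Headedness: some daughter is the head of the mother category.
headed(Mother, Daughters) :-
    projects(Mother, Head),
    member(Head, Daughters).

% ?- acceptable_rule(rule(vp, [v, np])).    % accepted
% ?- acceptable_rule(rule(vp, [np, pp])).   % rejected: no verbal head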

A more fundamental approach is offered by Adriaans and de Haas [2]. They note that

If one wants to use logic to describe certain phenomena in reality there are in principle two options. 1) One takes some variant of predicate calculus, e.g. Horn clauses, and one tries to model the phenomena in this medium, or, 2) one tries to find a certain variant of logic in the substructural landscape that has characteristics that intrinsically model the target concepts. The latter route is to the knowledge of the authors hardly taken by researchers in ILP. [2]

Adriaans and de Haas argue for the latter option:

we show that in some areas, especially grammar induction, the substructural approach has specific advantages. These advantages are: 1) a knowledge representation that models the target concepts intrinsically, 2) of which the complexity issues are well known, 3) with an expressive power that is in general weaker than the Horn-clause or related representations that are used in more traditional ILP research, 4) for which explicit learnability results are available. [2]

This approach underlies the EMILE algorithm, which learns categorial grammars from unannotated data and queries to the user. In general, there are different benefits to using a logic that is just expressive enough for a particular learning problem (e.g. grammar learning) and to using problem-specific constraints within a more expressive logic. This is, at base, a practical question, and further work is required to compare the hard-coded restrictions of Adriaans and de Haas with the problem-specific restrictions commonly used in ILP.

Active Learning An active learning system seeks out information, usually data, during the course of learning. For example, in the ASIUM system the user is called upon in two ways: one is to give comprehensible names to predicates invented by ASIUM; the other, more important, way is to check each stage of generalisation to prevent over-generalisation. Thompson and Califf [25] use a selective sampling approach to active learning where the learning systems (in their case the ILP systems CHILL and RAPIER) ask the user to annotate particularly informative examples.

We finish this section by considering early work by Wirth [26] in which abduction is used to guess missing facts and the user is asked whether these guesses are correct. For example, in Wirth’s grammar learning example the user is asked whether the abduced facts

intransitive_verb([loves,a,man],[]) and

verb_phrase([loves,a,man],[])

are true. The user also has to evaluate the conjectured rule, a practice which Wirth defends as follows:

A system that learn[s] concepts or rules from looking at the world is useless as long as the results are not verified because a user who feels responsible for his knowledge base rarely use these concepts or rules. [26]
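To see how such facts can arise, here is a hedged reconstruction (not Wirth's actual program): a DCG-style grammar that is too weak to prove a positive example, so that abducing the missing facts completes the proof.

sentence     --> noun_phrase, verb_phrase.
noun_phrase  --> determiner, noun.
verb_phrase  --> intransitive_verb.   % intransitive_verb//0 is deliberately
                                      % left undefined: it is the abducible
determiner   --> [a].
noun         --> [man].
noun         --> [woman].
verb         --> [loves].

To prove the positive example ?- sentence([a,woman,loves,a,man], []), the grammar needs verb_phrase([loves,a,man],[]) and, via the verb phrase rule, intransitive_verb([loves,a,man],[]). Abducing either fact completes the proof, so the learner asks the user about them; one would expect the user to accept the former and reject the latter, pointing towards a missing rule such as verb_phrase --> verb, noun_phrase.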

The counter-argument to Wirth's position is that an output such as a large lexicon is too big for a user to check, and so the best verification in such a case is against out-of-sample data. The desirability of user interaction is a quantitative matter: one needs to weigh the effort required of the user against the gains in quality of output. In the systems discussed in this section the user is required to give a yes/no answer to a hypothesis produced by the system or to provide an annotation for a particular example. Neither interaction, particularly the former, puts a heavy burden on the user as long as it is not required too frequently.

In any case, such interactions are vastly less burdensome than a purely manual, non-inductive approach, and so, given the valuable information which users can supply, they seem likely to be used extensively in future work on language learning.

4 Conclusions

This paper has not examined all aspects of learning language in logic (LLL). For example, there is almost no discussion of LLL work in morphology or PoS tagging, overviews of which are given by [16] and [13], respectively. However, hopefully some key issues have been discussed in sufficient detail to back up the argument that LLL is both practical and desirable for a number of NLL tasks.

Looking ahead, it seems likely that hybrid approaches will be important for LLL. One important hybridisation is between manual development environments and inductive techniques. I have previously argued that LLL is attractive because logic is often the native representation for NLP; this should also make integrated systems easier to build. Such an environment is also the right one for active learning. An existing LLL system that takes user interaction seriously is ASIUM (see http://www.lri.fr/~faure/Demonstration.UK/Presentation_Demo.html).

However, the most important hybridisation is between logic and probability, an enterprise which has been continuing since the very beginning of symbolic logic [4]. NLL is considerably more difficult than many other machine learning tasks, so it is inconceivable that NLL outputs will not have residual uncertainty.

In much ILP work uncertainty is left unquantified or dealt with in a statistically unsophisticated manner. The statistical NLP revolution has demonstrated the advantages of (i) recognising the inevitability of uncertainty and (ii) modelling it properly using probabilistic models. There is no reason, in principle, why these elementary observations cannot be applied to LLL. However, the very flexibility of logic that makes it so attractive for NLP gives rise to complex probabilistic models. Nonetheless there has been progress in this area [14, 18, 1, 22, 23, 8, 9, 10].

References

[1] Steven Abney. Stochastic attribute-value grammars. Computational Linguistics, 23(4):597–618, 1997.

[2] Pieter Adriaans and Erik de Haas. Grammar induction as substructural inductive logic programming. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[3] H. Alshawi, editor. The Core Language Engine. MIT Press, Cambridge, Mass., 1992.

[4] George Boole. An Investigation of the Laws of Thought, on which are founded the Mathematical Theories of Logic and Probabilities. Dover, 1854.

[5] Henrik Boström. Induction of recursive transfer rules. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[6] Eric Brill. A closer look at the automatic induction of linguistic knowledge. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[7] M. E. Califf and R. J. Mooney. Relational learning of pattern-match rules for information extraction. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pages 328–334, Orlando, FL, July 1999.

[8] James Cussens. Loglinear models for first-order probabilistic reasoning. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI–99), pages 126–133, San Francisco, CA, 1999. Morgan Kaufmann Publishers.

[9] James Cussens. Stochastic logic programs: Sampling, inference and applications. In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI–2000), pages 115–122, San Francisco, CA, 2000. Morgan Kaufmann.

[10] James Cussens. Parameter estimation in stochastic logic programs. Machine Learning, 2001. To appear.

[11] James Cussens and Stephen Pulman. Incorporating linguistics constraints into inductive logic programming. In Proceedings of CoNLL-2000 and LLL-2000, pages 184–193, Lisbon, September 2000. ACL.

[12] Sašo Džeroski, James Cussens, and Suresh Manandhar. An introduction to inductive logic programming and learning language in logic. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[13] Martin Eineborg and Nikolaj Lindberg. ILP in part-of-speech tagging — an overview. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[14] Andreas Eisele. Towards probabilistic extensions of constraint-based grammars. Contribution to DYANA-2 Deliverable R1.2B, DYANA-2 project, 1994. Available at ftp://moon.philo.uva.nl/pub/dekker/dyana/R1.2.B.

[15] D. Faure and C. Nédellec. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In Paola Velardi, editor, LREC Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, pages 5–12, Granada, Spain, May 1998.

[16] Dimitar Kazakov. Achievements and prospects of learning word morphology with inductive logic programming. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[17] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 20:629–679, 1994.

[18] Stephen Muggleton. Stochastic logic programs. In Luc De Raedt, editor, Advances in Inductive Logic Programming, volume 32 of Frontiers in Artificial Intelligence and Applications, pages 254–264. IOS Press, Amsterdam, 1996.

[19] Stephen Muggleton. Semantics and derivation for stochastic logic programs. In Richard Dybowski, editor, Proceedings of the UAI-2000 Workshop on Fusion of Domain Knowledge with Data for Decision Support, 2000.

[20] Claire Nédellec. Corpus-based learning of semantic relations by the ILP system, Asium. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[21] Miles Osborne. DCG induction using MDL and parsed corpora. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[22] Stefan Riezler. Probabilistic Constraint Logic Programming. PhD thesis, Universität Tübingen, 1998. AIMS Report 5(1), 1999, IMS, Universität Stuttgart.

[23] Stefan Riezler. Learning log-linear models on constraint-based grammars for disambiguation. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[24] Lappoon R. Tang and Raymond J. Mooney. Automated construction of database interfaces: Integrating statistical and relational learning of semantic parsing. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pages 133–141, Hong Kong, October 2000.

[25] Cynthia A. Thompson and Mary Elaine Califf. Improving learning by choosing examples intelligently in two natural language tasks. In James Cussens and Sašo Džeroski, editors, Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

[26] Ruediger Wirth. Learning by failure to prove. In Derek Sleeman, editor, Proceedings of the 3rd European Working Session on Learning, pages 237–251, Glasgow, October 1988. Pitman.

[27] J. M. Zelle and R. J. Mooney. Learning semantic grammars with constructive inductive logic programming. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 817–822, Washington, D.C., July 1993.

[28] J. M. Zelle and R. J. Mooney. Comparative results on using inductive logic programming for corpus-based parser construction. In S. Wermter, E. Riloff, and G. Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, pages 355–369. Springer, Berlin, 1996.

[29] J. M. Zelle, C. A. Thompson, M. E. Califf, and R. J. Mooney. Inducing logic programs without explicit negative examples. In L. De Raedt, editor, Proceedings of the 5th International Workshop on Inductive Logic Programming, pages 403–416. Department of Computer Science, Katholieke Universiteit Leuven, 1995.

[30] John M. Zelle and Raymond J. Mooney. An inductive logic programming method for corpus-based parser construction. Unpublished Technical Report, 1997.

Veronica Dahl

Logic and Functional Programming Group
School of Computing Science
Simon Fraser University
Burnaby, B.C., Canada V5A 1S6

veronica@cs.sfu.ca

Abstract. We present a logic programming parsing methodology which we believe is especially interesting for understanding implicit human-language structures. It records parsing state constituents through linear assumptions to be consumed as the corresponding constituents materialize throughout the computation. Parsing state symbols corresponding to implicit structures remain as undischarged assumptions, rather than blocking the computation as they would if they were subgoals in a query.

They can then be used to glean the meaning of elided structures, with the aid of parallel structures. Word ordering inferences are made not from symbol contiguity as in DCGs, but from invisibly handling numbered edges as parameters of each symbol. We illustrate our ideas through a metagrammatical treatment of coordination, which shows that the proposed methodology can be used to detect and resolve parallel structures through syntactic and semantic criteria.

Keywords: elision, parallel structures, logic grammars, datalog grammars, hypothetical reasoning, bottom-up parsing, left-corner parsing, chart parsing, linear affine implication, prediction, coordination.

1 Introduction

Work on implicit meaning reconstruction has typically centered around the notion of parallelism as a key element in the determination of implicit meanings.

[1] defines parallelism as

a pairing of constituents ... and their parts, such that each pair contains two semantically and structurally similar objects

For instance, in the sentence “Bob likes tea and Alain coffee” we can recognize two parallel verb phrases, one complete (likes tea) and one incomplete (coffee), in which the verb’s meaning is implicit and can be inferred from that of the verb in the parallel, complete verb phrase.

Because the parallel structures can be any phrase at all, it would be highly inefficient to try to code all possible cases explicitly, even if we did not have the added complication of possible elision. Instead, we can use the metarule

X --> X conj X

to express the large number of its specific instances (where X = noun phrase, X = adjective, X = sentence, etc.).

A parser dealing with conjunction metagrammatically must keep this metarule “in mind” so as to try to parse a structure of the same category on either side of the conjunction (the conjoints), and then identify the string covering both conjoints as being of that same category, its meaning an appropriate combination of the meanings of both conjoints.
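Purely as an illustration of the metarule itself, it can be given a naive top-down DCG reading by passing the category as a nonterminal; this is not the treatment developed below, which is bottom-up and assumption-based, and it assumes a Prolog providing the standard call//1 (e.g. SWI-Prolog).

% Naive DCG reading of X --> X conj X (illustrative only).
coordinated(X) --> call(X), [and], call(X).

noun_phrase --> [tea].
noun_phrase --> [coffee].

% ?- phrase(coordinated(noun_phrase), [tea, and, coffee]).
% true.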

In this article we present a bottom-up, left-corner rendition of datalog grammars [8] in which expected constituents are expressed as continuation-based affine linear assumptions [6, 11, 17]. These can be consumed at most once, are backtrackable, and remain available during the entire continuation. Parsing state symbols that correspond to implicit structures can then remain as undischarged assumptions, rather than blocking the computation as they would if they were subgoals in a query. By examining the undischarged assumptions with the aid of the parallel structures concerned, we can recover the meaning of elided strings.
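The behaviour of such assumptions can be conveyed by a plain-Prolog emulation that simply threads a list of assumptions through the computation (a sketch of the idea only; the systems cited in [6, 11, 17] provide assumptions as primitives, and the predicate names here are ours).

:- use_module(library(lists)).   % select/3

% assume(+A, +S0, -S): add assumption A to the current assumption state.
assume(A, S0, [A|S0]).

% consume(+A, +S0, -S): discharge A at most once; backtrackable.
consume(A, S0, S) :- select(A, S0, S).

% An expected constituent, say noun_phrase(3,5), is assumed; it may later
% be consumed by a matching constituent, or survive undischarged.
% ?- assume(noun_phrase(3,5), [], S1), consume(noun_phrase(3,5), S1, S2).
% S1 = [noun_phrase(3,5)], S2 = [].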

As proof of concept, we present a parser which charts, or memoes, the theo- rems obtained at each level of iteration, and examines them when searching for parallel structures.

Given that memoization was born from Computational Linguistics [26, 4, 15, 29], it is poetic justice that related ideas boomerang back into CL to help solve a long-standing, very interesting problem, and that they should do so in a context that honours Robert Kowalski, a source of so many beautiful and lasting ideas.

Datalog grammars themselves, partly inspired by database theory, follow an assertional representation of parsing which was first cast in logic programming terms by Kowalski [18], in which sentences are coded with numbered word edges rather than as lists of words, e.g.

the(1,2).

resolution(2,3).

principle(3,4).

rather than

[the,resolution,principle]

The two representations are equivalent, though it is faster to look up words in an assertionally represented sentence than to repeatedly pick a list apart.
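For concreteness, a grammar rule over this edge-based representation simply composes adjacent spans; the following minimal sketch (the categories and predicate names are ours, not those of the parser defined later) accepts “the resolution principle”.

the(1, 2).
resolution(2, 3).
principle(3, 4).

determiner(P0, P)  :- the(P0, P).
noun(P0, P)        :- resolution(P0, P).
noun(P0, P)        :- principle(P0, P).

% A nominal is one or more nouns; a noun phrase is a determiner
% followed by a nominal.
nominal(P0, P)     :- noun(P0, P).
nominal(P0, P)     :- noun(P0, P1), nominal(P1, P).
noun_phrase(P0, P) :- determiner(P0, P1), nominal(P1, P).

% ?- noun_phrase(1, 4).
% true.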

After an Introduction and a Background section, we present our parsing methodology: predictive left-corner datalog. Section 5 examines our treatment of ellipsis in the context of coordinated sentences, and Section 6 discusses our results and related work.

2 Background

As we saw in the Introduction, the notion of parallelism is central to work on ellipsis. [12], following [23], also postulates the necessity, within a feature-structure setting, of combining elements which exhibit a degree of syntactico-semantic parallelism in order to determine the way in which some kinds of anaphora are resolved, and argues that the use of default unification (or priority union) improves on Prust’s operation for combining the parallel structures. Intuitively, default unification [3] takes two feature structures, one of which (called the TARGET) is identified as “strict”, while the other one (called the SOURCE) is “defeasible”, and combines the information in both such that the information in the strict structure takes priority over that in the defeasible structure.

For instance, the combination of the feature structures shown below for sentences 1a and 1b:

1a. Hannah likes beetles.

    [ likes
      AGENT   Hannah
      PATIENT beetle ]

1b. So does Thomas.

    [ agentive
      AGENT   Thomas ]

results in the priority union:

    [ likes
      AGENT   Thomas
      PATIENT beetle ]

Thus, the implicit constituent in the second sentence is reconstituted from the first by using a generally applicable procedure on the representations of the parallel structures.
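The effect of priority union can be sketched over flat attribute-value lists (the default unification of [3] operates on full typed feature structures; the list representation and the predicate name priority_union/3 are ours).

:- use_module(library(lists)).   % member/2, append/3

% priority_union(+Target, +Source, -Union): keep every attribute of the
% strict Target and add a Source attribute only where Target lacks it.
priority_union(Target, Source, Union) :-
    findall(Attr=Val,
            ( member(Attr=Val, Source),
              \+ member(Attr=_, Target) ),
            Defaults),
    append(Target, Defaults, Union).

% ?- priority_union([agent=thomas],
%                   [pred=likes, agent=hannah, patient=beetle], U).
% U = [agent=thomas, pred=likes, patient=beetle].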

[10] postulated a similar analysis, but it was based on λ-calculus semantic representations and used higher-order unification. For instance, in their example:

Dan likes golf, and George does too.

they identify the antecedent or source as the complete structure (“Dan likes golf”), whereas the target clause (“George does too”) is either missing, or contains only vestiges of, material found overtly in the source.

Their analysis of such structures consists of:

a) determining the parallel structure of source and target;

b) determining which are the parallel elements in source and target (e.g., “Dan” and “George” are parallel elements in the example);

c) using Huet’s higher-order unification algorithm [14] to find a property P such that P(s1, ..., sn) = S, where s1 through sn are the interpretations of the parallel elements of the source, and S is the interpretation of the source itself.
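A worked instance of step c) on the example above (informal notation, ours): with source interpretation S = likes(dan, golf) and parallel element s1 = dan, higher-order unification of P(dan) = likes(dan, golf) admits the solution P = λx.likes(x, golf); applying this P to the target's parallel element reconstructs the elided meaning as P(george) = likes(george, golf).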
