
5.3 The Knowledge Discovery Cycle

Knowledge discovery in databases typically proceeds in an iterative manner. Data are selected, possibly cleaned, formatted, and input into a data mining engine, and the results are analysed and interpreted. Since first results can often be improved, one typically re-iterates this process until some kind of local optimum has been reached (cf. [12]).

Because knowledge discovery is an iterative process, data mining tools should support this process. One consequence of the iterative nature of knowledge discovery in our context is that many of the queries formulated to the database mining engine will be related. Indeed, one can imagine that various queries are similar except possibly for some parameters, such as thresholds, data sets, pattern syntax, etc. The relationships among consecutive queries posed to the data mining engine should provide ample opportunities for optimization. The situation is, to some extent, akin to the way that views are dealt with in databases (cf. [11]). Views in databases are similar to patterns in data mining in that both constructs are virtual data structures, i.e. they do not physically exist in the database. Both forms of data can be queried, and it is the task of the engines to efficiently answer questions concerning these virtual constructs.

Answering queries involving views can be realized in essentially two different ways. First, one can materialize views, which means that one generates the tuples in the view relation explicitly and then processes queries as if a normal relation were queried. Second, one can perform query modification, which means that any query to a view is ‘unfolded’ into a query over the base relations.
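The two strategies can be contrasted in a few lines. The sketch below is purely illustrative (the relation, its schema, and all names are invented for this example, not taken from the text); it answers the same query over a view either from a materialized copy or by unfolding the query onto the base relation.

```python
# Base relation as a list of tuples: (name, dept, salary). Invented toy data.
employees = [("ann", "sales", 1200), ("bob", "sales", 900), ("eve", "hr", 1100)]

# View definition: well-paid employees (salary >= 1000).
def well_paid():
    return [(n, d) for (n, d, s) in employees if s >= 1000]

# 1. Materialization: compute the view's tuples once, then query the stored copy.
well_paid_materialized = well_paid()
answer1 = [n for (n, d) in well_paid_materialized if d == "sales"]

# 2. Query modification: 'unfold' the query over the view into a query
#    that runs directly against the base relation.
answer2 = [n for (n, d, s) in employees if s >= 1000 and d == "sales"]

assert answer1 == answer2 == ["ann"]
```

Both routes give the same answer; they differ only in when the view's tuples are computed, which is exactly the computation-versus-storage trade-off discussed next.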

The advantage of materialization is that new queries are answered much faster, whereas the disadvantage is that one needs to recompute or update the view whenever something changes in the underlying base relations. At a more general level, this corresponds to the typical computation versus storage trade-off in computer science.

These two techniques also apply to querying patterns in data mining. Indeed, if consecutive queries are inter-related, it is useful to store the results (and possibly also the intermediate results) of one query in order to speed up the computation of later ones. This corresponds to materializing the patterns (together with accompanying information). Doing so results in effective but fairly complicated solvers.
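As a concrete illustration of materializing pattern-query results, consider frequent itemset queries: since raising the frequency threshold can only shrink the answer set, a stored answer for one threshold can serve a later, stricter query by filtering rather than re-mining. The data and helper names below are invented for this sketch, not part of RDM.

```python
from itertools import combinations

# Invented toy transaction database.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]

def freq(pattern):
    """Number of transactions containing the pattern."""
    return sum(pattern <= t for t in transactions)

def frequent_itemsets(min_freq):
    """Naive enumeration of all itemsets with frequency >= min_freq."""
    items = sorted(set().union(*transactions))
    return {frozenset(c): freq(set(c))
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if freq(set(c)) >= min_freq}

cache = frequent_itemsets(2)                          # first query: materialize
refined = {p: f for p, f in cache.items() if f >= 3}  # related query: filter cache
assert refined == frequent_itemsets(3)                # same answer, no re-mining
```

The filter step touches only the cached patterns, never the data, which is the point of materializing (intermediate) results across related queries.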

6 Related Work

RDM is related to other proposals for database mining query languages, such as [26,17,14,13,?]. However, it differs from these proposals in a number of aspects. First, due to the use of deductive databases as the underlying database model, RDM allows one, in principle, to perform pattern discovery over various domains, such as item-sets, sequences, graphs, datalog queries, etc. Secondly, a number of new and useful primitives are foreseen. Using RDM, one is not restricted to finding frequent patterns, but may also look for infrequent ones with regard to certain sets of (negative) examples. One can also require that certain examples are (respectively, are not) covered by the patterns to be induced. Thirdly, and most importantly, RDM allows different primitives to be combined when searching for patterns. Finally, its embedding within Prolog puts database mining on the same methodological grounds as constraint programming.

As another contribution, we have outlined an efficient algorithm for answering complex database mining queries. This algorithm integrates the principles of the level-wise algorithm with those of version spaces and thus provides evidence that RDM can be executed efficiently. It also provides a generalized theoretical framework for data mining. The resulting framework extends the border-based level-wise techniques sketched by [23], who link the level-wise algorithm to the S set of Mitchell’s version space approach but do not further exploit the version space model as we do here. The level-wise version space algorithm has been implemented for use in molecular applications [20,21], and the results obtained are promising.
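The connection between level-wise search and the S set can be made concrete. The Python sketch below (an illustrative toy with invented data, not the algorithm of [10] or [23] themselves) performs an Apriori-style level-wise search for frequent itemsets and then extracts the S border: the maximally specific frequent patterns, which is the set the version space view identifies.

```python
# Invented toy transaction database.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}]

def is_frequent(pattern, min_freq=2):
    return sum(pattern <= t for t in data) >= min_freq

def levelwise(items):
    """Level-wise search: keep frequent patterns, extend them by one item."""
    frequent, level = set(), {frozenset([i]) for i in items}
    while level:
        level = {p for p in level if is_frequent(p)}       # prune infrequent
        frequent |= level
        level = {p | {i} for p in level for i in items if i not in p}
    return frequent

F = levelwise({"a", "b", "c", "d"})
# S border: frequent patterns with no frequent proper superset
# (the maximally specific elements, in version space terms).
S = {p for p in F if not any(p < q for q in F)}
assert S == {frozenset({"a", "b"}), frozenset({"a", "c"})}
```

Anti-monotonicity of frequency is what makes the level-by-level pruning sound: no superset of an infrequent pattern is ever frequent, so the border S compactly summarizes the whole solution set.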

Finally, the author hopes that this work provides a new perspective on data mining, one grounded in the methodology of computational logic. The hope is that this will result in a clear separation of the declarative from the procedural aspects of data mining.

Acknowledgements

This work was partially supported by the EU IST project cInQ. The author is grateful to Stefan Kramer, Jean-Francois Boulicaut and the anonymous reviewers for comments, suggestions and discussions on this work. Finally, he would like to thank the editors for their patience.

References

1. R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 207-216, 1993.

2. E. Baralis, G. Psaila. Incremental Refinement of Mining Queries. In Mukesh K. Mohania, A. Min Tjoa (Eds.), Data Warehousing and Knowledge Discovery, First International Conference DaWaK ’99 Proceedings, Lecture Notes in Computer Science, Vol. 1676, Springer Verlag, pp. 173-182, 1999.

3. J.-F. Boulicaut, M. Klemettinen, H. Mannila. Querying Inductive Databases: A Case Study on the MINE RULE Operator. In Proceedings of PKDD-98, Lecture Notes in Computer Science, Vol. 1510, Springer Verlag, pp. 194-202, 1998.

4. I. Bratko. Prolog Programming for Artificial Intelligence, 2nd Edition. Addison-Wesley, 1990.

5. W. Cohen. Whirl: a word-based information representation language. Artificial Intelligence, Vol. 118 (1-2), pp. 163-196, 2000.

6. L. Dehaspe, H. Toivonen, R.D. King. Finding frequent substructures in chemical compounds. In Proceedings of KDD-98, AAAI Press, pp. 30-36, 1998.

7. L. Dehaspe, H. Toivonen. Discovery of Frequent Datalog Patterns. Data Mining and Knowledge Discovery Journal, Vol. 3 (1), pp. 7-36, 1999.

8. L. De Raedt. An inductive logic programming query language for database mining (Extended Abstract). In Proceedings of Artificial Intelligence and Symbolic Computation, Lecture Notes in Artificial Intelligence, Vol. 1476, Springer Verlag, pp. 1-13, 1998.

9. L. De Raedt. A Logical Database Mining Query Language. In Proceedings of the 10th Inductive Logic Programming Conference, Lecture Notes in Artificial Intelligence, Vol. 1866, Springer Verlag, pp. 78-92, 2000.

10. L. De Raedt, S. Kramer. The level-wise version space algorithm and its application to molecular fragment finding. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 853-862, 2001.

11. R. Elmasri, S. Navathe. Fundamentals of Database Systems. Benjamin Cummings, 1994.

12. U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.). Advances in Knowledge Discovery and Data Mining, The MIT Press, 1996.

13. F. Giannotti, G. Manco. Querying Inductive Databases via Logic-Based User-Defined Aggregates. In Proceedings of PKDD 99, Lecture Notes in Artificial Intelligence, Vol. 1704, Springer Verlag, pp. 125-135, 1999.

14. J. Han, Y. Fu, K. Koperski, W. Wang, O. Zaiane. DMQL: A Data Mining Query Language for Relational Databases. In SIGMOD’96 Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada, June 1996.

15. J. Han, L.V.S. Lakshmanan, R.T. Ng. Constraint-Based, Multidimensional Data Mining. Computer, Vol. 32 (8), pp. 46-50, 1999.

16. H. Hirsh. Generalizing Version Spaces. Machine Learning, Vol. 17 (1), pp. 5-46, 1994.

17. T. Imielinski, H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, Vol. 39 (11), pp. 58-64, 1996.

18. T. Imielinski, A. Virmani, A. Abdulghani. Application programming interface and query language for database mining. In Proceedings of KDD 96, AAAI Press, pp. 256-262, 1996.

19. R.A. Kowalski. Algorithm = Logic + Control. Communications of the ACM, Vol. 22 (7), pp. 424-436, 1979.

20. S. Kramer, L. De Raedt. Feature Construction with Version Spaces for Biochemical Applications. In Proceedings of the Eighteenth International Conference on Machine Learning, Morgan Kaufmann, 2001.

21. S. Kramer, L. De Raedt, C. Helma. Molecular Feature Mining in HIV Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, pp. 136-143, 2001.

22. D. Gunopulos, H. Mannila, S. Saluja. Discovering All Most Specific Sentences by Randomized Algorithms. In F.N. Afrati, P. Kolaitis (Eds.), Database Theory - ICDT ’97, 6th International Conference, Lecture Notes in Computer Science, Vol. 1186, Springer Verlag, pp. 41-55, 1997.

23. H. Mannila, H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, Vol. 1 (3), pp. 241-258, 1997.

24. H. Mannila. Inductive databases. In Proceedings of the International Logic Programming Symposium, The MIT Press, pp. 21-30, 1997.

25. K. Marriott, P.J. Stuckey. Programming with Constraints: An Introduction. The MIT Press, 1998.

26. R. Meo, G. Psaila, S. Ceri. An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 195-224, 1998.

27. C. Mellish. The description identification algorithm. Artificial Intelligence, Vol. 52 (2), pp. 151-168, 1990.

28. T. Mitchell. Generalization as Search. Artificial Intelligence, Vol. 18 (2), pp. 203-226, 1980.

29. G. Sablon, L. De Raedt, M. Bruynooghe. Iterative Versionspaces. Artificial Intelligence, Vol. 69 (1-2), pp. 393-409, 1994.

30. A. Inokuchi, T. Washio, H. Motoda. An Apriori-based algorithm for mining frequent substructures from graph data. In D. Zighed, J. Komorowski, J. Zytkow (Eds.), Proceedings of PKDD 2000, Lecture Notes in Artificial Intelligence, Vol. 1910, Springer-Verlag, pp. 13-23, 2000.

Chris Mellish
Division of Informatics, University of Edinburgh
80 South Bridge, Edinburgh EH1 1HN, Scotland
C.Mellish@ed.ac.uk
http://www.dai.ed.ac.uk/homes/chrism/

Abstract. The idea of viewing parsing as deduction has been a powerful way of explaining formally the foundations of natural language processing systems. According to this view, the role of grammatical description is to write logical axioms from which the well-formedness of sentences in a natural language can be deduced.

However, this view is at odds with work on unification grammars, where categories are given complex descriptions and the process of building satisfying models is at least as relevant as that of building deductive proofs. In some work, feature logics are even used to replace the context-free component of grammars. From this work emerges the view that grammatical description is more like writing down a set of constraints, with well-formed sentences being the possible solutions to these constraints.

In this paper, we concentrate on Definite Clause Grammars (DCGs), the paradigm example of “parsing as deduction”. The fact that DCGs are based on using deduction (validity) whereas unification grammar approaches are based on constructing models (satisfiability) seems to indicate a significant divergence of views. However, we show that, under some plausible assumptions, the computation involved in using deduction to derive consequences of DCG clauses produces exactly the same results as would be produced by a process of model building using a set of axioms derived syntactically from the original clauses.

This then suggests that there is a single view of parsing (and generation) that reconciles the two approaches. This is a view of parsing as model-building, not a view of parsing as deduction. Even in the original paradigm case there is some doubt as to whether “parsing as deduction” is the best, or only, explanation of what is happening.

1 Parsing as Deduction?

The idea of viewing parsing as deduction, which goes back to the work of Colmerauer [Colmerauer 1978] and Kowalski [Kowalski 1979], has been a powerful way of explaining formally the foundations of natural language processing systems.

According to this view, the role of grammatical description is to write logical axioms from which the well-formedness of sentences in a natural language can be deduced. Pereira and Warren [Pereira and Warren 1983] cite a number of benefits that arise from investigating the connection between the two, including the transfer of useful techniques between theorem-proving and computational linguistics. Shieber [Shieber 1988] and others have used similar arguments for also considering generation as deduction.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 548–566, 2002. © Springer-Verlag Berlin Heidelberg 2002

The paradigm examples of parsing as deduction have used Definite Clause Grammars (DCGs [Pereira and Warren 1980]). Demonstrating that a sentence is well-formed according to a DCG grammar is achieved using the theorem-proving approach known as SLD resolution. Part of the resolution model involves having a unification operation to establish when a category required to be present could be decomposed by one of the grammar rules. In general, unification involves applying rewrite rules acting on sets of constraints in such a way as to build representations of possible models of those constraints. In the DCG case, the constraints are so simple (equality in the Herbrand universe of the terms which can be constructed from the constants and function symbols in the grammar) that the unification operation almost goes unnoticed as part of the definition of valid inference. However, later work has introduced the possibility of describing categories by complex feature descriptions expressed in a feature logic (e.g. [Kasper and Rounds 1986], [Smolka 1992]). In such cases, unification can be doing a significant part of the real work in a parser. When such complex feature descriptions are used to annotate context-free phrase structure rules, as in PATR [Shieber 1986], a hybrid model such as Höhfeld and Smolka’s model of constraint logic programming [Höhfeld and Smolka 1988] is needed to provide a way of reconciling the use of a model-building component within an inference system. The simple view that parsing is deduction has now become more complex.
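To see the kind of work unification does over complex category descriptions, here is a deliberately minimal sketch (invented for illustration; real feature logics in the cited work handle reentrancy and recursive structures, which this does not): categories as flat feature dictionaries, unified by checking that shared features agree and merging.

```python
def unify_features(f, g):
    """Unify two flat feature descriptions (dicts). Returns the merged
    description, or None if a shared feature has clashing values."""
    for key in f.keys() & g.keys():
        if f[key] != g[key]:
            return None  # clash: the descriptions are not jointly satisfiable
    return {**f, **g}

np1 = {"cat": "np", "num": "sing"}
np2 = {"cat": "np", "case": "acc"}
assert unify_features(np1, np2) == {"cat": "np", "num": "sing", "case": "acc"}
assert unify_features({"num": "sing"}, {"num": "plur"}) is None
```

The successful result is the most general description satisfying both inputs, which is why unification here reads naturally as model construction rather than as an inference step.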

The situation unfortunately becomes different again when the feature logic is also used to replace the context-free skeleton present in DCGs (as in [Manaster-Ramer and Rounds 1987], [Emele and Zajac 1990], [Manandhar 1993], and many approaches based on HPSG or Categorial Grammar). In this case, unification becomes more or less all there is in a parser, which leads to the view that parsing is really model-building. For instance, in typed unification grammars parsing is implemented as a process of type checking (in the presence of a type theory expressing the constraints of the grammar) which rewrites an input term to possible normal forms corresponding to models of it ([Emele 1994], [Aït-Kaci and Podelski 1993]). According to this view, grammatical description is more like writing down a set of constraints, with well-formed sentences being the possible solutions to these constraints.

The fact that DCGs are based on using deduction (validity) and feature logic approaches are based on constructing models (satisfiability) seems to indicate a significant divergence of views about what parsing “is” [Johnson 1992].

2 Definite Clause Grammars - The Usual Account

In this section, we briefly give the standard account of DCGs and show how they illustrate the idea of parsing as deduction. This section contains nothing original, but we wish to go through the steps fairly carefully in order that we can later show that a different account explains the same phenomena.

Figure 1 shows a simple example of a definite clause grammar (DCG) in various forms (to be discussed below).

1. Original DCG:

vp(Num) --> vtr(Num), np(Num1).

vp(Num) --> vintr(Num).

vtr(sing) --> [hates].

vtr(plur) --> [hate].

np(Num) --> [sheep].

2. Context-free skeleton:

vp → vtr np
vp → vintr
vtr → hates
vtr → hate
np → sheep

3. Prolog translationΠ:

vp(Num,P0,P2) :- vtr(Num,P0,P1), np(Num1,P1,P2).

vp(Num,P0,P1) :- vintr(Num,P0,P1).

vtr(sing,[hates|P],P).

vtr(plur,[hate|P],P).

np(Num,[sheep|P],P).

4. Horn Clause interpretation Πif:

∀Num, Num1, P0, P1, P2. vp(Num, P0, P2) ← vtr(Num, P0, P1) ∧ np(Num1, P1, P2)

∀Num, P0, P1. vp(Num, P0, P1) ← vintr(Num, P0, P1)

∀P. vtr(sing, [hates|P], P)

∀P. vtr(plur, [hate|P], P)

∀Num, P. np(Num, [sheep|P], P)

Fig. 1. Definite Clause Grammar in various forms

Basically, the notation allows for the expression of context-free rules where the nonterminal symbols can be associated with values for particular features (using a fixed positional notation for each nonterminal). Feature values in the grammar rules can be given as constants (e.g. sing) or by variables (whose names begin with upper case letters, e.g. Num). Where variables are used, the intent is that every time a rule is used, the same value must be used consistently for each occurrence of a given variable in the rule. If the feature annotations are stripped away from a DCG, the result is (modulo trivial syntactic differences) a context-free grammar, the context-free skeleton. The context-free skeleton in general generates a larger language than the DCG because it ignores all the feature constraints. The context-free skeleton for the above example is shown in the figure. hates, hate and sheep are terminals, and all other symbols are nonterminals.1

A DCG can be viewed as an abbreviation for a Prolog program Π, which makes explicit the relation between the phrases and portions of the string by using a threading technique on two extra arguments added to each nonterminal. In the figure, [X|Y] is the usual Prolog syntactic sugar for cons(X, Y), for some function symbol cons used to construct lists.2 The two extra arguments represent a difference list of a string and a (not necessarily proper) tail of that string (encoded as lists), the given category then being taken to describe the portion of the string which is the difference between these. This translation Π is standardly interpreted as a set of Horn clauses of logic Πif which states a set of “if” definitions (also shown in the figure).

In this example, “hates sheep” is a valid VP because vp(sing, [hates, sheep], []) is a logical consequence of the above axioms.3 In general, the set of strings α making up the language generated by a DCG is the set of strings for which s(f1, ..., fn, ᾱ, []) is a valid logical consequence of the appropriate Πif axioms, where s is the initial symbol, f1, ..., fn can be any values for the features associated with that category by the grammar, and ᾱ is the encoding of α as a list structure as illustrated by the example. That is, we are interested in the situation where

Πif ⊨ s(f1, ..., fn, ᾱ, [])

SLD resolution, which is the basis of the execution mechanism of Prolog, is one way in which logical consequences can be derived from Πif. A successful proof of an atom φ from axioms Πif is characterised by an SLD refutation of Πif ∪ {← φ}. An example successful SLD refutation showing the grammaticality of “hates sheep” is shown in Figure 2.

1 Note that, although the word “sheep” is ambiguous as to number, in a good grammar one would only want to allow plural nouns to stand alone as NPs. This formulation has been chosen here to make a particular formal point later. The examples in this paper are not intended to make any real claims about any natural languages. The reader is asked to imagine that the example grammars really do make plausible claims.

2 We also assume the Prolog syntactic sugar [a, b, c] for cons(a, cons(b, cons(c, nil))) and [] for nil, for some constant symbol nil.

3 In this section, and in the rest of the paper, we will actually concentrate on the recognition problem, rather than the parsing problem, for DCGs. Given, however, that we will be concerned with the possible feature values for the categories that are recognised, and that there are standard ways to express parse trees in the features of the categories [Pereira and Warren 1980], this represents no limitation.

SLD resolution operates on

Goals:          vp(Num,P0,P1)
Renamed clause: vp(Num1,P01,P11) :- vtr(Num1,P01,P11), np(Num11,P11,P21)
Subst:          [Num/Num1, P0/P01, P1/P11]

Goals:          vtr(Num1,P01,P11), np(Num11,P11,P21)
Renamed clause: vtr(sing,[hates|P22],P22)
Subst:          [Num1/sing, P01/[hates|P22], P11/P22]

Goals:          np(Num11,P22,P21)
Renamed clause: np(Num3,[sheep|P33],P33)
Subst:          [Num11/Num3, P22/[sheep|P33], P21/P33]

Goals:          (empty)

Fig. 2. SLD refutation

the “Prolog” representation of the grammar rules, not the DCG representation or the Predicate Calculus version. Each line of the refutation starts with a sequence of goals to be proved. Initially, the only goal is to find some true instance of the predicate for the initial symbol. As well as the goals, each line must mention a clause whose left hand side matches the first goal in the sequence (this clause has its variables renamed so as not accidentally to clash with those in the goal) and the minimal substitution (computed by unification) required to make the left hand side of the clause the same as that first goal. The next line of the refutation starts with the remaining goals preceded by the right hand side of the chosen clause, to all of which the just-computed substitution has been applied.

The last line must be empty (indicating that all goals have been proved).

The completeness and soundness of SLD resolution [Lloyd 1987] ensure that a ground atom φ is a logical consequence of Πif if and only if there is a finite SLD refutation of Πif ∪ {← φ}. This justifies regarding a Prolog implementation of DCGs as doing “parsing by deduction”. Note that in this paper, we will assume that our primary interest is in possible ground conclusions that can be drawn from a grammar and some input data. For instance, we may wish to know which particular logical forms can be associated with a particular sentence, or vice versa. Thus if a particular atom φ is of interest to us (e.g. vp(plur,X,[])), then that interest can be expressed as an interest in those ground instances of φ that are true. Restricting attention to the ground case simplifies the presentation and does not sacrifice generality.
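The whole pipeline of this section can be exercised end to end. The following Python sketch (an illustrative re-implementation for this note, not code from the paper) encodes the Prolog translation Π of Figure 1 with terms as tuples and variables as capitalised strings, and answers recognition queries by depth-first SLD resolution with syntactic unification:

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):  # dereference a variable through the substitution
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):  # give clause variables a fresh suffix per resolution step
    if is_var(t):
        return t + "_" + str(n)
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t

def cons(*words):  # build the Prolog-style list [w1, ..., wk] as cons/nil terms
    lst = "nil"
    for w in reversed(words):
        lst = ("cons", w, lst)
    return lst

# The translation Pi of Figure 1, as (head, body...) clauses.
program = [
    (("vp", "Num", "P0", "P2"),
     ("vtr", "Num", "P0", "P1"), ("np", "Num1", "P1", "P2")),
    (("vp", "Num", "P0", "P1"), ("vintr", "Num", "P0", "P1")),
    (("vtr", "sing", ("cons", "hates", "P"), "P"),),
    (("vtr", "plur", ("cons", "hate", "P"), "P"),),
    (("np", "Num", ("cons", "sheep", "P"), "P"),),
]

def solve(goals, s, step=0):
    """Depth-first SLD resolution: yield a substitution per refutation."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for clause in program:
        head, *body = (rename(t, step) for t in clause)
        s1 = unify(first, head, s)
        if s1 is not None:
            yield from solve(list(body) + rest, s1, step + 1)

# "hates sheep" is a valid singular VP; "hate sheep" is not.
assert any(True for _ in solve([("vp", "sing", cons("hates", "sheep"), "nil")], {}))
assert not any(True for _ in solve([("vp", "sing", cons("hate", "sheep"), "nil")], {}))
```

Each iteration of `solve` corresponds to one line of the refutation in Figure 2: rename a clause, unify its head with the first goal, and continue with the instantiated body prepended to the remaining goals.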

3 Limitations

DCGs allow one to describe a language in terms of the Horn clause subset of Predicate Logic, but the expressive limitations of this are well-known. Horn clauses do not allow for arbitrary occurrences of negation and disjunction, and yet these (and other extensions) are well-motivated from a linguistic point of view [Wedekind 1990].
