An Introduction to Formal Language Theory that Integrates Experimentation and Proof

Allen Stoughton
Kansas State University

Draft of Fall 2004


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

The LaTeX source of this book and associated lecture slides, and the distribution of the Forlan toolset, are available on the WWW at http://www.cis.ksu.edu/~allen/forlan/


Contents

Preface

1 Mathematical Background
1.1 Basic Set Theory
1.2 Induction Principles for the Natural Numbers
1.3 Trees and Inductive Definitions

2 Formal Languages
2.1 Symbols, Strings, Alphabets and (Formal) Languages
2.2 String Induction Principles
2.3 Introduction to Forlan

3 Regular Languages
3.1 Regular Expressions and Languages
3.2 Equivalence and Simplification of Regular Expressions
3.3 Finite Automata and Labeled Paths
3.4 Isomorphism of Finite Automata
3.5 Algorithms for Checking Acceptance and Finding Accepting Paths
3.6 Simplification of Finite Automata
3.7 Proving the Correctness of Finite Automata
3.8 Empty-string Finite Automata
3.9 Nondeterministic Finite Automata
3.10 Deterministic Finite Automata
3.11 Closure Properties of Regular Languages
3.12 Equivalence-testing and Minimization of Deterministic Finite Automata
3.13 The Pumping Lemma for Regular Languages
3.14 Applications of Finite Automata and Regular Expressions

4 Context-free Languages
4.1 (Context-free) Grammars, Parse Trees and Context-free Languages
4.2 Isomorphism of Grammars
4.3 A Parsing Algorithm
4.4 Simplification of Grammars
4.5 Proving the Correctness of Grammars
4.6 Ambiguity of Grammars
4.7 Closure Properties of Context-free Languages
4.8 Converting Regular Expressions and Finite Automata to Grammars
4.9 Chomsky Normal Form
4.10 The Pumping Lemma for Context-free Languages

5 Recursive and R.E. Languages
5.1 A Universal Programming Language, and Recursive and Recursively Enumerable Languages
5.2 Closure Properties of Recursive and Recursively Enumerable Languages
5.3 Diagonalization and Undecidable Problems

List of Figures

1.1 Example Diagonalization Table for Cardinality Proof
3.1 Regular Expression to FA Conversion Example
3.2 DFA Accepting AllLongStutter
4.1 Visualization of Proof of Pumping Lemma for Context-free Languages
5.1 Example Diagonalization Table for R.E. Languages

Preface

Since the 1930s, the subject of formal language theory, also known as automata theory, has been developed by computer scientists, linguists and mathematicians. (Formal) Languages are sets of strings over finite sets of symbols, called alphabets, and various ways of describing such languages have been developed and studied, including regular expressions (which “generate” languages), finite automata (which “accept” languages), grammars (which “generate” languages) and Turing machines (which “accept” languages). For example, the set of identifiers of a given programming language is a formal language, one that can be described by a regular expression or a finite automaton. And, the set of all strings of tokens that are generated by a programming language’s grammar is another example of a formal language.

Because of its many applications to computer science, e.g., to compiler construction, most computer science programs offer both undergraduate and graduate courses in this subject. Many of the results of formal language theory are proved constructively, using algorithms that are useful in practice. In typical courses on formal language theory, students apply these algorithms to toy examples by hand, and learn how they are used in applications. But they are not able to experiment with them on a larger scale.
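To make the earlier identifier example concrete, here is a minimal sketch in Python (not Forlan) that tests membership in a language of identifiers using a regular expression. The identifier syntax assumed here, a letter or underscore followed by any number of letters, digits or underscores, is a common convention chosen only for illustration, and the names `IDENTIFIER` and `is_identifier` are hypothetical.

```python
import re

# Assumed identifier syntax (illustrative, not taken from the book):
# a letter or underscore, followed by letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(s: str) -> bool:
    """Return True if the whole string s is in the identifier language."""
    # fullmatch requires the entire string to match, which is what
    # language membership means; a prefix match would not be enough.
    return IDENTIFIER.fullmatch(s) is not None
```

Under this assumed syntax, `is_identifier("x1")` holds while `is_identifier("1x")` and `is_identifier("")` do not.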

Although much can be achieved by a paper-and-pencil approach to the subject, students would obtain a deeper understanding of the subject if they could experiment with the algorithms of formal language theory using computer tools. Consider, e.g., a typical exercise of a formal language theory class in which students are asked to synthesize an automaton that accepts some language, L. With the paper-and-pencil approach, the student is obliged to build the machine by hand, and then (perhaps) prove that it is correct. But, given the right computer tools, another approach would be possible. First, the student could try to express L in terms of simpler languages, making use of various language operations (union, intersection, difference, concatenation, closure). He or she could then synthesize automata accepting the simpler languages, enter these machines into the system, and then combine these machines using operations corresponding to the language operations used to express L. With some such exercises, a student could solve the exercise in both ways, and could compare the results. Other exercises of this type could only be solved with machine support.
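The idea of combining machines with an operation that mirrors a language operation can be sketched as follows. This is a Python illustration, not Forlan’s interface: the `DFA` class, the `intersect` function and the two example machines are all hypothetical. It uses the standard product construction, which builds a deterministic automaton for the intersection of two languages from automata for each of them.

```python
from itertools import product


class DFA:
    """A deterministic finite automaton with a total transition function."""

    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)  # maps (state, symbol) to the next state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, w):
        """Run the machine on the string w and report acceptance."""
        q = self.start
        for a in w:
            q = self.delta[(q, a)]
        return q in self.accepting


def intersect(m1, m2):
    """Product construction: accept exactly the strings both machines accept."""
    assert m1.alphabet == m2.alphabet, "machines must share an alphabet"
    states = set(product(m1.states, m2.states))
    # Each component of a pair state steps independently on the same symbol.
    delta = {((p, q), a): (m1.delta[(p, a)], m2.delta[(q, a)])
             for (p, q) in states for a in m1.alphabet}
    accepting = {(p, q) for (p, q) in states
                 if p in m1.accepting and q in m2.accepting}
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), accepting)


# Simpler language 1: strings over {0, 1} with an even number of 0s.
even_zeros = DFA({"E", "O"}, {"0", "1"},
                 {("E", "0"): "O", ("E", "1"): "E",
                  ("O", "0"): "E", ("O", "1"): "O"},
                 "E", {"E"})

# Simpler language 2: strings over {0, 1} that end with 1.
ends_in_one = DFA({"N", "Y"}, {"0", "1"},
                  {("N", "0"): "N", ("N", "1"): "Y",
                   ("Y", "0"): "N", ("Y", "1"): "Y"},
                  "N", {"Y"})

# L = strings with an even number of 0s that end with 1.
both = intersect(even_zeros, ends_in_one)
```

Here `both.accepts("001")` is true, while `both.accepts("01")` and `both.accepts("00")` are false. Changing the accepting condition from "and" to "or" would give a machine for the union instead.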

Integrating Experimentation and Proof

Over the past several years, I have been designing and developing a computer toolset, called Forlan, for experimenting with formal languages. Forlan is implemented in the functional programming language Standard ML [MTHM97, Pau96], a language whose notation and concepts are similar to those of mathematics. Forlan is used interactively. In fact, a Forlan session is simply a Standard ML session in which the Forlan modules are pre-loaded. Users are able to extend Forlan by defining ML functions.

In Forlan, the usual objects of formal language theory (automata, regular expressions, grammars, labeled paths, parse trees, etc.) are defined as abstract types, and have concrete syntax. The standard algorithms of formal language theory are implemented in Forlan, including conversions between different kinds of automata and grammars, the usual operations on automata and grammars, equivalence testing and minimization of deterministic finite automata, etc. Support for the variant of the programming language Lisp that we use (instead of Turing machines) as a universal programming language is planned.

While developing Forlan, I have also been writing lecture notes on formal language theory that are based around Forlan, and this book is the outgrowth of those notes. I am attempting to keep the conceptual and notational distance between the textbook and toolset as small as possible. The book treats each concept or algorithm both theoretically, especially using proof, and through experimentation, using Forlan. Special proofs that are carried out assuming the correctness of Forlan’s implementation are labeled “[Forlan]”, and theorems that are only proved in this way are also so-labeled.

Readers of this book are assumed to have a significant amount of experience reading and writing informal mathematical proofs, of the kind one finds in mathematics books. This experience could have been gained, e.g., in courses on discrete mathematics, logic or set theory. The core sections of the book assume no previous knowledge of Standard ML. Eventually, advanced sections covering the implementation of Forlan will be written, and these sections will assume the kind of familiarity with Standard ML that could be obtained by reading [Pau96] or [Ull98].

Outline of the Book

The book consists of five chapters. Chapter 1, Mathematical Background, consists of the material on set theory, induction principles for the natural numbers, and trees and inductive definitions that is required in the remaining chapters.

In Chapter 2, Formal Languages, we say what symbols, strings, alphabets and (formal) languages are, introduce and show how to use several string induction principles, and give an introduction to the Forlan toolset. The remaining three chapters introduce and study more restricted sets of languages.

In Chapter 3, Regular Languages, we study regular expressions and languages, four kinds of finite automata, algorithms for processing and converting between regular expressions and finite automata, properties of regular languages, and applications of regular expressions and finite automata to searching in text files and lexical analysis.

In Chapter 4, Context-free Languages, we study context-free grammars and languages, algorithms for processing grammars and for converting regular expressions and finite automata to grammars, and properties of context-free languages. It turns out that the set of all context-free languages is a proper superset of the set of all regular languages.

Finally, in Chapter 5, Recursive and Recursively Enumerable Languages, we study a universal programming language based on Lisp, which we use to define the recursive and recursively enumerable languages. We study algorithms for processing programs and for converting grammars to programs, and properties of recursive and recursively enumerable languages. It turns out that the context-free languages are a proper subset of the recursive languages, that the recursive languages are a proper subset of the recursively enumerable languages, and that there are languages that are not recursively enumerable. Furthermore, there are problems, like the halting problem (the problem of determining whether a program P halts when run on an input w), or the problem of determining if two grammars generate the same language, that can’t be solved by programs.
