Regular expressions are analogous to the algebra of arithmetic expressions with which we are all familiar, and the relational algebra that we met in Handout #3.. REGULAR EXPRESSIONSA set
Trang 1INTRODUCTION TO COMPUTER SCIENCE
HANDOUT #8 REGULAR EXPRESSIONS
K5 & K6, Computer Science Department, Văn Lang University
Second semester Feb, 2002
Instructor: Trần Đức Quang
Major themes:
1 Introduction
2 Algebraic Laws for Regular Expressions
Reading: Sections 10.5 and 10.7.
8.1 INTRODUCTION
In the previous handout, we have studied a finite automaton which is, in a sense, a machine recognizing character strings with some patterns In this handout, we meet
regular expressions which is an algebraic way to describe a string pattern Regular
expressions are analogous to the algebra of arithmetic expressions with which we are all familiar, and the relational algebra that we met in Handout #3
Interestingly, the set of patterns that can be expressed in the regular-expression algebra is exactly the same set of patterns that can be described by automata For
example, the regular expression a | bc* can express the patterns described by the
fol-lowing automaton
A
start
B
a
D
b
c
Trang 246 INTRODUCTION TO COMPUTER SCIENCE: HANDOUT #8 REGULAR EXPRESSIONS
A set of strings denoted by a regular expression E is called a regular language and can be referred to as L(E) Given two regular languages L and M, we can construct new
languages by applying the operators below one or more times to these languages:
1 The union of two languages L and M, denoted L ∪ M, is the set of strings that are in either L or M For example, if L = {00, 01, 11} and M = {aa, ab}, the union
L ∪ M = {00, 01, 11, aa, ab}
2 The concatenation of languages L and M, denoted LM, is the set of strings that
can be formed by taking any string in L and concatenating it with any string in
M For the above example, we have LM = {00aa, 01aa, 11aa, 00ab, 01ab, 11ab}
3 The closure (star, or Kleene closure) of a language L is denoted L* and represents
the set of strings that can be formed by taking any number of strings from L and
concatenating all of them
Algebra of regular expressions, like all kinds of algebras, starts with some elementary expressions, usually constants and/or variables denoting languages We then construct more expressions by applying the set of three operators (union, concatenation, star) to these elemetary expressions and to previously constructed expressions In particular:
1 Empty string is denoted by the regular expression ε
2 A symbol a is denoted by the regular expression a.
3 Suppose R and S are two regular expressions denoting the languages L(R) and
L(S), respectively.
a) R + S (or R | S) is a regular expression for the union of the languages L(R) and L(S).
b) RS is a regular expression for the concatenation of the languages L(R) and
L(S).
c) R* is a regular expression for the Kleen closure of the language L(R).
For a simple example, suppose letter is any character in the alphabet, digit is any decimal digits, and under is the symbol _ We can define a regular expression as a
pattern of legal identifiers in C as follows:
identifiers = (letter|under)(letter|digit|under)*
As in arithmetic expressions, we can use parentheses to group regular expressions
8.2 ALGEBRAIC LAWS FOR REGULAR EXPRESSIONS
It is possible for two regular expressions to denote the same language, just as two arithmetic expressions can denote the same function of their operands As an example,
the arithmetic expressions x + y and y + x each denote the same function of x and y,
Trang 38.3 GLOSSARY 47
because addition is commutative So are the regular expressions R + S and S + R.
We now list some common algebraic laws for regular expressions with no proofs More useful laws can be found in the textbook
1 Commutativity of union (R |S) ≡ (S |R)
2 Associativity of union ((R |S) |T)≡ (R | (S |T))
3 Associativity of concatenation ((RS)T) ≡(R(ST))
4 Left distributivity of concatenation over union (R(S |T))≡ (RS |RT)
5 Right distributivity of concatenation over union ((S |T)R)≡ (SR |TR)
6 Idempotence of union (R |R) ≡ R
7 RR* ≡ R*R
8.3 GLOSSARY
Regular Expression: Biểu thức chính quy.
Arithmetic Expression: Biểu thức số học.
Algebra: Đại số.
algebra of regular expressions: đại số biểu thức chính quy.
algebra of arithmetic expressions: đại số biểu thức số học.
set algebra: đại số tập hợp.
relational algebra: đại số quan hệ.
Algebraic law: Luật đại số (tính chất đại số).
Commutative law, commutativity: Luật giao hoán, tính giao hoán.
Associative law, associativity: Luật kết hợp, tính kết hợp.
Distributive law, distributivity: Luật phân phối, tính phân phối.
Idempotence: Tính lũy đẳng.
Regular language, regular set: Ngôn ngữ chính quy, tập chính quy.
Union: Phép hợp.
Concatenation: Phép ghép nối.
Star, Kleene closure: Phép toán sao, phép lấy bao đóng Kleene.