Lecture slides for the Compilers course, Chapter 2: the Visitor design pattern


LEXICAL ANALYSIS

Phung Hua Nguyen

University of Technology

2006


• Read the input characters

• Produce as output a sequence of tokens

• Eliminate white space and comments

[Figure: the lexical analyzer reads the source program, hands over a token on each "get next token" request, and uses the symbol table]

Why a separate lexical analysis phase?

• Simplify design

• Improve compiler efficiency

• Enhance compiler portability


Tokens, Patterns, Lexemes

Token      Sample lexemes         Informal description of pattern
const      const                  const
relation   <, <=, ==, !=, >, >=   < or <= or == or != or > or >=
id         pi, count, x2          letter followed by letters or digits
num        3.14, 25, 6.02E3       any numeric constant
literal    "core dumped"          any characters between " and " except "
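The token/pattern/lexeme distinction can be sketched with Python's re module. This is only an illustration: the concrete regular expressions, the names TOKEN_SPEC, MASTER and tokenize, and the extra "ws" class for white space are assumptions of this sketch, not part of the slides.

```python
import re

# Hypothetical token specification: (token name, pattern) pairs.
# Order matters: "const" is tried before the more general "id".
TOKEN_SPEC = [
    ("const",    r"const\b"),
    ("num",      r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("relation", r"<=|>=|==|!=|<|>"),
    ("literal",  r'"[^"]*"'),
    ("ws",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs for text, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "ws":          # white space is eliminated, not returned
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize('const pi <= 3.14 x2 != 6.02E3 "core dumped"'))
```

Each pair couples a token (the class) with the lexeme (the actual characters) that matched its pattern.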


Alphabet, Strings and Languages

• Alphabet ∑: any finite set of symbols

– The Vietnamese alphabet {a, á, à, ả, ã, ạ, b, c, d, đ, …}

– The binary alphabet {0,1}

– The ASCII alphabet

• String: a finite sequence of symbols drawn from ∑

– Length |s| of a string s: the number of symbols in s

– The empty string, denoted ε, has length |ε| = 0

• Language: any set of strings over ∑

– its two special cases:

• ∅: the empty set

• {ε}: the set containing only the empty string


– The set of Pentium instructions

• ∑ = the ASCII set

– A string is a program

– The set of C programs


Terms (Fig. 3.7)

prefix of s: a string obtained by removing 0 or more trailing symbols of s; e.g. ban is a prefix of banana

suffix of s: a string formed by deleting 0 or more leading symbols of s; e.g. na is a suffix of banana

substring of s: a string obtained by deleting a prefix and a suffix from s
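The three terms can be checked mechanically. A small sketch in Python, where the helper names prefixes, suffixes and substrings are made up for this illustration:

```python
def prefixes(s):
    """All strings obtained by removing 0 or more trailing symbols of s."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All strings obtained by deleting 0 or more leading symbols of s."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All strings obtained by deleting a prefix and a suffix from s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


print("ban" in prefixes("banana"))    # True
print("na" in suffixes("banana"))     # True
print("nan" in substrings("banana"))  # True
```

Note that both the empty string and s itself are a prefix, a suffix and a substring of s.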


String operations

• String concatenation

– If x and y are strings, xy is the string formed

by appending y to x.

E.g.: x = hom, y = nay ⇒ xy = homnay

– ε is the identity: εy = y; xε = x

• String exponentiation

– s⁰ = ε

– sⁱ = sⁱ⁻¹s for i > 0

E.g.: s = 01, s⁰ = ε, s² = 0101, s³ = 010101
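Both operations map directly onto Python strings, with "" playing the role of ε. The function name power is an assumption of this sketch; it follows the inductive definition rather than using Python's built-in s * i.

```python
def power(s, i):
    """String exponentiation: s^0 is the empty string, s^i = s^(i-1) s."""
    return "" if i == 0 else power(s, i - 1) + s


x, y = "hom", "nay"
print(x + y)             # homnay
print("" + y == y)       # True: the empty string is the identity
print(power("01", 2))    # 0101
print(power("01", 3))    # 010101
```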


Language Operations (Fig 3.8)

union: L ∪ M            L ∪ M = { s | s ∈ L or s ∈ M }

concatenation: LM       LM = { st | s ∈ L and t ∈ M }

Kleene closure: L*      L* = L⁰ ∪ L ∪ LL ∪ LLL ∪ …, where L⁰ = {ε}
                        (0 or more concatenations of L)

positive closure: L+    L+ = L ∪ LL ∪ LLL ∪ …
                        (1 or more concatenations of L)
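A sketch of these operations on finite languages, represented as Python sets of strings. The closures are infinite in general, so kleene and positive below only build a finite approximation up to max_power concatenations; that cut-off and all function names are assumptions of this sketch.

```python
def union(L, M):
    return L | M


def concat(L, M):
    return {s + t for s in L for t in M}


def power(L, i):
    """L^0 = {""}; L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene(L, max_power):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ … ∪ L^max_power."""
    result = set()
    for i in range(max_power + 1):
        result |= power(L, i)
    return result


def positive(L, max_power):
    """Finite approximation of L+: L^1 ∪ … ∪ L^max_power."""
    result = set()
    for i in range(1, max_power + 1):
        result |= power(L, i)
    return result


L, D = {"a", "b"}, {"0", "1"}
print(sorted(concat(L, D)))          # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene({"a"}, 3)))      # ['', 'a', 'aa', 'aaa']
```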

– all strings of letters, including ε

– all strings of letters and digits beginning with a letter

– all strings of one or more digits

Regular Expressions (REs) over ∑

• Inductive base:

1 ε is a RE, denoting the RL {ε}

2 a ∈ ∑ is a RE, denoting the RL {a}

• Inductive step: Suppose r and s are REs, denoting the languages L(r) and L(s). Then

3 (r)|(s) is a RE, denoting the RL L(r) ∪ L(s)

4 (r)(s) is a RE, denoting the RL L(r)L(s)

5 (r)* is a RE, denoting the RL (L(r))*

6 (r) is a RE, denoting the RL L(r)

Precedence and Associativity

• Precedence:

– “*” has the highest precedence

– “concatenation” has the second highest precedence

– “|” has the lowest precedence

• Associativity:

– all are left-associative

E.g.: (a)|((b)*(c)) ≡ a|b*c


• ∑ = {a, b}

1 a|b denotes {a,b}

2 (a|b)(a|b) denotes {aa,ab,ba,bb}

3 a* denotes {ε,a,aa,aaa,aaaa,…}

4 (a|b)* denotes ?

5 a|a*b denotes ?
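One way to explore the open cases is to enumerate short strings over {a, b} and keep those a pattern matches in full. The sketch below uses Python's re.fullmatch; the helper name language and the length bound max_len are assumptions, and the output is only a finite sample of each (possibly infinite) language.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=3):
    """Strings over alphabet of length <= max_len that pattern matches in full."""
    matched = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                matched.append(s)
    return matched


print(language("(a|b)(a|b)"))          # ['aa', 'ab', 'ba', 'bb']
print(language("(a|b)*", max_len=2))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(language("a|a*b"))               # ['a', 'b', 'ab', 'aab']
```

The samples suggest that (a|b)* denotes all strings of a's and b's, including ε, and that a|a*b denotes a together with every string of zero or more a's followed by one b.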


Notational Shorthands

• One or more instances +: r+ = rr*

– denotes the language (L(r))+

• Zero or one instance ?: r? = r|ε

– denotes the language (L(r) ∪ {ε})

• Character classes

– [abc] denotes a|b|c

– [A-Z] denotes A|B|…|Z

– [a-zA-Z_][a-zA-Z0-9_]* denotes ?
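The shorthands can be spot-checked against their expansions with Python's re module. The helper same_language only compares two patterns on a handful of sample strings, so it is a sanity check and not a proof; it and the name IDENT are assumptions of this sketch. In Python, ε in r|ε is written as an empty alternative.

```python
import re


def same_language(p, q, samples):
    """True if patterns p and q agree (full match or not) on every sample string."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples)


SAMPLES = ["", "a", "aa", "aaa", "b", "c", "ab"]
print(same_language("a+", "aa*", SAMPLES))        # True: r+ = rr*
print(same_language("a?", "(?:a|)", SAMPLES))     # True: r? = r|ε
print(same_language("[abc]", "a|b|c", SAMPLES))   # True

# The last character class on the slide describes typical identifiers.
IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
print(bool(IDENT.fullmatch("x2")), bool(IDENT.fullmatch("2x")))   # True False
```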


Nondeterministic finite automata

• A nondeterministic finite automaton (NFA) is a mathematical model that consists of

– a finite set of states S

– a set of input symbols ∑

– a transition function move that maps each (state, symbol) pair to a set of states, move: S × (∑ ∪ {ε}) → 2^S

– a start state s0

– a set of final or accepting states F ⊆ S

• An NFA accepts an input string x iff there is some path in the transition graph from the start state to some accepting state such that the edge labels along this path spell out x
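The acceptance condition can be read as a path search over (state, input position) pairs. The sketch below is one possible encoding, not taken from the slides: transitions are a dict from (state, symbol) to a set of states, EPS is an arbitrary label chosen here for ε-edges, and the example NFA for (a|b)*abb is hand-written.

```python
EPS = None   # label chosen in this sketch for ε-transitions


def nfa_accepts(move, start, accepting, x):
    """True iff some path from start to an accepting state spells out x."""
    seen = set()
    stack = [(start, 0)]                 # (current state, symbols of x consumed)
    while stack:
        state, i = stack.pop()
        if (state, i) in seen:
            continue                     # avoid looping on ε-cycles
        seen.add((state, i))
        if i == len(x) and state in accepting:
            return True
        for t in move.get((state, EPS), ()):
            stack.append((t, i))         # ε-edge: no input consumed
        if i < len(x):
            for t in move.get((state, x[i]), ()):
                stack.append((t, i + 1))
    return False


# Hand-written NFA for (a|b)*abb: state 0 loops on a and b, then a b b leads to state 3.
MOVE = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}

print(nfa_accepts(MOVE, 0, {3}, "aabb"))   # True
print(nfa_accepts(MOVE, 0, {3}, "abba"))   # False
```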


Deterministic finite automata

• A deterministic finite automaton (DFA) is

a special case of NFA in which

1 no state has an ε-transition, and

2 for each state s and input symbol a, there is

at most one edge labeled a leaving s
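Because at most one edge applies at each step, simulating a DFA needs no search. A minimal sketch, assuming the transitions are stored in a dict keyed by (state, symbol); the table DTRAN is a hand-written DFA for (a|b)*abb used only as an example.

```python
def dfa_accepts(dtran, start, accepting, x):
    """Follow the unique edge for each input symbol; reject if there is none."""
    state = start
    for a in x:
        if (state, a) not in dtran:
            return False                 # no edge labeled a leaves this state
        state = dtran[(state, a)]
    return state in accepting


# Hand-written DFA for (a|b)*abb with states 0..3 and accepting state 3.
DTRAN = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

print(dfa_accepts(DTRAN, 0, {3}, "aabb"))   # True
print(dfa_accepts(DTRAN, 0, {3}, "abab"))   # False
```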

Thompson’s construction of NFA

Thompson’s construction (cont’d)

• Suppose N(s) and N(t) are NFAs for REs s and t

– REs ⇒ NFA (Thompson’s construction) √

– NFA ⇒ DFA (subset construction)

– DFA ⇒ minimal DFA (Algorithm 3.6)

• Programming


Subset construction

Operation        Description

ε-closure(s)     Set of NFA states reachable from state s on ε-transitions alone

ε-closure(T)     Set of NFA states reachable from some state s in T on ε-transitions alone

move(T, a)       Set of NFA states to which there is a transition on input a from some state s in T

• s : an NFA state

• T : a set of NFA states
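The two operations can be sketched as follows. The representation (a dict from (state, symbol) to a set of states, with the string "ε" as the label of ε-edges) and the example NFA for (a|b)*abb, written in the shape Thompson's construction would give, are assumptions of this illustration.

```python
EPS = "ε"

# Hand-written NFA for (a|b)*abb with states 0..10; state 10 is accepting.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}


def eps_closure(T, nfa):
    """States reachable from some state in T on ε-transitions alone."""
    closure, stack = set(T), list(T)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(T, a, nfa):
    """States reached by a transition on input a from some state in T."""
    return {t for s in T for t in nfa.get((s, a), ())}


A = eps_closure({0}, NFA)
print(sorted(A))                                       # [0, 1, 2, 4, 7]
print(sorted(eps_closure(move(A, "a", NFA), NFA)))     # [1, 2, 3, 4, 6, 7, 8]
```

ε-closure(s) for a single state s is simply eps_closure({s}, nfa).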


Subset construction (cont’d)

Let s0 be the start state of the NFA;

Initially, ε-closure(s0) is the only state in Dstates and it is unmarked;

while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        DTran[T, a] := U;
    end;
end;

• Let (∑, S, T, F, s0) be the original NFA. The DFA is:

• The alphabet: ∑

• The states: all states in Dstates

• The transitions: DTran

• The accepting states: all states in Dstates

containing at least one accepting state in F of

the NFA

• The start state: ε-closure(s0)
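Putting the pieces together, the subset construction can be sketched as below. The data representation and the example NFA for (a|b)*abb are assumptions of this sketch. One deliberate deviation is flagged in the code: when U is empty, no transition is recorded, so no dead state is created.

```python
EPS = "ε"

# Hand-written NFA for (a|b)*abb with states 0..10; state 10 is accepting.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}


def eps_closure(T, nfa):
    closure, stack = set(T), list(T)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)            # hashable, so it can serve as a DFA state


def move(T, a, nfa):
    return {t for s in T for t in nfa.get((s, a), ())}


def subset_construction(nfa, s0, F, alphabet):
    start = eps_closure({s0}, nfa)
    dstates = [start]                    # every DFA state found so far
    unmarked = [start]                   # states whose edges are not computed yet
    dtran = {}
    while unmarked:
        T = unmarked.pop()               # "mark T"
        for a in alphabet:
            U = eps_closure(move(T, a, nfa), nfa)
            if not U:
                continue                 # sketch-specific choice: skip the dead state
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = [T for T in dstates if T & F]
    return start, dstates, dtran, accepting


start, dstates, dtran, accepting = subset_construction(NFA, 0, {10}, "ab")
print(len(dstates))                      # 5
print([sorted(T) for T in accepting])    # [[1, 2, 4, 5, 6, 7, 10]]
```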


Minimise a DFA

Initially, create two states (groups of the original DFA's states):

1 one is the set of all final states: F

2 the other is the set of all non-final states: S - F

while (more splits are possible) {
    Let G = {s1,…, sn} be a state and c be any char in ∑
    Let t1,…, tn be the successor states to s1,…, sn under c
    if (t1,…, tn don't all belong to the same state) {
        Split G into new states so that si and sj remain in the same state iff ti and tj are in the same state
    }
}
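The splitting loop can be sketched as a partition refinement. The code assumes a total transition table (an edge for every state and symbol) stored as a dict keyed by (state, symbol); the function names and the five-state example DFA for (a|b)*abb are made up for this illustration.

```python
def group_index(partition, state):
    """Index of the group in partition that contains state."""
    return next(i for i, g in enumerate(partition) if state in g)


def minimise(states, alphabet, dtran, F):
    """Return the final groups; each group becomes one state of the minimal DFA."""
    partition = [g for g in (set(F), set(states) - set(F)) if g]
    changed = True
    while changed:                       # "while more splits are possible"
        changed = False
        for group in partition:
            for c in alphabet:
                # Bucket the members of the group by the group of their successor under c.
                buckets = {}
                for s in group:
                    target = group_index(partition, dtran[(s, c)])
                    buckets.setdefault(target, set()).add(s)
                if len(buckets) > 1:     # successors disagree: split the group
                    partition.remove(group)
                    partition.extend(buckets.values())
                    changed = True
                    break
            if changed:
                break                    # restart the scan with the new partition
    return partition


# Example DFA for (a|b)*abb with states A..E, accepting state E.
DTRAN = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}

groups = minimise("ABCDE", "ab", DTRAN, {"E"})
print(sorted(sorted(g) for g in groups))   # [['A', 'C'], ['B'], ['D'], ['E']]
```

In this example A and C end up in the same group, so the minimal DFA has four states.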

[Figure: transition graph of an example DFA with edges labeled a and b]

Input Buffering

[Figure: the input buffer holding b e g i n …, with an eof marker, read by the Scanner]

if (forward at end of first half) {
    reload second half
    forward++
} else if (forward at end of second half) {
    reload first half
    forward = 0
} else
    forward++
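The pointer-advancing logic above can be sketched as a small class. Everything concrete here is an assumption of the sketch: the class and method names, the tiny half size, reading from an io.StringIO stream, and using the empty string "" as the end-of-input marker inside the buffer.

```python
import io


class TwoHalfBuffer:
    def __init__(self, stream, half=4):
        self.stream = stream
        self.half = half
        self.buf = [""] * (2 * half)
        self._reload(0)                  # fill the first half before scanning
        self.forward = 0

    def _reload(self, start):
        """Read up to `half` characters into buf[start : start + half]."""
        data = self.stream.read(self.half)
        for i in range(self.half):
            self.buf[start + i] = data[i] if i < len(data) else ""   # "" marks end of input

    def advance(self):
        if self.forward == self.half - 1:            # forward at end of first half
            self._reload(self.half)
            self.forward += 1
        elif self.forward == 2 * self.half - 1:      # forward at end of second half
            self._reload(0)
            self.forward = 0
        else:
            self.forward += 1

    def current(self):
        return self.buf[self.forward]


def read_all(text, half=4):
    """Scan text through the buffer and return the characters seen, in order."""
    buffer = TwoHalfBuffer(io.StringIO(text), half)
    seen = []
    while buffer.current() != "":
        seen.append(buffer.current())
        buffer.advance()
    return "".join(seen)


print(read_all("begin := 1"))   # begin := 1
```

Each call to advance performs up to two tests before moving the pointer, which is the per-character cost of this scheme.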

Input Buffering (cont’d)

[Figure: the input buffer holding b e g i n]

forward++
if (forward points at eof) {
    if (forward at end of first half) {
        reload second half
        forward++
    } else if (forward at end of second half) {
        reload first half
        forward = 0
    } else
        terminate the analysis
}


move forward back

get lexeme from beginning to forward

move forward onward

beginning = forward

state = 0

}

[Figure: the input buffer holding b e g i n : = …]
