
Lecture slides for the Compilers course, Chapter 2: design pattern visitor


DOCUMENT INFORMATION

Basic information

Number of pages: 39

File size: 242.74 KB


Page 1

LEXICAL ANALYSIS

Phung Hua Nguyen

University of Technology

2006

Page 3

• Read the input characters

• Produce as output a sequence of tokens

• Eliminate white space and comments

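[Figure: the lexical analyzer reads the source program, returns a token each time it receives a "get next token" request, and interacts with the symbol table]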
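A minimal Python sketch of the three duties listed above. It assumes (this is not from the slides) that comments run from `//` to the end of the line and that a token is either a run of letters and digits or a single other character; the generator hands out one token per "get next token" request.

```python
from typing import Iterator


def tokens(src: str) -> Iterator[str]:
    """Yield tokens one at a time, discarding white space and // comments."""
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isspace():                   # eliminate white space
            i += 1
        elif src.startswith("//", i):      # eliminate a comment (assumed syntax)
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif ch.isalnum():                 # group letters and digits into one token
            start = i
            while i < n and src[i].isalnum():
                i += 1
            yield src[start:i]
        else:                              # any other character is a token on its own
            yield ch
            i += 1


# The parser pulls tokens on demand, one "get next token" call at a time.
get_next_token = tokens("count = count + 1 // increment\nx2 = 0").__next__
print(get_next_token(), get_next_token())  # count =
```

Using a generator keeps the scanner lazy: it never reads further into the source than the parser has asked for.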

Page 4

Why ?

• Simplify design

• Improve compiler efficiency

• Enhance compiler portability

Page 5

Tokens, Patterns, Lexemes

Token | Sample lexemes | Informal description of pattern
const | const | const
relation | <, <=, ==, !=, >, >= | < or <= or == or != or > or >=
id | pi, count, x2 | letter followed by letters or digits
num | 3.14, 25, 6.02E3 | any numeric constant
literal | "core dumped" | any characters between " and " except "
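One way to make the informal patterns executable is a prioritised list of Python regular expressions. The exact patterns below (for instance the exponent syntax accepted for num) and the names `TOKEN_SPEC` and `scan` are assumptions for illustration. const is listed before id so that the keyword is not classified as an identifier.

```python
import re

# Hypothetical token specification mirroring the table; order matters because
# the first alternative that matches at the current position wins.
TOKEN_SPEC = [
    ("const",    r"const\b"),
    ("num",      r"\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("relation", r"<=|>=|==|!=|<|>"),
    ("literal",  r'"[^"]*"'),
    ("skip",     r"\s+"),              # white space: matched but not returned
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(src: str) -> list:
    """Return a list of (token, lexeme) pairs for src."""
    result, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":
            result.append((m.lastgroup, m.group()))
        pos = m.end()
    return result


print(scan('const pi <= 3.14 "core dumped"'))
# [('const', 'const'), ('id', 'pi'), ('relation', '<='), ('num', '3.14'), ('literal', '"core dumped"')]
```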

Page 7

Alphabet, Strings and Languages

• Alphabet ∑: any finite set of symbols

– The Vietnamese alphabet {a, á, à, ả, ã, ạ, b, c, d, đ, …}

– The binary alphabet {0, 1}

– The ASCII alphabet

• String: a finite sequence of symbols drawn from ∑

– Length |s| of a string s: the number of symbols in s

– The empty string, denoted ε, has |ε| = 0

• Language: any set of strings over ∑

– two special cases:

• ∅: the empty set

• {ε}: the set containing only the empty string

Page 8

– The set of Pentium instructions

• ∑ = the ASCII set

– A string is a program

– The set of C programs

Page 9

Terms (Fig. 3.7)

• prefix of s: a string obtained by removing 0 or more trailing symbols of s; e.g., ban is a prefix of banana

• suffix of s: a string formed by deleting 0 or more leading symbols of s; e.g., na is a suffix of banana

• substring of s: a string obtained by deleting a prefix and a suffix from s
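The three definitions translate directly into Python slicing. This sketch represents ε as the empty string "" (an encoding choice, not from the slides); the function names are illustrative.

```python
def prefixes(s: str) -> list:
    """Prefixes of s: remove 0 or more trailing symbols (s itself and "" included)."""
    return [s[:i] for i in range(len(s), -1, -1)]


def suffixes(s: str) -> list:
    """Suffixes of s: delete 0 or more leading symbols (s itself and "" included)."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s: str) -> set:
    """Substrings of s: delete a prefix and a suffix; duplicates collapse in the set."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


print(prefixes("banana"))  # ['banana', 'banan', 'bana', 'ban', 'ba', 'b', '']
print(suffixes("banana"))  # ['banana', 'anana', 'nana', 'ana', 'na', 'a', '']
```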

Page 10

String operations

• String concatenation

– If x and y are strings, xy is the string formed by appending y to x.

E.g.: x = hom, y = nay ⇒ xy = homnay

– ε is the identity: εy = y; xε = x

• String exponentiation

– $s^0 = \varepsilon$

– $s^i = s^{i-1}s$ for $i > 0$

E.g.: s = 01 ⇒ $s^0 = \varepsilon$, $s^2 = 0101$, $s^3 = 010101$
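The recursive definition of exponentiation can be followed literally in code. In this sketch ε is again the empty string "", and `concat` and `power` are illustrative names.

```python
def concat(x: str, y: str) -> str:
    """xy: the string formed by appending y to x."""
    return x + y


def power(s: str, i: int) -> str:
    """s^i via the recursive definition s^0 = ε and s^i = s^(i-1) s (ε is "")."""
    if i == 0:
        return ""
    return concat(power(s, i - 1), s)


print(concat("hom", "nay"))                # homnay
print([power("01", i) for i in range(4)])  # ['', '01', '0101', '010101']
```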

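Page 11

Language Operations (Fig. 3.8)

Operation | Definition
union of L and M, written $L \cup M$ | $L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$
concatenation of L and M, written $LM$ | $LM = \{ st \mid s \in L \text{ and } t \in M \}$
Kleene closure of L, written $L^*$ | $L^* = L^0 \cup L \cup LL \cup LLL \cup \cdots$, where $L^0 = \{\varepsilon\}$ (0 or more concatenations of L)
positive closure of L, written $L^+$ | $L^+ = L \cup LL \cup LLL \cup \cdots$ (1 or more concatenations of L)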
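Since $L^*$ and $L^+$ are usually infinite, a program can only list them up to some length. This sketch models a language as a Python set of strings (ε written as "") and truncates the closures at a caller-chosen `max_len`; that bound and the function names are assumptions of the example.

```python
def union(L: set, M: set) -> set:
    return L | M


def lang_concat(L: set, M: set) -> set:
    return {s + t for s in L for t in M}


def kleene(L: set, max_len: int) -> set:
    """L* restricted to strings of length <= max_len (L* itself is usually infinite)."""
    result, frontier = {""}, {""}        # L^0 = {ε}, with ε written as ""
    while frontier:
        # extend by one more concatenation of L, keeping only new, short enough strings
        frontier = {w for w in lang_concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result


def positive(L: set, max_len: int) -> set:
    """L+ = L L*, restricted the same way."""
    return {w for w in lang_concat(L, kleene(L, max_len)) if len(w) <= max_len}


print(sorted(kleene({"ab"}, 4)))     # ['', 'ab', 'abab']
print(sorted(positive({"ab"}, 4)))   # ['ab', 'abab']
```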

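Page 12

Examples, where L denotes the set of letters and D the set of digits:

• $L^*$: all strings of letters, including ε

• $L(L \cup D)^*$: all strings of letters and digits beginning with a letter

• $D^+$: all strings of one or more digits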

Page 13

Regular Expressions (REs) over ∑

• Inductive base:

1. ε is an RE, denoting the regular language (RL) {ε}

2. a ∈ ∑ is an RE, denoting the RL {a}

• Inductive step: suppose r and s are REs, denoting the languages L(r) and L(s). Then

3. (r)|(s) is an RE, denoting the RL L(r) ∪ L(s)

4. (r)(s) is an RE, denoting the RL L(r)L(s)

5. (r)* is an RE, denoting the RL (L(r))*

6. (r) is an RE, denoting the RL L(r)
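The inductive definition reads naturally as a recursive function over the structure of an RE. This hypothetical sketch encodes REs as Python dataclasses (rule 6, parenthesisation, is only grouping, so the tree shape handles it) and, because L(r) may be infinite, returns only the strings of length at most `n`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Eps:            # rule 1: ε
    pass


@dataclass(frozen=True)
class Sym:            # rule 2: a symbol a in ∑
    a: str


@dataclass(frozen=True)
class Alt:            # rule 3: (r)|(s)
    r: "RE"
    s: "RE"


@dataclass(frozen=True)
class Cat:            # rule 4: (r)(s)
    r: "RE"
    s: "RE"


@dataclass(frozen=True)
class Star:           # rule 5: (r)*
    r: "RE"


RE = Union[Eps, Sym, Alt, Cat, Star]


def lang(e: RE, n: int) -> set:
    """The strings of L(e) with length <= n, following the inductive definition."""
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):
        base, result, frontier = lang(e.r, n), {""}, {""}
        while frontier:   # add one more copy of L(r) until nothing new fits
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not an RE: {e!r}")


print(sorted(lang(Alt(Sym("a"), Cat(Star(Sym("b")), Sym("c"))), 3)))  # ['a', 'bbc', 'bc', 'c']
```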

Page 14

Precedence and Associativity

• Precedence:

– "*" has the highest precedence

– concatenation has the second highest precedence

– "|" has the lowest precedence

• Associativity:

– all are left-associative

E.g.: (a)|((b)*(c)) ≡ a|b*c
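Python's `re` module uses the same precedence (`*` binds tighter than concatenation, which binds tighter than `|`), so an informal check of the example is to compare the implicit and the fully parenthesised forms on every short string over {a, b, c}.

```python
import re
from itertools import product

implicit = re.compile(r"a|b*c")
explicit = re.compile(r"(a)|((b)*(c))")   # fully parenthesised form from the slide

# every string over {a, b, c} of length 0 to 3
words = ["".join(p) for k in range(4) for p in product("abc", repeat=k)]
assert all(bool(implicit.fullmatch(w)) == bool(explicit.fullmatch(w)) for w in words)
print(sorted(w for w in words if implicit.fullmatch(w)))  # ['a', 'bbc', 'bc', 'c']
```

Note that "abc" is not matched: a|b*c is not (a|b)*c.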

Page 15

• ∑ = {a, b}

1 a|b denotes {a,b}

2 (a|b)(a|b) denotes {aa,ab,ba,bb}

3 a* denotes {ε, a, aa, aaa, aaaa, …}

4 (a|b)* denotes ?

5 a|a*b denotes ?
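To explore questions like 4 and 5, one can enumerate the short strings over ∑ = {a, b} that an RE matches. The helper `members` is a hypothetical name; it relies on Python's `re.fullmatch`, whose syntax agrees with the slides for these particular expressions, and only looks at strings up to a chosen length.

```python
import re
from itertools import product


def members(pattern, alphabet="ab", max_len=3):
    """Strings over the alphabet (length <= max_len) that the RE fully matches."""
    words = ("".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k))
    return [w for w in words if re.fullmatch(pattern, w)]


print(members("a|b"))          # ['a', 'b']
print(members("(a|b)(a|b)"))   # ['aa', 'ab', 'ba', 'bb']
print(members("a*"))           # ['', 'a', 'aa', 'aaa']
print(members("(a|b)*"))
print(members("a|a*b"))
```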

Page 16

Notational Shorthands

• One or more instances +: r+ = rr*

– denotes the language (L(r))+

• Zero or one instance ?: r? = r|ε

– denotes the language (L(r) ∪ {ε})

• Character classes

– [abc] denotes a|b|c

– [A-Z] denotes A|B|…|Z

– [a-zA-Z_][a-zA-Z0-9_]* denotes ?
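Python's `re` module supports the same shorthands, so the identities r+ = rr* and r? = r|ε can be spot-checked on a few strings, and the last character-class pattern can be tried on sample inputs. Writing ε as an empty alternative `(ab|)` is a Python-specific encoding, and the sample strings are arbitrary.

```python
import re

samples = ["", "ab", "abab", "aba", "ababab"]

# r+ = rr*   (here r = ab)
assert all(bool(re.fullmatch(r"(ab)+", w)) == bool(re.fullmatch(r"(ab)(ab)*", w)) for w in samples)
# r? = r|ε   (ε written as an empty alternative)
assert all(bool(re.fullmatch(r"(ab)?", w)) == bool(re.fullmatch(r"(ab|)", w)) for w in samples)
# [abc] abbreviates a|b|c
assert all(bool(re.fullmatch(r"[abc]", w)) == bool(re.fullmatch(r"a|b|c", w))
           for w in ["a", "b", "c", "d", "ab"])

ident = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
for w in ["count", "x2", "_tmp", "2x", "a-b"]:
    print(w, bool(ident.fullmatch(w)))
```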

Page 18

3.3

Page 19

Nondeterministic finite automata

• A nondeterministic finite automaton (NFA) is a mathematical model that consists of

– a finite set of states S

– a set of input symbols ∑ (the input alphabet)

– a transition function move: S × (∑ ∪ {ε}) → 2^S, which maps each state-symbol pair to a set of states

– a start state s0 ∈ S

– a set of final or accepting states F ⊆ S

Page 22

• An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state such that the edge labels along this path spell out x.
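The acceptance condition is a search problem: look for a path whose edge labels spell out x. This sketch makes the search explicit by exploring (state, position-in-x) pairs. Encoding move as a dict from (state, label) to a set of states, using "" as the ε label, and the sample NFA for (a|b)*abb are all assumptions for illustration.

```python
def nfa_accepts(delta, start, finals, x):
    """Search for a path from start to an accepting state whose labels spell x.

    delta maps (state, label) -> set of states; label "" marks an ε-edge."""
    stack = [(start, 0)]          # (current state, number of symbols of x consumed)
    seen = set()                  # guards against looping on ε-cycles
    while stack:
        state, i = stack.pop()
        if (state, i) in seen:
            continue
        seen.add((state, i))
        if i == len(x) and state in finals:
            return True
        for t in delta.get((state, ""), ()):        # ε-edge: consumes nothing
            stack.append((t, i))
        if i < len(x):
            for t in delta.get((state, x[i]), ()):  # edge labelled x[i]
                stack.append((t, i + 1))
    return False


# Hypothetical NFA for (a|b)*abb
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
print(nfa_accepts(delta, 0, {3}, "aabb"))   # True
print(nfa_accepts(delta, 0, {3}, "abab"))   # False
```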

Page 23

Deterministic finite automata

• A deterministic finite automaton (DFA) is a special case of NFA in which

1. no state has an ε-transition, and

2. for each state s and input symbol a, there is at most one edge labeled a leaving s
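Because a DFA has at most one move per state and input symbol, running it needs no search: follow the single edge for each input symbol. The transition table below is a hypothetical DFA for (a|b)*abb, and treating a missing edge as rejection is an assumption of this sketch.

```python
def dfa_accepts(dtran, start, finals, x):
    """Run a DFA: at most one move per (state, symbol); a missing edge rejects."""
    state = start
    for a in x:
        if (state, a) not in dtran:
            return False
        state = dtran[(state, a)]
    return state in finals


# Hypothetical DFA for (a|b)*abb: state k means "the last k symbols match abb's prefix"
dtran = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 3,
         (3, "a"): 1, (3, "b"): 0}
print(dfa_accepts(dtran, 0, {3}, "aabb"))   # True
print(dfa_accepts(dtran, 0, {3}, "abab"))   # False
```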

Page 24

Thompson's construction of NFA

Page 25

Thompson's construction (cont'd)

• Suppose N(s) and N(t) are NFAs for the REs s and t
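The slides' construction figures are not reproduced here, so the following is a hedged sketch of the usual rules rather than the author's exact presentation. REs are written as hypothetical tuples, and for concatenation it uses a common variant that joins the final state of N(s) to the start state of N(t) with an ε-edge instead of merging the two states. A small set-based simulator is included so the result can be checked.

```python
from itertools import count

EPS = ""   # label used for ε-edges in this sketch


def thompson(node):
    """Build an NFA for an RE tree by Thompson's construction.

    Hypothetical tree encoding: ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r).
    Returns (edges, start, final), where edges is a list of (from, label, to)."""
    edges, fresh = [], count()

    def build(e):
        kind = e[0]
        if kind in ("eps", "sym"):            # i --ε or a--> f
            i, f = next(fresh), next(fresh)
            edges.append((i, EPS if kind == "eps" else e[1], f))
            return i, f
        if kind == "alt":                     # new i and f around N(s) and N(t)
            (s1, f1), (s2, f2) = build(e[1]), build(e[2])
            i, f = next(fresh), next(fresh)
            edges.extend([(i, EPS, s1), (i, EPS, s2), (f1, EPS, f), (f2, EPS, f)])
            return i, f
        if kind == "cat":                     # N(s) followed by N(t)
            (s1, f1), (s2, f2) = build(e[1]), build(e[2])
            edges.append((f1, EPS, s2))
            return s1, f2
        if kind == "star":                    # bypass, enter, loop back, exit
            s1, f1 = build(e[1])
            i, f = next(fresh), next(fresh)
            edges.extend([(i, EPS, s1), (i, EPS, f), (f1, EPS, s1), (f1, EPS, f)])
            return i, f
        raise ValueError(f"unknown node {kind!r}")

    start, final = build(node)
    return edges, start, final


def accepts(nfa, x):
    """Simulate the NFA by tracking the set of states reachable after each symbol."""
    edges, start, final = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for p, label, q in edges:
                if p == s and label == EPS and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = closure({start})
    for a in x:
        current = closure({q for p, label, q in edges if p in current and label == a})
    return final in current


a, b = ("sym", "a"), ("sym", "b")
abb = ("cat", ("cat", ("cat", ("star", ("alt", a, b)), a), b), b)   # (a|b)*abb
nfa = thompson(abb)
print(accepts(nfa, "aabb"), accepts(nfa, "abab"))   # True False
```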

Page 26

– REs ⇒ NFA (Thompson’s construction) √

– NFA ⇒ DFA (subset construction)

– DFA ⇒ minimal DFA (Algorithm 3.6)

• Programming

Page 27

Subset construction

Operation | Description
ε-closure(s) | set of NFA states reachable from NFA state s on ε-transitions alone
ε-closure(T) | set of NFA states reachable from some NFA state s in T on ε-transitions alone
move(T, a) | set of NFA states to which there is a transition on input symbol a from some NFA state s in T

• s: an NFA state

• T: a set of NFA states
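The operations map onto two small Python functions. The transition-table encoding (a dict from (state, label) to a set of states, with "" standing for ε) and the 11-state example NFA for (a|b)*abb, written in the style of Thompson's construction, are assumptions of this sketch; ε-closure(s) is simply ε-closure({s}).

```python
EPS = ""   # label used for ε-transitions in this sketch

# Example NFA for (a|b)*abb; state 10 is accepting.
delta = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}


def eps_closure(delta, T):
    """ε-closure(T): states reachable from some state in T on ε-transitions alone.

    Every state of T is included, since it is reachable by zero transitions."""
    stack, result = list(T), set(T)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return frozenset(result)


def move(delta, T, a):
    """move(T, a): states with a transition on input symbol a from some state in T."""
    return frozenset(t for s in T for t in delta.get((s, a), ()))


A = eps_closure(delta, {0})
print(sorted(A))                                         # [0, 1, 2, 4, 7]
print(sorted(move(delta, A, "a")))                       # [3, 8]
print(sorted(eps_closure(delta, move(delta, A, "a"))))   # [1, 2, 3, 4, 6, 7, 8]
```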

Page 28

Subset construction (cont'd)

Let s0 be the start state of the NFA;
initially, ε-closure(s0) is the only state in Dstates, and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        DTran[T, a] := U
    end
end

Page 29

• Let (∑, S, T, F, s0) be the original NFA. The DFA is:

– The alphabet: ∑

– The states: all states in Dstates

– The transitions: DTran

– The accepting states: all states in Dstates that contain at least one accepting state in F of the NFA

– The start state: ε-closure(s0)
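A direct Python transcription of the subset construction, as a sketch: DFA states are frozensets of NFA states so they can serve as dictionary keys, "" stands for ε, and the example NFA for (a|b)*abb is an assumption for illustration. If some move is empty, the empty set simply becomes a (dead) DFA state, just as the pseudocode would add it.

```python
EPS = ""   # label used for ε-transitions in this sketch


def eps_closure(delta, T):
    stack, result = list(T), set(T)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return frozenset(result)


def subset_construction(delta, s0, finals, alphabet):
    """Return (Dstates, DTran, start state, accepting states) of the DFA."""
    start = eps_closure(delta, {s0})
    dstates = [start]                     # every DFA state discovered so far
    unmarked = [start]                    # states still waiting to be processed
    dtran = {}
    while unmarked:
        T = unmarked.pop()                # mark T
        for a in alphabet:
            U = eps_closure(delta, {t for s in T for t in delta.get((s, a), ())})
            if U not in dstates:
                dstates.append(U)         # add U as an unmarked state
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = {D for D in dstates if D & finals}
    return dstates, dtran, start, accepting


delta = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
dstates, dtran, start, accepting = subset_construction(delta, 0, {10}, "ab")
print(len(dstates), len(accepting))       # 5 1
```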

Page 30

– REs ⇒ NFA (Thompson’s construction) √

– NFA ⇒ DFA (subset construction) √

– DFA ⇒ minimal DFA (Algorithm 3.6)

• Programming

Page 31

Minimise a DFA

Initially, create two states:

1. one is the set of all final states: F

2. the other is the set of all non-final states: S - F

while (more splits are possible) {
    let G = {s1, …, sn} be a state (a group of original DFA states) and c be any char in ∑
    let t1, …, tn be the successor states of s1, …, sn under c
    if (t1, …, tn don't all belong to the same state) {
        split G into new states so that si and sj remain in the same state iff ti and tj are in the same state
    }
}
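A sketch of the same partition-refinement idea in Python. Instead of picking one character c at a time, it splits every group by the tuple of successor groups over all of ∑ in each round, which reaches the same final partition; it assumes the transition table is total. The example DFA, with states A to E for (a|b)*abb, is an assumption for illustration.

```python
def minimise(states, alphabet, dtran, finals):
    """Start from {F, S - F} and split groups until no split is possible.

    Assumes dtran is total: every state has a move on every symbol."""
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    changed = True
    while changed:
        changed = False
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for g in partition:
            # s and t stay together iff, for every c, their successors share a group
            buckets = {}
            for s in g:
                key = tuple(group_of[dtran[(s, c)]] for c in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
            if len(buckets) > 1:
                changed = True
        partition = new_partition
    return partition


dtran = {("A", "a"): "B", ("A", "b"): "C", ("B", "a"): "B", ("B", "b"): "D",
         ("C", "a"): "B", ("C", "b"): "C", ("D", "a"): "B", ("D", "b"): "E",
         ("E", "a"): "B", ("E", "b"): "C"}
groups = minimise("ABCDE", "ab", dtran, {"E"})
print(sorted(sorted(g) for g in groups))   # [['A', 'C'], ['B'], ['D'], ['E']]
```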

Page 32

[Figure: example DFA with transitions on a and b]

Page 33

– REs ⇒ NFA (Thompson’s construction) √

– NFA ⇒ DFA (subset construction) √

– DFA ⇒ minimal DFA (Algorithm 3.6) √

• Programming

Page 34

Input Buffering

[Figure: the Scanner reading the characters b e g i n from an input buffer, with eof marking the end of the input]

if (forward at end of first half) {
    reload second half;
    forward++;
} else if (forward at end of second half) {
    reload first half;
    forward = 0;
} else
    forward++;

Page 35

Input Buffering

[Figure: input buffer holding the characters b e g i n]

forward++;
if (the character at forward is eof) {
    if (forward at end of first half) {
        reload second half;
        forward++;
    } else if (forward at end of second half) {
        reload first half;
        forward = 0;
    } else
        terminate the analysis;
}
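A Python sketch of the sentinel scheme. The half size, the use of "\0" as the eof sentinel (so the source text must never contain that character), and the class and method names are all assumptions; forward is an index rather than a pointer, so "forward = 0" means the start of the first half.

```python
import io

EOF = "\0"   # assumed sentinel; it must never occur in the source text


class TwoHalfBuffer:
    """Input buffer of two halves, each followed by an EOF sentinel.

    Layout of buf: [first half: n chars][EOF][second half: n chars][EOF]."""

    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        self.buf = [EOF] * (2 * n + 2)
        self._reload(0)
        self.forward = -1                      # advanced before every read

    def _reload(self, half):
        base = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # a short read leaves EOF behind, marking the real end of the input
            self.buf[base + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Return the next input character, or None at the end of the input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:
            if self.forward == self.n:                 # end of first half
                self._reload(1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:       # end of second half
                self._reload(0)
                self.forward = 0
            else:                                      # eof inside a half:
                return None                            # terminate the analysis
            if self.buf[self.forward] == EOF:          # freshly loaded half is empty
                return None
        return self.buf[self.forward]


def read_all(text, n=4):
    buf, out = TwoHalfBuffer(io.StringIO(text), n), []
    while (c := buf.next_char()) is not None:
        out.append(c)
    return "".join(out)


print(read_all("begin := 42"))   # begin := 42
```

The point of the sentinels is that the common case costs a single comparison per character; the half-boundary tests run only when an EOF is actually seen.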

Page 38

move forward back
get lexeme from beginning to forward
move forward onward
beginning = forward
state = 0
}

[Figure: input buffer holding "begin := …"]
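The fragment above appears to be the tail of a DFA-driven scanning loop: when the automaton cannot extend the current lexeme, forward is moved back over the extra character, the lexeme between beginning and forward is emitted, and the automaton restarts in state 0. The Python sketch below reconstructs such a loop under stated assumptions: the three-state DFA (identifiers and integers only), the token names, and "\0" as an end marker are all hypothetical, and ":=" comes out as two one-character symbols because this toy DFA has no state for it.

```python
def scan(text):
    """Split text into (kind, lexeme) pairs using beginning/forward pointers."""
    tokens, beginning, forward, state = [], 0, 0, 0
    text += "\0"                        # assumed end-of-input marker
    while forward < len(text):
        c = text[forward]
        forward += 1
        if state == 0:                  # start state
            if c.isalpha():
                state = 1               # inside an identifier
            elif c.isdigit():
                state = 2               # inside a number
            else:                       # white space, end marker, or one-char symbol
                if not c.isspace() and c != "\0":
                    tokens.append(("sym", c))
                beginning = forward
        elif (state == 1 and c.isalnum()) or (state == 2 and c.isdigit()):
            pass                        # the lexeme keeps growing
        else:
            forward -= 1                # move forward back: c is not part of the lexeme
            kind = "id" if state == 1 else "num"
            tokens.append((kind, text[beginning:forward]))
            beginning = forward         # beginning = forward
            state = 0                   # state = 0
    return tokens


print(scan("begin := 42"))
# [('id', 'begin'), ('sym', ':'), ('sym', '='), ('num', '42')]
```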

Page 39

– REs ⇒ NFA (Thompson’s construction) √

– NFA ⇒ DFA (subset construction) √

– DFA ⇒ minimal DFA (Algorithm 3.6) √

Date posted: 23/10/2014, 17:33
