Chapter 3: Describing Syntax and Semantics


Chapter 3 Topics

• Introduction

• The General Problem of Describing Syntax

• Formal Methods of Describing Syntax

• Attribute Grammars

• Describing the Meanings of Programs: Dynamic Semantics

Introduction

• Without a clear language definition, a language may be hard to learn and hard to implement, and any ambiguity in the specification may lead to dialect differences

• Most new programming languages are subjected to a period of scrutiny by potential users before their designs are completed

• Who must use language definitions

– Other language designers

– Implementors

– Programmers (the users of the language)

Introduction (cont.)

• The study of programming languages can be divided into examinations of syntax and semantics

– Syntax: the form or structure of the expressions, statements, and program units

– Semantics: the meaning of the expressions, statements, and program units

• Semantics should follow from syntax: the form of statements should be clear and imply what the statements do or how they should be used

The General Problem of Describing Syntax

• A sentence is a string of characters over some alphabet

• A language is a set of sentences

• A lexeme is the lowest level syntactic unit of a language (e.g., *, +, =, sum, begin)

• A token is a category of lexemes (e.g., identifier, number, operator, …)
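The lexeme/token distinction can be illustrated with a minimal sketch of a tokenizer. The token categories and the regular expressions below are assumptions chosen for illustration (they are not part of the slides); a real language defines its own.

```python
import re

# Hypothetical token categories; each pattern describes the lexemes of one token.
TOKEN_PATTERNS = [
    ("keyword", r"\b(?:begin|end)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("operator", r"[*+=]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return (token, lexeme) pairs; whitespace is skipped, anything else is an error."""
    pairs = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        # lastgroup is the name of the alternative that matched, i.e. the token.
        pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs
```

For example, `tokenize("sum = sum * 2")` classifies the lexeme `sum` as an identifier and `*` as an operator.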

The Definition of Languages

• Languages can be formally defined in two distinct ways: by recognition and by generation

– A recognition device for a language reads input strings and decides whether they belong to the language

– Example: the syntax analysis part of a compiler

The Definition of Languages (cont.)

– A generation device generates sentences of a language

– One can determine whether the syntax of a particular sentence is correct by comparing it to the structure of the generator

Language Recognizers vs Generators

• A language recognizer can only be used in trial-and-error mode (it is a black box)

• The structure of a language generator is an open book that people can easily read and understand
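A minimal sketch of the two views, using a toy language that is not from the slides: the strings of n a's followed by n b's, for n ≥ 1. The function names are illustrative assumptions.

```python
from itertools import islice


def recognizes(s):
    """Recognizer: decide whether s belongs to the language { a^n b^n : n >= 1 }."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def generate():
    """Generator: produce the sentences of the same language, shortest first."""
    n = 1
    while True:
        yield "a" * n + "b" * n
        n += 1


first_three = list(islice(generate(), 3))
```

The recognizer only answers yes or no for a given input, while the generator makes the structure of the sentences directly visible.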

Formal Methods of Describing Syntax

• This section discusses the formal language generation mechanisms that are commonly used to describe the syntax of programming languages

• These mechanisms are often called grammars

• We will discuss the class of languages called context-free languages

Context-Free Grammars

• Noam Chomsky

– A linguist

– Described four classes of grammars in the mid-1950s

• Two of these grammar classes, context-free and regular grammars, are useful in computer science

– The tokens of programming languages can be described by regular grammars

– Whole programming languages, with minor exceptions, can be described by context-free grammars

Backus-Naur Form

• BNF is equivalent to context-free grammars

• In BNF, abstractions (also called nonterminal symbols) are used to represent syntactic structures

Backus-Naur Form (cont.)

• Although BNF is simple, it is sufficiently powerful to describe the great majority of the syntax of programming languages:

– Lists of similar constructs

– The order in which different constructs must appear

Grammar and Rules

• A grammar is a finite nonempty set of rules

• A rule has a left-hand side (LHS) and a right-hand side (RHS)

• An abstraction (or nonterminal symbol) can have more than one RHS

<if_stmt> → if <logic_expr> then <stmt> |
            if <logic_expr> then <stmt> else <stmt>

Describing Lists

• Syntactic lists (for example, a list of identifiers appearing in a data declaration statement) are described using recursion

• A rule is recursive if its LHS appears in its RHS

<ident_list> → identifier | identifier , <ident_list>

Note: The comma ',' is a terminal
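The recursive list rule maps directly onto a recursive function. The sketch below assumes the input is already split into tokens and uses Python's own `str.isidentifier` as a stand-in for the identifier token; both are assumptions made for illustration.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])
```

Each recursive call consumes one identifier and one comma, mirroring one application of the recursive alternative.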

Example: A grammar of expressions

• Seven terminal symbols:

+ - * ( ) x y

• Four nonterminal symbols:

‹expr› ‹term› ‹factor› ‹var›

• Start/goal symbol:

‹expr›

Rules of the grammar

‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› → ‹factor› | ‹term› * ‹factor›
‹factor› → ‹var› | ( ‹expr› )
‹var› → x | y

Derivations

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

<sentence> → <noun-phrase> <verb-phrase> .
<noun-phrase> → <article> <noun>
<article> → a | the
<noun> → girl | dog
<verb-phrase> → <verb> <noun-phrase>
<verb> → sees | pets

<sentence> ⇒ <noun-phrase> <verb-phrase> .
⇒ <article> <noun> <verb-phrase> .
⇒ the <noun> <verb-phrase> .
⇒ the girl <verb-phrase> .
⇒ the girl <verb> <noun-phrase> .
⇒ the girl sees <noun-phrase> .
⇒ the girl sees <article> <noun> .
⇒ the girl sees a <noun> .
⇒ the girl sees a dog .

Derivations (cont.)

• A sentence is a sentential form that has only terminal symbols

• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded

• A derivation may be neither leftmost nor rightmost

• Derivation order has no effect on the language generated by a grammar
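A leftmost derivation can be sketched as a loop that always expands the first nonterminal in the current sentential form. The grammar below is the small sentence grammar from the earlier example, with the `<sentence>` rule assumed from the derivation; the dictionary encoding and the `choices` parameter (which alternative to use at each step) are illustrative assumptions.

```python
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form of a leftmost derivation.

    choices[i] is the index of the alternative applied at step i.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost nonterminal and replace it with the chosen RHS.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation("<sentence>", [0, 0, 1, 0, 0, 0, 0, 0, 1])
```

Here `steps` holds ten sentential forms, from `<sentence>` down to the sentence `the girl sees a dog .`.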

Context free?

• In the previous example you might wonder about the idea of context

• In a context-free grammar, a replacement can be applied wherever its nonterminal appears: there is no context in which it cannot occur

– The dog pets the girl

– The girl pets the dog

• If there were contexts in which a rule did not apply, the grammar would not be "context-free"

Example: A leftmost derivation of the expression

( x - y ) * x + y

‹expr› ⇒ ‹expr› + ‹term›
⇒ ‹term› + ‹term›
⇒ ‹term› * ‹factor› + ‹term›
⇒ ‹factor› * ‹factor› + ‹term›
⇒ ( ‹expr› ) * ‹factor› + ‹term›
⇒ ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹term› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹var› - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹factor› ) * ‹factor› + ‹term›
⇒ ( x - ‹var› ) * ‹factor› + ‹term›
⇒ ( x - y ) * ‹factor› + ‹term›
⇒ ( x - y ) * ‹var› + ‹term›
⇒ ( x - y ) * x + ‹term›
⇒ ( x - y ) * x + ‹factor›
⇒ ( x - y ) * x + ‹var›
⇒ ( x - y ) * x + y

Example: A grammar for a small language

<program> → begin <stmt_list> end

Example: A leftmost derivation in this grammar

<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> ; <stmt_list> end
⇒ begin <var> = <expression> ; <stmt_list> end
⇒ begin A = <expression> ; <stmt_list> end
⇒ begin A = <var> + <var> ; <stmt_list> end
⇒ begin A = B + <var> ; <stmt_list> end
⇒ begin A = B + C ; <stmt_list> end
⇒ begin A = B + C ; <stmt> end
⇒ begin A = B + C ; <var> = <expression> end
⇒ begin A = B + C ; B = <expression> end
⇒ begin A = B + C ; B = <var> end

Parse Tree

• Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define

• These hierarchical structures are called parse trees

– Every internal node of a parse tree is labeled with a nonterminal symbol

– Every leaf is labeled with a terminal symbol

– Every subtree of a parse tree describes one instance of an abstraction in the statement
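The three properties of a parse tree can be checked on a small hand-built example. The sketch below encodes the tree for "the girl sees a dog ." as nested tuples; this encoding (a node is a `(label, children)` pair, a leaf is a plain string) is an assumption made for illustration.

```python
# Internal nodes are (nonterminal, children); leaves are terminal strings.
tree = ("<sentence>", [
    ("<noun-phrase>", [("<article>", ["the"]), ("<noun>", ["girl"])]),
    ("<verb-phrase>", [
        ("<verb>", ["sees"]),
        ("<noun-phrase>", [("<article>", ["a"]), ("<noun>", ["dog"])]),
    ]),
    ".",
])


def leaves(node):
    """Collect the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


def internal_labels(node):
    """Collect the labels of all internal nodes in preorder."""
    if isinstance(node, str):
        return []
    label, children = node
    return [label] + [l for child in children for l in internal_labels(child)]
```

Reading the leaves from left to right gives back the sentence, and every internal label is a nonterminal.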

Example: Parse tree of the sentence "the girl sees a dog."

[Figure: parse tree; node labels: sentence, noun-phrase, verb-phrase, article, noun, verb]

Example: A Parse Tree

Rules of the grammar:

<assign> → <id> = <expr>
<expr> → <expr> + <expr> |
         <expr> * <expr> |
         <id> | ( <expr> )
<id> → A | B | C

<expr> → <expr> + <expr> | <expr> * <expr> |
         ( <expr> ) | <number> | <id>
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<id> → A | B | C

… given 234, we have different derivations …

⇒ <number> <digit> <digit>

⇒ <digit> <digit> <digit>

4


Ambiguity (cont.)

• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

• Ambiguity should be avoided
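Ambiguity can be made concrete by counting parse trees. The sketch below covers only the rules `<expr> → <expr> + <expr> | <expr> * <expr> | <id>` with `<id> → A | B | C` (parentheses and numbers are left out to keep it short); the function name and the token-list input are assumptions made for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count distinct parse trees for tokens under
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>,  <id> -> A | B | C.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] in ("A", "B", "C") else 0
        total = 0
        # Try every operator as the root of the tree for this span.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))
```

A result greater than 1 for any input shows that the grammar is ambiguous; `B + C * A` yields two trees, one with `+` at the root and one with `*` at the root.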

[Figures: two distinct parse trees for the same sentence]

Operator Precedence

• The fact that an operator in an arithmetic expression is generated lower in the parse tree can be used to indicate that it has precedence over an operator produced higher up in the tree

• The precedence order of the multiplication and addition operators in the above grammar is not the usual one

Removing Ambiguity

• An unambiguous grammar for expressions

<id> → A | B | C

Example: Derivation and parse tree of

<id>

Associativity of Operators

• For addition or multiplication, it makes no difference to the result whether the operator is left or right associative. However, associativity matters for other operators, e.g., subtraction and division

• When a BNF rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive

• The grammar can specify associativity as follows:

– Left recursive rules specify left associativity

– Right recursive rules specify right associativity
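The effect of the two kinds of recursion on subtraction can be sketched by evaluating a list of operands under each grouping. The rule shapes in the docstrings are assumed for illustration and are not quoted from the slides.

```python
from functools import reduce


def left_assoc(operands):
    """Grouping implied by a left recursive rule such as
    <expr> -> <expr> - <term> | <term>:  ((a - b) - c)."""
    return reduce(lambda acc, x: acc - x, operands)


def right_assoc(operands):
    """Grouping implied by a right recursive rule such as
    <expr> -> <term> - <expr> | <term>:  (a - (b - c))."""
    if len(operands) == 1:
        return operands[0]
    return operands[0] - right_assoc(operands[1:])
```

For `10 - 4 - 3` the left associative grouping gives 3 and the right associative grouping gives 9, so for subtraction the choice of recursion changes the meaning of the expression.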

Associativity of Operators (cont.)

• In most languages that provide an exponentiation operator, it is right associative

• The following rules could be used to describe exponentiation as a right associative operator
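As a hedged illustration, a right recursive rule of the assumed form `<factor> → <base> ** <factor> | <base>` (with `<base>` an integer literal; these rule and symbol names are hypothetical, not the slide's own) translates into a function that recurses on the right operand:

```python
def parse_factor(tokens, pos=0):
    """Evaluate <factor> -> <base> ** <factor> | <base> starting at tokens[pos].

    Returns (value, next_position). <base> is assumed to be an integer literal.
    """
    base = int(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] == "**":
        # Right recursion: the whole rest of the chain becomes the exponent.
        exponent, pos = parse_factor(tokens, pos + 1)
        return base ** exponent, pos
    return base, pos
```

With this rule, `2 ** 3 ** 2` is grouped as `2 ** (3 ** 2)`, which evaluates to 512 rather than 64.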

An Unambiguous Grammar for the if-then-else Statement

• The ambiguous grammar for the if-then-else statement has the following rules:

<if_stmt> → if <logic_expr> then <stmt> |
            if <logic_expr> then <stmt> else <stmt>

• The simplest sentential form that illustrates this ambiguity is

if <logic_expr> then
    if <logic_expr> then <stmt>
    else <stmt>

The two parse trees …

[Figures: the two parse trees for this sentential form; in one the else clause belongs to the outer <if_stmt>, in the other to the inner <if_stmt>]

An Unambiguous Grammar … (cont.)

• The rule for if constructs in most languages is that an else clause, when present, is matched with the nearest previous unmatched then (or if)

• The unambiguous grammar based on this rule follows

<matched> → if <logic_expr> then <matched> else <matched> |
            any non-if statement
<unmatched> → if <logic_expr> then <stmt> |

There is just one possible parse tree …

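The "nearest unmatched then" rule can be sketched with a small recursive-descent parser. This is not a literal transcription of the matched/unmatched grammar; it is a common implementation technique that yields the same pairing, because the innermost call sees the else first. The token format (space-separated words, with single words standing in for conditions and non-if statements) is an assumption made for illustration.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement; return (tree, next_position).

    An if statement becomes ("if", condition, then_branch, else_branch),
    where else_branch is None when no else clause was attached.
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # any non-if statement
    cond = tokens[pos + 1]
    if tokens[pos + 2] != "then":
        raise ValueError("expected 'then'")
    then_branch, pos = parse_stmt(tokens, pos + 3)
    else_branch = None
    # The innermost open if reaches this point first, so it claims the else.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos


tree, _ = parse_stmt("if c1 then if c2 then s1 else s2".split())
```

In the resulting tree the else clause is attached to the inner if, and the outer if has no else branch.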

Extended BNF

• Extended BNF does not enhance the descriptive power of BNF; it only increases BNF's readability and writability

• Three extensions:

– Optional parts are placed in square brackets

<selection> → if ( <expression> ) <statement> [ else <statement> ] ;

– Alternative parts of RHSs are placed in parentheses and separated with vertical bars

<for_stmt> → for <var> := <expr> (to | downto) <expr> do <stmt>

– Repetitions (0 or more) are placed in curly braces

<ident> → letter { letter | digit }

Example: BNF and EBNF versions of an expression grammar

BNF:
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>

EBNF:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
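One practical advantage of the EBNF form is that each `{ … }` repetition maps onto a loop in a hand-written parser. The sketch below evaluates expressions that way. The slide does not give a rule for `<factor>`, so `<factor> → number | ( <expr> )` is assumed here, and the input is assumed to be a list of tokens.

```python
def evaluate(tokens):
    """Evaluate a token list using
    <expr> -> <term> {(+ | -) <term>} and <term> -> <factor> {(* | /) <factor>}.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() in ("+", "-"):  # { (+ | -) <term> }
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):  # { (* | /) <factor> }
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # assumed rule: number | ( <expr> )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        return int(tok)

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

Because each loop folds operands in from the left, the parser treats `-` and `/` as left associative, and because `term` is called from `expr`, `*` and `/` bind tighter than `+` and `-`.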

Syntax Graphs

• The information in BNF and EBNF rules can be represented in a directed graph. Such graphs are called syntax graphs

• A separate graph is used for each syntactic unit

• Syntax graphs use different kinds of nodes to represent the terminal and nonterminal symbols of the right sides of a grammar's rules

– Rectangle nodes contain the names of syntactic units (nonterminals)

– Circles or ellipses contain terminal symbols

Example: The Ada if statement

[Figure: syntax graph for if_stmt; surviving labels: condition, end if ;]

Attribute Grammars

• CFGs cannot describe all of the syntax of programming languages

• Attribute grammars are additions to CFGs that carry some semantic information along through parse trees

• Primary value of attribute grammars:

– Static semantics specification

– Compiler design (static semantics checking)

• Example 3: If the end of an Ada subprogram is followed by a name, that name must match the name of the subprogram

• These problems exemplify the category of language rules called static semantics rules. They cannot be specified in BNF

Attribute Grammars - Basic Concepts

• Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can have values assigned to them

• Attribute computation functions (or semantic functions) are associated with grammar rules. They are used to specify how attribute values are computed

• Predicate functions, which state the static semantic rules of the language, are associated with grammar rules

Attribute Grammars - Definition

• An attribute grammar is a CFG G = (S, N, T, P) with the following additions:

– For each grammar symbol X there is a set A(X) of attribute values

– Each rule has a set of functions that define certain attributes of the nonterminals in the rule

– Each rule has a (possibly empty) set of predicate functions to check for attribute consistency

Attribute Grammars (cont.)

• The set A(X) consists of two disjoint sets:

– Synthesized attributes S(X): to pass semantic information up a parse tree

– Inherited attributes I(X): to pass semantic information down a parse tree

• Let X0 → X1 … Xn be a rule

– Functions of the form S(X0) = f(A(X1), …, A(Xn)) define synthesized attributes of X0

– Functions of the form I(Xj) = f(A(X0), …, A(Xn)), for 1 ≤ j ≤ n, define inherited attributes of Xj

• The value of an inherited attribute on a parse tree node depends on the attribute values of that node's parent node and those of its sibling nodes

– To avoid circularity, inherited attributes are often restricted to functions of the form:

I(Xj) = f(A(X0), …, A(Xj-1))

• Initially, there are intrinsic attributes on the leaves

Example: Synthesized attributes

E → E1 + T E.val = E1.val + T.val


Annotated parse tree for 3 * 4 + 5

[Figure: annotated parse tree; surviving node annotations: val = 12, val = 5, lexval = 5, val = 17]

Rules           Semantic functions
E → E1 + T      E.val = E1.val + T.val
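Synthesized attributes are computed bottom-up: a node's value is a function of its children's values. The sketch below evaluates `val` for 3 * 4 + 5. Only the rule for `+` appears in the slides; the multiplication rule and the tuple encoding of the tree are assumptions made for illustration.

```python
def val(node):
    """Compute the synthesized attribute val of a tree node, bottom-up."""
    kind = node[0]
    if kind == "digit":
        # Intrinsic attribute on a leaf: the lexical value of the digit.
        return node[1]
    if kind == "+":
        # E -> E1 + T : E.val = E1.val + T.val
        return val(node[1]) + val(node[2])
    if kind == "*":
        # Assumed analogous rule: T -> T1 * F : T.val = T1.val * F.val
        return val(node[1]) * val(node[2])
    raise ValueError(f"unknown node kind {kind!r}")


# Tree for 3 * 4 + 5: the product is a child of the sum.
tree = ("+", ("*", ("digit", 3), ("digit", 4)), ("digit", 5))
```

The subtree for 3 * 4 gets val = 12 and the root gets val = 17, matching the annotations that survive from the figure.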

Example: Inherited attributes

L → L1 , id     L1.inh = id.type = L.inh

Annotated parse tree for

Rules           Semantic functions
D → T L         L.inh = T.type
T → int         T.type = integer
T → float       T.type = float
L → L1 , id     L1.inh = id.type = L.inh
L → id          id.type = L.inh
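Inherited attributes flow the other way: the type found at T is handed down the identifier list. The sketch below follows the semantic functions in the table; the function names and the use of a dictionary as a symbol table are assumptions made for illustration.

```python
def declare(type_keyword, identifiers):
    """Apply the rules for D -> T L to a declaration such as 'float a, b, c'."""
    # T -> int : T.type = integer    T -> float : T.type = float
    t_type = {"int": "integer", "float": "float"}[type_keyword]
    symbol_table = {}

    def visit_list(ids, inh):
        # L -> L1 , id : L1.inh = L.inh and id.type = L.inh
        # L -> id      : id.type = L.inh
        if len(ids) > 1:
            visit_list(ids[:-1], inh)  # pass the inherited type down to L1
        symbol_table[ids[-1]] = inh

    visit_list(identifiers, t_type)  # D -> T L : L.inh = T.type
    return symbol_table
```

For example, `declare("float", ["a", "b", "c"])` records the type float for all three identifiers, even though the type keyword appears only once in the declaration.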

• A predicate function has the form of a Boolean expression on the attribute set {A(X0), …, A(Xn)}

• The only derivations allowed with an attribute grammar are those in which every predicate associated with every nonterminal is true

• A false predicate function value indicates a violation of the syntax or static semantics rules of the language

Example: Assignment Statement

• The only variable names are A, B, and C

• The RHS can either be a variable or an expression in the form of a variable added to another variable

• The variables can be one of two types: int or real

• The type of the expression when the operand types are not the same is always real. When they are the same, the expression type is that of the operands

• The type of the LHS must match the type of the RHS. The assignment is valid only if the LHS and the value resulting from evaluating the RHS have the same type
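The rules above can be sketched as one attribute computation and one predicate. The function names and the sample variable types are hypothetical; only the typing rules themselves come from the example.

```python
def expr_type(operand_types):
    """Type of the RHS: the common type if all operands agree, otherwise real."""
    first, *rest = operand_types
    return first if all(t == first for t in rest) else "real"


def assignment_is_valid(var_types, lhs, rhs_vars):
    """Predicate: the type of the LHS must equal the type of the RHS."""
    rhs = expr_type([var_types[v] for v in rhs_vars])
    return var_types[lhs] == rhs


# Hypothetical declarations for the three variable names of the example.
var_types = {"A": "real", "B": "int", "C": "real"}
```

With these declarations, `A = B + C` is accepted because the mixed-type expression is real, while `B = B + C` is rejected because an int variable cannot receive a real value.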

Title: Describing Syntax and Semantics
Institution: Addison-Wesley
Subject: Programming Languages
Type: chapter
Year of publication: 2006
