Describing syntax and semantics


• A language may be hard to learn, hard to implement, and any ambiguity in the specification may lead to dialect differences if we do not have a clear language definition

• Most new programming languages are subjected to a period of scrutiny by potential users before their designs are completed. The users of a language definition include:

– Other language designers

– Implementors

– Programmers (the users of the language)

• Syntax - the form or structure of the expressions, statements, and program units

• Semantics - the meaning of the expressions, statements, and program units

• Semantics should follow from syntax: the form of statements should be clear and imply what the statements do or how they should be used


• A language is a set of sentences

• A lexeme is the lowest-level syntactic unit of a language (e.g., *, +, =, sum, begin)

• A token is a category of lexemes (e.g., identifier)

• Languages can be formally defined in two distinct ways: by recognition and by generation
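To make the lexeme/token distinction concrete, here is a minimal Python sketch of a tokenizer. The token categories (`keyword`, `identifier`, `int_literal`, `operator`, `semicolon`) and their regular expressions are assumptions chosen for illustration, not part of any particular language definition; each lexeme found in the input is paired with the token category it belongs to.

```python
import re

# Hypothetical token categories (assumption). Order matters: the first
# matching alternative wins, so keywords are tried before identifiers.
TOKEN_SPEC = [
    ("keyword", r"\b(?:begin|end)\b"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("int_literal", r"[0-9]+"),
    ("operator", r"[-+*/=]"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input text."""
    pos = 0
    pairs = []
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs


print(tokenize("begin sum = sum + 1 ; end"))
# [('keyword', 'begin'), ('identifier', 'sum'), ('operator', '='), ('identifier', 'sum'), ('operator', '+'), ('int_literal', '1'), ('semicolon', ';'), ('keyword', 'end')]
```

Note that `sum` appears twice as a lexeme but both occurrences belong to the single token category `identifier`.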


Language Recognizers

• A recognition device reads input strings of the language and decides whether the input strings belong to the language

• Example: the syntax analysis part of a compiler
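A recognizer can be sketched as a function that only answers yes or no. The Python sketch below assumes, for illustration, the language of identifiers written in EBNF elsewhere in these notes as letter { letter | digit }, restricted here to ASCII letters and digits; it scans the input once, left to right, like a small finite automaton.

```python
import string

LETTERS = set(string.ascii_letters)  # assumption: ASCII letters only
DIGITS = set(string.digits)


def is_identifier(s: str) -> bool:
    """Recognizer for the language letter { letter | digit }."""
    if not s or s[0] not in LETTERS:      # must start with a letter
        return False
    for ch in s[1:]:                       # then zero or more letters or digits
        if ch not in LETTERS and ch not in DIGITS:
            return False
    return True


for candidate in ["sum", "x2", "2x", "", "a_b"]:
    print(repr(candidate), is_identifier(candidate))
```

The function never says *why* a string is rejected or how it is structured, which is exactly the trial-and-error, black-box character of a recognizer.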


Language Generators

• A device that generates sentences of a language

• One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator
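A generator can be sketched by expanding every alternative of every rule. This Python sketch assumes the small grammar of the "the girl sees a dog" example used later in these notes, including a `<sentence>` rule; because it expands rules exhaustively, it only terminates for grammars without recursion.

```python
from itertools import product

# Assumed finite grammar; dictionary keys are the nonterminals.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def generate(symbol):
    """Yield every string of terminals (as a tuple) derivable from symbol."""
    if symbol not in GRAMMAR:          # a terminal generates itself
        yield (symbol,)
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each RHS symbol, left to right.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield tuple(tok for part in parts for tok in part)


sentences = [" ".join(s) for s in generate("<sentence>")]
print(len(sentences))  # 32
print(sentences[0])    # a girl sees a girl .
```

Because the generator's structure is the grammar itself, a reader can check a sentence such as "the girl sees a dog ." by tracing which rules produce it.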

Language Recognizers vs Generators

• A language recognizer can only be used in trial-and-error mode (black box)

• The structure of the language generator is an open book, which people can easily read and understand


Formal Methods of Describing Syntax

• Grammars are the formal language-generation mechanisms commonly used to describe the syntax of programming languages

– The tokens of programming languages can be described by regular grammars

– Whole programming languages, with minor exceptions, can be described by context-free grammars


• BNF is equivalent to context-free grammars

• Extended BNF (EBNF) improves readability and writability of BNF


Backus-Naur Form (cont.)

• Although BNF is simple, it is sufficiently powerful to describe the great majority of the syntax of programming languages:

– Lists of similar constructs

– The order in which different constructs must appear


Grammar and Rules

• A grammar is a finite nonempty set of rules

• A rule has a left-hand side (LHS) and a right-hand side (RHS)

• An abstraction (or nonterminal symbol) can have more than one RHS:

<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>

Describing Lists

• Syntactic lists (for example, a list of identifiers appearing in a data declaration statement) are described using recursion: the rule's LHS also appears in its RHS

<ident_list> → identifier | identifier , <ident_list>

Note: the comma ‘,’ is a terminal symbol
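A recursive rule maps directly onto a recursive function. The Python sketch below recognizes `<ident_list>` over a list of tokens; the identifier test (a letter followed by letters or digits) is an assumption for illustration, and the comma is matched as a literal terminal.

```python
def is_ident(tok: str) -> bool:
    # Assumed token rule: a letter followed by letters or digits.
    return tok[:1].isalpha() and tok.isalnum()


def ident_list(tokens, i=0):
    """Recognize <ident_list> starting at tokens[i]; return the index just past it.

    Mirrors <ident_list> -> identifier | identifier , <ident_list>.
    """
    if i >= len(tokens) or not is_ident(tokens[i]):
        raise SyntaxError(f"identifier expected at token {i}")
    i += 1
    if i < len(tokens) and tokens[i] == ",":
        # Second alternative: the comma is followed by another <ident_list> (recursion).
        return ident_list(tokens, i + 1)
    return i  # first alternative: a single identifier


def is_ident_list(tokens):
    """True if the whole token list is exactly one <ident_list>."""
    try:
        return ident_list(tokens) == len(tokens)
    except SyntaxError:
        return False


print(is_ident_list("a , b , c".split()))  # True
print(is_ident_list("a , b ,".split()))    # False
```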


Example: A grammar of expressions

• Seven terminal symbols: + - * ( ) x y

• Four nonterminal symbols: ‹expr› ‹term› ‹factor› ‹var›

• Start (goal) symbol: ‹expr›

Rules of grammar

‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›

‹term› → ‹factor› | ‹term› * ‹factor›

‹factor› → ‹var› | ( ‹expr› )

‹var› → x | y

Derivations

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

• Derivations can be used to generate all the possible sentences in a grammar

• Every string of symbols in the derivation is a sentential form


Derivations (cont.)

• A sentence is a sentential form that has only terminal symbols

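• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded

• In a rightmost derivation, the rightmost nonterminal is expanded; a derivation may also be neither leftmost nor rightmost

• Derivation order should have no effect on the language generated by a grammar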

Example: a grammar and a leftmost derivation

<sentence> → <noun-phrase> <verb-phrase> .
<noun-phrase> → <article> <noun>
<article> → a | the
<noun> → girl | dog
<verb-phrase> → <verb> <noun-phrase>
<verb> → sees | pets

<sentence> ⇒ <noun-phrase> <verb-phrase> .
⇒ <article> <noun> <verb-phrase> .
⇒ the <noun> <verb-phrase> .
⇒ the girl <verb-phrase> .
⇒ the girl <verb> <noun-phrase> .
⇒ the girl sees <noun-phrase> .
⇒ the girl sees <article> <noun> .
⇒ the girl sees a <noun> .
⇒ the girl sees a dog .
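A leftmost derivation can be replayed mechanically: at each step, find the leftmost nonterminal and replace it with one of its RHSs. The Python sketch below assumes the grammar above (with the `<sentence>` rule) and takes the sequence of chosen RHSs as input; it rejects any choice that is not a RHS of the nonterminal being expanded.

```python
# Assumed grammar of the example above; keys are the nonterminals.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost nonterminal with each chosen RHS; return all sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for rhs in choices:
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            raise ValueError("the sentential form is already a sentence")
        if rhs not in GRAMMAR[form[idx]]:
            raise ValueError(f"{rhs} is not a RHS of {form[idx]}")
        form = form[:idx] + rhs + form[idx + 1:]
        forms.append(" ".join(form))
    return forms


steps = [
    ["<noun-phrase>", "<verb-phrase>", "."],
    ["<article>", "<noun>"], ["the"], ["girl"],
    ["<verb>", "<noun-phrase>"], ["sees"],
    ["<article>", "<noun>"], ["a"], ["dog"],
]
for sentential_form in leftmost_derivation("<sentence>", steps):
    print(sentential_form)
```

Every printed line is a sentential form; only the last one, which contains no nonterminals, is a sentence.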


Context free?

• In the previous example, you might wonder about the idea of context

• In a context-free grammar, a nonterminal can be replaced by any of its RHSs regardless of the context in which it occurs, so the grammar above generates both of these sentences:

– The dog pets the girl

– The girl pets the dog

• Of course, this means that there are certain contexts in which the rules don't work, thus it would …

Example: a leftmost derivation in the expression grammar

‹expr› ⇒ ‹expr› + ‹term›
⇒ ‹term› + ‹term›
⇒ ‹term› * ‹factor› + ‹term›
⇒ ‹factor› * ‹factor› + ‹term›
⇒ ( ‹expr› ) * ‹factor› + ‹term›
⇒ ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹term› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹var› - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹factor› ) * ‹factor› + ‹term›
⇒ ( x - ‹var› ) * ‹factor› + ‹term›
⇒ …


Example: A grammar for a small language

<program> → begin <stmt_list> end

Example: A leftmost derivation in this grammar

<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> ; <stmt_list> end
⇒ begin <var> = <expression> ; <stmt_list> end
⇒ begin A = <expression> ; <stmt_list> end
⇒ begin A = <var> + <var> ; <stmt_list> end
⇒ begin A = B + <var> ; <stmt_list> end
⇒ begin A = B + C ; <stmt_list> end
⇒ begin A = B + C ; <stmt> end
⇒ begin A = B + C ; <var> = <expression> end
⇒ begin A = B + C ; B = <expression> end
⇒ begin A = B + C ; B = <var> end
⇒ begin A = B + C ; B = C end
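A recursive-descent recognizer for this small language has one function per nonterminal. The notes show only the `<program>` rule, so the remaining rules in this Python sketch are assumptions inferred from the derivation above and may be incomplete: `<stmt_list> → <stmt> | <stmt> ; <stmt_list>`, `<stmt> → <var> = <expression>`, `<expression> → <var> + <var> | <var>`, `<var> → A | B | C`.

```python
VARS = {"A", "B", "C"}  # assumed <var> alternatives


def parse_program(tokens):
    """Return True if tokens form a <program> under the assumed rules."""
    pos = 0

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}")
        pos += 1

    def var():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in VARS:
            raise SyntaxError(f"expected A, B or C at token {pos}")
        pos += 1

    def expression():
        # <expression> -> <var> + <var> | <var>
        var()
        if pos < len(tokens) and tokens[pos] == "+":
            expect("+")
            var()

    def stmt():
        # <stmt> -> <var> = <expression>
        var()
        expect("=")
        expression()

    def stmt_list():
        # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        stmt()
        if pos < len(tokens) and tokens[pos] == ";":
            expect(";")
            stmt_list()

    try:
        expect("begin")
        stmt_list()
        expect("end")
        return pos == len(tokens)
    except SyntaxError:
        return False


print(parse_program("begin A = B + C ; B = C end".split()))  # True
```

The order in which the functions are called while accepting this input follows the same leftmost order as the derivation above.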


Parse Tree

• Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define; these hierarchical structures are called parse trees

– Every internal node of a parse tree is labeled with a nonterminal symbol

– Every leaf is labeled with a terminal symbol

– Every subtree of a parse tree describes one instance of an abstraction in the statement

Example: Parse tree of the sentence “the girl sees a dog.”

[Figure: parse tree whose root, sentence, has children including noun-phrase and verb-phrase; noun-phrase expands to article and noun]
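The three properties of a parse tree can be seen in a small data structure. This Python sketch assumes the girl/dog grammar (with the final "." as a leaf); internal nodes carry nonterminal names, leaves carry terminals, and reading the leaves from left to right gives back the sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # nonterminal (internal node) or terminal (leaf)
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the leaf labels from left to right (the sentence this tree derives)."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


def noun_phrase(article, noun):
    # One subtree per instance of the noun-phrase abstraction.
    return Node("noun-phrase", [Node("article", [Node(article)]), Node("noun", [Node(noun)])])


tree = Node("sentence", [
    noun_phrase("the", "girl"),
    Node("verb-phrase", [Node("verb", [Node("sees")]), noun_phrase("a", "dog")]),
    Node("."),
])
print(" ".join(tree.leaves()))  # the girl sees a dog .
```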


Example: A Parse Tree

Rules of grammar:

<assign> → <id> = <expr>

<expr> → <id> + <expr> | …

<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


… given 234, we have different derivations …

[Figure: derivations of 234 built from number and digit nodes]

• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

• Ambiguity should be avoided


Two distinct parse trees for the sentence A = B + C * A and their meanings

– A = B + (C * A)

– A = (B + C) * A
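Why ambiguity matters is easiest to see by evaluating both trees. In this Python sketch each tree is written as a nested tuple `(op, left, right)`, and the values of A, B and C are hypothetical, chosen only to make the difference visible.

```python
# Hypothetical variable values (assumption, for illustration only).
ENV = {"A": 4, "B": 2, "C": 3}


def evaluate(tree, env):
    """Evaluate a tree given as a variable name or an (op, left, right) tuple."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    left_val, right_val = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return left_val + right_val
    if op == "*":
        return left_val * right_val
    raise ValueError(f"unknown operator {op!r}")


tree1 = ("+", "B", ("*", "C", "A"))   # A = B + (C * A)
tree2 = ("*", ("+", "B", "C"), "A")   # A = (B + C) * A
print(evaluate(tree1, ENV), evaluate(tree2, ENV))  # 14 20
```

The same sentence yields two different values, so a compiler working from an ambiguous grammar cannot know which meaning was intended.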

Operator Precedence

• The fact that an operator in an arithmetic expression is generated lower in the parse tree (and therefore must be evaluated first) can be used to indicate that it has precedence over an operator produced higher up in the tree

• Although the above grammar is not ambiguous, the precedence order of its operators is not the usual one


Removing Ambiguity

• An unambiguous grammar for expressions:

…

<id> → A | B | C

Example: Derivation and parse tree of the sentence A = B + C * A

[Figure: derivation and parse tree of A = B + C * A]
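Only the `<id>` rule of the unambiguous grammar survives in these notes. The Python sketch below assumes the conventional layered form, `<expr> → <expr> + <term> | <term>`, `<term> → <term> * <factor> | <factor>`, `<factor> → ( <expr> ) | <id>`, and prints the RHS fully parenthesized so the shape of the unique parse tree is visible. Left recursion is turned into a loop, which keeps the same left grouping.

```python
def parse_assign(text):
    """Parse '<id> = <expr>' and return it with the RHS fully parenthesized."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def ident():
        tok = advance()
        if tok not in {"A", "B", "C"}:
            raise SyntaxError(f"expected A, B or C, got {tok!r}")
        return tok

    def factor():
        # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            advance("(")
            inner = expr()
            advance(")")
            return inner
        return ident()

    def term():
        # <term> -> <term> * <factor> | <factor>  (left recursion as a loop)
        node = factor()
        while peek() == "*":
            advance("*")
            node = f"({node} * {factor()})"
        return node

    def expr():
        # <expr> -> <expr> + <term> | <term>  (left recursion as a loop)
        node = term()
        while peek() == "+":
            advance("+")
            node = f"({node} + {term()})"
        return node

    target = ident()
    advance("=")
    rhs = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return f"{target} = {rhs}"


print(parse_assign("A = B + C * A"))  # A = (B + (C * A))
```

Because `*` is handled in `<term>`, which sits lower in the tree than `<expr>`, multiplication always binds tighter than addition.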


Associativity of Operators

• Addition can become left or right associative, depending on how the grammar is written. This isn't so bad with addition, but associativity can be a problem with other operators, e.g., subtraction and division: (8 - 4) - 2 = 2, whereas 8 - (4 - 2) = 6

• When a BNF rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive

• We can fix this problem with the following rules:

– Left-recursive rules become left associative

– Right-recursive rules become right associative
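The two groupings can be compared directly. In this Python sketch, `left_assoc` groups operands the way a left-recursive rule does and `right_assoc` the way a right-recursive rule does; the operand lists are only illustrative.

```python
from functools import reduce
from operator import sub, truediv


def left_assoc(op, operands):
    """Group as a left-recursive rule does: ((a op b) op c) op ..."""
    return reduce(op, operands)


def right_assoc(op, operands):
    """Group as a right-recursive rule does: a op (b op (c op ...))."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))


print(left_assoc(sub, [8, 4, 2]), right_assoc(sub, [8, 4, 2]))          # 2 6
print(left_assoc(truediv, [8, 4, 2]), right_assoc(truediv, [8, 4, 2]))  # 1.0 4.0
```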

Associativity of Operators (cont.)

• The exponentiation operator is right associative; the following rules could be used to describe it:

…
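The exponentiation rules are not reproduced here. As a sketch, assume the common right-recursive form `<factor> → <exp> ** <factor> | <exp>`, with `<exp>` simplified to an integer literal. Because the recursive call parses everything after `**` first, the rightmost `**` is applied first.

```python
def parse_power(tokens, i=0):
    """Parse <factor> -> <exp> ** <factor> | <exp>; return (value, next index).

    Assumption: <exp> is simplified to an integer literal.
    """
    base = int(tokens[i])
    i += 1
    if i < len(tokens) and tokens[i] == "**":
        # Right recursion: the remainder is parsed as one <factor> first.
        exponent, i = parse_power(tokens, i + 1)
        return base ** exponent, i
    return base, i


value, _ = parse_power("2 ** 3 ** 2".split())
print(value)  # 512, i.e. 2 ** (3 ** 2); left grouping would give (2 ** 3) ** 2 = 64
```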


An Unambiguous Grammar for the if-then-else Statement

• The ambiguous grammar of the if-then-else statement has the following rules:

<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>

• The simplest sentential form that illustrates this ambiguity is:

if <logic_expr> then if <logic_expr> then <stmt> else <stmt>


…

if <logic_expr> then <stmt> |

if <logic_expr> then <matched> else <unmatched>

There is just one possible parse tree …
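The unambiguous grammar pairs each else with the nearest preceding then that has no else yet. A hand-written recursive-descent parser reaches the same single tree by taking an else greedily. This Python sketch does not encode the matched/unmatched rules themselves; it assumes a hypothetical token format in which conditions and non-if statements are single tokens.

```python
def parse_stmt(tokens, i=0):
    """Parse one statement starting at tokens[i]; return (tree, next index)."""
    if tokens[i] != "if":
        return tokens[i], i + 1               # a non-if statement
    cond = tokens[i + 1]
    if tokens[i + 2] != "then":
        raise SyntaxError("expected 'then'")
    then_part, i = parse_stmt(tokens, i + 3)
    # Greedy choice: the innermost open 'if' takes the 'else'.
    if i < len(tokens) and tokens[i] == "else":
        else_part, i = parse_stmt(tokens, i + 1)
        return ("if", cond, then_part, else_part), i
    return ("if", cond, then_part, None), i


tree, _ = parse_stmt("if c1 then if c2 then s1 else s2".split())
print(tree)  # ('if', 'c1', ('if', 'c2', 's1', 's2'), None)
```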


Extended BNF (EBNF)

– Optional parts are placed in brackets ([])

<selection> → if (<expression>) <statement> [else <statement>]

– Put alternative parts of RHSs in parentheses and separate them with vertical bars

<for_stmt> → for <var> := <expr> (to | downto) <expr> do <stmt>

– Put repetitions (0 or more) in braces ({})

<ident> → letter { letter | digit }

Example: BNF and EBNF versions of an expression grammar

EBNF:

<expr> → <term> {(+ | -) <term>}

<term> → <factor> {(* | /) <factor>}
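EBNF braces translate naturally into loops. This Python sketch evaluates integer arithmetic with the two EBNF rules above; `<factor>` is assumed, for illustration, to be an integer literal or a parenthesized `<expr>`. Each pass of a loop combines the running value with the next operand, so `-` and `/` come out left associative.

```python
import re


def eval_ebnf(text):
    """Evaluate text using <expr> -> <term> {(+ | -) <term>} and <term> -> <factor> {(* | /) <factor>}.

    Assumption: <factor> is an integer literal or ( <expr> ).
    """
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        if peek() == "(":
            take()
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(take())

    def term():
        value = factor()
        while peek() in ("*", "/"):   # { (* | /) <factor> } becomes a loop
            op = take()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):   # { (+ | -) <term> } becomes a loop
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(eval_ebnf("8 - 4 - 2"), eval_ebnf("2 + 3 * 4"), eval_ebnf("(2 + 3) * 4"))  # 2 14 20
```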


Syntax Graphs

• The information in BNF and EBNF rules can be represented in a directed graph; such graphs are called syntax graphs

• A separate graph is used for each syntactic unit

• Syntax graphs use different kinds of nodes to represent the terminal and nonterminal symbols of the right sides of a grammar's rules

– Rectangle nodes contain the names of syntactic units (nonterminals)

– Circles or ellipses contain terminal symbols

Example: The Ada if statement

<else_if> → elsif <condition> then <stmts>


Attribute Grammars

• CFGs cannot describe all of the syntax of programming languages

• Attribute grammars are additions to CFGs that carry some semantic info along through parse trees. Their primary value is in:

– Static semantics specification

– Compiler design (static semantics checking)

• Example 2: All variables must be declared before they are referenced

• Example 3: If the end of an Ada subprogram is followed by a name, that name must match the name of the subprogram

• These problems exemplify the category of language rules called static semantics rules. They cannot be specified in BNF
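Example 2 needs information that a context-free rule cannot carry: which names were declared earlier in the program. This minimal Python sketch assumes a hypothetical program representation, a list of `("declare", name)` and `("use", name)` pairs, and keeps a running set of declared names.

```python
def undeclared_uses(statements):
    """Return (index, name) for every use of a variable before its declaration.

    Hypothetical statement forms: ("declare", name) or ("use", name).
    """
    declared = set()
    errors = []
    for index, (kind, name) in enumerate(statements):
        if kind == "declare":
            declared.add(name)
        elif name not in declared:
            errors.append((index, name))
    return errors


program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "y"), ("use", "y")]
print(undeclared_uses(program))  # [(2, 'y')]
```

The set `declared` plays the role of context: whether `use y` is legal depends on what came before it, not on the shape of the statement itself.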


Attribute Grammars - Basic Concepts

• Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can have values assigned to them

• Attribute computation functions (or semantic functions) are associated with grammar rules. They are used to specify how attribute values are computed

• Predicate functions, which state the static semantic rules of the language, are associated with grammar rules

Attribute Grammars - Definition

• An attribute grammar is a CFG G = (S, N, T, P) with the following additions:

– For each grammar symbol X there is a set A(X) of attribute values

– Each rule has a set of functions that define certain attributes of the nonterminals in the rule

– Each rule has a (possibly empty) set of predicate functions to check for attribute consistency


Attribute Grammars (cont.)

• The set A(X) consists of two disjoint sets:

– Synthesized attributes S(X): to pass semantic information up a parse tree

– Inherited attributes I(X): to pass semantic information down a parse tree

• Let $X_0 \rightarrow X_1 \ldots X_n$ be a rule

• Functions of the form $S(X_0) = f(A(X_1), \ldots, A(X_n))$ define synthesized attributes of $X_0$

• Functions of the form $I(X_j) = f(A(X_0), \ldots, A(X_n))$, for $1 \le j \le n$, define inherited attributes of $X_j$

• Initially, there are intrinsic attributes on the leaves: synthesized attributes whose values are determined outside the parse tree (e.g., from the symbol table)

• The value of an inherited attribute on a parse tree node depends on the attribute values of that node's parent node and those of its sibling nodes

• To avoid circularity, inherited attributes are often restricted to functions of the form:

$I(X_j) = f(A(X_0), \ldots, A(X_{j-1}))$
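The two attribute directions can be sketched with small recursive functions. Both examples are assumptions for illustration: a synthesized `val` for the rules `<number> → <number> <digit> | <digit>` (with each digit's value intrinsic, taken from its lexeme), and an inherited `type` for `<decl> → <type> <ident_list>`, where the type flows from the left sibling `<type>` down into every identifier, which fits the restricted form $I(X_j) = f(A(X_0), \ldots, A(X_{j-1}))$.

```python
def number_val(digits: str) -> int:
    """Synthesized <number>.val, computed bottom-up from the children."""
    if len(digits) == 1:                      # <number> -> <digit>
        return int(digits)                    # <digit>.val is intrinsic
    # <number> -> <number> <digit>:  val = 10 * <number>.val + <digit>.val
    return 10 * number_val(digits[:-1]) + int(digits[-1])


def declare(type_name: str, identifiers: list) -> dict:
    """Inherited <ident_list>.type, passed down from the sibling <type>."""
    symbol_table = {}

    def ident_list(names, inherited_type):
        symbol_table[names[0]] = inherited_type        # identifier.type = <ident_list>.type
        if len(names) > 1:                             # inner <ident_list>.type = outer <ident_list>.type
            ident_list(names[1:], inherited_type)

    ident_list(identifiers, type_name)                 # <ident_list>.type = <type>.type
    return symbol_table


print(number_val("234"))                   # 234
print(declare("int", ["a", "b", "c"]))     # {'a': 'int', 'b': 'int', 'c': 'int'}
```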


• A predicate function has the form of a Boolean expression on the attribute sets A(X0), …, A(Xn) of the rule and a set of literal attribute values

• The only derivations allowed with an attribute grammar are those in which every predicate associated with every nonterminal is true

• A false predicate function value indicates a violation of the syntax or static semantics rules of the language

Example

• Rule in English: “The name on the end of an Ada procedure must match the procedure's name.”
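This rule becomes a predicate that compares an attribute of two nodes. The Python sketch assumes a rule of the form `<proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2] ;`, treats `<proc_body>` as an opaque run of tokens, and uses a hypothetical `string` attribute for each name that is simply its lexeme.

```python
def proc_names_match(tokens):
    """Predicate: <proc_name>[1].string == <proc_name>[2].string (assumed rule)."""
    if len(tokens) < 5 or tokens[0] != "procedure" or tokens[-3] != "end" or tokens[-1] != ";":
        raise SyntaxError("not a procedure definition")
    # Each name's 'string' attribute is intrinsic: it is the lexeme itself.
    return tokens[1] == tokens[-2]


good = ["procedure", "Swap", "is", "begin", "null", ";", "end", "Swap", ";"]
bad = ["procedure", "Swap", "is", "begin", "null", ";", "end", "Sort", ";"]
print(proc_names_match(good), proc_names_match(bad))  # True False
```

The second input is syntactically well formed, but the false predicate marks it as a static semantics violation.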


Example: Assignment Statement

• The only variable names are A, B, and C

• The RHS can be either a variable or an expression in the form of a variable added to another variable

• The variables can be one of two types: int or real

• When the operand types are not the same, the type of the expression is always real. When they are the same, the expression type is that of the operands

• The type of the LHS must match the type of the RHS: the assignment is valid only if the LHS and the value resulting from evaluating the RHS have the same type
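The type rules above can be checked with three attributes: a synthesized actual type for the expression, an inherited expected type taken from the LHS, and a predicate comparing them. This Python sketch is illustrative only; the symbol table contents are hypothetical, and the function names are not taken from these notes.

```python
# Hypothetical declarations: each variable's type is an intrinsic
# attribute looked up in a symbol table.
SYMBOL_TABLE = {"A": "int", "B": "real", "C": "int"}


def expr_actual_type(operands):
    """Synthesized actual type of <var> or <var> + <var>."""
    types = [SYMBOL_TABLE[name] for name in operands]
    if len(types) == 1:                      # a single variable
        return types[0]
    # Same operand types keep that type; mixed types give real.
    return types[0] if types[0] == types[1] else "real"


def assignment_ok(target, operands):
    """Predicate: the LHS type must equal the RHS type."""
    expected_type = SYMBOL_TABLE[target]     # inherited from the LHS variable
    return expr_actual_type(operands) == expected_type


print(assignment_ok("A", ["A", "C"]))  # True: int + int is int
print(assignment_ok("A", ["A", "B"]))  # False: int + real is real
```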

Example (cont.)

• The syntax portion of the attribute grammar is:

<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
