• A language may be hard to learn and hard to implement, and any ambiguity in the specification may lead to dialect differences, if we do not have a clear language definition
• Most new programming languages are subjected to a period of scrutiny by potential users before their designs are completed
– Other language designers
– Implementors
Copyright © 2006 Addison-Wesley All rights reserved 1-3
– Programmers (the users of the language)
– Semantics - the meaning of the expressions,
statements, and program units
• Semantics should follow from syntax: the form of statements should be clear and imply what the statements do or how they should be used
• A language is a set of sentences
• A lexeme is the lowest level syntactic unit of a
language (e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g.,
identifier)
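The lexeme/token distinction can be sketched with a small scanner. This is only an illustration: the token category names and the regular expressions below are assumptions chosen for the example lexemes above, not part of any particular language definition.

```python
import re

# Hypothetical token categories (illustrative names only).
# Order matters: keywords must be tried before identifiers.
TOKEN_SPEC = [
    ("keyword", r"\b(?:begin|end)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int_literal", r"\d+"),
    ("operator", r"[*+=]"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs; whitespace and unknown characters are skipped."""
    return [(m.lastgroup, m.group()) for m in PATTERN.finditer(text)]


for token, lexeme in tokenize("begin sum = sum + 1 end"):
    print(f"{lexeme!r:10} is a lexeme of token {token}")
```

Several different lexemes (`sum`, `total`, ...) map to the same token `identifier`, which is what makes a token a category.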
• Languages can be formally defined in two
distinct ways: by recognition and by generation
Language Recognizers
• A recognition device reads input strings of the language and decides whether the input strings belong to the language
• Example: syntax analysis part of a compiler
Language Generators
• A device that generates sentences of a language
• One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator
Language Recognizers vs Generators
• A language recognizer can only be used in trial-and-error mode (black box)
• The structure of the language generator is an open book which people can easily read and understand
Formal Methods of Describing Syntax
• Grammars are formal language-generation mechanisms that are commonly used to describe the syntax of programming languages
– The tokens of programming languages can be described by regular grammars
– Whole programming languages, with minor exceptions, can be described by context-free grammars
• BNF is equivalent to context-free grammars
• Extended BNF (EBNF) improves readability and writability of BNF
symbols )
Backus-Naur Form (cont.)
• Although BNF is simple, it is sufficiently
powerful to describe the great majority of the syntax of programming languages:
– Lists of similar constructs
– The order in which different constructs must appear
Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-hand side (RHS)
<assign> -> <var> = <expression>
• An abstraction (or nonterminal symbol) can have more than one RHS
<if_stmt> -> if <logic_expr> then <stmt> |
             if <logic_expr> then <stmt> else <stmt>
Describing Lists
• Syntactic lists (for example, a list of identifiers appearing in a data declaration) are described using recursion: the LHS of the rule also appears in its RHS
<ident_list> -> identifier | identifier , <ident_list>
Note: Comma ‘,’ is a terminal
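The recursive list rule translates almost directly into a recursive recognizer. The sketch below assumes the input has already been split into tokens and uses Python's `str.isidentifier` as a stand-in for the `identifier` token; both choices are assumptions made for illustration.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    # Every alternative starts with an identifier.
    if not tokens or not tokens[0].isidentifier():
        return False
    # First alternative: a single identifier.
    if len(tokens) == 1:
        return True
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["a", ",", "b", ",", "c"]))
print(is_ident_list(["a", ","]))
```

The recursion in the function mirrors the recursion in the rule: each call consumes one identifier and one comma, then asks whether the rest is again an identifier list.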
Example: A grammar of expressions
• Seven terminal symbols:
+ - * ( ) x y
• Four non-terminal symbols:
‹expr› ‹term› ‹factor› ‹var›
• Start/goal symbol:
‹expr›
Rules of grammar
‹expr› -> ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› -> ‹factor› | ‹term› * ‹factor›
‹factor› -> ‹var› | ( ‹expr› )
‹var› -> x | y
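To see this grammar acting as a generator, the rules can be stored as data and expanded mechanically. The following is a minimal sketch, not a standard algorithm from the slides: the dictionary encoding, the function name `sentences`, and the length bound `max_len` are all assumptions introduced to keep the enumeration finite.

```python
from collections import deque

# The four rules above, one entry per nonterminal.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"], ["expr", "-", "term"]],
    "term": [["factor"], ["term", "*", "factor"]],
    "factor": [["var"], ["(", "expr", ")"]],
    "var": [["x"], ["y"]],
}


def sentences(start="expr", max_len=3):
    """Enumerate all sentences with at most max_len symbols.

    Sentential forms are expanded breadth-first, always at the leftmost
    nonterminal. No rule has an empty RHS, so a form never shrinks and
    any form longer than max_len can be discarded safely.
    """
    found, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:  # only terminals left: this form is a sentence
            found.add(" ".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found)


print(sentences(max_len=3))
```

Raising `max_len` yields longer sentences such as `( x - y ) * x + y`; the set of sentences is infinite, so some bound is always needed.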
Derivations
• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
• Derivations can be used to generate all the possible sentences in a grammar
• Every string of symbols in the derivation is a sentential form
Derivations (cont.)
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
• A rightmost derivation expands the rightmost nonterminal instead; a derivation may be neither leftmost nor rightmost
• Derivation order has no effect on the language generated by a grammar
Example: a grammar and leftmost derivation
<sentence> -> <noun-phrase> <verb-phrase>
<noun-phrase> -> <article> <noun>
<article> -> a | the
<noun> -> girl | dog
<verb-phrase> -> <verb> <noun-phrase>
<verb> -> sees | pets
<sentence> => <noun-phrase> <verb-phrase>
           => <article> <noun> <verb-phrase>
           => the <noun> <verb-phrase>
           => the girl <verb-phrase>
           => the girl <verb> <noun-phrase>
           => the girl sees <noun-phrase>
           => the girl sees <article> <noun>
           => the girl sees a <noun>
           => the girl sees a dog
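The derivation above can be reproduced mechanically by always expanding the leftmost nonterminal and backtracking when the terminals produced so far disagree with the target sentence. This is a sketch under assumptions: the `RULES` encoding and the function name `leftmost_derivation` are illustrative, and the simple backtracking search only terminates because this particular grammar is not recursive.

```python
RULES = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>"]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def leftmost_derivation(target, form=("<sentence>",)):
    """Return the sentential forms of a leftmost derivation of target, or None."""
    idx = next((i for i, s in enumerate(form) if s in RULES), None)
    if idx is None:  # no nonterminal left: form is a sentence
        return [form] if list(form) == target else None
    if list(form[:idx]) != target[:idx]:
        return None  # the terminal prefix already disagrees with the target
    for rhs in RULES[form[idx]]:
        rest = leftmost_derivation(target, form[:idx] + tuple(rhs) + form[idx + 1:])
        if rest is not None:
            return [form] + rest
    return None


steps = leftmost_derivation("the girl sees a dog".split())
for step in steps:
    print("=>", " ".join(step))
```

Each printed line is one sentential form; only the last one, which contains no nonterminals, is a sentence.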
Context free?
• In the previous example you might wonder
about the idea of context
• In a context-free grammar we find that
replacements do not have any context in which they cannot occur
– For example, both "the dog pets the girl" and "the girl pets the dog" can be derived from the grammar above
• Of course this means that there are certain
contexts that the rules don’t work, thus it would
=> ‹term› * ‹factor› + ‹term›
=> ‹factor› * ‹factor› + ‹term›
=> ( ‹expr› ) * ‹factor› + ‹term›
=> ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹term› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹var› - ‹term› ) * ‹factor› + ‹term›
=> ( x - ‹term› ) * ‹factor› + ‹term›
=> ( x - ‹factor› ) * ‹factor› + ‹term›
=> ( x - ‹var› ) * ‹factor› + ‹term›
Example: A grammar for a small language
<program> -> begin <stmt_list> end
Example: A leftmost derivation in the grammar
<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
          => begin A = <expression> ; <stmt_list> end
          => begin A = <var> + <var> ; <stmt_list> end
          => begin A = B + <var> ; <stmt_list> end
          => begin A = B + C ; <stmt_list> end
          => begin A = B + C ; <stmt> end
          => begin A = B + C ; <var> = <expression> end
          => begin A = B + C ; B = <expression> end
          => begin A = B + C ; B = <var> end
          => begin A = B + C ; B = C end
Parse Tree
• Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define
• These hierarchical structures are called parse trees
– Every internal node of a parse tree is labeled with a nonterminal symbol
– Every leaf is labeled with a terminal symbol
– Every subtree of a parse tree describes one instance of an abstraction in the statement
Example: Parse tree of the sentence “the girl sees a dog”
[Figure: parse tree rooted at <sentence>, with children <noun-phrase> and <verb-phrase>; each <noun-phrase> expands to <article> <noun>]
Example: A Parse Tree
Rules of grammar:
<assign> -> <id> = <expr>
<expr> -> <id> + <expr> | <number>
<number> -> <number> <digit> | <digit>
<digit> -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
[Figure: parse tree rooted at <assign>, with children <id> = <expr>]
… given 234, we have different derivations …
[Figure: derivations of 234 from <number>, passing through the sentential forms <number> <digit> <digit> and <digit> <digit> <digit>]
• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
• Ambiguity should be avoided
Two distinct parse trees for the sentence A = B + C * A and their meanings
[Figure: two parse trees, one meaning A = B + (C * A), the other meaning A = (B + C) * A]
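Ambiguity can be checked by counting parse trees. The ambiguous grammar itself is not reproduced in this part of the slides, so the sketch below assumes the usual textbook form `<expr> -> <expr> + <expr> | <expr> * <expr> | <id>` with `<id> -> A | B | C`; the function name is also an assumption.

```python
from functools import lru_cache

IDS = {"A", "B", "C"}
OPERATORS = {"+", "*"}


def count_parse_trees(tokens):
    """Count parse trees of tokens under the assumed ambiguous grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] in IDS else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPERATORS:
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("B + C * A".split()))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous; here the two trees correspond to the two groupings shown above.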
Operator Precedence
• The fact that an operator in an arithmetic expression is generated lower in the parse tree can be used to indicate that it has precedence over an operator produced higher up in the tree
• Although the above grammar is not
ambiguous, the precedence order of its
operators is not the usual one
Removing Ambiguity
• An unambiguous grammar for expressions
<assign> -> <id> = <expr>
<id> -> A | B | C
<expr> -> <expr> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> ( <expr> ) | <id>
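How this grammar forces `*` to bind tighter than `+` can be seen with a small hand-written parser that prints the grouping it finds. This is a sketch, not the method of the slides: a recursive-descent parser cannot use the left-recursive rules directly, so each one is replaced here by an equivalent loop, and the input is assumed to be whitespace-separated tokens.

```python
def parse_assign(text):
    """Parse '<id> = <expr>' and return a fully parenthesized string."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def ident():  # <id> -> A | B | C
        tok = take()
        if tok not in {"A", "B", "C"}:
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    def factor():  # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            take("(")
            inner = expr()
            take(")")
            return inner
        return ident()

    def term():  # <term> -> <term> * <factor> | <factor>, written as a loop
        left = factor()
        while peek() == "*":
            take("*")
            left = f"({left} * {factor()})"
        return left

    def expr():  # <expr> -> <expr> + <term> | <term>, written as a loop
        left = term()
        while peek() == "+":
            take("+")
            left = f"({left} + {term()})"
        return left

    target = ident()
    take("=")
    result = f"{target} = {expr()}"
    if peek() is not None:
        raise SyntaxError(f"trailing input {peek()!r}")
    return result


print(parse_assign("A = B + C * A"))
```

Because `*` only appears in the `<term>` rule, which sits below `<expr>`, the multiplication is always grouped first unless parentheses say otherwise.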
Example: Derivation and parse tree of the sentence A = B + C * A
[Figure: derivation and parse tree of A = B + C * A]
Associativity of Operators
• Addition becomes either left or right associative. This isn’t so bad with addition, but associativity can be a problem with other operators, e.g. subtraction and division
• When a BNF rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive
• We can fix this problem with the following rules:
– Left recursive rules become left associative
– Right recursive rules become right associative
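Why associativity matters for subtraction but not for addition can be checked numerically. The two helper names below are assumptions made for this illustration; they simply group a list of operands from the left or from the right.

```python
from functools import reduce


def left_assoc_subtract(values):
    """Group from the left: ((a - b) - c) - ..."""
    return reduce(lambda acc, v: acc - v, values)


def right_assoc_subtract(values):
    """Group from the right: a - (b - (c - ...))"""
    return reduce(lambda acc, v: v - acc, reversed(values))


operands = [10, 4, 3]
print(left_assoc_subtract(operands))   # (10 - 4) - 3
print(right_assoc_subtract(operands))  # 10 - (4 - 3)
```

For the same operands the two groupings give different results, so the grammar has to commit to one of them; with addition both groupings would give the same sum.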
Associativity of Operators (cont.)
• In most languages that provide the exponentiation operator, it is right associative
• The following rules could be used to describe exponentiation as a right-associative operator:
<factor> -> <exp> ** <factor> | <exp>
<exp> -> ( <expr> ) | <id>
An Unambiguous Grammar for the if-then-else Statement
• The ambiguous grammar of the if-then-else statement has the following rules:
<if_stmt> -> if <logic_expr> then <stmt> |
             if <logic_expr> then <stmt> else <stmt>
• The simplest sentential form that illustrates this ambiguity is
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
<unmatched> -> if <logic_expr> then <stmt> |
               if <logic_expr> then <matched> else <unmatched>
There is just one possible parse tree …
– Optional parts are placed in brackets ([])
<selection> -> if (<expression>) <statement> [else <statement>]
– Put alternative parts of RHSs in parentheses and separate them with vertical bars
<for_stmt> -> for <var> := <expr> (to | downto) <expr> do <stmt>
– Put repetitions (0 or more) in braces ({})
<ident> -> letter { letter | digit }
Example: BNF and EBNF versions of an expression grammar
EBNF: <expr> -> <term> {(+ | -) <term>}
      <term> -> <factor> {(* | /) <factor>}
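EBNF braces map naturally onto loops in a hand-written parser: each `{ ... }` becomes a `while`. The sketch below evaluates expressions while parsing them. The `<factor>` rule is not given above, so the code assumes `<factor> -> integer | ( <expr> )`; the function names and the whitespace-separated input format are also assumptions.

```python
def evaluate(text):
    """Evaluate an expression by following the EBNF rules directly."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():  # assumed rule: <factor> -> integer | ( <expr> )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        return int(tok)

    def term():  # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):  # the braces become a loop
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expr():  # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input {peek()!r}")
    return result


print(evaluate("10 - 4 - 3"))
print(evaluate("2 + 3 * 4"))
```

Because each loop folds operands in from the left, the operators come out left associative, which the EBNF notation itself does not state explicitly.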
Syntax Graphs
• The information in BNF and EBNF rules can be represented in a directed graph. Such graphs are called syntax graphs
• A separate graph is used for each syntactic unit
• Syntax graphs use different kinds of nodes to represent the terminal and nonterminal symbols of the right sides of a grammar's rules
– Rectangle nodes contain the names of syntactic units (nonterminals)
– Circles or ellipses contain terminal symbols
Example: The Ada if statement
<else_if> -> elsif <condition> then <stmts>
Attribute Grammars
• CFGs cannot describe all of the syntax of programming languages
• Attribute grammars are additions to CFGs that carry some semantic info along through parse trees
• Primary uses:
– Static semantics specification
– Compiler design (static semantics checking)
• Example 2: All variables must be declared before they are referenced
• Example 3: If the end of an Ada subprogram is followed by a name, that name must match the name of the subprogram
• These problems exemplify the category of language rules called static semantics rules. They cannot be specified in BNF
Attribute Grammars - Basic Concepts
• Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can have values assigned to them
• Attribute computation functions (or semantic functions) are associated with grammar rules. They are used to specify how attribute values are computed
• Predicate functions, which state the static semantic rules of the language, are associated with grammar rules
Attribute Grammars - Definition
• An attribute grammar is a CFG G = (S, N, T, P)
with the following additions:
– For each grammar symbol X there is a set A(X)
of attribute values
– Each rule has a set of functions that define certain attributes of the nonterminals in the rule
– Each rule has a (possibly empty) set of
predicate functions to check for attribute
consistency
Attribute Grammars (cont.)
• The set A(X) consists of two disjoint sets:
– Synthesized attributes S(X): to pass semantic information up a parse tree
– Inherited attributes I(X): to pass semantic information down a parse tree
• Let X0 -> X1 … Xn be a rule
• Functions of the form S(X0) = f(A(X1), …, A(Xn)) define synthesized attributes of X0
• Functions of the form I(Xj) = f(A(X0), …, A(Xn)), for 1 ≤ j ≤ n, define inherited attributes of Xj
• Initially, there are intrinsic attributes on the leaves
Attribute Grammars (cont.)
• The value of an inherited attribute on a parse tree node depends on the attribute values of that node's parent node and those of its sibling nodes
• To avoid circularity, inherited attributes are often restricted to functions of the form:
I(Xj) = f(A(X0), …, A(Xj-1))
Attribute Grammars (cont.)
• A predicate function has the form of a Boolean expression on the attribute set {A(X0), …, A(Xn)}
• The only derivations allowed with an attribute grammar are those in which every predicate associated with every nonterminal is true
• A false predicate function value indicates a violation of the syntax or static semantics rules of the language
Example
• Rule in English: “The name on the end of an Ada procedure must match the procedure's name”
Example: Assignment Statement
• The only variable names are A, B, and C
• The RHS can either be a variable or an expression in the form of a variable added to another variable
• The variables can be one of two types: int or real
• The type of the expression when the operand types are not the same is always real. When they are the same, the expression type is that of the operands
• The type of the LHS must match the type of the RHS. The assignment is valid only if the LHS and the value resulting from evaluating the RHS have the same type
Example (cont.)
• The syntax portion of attribute grammar is:
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var>
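The type rules for this assignment example can be sketched as a small checker. Everything beyond the two syntax rules is an assumption made for illustration: the attribute names `actual_type` and `expected_type`, the `symbol_table` dictionary that supplies the declared type of each variable, and the whitespace-separated input format.

```python
def check_assignment(text, symbol_table):
    """Return True if the assignment satisfies the type predicate.

    Syntax:  <assign> -> <var> = <expr>
             <expr>   -> <var> + <var> | <var>
    """
    tokens = text.split()
    if len(tokens) not in (3, 5) or tokens[1] != "=":
        raise SyntaxError("expected <var> = <expr>")
    if len(tokens) == 5 and tokens[3] != "+":
        raise SyntaxError("expected <var> + <var>")

    def actual_type(name):
        # Intrinsic attribute of a <var> leaf, looked up from its declaration.
        if name not in symbol_table:
            raise SyntaxError(f"unknown variable {name!r}")
        return symbol_table[name]

    # Inherited attribute: <expr> is expected to have the type of the LHS.
    expected_type = actual_type(tokens[0])

    # Synthesized attribute: the type of <expr> is computed from its operands.
    operand_types = [actual_type(name) for name in tokens[2::2]]
    if len(operand_types) == 2 and operand_types[0] != operand_types[1]:
        expr_type = "real"  # mixed operand types always give real
    else:
        expr_type = operand_types[0]

    # Predicate function: the two types must match.
    return expr_type == expected_type


types = {"A": "int", "B": "int", "C": "real"}  # hypothetical declarations
print(check_assignment("A = A + B", types))
print(check_assignment("A = B + C", types))
```

The first assignment passes because both operands and the target are int; the second fails the predicate because the mixed expression is real while A is int.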