Chapter 3
Describing Syntax and Semantics
Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs: Dynamic Semantics
Introduction
• Without a clear language definition, a language may be hard to learn and hard to implement, and any ambiguity in the specification may lead to dialect differences
• Most new programming languages are subjected
to a period of scrutiny by potential users before
their designs are completed
• Who must use language definitions
– Other language designers
– Implementors
– Programmers (the users of the language)
Introduction (cont.)
• The study of programming languages can be
divided into examinations of syntax and
semantics
– Syntax - the form or structure of the expressions,
statements, and program units
– Semantics - the meaning of the expressions,
statements, and program units
• Semantics should follow from syntax: the form of a statement should be clear and should imply what the statement does or how it should be used
The General Problem of Describing Syntax
• A sentence is a string of characters over some alphabet
• A language is a set of sentences
• A lexeme is the lowest-level syntactic unit of a language (e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g., identifier, number, operator, …)
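The lexeme/token distinction can be made concrete with a small sketch. The token names and regular expressions below are illustrative assumptions, not part of any particular language definition; the function pairs each lexeme with the token (category) it belongs to.

```python
import re

# Token categories (hypothetical names) paired with patterns for their lexemes.
TOKEN_SPEC = [
    ("keyword",    r"\b(?:begin|end)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("number",     r"\d+"),
    ("operator",   r"[*+=]"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (token, lexeme) pairs for the lexemes found in text."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != "skip":          # whitespace separates lexemes
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs
```

For example, `tokenize("sum = sum * 2")` classifies `sum` as an identifier, `=` and `*` as operators, and `2` as a number.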
The Definition of Languages
• Languages can be formally defined in two
distinct ways: by recognition and by
generation
– A recognition device of the language reads
input strings and decides whether the input
strings belong to the language
– Example: syntax analysis part of a compiler
The Definition of Languages (cont.)
– A generator is a device that generates sentences of a language
– One can determine whether the syntax of a particular sentence is correct by comparing it to the structure of the generator
Language Recognizers vs Generators
• The language recognizer can only be used in trial-and-error mode (black box)
• The structure of the language generator is an open book that people can easily read and understand
Formal Methods of Describing Syntax
• This section discusses the formal language generation mechanisms that are commonly used to describe the syntax of programming languages
• These mechanisms are often called
grammars
• We will discuss the class of languages called context-free languages
Context-Free Grammars
• Noam Chomsky
– A linguist
– Described four classes of grammars in the mid-1950s
• Two of these grammar classes, context-free and regular grammars, are useful in computer science
– The tokens of programming languages can be
described by regular grammars
– Whole programming languages, with minor
exceptions, can be described by context-free
grammars
Backus-Naur Form
• BNF is equivalent to context-free grammars
• In BNF, abstractions are used to represent syntactic structures; these abstractions are also called nonterminal symbols
Backus-Naur Form (cont.)
• Although BNF is simple, it is sufficiently
powerful to describe the great majority of
the syntax of programming languages:
– Lists of similar constructs
– The order in which different constructs must appear
Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-hand side (RHS)
<assign> → <var> = <expression>
• An abstraction (or nonterminal symbol) can
have more than one RHS
<if_stmt> → if <logic_expr> then <stmt> |
if <logic_expr> then <stmt> else <stmt>
Describing Lists
• Syntactic lists (for example, a list of
identifiers appearing on a data declaration statement) are described using recursion
• A rule is recursive if its LHS appears in its RHS
<ident_list> → identifier | identifier , <ident_list>
Note: the comma ',' is a terminal symbol
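A recursive rule maps naturally onto a recursive function. The sketch below is a minimal recognizer for the `<ident_list>` rule; it assumes the input has already been split into a list of strings and uses Python's `str.isidentifier` as a stand-in for the `identifier` token.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True                      # first alternative: a single identifier
    # second alternative: identifier , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])
```

The recursive call corresponds to the occurrence of `<ident_list>` on the RHS of its own rule.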
Example: A grammar of expressions
• Seven terminal symbols:
+ - * ( ) x y
• Four non-terminal symbols:
‹expr› ‹term› ‹factor› ‹var›
• Start/goal symbol:
‹expr›
Rules of grammar
‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› → ‹factor› | ‹term› * ‹factor›
‹factor› → ‹var› | ( ‹expr› )
‹var› → x | y
Derivations
• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
<sentence> → <noun-phrase> <verb-phrase>
<noun-phrase> → <article> <noun>
<article> → a | the
<noun> → girl | dog
<verb-phrase> → <verb> <noun-phrase>
<verb> → sees | pets
<sentence> => <noun-phrase> <verb-phrase>
=> <article> <noun> <verb-phrase>
=> the <noun> <verb-phrase>
=> the girl <verb-phrase>
=> the girl <verb> <noun-phrase>
=> the girl sees <noun-phrase>
=> the girl sees <article> <noun>
=> the girl sees a <noun>
=> the girl sees a dog
Derivations (cont.)
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form
is the one that is expanded
• A derivation may be neither leftmost nor
rightmost
• Derivation order has no effect on the language generated by a grammar
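A leftmost derivation can be sketched mechanically: at every step, find the leftmost nonterminal and replace it with one of its RHSs. The code below uses the small English grammar from the earlier example; the dictionary encoding and the `choices` parameter (which alternative to pick at each step) are assumptions made for illustration.

```python
GRAMMAR = {
    "<sentence>":    [["<noun-phrase>", "<verb-phrase>"]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>":     [["a"], ["the"]],
    "<noun>":        [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>":        [["sees"], ["pets"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal at each step.

    choices gives, per step, the index of the alternative to use.
    Returns every sentential form, from the start symbol onward.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that is still a nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps
```

Calling `leftmost_derivation("<sentence>", [0, 0, 1, 0, 0, 0, 0, 0, 1])` walks through the sentential forms that lead to `the girl sees a dog`.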
Context free?
• In the previous example you might wonder about the idea of context
• In a context-free grammar, a nonterminal can be replaced by any of its RHSs wherever it occurs; no surrounding context can forbid the replacement
– The dog pets the girl
– The girl pets the dog
• If a rule did not apply in certain contexts, the grammar would not be “context-free”
Example: A leftmost derivation of the expression
( x - y ) * x + y
‹expr›
=> ‹expr› + ‹term›
=> ‹term› + ‹term›
=> ‹term› * ‹factor› + ‹term›
=> ‹factor› * ‹factor› + ‹term›
=> ( ‹expr› ) * ‹factor› + ‹term›
=> ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹term› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹var› - ‹term› ) * ‹factor› + ‹term›
=> ( x - ‹term› ) * ‹factor› + ‹term›
=> ( x - ‹factor› ) * ‹factor› + ‹term›
=> ( x - ‹var› ) * ‹factor› + ‹term›
=> ( x - y ) * ‹factor› + ‹term›
=> ( x - y ) * ‹var› + ‹term›
=> ( x - y ) * x + ‹term›
=> ( x - y ) * x + ‹factor›
=> ( x - y ) * x + ‹var›
=> ( x - y ) * x + y
Example: A grammar for a small language
<program> → begin <stmt_list> end
Example: A leftmost derivation in the grammar
<program>
=> begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
Parse Tree
• Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define
• These hierarchical structures are called parse trees
– Every internal node of a parse tree is labeled with a
nonterminal symbol
– Every leaf is labeled with a terminal symbol
– Every subtree of a parse tree describes one instance of
an abstraction in the statement
Example: Parse tree of the sentence “the girl sees a dog”
[Figure: parse tree rooted at sentence, with children noun-phrase (article the, noun girl) and verb-phrase (verb sees, noun-phrase with article a and noun dog)]
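One way to hold such a parse tree in a program is as nested tuples, where the first element of each tuple is the nonterminal labelling an internal node and plain strings are the leaves. This encoding is an assumption chosen for brevity; reading the leaves from left to right recovers the sentence.

```python
# Parse tree for "the girl sees a dog" (internal nodes are tuples, leaves are strings).
TREE = ("<sentence>",
        ("<noun-phrase>", ("<article>", "the"), ("<noun>", "girl")),
        ("<verb-phrase>", ("<verb>", "sees"),
         ("<noun-phrase>", ("<article>", "a"), ("<noun>", "dog"))))

def leaves(node):
    """Return the terminal symbols of a parse tree, left to right."""
    if isinstance(node, str):
        return [node]                    # a leaf is a terminal symbol
    _label, *children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result
```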
Example: A Parse Tree
Rules of grammar:
<assign> → <id> = <expr>
<expr> → <expr> + <expr> |
<expr> * <expr> |
<id> | ( <expr> )
<id> → A | B | C
<assign> → <id> = <expr>
<expr> → <expr> + <expr> | <expr> * <expr> |
( <expr> ) | <number> | <id>
<number> → <number> <digit> | <digit>
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<id> → A | B | C
… given 234, we have different derivations …
[Figure: derivation tree for 234 built from the number and digit rules]
Ambiguity (cont.)
• A grammar is ambiguous if and only if it
generates a sentential form that has two or more distinct parse trees
• Ambiguity should be avoided
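Ambiguity can be checked on small inputs by counting parse trees. The sketch below does this for the expression rules `<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>` with `<id> → A | B | C`; the counting strategy (try every operator as the root) is an illustrative choice, not a general ambiguity test.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees that derive tokens from <expr>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):                       # trees deriving tokens[lo:hi]
        if hi <= lo:
            return 0
        if hi - lo == 1:
            return 1 if tokens[lo] in ("A", "B", "C") else 0
        total = 0
        if tokens[lo] == "(" and tokens[hi - 1] == ")":
            total += count(lo + 1, hi - 1)   # <expr> -> ( <expr> )
        for k in range(lo + 1, hi - 1):      # try each operator as the root
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))
```

A result greater than 1, as for `"B + C * A".split()`, shows that the grammar is ambiguous for that sentence.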
Two distinct parse trees for the same sentence
[Figure: the two parse trees]
Operator Precedence
• An operator generated lower in the parse tree of an arithmetic expression is evaluated before one generated higher up, so position in the tree can be used to indicate that the lower operator has precedence
• The precedence order of multiplication and addition operators in the above grammar is not the usual one
Removing Ambiguity
• An unambiguous grammar for expressions
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
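The effect of the `<expr>`, `<term>`, `<factor>` layering can be seen in a small parser. The sketch below is one possible hand-written parser for the expression part of this grammar; the left-recursive rules are implemented with loops, and the nested-tuple result is an assumed stand-in for the parse tree.

```python
def parse_expr(tokens):
    """Parse a token list with the unambiguous rules; return a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                       # <expr> -> <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():                       # <term> -> <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                     # <factor> -> ( <expr> ) | <id>
        tok = advance() if peek() is not None else None
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("A", "B", "C"):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For `"B + C * A".split()` the `*` node ends up below the `+` node, which is the usual precedence.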
Example: Derivation and parse tree of a sentence in the unambiguous grammar
[Figure: derivation and parse tree]
Associativity of Operators
• For addition and multiplication, the result is mathematically the same whether the operator is left or right associative. However, associativity matters for other operators, e.g., subtraction and division
• When a BNF rule has its LHS also appearing at the beginning of its RHS, the rule is said to be left recursive
• Associativity can be specified through the form of the rules:
– Left recursive rules specify left associativity
– Right recursive rules specify right associativity
Associativity of Operators (cont.)
• In most languages that provide an exponentiation operator, it is right associative
• The following rules could be used to describe exponentiation as a right associative operator
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → <expo> ** <factor> | <expo>
<expo> → ( <expr> ) | <id>
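The right-recursive `<factor>` rule can be sketched as a recursive function. To keep the example short it assumes that `<expo>` is always a plain `<id>` (no parenthesized expressions); the function name and the tuple encoding of the tree are illustrative.

```python
def parse_factor(tokens, pos=0):
    """<factor> -> <expo> ** <factor> | <expo>, with <expo> limited to an <id>.

    Returns (tree, next_position).
    """
    if pos >= len(tokens) or tokens[pos] not in ("A", "B", "C"):
        raise SyntaxError("expected an identifier")
    left, pos = tokens[pos], pos + 1
    if pos < len(tokens) and tokens[pos] == "**":
        # Right recursion: the rest of the chain becomes the right operand.
        right, pos = parse_factor(tokens, pos + 1)
        return ("**", left, right), pos
    return left, pos
```

Parsing `A ** B ** C` groups the operands as `A ** (B ** C)`. Python's own `**` operator follows the same convention, so `2 ** 3 ** 2` evaluates to 512 rather than 64.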
An Unambiguous Grammar for the if-then-else Statement
• The ambiguous grammar of the if-then-else statement has the following rules:
<if_stmt> → if <logic_expr> then <stmt> |
if <logic_expr> then <stmt> else <stmt>
• The simplest sentential form that illustrates this ambiguity is
if <logic_expr> then
if <logic_expr> then <stmt>
else <stmt>
The two parse trees …
[Figure: two parse trees for the sentential form above; in one the else clause belongs to the outer if, in the other to the inner if]
An Unambiguous Grammar … (cont.)
• The rule for if constructs in most languages is
that an else clause, when present, is matched
with the nearest previous unmatched then (or if)
• The unambiguous grammar based on this rule
follows
<stmt> → <matched> | <unmatched>
<matched> →
if <logic_expr> then <matched> else <matched> |
any non-if statement
<unmatched> →
if <logic_expr> then <stmt> |
if <logic_expr> then <matched> else <unmatched>
There is just one possible parse tree …
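The nearest-unmatched-then rule is what a simple greedy recursive-descent parser produces. The sketch below does not encode the `<matched>`/`<unmatched>` rules directly; it is an assumed shortcut that yields the same association. The token `c` stands for any `<logic_expr>` and `s` for any non-if statement.

```python
def parse_stmt(tokens, pos=0):
    """Return (tree, next_position); 'c' is a condition, 's' a non-if statement."""
    if pos < len(tokens) and tokens[pos] == "s":
        return "s", pos + 1
    if tokens[pos:pos + 3] != ["if", "c", "then"]:
        raise SyntaxError(f"unexpected input at position {pos}")
    then_part, pos = parse_stmt(tokens, pos + 3)
    if pos < len(tokens) and tokens[pos] == "else":
        # The innermost if still being parsed sees the else first.
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", then_part, else_part), pos
    return ("if", then_part, None), pos
```

For `if c then if c then s else s`, the else clause is attached to the inner if and the outer if has no else part.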
Extended BNF
• Extended BNF does not enhance the descriptive power of BNF; it only increases BNF's readability and writability
• Three extensions:
– Optional parts are placed in square brackets
<selection> → if (<expression>) <statement> [ else <statement> ] ;
– Alternative parts of RHSs are placed in parentheses and separated by vertical bars
<for_stmt> → for <var> := <expr> (to | downto) <expr> do <stmt>
– Repetitions (0 or more) are placed in curly braces
<ident> → letter { letter | digit }
Example: BNF and EBNF versions of an expression grammar
BNF: <expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF: <expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
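The EBNF braces translate directly into loops. The sketch below evaluates a token list with the two EBNF rules; since `<factor>` is not defined on the slide, it is assumed here to be an integer literal, and the function name is illustrative.

```python
import operator

ADD_OPS = {"+": operator.add, "-": operator.sub}
MUL_OPS = {"*": operator.mul, "/": operator.truediv}

def evaluate(tokens):
    """Evaluate a token list using the two EBNF rules."""
    pos = 0

    def factor():                 # assumption: a <factor> is an integer literal
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def term():                   # <term> -> <factor> {(* | /) <factor>}
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in MUL_OPS:
            op = MUL_OPS[tokens[pos]]
            pos += 1
            value = op(value, factor())
        return value

    def expr():                   # <expr> -> <term> {(+ | -) <term>}
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ADD_OPS:
            op = ADD_OPS[tokens[pos]]
            pos += 1
            value = op(value, term())
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

Because each loop folds its operands from left to right, `8 - 3 - 2` is evaluated as `(8 - 3) - 2`, which matches the left associativity of the left-recursive BNF version.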
Syntax Graphs
• The information in BNF and EBNF rules can be represented in a directed graph. Such graphs are called syntax graphs
• A separate graph is used for each syntactic unit
• Syntax graphs use different kinds of nodes to
represent the terminal and nonterminal symbols
of the right sides of a grammar's rules
– Rectangle nodes contain the names of syntactic units (nonterminals)
– Circles or ellipses contain terminal symbols
Example: The Ada if statement
[Figure: syntax graph for if_stmt, including a condition node and the closing end if ;]
Attribute Grammars
• CFGs cannot describe all of the syntax of programming languages
• Attribute grammars are additions to CFGs that carry some semantic information along through parse trees
• Primary value of attribute grammars :
– Static semantics specification
– Compiler design (static semantics checking)
• Example 3: If the end of an Ada subprogram is followed by a name, that name must match the name of the subprogram
• These problems exemplify the category of language rules called static semantics rules. They cannot be specified in BNF
Attribute Grammars - Basic Concepts
• Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can have values assigned to them
• Attribute computation functions (or semantic functions) are associated with grammar rules. They are used to specify how attribute values are computed
• Predicate functions, which state the static semantic rules of the language, are associated with grammar rules
Attribute Grammars - Definition
• An attribute grammar is a CFG G = (S, N, T, P) with the following additions:
– For each grammar symbol X there is a set A(X) of attribute values
– Each rule has a set of functions that define certain attributes of the nonterminals in the rule
– Each rule has a (possibly empty) set of predicate functions to check for attribute consistency
Attribute Grammars (cont.)
• The set A(X) consists of two disjoint sets:
– Synthesized attributes S(X): pass semantic information up a parse tree
– Inherited attributes I(X): pass semantic information down a parse tree
• Let X0 → X1 … Xn be a rule
– Functions of the form S(X0) = f(A(X1), …, A(Xn)) define synthesized attributes of X0
– Functions of the form I(Xj) = f(A(X0), …, A(Xn)), for 1 ≤ j ≤ n, define inherited attributes of Xj
Attribute Grammars (cont.)
• The value of an inherited attribute on a parse tree node depends on the attribute values of that node's parent node and those of its sibling nodes
– To avoid circularity, inherited attributes are often restricted to functions of the form:
I(Xj) = f(A(X0), …, A(Xj-1))
• Initially, there are intrinsic attributes on the leaves
Example: Synthesized attributes
Rules            Semantic functions
E → E1 + T       E.val = E1.val + T.val
Annotated parse tree for 3 * 4 + 5
[Figure: annotated parse tree; the subtree for 3 * 4 has val = 12, the leaf 5 has lexval = 5 and val = 5, and the root has val = 17]
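Synthesized attributes are computed bottom-up, which a post-order tree walk captures. The sketch below evaluates `val` for 3 * 4 + 5. Only the rule `E → E1 + T` appears on the slide; the rules `T → T1 * F` and `F → digit` in the comments, and the tuple encoding of the tree, are assumptions added to make the example complete.

```python
def val(node):
    """Compute the synthesized attribute val bottom-up."""
    kind = node[0]
    if kind == "digit":          # assumed rule F -> digit:   F.val = digit.lexval
        return node[1]
    if kind == "+":              # E -> E1 + T:               E.val = E1.val + T.val
        return val(node[1]) + val(node[2])
    if kind == "*":              # assumed rule T -> T1 * F:  T.val = T1.val * F.val
        return val(node[1]) * val(node[2])
    raise ValueError(f"unknown node kind {kind!r}")

# Tree for 3 * 4 + 5: the multiplication sits below the addition.
TREE_3_4_5 = ("+", ("*", ("digit", 3), ("digit", 4)), ("digit", 5))
```

Evaluating `val(TREE_3_4_5)` gives 17, and the subtree for 3 * 4 gives 12, matching the annotations in the parse tree above.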
Example: Inherited attributes
Rules            Semantic functions
D → T L          L.inh = T.type
T → int          T.type = integer
T → float        T.type = float
L → L1 , id      L1.inh = L.inh, id.type = L.inh
L → id           id.type = L.inh
[Figure: annotated parse tree]
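Inherited attributes flow downward: the type computed at `T` is handed to `L` and then to every `id` in the list. The sketch below mirrors the semantic functions in the table; the function names and the use of a dictionary to record each `id.type` are assumptions for illustration.

```python
def declare(type_keyword, ids):
    """Annotate D -> T L for a declaration such as `float x, y, z`."""
    t_type = {"int": "integer", "float": "float"}[type_keyword]   # T.type
    symbol_types = {}

    def visit_L(names, inh):
        # L -> L1 , id : L1.inh = L.inh and id.type = L.inh
        # L -> id      : id.type = L.inh
        *rest, last = names
        symbol_types[last] = inh
        if rest:
            visit_L(rest, inh)

    visit_L(list(ids), t_type)        # D -> T L : L.inh = T.type
    return symbol_types
```

For example, `declare("float", ["x", "y", "z"])` assigns the type `float` to all three identifiers.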
Attribute Grammars (cont.)
• A predicate function has the form of a Boolean expression on the attribute set {A(X0), …, A(Xn)}
• The only derivations allowed with an attribute grammar are those in which every predicate associated with every nonterminal is true
• A false predicate function value indicates a violation of the syntax or static semantics rules of the language
Example: Assignment Statement
• The only variable names are A, B, and C
• The RHS can be either a variable or an expression in the form of a variable added to another variable
• The variables can be one of two types: int or real
• When the operand types are not the same, the type of the expression is always real; when they are the same, the expression type is that of the operands
• The type of the LHS must match the type of the RHS: the assignment is valid only if the LHS and the value resulting from evaluating the RHS have the same type
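These rules can be sketched as one semantic function and one predicate function. The declared types in `DECLARED` are invented for the example (the slide does not say which variable has which type), and the function names are hypothetical.

```python
# Assumed declarations; the slide only says each variable is int or real.
DECLARED = {"A": "int", "B": "int", "C": "real"}

def expr_type(operands):
    """Synthesized type of the RHS: a single variable or var + var."""
    types = [DECLARED[name] for name in operands]
    if len(types) == 1:
        return types[0]
    left, right = types
    # Same operand types keep that type; mixed types always give real.
    return left if left == right else "real"

def assignment_is_valid(lhs, operands):
    """Predicate: the LHS type must equal the RHS expression type."""
    return DECLARED[lhs] == expr_type(operands)
```

Under these assumed declarations, `A = A + B` satisfies the predicate, while `A = B + C` does not because the RHS has type real.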