Unit 2 - Compiler Design
Subject Name: Compiler Design
Subject Code: CS-7002
Semester: 7th
1. INTRODUCTION TO SYNTAX ANALYSIS
The second stage of translation is called syntax analysis or parsing. In this phase, expressions, statements, declarations, etc., are identified by using the results of lexical analysis. Syntax analysis is aided by techniques based on the formal grammar of the programming language.
Where lexical analysis splits the input into tokens, the purpose of syntax analysis (also known as parsing) is to recombine these tokens: not back into a list of characters, but into something that reflects the structure of the text. This "something" is typically a data structure called the syntax tree of the text. As the name indicates, this is a tree structure. The leaves of this tree are the tokens found by the lexical analysis, and if the leaves are read from left to right, the sequence is the same as in the input text. Hence, what is important in the syntax tree is how these leaves are combined to form the structure of the tree and how the interior nodes of the tree are labeled. In addition to finding the structure of the input text, the syntax analysis must also reject invalid texts by reporting syntax errors.
As syntax analysis is less local in nature than lexical analysis, more advanced methods are required. We,
however, use the same basic strategy: A notation suitable for human understanding is transformed into a
machine-like low-level notation suitable for efficient execution. This process is called parser generation.
Figure 1: Syntax analyzer. The lexical analyzer (built from regular expressions / finite automata) produces a token stream that is consumed by the syntax analyzer (built from a context-free grammar).
A syntax analyzer or parser takes the input from a lexical analyzer in the form of a token stream. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.
In this way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a parse tree as the output of the phase.
1.1 CONTEXT FREE GRAMMAR
Definition - A context-free grammar (CFG) consists of a finite set of grammar rules and is a quadruple (N, T, P, S), where
N is a set of non-terminal symbols.
T is a set of terminals, where N ∩ T = NULL.
P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of a production rule does not have any right context or left context (it is a single non-terminal).
S is the start symbol.
Example
The grammar ({A}, {a, b, c}, P, A), P: A → aA, A → abc.
The grammar ({S}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε.
Generation of Derivation Tree
A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic information of a string derived from a context-free grammar.
Representation Technique
Root vertex - Must be labeled by the start symbol.
Vertex - Labeled by a non-terminal symbol.
Leaves - Labeled by a terminal symbol or ε.
If S → x1 x2 ... xn is a production rule in a CFG, then the parse tree / derivation tree will be as follows -
Figure 2: Derivation tree
There are two different approaches to draw a derivation tree -
Top-down Approach -
Starts with the starting symbol S
Goes down to tree leaves using productions
Bottom-up Approach -
Starts from tree leaves
Proceeds upward to the root which is the starting symbol S
Leftmost and Rightmost Derivation of a String
Leftmost derivation - A leftmost derivation is obtained by applying production to the leftmost variable in each
step.
Rightmost derivation - A rightmost derivation is obtained by applying production to the rightmost variable in each step.
Example
Let any set of production rules in a CFG be
X → X+X | X*X | X | a
over an alphabet {a}.
The leftmost derivation for the string "a+a*a" may be -
X → X+X → a+X → a+X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown below -
Figure: Stepwise derivation tree (leftmost derivation)
The rightmost derivation for the above string "a+a*a" may be -
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown below -
Figure: Stepwise derivation tree (rightmost derivation)
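To make the two derivations concrete, the small sketch below (Python, used here purely as illustration; the string encoding of sentential forms is an assumption, not part of the notes) replays them as rewriting of the leftmost and rightmost occurrence of X.

# Sketch: replaying the leftmost and rightmost derivations above as string
# rewriting with the grammar X -> X+X | X*X | X | a (illustrative encoding).

def derive(start, steps):
    """Apply (which, replacement) rewrite steps to the leftmost/rightmost X."""
    s = start
    print(s)
    for which, rhs in steps:
        i = s.index('X') if which == 'leftmost' else s.rindex('X')
        s = s[:i] + rhs + s[i + 1:]
        print('=>', s)
    return s

# Leftmost derivation of a+a*a
derive('X', [('leftmost', 'X+X'), ('leftmost', 'a'), ('leftmost', 'X*X'),
             ('leftmost', 'a'), ('leftmost', 'a')])
# Rightmost derivation of a+a*a
derive('X', [('rightmost', 'X*X'), ('rightmost', 'a'), ('rightmost', 'X+X'),
             ('rightmost', 'a'), ('rightmost', 'a')])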
2. PARSING
Syntax analyzers follow the production rules defined by means of a context-free grammar. The way the production rules are implemented (derivation) divides parsing into two types: top-down parsing and bottom-up parsing.
2.1 TOP-DOWN PARSING
A program that performs syntax analysis is called a parser. A syntax analyzer takes tokens as input and outputs error messages if the program syntax is wrong. The parser uses symbol lookahead and an approach called top-down parsing without backtracking. Top-down parsers check whether a string can be generated by a grammar by creating a parse tree starting from the initial symbol and working down. Bottom-up parsers, however, check whether a string can be generated from a grammar by creating a parse tree from the leaves and working up. Early parser generators such as YACC create bottom-up parsers, whereas many Java parser generators such as JavaCC create top-down parsers.
Figure 3: Classification of top-down parsing (Top-Down → Recursive Descent → Back-tracking / Non-back-tracking; non-back-tracking parsers include the Predictive Parser and the LL Parser).
2.2 BRUTE-FORCE APPROACH
A top-down parse moves from the goal symbol to a string of terminal symbols. In the terminology of trees,
this is moving from the root of the tree to a set of the leaves in the syntax tree for a program. In using full
backup we are willing to attempt to create a syntax tree by following branches until the correct set of
terminals is reached. In the worst possible case, that of trying to parse a string which is not in the language, all
possible combinations are attempted before the failure to parse is recognized.
Top-down parsing with full backup is a "brute-force" method of parsing. In general terms, this method
operates as follows:
1. Given a particular non-terminal that is to be expanded, the first production for this non-terminal is applied.
2. Then, within this newly expanded string, the next (leftmost) non-terminal is selected for expansion and its first production is applied.
3. This process (step 2) of applying productions is repeated for all subsequent non-terminals that are selected until such time as the process cannot or should not be continued. This termination (if it ever occurs) may be due to two causes. First, no more non-terminals may be present, in which case the string has been successfully parsed. Second, it may result from an incorrect expansion, which would be indicated by the production of a substring of terminals which does not match the appropriate segment of the source string. In the case of such an incorrect expansion, the process is "backed up" by undoing the most recently applied production. Instead of using the particular expansion that caused the error, the next production of this non-terminal is used as the next expansion, and then the process of production application continues as before.
If, on the other hand, no further productions are available to replace the production that caused the error,
this error-causing expansion is replaced by the non-terminal itself, and the process is backed up again to undo
the next most recently applied production. This backing up continues either until we are able to resume
normal application of productions to selected non-terminals or until we have backed up to the goal symbol
and there are no further productions to be tried. In the latter case, the given string must be unparsable
because it is not part of the language determined by this particular grammar.
As an example of this brute-force parsing technique, let us consider the simple grammar
S → aAd | aB
A → b | c
B → ccd | ddc
where S is the goal or start symbol. The working of this brute-force parsing technique can be illustrated by the sequence of syntax trees generated during the parse of the string 'accd'.
2.3 RECURSIVE DESCENT PARSING
Typically, top-down parsers are implemented as a set of recursive functions that descend through a parse tree for a string. This approach is known as recursive descent parsing, also known as LL(k) parsing, where the first L stands for left-to-right, the second L stands for leftmost derivation, and k indicates k-symbol lookahead.
Therefore, a parser using the single-symbol lookahead method and top-down parsing without backtracking is called an LL(1) parser. In the following sections, we will also use an extended BNF notation in which some regular expression operators are incorporated.
A syntax expression defines sentences of the form α, or α | β. A syntax of the form αβ defines sentences that consist of a sentence of the form α followed by a sentence of the form β. A syntax of the form [α] defines zero or one occurrence of the form α. A syntax of the form {α} defines zero or more occurrences of the form α.
A usual implementation of an LL(1) parser will:
• initialize its data structures,
• get the lookahead token by calling the scanner routines, and
• call the routine that implements the start symbol.
Here is an example.
proc syntaxAnalysis()
begin
  initialize();  // initialize global data and structures
  nextToken();   // get the lookahead token
  program();     // parser routine that implements the start symbol
end;
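As a concrete illustration of this organisation, the following sketch shows a minimal LL(1) recursive-descent parser in Python for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id that is used later in this unit. The token names and the Parser class are illustrative assumptions; the pseudocode above remains the notes' reference structure.

# Minimal sketch of an LL(1) recursive-descent parser for the grammar
#   E  -> T E'        E' -> + T E' | epsilon
#   T  -> F T'        T' -> * F T' | epsilon
#   F  -> ( E ) | id
# Token names ('id', '+', '*', '(', ')', '$') are assumptions for illustration.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ['$']   # initialize(): set up data structures
        self.pos = 0                         # the lookahead token is tokens[pos]

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead() == expected:
            self.pos += 1                    # consume the token and advance
        else:
            raise ParseError(f"expected {expected!r}, got {self.lookahead()!r}")

    # One procedure per non-terminal; each inspects a single lookahead symbol.
    def E(self):
        self.T(); self.E_prime()

    def E_prime(self):
        if self.lookahead() == '+':
            self.match('+'); self.T(); self.E_prime()
        # else: epsilon production, do nothing

    def T(self):
        self.F(); self.T_prime()

    def T_prime(self):
        if self.lookahead() == '*':
            self.match('*'); self.F(); self.T_prime()

    def F(self):
        if self.lookahead() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

    def parse(self):                         # routine for the start symbol
        self.E()
        self.match('$')                      # accept only if all input is consumed

Parser(['id', '+', 'id', '*', 'id']).parse()  # parses without raising an error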
Example of backtracking:
Consider the grammar G:
S → cAd
A → ab | a
and the input string w = cad.
The parse tree can be constructed using the following top-down approach:
Step 1:
Initially create a tree with a single node labeled S. An input pointer points to 'c', the first symbol of w. Expand the tree with the production of S.
Step 2:
The leftmost leaf 'c' matches the first symbol of w, so advance the input pointer to the second symbol of w, 'a', and consider the next leaf 'A'. Expand A using the first alternative.
Step 3:
The second symbol 'a' of w also matches the second leaf of the tree. So advance the input pointer to the third symbol of w, 'd'. But the third leaf of the tree is b, which does not match the input symbol d.
Hence discard the chosen production and reset the pointer to the second position. This is called backtracking.
Step 4:
Now try the second alternative for A.
Now we can halt and announce the successful completion of parsing.
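The steps above can be mirrored in a small sketch (Python, for illustration only). Note the simplification: A's alternatives are tried in order and the first local match is kept; a full backtracking parser would also re-try A's alternatives if the rest of S later failed.

# Sketch of top-down parsing with backtracking for
#   S -> c A d      A -> a b | a
# on the input w = "cad".

def parse_A(w, i):
    """Try each alternative for A in order; return the new position or None."""
    # First alternative: A -> a b
    if w[i:i + 2] == "ab":
        return i + 2
    # Backtrack: reset to position i and try the second alternative A -> a
    if w[i:i + 1] == "a":
        return i + 1
    return None

def parse_S(w):
    # S -> c A d
    if not w.startswith("c"):
        return False
    i = parse_A(w, 1)
    if i is None:
        return False
    return w[i:] == "d"

print(parse_S("cad"))   # True: A -> ab fails, the parser backs up and uses A -> a
print(parse_S("cabd"))  # True: A -> ab succeeds
print(parse_S("cbd"))   # False: neither alternative for A matches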
Note on recursive descent parsing:
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. Hence, elimination of left recursion must be done before parsing.
2.4 PREDICTIVE PARSING
Predictive parsing is a special case of recursive descent parsing where no backtracking is required.
The key problem of predictive parsing is to determine the production to be applied for a non-terminal when there are alternatives.
Non-recursive predictive parser
Figure 4: Model of a non-recursive predictive parser: an input buffer (ending in $), a stack, the parsing table M and the predictive parsing program that produces the output.
The table-driven predictive parser has an input buffer, a stack, a parsing table and an output stream.
Input buffer:
It contains the string to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack. Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where 'A' is a non-terminal and 'a' is a terminal.
Predictive parsing program:
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action. There are three possibilities:
If X = a = $, the parser halts and announces successful completion of parsing.
If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry.
If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by UVW (pushed so that U ends up on top).
If M[X, a] = error, the parser calls an error recovery routine.
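The three possibilities above translate directly into a table-driven parsing loop. The sketch below assumes the parsing table M of the expression grammar constructed in the next subsection, encoded as a Python dict (an empty list stands for an ε-production); this encoding is an illustrative assumption.

# Sketch of the table-driven predictive parsing loop described above.
# M[(non_terminal, lookahead)] gives the right-hand side to push; an empty
# list stands for the epsilon production.

M = {
    ('E',  'id'): ['T', "E'"],      ('E',  '('): ['T', "E'"],
    ("E'", '+'):  ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T',  'id'): ['F', "T'"],      ('T',  '('): ['F', "T'"],
    ("T'", '*'):  ['*', 'F', "T'"], ("T'", '+'): [], ("T'", ')'): [], ("T'", '$'): [],
    ('F',  'id'): ['id'],           ('F',  '('): ['(', 'E', ')'],
}
NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

def predictive_parse(tokens):
    stack = ['$', 'E']                      # $ marks the bottom, start symbol on top
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == '$' and a == '$':
            return True                     # X = a = $: successful completion
        if X not in NONTERMINALS:
            if X != a:
                return False                # terminal mismatch: syntax error
            stack.pop(); i += 1             # X = a != $: pop and advance the input
        else:
            rhs = M.get((X, a))
            if rhs is None:
                return False                # error entry in the table
            stack.pop()
            stack.extend(reversed(rhs))     # push the RHS so its first symbol is on top

print(predictive_parse(['id', '+', 'id', '*', 'id']))   # True
print(predictive_parse(['id', '+', '*', 'id']))         # False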
Predictive parsing table construction:
The construction of a predictive parser is aided by two functions associated with a grammar G
1. FIRST
2. FOLLOW
Rules for FIRST():
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production, where a is a terminal, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).
Rules for FOLLOW():
1. If S is the start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Example:
Consider the following grammar:
E → E + T | T
T → T * F | F
F → (E) | id
After eliminating left recursion, the grammar is:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id
FIRST():
FIRST(E) = {(, id}
FIRST(E') = {+, ε}
FIRST(T) = {(, id}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}
FOLLOW():
FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, $, )}
FOLLOW(T') = {+, $, )}
FOLLOW(F) = {+, *, $, )}
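A sketch of how these FIRST and FOLLOW sets can be computed by iterating the rules above to a fixed point is given below; the dict encoding of the grammar and the 'eps' marker are assumptions made for illustration.

# Sketch: computing FIRST and FOLLOW for the left-recursion-free grammar above.

GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['eps']],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['eps']],
    'F':  [['(', 'E', ')'], ['id']],
}
START = 'E'
TERMINALS = {'+', '*', '(', ')', 'id'}

def first_of_string(symbols, first):
    """FIRST of a sentential form, using the FIRST sets computed so far."""
    result = set()
    for s in symbols:
        f = {s} if s in TERMINALS else first[s]
        result |= f - {'eps'}
        if 'eps' not in f:
            return result
    result.add('eps')
    return result

def compute_first_follow():
    first = {n: set() for n in GRAMMAR}
    follow = {n: set() for n in GRAMMAR}
    follow[START].add('$')                      # FOLLOW rule 1
    changed = True
    while changed:                              # iterate to a fixed point
        changed = False
        for A, productions in GRAMMAR.items():
            for rhs in productions:
                body = [s for s in rhs if s != 'eps']
                # FIRST rules: FIRST of the body (eps if the body can vanish)
                f = first_of_string(body, first) if body else {'eps'}
                if not f <= first[A]:
                    first[A] |= f; changed = True
                # FOLLOW rules 2 and 3 for each non-terminal B in the body
                for i, B in enumerate(body):
                    if B in TERMINALS:
                        continue
                    rest = body[i + 1:]
                    trailer = first_of_string(rest, first) if rest else {'eps'}
                    add = (trailer - {'eps'}) | (follow[A] if 'eps' in trailer else set())
                    if not add <= follow[B]:
                        follow[B] |= add; changed = True
    return first, follow

first, follow = compute_first_follow()
print(first["E'"])    # {'+', 'eps'}
print(follow['F'])    # {'+', '*', ')', '$'}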
Predictive parsing table:

NON-TERMINAL |    id    |     +      |     *      |    (     |    )    |    $
E            | E → TE'  |            |            | E → TE'  |         |
E'           |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T            | T → FT'  |            |            | T → FT'  |         |
T'           |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F            | F → id   |            |            | F → (E)  |         |

Table 1: Predictive parsing table
3. BOTTOM-UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.
Figure 5: Classification of bottom-up parsing (Bottom-Up → Shift-Reduce → LR Parsing → SLR Parser, LR Parser, LALR Parser).
3.1 SHIFT-REDUCE PARSING
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as the shift step and the reduce step.
• Shift step: The shift step refers to the advancement of the input pointer to the next input symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single node of the parse tree.
• Reduce step: When the parser finds a complete grammar rule (RHS) on top of the stack and replaces it with the LHS, it is known as a reduce step. This occurs when the top of the stack contains a handle. To reduce, a pop operation is performed on the stack which pops off the handle and replaces it with the LHS non-terminal symbol.
Example:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)          RIGHTMOST DERIVATION
abbcde  (A → b)               S ⇒ aABe
aAbcde  (A → Abc)             ⇒ aAde
aAde    (B → d)               ⇒ aAbcde
aABe    (S → aABe)            ⇒ abbcde

Table 2: Shift-reduce parser
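The shift and reduce moves behind Table 2 can be replayed as follows; the list of moves is written out by hand here (an illustrative assumption), standing in for the decisions a real bottom-up parser would take from its parsing table.

# Sketch that replays the shift/reduce moves behind Table 2 for the grammar
#   S -> aABe    A -> Abc | b    B -> d    on the input "abbcde".

def apply(moves, tokens):
    stack, i = ["$"], 0
    for move in moves:
        if move == "shift":
            stack.append(tokens[i]); i += 1
        else:                                  # ("reduce", lhs, rhs)
            _, lhs, rhs = move
            assert stack[-len(rhs):] == list(rhs), "handle not on top of the stack"
            del stack[-len(rhs):]              # pop the handle ...
            stack.append(lhs)                  # ... and push the LHS non-terminal
        print(f"{str(move):26} stack: {''.join(stack)}  rest: {''.join(tokens[i:])}")
    return stack == ["$", "S"] and i == len(tokens)

MOVES = [
    "shift",                        # a
    "shift",                        # b
    ("reduce", "A", "b"),           # abbcde -> aAbcde   (A -> b)
    "shift", "shift",               # b c
    ("reduce", "A", "Abc"),         # aAbcde -> aAde     (A -> Abc)
    "shift",                        # d
    ("reduce", "B", "d"),           # aAde   -> aABe     (B -> d)
    "shift",                        # e
    ("reduce", "S", "aABe"),        # aABe   -> S        (S -> aABe)
]

print(apply(MOVES, list("abbcde")))  # True: the input is reduced to the start symbol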
3.2 OPERATOR PRECEDENCE PARSING
Bottom-up parsers for a large class of context-free grammars can be easily developed using operator grammars. Operator grammars have the property that no production right side is empty or has two adjacent non-terminals. This property enables the implementation of efficient operator-precedence parsers. These parsers rely on the following three precedence relations:

Relation   Meaning
a <· b     a yields precedence to b
a =· b     a has the same precedence as b
a ·> b     a takes precedence over b

These operator precedence relations allow us to delimit the handles in the right sentential forms: <· marks the left end, =· appears in the interior of the handle, and ·> marks the right end.
Example: The input string
id1 + id2 * id3
after inserting precedence relations becomes
$ <· id1 ·> + <· id2 ·> * <· id3 ·> $
Having precedence relations allows us to identify handles as follows:
- scan the string from the left until seeing ·>
- scan the string backwards, from right to left, until seeing <·
- everything between the two relations <· and ·> forms the handle
        id    +     *     $
  id          ·>    ·>    ·>
  +     <·    ·>    <·    ·>
  *     <·    ·>    ·>    ·>
  $     <·    <·    <·

Table 3: Operator precedence parsing
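The following sketch inserts the relations of Table 3 between the tokens of id + id * id and locates the first handle by the left-to-right / right-to-left scan described above; the dict encoding of the precedence table is an illustrative assumption.

# Sketch: inserting the precedence relations of Table 3 and finding a handle.

PREC = {
    ('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
    ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
    ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
    ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<',
}

def annotate(tokens):
    """Return the input with <., =., .> relations inserted between tokens."""
    tokens = ['$'] + tokens + ['$']
    parts = []
    for left, right in zip(tokens, tokens[1:]):
        parts += [left, {'<': '<.', '=': '=.', '>': '.>'}[PREC[(left, right)]]]
    return ' '.join(parts + ['$'])

def first_handle(tokens):
    """Scan left to right for .>, then back for <.; the span between is the handle."""
    tokens = ['$'] + tokens + ['$']
    rels = [PREC[(l, r)] for l, r in zip(tokens, tokens[1:])]
    right = rels.index('>')                  # first .> from the left
    left = max(i for i in range(right + 1) if rels[i] == '<')
    return tokens[left + 1:right + 1]        # symbols strictly between <. and .>

print(annotate(['id', '+', 'id', '*', 'id']))
# $ <. id .> + <. id .> * <. id .> $
print(first_handle(['id', '+', 'id', '*', 'id']))   # ['id']  (the leftmost handle)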
3.3 LR PARSING INTRODUCTION
The "L" is for left-to-right scanning of the input and the "R
reverse.
is for constructing a rightmost derivation in
WHY LR PARSING:
LR parsers can be constructed to recognize virtually all programming-language constructs for which context-
free grammars can be written.
The LR parsing method is the most general non-backtracking shift-reduce parsing method known, yet it can be
implemented as efficiently as other shift-reduce methods.
The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Figure 6: Model of an LR parser: input buffer, stack, the parser driver and the parse table with its ACTION and GOTO parts.
The disadvantage is that it takes too much work to construct an LR parser by hand for a typical programming-language grammar. But there are many LR parser generators available to make this task easy.
There are three widely used algorithms available for constructing an LR parser:
SLR(1) - Simple LR parser:
• Works on the smallest class of grammars
• Few states, hence a very small table
• Simple and fast construction
LR(1) - LR parser:
• Works on the complete set of LR(1) grammars
• Generates a large table and a large number of states
• Slow construction
LALR(1) - Look-Ahead LR parser:
• Works on an intermediate size of grammar
• The number of states is the same as in SLR(1)
3.4 SLR(1) - SIMPLE LR PARSER
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves and working
up towards the root. In other words, it is a process of “reducing” (opposite of deriving a symbol using a
production rule) a string w to the start symbol of a grammar. At every (reduction) step, a particular substring
matching the RHS of a production rule is replaced by the symbol on the LHS of the production.
A general form of shift-reduce parsing is LR (scanning from Left to right and using Rightmost derivation in reverse) parsing, which is used in a number of automatic parser generators like Yacc, Bison, etc.
A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed. The symbol $ is used to mark the bottom of the stack and also the right end of the input.
Notationally, the top of the stack is identified through a separator symbol |, and the input string to be parsed appears on the right side of |. The stack content appears on the left of |.
For example, an intermediate stage of parsing can be shown as follows:
$id1 | + id2 * id3$     ... (1)
Here "$id1" is in the stack, while the input yet to be seen is "+ id2 * id3$".
In a shift-reduce parser, there are two fundamental operations: shift and reduce.
Shift operation: The next input symbol is shifted onto the top of the stack.
After shifting + onto the stack, the state captured in (1) above would change into:
$id1 + | id2 * id3$
Reduce operation: Replaces a set of grammar symbols on the top of the stack with the LHS of a production rule.
After reducing id1 using E → id, the state (1) would change into:
$E | + id2 * id3$
In every example, we introduce a new start symbol (S'), and define a new production from this new start
symbol to the original start symbol of the grammar.
Consider the following grammar (putting an explicit end-marker $ at the end of the first production):
(1) S' → S$
(2) S → Sa
(3) S → b
For this example, the NFA for the stack can be shown as follows:
Figure 7: NFA for the stack (one state per LR(0) item of the grammar)
After doing e-closure, the resulting DFA is as follows:
4
sos
sSisa |S
S38 .
uy
3
:
Figure 8: Reduce Operation
The states of the DFA are also called the "canonical collection of items". Using the above notation, the ACTION-GOTO table can be shown as follows:

State |  a  |  b  |      $       |  S
  1   |     | s3  |              | g2
  2   | s4  |     | r1 (accept)  |
  3   | r3  | r3  | r3           |
  4   | r2  | r2  | r2           |

Table 4: Simple LR parser
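Using Table 4, the standard LR driver loop can be sketched as follows; the dict encoding of ACTION and GOTO is an assumption for illustration, and the r1 entry of state 2 on $ is treated as the accept action, since production (1) is the augmented S' → S$ rule.

# Sketch of the LR driver loop running on the ACTION-GOTO table above.

ACTION = {
    (1, 'b'): ('shift', 3),
    (2, 'a'): ('shift', 4), (2, '$'): ('accept', None),
    (3, 'a'): ('reduce', 3), (3, 'b'): ('reduce', 3), (3, '$'): ('reduce', 3),
    (4, 'a'): ('reduce', 2), (4, 'b'): ('reduce', 2), (4, '$'): ('reduce', 2),
}
GOTO = {(1, 'S'): 2}
PRODUCTIONS = {2: ('S', 2),   # (2) S -> S a : pop 2 states, then GOTO on S
               3: ('S', 1)}   # (3) S -> b   : pop 1 state,  then GOTO on S

def lr_parse(tokens):
    stack = [1]                              # stack of DFA states, state 1 on top
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        action = ACTION.get((stack[-1], tokens[i]))
        if action is None:
            return False                     # empty table entry: syntax error
        kind, arg = action
        if kind == 'shift':
            stack.append(arg); i += 1        # push the new state, advance the input
        elif kind == 'reduce':
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]  # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                      # accept

print(lr_parse(['b', 'a', 'a']))   # True:  b, ba, baa are all in the language
print(lr_parse(['a', 'b']))        # False: no action for state 1 on 'a'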
3.5 CANONICAL LR PARSING
Carry extra information in the state so that wrong reductions by A → α will be ruled out.
Redefine LR items to include a terminal symbol as a second component (the lookahead symbol).
The general form of an item becomes [A → α.β, a], which is called an LR(1) item.
The item [A → α., a] calls for reduction only if the next input symbol is a. The set of such symbols a is always a subset of FOLLOW(A).
Canonical LR parsers solve this problem by storing extra information in the state itself. The problem we have with SLR parsers is that they perform a reduction even for those symbols of FOLLOW(A) for which it is invalid. So LR items are redefined to store one terminal (the lookahead symbol) along with the state, and thus the items now are LR(1) items.
An LR(1) item has the form [A → α.β, a], and reduction is done using this rule only if the next input is a. Clearly, the symbols a form a subset of FOLLOW(A).
Closure(I): to find the closure of a set of LR(1) items I for a canonical LR parser:
repeat
    for each item [A → α.Bβ, a] in I
        for each production B → γ in G'
            and for each terminal b in FIRST(βa)
                add the item [B → .γ, b] to I
until no more items can be added to I
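A Python sketch of this closure computation, for the grammar S' → S, S → CC, C → cC | d used in the following example, is given below. The tuple encoding of LR(1) items is an illustrative choice; since no non-terminal of this grammar derives ε, the FIRST helper ignores nullability.

# Sketch of closure() for LR(1) items. An item is (lhs, rhs, dot_position, lookahead).

GRAMMAR = {"S'": [['S']], 'S': [['C', 'C']], 'C': [['c', 'C'], ['d']]}
TERMINALS = {'c', 'd', '$'}

def first_of(symbols):
    """FIRST of a string of grammar symbols; no non-terminal here is nullable."""
    s = symbols[0]
    if s in TERMINALS:
        return {s}
    return {t for rhs in GRAMMAR[s] for t in first_of(rhs)}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, a) in list(items):
            if dot >= len(rhs) or rhs[dot] not in GRAMMAR:
                continue                        # dot is not in front of a non-terminal
            B, beta = rhs[dot], list(rhs[dot + 1:])
            for gamma in GRAMMAR[B]:
                for b in first_of(beta + [a]):  # lookaheads come from FIRST(beta a)
                    item = (B, tuple(gamma), 0, b)
                    if item not in items:
                        items.add(item); changed = True
    return items

I0 = closure({("S'", ('S',), 0, '$')})
for item in sorted(I0):
    print(item)
# Produces the six LR(1) items computed in the example that follows:
#   [S' -> .S, $], [S -> .CC, $], [C -> .cC, c/d], [C -> .d, c/d]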
Example
Consider the following grammar:
S' → S
S → CC
C → cC | d
Compute closure(I) where I = {[S' → .S, $]}:
S' → .S, $
S → .CC, $
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
For the given grammar:
S' → S
S → CC
C → cC | d
I = closure({[S' → .S, $]}):

S' → .S    $
S → .CC    $    as FIRST(ε $) = {$}
C → .cC    c    as FIRST(C $) = FIRST(C) = {c, d}
C → .cC    d    as FIRST(C $) = FIRST(C) = {c, d}
C → .d     c    as FIRST(C $) = FIRST(C) = {c, d}
C → .d     d    as FIRST(C $) = FIRST(C) = {c, d}

Table 5: Canonical LR parser (closure computation)
Example
Construct the sets of LR(1) items for the grammar given above.
I0:                 S' → .S, $
                    S → .CC, $
                    C → .cC, c/d
                    C → .d, c/d
I1 = goto(I0, S):   S' → S., $
I2 = goto(I0, C):   S → C.C, $
                    C → .cC, $
                    C → .d, $
I3 = goto(I0, c):   C → c.C, c/d
                    C → .cC, c/d
                    C → .d, c/d
I4 = goto(I0, d):   C → d., c/d
I5 = goto(I2, C):   S → CC., $
I6 = goto(I2, c):   C → c.C, $
                    C → .cC, $
                    C → .d, $
I7 = goto(I2, d):   C → d., $
I8 = goto(I3, C):   C → cC., c/d
I9 = goto(I6, C):   C → cC., $
To construct the sets of LR(1) items for the grammar given above, we begin by computing the closure of {[S' → .S, $]}.
To compute the closure we use the function given previously.
For the item [S' → .S, $] we have B = S, β = ε and a = $, and FIRST(ε $) = {$}, so we add the item [S → .CC, $].
For the item [S → .CC, $] we have A = S, α = ε, B = C, β = C and a = $. Now FIRST(C $) = FIRST(C) contains c and d, so we add the items [C → .cC, c], [C → .cC, d], [C → .d, c] and [C → .d, d].
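For completeness, a sketch of the goto function and of the construction of the whole canonical collection I0 ... I9, reusing GRAMMAR and closure() from the previous sketch, is given below.

# Sketch of goto and the item-set construction producing I0 ... I9 above.

def goto(items, X):
    """Move the dot over X in every item of I that has X after the dot, then close."""
    moved = {(lhs, rhs, dot + 1, a)
             for (lhs, rhs, dot, a) in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved)

def canonical_collection():
    symbols = ['S', 'C', 'c', 'd']                 # grammar symbols to try goto on
    start = frozenset(closure({("S'", ('S',), 0, '$')}))
    collection, worklist = [start], [start]
    while worklist:
        I = worklist.pop()
        for X in symbols:
            J = frozenset(goto(I, X))
            if J and J not in collection:
                collection.append(J); worklist.append(J)
    return collection

print(len(canonical_collection()))   # 10 item sets, matching I0 ... I9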
Types of translation
• L-attributed translation
It performs translation during parsing itself.
There is no need for explicit tree construction.
L represents 'left to right'.
• S-attributed translation
It is performed in connection with bottom-up parsing.
'S' represents synthesized.
Types of attributes
• Inherited attributes
An inherited attribute is defined by a semantic rule associated with the production at the parent of the node.
Attribute values are confined to the parent of the node, its siblings and the node itself.
The non-terminal concerned must be in the body of the production.
• Synthesized attributes
A synthesized attribute is defined by a semantic rule associated with the production at the node itself.
Attribute values are confined to the children of the node and the node itself.
The non-terminal concerned must be in the head of the production.
Terminals have synthesized attributes, which are the lexical values (denoted by lexval) generated by the lexical analyzer.
Syntax directed definition of a simple desk calculator

Production     Semantic rules
L → E          L.val = E.val
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val × F.val
T → F          T.val = F.val
F → (E)        F.val = E.val
F → digit      F.val = digit.lexval

Table 8: Syntax directed definition of a desk calculator
Syntax-directed definition with inherited attributes

Production     Semantic Rules
D → T L        L.inh = T.type
T → int        T.type = integer
L → L1 , id    L1.inh = L.inh; addType(id.entry, L.inh)
L → id         addType(id.entry, L.inh)

Table 9: Syntax directed definition with inherited attributes
• Symbol T is associated with a synthesized attribute type.
• Symbol L is associated with an inherited attribute inh.
Types of Syntax Directed Definitions
5.1 S-ATTRIBUTED DEFINITIONS
A syntax directed definition that involves only synthesized attributes is called S-attributed. The attribute value for the non-terminal at the head is computed from the attribute values of the symbols in the body of the production.
The attributes of an S-attributed SDD can be evaluated in bottom-up order over the nodes of the parse tree, i.e., by performing a post-order traversal of the parse tree and evaluating the attributes at a node when the traversal leaves that node for the last time.
Production     Semantic rules
L → E          L.val = E.val
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val × F.val
T → F          T.val = F.val
F → (E)        F.val = E.val
F → digit      F.val = digit.lexval

Table 10: S-attributed definitions
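The following sketch evaluates the S-attributed SDD of Table 10 by a post-order traversal of a parse tree; the ParseNode class and the hand-built tree for 3 * 5 + 4 are assumptions made purely for illustration.

# Sketch: post-order (bottom-up) evaluation of the desk-calculator SDD.

class ParseNode:
    def __init__(self, symbol, children=(), lexval=None):
        self.symbol = symbol            # grammar symbol labelling this node
        self.children = list(children)
        self.lexval = lexval            # set only for 'digit' leaves
        self.val = None                 # synthesized attribute

def evaluate(node):
    for child in node.children:         # post-order: evaluate the children first
        evaluate(child)
    kids = node.children
    if node.symbol == 'digit':
        node.val = node.lexval                        # F -> digit: F.val = digit.lexval
    elif not kids:
        return None                                   # operator leaf: no value
    elif len(kids) == 3 and kids[1].symbol == '+':
        node.val = kids[0].val + kids[2].val          # E -> E1 + T
    elif len(kids) == 3 and kids[1].symbol == '*':
        node.val = kids[0].val * kids[2].val          # T -> T1 * F
    elif len(kids) == 3:
        node.val = kids[1].val                        # F -> ( E )
    else:
        node.val = kids[0].val                        # L -> E, E -> T, T -> F, F -> digit
    return node.val

# Hand-built parse tree for 3 * 5 + 4  (E -> E + T, with E -> T -> T * F on the left)
digit = lambda n: ParseNode('F', [ParseNode('digit', lexval=n)])
tree = ParseNode('L', [ParseNode('E', [
    ParseNode('E', [ParseNode('T', [ParseNode('T', [digit(3)]),
                                    ParseNode('*'), digit(5)])]),
    ParseNode('+'),
    ParseNode('T', [digit(4)])])])
print(evaluate(tree))   # 19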
L-ATTRIBUTED DEFINITIONS
A syntax directed definition in which the edges of the dependency graph for the attributes in a production body can go from left to right but not from right to left is called an L-attributed definition. Attributes of L-attributed definitions may be either synthesized or inherited.
If an attribute is inherited, it must be computed from:
• the inherited attribute associated with the production head;
• an inherited or synthesized attribute associated with a symbol located to the left of the attribute being computed;
• an inherited or synthesized attribute associated with the symbol under consideration itself, in such a way that no cycles are formed in the dependency graph.
Production      Semantic Rules
T → F T'        T'.inh = F.val
T' → * F T1'    T1'.inh = T'.inh × F.val

Table 11: L-attributed definitions
In production 1, the inherited attribute T'.inh is computed from the value of F, which is to its left. In production 2, the inherited attribute T1'.inh is computed from T'.inh, associated with the head, and the value of F, which appears to its left in the production. That is, to compute an inherited attribute, the definition may use only information from above or from the left in the SDD.
Construction of Syntax Trees
One use of SDDs is the construction of syntax trees. A syntax tree is a condensed form of the parse tree.
Figure 10: A syntax tree is a condensed form of the parse tree.
Syntax trees are useful for representing programming language constructs like expressions and statements. They help compiler design by decoupling parsing from translation.
• Each node of a syntax tree represents a construct; the children of the node represent the meaningful components of the construct.
• e.g., a syntax-tree node representing an expression E1 + E2 has label + and two children representing the subexpressions E1 and E2.
• Each node is implemented by an object with a suitable number of fields; each object has an op field that is the label of the node, with additional fields as follows:
  - If the node is a leaf, an additional field holds the lexical value for the leaf. Such a node is created by the function Leaf(op, val).
  - If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. Such a node is created by the function Node(op, C1, C2, ..., Ck).
Example: The S-attributed definition below constructs syntax trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All non-terminals have one synthesized attribute, node, which represents a node of the syntax tree.
Production       Semantic Rules
1) E → E1 + T    E.node = new Node('+', E1.node, T.node)
2) E → E1 - T    E.node = new Node('-', E1.node, T.node)
3) E → T         E.node = T.node
4) T → ( E )     T.node = E.node
5) T → id        T.node = new Leaf(id, id.entry)
6) T → num       T.node = new Leaf(num, num.val)
The syntax tree for a - 4 + c constructed by the above SDD is shown below: the root is a + node whose right child is the leaf for id c; its left child is a - node whose children are the leaf for id a and the leaf for num 4; the id leaves point to the corresponding symbol-table entries.
Figure 11: Syntax tree for a - 4 + c
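A small sketch of the Leaf and Node construction functions, and of the bottom-up application of the rules above for a - 4 + c, is given below; representing symbol-table entries as plain strings is an illustrative assumption.

# Sketch of the Node/Leaf functions used by the SDD above.

class Leaf:
    def __init__(self, op, val):
        self.op, self.val = op, val            # op is 'id' or 'num'; val is entry/lexval

class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children  # op is the operator label, e.g. '+'

def show(n):
    if isinstance(n, Leaf):
        return f"{n.op}({n.val})"
    return f"{n.op}({', '.join(show(c) for c in n.children)})"

# Bottom-up application of the semantic rules for a - 4 + c:
t1 = Leaf('id', 'entry-a')            # rule 5: T.node = new Leaf(id, id.entry)
e1 = t1                               # rule 3: E.node = T.node
t2 = Leaf('num', 4)                   # rule 6: T.node = new Leaf(num, num.val)
e2 = Node('-', e1, t2)                # rule 2: E.node = new Node('-', E1.node, T.node)
t3 = Leaf('id', 'entry-c')            # rule 5
root = Node('+', e2, t3)              # rule 1: E.node = new Node('+', E1.node, T.node)

print(show(root))                     # +(-(id(entry-a), num(4)), id(entry-c))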
5.2 BOTTOM-UP EVALUATION OF SDT
Given an SDT, we can evaluate attributes even during bottom-up parsing. To carry out the semantic actions, the parser stack is extended with a semantic (value) stack. The operations performed on the semantic stack mirror those performed on the parser stack. Maintaining the semantic stack is very easy.
During a shift action, the parser pushes a grammar symbol on the parser stack, whereas the symbol's attribute value is pushed onto the semantic stack.
During a reduce action, the parser reduces the handle, whereas on the semantic stack the attributes are evaluated by the corresponding semantic action and the handle's values are replaced by the result.
For example, consider the SDT
A → XYZ   {A.a = f(X.x, Y.y, Z.z)}
Strictly speaking, the attributes are evaluated as follows:
A → XYZ   {val[ntop] := f(val[top-2], val[top-1], val[top]);}
Evaluation of Synthesized Attributes
• Whenever a token is shifted onto the stack, it is shifted along with its attribute value, placed in val[top].
• Just before a reduction takes place, the semantic rules are executed.
• If there is a synthesized attribute with the left-hand-side non-terminal, then carrying out the semantic rules will place the value of the synthesized attribute in val[ntop].
Let us understand this with an example:

E → E + T    {val[ntop] := val[top-2] + val[top];}
E → T        {val[top] := val[top];}
T → T * F    {val[ntop] := val[top-2] * val[top];}
T → F        {val[top] := val[top];}
F → (E)      {val[ntop] := val[top-1];}
F → num      {val[top] := num.lexval;}

Table 12: Bottom-up evaluation of synthesized attributes
A shift action pushes the token together with its attribute value; a reduce action (for example, by T → T * F) pops the attribute values of the handle and pushes the computed result. Along with bottom-up parsing, this is how attributes can be evaluated using shift and reduce actions.
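The value-stack mechanism can be sketched as follows for the input 3 * 5 + 4 and the SDT of Table 12; the hand-written list of shift/reduce moves is an assumption standing in for the decisions an LR parser would make.

# Sketch of the parser stack and the parallel value (semantic) stack.

def run(moves):
    symbols, val = [], []                     # parser stack and parallel value stack
    for move in moves:
        if move[0] == 'shift':
            _, sym, attr = move
            symbols.append(sym); val.append(attr)       # the token's value goes in val[top]
        else:
            _, lhs, length, action = move
            args = val[len(val) - length:]              # val[top-k] ... val[top]
            del symbols[len(symbols) - length:]; del val[len(val) - length:]
            symbols.append(lhs); val.append(action(args))
        print(symbols, val)
    return val[-1]

identity = lambda a: a[0]                     # E -> T, T -> F, F -> num style copy rules
MOVES = [
    ('shift', 'num', 3), ('reduce', 'F', 1, identity), ('reduce', 'T', 1, identity),
    ('shift', '*', None), ('shift', 'num', 5), ('reduce', 'F', 1, identity),
    ('reduce', 'T', 3, lambda a: a[0] * a[2]),          # T -> T * F
    ('reduce', 'E', 1, identity),
    ('shift', '+', None), ('shift', 'num', 4), ('reduce', 'F', 1, identity),
    ('reduce', 'T', 1, identity),
    ('reduce', 'E', 3, lambda a: a[0] + a[2]),          # E -> E + T
]
print(run(MOVES))   # 19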
5.3 L-ATTRIBUTED DEFINITION
L-attributed definitions allow both kinds of attributes, synthesized as well as inherited. But if an inherited attribute is present, there is a restriction: each inherited attribute is restricted to inherit either from the parent or from a left sibling only.
For example, consider the rule
A → XYZPQ
and assume that an inherited attribute 'i' is present with each non-terminal. Then
Z.i = f(A.i, X.i, Y.i) is allowed, but Z.i = f(P.i, Q.i) is wrong, as P and Q are right siblings of Z.
Semantic actions can be placed anywhere on the right-hand side.
Attributes are generally evaluated by traversing the parse tree depth first and left to right. It is possible to rewrite any L-attributed definition as an S-attributed definition.
L-attributed definition for converting infix to postfix form:
E → T E"
E" → + T #1 E" | ε
T → F T"
T" → * F #2 T" | ε
F → id #3
where #1 corresponds to printing the "+" operator, #2 corresponds to printing "*", and #3 corresponds to printing id.val.
Look at the above SDT: there are no attributes; it is an L-attributed definition, as the semantic actions are in between grammar symbols. This is a simple example of an L-attributed definition. Let us analyze this L-attributed definition and understand how to evaluate attributes with a depth-first, left-to-right traversal. Take the parse tree for the input string "a + b*c" and perform a depth-first, left-to-right traversal, i.e., at each node traverse the left subtree completely and then the right subtree completely. During the traversal, whenever any dummy non-terminal (action) is reached, carry out the translation.
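An illustrative sketch of this SDT as a recursive-descent translator that prints the postfix form while parsing is given below; the token list and the output handling are assumptions made for the example.

# Sketch of the infix-to-postfix SDT above, implemented by recursive descent.

def infix_to_postfix(tokens):
    out, pos = [], 0
    toks = list(tokens) + ['$']

    def F():
        nonlocal pos
        assert toks[pos] not in ('+', '*', '$'), "expected an operand"
        out.append(toks[pos]); pos += 1        # F -> id #3 : print id.val

    def T2():                                  # T" -> * F #2 T" | epsilon
        nonlocal pos
        if toks[pos] == '*':
            pos += 1; F(); out.append('*'); T2()

    def T():                                   # T -> F T"
        F(); T2()

    def E2():                                  # E" -> + T #1 E" | epsilon
        nonlocal pos
        if toks[pos] == '+':
            pos += 1; T(); out.append('+'); E2()

    T(); E2()                                  # E -> T E"
    return ' '.join(out)

print(infix_to_postfix(['a', '+', 'b', '*', 'c']))   # a b c * +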
Converting L-Attributed to S-Attributed Definition
Now that we understand that S-attributed is simple compared to L-attributed definition, let us see how to
convert an L-attributed to an equivalent S-attributed.
Consider an L-attributed definition with semantic actions in between the grammar symbols. Suppose we have an L-attributed definition as follows:
S → A {action} B
How do we convert it to an equivalent S-attributed definition? It is very simple.
Replace the embedded action by a marker non-terminal as follows:
S → A M B
M → ε {action}
Convert the following L-attributed definition to an equivalent S-attributed definition.
E → T E"
E" → + T #1 E" | ε
T → F T"
T" → * F #2 T" | ε
F → id #3
Table 13: L-attributed definition to be converted
Solution:
Replace the dummy non-terminals, that is, the actions, by marker non-terminals.
E → T E"
E" → + T A E" | ε
A → ε { print("+"); }
T → F T"
T" → * F B T" | ε
B → ε { print("*"); }
F → id { print(id.val); }
Table 14: Solution
6. YACC
YACC (Yet Another Compiler Compiler) is a tool for the automatic construction of LALR parsers.
Using Yacc
Yacc specifications are prepared in a file with the extension ".y", for example "test.y". Then run this file with the Yacc command as "$yacc test.y". This translates the Yacc specification into a C file with the default name "y.tab.c", where all the translation is placed inside a function called yyparse(). Now compile "y.tab.c" with a C compiler and test the program. The steps to be performed are given below:
Commands to execute
$yacc test.y
This gives the output "y.tab.c", which is a parser in C under a function named yyparse().
With the -v option ($yacc -v test.y), it also produces the file y.output, which gives complete information about the LALR parser, such as the DFA states, conflicts, the number of terminals used, etc.
$cc y.tab.c
$./a.out
Preparing the Yacc specification file
Every Yacc specification file consists of three sections: the declarations, the grammar rules, and the supporting subroutines. The sections are separated by double percent "%%" marks:
declarations
%%
translation rules
%%
supporting subroutines
The declarations section is optional. If there are no supporting subroutines, then the second %% can also be skipped; thus, the smallest legal Yacc specification is
%%
translation rules
Declarations section
The declarations part contains two types of declarations: Yacc declarations and C declarations. To distinguish between the two, C declarations are enclosed within %{ and %}. Here we can have C declarations like global variable declarations (int x = 1;), header file inclusions (#include ...), and macro definitions (#define ...). These may be used for defining subroutines in the last section or actions in the grammar rules.
Yacc declarations are nothing but tokens or terminals. We can define tokens with %token in the declarations part. For example, if "num" is a terminal in the grammar, then we define
%token num
in the declarations part. In the grammar rules, symbols within single quotes are also taken as terminals.
We can define the precedence and associativity of tokens in the declarations section. This is done using %left or %right, followed by a list of tokens. The tokens defined on the same line have the same precedence and associativity; the lines are listed in order of increasing precedence. Thus,
%left '+' '-'
%left '*' '/'
are used to define the associativity and precedence of the four basic arithmetic operators '+', '-', '*' and '/'. The operators '*' and '/' have higher precedence than '+' and '-', and all are left associative. The keyword %left is used to define left associativity and %right is used to define right associativity.
Translation rules section
This section is the heart of the Yacc specification file. Here we write the grammar. With each grammar rule, the user may associate actions to be performed each time the rule is recognized in the input process. These actions may return values and may obtain the values returned by previous actions. Moreover, the lexical analyzer can return values for tokens when a token is matched. An action is defined with a set of C statements. An action can do input and output, call subprograms, and alter external vectors and variables. The action normally sets the pseudo-variable "$$" to some value to return a value. For example, an action that does nothing but return the value 1 is { $$ = 1; }.
To obtain the values returned by previous actions, we use the pseudo-variables $1, $2, ..., which refer to the values returned by the grammar symbols on the right side of a rule, reading from left to right.
7. ANALYSIS OF SYNTAX DIRECTED DEFINITIONS
Syntax directed definitions are a generalization of context-free grammars in which:
1. Grammar symbols have an associated set of attributes;
2. Productions are associated with semantic rules for computing the values of the attributes.
• Such a formalism generates annotated parse trees, where each node of the tree is a record with a field for each attribute (e.g., X.a denotes the attribute a of the grammar symbol X).
• The value of an attribute of a grammar symbol at a given parse-tree node is defined by a semantic rule associated with the production used at that node.
• We distinguish between two kinds of attributes:
1. Synthesized attributes: They are computed from the values of the attributes of the children nodes.
2. Inherited attributes: They are computed from the values of the attributes of the siblings and the parent nodes.
Form of a Syntax Directed Definition
Each production, A → α, is associated with a set of semantic rules: b := f(c1, c2, ..., ck), where f is a function and either
• b is a synthesized attribute of A, and c1, c2, ..., ck are attributes of the grammar symbols of the production, or
• b is an inherited attribute of a grammar symbol in α, and c1, c2, ..., ck are attributes of grammar symbols in α or attributes of A.
We hope you find these notes useful.
You can get previous year question papers at https://qp.rgpvnotes.in.
If you have any queries or you want to submit your study notes, please write to us at rgpvnotes.in@gmail.com.