Unit 2 - Compiler Design
Subject Name: Compiler Design
Subject Code: CS-7002
Semester: 7th
1. INTRODUCTION TO SYNTAX ANALYSIS
The second stage of translation is called syntax analysis or parsing. In this phase, expressions, statements, declarations, etc., are identified by using the results of lexical analysis. Syntax analysis is aided by techniques based on the formal grammar of the programming language.
Where lexical analysis splits the input into tokens, the purpose of syntax analysis (also known as parsing) is to recombine these tokens: not back into a list of characters, but into something that reflects the structure of the text. This "something" is typically a data structure called the syntax tree of the text. As the name indicates, this is a tree structure. The leaves of this tree are the tokens found by the lexical analysis, and if the leaves are read from left to right, the sequence is the same as in the input text. Hence, what is important in the syntax tree is how these leaves are combined to form the structure of the tree and how the interior nodes of the tree are labeled. In addition to finding the structure of the input text, the syntax analysis must also reject invalid texts by reporting syntax errors.
As syntax analysis is less local in nature than lexical analysis, more advanced methods are required. We,
however, use the same basic strategy: A notation suitable for human understanding is transformed into a
machine-like low-level notation suitable for efficient execution. This process is called parser generation.
Figure 1: Syntax analyzer. The lexical analyzer (built from regular expressions / finite automata) produces a token stream that is consumed by the syntax analyzer (built from a context-free grammar).
A syntax analyzer or parser takes the input from a lexical analyzer in the form of a token stream. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.
In this way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a parse tree as the output of the phase.
1.1 CONTEXT FREE GRAMMAR
Definition - A context-free grammar (CFG) consists of a finite set of grammar rules and is a quadruple (N, T, P, S), where
N is a set of non-terminal symbols.
T is a set of terminals, where N ∩ T = NULL.
P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of a production rule does not have any right context or left context (it is a single non-terminal).
S is the start symbol.
Example
The grammar ({A}, {a, b, c}, P, A), P: A → aA, A → abc.
The grammar ({S}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε.
Generation of Derivation Tree
A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic information of a string derived from a context-free grammar.
Representation Technique
Root vertex - Must be labeled by the start symbol.
Vertex - Labeled by a non-terminal symbol.
Leaves - Labeled by a terminal symbol or ε.
If S → x1 x2 ... xn is a production rule in a CFG, then the parse tree / derivation tree will be as follows -
Figure 2: Derivation tree
There are two different approaches to draw a derivation tree -
Top-down Approach -
Starts with the starting symbol S
Goes down to tree leaves using productions
Bottom-up Approach -
Starts from tree leaves
Proceeds upward to the root which is the starting symbol S
Leftmost and Rightmost Derivation of a String
Leftmost derivation - A leftmost derivation is obtained by applying production to the leftmost variable in each
step.
Rightmost derivation - A rightmost derivation is obtained by applying production to the rightmost variable in each step.
Example
Let any set of production rules in a CFG be
X → X+X | X*X | X | a
over an alphabet {a}.
The leftmost derivation for the string "a+a*a" may be -
X → X+X → a+X → a+X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown below -
Figure: Stepwise derivation tree (leftmost derivation)
The rightmost derivation for the above string "a+a*a" may be -
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown below -
Figure: Stepwise derivation tree (rightmost derivation)
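To make the two derivations concrete, the small sketch below (Python, used here purely as illustration; the string encoding of sentential forms is an assumption, not part of the notes) replays them as rewriting of the leftmost and rightmost occurrence of X.

# Sketch: replaying the leftmost and rightmost derivations above as string
# rewriting with the grammar X -> X+X | X*X | X | a (illustrative encoding).

def derive(start, steps):
    """Apply (which, replacement) rewrite steps to the leftmost/rightmost X."""
    s = start
    print(s)
    for which, rhs in steps:
        i = s.index('X') if which == 'leftmost' else s.rindex('X')
        s = s[:i] + rhs + s[i + 1:]
        print('=>', s)
    return s

# Leftmost derivation of a+a*a
derive('X', [('leftmost', 'X+X'), ('leftmost', 'a'), ('leftmost', 'X*X'),
             ('leftmost', 'a'), ('leftmost', 'a')])
# Rightmost derivation of a+a*a
derive('X', [('rightmost', 'X*X'), ('rightmost', 'a'), ('rightmost', 'X+X'),
             ('rightmost', 'a'), ('rightmost', 'a')])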
2. PARSING
Syntax analyzers follow the production rules defined by means of a context-free grammar. The way the production rules are implemented (derivation) divides parsing into two types: top-down parsing and bottom-up parsing.
2.1 TOP-DOWN PARSING
A program that performs syntax analysis is called a parser. A syntax analyzer takes tokens as input and outputs error messages if the program syntax is wrong. The parser uses symbol lookahead and an approach called top-down parsing without backtracking. Top-down parsers check whether a string can be generated by a grammar by creating a parse tree starting from the initial symbol and working down. Bottom-up parsers, however, check whether a string can be generated from a grammar by creating a parse tree from the leaves and working up. Early parser generators such as YACC create bottom-up parsers, whereas many Java parser generators such as JavaCC create top-down parsers.
Figure 3: Classification of top-down parsing (Top-Down → Recursive Descent → Back-tracking / Non-back-tracking; non-back-tracking parsers include the Predictive Parser and the LL Parser).
2.2 BRUTE-FORCE APPROACH
A top-down parse moves from the goal symbol to a string of terminal symbols. In the terminology of trees,
this is moving from the root of the tree to a set of the leaves in the syntax tree for a program. In using full
backup we are willing to attempt to create a syntax tree by following branches until the correct set of
terminals is reached. In the worst possible case, that of trying to parse a string which is not in the language, all
possible combinations are attempted before the failure to parse is recognized.
Top-down parsing with full backup is a "brute-force" method of parsing. In general terms, this method
operates as follows:
1. Given a particular non-terminal that is to be expanded, the first production for this non-terminal is applied.
2. Then, within this newly expanded string, the next (leftmost) non-terminal is selected for expansion and its first production is applied.
3. This process (step 2) of applying productions is repeated for all subsequent non-terminals that are selected until such time as the process cannot or should not be continued. This termination (if it ever occurs) may be due to two causes. First, no more non-terminals may be present, in which case the string has been successfully parsed. Second, it may result from an incorrect expansion, which would be indicated by the production of a substring of terminals which does not match the appropriate segment of the source string. In the case of such an incorrect expansion, the process is "backed up" by undoing the most recently applied production. Instead of using the particular expansion that caused the error, the next production of this non-terminal is used as the next expansion, and then the process of production application continues as before.
If, on the other hand, no further productions are available to replace the production that caused the error,
this error-causing expansion is replaced by the non-terminal itself, and the process is backed up again to undo
the next most recently applied production. This backing up continues either until we are able to resume
normal application of productions to selected non-terminals or until we have backed up to the goal symbol
and there are no further productions to be tried. In the latter case, the given string must be unparsable
because it is not part of the language determined by this particular grammar.
As an example of this brute-force parsing technique, let us consider the simple grammar
S → aAd | aB
A → b | c
B → ccd | ddc
where S is the goal or start symbol. The working of this brute-force parsing technique can be illustrated by the sequence of syntax trees generated during the parse of the string 'accd'.
2.3 RECURSIVE DESCENT PARSING
Typically, top-down parsers are implemented as a set of recursive functions that descend through a parse tree for a string. This approach is known as recursive descent parsing, also known as LL(k) parsing, where the first L stands for left-to-right, the second L stands for leftmost derivation, and k indicates k-symbol lookahead.
Therefore, a parser using the single-symbol lookahead method and top-down parsing without backtracking is called an LL(1) parser. In the following sections, we will also use an extended BNF notation in which some regular expression operators are incorporated.
A syntax expression defines sentences of the form α, or α | β. A syntax of the form αβ defines sentences that consist of a sentence of the form α followed by a sentence of the form β. A syntax of the form [α] defines zero or one occurrence of the form α. A syntax of the form {α} defines zero or more occurrences of the form α.
A usual implementation of an LL(1) parser will:
• initialize its data structures,
• get the lookahead token by calling the scanner routines, and
• call the routine that implements the start symbol.
Here is an example.
proc syntaxAnalysis()
begin
  initialize();  // initialize global data and structures
  nextToken();   // get the lookahead token
  program();     // parser routine that implements the start symbol
end;
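As a concrete illustration of this organisation, the following sketch shows a minimal LL(1) recursive-descent parser in Python for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id that is used later in this unit. The token names and the Parser class are illustrative assumptions; the pseudocode above remains the notes' reference structure.

# Minimal sketch of an LL(1) recursive-descent parser for the grammar
#   E  -> T E'        E' -> + T E' | epsilon
#   T  -> F T'        T' -> * F T' | epsilon
#   F  -> ( E ) | id
# Token names ('id', '+', '*', '(', ')', '$') are assumptions for illustration.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ['$']   # initialize(): set up data structures
        self.pos = 0                         # the lookahead token is tokens[pos]

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead() == expected:
            self.pos += 1                    # consume the token and advance
        else:
            raise ParseError(f"expected {expected!r}, got {self.lookahead()!r}")

    # One procedure per non-terminal; each inspects a single lookahead symbol.
    def E(self):
        self.T(); self.E_prime()

    def E_prime(self):
        if self.lookahead() == '+':
            self.match('+'); self.T(); self.E_prime()
        # else: epsilon production, do nothing

    def T(self):
        self.F(); self.T_prime()

    def T_prime(self):
        if self.lookahead() == '*':
            self.match('*'); self.F(); self.T_prime()

    def F(self):
        if self.lookahead() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

    def parse(self):                         # routine for the start symbol
        self.E()
        self.match('$')                      # accept only if all input is consumed

Parser(['id', '+', 'id', '*', 'id']).parse()  # parses without raising an error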
Example of backtracking:
Consider the grammar G:
S → cAd
A → ab | a
and the input string w = cad.
The parse tree can be constructed using the following top-down approach:
Step 1:
Initially create a tree with a single node labeled S. An input pointer points to 'c', the first symbol of w. Expand the tree with the production of S.
Step 2:
The leftmost leaf 'c' matches the first symbol of w, so advance the input pointer to the second symbol of w, 'a', and consider the next leaf 'A'. Expand A using the first alternative.
Step 3:
The second symbol 'a' of w also matches the second leaf of the tree. So advance the input pointer to the third symbol of w, 'd'. But the third leaf of the tree is b, which does not match the input symbol d.
Hence discard the chosen production and reset the pointer to the second position. This is called backtracking.
Step 4:
Now try the second alternative for A.
Now we can halt and announce the successful completion of parsing.
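The steps above can be mirrored in a small sketch (Python, for illustration only). Note the simplification: A's alternatives are tried in order and the first local match is kept; a full backtracking parser would also re-try A's alternatives if the rest of S later failed.

# Sketch of top-down parsing with backtracking for
#   S -> c A d      A -> a b | a
# on the input w = "cad".

def parse_A(w, i):
    """Try each alternative for A in order; return the new position or None."""
    # First alternative: A -> a b
    if w[i:i + 2] == "ab":
        return i + 2
    # Backtrack: reset to position i and try the second alternative A -> a
    if w[i:i + 1] == "a":
        return i + 1
    return None

def parse_S(w):
    # S -> c A d
    if not w.startswith("c"):
        return False
    i = parse_A(w, 1)
    if i is None:
        return False
    return w[i:] == "d"

print(parse_S("cad"))   # True: A -> ab fails, the parser backs up and uses A -> a
print(parse_S("cabd"))  # True: A -> ab succeeds
print(parse_S("cbd"))   # False: neither alternative for A matches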
Note on recursive descent parsing:
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. Hence, elimination of left recursion must be done before parsing.
2.4 PREDICTIVE PARSING
Predictive parsing is a special case of recursive descent parsing where no backtracking is required.
The key problem of predictive parsing is to determine the production to be applied for a non-terminal when there are alternatives.
Non-recursive predictive parser
Figure 4: Model of a non-recursive predictive parser: an input buffer (ending in $), a stack, the parsing table M and the predictive parsing program that produces the output.
The table-driven predictive parser has an input buffer, a stack, a parsing table and an output stream.
Input buffer:
It contains the string to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack. Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where 'A' is a non-terminal and 'a' is a terminal.
Predictive parsing program:
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action. There are three possibilities:
If X = a = $, the parser halts and announces successful completion of parsing.
If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry.
If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by UVW (pushed so that U ends up on top).
If M[X, a] = error, the parser calls an error recovery routine.
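The three possibilities above translate directly into a table-driven parsing loop. The sketch below assumes the parsing table M of the expression grammar constructed in the next subsection, encoded as a Python dict (an empty list stands for an ε-production); this encoding is an illustrative assumption.

# Sketch of the table-driven predictive parsing loop described above.
# M[(non_terminal, lookahead)] gives the right-hand side to push; an empty
# list stands for the epsilon production.

M = {
    ('E',  'id'): ['T', "E'"],      ('E',  '('): ['T', "E'"],
    ("E'", '+'):  ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T',  'id'): ['F', "T'"],      ('T',  '('): ['F', "T'"],
    ("T'", '*'):  ['*', 'F', "T'"], ("T'", '+'): [], ("T'", ')'): [], ("T'", '$'): [],
    ('F',  'id'): ['id'],           ('F',  '('): ['(', 'E', ')'],
}
NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

def predictive_parse(tokens):
    stack = ['$', 'E']                      # $ marks the bottom, start symbol on top
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == '$' and a == '$':
            return True                     # X = a = $: successful completion
        if X not in NONTERMINALS:
            if X != a:
                return False                # terminal mismatch: syntax error
            stack.pop(); i += 1             # X = a != $: pop and advance the input
        else:
            rhs = M.get((X, a))
            if rhs is None:
                return False                # error entry in the table
            stack.pop()
            stack.extend(reversed(rhs))     # push the RHS so its first symbol is on top

print(predictive_parse(['id', '+', 'id', '*', 'id']))   # True
print(predictive_parse(['id', '+', '*', 'id']))         # False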
Predictive parsing table construction:
The construction of a predictive parser is aided by two functions associated with a grammar G
1. FIRST
2. FOLLOW
Rules for FIRST():
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production, where a is a terminal, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).
Rules for FOLLOW():
1. If S is the start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Example:
Consider the following grammar:
E → E + T | T
T → T * F | F
F → (E) | id
After eliminating left recursion, the grammar is:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id
FIRST():
FIRST(E) = {(, id}
FIRST(E') = {+, ε}
FIRST(T) = {(, id}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}
FOLLOW():
FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, $, )}
FOLLOW(T') = {+, $, )}
FOLLOW(F) = {+, *, $, )}
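A sketch of how these FIRST and FOLLOW sets can be computed by iterating the rules above to a fixed point is given below; the dict encoding of the grammar and the 'eps' marker are assumptions made for illustration.

# Sketch: computing FIRST and FOLLOW for the left-recursion-free grammar above.

GRAMMAR = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['eps']],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['eps']],
    'F':  [['(', 'E', ')'], ['id']],
}
START = 'E'
TERMINALS = {'+', '*', '(', ')', 'id'}

def first_of_string(symbols, first):
    """FIRST of a sentential form, using the FIRST sets computed so far."""
    result = set()
    for s in symbols:
        f = {s} if s in TERMINALS else first[s]
        result |= f - {'eps'}
        if 'eps' not in f:
            return result
    result.add('eps')
    return result

def compute_first_follow():
    first = {n: set() for n in GRAMMAR}
    follow = {n: set() for n in GRAMMAR}
    follow[START].add('$')                      # FOLLOW rule 1
    changed = True
    while changed:                              # iterate to a fixed point
        changed = False
        for A, productions in GRAMMAR.items():
            for rhs in productions:
                body = [s for s in rhs if s != 'eps']
                # FIRST rules: FIRST of the body (eps if the body can vanish)
                f = first_of_string(body, first) if body else {'eps'}
                if not f <= first[A]:
                    first[A] |= f; changed = True
                # FOLLOW rules 2 and 3 for each non-terminal B in the body
                for i, B in enumerate(body):
                    if B in TERMINALS:
                        continue
                    rest = body[i + 1:]
                    trailer = first_of_string(rest, first) if rest else {'eps'}
                    add = (trailer - {'eps'}) | (follow[A] if 'eps' in trailer else set())
                    if not add <= follow[B]:
                        follow[B] |= add; changed = True
    return first, follow

first, follow = compute_first_follow()
print(first["E'"])    # {'+', 'eps'}
print(follow['F'])    # {'+', '*', ')', '$'}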
Predictive parsing table:

NON-TERMINAL |    id    |     +      |     *      |    (     |    )    |    $
E            | E → TE'  |            |            | E → TE'  |         |
E'           |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T            | T → FT'  |            |            | T → FT'  |         |
T'           |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F            | F → id   |            |            | F → (E)  |         |

Table 1: Predictive parsing table
3. BOTTOM-UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.
Figure 5: Classification of bottom-up parsing (Bottom-Up → Shift-Reduce → LR Parsing → SLR Parser, LR Parser, LALR Parser).
3.1 SHIFT-REDUCE PARSING
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as the shift step and the reduce step.
• Shift step: The shift step refers to the advancement of the input pointer to the next input symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single node of the parse tree.
• Reduce step: When the parser finds a complete grammar rule (RHS) on top of the stack and replaces it with the LHS, it is known as a reduce step. This occurs when the top of the stack contains a handle. To reduce, a pop operation is performed on the stack which pops off the handle and replaces it with the LHS non-terminal symbol.
Example:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)          RIGHTMOST DERIVATION
abbcde  (A → b)               S ⇒ aABe
aAbcde  (A → Abc)             ⇒ aAde
aAde    (B → d)               ⇒ aAbcde
aABe    (S → aABe)            ⇒ abbcde

Table 2: Shift-reduce parser
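The shift and reduce moves behind Table 2 can be replayed as follows; the list of moves is written out by hand here (an illustrative assumption), standing in for the decisions a real bottom-up parser would take from its parsing table.

# Sketch that replays the shift/reduce moves behind Table 2 for the grammar
#   S -> aABe    A -> Abc | b    B -> d    on the input "abbcde".

def apply(moves, tokens):
    stack, i = ["$"], 0
    for move in moves:
        if move == "shift":
            stack.append(tokens[i]); i += 1
        else:                                  # ("reduce", lhs, rhs)
            _, lhs, rhs = move
            assert stack[-len(rhs):] == list(rhs), "handle not on top of the stack"
            del stack[-len(rhs):]              # pop the handle ...
            stack.append(lhs)                  # ... and push the LHS non-terminal
        print(f"{str(move):26} stack: {''.join(stack)}  rest: {''.join(tokens[i:])}")
    return stack == ["$", "S"] and i == len(tokens)

MOVES = [
    "shift",                        # a
    "shift",                        # b
    ("reduce", "A", "b"),           # abbcde -> aAbcde   (A -> b)
    "shift", "shift",               # b c
    ("reduce", "A", "Abc"),         # aAbcde -> aAde     (A -> Abc)
    "shift",                        # d
    ("reduce", "B", "d"),           # aAde   -> aABe     (B -> d)
    "shift",                        # e
    ("reduce", "S", "aABe"),        # aABe   -> S        (S -> aABe)
]

print(apply(MOVES, list("abbcde")))  # True: the input is reduced to the start symbol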
3.2 OPERATOR PRECEDENCE PARSING
Bottom-up parsers for a large class of context-free grammars can be easily developed using operator grammars. Operator grammars have the property that no production right side is empty or has two adjacent non-terminals. This property enables the implementation of efficient operator-precedence parsers. These parsers rely on the following three precedence relations:

Relation   Meaning
a <· b     a yields precedence to b
a =· b     a has the same precedence as b
a ·> b     a takes precedence over b

These operator precedence relations allow us to delimit the handles in the right sentential forms: <· marks the left end, =· appears in the interior of the handle, and ·> marks the right end.
Example: The input string
id1 + id2 * id3
after inserting precedence relations becomes
$ <· id1 ·> + <· id2 ·> * <· id3 ·> $
Having precedence relations allows us to identify handles as follows:
- scan the string from the left until seeing ·>
- scan the string backwards, from right to left, until seeing <·
- everything between the two relations <· and ·> forms the handle
        id    +     *     $
  id          ·>    ·>    ·>
  +     <·    ·>    <·    ·>
  *     <·    ·>    ·>    ·>
  $     <·    <·    <·

Table 3: Operator precedence parsing
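The following sketch inserts the relations of Table 3 between the tokens of id + id * id and locates the first handle by the left-to-right / right-to-left scan described above; the dict encoding of the precedence table is an illustrative assumption.

# Sketch: inserting the precedence relations of Table 3 and finding a handle.

PREC = {
    ('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
    ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
    ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
    ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<',
}

def annotate(tokens):
    """Return the input with <., =., .> relations inserted between tokens."""
    tokens = ['$'] + tokens + ['$']
    parts = []
    for left, right in zip(tokens, tokens[1:]):
        parts += [left, {'<': '<.', '=': '=.', '>': '.>'}[PREC[(left, right)]]]
    return ' '.join(parts + ['$'])

def first_handle(tokens):
    """Scan left to right for .>, then back for <.; the span between is the handle."""
    tokens = ['$'] + tokens + ['$']
    rels = [PREC[(l, r)] for l, r in zip(tokens, tokens[1:])]
    right = rels.index('>')                  # first .> from the left
    left = max(i for i in range(right + 1) if rels[i] == '<')
    return tokens[left + 1:right + 1]        # symbols strictly between <. and .>

print(annotate(['id', '+', 'id', '*', 'id']))
# $ <. id .> + <. id .> * <. id .> $
print(first_handle(['id', '+', 'id', '*', 'id']))   # ['id']  (the leftmost handle)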
3.3 LR PARSING INTRODUCTION
The "L" is for left-to-right scanning of the input and the "R
reverse.
is for constructing a rightmost derivation in
WHY LR PARSING:
LR parsers can be constructed to recognize virtually all programming-language constructs for which context-
free grammars can be written.
The LR parsing method is the most general non-backtracking shift-reduce parsing method known, yet it can be
implemented as efficiently as other shift-reduce methods.
The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Figure 6: Model of an LR parser: input buffer, stack, the parser driver and the parse table with its ACTION and GOTO parts.
The disadvantage is that it takes too much work to construct an LR parser by hand for a typical programming-language grammar. But there are many LR parser generators available to make this task easy.
There are three widely used algorithms available for constructing an LR parser:
SLR(1) - Simple LR parser:
• Works on the smallest class of grammars
• Few states, hence a very small table
• Simple and fast construction
LR(1) - LR parser:
• Works on the complete set of LR(1) grammars
• Generates a large table and a large number of states
• Slow construction
LALR(1) - Look-Ahead LR parser:
• Works on an intermediate size of grammar
• The number of states is the same as in SLR(1)
3.4 SLR(1) - SIMPLE LR PARSER
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves and working
up towards the root. In other words, it is a process of “reducing” (opposite of deriving a symbol using a
production rule) a string w to the start symbol of a grammar. At every (reduction) step, a particular substring
matching the RHS of a production rule is replaced by the symbol on the LHS of the production.
A general form of shift-reduce parsing is LR (scanning from Left to right and using Rightmost derivation in reverse) parsing, which is used in a number of automatic parser generators like Yacc, Bison, etc.
A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed. The symbol $ is used to mark the bottom of the stack and also the right end of the input.
Notationally, the top of the stack is identified through a separator symbol |, and the input string to be parsed appears on the right side of |. The stack content appears on the left of |.
For example, an intermediate stage of parsing can be shown as follows:
$id1 | + id2 * id3$     ... (1)
Here "$id1" is in the stack, while the input yet to be seen is "+ id2 * id3$".
In a shift-reduce parser, there are two fundamental operations: shift and reduce.
Shift operation: The next input symbol is shifted onto the top of the stack.
After shifting + onto the stack, the state captured in (1) above would change into:
$id1 + | id2 * id3$
Reduce operation: Replaces a set of grammar symbols on the top of the stack with the LHS of a production rule.
After reducing id1 using E → id, the state (1) would change into:
$E | + id2 * id3$
In every example, we introduce a new start symbol (S'), and define a new production from this new start
symbol to the original start symbol of the grammar.
Consider the following grammar (putting an explicit end-marker $ at the end of the first production):
(1) S' → S$
(2) S → Sa
(3) S → b
For this example, the NFA for the stack can be shown as follows:
Figure 7: NFA for the stack (one state per LR(0) item of the grammar)
After doing e-closure, the resulting DFA is as follows:
4
sos
sSisa |S
S38 .
uy
3
:
Figure 8: Reduce Operation
The states of the DFA are also called the "canonical collection of items". Using the above notation, the ACTION-GOTO table can be shown as follows:

State |  a  |  b  |      $       |  S
  1   |     | s3  |              | g2
  2   | s4  |     | r1 (accept)  |
  3   | r3  | r3  | r3           |
  4   | r2  | r2  | r2           |

Table 4: Simple LR parser
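Using Table 4, the standard LR driver loop can be sketched as follows; the dict encoding of ACTION and GOTO is an assumption for illustration, and the r1 entry of state 2 on $ is treated as the accept action, since production (1) is the augmented S' → S$ rule.

# Sketch of the LR driver loop running on the ACTION-GOTO table above.

ACTION = {
    (1, 'b'): ('shift', 3),
    (2, 'a'): ('shift', 4), (2, '$'): ('accept', None),
    (3, 'a'): ('reduce', 3), (3, 'b'): ('reduce', 3), (3, '$'): ('reduce', 3),
    (4, 'a'): ('reduce', 2), (4, 'b'): ('reduce', 2), (4, '$'): ('reduce', 2),
}
GOTO = {(1, 'S'): 2}
PRODUCTIONS = {2: ('S', 2),   # (2) S -> S a : pop 2 states, then GOTO on S
               3: ('S', 1)}   # (3) S -> b   : pop 1 state,  then GOTO on S

def lr_parse(tokens):
    stack = [1]                              # stack of DFA states, state 1 on top
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        action = ACTION.get((stack[-1], tokens[i]))
        if action is None:
            return False                     # empty table entry: syntax error
        kind, arg = action
        if kind == 'shift':
            stack.append(arg); i += 1        # push the new state, advance the input
        elif kind == 'reduce':
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]  # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                      # accept

print(lr_parse(['b', 'a', 'a']))   # True:  b, ba, baa are all in the language
print(lr_parse(['a', 'b']))        # False: no action for state 1 on 'a'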
3.5 CANONICAL LR PARSING
Carry extra information in the state so that wrong reductions by A → α will be ruled out.
Redefine LR items to include a terminal symbol as a second component (the lookahead symbol).
The general form of an item becomes [A → α.β, a], which is called an LR(1) item.
The item [A → α., a] calls for reduction only if the next input symbol is a. The set of such symbols a is always a subset of FOLLOW(A).
Canonical LR parsers solve this problem by storing extra information in the state itself. The problem we have with SLR parsers is that they perform a reduction even for those symbols of FOLLOW(A) for which it is invalid. So LR items are redefined to store one terminal (the lookahead symbol) along with the state, and thus the items now are LR(1) items.
An LR(1) item has the form [A → α.β, a], and reduction is done using this rule only if the next input is a. Clearly, the symbols a form a subset of FOLLOW(A).
Closure(I): to find the closure of a set of LR(1) items I for a canonical LR parser:
repeat
    for each item [A → α.Bβ, a] in I
        for each production B → γ in G'
            and for each terminal b in FIRST(βa)
                add the item [B → .γ, b] to I
until no more items can be added to I
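A Python sketch of this closure computation, for the grammar S' → S, S → CC, C → cC | d used in the following example, is given below. The tuple encoding of LR(1) items is an illustrative choice; since no non-terminal of this grammar derives ε, the FIRST helper ignores nullability.

# Sketch of closure() for LR(1) items. An item is (lhs, rhs, dot_position, lookahead).

GRAMMAR = {"S'": [['S']], 'S': [['C', 'C']], 'C': [['c', 'C'], ['d']]}
TERMINALS = {'c', 'd', '$'}

def first_of(symbols):
    """FIRST of a string of grammar symbols; no non-terminal here is nullable."""
    s = symbols[0]
    if s in TERMINALS:
        return {s}
    return {t for rhs in GRAMMAR[s] for t in first_of(rhs)}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, a) in list(items):
            if dot >= len(rhs) or rhs[dot] not in GRAMMAR:
                continue                        # dot is not in front of a non-terminal
            B, beta = rhs[dot], list(rhs[dot + 1:])
            for gamma in GRAMMAR[B]:
                for b in first_of(beta + [a]):  # lookaheads come from FIRST(beta a)
                    item = (B, tuple(gamma), 0, b)
                    if item not in items:
                        items.add(item); changed = True
    return items

I0 = closure({("S'", ('S',), 0, '$')})
for item in sorted(I0):
    print(item)
# Produces the six LR(1) items computed in the example that follows:
#   [S' -> .S, $], [S -> .CC, $], [C -> .cC, c/d], [C -> .d, c/d]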
Example
Consider the following grammar:
S' → S
S → CC
C → cC | d
Compute closure(I) where I = {[S' → .S, $]}:
S' → .S, $
S → .CC, $
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
For the given grammar:
S' → S
S → CC
C → cC | d
I = closure({[S' → .S, $]}):

S' → .S    $
S → .CC    $    as FIRST(ε $) = {$}
C → .cC    c    as FIRST(C $) = FIRST(C) = {c, d}
C → .cC    d    as FIRST(C $) = FIRST(C) = {c, d}
C → .d     c    as FIRST(C $) = FIRST(C) = {c, d}
C → .d     d    as FIRST(C $) = FIRST(C) = {c, d}

Table 5: Canonical LR parser (closure computation)
Example
Construct the sets of LR(1) items for the grammar given above.
I0:                 S' → .S, $
                    S → .CC, $
                    C → .cC, c/d
                    C → .d, c/d
I1 = goto(I0, S):   S' → S., $
I2 = goto(I0, C):   S → C.C, $
                    C → .cC, $
                    C → .d, $
I3 = goto(I0, c):   C → c.C, c/d
                    C → .cC, c/d
                    C → .d, c/d
I4 = goto(I0, d):   C → d., c/d
I5 = goto(I2, C):   S → CC., $
I6 = goto(I2, c):   C → c.C, $
                    C → .cC, $
                    C → .d, $
I7 = goto(I2, d):   C → d., $
I8 = goto(I3, C):   C → cC., c/d
I9 = goto(I6, C):   C → cC., $
To construct the sets of LR(1) items for the grammar given above, we begin by computing the closure of {[S' → .S, $]}.
To compute the closure we use the function given previously.
For the item [S' → .S, $] we have B = S, β = ε and a = $, and FIRST(ε $) = {$}, so we add the item [S → .CC, $].
For the item [S → .CC, $] we have A = S, α = ε, B = C, β = C and a = $. Now FIRST(C $) = FIRST(C) contains c and d, so we add the items [C → .cC, c], [C → .cC, d], [C → .d, c] and [C → .d, d].
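For completeness, a sketch of the goto function and of the construction of the whole canonical collection I0 ... I9, reusing GRAMMAR and closure() from the previous sketch, is given below.

# Sketch of goto and the item-set construction producing I0 ... I9 above.

def goto(items, X):
    """Move the dot over X in every item of I that has X after the dot, then close."""
    moved = {(lhs, rhs, dot + 1, a)
             for (lhs, rhs, dot, a) in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved)

def canonical_collection():
    symbols = ['S', 'C', 'c', 'd']                 # grammar symbols to try goto on
    start = frozenset(closure({("S'", ('S',), 0, '$')}))
    collection, worklist = [start], [start]
    while worklist:
        I = worklist.pop()
        for X in symbols:
            J = frozenset(goto(I, X))
            if J and J not in collection:
                collection.append(J); worklist.append(J)
    return collection

print(len(canonical_collection()))   # 10 item sets, matching I0 ... I9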
Types of translation
• L-attributed translation
It performs translation during parsing itself.
There is no need for explicit tree construction.
L represents 'left to right'.
• S-attributed translation
It is performed in connection with bottom-up parsing.
'S' represents synthesized.
Types of attributes
• Inherited attributes
An inherited attribute is defined by a semantic rule associated with the production at the parent of the node.
Attribute values are confined to the parent of the node, its siblings and the node itself.
The non-terminal concerned must be in the body of the production.
• Synthesized attributes
A synthesized attribute is defined by a semantic rule associated with the production at the node itself.
Attribute values are confined to the children of the node and the node itself.
The non-terminal concerned must be in the head of the production.
Terminals have synthesized attributes, which are the lexical values (denoted by lexval) generated by the lexical analyzer.
Syntax directed definition of a simple desk calculator

Production     Semantic rules
L → E          L.val = E.val
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val × F.val
T → F          T.val = F.val
F → (E)        F.val = E.val
F → digit      F.val = digit.lexval

Table 8: Syntax directed definition of a desk calculator
Syntax-directed definition with inherited attributes

Production     Semantic Rules
D → T L        L.inh = T.type
T → int        T.type = integer
L → L1 , id    L1.inh = L.inh; addType(id.entry, L.inh)
L → id         addType(id.entry, L.inh)

Table 9: Syntax directed definition with inherited attributes
• Symbol T is associated with a synthesized attribute type.
• Symbol L is associated with an inherited attribute inh.
Types of Syntax Directed Definitions
5.1 S-ATTRIBUTED DEFINITIONS
A syntax directed definition that involves only synthesized attributes is called S-attributed. The attribute value for the non-terminal at the head is computed from the attribute values of the symbols in the body of the production.
The attributes of an S-attributed SDD can be evaluated in bottom-up order over the nodes of the parse tree, i.e., by performing a post-order traversal of the parse tree and evaluating the attributes at a node when the traversal leaves that node for the last time.
Production     Semantic rules
L → E          L.val = E.val
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val × F.val
T → F          T.val = F.val
F → (E)        F.val = E.val
F → digit      F.val = digit.lexval

Table 10: S-attributed definitions
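The following sketch evaluates the S-attributed SDD of Table 10 by a post-order traversal of a parse tree; the ParseNode class and the hand-built tree for 3 * 5 + 4 are assumptions made purely for illustration.

# Sketch: post-order (bottom-up) evaluation of the desk-calculator SDD.

class ParseNode:
    def __init__(self, symbol, children=(), lexval=None):
        self.symbol = symbol            # grammar symbol labelling this node
        self.children = list(children)
        self.lexval = lexval            # set only for 'digit' leaves
        self.val = None                 # synthesized attribute

def evaluate(node):
    for child in node.children:         # post-order: evaluate the children first
        evaluate(child)
    kids = node.children
    if node.symbol == 'digit':
        node.val = node.lexval                        # F -> digit: F.val = digit.lexval
    elif not kids:
        return None                                   # operator leaf: no value
    elif len(kids) == 3 and kids[1].symbol == '+':
        node.val = kids[0].val + kids[2].val          # E -> E1 + T
    elif len(kids) == 3 and kids[1].symbol == '*':
        node.val = kids[0].val * kids[2].val          # T -> T1 * F
    elif len(kids) == 3:
        node.val = kids[1].val                        # F -> ( E )
    else:
        node.val = kids[0].val                        # L -> E, E -> T, T -> F, F -> digit
    return node.val

# Hand-built parse tree for 3 * 5 + 4  (E -> E + T, with E -> T -> T * F on the left)
digit = lambda n: ParseNode('F', [ParseNode('digit', lexval=n)])
tree = ParseNode('L', [ParseNode('E', [
    ParseNode('E', [ParseNode('T', [ParseNode('T', [digit(3)]),
                                    ParseNode('*'), digit(5)])]),
    ParseNode('+'),
    ParseNode('T', [digit(4)])])])
print(evaluate(tree))   # 19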
L-ATTRIBUTED DEFINITIONS
A syntax directed definition in which the edges of the dependency graph for the attributes in a production body can go from left to right but not from right to left is called an L-attributed definition. Attributes of L-attributed definitions may be either synthesized or inherited.
If an attribute is inherited, it must be computed from:
• the inherited attribute associated with the production head;
• an inherited or synthesized attribute associated with a symbol located to the left of the attribute being computed;
• an inherited or synthesized attribute associated with the symbol under consideration itself, in such a way that no cycles are formed in the dependency graph.
Production      Semantic Rules
T → F T'        T'.inh = F.val
T' → * F T1'    T1'.inh = T'.inh × F.val

Table 11: L-attributed definitions
In production 1, the inherited attribute T'.inh is computed from the value of F, which is to its left. In production 2, the inherited attribute T1'.inh is computed from T'.inh, associated with the head, and the value of F, which appears to its left in the production. That is, to compute an inherited attribute, the definition may use only information from above or from the left in the SDD.
Construction of Syntax Trees
One use of SDDs is the construction of syntax trees. A syntax tree is a condensed form of the parse tree.
Figure 10: A syntax tree is a condensed form of the parse tree.
Syntax trees are useful for representing programming language constructs like expressions and statements. They help compiler design by decoupling parsing from translation.
• Each node of a syntax tree represents a construct; the children of the node represent the meaningful components of the construct.
• e.g., a syntax-tree node representing an expression E1 + E2 has label + and two children representing the subexpressions E1 and E2.
• Each node is implemented by an object with a suitable number of fields; each object has an op field that is the label of the node, with additional fields as follows:
  - If the node is a leaf, an additional field holds the lexical value for the leaf. Such a node is created by the function Leaf(op, val).
  - If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. Such a node is created by the function Node(op, C1, C2, ..., Ck).
Example: The S-attributed definition below constructs syntax trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All non-terminals have one synthesized attribute, node, which represents a node of the syntax tree.
Production       Semantic Rules
1) E → E1 + T    E.node = new Node('+', E1.node, T.node)
2) E → E1 - T    E.node = new Node('-', E1.node, T.node)
3) E → T         E.node = T.node
4) T → ( E )     T.node = E.node
5) T → id        T.node = new Leaf(id, id.entry)
6) T → num       T.node = new Leaf(num, num.val)
The syntax tree for a - 4 + c constructed by the above SDD is shown below: the root is a + node whose right child is the leaf for id c; its left child is a - node whose children are the leaf for id a and the leaf for num 4; the id leaves point to the corresponding symbol-table entries.
Figure 11: Syntax tree for a - 4 + c
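A small sketch of the Leaf and Node construction functions, and of the bottom-up application of the rules above for a - 4 + c, is given below; representing symbol-table entries as plain strings is an illustrative assumption.

# Sketch of the Node/Leaf functions used by the SDD above.

class Leaf:
    def __init__(self, op, val):
        self.op, self.val = op, val            # op is 'id' or 'num'; val is entry/lexval

class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children  # op is the operator label, e.g. '+'

def show(n):
    if isinstance(n, Leaf):
        return f"{n.op}({n.val})"
    return f"{n.op}({', '.join(show(c) for c in n.children)})"

# Bottom-up application of the semantic rules for a - 4 + c:
t1 = Leaf('id', 'entry-a')            # rule 5: T.node = new Leaf(id, id.entry)
e1 = t1                               # rule 3: E.node = T.node
t2 = Leaf('num', 4)                   # rule 6: T.node = new Leaf(num, num.val)
e2 = Node('-', e1, t2)                # rule 2: E.node = new Node('-', E1.node, T.node)
t3 = Leaf('id', 'entry-c')            # rule 5
root = Node('+', e2, t3)              # rule 1: E.node = new Node('+', E1.node, T.node)

print(show(root))                     # +(-(id(entry-a), num(4)), id(entry-c))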
5.2 BOTTOM-UP EVALUATION OF SDT
Given an SDT, we can evaluate attributes even during bottom-up parsing. To carry out the semantic actions, the parser stack is extended with a semantic (value) stack. The operations performed on the semantic stack mirror those performed on the parser stack. Maintaining the semantic stack is very easy.
During a shift action, the parser pushes a grammar symbol on the parser stack, whereas the symbol's attribute value is pushed onto the semantic stack.
During a reduce action, the parser reduces the handle, whereas on the semantic stack the attributes are evaluated by the corresponding semantic action and the handle's values are replaced by the result.
For example, consider the SDT
A → XYZ   {A.a = f(X.x, Y.y, Z.z)}
Strictly speaking, the attributes are evaluated as follows:
A → XYZ   {val[ntop] := f(val[top-2], val[top-1], val[top]);}
Evaluation of Synthesized Attributes
• Whenever a token is shifted onto the stack, it is shifted along with its attribute value, placed in val[top].
• Just before a reduction takes place, the semantic rules are executed.
• If there is a synthesized attribute with the left-hand-side non-terminal, then carrying out the semantic rules will place the value of the synthesized attribute in val[ntop].
Let us understand this with an example:

E → E + T    {val[ntop] := val[top-2] + val[top];}
E → T        {val[top] := val[top];}
T → T * F    {val[ntop] := val[top-2] * val[top];}
T → F        {val[top] := val[top];}
F → (E)      {val[ntop] := val[top-1];}
F → num      {val[top] := num.lexval;}

Table 12: Bottom-up evaluation of synthesized attributes
A shift action pushes the token together with its attribute value; a reduce action (for example, by T → T * F) pops the attribute values of the handle and pushes the computed result. Along with bottom-up parsing, this is how attributes can be evaluated using shift and reduce actions.
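The value-stack mechanism can be sketched as follows for the input 3 * 5 + 4 and the SDT of Table 12; the hand-written list of shift/reduce moves is an assumption standing in for the decisions an LR parser would make.

# Sketch of the parser stack and the parallel value (semantic) stack.

def run(moves):
    symbols, val = [], []                     # parser stack and parallel value stack
    for move in moves:
        if move[0] == 'shift':
            _, sym, attr = move
            symbols.append(sym); val.append(attr)       # the token's value goes in val[top]
        else:
            _, lhs, length, action = move
            args = val[len(val) - length:]              # val[top-k] ... val[top]
            del symbols[len(symbols) - length:]; del val[len(val) - length:]
            symbols.append(lhs); val.append(action(args))
        print(symbols, val)
    return val[-1]

identity = lambda a: a[0]                     # E -> T, T -> F, F -> num style copy rules
MOVES = [
    ('shift', 'num', 3), ('reduce', 'F', 1, identity), ('reduce', 'T', 1, identity),
    ('shift', '*', None), ('shift', 'num', 5), ('reduce', 'F', 1, identity),
    ('reduce', 'T', 3, lambda a: a[0] * a[2]),          # T -> T * F
    ('reduce', 'E', 1, identity),
    ('shift', '+', None), ('shift', 'num', 4), ('reduce', 'F', 1, identity),
    ('reduce', 'T', 1, identity),
    ('reduce', 'E', 3, lambda a: a[0] + a[2]),          # E -> E + T
]
print(run(MOVES))   # 19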
5.3 L-ATTRIBUTED DEFINITION
L-attributed definitions allow both kinds of attributes, synthesized as well as inherited. But if an inherited attribute is present, there is a restriction: each inherited attribute is restricted to inherit either from the parent or from a left sibling only.
For example, consider the rule
A → XYZPQ
and assume that an inherited attribute 'i' is present with each non-terminal. Then
Z.i = f(A.i, X.i, Y.i) is allowed, but Z.i = f(P.i, Q.i) is wrong, as P and Q are right siblings of Z.
Semantic actions can be placed anywhere on the right-hand side.
Attributes are generally evaluated by traversing the parse tree depth first and left to right. It is possible to rewrite any L-attributed definition as an S-attributed definition.
L-attributed definition for converting infix to postfix form:
E → T E"
E" → + T #1 E" | ε
T → F T"
T" → * F #2 T" | ε
F → id #3
where #1 corresponds to printing the "+" operator, #2 corresponds to printing "*", and #3 corresponds to printing id.val.
Look at the above SDT: there are no attributes; it is an L-attributed definition, as the semantic actions are in between grammar symbols. This is a simple example of an L-attributed definition. Let us analyze this L-attributed definition and understand how to evaluate attributes with a depth-first, left-to-right traversal. Take the parse tree for the input string "a + b*c" and perform a depth-first, left-to-right traversal, i.e., at each node traverse the left subtree completely and then the right subtree completely. During the traversal, whenever any dummy non-terminal (action) is reached, carry out the translation.
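An illustrative sketch of this SDT as a recursive-descent translator that prints the postfix form while parsing is given below; the token list and the output handling are assumptions made for the example.

# Sketch of the infix-to-postfix SDT above, implemented by recursive descent.

def infix_to_postfix(tokens):
    out, pos = [], 0
    toks = list(tokens) + ['$']

    def F():
        nonlocal pos
        assert toks[pos] not in ('+', '*', '$'), "expected an operand"
        out.append(toks[pos]); pos += 1        # F -> id #3 : print id.val

    def T2():                                  # T" -> * F #2 T" | epsilon
        nonlocal pos
        if toks[pos] == '*':
            pos += 1; F(); out.append('*'); T2()

    def T():                                   # T -> F T"
        F(); T2()

    def E2():                                  # E" -> + T #1 E" | epsilon
        nonlocal pos
        if toks[pos] == '+':
            pos += 1; T(); out.append('+'); E2()

    T(); E2()                                  # E -> T E"
    return ' '.join(out)

print(infix_to_postfix(['a', '+', 'b', '*', 'c']))   # a b c * +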
Converting L-Attributed to S-Attributed Definition
Now that we understand that S-attributed is simple compared to L-attributed definition, let us see how to
convert an L-attributed to an equivalent S-attributed.
Consider an L-attributed definition with semantic actions in between the grammar symbols. Suppose we have an L-attributed definition as follows:
S → A {action} B
How do we convert it to an equivalent S-attributed definition? It is very simple.
Replace the embedded action by a marker non-terminal as follows:
S → A M B
M → ε {action}
Convert the following L-attributed definition to an equivalent S-attributed definition.
E → T E"
E" → + T #1 E" | ε
T → F T"
T" → * F #2 T" | ε
F → id #3
Table 13: L-attributed definition to be converted
Solution:
Replace the dummy non-terminals, that is, the actions, by marker non-terminals.
E → T E"
E" → + T A E" | ε
A → ε { print("+"); }
T → F T"
T" → * F B T" | ε
B → ε { print("*"); }
F → id { print(id.val); }
Table 14: Solution
6. YACC
YACC (Yet Another Compiler Compiler) is a tool for the automatic construction of LALR parsers.
Using Yacc
Yacc specifications are prepared in a file with the extension ".y", for example "test.y". Then run this file with the Yacc command as "$yacc test.y". This translates the Yacc specification into a C file with the default name "y.tab.c", where all the translation is placed inside a function called yyparse(). Now compile "y.tab.c" with a C compiler and test the program. The steps to be performed are given below:
Commands to execute
$yacc test.y
This gives the output "y.tab.c", which is a parser in C under a function named yyparse().
With the -v option ($yacc -v test.y), it also produces the file y.output, which gives complete information about the LALR parser, such as the DFA states, conflicts, the number of terminals used, etc.
$cc y.tab.c
$./a.out
Preparing the Yacc specification file
Every Yacc specification file consists of three sections: the declarations, the grammar rules, and the supporting subroutines. The sections are separated by double percent "%%" marks:
declarations
%%
translation rules
%%
supporting subroutines
The declarations section is optional. If there are no supporting subroutines, then the second %% can also be skipped; thus, the smallest legal Yacc specification is
%%
translation rules
Declarations section
The declarations part contains two types of declarations: Yacc declarations and C declarations. To distinguish between the two, C declarations are enclosed within %{ and %}. Here we can have C declarations like global variable declarations (int x = 1;), header file inclusions (#include ...), and macro definitions (#define ...). These may be used for defining subroutines in the last section or actions in the grammar rules.
Yacc declarations are nothing but tokens or terminals. We can define tokens with %token in the declarations part. For example, if "num" is a terminal in the grammar, then we define
%token num
in the declarations part. In the grammar rules, symbols within single quotes are also taken as terminals.
We can define the precedence and associativity of tokens in the declarations section. This is done using %left or %right, followed by a list of tokens. The tokens defined on the same line have the same precedence and associativity; the lines are listed in order of increasing precedence. Thus,
%left '+' '-'
%left '*' '/'
are used to define the associativity and precedence of the four basic arithmetic operators '+', '-', '*' and '/'. The operators '*' and '/' have higher precedence than '+' and '-', and all are left associative. The keyword %left is used to define left associativity and %right is used to define right associativity.
Translation rules section
This section is the heart of the Yacc specification file. Here we write the grammar. With each grammar rule, the user may associate actions to be performed each time the rule is recognized in the input process. These actions may return values and may obtain the values returned by previous actions. Moreover, the lexical analyzer can return values for tokens when a token is matched. An action is defined with a set of C statements. An action can do input and output, call subprograms, and alter external vectors and variables. The action normally sets the pseudo-variable "$$" to some value to return a value. For example, an action that does nothing but return the value 1 is { $$ = 1; }.
To obtain the values returned by previous actions, we use the pseudo-variables $1, $2, ..., which refer to the values returned by the grammar symbols on the right side of a rule, reading from left to right.
7. ANALYSIS OF SYNTAX DIRECTED DEFINITIONS
Syntax directed definitions are a generalization of context-free grammars in which:
1. Grammar symbols have an associated set of attributes;
2. Productions are associated with semantic rules for computing the values of the attributes.
• Such a formalism generates annotated parse trees, where each node of the tree is a record with a field for each attribute (e.g., X.a denotes the attribute a of the grammar symbol X).
• The value of an attribute of a grammar symbol at a given parse-tree node is defined by a semantic rule associated with the production used at that node.
• We distinguish between two kinds of attributes:
1. Synthesized attributes: They are computed from the values of the attributes of the children nodes.
2. Inherited attributes: They are computed from the values of the attributes of the siblings and the parent nodes.
Form of a Syntax Directed Definition
Each production, A → α, is associated with a set of semantic rules: b := f(c1, c2, ..., ck), where f is a function and either
• b is a synthesized attribute of A, and c1, c2, ..., ck are attributes of the grammar symbols of the production, or
• b is an inherited attribute of a grammar symbol in α, and c1, c2, ..., ck are attributes of grammar symbols in α or attributes of A.
We hope you find these notes useful.
You can get previous year question papers at https://qp.rgpvnotes.in.
If you have any queries or you want to submit your study notes, please write to us at rgpvnotes.in@gmail.com.