KEMBAR78
Compiler Design Assignment Lexical Analysis: 30/08/2021 Neha Vijay Khairnar 191081036 IT | PDF | Parsing | Compiler
0% found this document useful (0 votes)
115 views8 pages

Compiler Design Assignment Lexical Analysis: 30/08/2021 Neha Vijay Khairnar 191081036 IT

Neha Vijay Khairnar wrote a LEX code and YACC code to detect tokens and syntax in a compiler design assignment. The LEX code detects datatypes, variable names, operators, numbers, and other tokens. The YACC code detects the syntax of variable declarations with different data types like int, float, char, and boolean. Both codes were tested and able to correctly identify syntax in sample inputs. Overall, the problem statement of writing LEX and YACC codes for syntax analysis was achieved.

Uploaded by

Krishna Ashar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
115 views8 pages

Compiler Design Assignment Lexical Analysis: 30/08/2021 Neha Vijay Khairnar 191081036 IT

Neha Vijay Khairnar wrote a LEX code and YACC code to detect tokens and syntax in a compiler design assignment. The LEX code detects datatypes, variable names, operators, numbers, and other tokens. The YACC code detects the syntax of variable declarations with different data types like int, float, char, and boolean. Both codes were tested and able to correctly identify syntax in sample inputs. Overall, the problem statement of writing LEX and YACC codes for syntax analysis was achieved.

Uploaded by

Krishna Ashar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 8

Compiler Design Assignment

Lexical Analysis
Submission Date- 30/08/2021
Name- Neha Vijay Khairnar
ID-191081036
Branch- IT

Aim- To write a LEX code and YACC code to do syntax analysis.

Problem Statement-
 In LEX code detect the tokens (datatype, variable name, equal to operator,
comma, number, Semicolon).
 In YACC code detect the syntax of variable declaration and definition.

Theory-
 LEX code-
 Lex is a computer program that generates lexical analyzers and was written by
Mike Lesk and Eric Schmidt.
 Lex reads an input stream specifying the lexical analyzer and outputs source
code implementing the lexer in the C programming language.
 The extension used for a lex code file is .l for example- sum.l and after
compilation of the lex code file a file lex.yy.c is generated which is basically
conversion of lex code into c code.
 Compilation of lex code steps/sequence of commands-
1. flex lex_code_filename
2. gcc lex.yy.c or cc lex.yy.c -lfl
3. ./a.out
 FLEX (fast lexical analyzer generator) is a tool/computer program for
generating lexical analyzers.
 The function yylex() is automatically generated by the flex when it is provided
with a .l file and this yylex() function is expected by parser to call to retrieve
tokens from current/this token stream. 
 Steps for working of flex-
The figure below shows the steps followed by flex-

Step 1: An input file describes the lexical analyzer to be generated named lex.l is
written in lex language. The lex compiler transforms lex.l to C program, in a file
that is always named lex.yy.c. 
Step 2: The C complier compile lex.yy.c file into an executable file called a.out. 
Step 3: The output file a.out take a stream of input characters and produce a
stream of tokens. 

 Program Structure-
In the input file, there are 3 sections: 
1) Definition Section: The definition section contains the declaration of variables,
regular definitions, manifest constants. In the definition section, text is enclosed
in “%{ %}” brackets. Anything written in this brackets is copied directly to the
file lex.yy.c
Syntax-
%{
//definitions
%}

2) Rules Section: The rules section contains a series of rules in the form: pattern


action and pattern must be unintended and action begin on the same line in {}
brackets. The rule section is enclosed in “%% %%”.
Syntax-
%%
Rules/Pattern actions
%%

3) User Code Section: This section contains C statements and additional


functions. We can also compile these functions separately and load with the
lexical analyzer.
 Basic Program structure of LEX code-
%{
// Definitions
%}

%%
Rules
%%

User code section

 YACC code -
 YACC is a parse generator. A parser generator is a program that takes as input a
specification of a syntax, and produces as output a procedure for recognizing that
language. 

 YACC stands for Yet Another Compiler Compiler.


 A YACC code file has the extension .y example- sum.y
 The input of YACC is the rule or grammar and the output is a C program.
  YACC was originally designed for being complemented by Lex.

Input: A yacc code file file.y

Output: A parser y.tab.c (yacc)

o The output file "file.output" contains the parsing tables.


o The file "file.tab.h" contains declarations.
o The parser called the yyparse ().
o Parser expects to use a function called yylex () to get tokens.
o If called with the –d option in the command line, Yacc produces as output a
header file y.tab.h with all its specific definition.
o If called with the –v option, Yacc produces as output a
file y.output containing a textual description of the LALR(1) parsing table
used by the parser. This is useful for tracking down how the parser solves
conflicts.

 YACC input file is divided in three parts-


/* definitions */
....

%%
/* rules */
....
%%

/* auxiliary routines */
....

1) Definition part-
 The definition part includes information about the tokens used in the syntax
definition: 
 Syntax-
%token NUMBER
%token ID
 It can also include the specification of the starting symbol in the grammar: 
%start nonterminal
 Yacc automatically assigns numbers for tokens, but it can be overridden by 
%token NUMBER 621

2) Rule part-
 The rules part contains grammar definition in a modified BNF form. 
 Actions is C code in { } and can be embedded inside.
3) Auxiliary Routine Part-
 The auxiliary routines part is only C code.
 It includes function definitions for every function needed in rules part.
 It can also contain the main() function definition if the parser is going to be
run as a program. 
 
 The main() function must call the function yyparse(). 

 For Compiling YACC Program: 


 
1. Write lex program in a file file.l and yacc in a file file.y 
 
2. Open Terminal and Navigate to the Directory where you have saved the files. 
 
3. type lex file.l  / flex file.l / flex file.l -lfl
 
4. type yacc file.y / yacc -d file.y 
 
5. type cc lex.yy.c y.tab.h -ll / gcc lex.yy.c y.tab.c
 
6. type ./a.out 

Solution-
LEX code to detect the token –
%{
#include <stdio.h>
#include "y.tab.h"

%}

DIGIT [0-9]
REAL {DIGIT}+[.]{DIGIT}*
LETTER [A-Za-z]
ASSIGN =
%%
[\t ] ;
int {printf("%s\tDataType\n",yytext);return (INT);}
float {printf("%s\t DataType\n",yytext);return (FLOAT);}
char {printf("%s\tDataType\n",yytext);return (CHAR);}
boolean {printf("%s\t DataType\n",yytext);return (BOOLEAN);}
true|false { printf("%s\tBOOLEAN VALue\n",yytext);return (BOOLEANVALUE);}
['][^\t\n]['] { printf("%s\tCHAR VALUE\n",yytext);return CHARVALUE;}
[a-zA-z]+[a-zA-z0-9_]* {printf("%s\tVariable Name\n",yytext);return ID;}
{REAL} { printf("%s\tNUMBER\n",yytext);return REAL;}
{DIGIT}+ { printf("%s\tNUMBER\n",yytext);return NUM;}
"," {printf("%s\tSpecial Symbol\n",yytext);return COMMA;}
";" {printf("%s\tSpecial Symbol\n",yytext);;return SC;}
{ASSIGN} {printf("%s\tOperator\n",yytext);return AS;}
\n return NL;
.;

%%

int yywrap()
{
return 1;
}

YACC code detect the syntax of variable declaration and definition.


%{
#include<stdio.h>
void yyerror(char*);
int yylex();

%}
%token ID SC INT CHAR FLOAT BOOLEAN BOOLEANVALUE CHARVALUE REAL
AS NUM COMMA NL

%%
s: type1|type2|type3|type4
;
type1:INT varlist SC NL { printf("Syntax is Correct\n"); return 0;}
;
type2:FLOAT varlist2 SC NL{ printf("Syntax is Correct\n");return 0;}
;
type3:CHAR varlist3 SC NL{ printf("Syntax is Correct\n");return 0;}
;
type4:BOOLEAN varlist4 SC NL{ printf("Syntax is Correct\n");return 0;}
;
varlist: ID | ID COMMA varlist | ID AS NUM |ID AS NUM COMMA varlist | //THIS IS FOR
EPSILON CASE (EMPTY)
;
varlist2: ID | ID COMMA varlist2 | ID AS REAL |ID AS REAL COMMA varlist2 |
;
varlist3: ID | ID COMMA varlist3 | ID AS CHARVALUE | ID AS CHARVALUE COMMA
varlist3 |
;
varlist4: ID | ID COMMA varlist4 | ID AS BOOLEANVALUE | AS BOOLEANVALUE
COMMA varlist4 |
;
%%
void yyerror(char *s )
{

fprintf(stderr, "Syntax is not correct \n");


}

int main()
{
yyparse();

return 0;
}

Output-
Input-  int a=10

Input- int a=10,b;

Conclusion- Hence we successfully wrote a LEX code and YACC code to do syntax
analysis and solved the given problem statement.

You might also like