Compiler Design Assignment
Lexical Analysis
Submission Date- 30/08/2021
Name- Neha Vijay Khairnar
ID-191081036
Branch- IT
Aim- To write a LEX code and YACC code to do syntax analysis.
Problem Statement-
In LEX code detect the tokens (datatype, variable name, equal to operator,
comma, number, Semicolon).
In YACC code detect the syntax of variable declaration and definition.
Theory-
LEX code-
Lex is a computer program that generates lexical analyzers and was written by
Mike Lesk and Eric Schmidt.
Lex reads an input stream specifying the lexical analyzer and outputs source
code implementing the lexer in the C programming language.
The extension used for a lex code file is .l for example- sum.l and after
compilation of the lex code file a file lex.yy.c is generated which is basically
conversion of lex code into c code.
Compilation of lex code steps/sequence of commands-
1. flex lex_code_filename
2. gcc lex.yy.c or cc lex.yy.c -lfl
3. ./a.out
FLEX (fast lexical analyzer generator) is a tool/computer program for
generating lexical analyzers.
The function yylex() is automatically generated by the flex when it is provided
with a .l file and this yylex() function is expected by parser to call to retrieve
tokens from current/this token stream.
Steps for working of flex-
The figure below shows the steps followed by flex-
Step 1: An input file describes the lexical analyzer to be generated named lex.l is
written in lex language. The lex compiler transforms lex.l to C program, in a file
that is always named lex.yy.c.
Step 2: The C complier compile lex.yy.c file into an executable file called a.out.
Step 3: The output file a.out take a stream of input characters and produce a
stream of tokens.
Program Structure-
In the input file, there are 3 sections:
1) Definition Section: The definition section contains the declaration of variables,
regular definitions, manifest constants. In the definition section, text is enclosed
in “%{ %}” brackets. Anything written in this brackets is copied directly to the
file lex.yy.c
Syntax-
%{
//definitions
%}
2) Rules Section: The rules section contains a series of rules in the form: pattern
action and pattern must be unintended and action begin on the same line in {}
brackets. The rule section is enclosed in “%% %%”.
Syntax-
%%
Rules/Pattern actions
%%
3) User Code Section: This section contains C statements and additional
functions. We can also compile these functions separately and load with the
lexical analyzer.
Basic Program structure of LEX code-
%{
// Definitions
%}
%%
Rules
%%
User code section
YACC code -
YACC is a parse generator. A parser generator is a program that takes as input a
specification of a syntax, and produces as output a procedure for recognizing that
language.
YACC stands for Yet Another Compiler Compiler.
A YACC code file has the extension .y example- sum.y
The input of YACC is the rule or grammar and the output is a C program.
YACC was originally designed for being complemented by Lex.
Input: A yacc code file file.y
Output: A parser y.tab.c (yacc)
o The output file "file.output" contains the parsing tables.
o The file "file.tab.h" contains declarations.
o The parser called the yyparse ().
o Parser expects to use a function called yylex () to get tokens.
o If called with the –d option in the command line, Yacc produces as output a
header file y.tab.h with all its specific definition.
o If called with the –v option, Yacc produces as output a
file y.output containing a textual description of the LALR(1) parsing table
used by the parser. This is useful for tracking down how the parser solves
conflicts.
YACC input file is divided in three parts-
/* definitions */
....
%%
/* rules */
....
%%
/* auxiliary routines */
....
1) Definition part-
The definition part includes information about the tokens used in the syntax
definition:
Syntax-
%token NUMBER
%token ID
It can also include the specification of the starting symbol in the grammar:
%start nonterminal
Yacc automatically assigns numbers for tokens, but it can be overridden by
%token NUMBER 621
2) Rule part-
The rules part contains grammar definition in a modified BNF form.
Actions is C code in { } and can be embedded inside.
3) Auxiliary Routine Part-
The auxiliary routines part is only C code.
It includes function definitions for every function needed in rules part.
It can also contain the main() function definition if the parser is going to be
run as a program.
The main() function must call the function yyparse().
For Compiling YACC Program:
1. Write lex program in a file file.l and yacc in a file file.y
2. Open Terminal and Navigate to the Directory where you have saved the files.
3. type lex file.l / flex file.l / flex file.l -lfl
4. type yacc file.y / yacc -d file.y
5. type cc lex.yy.c y.tab.h -ll / gcc lex.yy.c y.tab.c
6. type ./a.out
Solution-
LEX code to detect the token –
%{
#include <stdio.h>
#include "y.tab.h"
%}
DIGIT [0-9]
REAL {DIGIT}+[.]{DIGIT}*
LETTER [A-Za-z]
ASSIGN =
%%
[\t ] ;
int {printf("%s\tDataType\n",yytext);return (INT);}
float {printf("%s\t DataType\n",yytext);return (FLOAT);}
char {printf("%s\tDataType\n",yytext);return (CHAR);}
boolean {printf("%s\t DataType\n",yytext);return (BOOLEAN);}
true|false { printf("%s\tBOOLEAN VALue\n",yytext);return (BOOLEANVALUE);}
['][^\t\n]['] { printf("%s\tCHAR VALUE\n",yytext);return CHARVALUE;}
[a-zA-z]+[a-zA-z0-9_]* {printf("%s\tVariable Name\n",yytext);return ID;}
{REAL} { printf("%s\tNUMBER\n",yytext);return REAL;}
{DIGIT}+ { printf("%s\tNUMBER\n",yytext);return NUM;}
"," {printf("%s\tSpecial Symbol\n",yytext);return COMMA;}
";" {printf("%s\tSpecial Symbol\n",yytext);;return SC;}
{ASSIGN} {printf("%s\tOperator\n",yytext);return AS;}
\n return NL;
.;
%%
int yywrap()
{
return 1;
}
YACC code detect the syntax of variable declaration and definition.
%{
#include<stdio.h>
void yyerror(char*);
int yylex();
%}
%token ID SC INT CHAR FLOAT BOOLEAN BOOLEANVALUE CHARVALUE REAL
AS NUM COMMA NL
%%
s: type1|type2|type3|type4
;
type1:INT varlist SC NL { printf("Syntax is Correct\n"); return 0;}
;
type2:FLOAT varlist2 SC NL{ printf("Syntax is Correct\n");return 0;}
;
type3:CHAR varlist3 SC NL{ printf("Syntax is Correct\n");return 0;}
;
type4:BOOLEAN varlist4 SC NL{ printf("Syntax is Correct\n");return 0;}
;
varlist: ID | ID COMMA varlist | ID AS NUM |ID AS NUM COMMA varlist | //THIS IS FOR
EPSILON CASE (EMPTY)
;
varlist2: ID | ID COMMA varlist2 | ID AS REAL |ID AS REAL COMMA varlist2 |
;
varlist3: ID | ID COMMA varlist3 | ID AS CHARVALUE | ID AS CHARVALUE COMMA
varlist3 |
;
varlist4: ID | ID COMMA varlist4 | ID AS BOOLEANVALUE | AS BOOLEANVALUE
COMMA varlist4 |
;
%%
void yyerror(char *s )
{
fprintf(stderr, "Syntax is not correct \n");
}
int main()
{
yyparse();
return 0;
}
Output-
Input- int a=10
Input- int a=10,b;
Conclusion- Hence we successfully wrote a LEX code and YACC code to do syntax
analysis and solved the given problem statement.