Course: IT794 Compiler Construction
Lab Manual
Tools:
1. Lex / Flex
2. Bison (Yacc)
3. gcc
4. ANTLR (lexer and parser generation in a single tool)
Installation and Setup
Windows
Option 1:
o Flex
o Bison
o Dev-C++ (provides gcc)
o http://www.surajgaikwad.com/2013/10/compile-lex-and-yacc-progs-on-windows.html
o Add the directories containing flex, bison, and gcc to the PATH environment variable
Option 2:
o WinFlexBison (provides both flex and bison)
o https://sourceforge.net/projects/winflexbison/
o Dev-C++ from the link given in Option 1
o Add the directory containing win_bison.exe (and win_flex.exe), and the Dev-C++ bin directory (for gcc), to the PATH environment variable
Linux/Ubuntu
sudo apt-get install bison
sudo apt-get install flex
Prerequisites:
Familiarity with tokens, grammars, and how programming-language constructs are implemented
Data structures such as arrays and linked lists
Define your compiler project and write the following specification:
Source language: a mini version of an existing programming language OR your own programming language
  e.g. mini C (program file extension .c)
Target language: e.g. assembly language (file extension .asm) compatible with TASM
Tokens:
  Data types (minimum 2), e.g. int, char
  Keywords
  Operators: binary arithmetic operators, the assignment operator, plus any other operators of your choice
  Constants
  Control constructs
  Loop constructs
  Comments: specify the comment pattern you want for your programming language
  Special symbols
Lex Tool
Lex generates a lexical analyzer (scanner) for the source programming language of your compiler.
The scanner recognizes tokens based on the token patterns you supply.
Input: a Lex file (.l extension) that specifies patterns and actions for all possible tokens of the source programming language.
Structure of .l file
A lex specification consists of three parts:
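%{
section 1 (part a): C declarations and #include lines, copied verbatim into the generated scanner
%}
section 1 (part b): regular definitions (name pattern), written outside %{ %}
%%
section 2: translation rules
%%
section 3: user-defined auxiliary procedures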
The translation rules take the following form, where each pattern pi is a regular expression and each actioni is C code that runs when pi matches:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
Regular Expression Notation
x         match the character x
\.        match the character .
"string"  match the contents of the string of characters exactly
.         match any character except newline
^         match the beginning of a line
$         match the end of a line
[xyz]     match one character: x, y, or z (use \ to escape -)
[^xyz]    match any character except x, y, and z
[a-z]     match one character in the range a to z
r*        closure (match zero or more occurrences of r)
r+        positive closure (match one or more occurrences of r)
r?        optional (match zero or one occurrence of r)
r{1,5}    match 1 to 5 occurrences of r
r{2,}     match 2 or more occurrences of r
r1r2      match r1 then r2 (concatenation)
r1|r2     match r1 or r2 (union)
(r)       grouping
r1/r2     match r1 only when followed by r2 (trailing context; r2 is not part of the match)
{d}       match the regular expression defined by the name d
Example 1: spec.l (recognizes the number pattern only)
%{
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
%}
%%
[0-9]+   { printf("%s is number\n", yytext); }
.|\n     { /* ignore all other characters and newlines */ }
%%
/* Define yywrap() here if linking fails with "undefined reference to yywrap"
   (common on Windows, where the flex library -lfl is not available).
   Returning 1 tells the scanner there is no more input. */
int yywrap(void)
{
    return 1;
}
/* yyerror() is normally required only by a Bison/Yacc parser; it is harmless here. */
int yyerror(char *errormsg)
{
    fprintf(stderr, "%s\n", errormsg);
    exit(1);
}
int main(void)
{
    yylex();
    return 0;
}
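Compile and Execute
flex spec.l          (use win_flex spec.l with WinFlexBison)
gcc lex.yy.c         (produces a.exe on Windows, a.out on Linux)
a < test.c           (Windows; on Linux run ./a.out < test.c)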
Example 2: Defining regular definitions (such as digit and letter) and using them in patterns
%{
#include <stdio.h>
%}
%option noyywrap
digit    [0-9]
letter   [A-Za-z]
id       {letter}({letter}|{digit})*
%%
{digit}+   { printf("number: %s\n", yytext); }
{id}       { printf("ident: %s\n", yytext); }
.          { printf("other: %s\n", yytext); }
%%
int main(void)
{
    yylex();
    return 0;
}