UNIVERSITY OF PORTHARCOURT
A SEMINAR PRESENTATION ON
TYPE CHECKING IN COMPILER CONSTRUCTION
SUBMITTED BY: ANYAMA OSCAR UZOMA MATRIC N0.: G2012/MSC/COMP/FT/374
LECTURER NAME: DR. UGWU COURSE TITLE: COMPILER CONSTRUCTION
1|Page
DATE: 30TH MAY, 2013
IN PARTIAL FULFILLMENT OF THE AWARD OF A MASTERS DEGREE IN THE DEPARTMENT OF COMPUTER SCIENCE.
1.0 INTRODUCTION
What is a compiler? A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The reasoning for transforming the source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). 1.0.1 COMPILATION PHASES Lexical Analysis Phase Syntax Analysis Phase Semantic Analysis Phase This phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements. An important component of semantic analysis is type checking. - Intermediate Code Generation - Code optimization - Code Generation Symbol Table Management: o Symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. o Records the identifier used in the source program and collect information about the identifier such as its type, (by semantic and intermediate code).
2|Page
Records its scope, (by semantic and intermediate code), storage allocation, (by code generation), number of arguments and its type for procedure, the type returned.
Error Detecting and Reporting: Each phase encounters errors. Semantic phase detects the constructs that have no meaning to operand. 1.2 TYPE SYSTEMS A type system associates types with each computed value. It is a collection of rules that assign a property called a type to the various constructssuch as variables, expressions, functions or modules. The main purpose of a type system is to reduce bugs in computer programs. An example of a simple type system is function declaration.. The interface of a function states the name of the function and a list of values that are passed to the function's code. The code of an invoking function states the name of the invoked, along with the names of variables that hold values to pass to it. The invoked function's code accesses the values and makes use of them. If the instructions inside the function are written with the assumption of receiving an integer value, but the calling code passed a floating-point value, then the wrong result will be computed by the invoked function. The compiler checks the type declared for each variable sent, against the type declared for each variable in the interface of the invoked function. If the types do not match, the compiler throws a compile-time error. A type error is erroneous or undesirable program behaviour caused by a discrepancy between differing data types for the program's constants, variables, and methods (functions), e.g., treating an integer (int) as a floating-point number (float). 1.2.1 TYPE What is a type? Formally, a type can be defined as "any property of a program we can determine without executing the program. A type is a set of values which a variable can possess and a set of functions that one can apply to these values. 1.2.2 Data types Primitive data types A built-in type is a data type for which the programming language provides built-in support Machine data types
3|Page
Machine data types need to be exposed or made available in systems or low-level programming languages, allowing fine-grained control over hardware eg bits, byte and octet representations of machine language. Boolean type: The Boolean type represents the values: true and false. Numeric types: Such as: The integer data types, or "whole numbers". May be subtyped according to their ability to contain negative values (e.g. unsigned in C and C++). May also have a small number of predefined subtypes (such as short and long in C/C++); or allow users to freely define subranges such as 1..12 (e.g. Pascal/Ada).
Floating point data types, sometimes misleadingly called reals, contain fractional values. They usually have predefined limits on both their maximum values and their precision. eg 20.0005, 2.3e5
Composite type Composite types are derived from more than one primitive type. eg An array stores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable. Record (also called tuple or struct) Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members. Union, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast with a record, which could be defined to contain a float and an integer; whereas, in a union, there is only one value at a time. An object contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Enumerations The enumerated type, this has values which are different from each other, and which can be compared and assigned, but which do not necessarily have any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily. For example, the four suits in a deck of 4|Page
playing cards may be four enumerators named CLUB, DIAMOND, HEART, SPADE, belonging to an enumerated type named suit. If a variable V is declared having suit as its data type, one can assign any of those four values to it. o String and text types
Alphanumeric character: A letter of the alphabet, digit, blank space, punctuation mark, etc. Alphanumeric strings, a sequence of characters. They are typically used to represent words and text. o Pointers and references
The main non-composite, derived type is the pointer, a data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address o Function
It is a sequence of instructions for performing a particular task. Most programming languages, including most machine languages, allow the programmer to define functions. This allows the function to be called from multiple places, even from within itself (in which case it is called recursive). The programming language implementation takes care of returning value. o Abstract data type: Any type that does not specify an implementation is an abstract data type. For instance, a stack (which is an abstract type) can be implemented as an array (a contiguous block of memory containing multiple values), or as a linked list (a set of non-contiguous memory blocks linked by pointers).
Examples include: A queue is a first-in first-out list.
1.3
COMPILER CHECKS
A compiler must check that the source program follows both the syntactic and semantic conventions of the source language. Examples of static checks include: Type checks: A compiler should report an error if an operator is applied to an incompatible operand; for example, if an array variable and a function variable are added together. 5|Page
Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place to which to transfer to transfer the flow of control. Uniqueness checks: There are situations in which an object must be defined exactly once.
2.0
TYPE CHECK
A type checker verifies that the type of a construct matches that expected by its context. A type check implements a type system For example, the built-in arithmetic operator mod in Pascal requires integer operands, so a type checker must verify that the operands of mod have type integer. o Similarly, the type checker must verify that dereferencing is applied only to a pointer. o Indexing is done only on an array. o A user-defined function is applied to the correct number and type of arguments. This checking can happen statically (at compile time), dynamically (at run time), or as a combination thereof.
2.1
PROPERTIES OF TYPE SYSTEMS
Type systems should be: Verifiable: there should be an algorithm (called a type checking algorithm) that can ensure that a program is well behaved. Type systems should be transparent: a programmer should be able to predict easily whether programs will type check. Type systems should be enforceable: type declarations should be statically checked as much as possible and otherwise dynamically checked.
6|Page
2.2
TYPE EXPRESSION
A basic type is a type expression. Among the basic types are Boolean, char, integer, and real. A special basic type, type_error, will signal an error during type checking. A type constructor applied to type expressions is a type expression. eg a matrix(3, 3) might be the type of a 33 matrix. We can then define typing rules such as the following rule for matrix multiplication:
matrix_{multiply} : matrix(k, m) \times matrix(m, n) \to matrix(k, n)
where k, m, n are arbitrary positive integer values.
Linear types Linear types, based on the theory of linear logic, and closely related to uniqueness types, are types assigned to values having the property that they have one and only one reference to them at all times. such as 'str = str + "a"'
Table showing type check of a program
EXAMPLES OF TYPE CHECKING 1. Your code calls the pow() (raise to a power) library function, but you forgot to include math.h. Because you've supplied no prototype for the pow() function (its in math.h), the compiler warns you that it assumes pow() returns an int and that it assumes nothing about pow()'s parameters: somefile.cpp:6: warning: 7|Page
implicit declaration of function `int pow(...)' This is a problem since pow() actually returns a double. In addition, the compiler can't type-check (and possibly convert) values passed to pow() if it doesn't know how many and what type those parameters are supposed to be.
2.
You misspell the name of a function (or method) when you declare, define or call it: 1. void Foo(); 2. int main() 3. { 4. Foo(); 5. return 0; 6. } 7. void foo() 8. { 9. // do something 10.} somefile.o(address): undefined reference to `Foo(void)'
3.
The program divided by zero, as in: 1. 2. 3. 4. int scores = 500; int num = 0; int avg; avg = scores / num;
The program would crash saying: Floating exception
ERROR RECOVERY Finally, Since type checking has the potential for catching errors in the programs, it is important for a type checker to do something reasonable when an error is discovered .At the very least, the compiler must report the nature and location of the error. It is desirable for the type checker to recover from the errors, so it can check the rest of the input. Since error handling affects the type checking rules, it has to be designed into the system right from the start; the rules must be prepared to cope with errors. 8|Page
REFERENCES Aho, Alfred V.; Sethi, Ravi; and Ullman, Jeffrey D., Compilers: Principles, Techniques and Tools Allen, Randy; and Kennedy, Ken, Optimizing Compilers for Modern Architectures, Morgan Kaufmann compilerdesigndetails.blogspot.com/ http://compilerdesigndetails.blogspot.com/2012/02/v-behaviorurldefaultvmlo.html http://en.wikipedia.org/wiki/Compiler http://compilerdesigndetails.blogspot.com/2012/02/v-behaviorurldefaultvmlo.html Programming Languages: Application and Interpretation , Shriram Krishnamurthi, Brown University
9|Page