Type Checking
The general topic of type checking includes two parts
Type synthesis assigning a type to each expression in the language Type checking making sure that these types are used in contexts where they are legal, catching type-related errors
Strongly typed languages are ones in which every expression can be assigned an unambiguous type
Weakly typed languages could have run-time errors due to type incompatibility
Statically typed languages vs. dynamically type languages
Capable of being checked at compile time vs. not
Static type checking vs. dynamic type checking
Actually check types at compile time vs. run time Run time type checking is less efficient because type information must be stored 1
Base Types
Numbers
integer C specifies length in relative terms, short and long Java specifies specific lengths: byte, short, int and long have 8, 16, 32 and 64 bits, respectively floating point numbers Many languages have two sizes, 32 and 64 bit
Can use IEEE representation standards
Characters
Single letter, used to be 8 bit ASCII standard, now can also be a 16 bit Unicode
Booleans
Two values: true and false C uses the 0 and 1 subrange of integers
Strings
Some languages have strings as a base type with catenation operators 2
Compound Types: Arrays and Strings
Arrays aggregate of values of the same type
Arrays have a base type for elements, may have an indexing range for each dimension int a [100][25], in C If the indexing range is known, then the compiler can compute space allocation, otherwise indexing is relative and space is allocated by a run-time allocator The main operation is indexing; some languages allow whole array operations
Strings sequence of characters
Also can have bit strings C treats strings as arrays of characters Strings can have comparison operators that use lexicographic order fee < fie
3
More Compound Types
Records or Structures components may have different types and may be indexed by names struct { double r; int i; }
Representation as ordered product of elements
Variant records or Unions a component may be one of a choice of types union ( double r; int i; }
Representation can save space for the largest element and may have a tag to indicate which type the value is Take care not to have run-time (value) errors
Other Types
Enumerated types the programmer can create a type name for a specific set of constant values enum WeekDay {Sunday, Monday, Saturday}
Representation as for a small set
Pointers abstraction of addresses
Can create a reference to, or derefence an object Distinguish between pointer to integer and pointer to boolean, etc. C allows arithmetic on pointers
Void Classes may or may not create new types
Classes can be represented by an extended type of record for data and methods But are not just types since they also have features such as inheritance
5
Function Types, New Type Names
Procedure and function types are sometimes called signatures
Give number and types of parameters May include parameter-passing information such as by value or by reference Give type of result (or indicate no result) strlength : String -> unsigned int
Type declarations or type definitions allow the programmer to assign new type names to a type expression
These names should also be stored in the symbol table Will have scope and other attributes Definitions may be recursive in some languages If so, size of object will be unknown
6
Representing Types
Types can be represented as expressions, or quite often as trees:
array(9)
function
type
arglist
result type argn type
7
arg1 type
Type Equivalence
Name equivalence in this kind of equivalence, two types must have the same name
Assumes that the programmer will introduce new names exactly when they want types to be different if you say t1 = int and t2 = int, t1 and t2 are different
Structural equivalence two objects are interchangeable if their types have the same fields with equivalent types. int x[10][10] and int y[10][10] x and y have equivalent types, the 10 by 10 arrays
More complex situations arise in structural equivalence of other compound types, e.g. may have mutually recursive type definitions
Type checking rules extend this to a notion of type compatibility
8
Type Synthesis and Type Checking
Assigning types to language expressions (type synthesis) can be done by a traversal of the abstract syntax tree At each node, a type checking rule will say which types are allowed Description here is for languages with type declarations
Constants are assigned a type If there is not a known type, then there will be a set of possibles Variables are looked up in the symbol table Note that we are assuming an L-attributed grammar so that declarations are processed first Assignment The type of the assignable entity on the left must be the typeEqual to the type of the value on the right
9
Types for expressions
Arithmetic and other operators have result types defined in terms of the types of the subnodes in the tree Statements have substructures that need to be checked for type correctness
Condition of if and while statements must have type boolean
Array reference
Suppose we have exp1 -> exp2[exp3], then an adhoc SDT: if (isArrayType(exp2.type)) and typeEqual(exp3.type, integer) then exp1.type = exp2.type.basetype // get the basetype child else type-error(exp1)
Function calls have similar rules to check the signature of the function name the parameters
10
Issues for typeEqual
This is sometimes called type compatibility Overloading
Arithmetic operators 2 + 3 means integer addition 2.0 + 3.0 means floating pt addition Language may have an arithmetic operator table, telling which type of operator will be used based on the expressions Type of: a int int int float float double b int float double float double double a+b int float double float double double
11
Overloading Functions
Can declare the same function (or method) name with different numbers and types of parameters int max(int x,y) double max(double x,y)
Java and C++ allow such overloaded declarations
Need to augment the symbol table functionality to allow for the name to have multiple signatures
The lookup procedure is always give a name to look up we can add a typelist argument and the lookup can tell us if there is a function declared with that signature Or the lookup procedure is given the name to look up and in the case of a method, it can return sets of allowable signatures
12
Conversion and Coercion
The typeEqual comparison is commonly extend to allow arithmetic expressions of mixed type and for other cases of types which are compatible, but not equal
If mixed types are allowed in an arithmetic expression, then a conversion should be inserted (into the AST or the code) 2 * 3.14 becomes code like t1 = float ( 2) t2 = t1 * 3.14
Conversion from one type to another is said to be implicit it if is done automatically by the compiler, and is also called coercion Conversion is said to be explicit if the programmer must write the conversion
Called casts in C and Java languages
13
Widening and Narrowing
The rules for Java type conversion distinguishes between
Widening conversions which preserve information Narrowing conversion which can lose information
Conversions between primitive types in Java: (there is also widening and narrowing for references, i.e. objects) Widening Narrowing (usually with a cast) double float long int short byte
14
double float long int char char short byte
Generating Type Conversions
For an arithmetic expressions e1 e2 op e3, the algorithm can generally use widening, to possibly a third type that is greater than both the types of e2 and e3 in the widening tree: Let the new type of e1 be the max of e2.type and e3.type Generate a widening conversion of e1 if necessary Generate a widening conversion of e2 if necessary Set the type (and later the code) of e1 Type conversions and coercions also apply to assignment if r is a double and i is an int: allow r = i;
C also allows i = r, with the corresponding loss of information For classes, we have the subtype principle, objects of subclasses can be assigned to objects of superclasses (with the corresponding loss of information), but not vice versa
15
Continuing type synthesis and checking rules
Expressions: the two expressions involved with boolean operators, such as &&, must both be boolean Functions: the type of each actual parameter must be typeEqual to its formal parameter Classes:
if specified, the parent of the class must be a properly declared class If a class says that it implements an interface, then all methods of the interface must be implemented
16
Polymorphic Typing
A language is polymorphic is language constructs can have more than one type procedure swap(anytype x,y) where anytype is considered to be a type variable Polymorphic functions have type patterns or type schemes, instead of actual type expressions The type checker must check that the types of the actual parameters fit the pattern
Technically, the type checker must find a substitution of actual types for type variables that satisfies the type equivalence between the formal type pattern and the actual type expression In complex cases with recursion, may need to do unification to solve the substitution problem Most notably in the language ML
Java and C# allow polymorphic type constructors, called generics
17
References
Keith Cooper and Linda Torczon, Engineering a Compiler, Elsevier, 2004.. Kenneth C. Louden, Compiler Construction: Principles and Practices, PWS Publishing, 1997. Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools. Addison-Wesley, 2006. (The purple dragon book) Charles Fischer and Richard LeBlanc, Jr., Crafting a Compiler with C, Benjamin Cummings, 1991.
18