Eelco Visser
IN4303 Compiler Construction
TU Delft
September 2017
Declare Your Language
Chapter 1: What is a Compiler?
Course Organization
No use of electronic devices during lectures
http://eelcovisser.org/
Brightspace: Announcements
Course Website
https://tudelft-in4303-2017.github.io/
GitHub for Code
https://github.com/TUDelft-IN4303-2017
WebLab for Grade Registration
https://weblab.tudelft.nl/in4303/2017-2018/
Sign in to WebLab using “Single Sign On for TU Delft”
Enroll in WebLab
https://weblab.tudelft.nl/in4303/2017-2018/
Spoofax Documentation
http://www.metaborg.org
Recommended Book
Lecture Notes (Under Construction)
http://www.declare-your-language.org
Grades for Lab Assignments
https://tudelft-in4303-2017.github.io/assignments/
Deadlines
https://tudelft-in4303-2017.github.io/assignments/
Academic Misconduct
DON’T!
What is a Compiler?
Etymology
Latin
Etymology
From con- (“with, together”) + pīlō (“ram down”).

Pronunciation
	•	(Classical) IPA(key): /komˈpiː.loː/, [kɔmˈpiː.ɫoː]

Verb
compīlō (present infinitive compīlāre, perfect active compīlāvī, supine
compīlātum); first conjugation

	1.	 I snatch together and carry off; plunder, pillage, rob, steal.

https://en.wiktionary.org/wiki/compilo#Latin
Dictionary
English
Verb
compile (third-person singular simple present compiles, present participle compiling,
simple past and past participle compiled)

	1.	 (transitive) To put together; to assemble; to make by gathering things from various
sources. Samuel Johnson compiled one of the most influential dictionaries of the English
language.

	2.	 (obsolete) To construct, build.

	3.	 (transitive, programming) To use a compiler to process source code and produce
executable code. After I compile this program I'll run it and see if it works.

	4.	 (intransitive, programming) To be successfully processed by a compiler into executable
code. There must be an error in my source code because it won't compile.

	5.	 (obsolete, transitive) To contain or comprise.

	6.	 (obsolete) To write; to compose.
https://en.wiktionary.org/wiki/compile
Etymology
The first compiler was written by Grace Hopper, in 1952, for the A-0 System
language. The term compiler was coined by Hopper.[1][2] The A-0 functioned more
as a loader or linker than the modern notion of a compiler.
https://en.wikipedia.org/wiki/History_of_compiler_construction
Compiling = Translating
High-Level Language → compiler → Low-Level Language
A compiler translates high-level programs to low-level programs
Compiling = Translating
C → gcc → X86
GCC translates C programs to object code for X86 (and other architectures)
Compiling = Translating
Java → javac → JVM bytecode
A Java compiler translates Java programs to bytecode instructions for the Java Virtual Machine
Architecture: Multi-Pass Compiler
Java → Parse → Type Check → Optimize → CodeGen → JVM bytecode
A modern compiler typically consists of a sequence of stages or passes
Intermediate Representations
A compiler is a composition of a series of translations between intermediate languages
Java → Parse → Abstract Syntax Tree → Type Check → Annotated AST → Optimize → Transformed AST → CodeGen → JVM bytecode
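To make the idea of composition concrete, here is a minimal Python sketch that is not part of the original slides. The toy source language of `+`-separated integers, the pass names, and the made-up stack-machine output are all assumptions chosen for illustration; real compilers use far richer intermediate representations.

```python
from functools import reduce

def compose(*passes):
    """Chain passes left to right: each pass consumes the previous pass's output."""
    return lambda program: reduce(lambda ir, run_pass: run_pass(ir), passes, program)

def parse(text):
    """Program text -> abstract syntax tree (here: a list of integer literals)."""
    return [int(token) for token in text.split("+")]

def type_check(ast):
    """AST -> annotated AST (here: the same tree, after a trivial check)."""
    if not all(isinstance(node, int) for node in ast):
        raise TypeError("only integer literals are supported")
    return ast

def optimize(ast):
    """Annotated AST -> transformed AST (constant folding of the whole sum)."""
    return [sum(ast)]

def code_gen(ast):
    """Transformed AST -> instructions for a made-up stack machine."""
    return [f"push {n}" for n in ast] + ["add"] * (len(ast) - 1)

compiler = compose(parse, type_check, optimize, code_gen)
unoptimized = compose(parse, type_check, code_gen)

print(compiler("1 + 2 + 3"))     # ['push 6']
print(unoptimized("1 + 2 + 3"))  # ['push 1', 'push 2', 'push 3', 'add', 'add']
```

Because every pass is just a function from one representation to the next, dropping or swapping a pass (as `unoptimized` does) does not affect the others.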
Compiler Components (1)
Parser
• Reads in program text, checks that it complies with the syntactic rules of the language, and
produces an abstract syntax tree, which represents the underlying (syntactic) structure of the
program.

Type checker
• Consumes an abstract syntax tree and checks that the program complies with the static semantic
rules of the language. To do that, it needs to perform name analysis, relating uses of names to
declarations of names, and check that the types of the arguments of operations are consistent with
their specification.

Optimizer
• Consumes a (typed) abstract syntax tree and applies transformations that improve the program in
various dimensions such as execution time, memory consumption, and energy consumption.

Code generator
• Transforms the (typed, optimized) abstract syntax tree into instructions for a particular computer
architecture (also known as instruction selection).
Compiler Components (2)
Register allocator
• Assigns physical registers to symbolic registers in the generated instructions.

Linker
• Most modern languages support some form of modularity in order to divide programs into units.
When a language also supports separate compilation, the compiler produces code for each program
unit separately. The linker takes the generated code for the program units and combines it into an
executable program.
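The linker's job can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not how any particular linker works: the unit format (a dict with `defs` and `code`), the `call` instruction, and the symbol names are all hypothetical.

```python
def link(units):
    """Concatenate the code of all units and resolve symbolic call targets.

    Each unit is a dict (hypothetical format):
      "defs": symbol name -> offset of its definition within the unit's code
      "code": list of instruction tuples; ("call", name) refers to a symbol
    """
    symbols, code = {}, []
    for unit in units:
        base = len(code)  # address at which this unit's code is placed
        for name, offset in unit["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        code.extend(unit["code"])

    executable = []
    for instruction in code:
        if instruction[0] == "call":
            target = instruction[1]
            if target not in symbols:
                raise ValueError(f"undefined symbol: {target}")
            executable.append(("call", symbols[target]))
        else:
            executable.append(instruction)
    return executable

# Two separately "compiled" units: main refers to square, defined elsewhere.
main_unit = {"defs": {"main": 0}, "code": [("call", "square"), ("halt",)]}
lib_unit = {"defs": {"square": 0}, "code": [("mul",), ("ret",)]}

print(link([main_unit, lib_unit]))
# [('call', 2), ('halt',), ('mul',), ('ret',)]
```

The first loop assigns every defined symbol an absolute address; the second replaces each symbolic reference with that address, which is only possible once all units are known.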
Compiler = Front-End + Back-End
A compiler can typically be divided into a front-end (analysis) and a back-end (synthesis)
Front-End: Java → Parse → Type Check → Annotated AST
Back-End: Annotated AST → Optimize → CodeGen → JVM bytecode
Example with LLVM:
Front-End: C → Parse → Type Check → LLVM
Back-End: LLVM → Optimize → CodeGen → X86
Repurposing Back-End
Repurposing: reuse a back-end for a different source language
Front-End: C → Parse → Type Check → LLVM
Front-End: C++ → Parse → Type Check → LLVM
Back-End: LLVM → Optimize → CodeGen → X86
Retargeting Compiler
Retargeting: compile to a different hardware architecture
Front-End: C → Parse → Type Check → LLVM
Front-End: C++ → Parse → Type Check → LLVM
Back-End: LLVM → Optimize → CodeGen → X86
Back-End: LLVM → Optimize → CodeGen → Arm
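Repurposing and retargeting both rest on a shared intermediate representation. The following Python sketch shows the idea on a deliberately tiny scale; the two toy source syntaxes, the IR of `("const", n)` and `("add", n)` tuples, and the emitted instruction strings are assumptions for illustration and have nothing to do with real C, C++, or LLVM.

```python
def infix_front_end(source):
    """Toy source language A, e.g. '1 + 2 + 3' -> shared IR."""
    first, *rest = [int(token) for token in source.split("+")]
    return [("const", first)] + [("add", n) for n in rest]

def prefix_front_end(source):
    """Toy source language B, e.g. '(+ 1 2 3)' -> the same shared IR."""
    first, *rest = [int(token) for token in source.strip("()").split()[1:]]
    return [("const", first)] + [("add", n) for n in rest]

def x86_style_back_end(ir):
    """Shared IR -> x86-flavoured instruction strings (illustrative only)."""
    templates = {"const": "mov eax, {}", "add": "add eax, {}"}
    return [templates[op].format(n) for op, n in ir]

def arm_style_back_end(ir):
    """Shared IR -> Arm-flavoured instruction strings (illustrative only)."""
    templates = {"const": "mov r0, #{}", "add": "add r0, r0, #{}"}
    return [templates[op].format(n) for op, n in ir]

def make_compiler(front_end, back_end):
    """Any front-end can be paired with any back-end via the shared IR."""
    return lambda source: back_end(front_end(source))

print(make_compiler(infix_front_end, x86_style_back_end)("1 + 2"))
# ['mov eax, 1', 'add eax, 2']
print(make_compiler(prefix_front_end, arm_style_back_end)("(+ 1 2 3)"))
# ['mov r0, #1', 'add r0, r0, #2', 'add r0, r0, #3']
```

With two front-ends and two back-ends, four compilers are obtained from four components rather than four separate implementations.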
What is a Compiler?
Java → Parse → Type Check → Optimize → CodeGen → JVM bytecode
Compiler Construction = Building Variants of Java?
A bunch of components for translating programs
Types of Compilers (1)
Compiler
- translates high-level programs to machine code for a computer

Bytecode compiler
- generates code for a virtual machine

Just-in-time compiler
- defers (some aspects of) compilation to run time

Source-to-source compiler (transpiler)
- translates between high-level languages

Cross-compiler
- runs on a different architecture than its target architecture
Types of Compilers (2)
Interpreter
- directly executes a program (although the program is typically transformed prior to execution)

Hardware compiler
- generates a configuration for an FPGA or integrated circuit

De-compiler
- translates from a low-level language to a high-level language
Why Compilers?
Programming = Instructing Computer
- fetch data from memory
- store data in register
- perform basic operation on data in register
- fetch instruction from memory
- update the program counter
- etc.
"Computational thinking is the thought processes
involved in formulating a problem and expressing its
solution(s) in such a way that a computer—human or
machine—can effectively carry out."
Jeannette M. Wing. Computational Thinking Benefits Society.
In Social Issues in Computing. January 10, 2014.
http://socialissues.cs.toronto.edu/index.html
Programming is expressing intent
Figure: Problem Domain, Solution Domain
linguistic abstraction | liNGˈgwistik abˈstrakSHən |
noun
1. a programming language construct that captures a programming design pattern
   the linguistic abstraction saved a lot of programming effort
   he introduced a linguistic abstraction for page navigation in web programming
2. the process of introducing linguistic abstractions
   linguistic abstraction for name binding removed the algorithmic encoding of name resolution
Figure: Problem Domain, Intermediate Language, Solution Domain
From Instructions to Expressions
mov &a, &c
add &b, &c
mov &a, &t1
sub &b, &t1
and &t1,&c
Source: http://sites.google.com/site/arch1utep/home/course_outline/translating-complex-expressions-into-assembly-language-using-expression-trees
c = a
c += b
t1 = a
t1 -= b
c &= t1
c = (a + b) & (a - b)
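The step from an expression to such an instruction sequence can be automated. Below is a minimal Python sketch of one way to do it by walking the expression tree; the tuple-based tree encoding, the function names, and the temporary naming scheme `t1, t2, ...` are assumptions for illustration. It also assumes that the destination variable does not occur inside the expression, since the destination is overwritten while the expression is being evaluated.

```python
import itertools

MNEMONIC = {"+": "add", "-": "sub", "&": "and"}

def compile_expr(expr, dest, emit, temps):
    """Emit two-address code that leaves the value of expr in dest.

    expr is either a variable name (str) or a tuple (op, left, right).
    Assumes dest does not occur inside expr.
    """
    if isinstance(expr, str):
        emit(f"mov &{expr}, &{dest}")
        return
    op, left, right = expr
    compile_expr(left, dest, emit, temps)        # left operand ends up in dest
    if isinstance(right, str):
        emit(f"{MNEMONIC[op]} &{right}, &{dest}")
    else:
        temp = f"t{next(temps)}"                 # fresh temporary for the right operand
        compile_expr(right, temp, emit, temps)
        emit(f"{MNEMONIC[op]} &{temp}, &{dest}")

def compile_assignment(dest, expr):
    """Compile 'dest = expr' into a list of instruction strings."""
    code = []
    compile_expr(expr, dest, code.append, itertools.count(1))
    return code

# c = (a + b) & (a - b)
for line in compile_assignment("c", ("&", ("+", "a", "b"), ("-", "a", "b"))):
    print(line)
```

For the example expression this produces the same five instructions as the hand-written sequence on the slide.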
From Calling Conventions to Procedures
calc:
    push eBP            ; save old frame pointer
    mov eBP,eSP         ; get new frame pointer
    sub eSP,localsize   ; reserve place for locals
    .
    .                   ; perform calculations, leave result in AX
    .
    mov eSP,eBP         ; free space for locals
    pop eBP             ; restore old frame pointer
    ret paramsize       ; free parameter space and return

    push eAX            ; pass some register result
    push byte[eBP+20]   ; pass some memory variable (FASM/TASM syntax)
    push 3              ; pass some constant
    call calc           ; the returned result is now in eAX
http://en.wikipedia.org/wiki/Calling_convention
def f(x) = { ... }
f(e1)
function definition and call in Scala
From Malloc to Garbage Collection
/* Allocate space for an array with ten elements of type int. */
int *ptr = (int*)malloc(10 * sizeof (int));
if (ptr == NULL) {
    /* Memory could not be allocated, the program
       should handle the error here as appropriate. */
} else {
    /* Allocation succeeded. Do something. */
    free(ptr);  /* We are done with the int objects,
                   and free the associated pointer. */
    ptr = NULL; /* The pointer must not be used again,
                   unless re-assigned to using malloc again. */
}
http://en.wikipedia.org/wiki/Malloc
int[] ptr = new int[10];
/* use it; gc will clean up (hopefully) */
Linguistic Abstraction
Figure: language A (identify pattern) → design abstraction → language B (use new abstraction)
Compiler Automates Work of Programmer
Figure: Problem Domain → Programmer → General-Purpose Language → Compiler → Solution Domain
Compilers for modern high-level languages

- Reduce the gap between problem domain and program

- Support programming in terms of computational
concepts instead of machine concepts

- Abstract from hardware architecture (portability)

- Protect against a range of common programming errors
Domain-Specific Languages
Domains of Computation
- Systems programming
- Embedded software
- Web programming
- Enterprise software
- Database programming
- Distributed programming
- Data analytics
- ...
Figure: Problem Domain, General-Purpose Language, Solution Domain
“A programming language is low level when
its programs require attention to the irrelevant”
Alan J. Perlis. Epigrams on Programming.
SIGPLAN Notices, 17(9):7-13, 1982.
Domain-specific language (DSL)
noun
1. a programming language that provides notation, analysis,
verification, and optimization specialized to an application
domain
2. result of linguistic abstraction beyond general-purpose
computation
Figure: Problem Domain, Domain-Specific Language, General-Purpose Language, Solution Domain
Language Design Methodology
Domain Analysis
- What are the features of the domain?

Language Design
- What are adequate linguistic abstractions?
- Coverage: can the language express everything in the domain?
‣ often the domain is unbounded; language design is choosing what to cover
- Minimality: but not more
‣ allowing too much interferes with multi-purpose goal

Semantics
- What is the semantics of such definitions?
- How can we verify the correctness / consistency of language definitions?

Implementation
- How do we derive efficient language implementations from such definitions?

Evaluation
- Apply to new and existing languages to determine adequacy
Making programming languages is probably very expensive?
Figure: Problem Domain, Domain-Specific Language, General-Purpose Language, Solution Domain; Language Design, Compiler + Editor (IDE), General-Purpose Language
Meta-Linguistic Abstraction
Applying compiler construction to the domain of compiler construction
Figure: Language Design, Declarative Meta Languages, Compiler + Editor (IDE), General-Purpose Language; Problem Domain, Domain-Specific Language, General-Purpose Language, Solution Domain
That also applies to the definition of (compilers for) general-purpose languages
Figure: Language Design, Declarative Meta Languages, Compiler + Editor (IDE); Problem Domain, General-Purpose Language, Solution Domain
Language Workbench
Figure: Language Design → Meta-DSLs (Syntax Definition, Static Semantics, Dynamic Semantics, Transforms) → Compiler + Editor (IDE)
A Language Designer’s Workbench
Figure: Language Design → SDF3 (Syntax Definition), NaBL2 (Static Semantics), DynSem (Dynamic Semantics), Stratego (Transforms) → Incremental Compiler, Responsive Editor (IDE), Consistency Proof, Tests
Declarative Language Definition
Objective
- A workbench supporting design and implementation of programming languages

Approach
- Declarative multi-purpose domain-specific meta-languages

Meta-Languages
- Languages for defining languages

Domain-Specific
- Linguistic abstractions for the domain of language definition (syntax, names, types, …)

Multi-Purpose
- Derivation of interpreters, compilers, rich editors, documentation, and verification
from a single source

Declarative
- Focus on what, not how; avoid bias to a particular purpose in language definition
Separation of Concerns
Representation
- Standardized representation for <aspect> of programs
- Independent of specific object language

Specification Formalism
- Language-specific declarative rules
- Abstract from implementation concerns

Language-Independent Interpretation
- Formalism interpreted by language-independent algorithm
- Multiple interpretations for different purposes
- Reuse between implementations of different languages
Meta-Languages in Spoofax Language Workbench
SDF3: Syntax Definition
- context-free grammars + disambiguation + constructors + templates
- derivation of parser, formatter, syntax highlighting, …

NaBL2: Names & Types
- name resolution with scope graphs
- type checking/inference with constraints
- derivation of name & type resolution algorithm

Stratego: Program Transformation
- term rewrite rules with programmable rewriting strategies
- derivation of program transformation system

FlowSpec: Data-Flow Analysis
- extraction of control-flow graph and specification of data-flow rules
- derivation of data-flow analysis engine

DynSem: Dynamic Semantics
- specification of operational (natural) semantics
- derivation of interpreter
Literature
The Spoofax Language Workbench
- Lennart C. L. Kats, Eelco Visser
- OOPSLA 2010

A Language Designer's Workbench: A one-stop-shop for implementation and verification of language designs
- Eelco Visser, Guido Wachsmuth, Andrew P. Tolmach, Pierre Neron, Vlad A. Vergu,
Augusto Passalaqua, Gabriël D. P. Konat
- Onward! 2014
A Taste of Compiler Construction
Language Definition in Spoofax Language Workbench
SDF3: Syntax Definition + NaBL2: Static Semantics + DynSem: Dynamic Semantics + Stratego: Program Transformation
Programming Environment
Calc: A Little Calculator Language
rY = 0.017; // yearly interest rate
Y = 30; // number of years
P = 379,000; // principal
N = Y * 12; // number of months
c = if(rY == 0) // no interest
      P / N
    else
      let r = rY / 12 in
      let f = (1 + r) ^ N in
        (r * P * f) / (f - 1);
c; // payment per month
https://github.com/MetaBorgCube/metaborg-calc
http://www.metaborg.org/en/latest/source/langdev/meta/lang/tour/index.html
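For readers who want to check the arithmetic, the same computation can be written as a short Python sketch. This is a plain transcription of the formula in the Calc program, not output of the Calc compiler; the function names and the use of floating-point numbers are assumptions (the generated Java code shown later in the lecture uses BigDecimal instead).

```python
def monthly_payment(yearly_rate, years, principal):
    """Monthly payment of an annuity loan, following the Calc example."""
    months = years * 12
    if yearly_rate == 0:          # no interest: spread the principal evenly
        return principal / months
    r = yearly_rate / 12          # monthly interest rate
    f = (1 + r) ** months         # growth factor over the whole term
    return (r * principal * f) / (f - 1)

def remaining_balance(yearly_rate, years, principal, payment):
    """Simulate the loan month by month: add interest, subtract the payment."""
    balance = principal
    for _ in range(years * 12):
        balance = balance * (1 + yearly_rate / 12) - payment
    return balance

c = monthly_payment(0.017, 30, 379_000)
print(round(c, 2))
# Sanity check: after the last payment the loan should be paid off.
print(abs(remaining_balance(0.017, 30, 379_000, c)) < 0.01)
```

The simulation in `remaining_balance` is an independent check: if the closed-form payment is right, the balance after the final month is zero up to rounding error.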
Calc: Syntax Definition
context-free syntax // numbers
Exp = <(<Exp>)> {bracket}
Exp.Num = NUM
Exp.Min = <-<Exp>>
Exp.Pow = <<Exp> ^ <Exp>> {right}
Exp.Mul = <<Exp> * <Exp>> {left}
Exp.Div = <<Exp> / <Exp>> {left}
Exp.Sub = <<Exp> - <Exp>> {left, prefer}
Exp.Add = <<Exp> + <Exp>> {left}
Exp.Eq = <<Exp> == <Exp>> {non-assoc}
Exp.Neq = <<Exp> != <Exp>> {non-assoc}
Exp.Gt = [[Exp] > [Exp]] {non-assoc}
Exp.Lt = [[Exp] < [Exp]] {non-assoc}
context-free syntax // variables and functions
Exp.Var = ID
Exp.Let = <
let <ID> = <Exp> in
<Exp>
>
Exp.Fun = < <ID+> . <Exp>>
Exp.App = <<Exp> <Exp>> {left}
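From a syntax definition like this, Spoofax derives a parser. To show what the `{left}`, `{right}`, and `{bracket}` annotations mean in practice, here is a hand-written Python sketch of a precedence-climbing parser for a subset of the grammar (numbers, variables, brackets, and the five arithmetic operators). It is not how the generated parser works. The relative precedence of the operators is not shown on the slide, so the levels below follow conventional arithmetic and are an assumption, as are the tokenizer and the tuple-based tree encoding.

```python
import re

# operator -> (precedence, associativity); precedence levels are assumed
BINARY = {"^": (3, "right"), "*": (2, "left"), "/": (2, "left"),
          "+": (1, "left"), "-": (1, "left")}
CONSTRUCTOR = {"^": "Pow", "*": "Mul", "/": "Div", "+": "Add", "-": "Sub"}

def tokenize(text):
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^()]", text)

def parse(text):
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        token = advance()
        if token == "(":
            inner = expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner                      # {bracket}: no node in the tree
        if token[0].isdigit():
            return ("Num", token)
        if token[0].isalpha() or token[0] == "_":
            return ("Var", token)
        raise SyntaxError(f"unexpected token {token!r}")

    def expr(min_prec):
        left = atom()
        while peek() in BINARY and BINARY[peek()][0] >= min_prec:
            op = advance()
            prec, assoc = BINARY[op]
            # left-associative: the right operand must bind strictly tighter
            right = expr(prec + 1 if assoc == "left" else prec)
            left = (CONSTRUCTOR[op], left, right)
        return left

    tree = expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("1 - 2 - 3"))  # ('Sub', ('Sub', ('Num', '1'), ('Num', '2')), ('Num', '3'))
print(parse("2 ^ 3 ^ 4"))  # ('Pow', ('Num', '2'), ('Pow', ('Num', '3'), ('Num', '4')))
```

The two printed trees show the effect of the annotations: subtraction groups to the left, exponentiation to the right.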
Calc: Type System
rules // numbers
[[ Num(x) ^ (s) : NumT() ]].
[[ Pow(e1, e2) ^ (s) : NumT() ]] :=
[[ e1 ^ (s) : NumT() ]],
[[ e2 ^ (s) : NumT() ]].
[[ Mul(e1, e2) ^ (s) : NumT() ]] :=
[[ e1 ^ (s) : NumT() ]],
[[ e2 ^ (s) : NumT() ]].
[[ Add(e1, e2) ^ (s) : NumT() ]] :=
[[ e1 ^ (s) : NumT() ]],
[[ e2 ^ (s) : NumT() ]].
rules // variables and functions
[[ Var(x) ^ (s) : ty ]] :=
{x} -> s, {x} |-> d, d : ty.
[[ Let(x, e1, e2) ^ (s) : ty2 ]] :=
new s_let, {x} <- s_let, {x} : ty, s_let -P-> s,
[[ e1 ^ (s) : ty ]],
[[ e2 ^ (s_let) : ty2 ]].
[[ Fun([x], e) ^ (s) : FunT(ty1, ty2) ]] :=
new s_fun, {x} <- s_fun, {x} : ty1, s_fun -P-> s,
[[ e ^ (s_fun) : ty2 ]].
[[ App(e1, e2) ^ (s) : ty_res ]] :=
[[ e1 ^ (s) : ty_fun ]],
[[ e2 ^ (s) : ty_arg ]],
FunT(ty_arg, ty_res) instOf ty_fun.
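The rules above generate constraints that a language-independent solver resolves. As a rough approximation of what they express, here is a Python sketch that walks the tree with an explicit scope chain and solves type equalities by unification. It is not the NaBL2 algorithm: scopes are simple dictionaries rather than scope graphs, `instOf` is approximated by plain unification (so there is no polymorphism), functions take a single parameter, and all names in the code are hypothetical.

```python
import itertools

NUM = ("NumT",)
_counter = itertools.count()

def fresh():
    """A fresh type variable, standing for a not-yet-known type."""
    return ("TVar", next(_counter))

def resolve(ty, subst):
    while ty[0] == "TVar" and ty in subst:
        ty = subst[ty]
    return ty

def occurs(var, ty, subst):
    ty = resolve(ty, subst)
    if ty == var:
        return True
    return ty[0] == "FunT" and (occurs(var, ty[1], subst) or occurs(var, ty[2], subst))

def unify(t1, t2, subst):
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return
    if t2[0] == "TVar":
        t1, t2 = t2, t1
    if t1[0] == "TVar":
        if occurs(t1, t2, subst):
            raise TypeError("infinite type")
        subst[t1] = t2
    elif t1[0] == t2[0] == "FunT":
        unify(t1[1], t2[1], subst)
        unify(t1[2], t2[2], subst)
    else:
        raise TypeError(f"cannot unify {t1} with {t2}")

def type_of(term, scope, subst):
    tag = term[0]
    if tag == "Num":
        return NUM
    if tag in ("Pow", "Mul", "Add"):
        unify(type_of(term[1], scope, subst), NUM, subst)
        unify(type_of(term[2], scope, subst), NUM, subst)
        return NUM
    if tag == "Var":                      # resolve the reference along parent edges
        while scope is not None:
            if term[1] in scope["decls"]:
                return scope["decls"][term[1]]
            scope = scope["parent"]
        raise NameError(f"unresolved name: {term[1]}")
    if tag == "Let":
        _, x, e1, e2 = term
        s_let = {"decls": {x: type_of(e1, scope, subst)}, "parent": scope}
        return type_of(e2, s_let, subst)
    if tag == "Fun":
        _, x, body = term
        ty1 = fresh()
        s_fun = {"decls": {x: ty1}, "parent": scope}
        return ("FunT", ty1, type_of(body, s_fun, subst))
    if tag == "App":
        ty_fun = type_of(term[1], scope, subst)
        ty_arg = type_of(term[2], scope, subst)
        ty_res = fresh()
        unify(ty_fun, ("FunT", ty_arg, ty_res), subst)
        return ty_res
    raise ValueError(f"unknown term: {tag}")

def zonk(ty, subst):
    """Replace solved type variables by their solutions."""
    ty = resolve(ty, subst)
    if ty[0] == "FunT":
        return ("FunT", zonk(ty[1], subst), zonk(ty[2], subst))
    return ty

def infer(term):
    subst = {}
    return zonk(type_of(term, {"decls": {}, "parent": None}, subst), subst)

# let inc = \x. x + 1 in inc 2
program = ("Let", "inc", ("Fun", "x", ("Add", ("Var", "x"), ("Num", "1"))),
           ("App", ("Var", "inc"), ("Num", "2")))
print(infer(program))  # ('NumT',)
```

Each branch corresponds to one rule on the slide: `Let` and `Fun` create a new scope with a parent edge, `Var` resolves a reference to a declaration, and `App` relates the function type to the argument and result types.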
Calc: Dynamic Semantics
rules // numbers
Num(n) --> NumV(parseB(n))
Pow(NumV(i), NumV(j)) --> NumV(powB(i, j))
Mul(NumV(i), NumV(j)) --> NumV(mulB(i, j))
Div(NumV(i), NumV(j)) --> NumV(divB(i, j))
Sub(NumV(i), NumV(j)) --> NumV(subB(i, j))
Add(NumV(i), NumV(j)) --> NumV(addB(i, j))
Lt(NumV(i), NumV(j)) --> BoolV(ltB(i, j))
Eq(NumV(i), NumV(j)) --> BoolV(eqB(i, j))
rules // variables and functions
E |- Var(x) --> E[x]
E |- Fun([x], e) --> ClosV(x, e, E)
E |- Let(x, v1, e2) --> v
where E {x |--> v1, E} |- e2 --> v
App(ClosV(x, e, E), v_arg) --> v
where E {x |--> v_arg, E} |- e --> v
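DynSem derives an interpreter from rules like these. The following Python sketch shows a hand-written interpreter with the same structure: numbers evaluate to decimal values, functions evaluate to closures that capture their defining environment, and `Let` and `App` extend an environment before evaluating a body. The tuple-based term encoding, the single-parameter functions, and the use of Python's `Decimal` in place of the `...B` helper functions are assumptions for illustration.

```python
import operator
from decimal import Decimal

ARITHMETIC = {
    "Pow": lambda i, j: i ** j,
    "Mul": operator.mul,
    "Div": operator.truediv,
    "Sub": operator.sub,
    "Add": operator.add,
    "Lt": operator.lt,
    "Eq": operator.eq,
}

def evaluate(term, env=None):
    """Evaluate a term in environment env (a dict from names to values)."""
    env = {} if env is None else env
    tag = term[0]
    if tag == "Num":
        return Decimal(term[1])
    if tag in ARITHMETIC:
        return ARITHMETIC[tag](evaluate(term[1], env), evaluate(term[2], env))
    if tag == "Var":
        return env[term[1]]
    if tag == "Fun":
        _, x, body = term
        return ("ClosV", x, body, env)            # capture the defining environment
    if tag == "Let":
        _, x, e1, e2 = term
        return evaluate(e2, {**env, x: evaluate(e1, env)})
    if tag == "App":
        _, x, body, closure_env = evaluate(term[1], env)
        argument = evaluate(term[2], env)
        return evaluate(body, {**closure_env, x: argument})
    raise ValueError(f"unknown term: {tag}")

# let y = 1 in let f = \x. x + y in let y = 10 in f 5
program = ("Let", "y", ("Num", "1"),
           ("Let", "f", ("Fun", "x", ("Add", ("Var", "x"), ("Var", "y"))),
            ("Let", "y", ("Num", "10"),
             ("App", ("Var", "f"), ("Num", "5")))))
print(evaluate(program))  # 6
```

The result is 6 rather than 15 because the closure for `f` remembers the environment in which it was created, where `y` is 1; this is the role of `E` in `ClosV(x, e, E)`.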
Calc: Code Generation
rules // numbers
exp-to-java : Num(v) -> $[BigDecimal.valueOf([v])]
exp-to-java :
Add(e1, e2) -> $[[je1].add([je2])]
with
<exp-to-java> e1 => je1
; <exp-to-java> e2 => je2
exp-to-java :
Sub(e1, e2) -> $[[je1].subtract([je2])]
with
<exp-to-java> e1 => je1
; <exp-to-java> e2 => je2
rules // variables and functions
exp-to-java : Var(x) -> $[[x]]
exp-to-java :
Let(x, e1, e2) -> $[(([jty]) [x] -> [je2]).apply([je1])]
with
<nabl2-get-ast-type> e1 => ty1
; <nabl2-get-ast-type> e2 => ty2
; <type-to-java> FunT(ty1, ty2) => jty
; <exp-to-java> e1 => je1
; <exp-to-java> e2 => je2
exp-to-java :
f@Fun([x], e) -> $[(([jty]) [x] -> [je])]
with
<nabl2-get-ast-type> f => ty
; <type-to-java> ty => jty
; <exp-to-java> e => je
exp-to-java :
App(e1, e2) -> $[[je1].apply([je2])]
with
<exp-to-java> e1 => je1
; <exp-to-java> e2 => je2
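The shape of these rules, recursively translating sub-terms and splicing the results into a target-language template, can be mimicked in a few lines of Python. This sketch covers only numbers, variables, and binary arithmetic; the `Mul` case is an addition not shown on the slide, and the `Let`, `Fun`, and `App` cases are left out because they need types from the type checker. The tuple-based term encoding is an assumption.

```python
# Calc constructor -> java.math.BigDecimal method name
METHODS = {"Add": "add", "Sub": "subtract", "Mul": "multiply"}

def exp_to_java(term):
    """Translate a Calc expression term into a Java expression string."""
    tag = term[0]
    if tag == "Num":
        return f"BigDecimal.valueOf({term[1]})"
    if tag == "Var":
        return term[1]
    if tag in METHODS:
        java_left = exp_to_java(term[1])     # corresponds to <exp-to-java> e1 => je1
        java_right = exp_to_java(term[2])    # corresponds to <exp-to-java> e2 => je2
        return f"{java_left}.{METHODS[tag]}({java_right})"
    raise ValueError(f"no translation rule for: {tag}")

print(exp_to_java(("Sub", ("Add", ("Num", "1"), ("Var", "x")), ("Num", "2"))))
# BigDecimal.valueOf(1).add(x).subtract(BigDecimal.valueOf(2))
```

Note that the template must splice in the translated sub-terms, not the original ones; mixing the two would put Calc terms into the generated Java code.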
Lecture Language: Tiger
https://github.com/MetaBorgCube/metaborg-tiger
Studying Compiler Construction
The Basis
Java → Parse → Type Check → Optimize → CodeGen → JVM bytecode
Compiler construction techniques are applicable in a wide range of software (development) applications
Levels of Understanding Compilers
Specific
• Understanding a specific compiler
• Understanding a programming language (MiniJava)
• Understanding a target machine (Java Virtual Machine)
• Understanding a compilation scheme (MiniJava to Byte Code)

Architecture
• Understanding architecture of compilers
• Understanding (concepts of) programming languages
• Understanding compilation techniques

Domains
• Understanding (principles of) syntax definition and parsing
• Understanding (principles of) static semantics and type checking
• Understanding (principles of) dynamic semantics and interpretation/code generation

Meta
• Understanding meta-languages and their compilation
Lectures (Tentative)
Q1
• What is a compiler? (Introduction)
• Syntax Definition
• Basic Parsing
• Term Rewriting
• Static Semantics & Name Resolution
• Type Constraints
• Constraint Resolution

Q2
• Dynamic Semantics
• Virtual Machines & Code Generation
• Just-in-Time Compilation (Interpreters & Partial Evaluation)
• Data-Flow Analysis
• Garbage Collection
• Advanced Parsing
• Overview
Lectures: Tuesday, 17:45 in Lecture Hall Pi at EWI
Lectures are recorded with Collegerama, but only if attendance is sufficient

Declare Your Language: What is a Compiler?

  • 1.
    Eelco Visser IN4303 CompilerConstruction TU Delft September 2017 Declare Your Language Chapter 1: What is a Compiler?
  • 2.
  • 3.
    3 No use ofelectronic devices during lectures
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
    WebLab for GradeRegistration 12 https://weblab.tudelft.nl/in4303/2017-2018/
  • 13.
    Sign in toWebLab using “Single Sign On for TU Delft” 13
  • 14.
  • 15.
  • 16.
  • 17.
    Lecture Notes (UnderConstruction) 17 http://www.declare-your-language.org
  • 18.
    Grades for LabAssignments 18 https://tudelft-in4303-2017.github.io/assignments/
  • 19.
  • 20.
  • 21.
    What is aCompiler? 21
  • 22.
    Etymology 22 Latin Etymology From con- (“with,together”) + pīlō (“ram down”). Pronunciation • (Classical) IPA(key): /komˈpiː.loː/, [kɔmˈpiː.ɫoː] Verb compīlō (present infinitive compīlāre, perfect active compīlāvī, supine compīlātum); first conjugation 1. I snatch together and carry off; plunder, pillage, rob, steal. https://en.wiktionary.org/wiki/compilo#Latin
  • 23.
    Dictionary 23 English Verb compile (third-person singularsimple present compiles, present participle compiling, simple past and past participle compiled) 1. (transitive) To put together; to assemble; to make by gathering things from various sources. Samuel Johnson compiled one of the most influential dictionaries of the English language. 2. (obsolete) To construct, build. quotations 3. (transitive, programming) To use a compiler to process source code and produce executable code. After I compile this program I'll run it and see if it works. 4. (intransitive, programming) To be successfully processed by a compiler into executable code. There must be an error in my source code because it won't compile. 5. (obsolete, transitive) To contain or comprise. quotations 6. (obsolete) To write; to compose. https://en.wiktionary.org/wiki/compile
  • 24.
    Etymology 24 The first compilerwas written by Grace Hopper, in 1952, for the A-0 System language. The term compiler was coined by Hopper.[1][2] The A-0 functioned more as a loader or linker than the modern notion of a compiler. https://en.wikipedia.org/wiki/History_of_compiler_construction
  • 25.
    Compiling = Translating 25 High-Level Language compiler Low-Level Language Acompiler translates high-level programs to low-level programs
  • 26.
    Compiling = Translating 26 Cgcc X86 GCC translates C programs to object code for X86 (and other architectures)
  • 27.
    Compiling = Translating 27 Javajavac JVM bytecode A Java compiler translates Java programs to bytecode instructions for Java Virtual Machine
  • 28.
    Architecture: Multi-Pass Compiler 28 JavaType Check JVM bytecode A modern compiler typically consists of sequence of stages or passes Parse CodeGenOptimize
  • 29.
    Intermediate Representations 29 Java TypeCheck JVM bytecode A compiler is a composition of a series of translations between intermediate languages Parse CodeGen Optimize Abstract Syntax Tree Annotated AST Transformed AST
  • 30.
    Parser • Reads inprogram text, checks that it complies with the syntactic rules of the language, and produces an abstract syntax tree, which represents the underlying (syntactic) structure of the program. Type checker • Consumes an abstract syntax tree and checks that the program complies with the static semantic rules of the language. To do that it needs to perform name analysis, relating uses of names to declarations of names, and checks that the types of arguments of operations are consistent with their specification. Optimizer • Consumes a (typed) abstract syntax tree and applies transformations that improve the program in various dimensions such as execution time, memory consumption, and energy consumption. Code generator • Transforms the (typed, optimized) abstract syntax tree to instructions for a particular computer architecture. (aka instruction selection) Compiler Components (1) 30
  • 31.
    Register allocator • Assignsphysical registers to symbolic registers in the generated instructions. Linker • Most modern languages support some form of modularity in order to divide programs into units. When also supporting separate compilation, the compiler produces code for each program unit separately. The linker takes the generated code for the program units and combines it into an executable program. Compiler Components (2) 31
  • 32.
    Back-EndFront-End Compiler = Front-end+ Back-End 32 Java Type Check JVM bytecode A compiler can typically be divided in a front-end (analysis) and a back-end (synthesis) Parse CodeGenOptimize Annotated AST
  • 33.
    Back-EndFront-End Compiler = Front-end+ Back-End 33 C Type Check X86Parse CodeGenOptimizeLLVM A compiler can typically be divided in a front-end (analysis) and a back-end (synthesis)
  • 34.
    Back-End Front-End Repurposing Back-End 34 C TypeCheck X86 Repurposing: reuse a back-end for a different source language Parse CodeGenOptimizeLLVM Front-End C++ Type CheckParse
  • 35.
    Back-EndFront-End Retargeting Compiler 35 C TypeCheck X86 Retargeting: compile to different hardware architecture Parse CodeGenOptimize LLVM Back-End ArmCodeGenOptimize Front-End C++ Type CheckParse
  • 36.
    What is aCompiler? 36 Java Type Check JVM bytecode Parse CodeGenOptimize Compiler Construction = Building Variants of Java? A bunch of components for translating programs
  • 37.
    Compiler - translates high-levelprograms to machine code for a computer Bytecode compiler - generates code for a virtual machine Just-in-time compiler - defers (some aspects of) compilation to run time Source-to-source compiler (transpiler) - translate between high-level languages Cross-compiler - runs on different architecture than target architecture Types of Compilers (1) 37
  • 38.
    Interpreter - directly executesa program (although prior to execution program is typically transformed) Hardware compiler - generate configuration for FPGA or integrated circuit De-compiler - translates from low-level language to high-level language Types of Compilers (2) 38
  • 39.
  • 40.
    - fetch datafrom memory - store data in register - perform basic operation on data in register - fetch instruction from memory - update the program counter - etc. Programming = Instructing Computer 40
  • 41.
    41 "Computational thinking isthe thought processes involved in formulating a problem and expressing its solution(s) in such a way that a computer—human or machine—can effectively carry out." Jeanette M. Wing. Computational Thinking Benefits Society. In Social Issues in Computing. January 10, 2014. http://socialissues.cs.toronto.edu/index.html
  • 42.
  • 43.
    43 Intermediate Language linguistic abstraction |liNGˈgwistik abˈstrakSHən | noun 1. a programming language construct that captures a programming design pattern the linguistic abstraction saved a lot of programming effort he introduced a linguistic abstraction for page navigation in web programming 2. the process of introducing linguistic abstractions linguistic abstraction for name binding removed the algorithmic encoding of name resolution Problem Domain Solution Domain
  • 44.
    From Instructions toExpressions 44 mov &a, &c add &b, &c mov &a, &t1 sub &b, &t1 and &t1,&c Source: http://sites.google.com/site/arch1utep/home/course_outline/translating-complex-expressions-into-assembly-language-using-expression-trees c = a c += b t1 = a t1 -= b c &= t1 c = (a + b) & (a - b)
  • 45.
    From Calling Conventionsto Procedures 45 f(e1) calc: push eBP ; save old frame pointer mov eBP,eSP ; get new frame pointer sub eSP,localsize ; reserve place for locals . . ; perform calculations, leave result in AX . mov eSP,eBP ; free space for locals pop eBP ; restore old frame pointer ret paramsize ; free parameter space and return push eAX ; pass some register result push byte[eBP+20] ; pass some memory variable (FASM/TASM syntax) push 3 ; pass some constant call calc ; the returned result is now in eAX def f(x)={ ... } http://en.wikipedia.org/wiki/Calling_convention function definition and call in Scala
  • 46.
    From Malloc toGarbage Collection 46 /* Allocate space for an array with ten elements of type int. */ int *ptr = (int*)malloc(10 * sizeof (int)); if (ptr == NULL) { /* Memory could not be allocated, the program should handle the error here as appropriate. */ } else { /* Allocation succeeded. Do something. */ free(ptr); /* We are done with the int objects, and free the associated pointer. */ ptr = NULL; /* The pointer must not be used again, unless re-assigned to using malloc again. */ } http://en.wikipedia.org/wiki/Malloc int [] = new int[10]; /* use it; gc will clean up (hopefully) */
  • 47.
    Linguistic Abstraction 47 identify pattern usenew abstraction language A language B design abstraction
  • 48.
    Compiler Automates Workof Programmer 48 Problem Domain Solution Domain General- Purpose Language CompilerProgrammer Compilers for modern high-level languages - Reduce the gap between problem domain and program - Support programming in terms of computational concepts instead of machine concepts - Abstract from hardware architecture (portability) - Protect against a range of common programming errors
  • 49.
  • 50.
    - Systems programming -Embedded software - Web programming - Enterprise software - Database programming - Distributed programming - Data analytics - ... Domains of Computation 50 Problem Domain Solution Domain General- Purpose Language
  • 51.
    51 Problem Domain Solution Domain General- Purpose Language “A programming languageis low level when its programs require attention to the irrelevant” Alan J. Perlis. Epigrams on Programming. SIGPLAN Notices, 17(9):7-13, 1982.
  • 52.
    52 Solution Domain Problem Domain Domain-specific language (DSL) noun 1.a programming language that provides notation, analysis, verification, and optimization specialized to an application domain 2. result of linguistic abstraction beyond general-purpose computation General- Purpose Language Domain- Specific Language
  • 53.
    Domain Analysis - Whatare the features of the domain? Language Design - What are adequate linguistic abstractions? - Coverage: can language express everything in the domain? ‣ often the domain is unbounded; language design is making choice what to cover - Minimality: but not more ‣ allowing too much interferes with multi-purpose goal Semantics - What is the semantics of such definitions? - How can we verify the correctness / consistency of language definitions? Implementation - How do we derive efficient language implementations from such definitions? Evaluation - Apply to new and existing languages to determine adequacy Language Design Methodology 53
  • 54.
  • 55.
  • 56.
    56 General- Purpose Language Making programming languages isprobably very expensive? Solution Domain Problem Domain General- Purpose Language Domain- Specific Language Language Design Compiler + Editor (IDE)
  • 57.
    57 Compiler + Editor (IDE) Meta-LinguisticAbstraction Language Design General- Purpose Language Declarative Meta Languages Solution Domain Problem Domain General- Purpose Language Domain- Specific Language Language Design Applying compiler construction to the domain of compiler construction
  • 58.
  • 59.
  • 60.
  • 61.
    61 A Language Designer’sWorkbench Language Design SDF3 Stratego Consistency Proof NaBL2 DynSem Responsive Editor (IDE) Tests Incremental Compiler Syntax Definition Static Semantics Dynamic Semantics Transforms
  • 62.
    Objective - A workbenchsupporting design and implementation of programming languages Approach - Declarative multi-purpose domain-specific meta-languages Meta-Languages - Languages for defining languages Domain-Specific - Linguistic abstractions for domain of language definition (syntax, names, types, …) Multi-Purpose - Derivation of interpreters, compilers, rich editors, documentation, and verification from single source Declarative - Focus on what not how; avoid bias to particular purpose in language definition Declarative Language Definition 62
  • 63.
    Representation - Standardized representationfor <aspect> of programs - Independent of specific object language Specification Formalism - Language-specific declarative rules - Abstract from implementation concerns Language-Independent Interpretation - Formalism interpreted by language-independent algorithm - Multiple interpretations for different purposes - Reuse between implementations of different languages Separation of Concerns 63
  • 64.
    SDF3: Syntax definition -context-free grammars + disambiguation + constructors + templates - derivation of parser, formatter, syntax highlighting, … NaBL2: Names & Types - name resolution with scope graphs - type checking/inference with constraints - derivation of name & type resolution algorithm Stratego: Program Transformation - term rewrite rules with programmable rewriting strategies - derivation of program transformation system FlowSpec: Data-Flow Analysis - extraction of control-flow graph and specification of data-flow rules - derivation of data-flow analysis engine DynSem: Dynamic Semantics - specification of operational (natural) semantics - derivation of interpreter Meta-Languages in Spoofax Language Workbench 64
  • 65.
    Literature
    The Spoofax Language Workbench
    - Lennart C. L. Kats, Eelco Visser
    - OOPSLA 2010
    A Language Designer's Workbench
    - A one-stop-shop for implementation and verification of language designs
    - Eelco Visser, Guido Wachsmuth, Andrew P. Tolmach, Pierre Neron, Vlad A. Vergu, Augusto Passalaqua, Gabriël D. P. Konat
    - Onward 2014
  • 66.
    A Taste of Compiler Construction
  • 67.
    Language Definition in Spoofax Language Workbench
    [Diagram: SDF3: Syntax Definition + NaBL2: Static Semantics + DynSem: Dynamic Semantics + Stratego: Program Transformation; Programming Environment]
  • 68.
    Calc: A Little Calculator Language

        rY = 0.017; // yearly interest rate
        Y = 30; // number of years
        P = 379,000; // principal
        N = Y * 12; // number of months
        c = if(rY == 0) // no interest
              P / N
            else
              let r = rY / 12 in
              let f = (1 + r) ^ N in
              (r * P * f) / (f - 1);
        c; // payment per month

    https://github.com/MetaBorgCube/metaborg-calc
    http://www.metaborg.org/en/latest/source/langdev/meta/lang/tour/index.html
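To make the arithmetic in the Calc program concrete, here is a minimal Python sketch of the same computation. The function name `monthly_payment` and the use of `Decimal` are assumptions made for illustration, and `379,000` is read as the number 379000 (a thousands separator); none of this is taken from the Calc implementation itself.

```python
from decimal import Decimal


def monthly_payment(yearly_rate, years, principal):
    """Mirror the Calc program: c = (r * P * f) / (f - 1) with f = (1 + r) ^ N."""
    months = years * 12                  # N = Y * 12
    if yearly_rate == 0:                 # no interest
        return principal / months        # P / N
    r = yearly_rate / 12                 # let r = rY / 12
    f = (1 + r) ** months                # let f = (1 + r) ^ N
    return (r * principal * f) / (f - 1)


# Same inputs as the slide (hypothetical driver code).
payment = monthly_payment(Decimal("0.017"), 30, Decimal("379000"))
print(round(payment, 2))
```

The zero-interest branch exists because `f - 1` would be zero when `r` is zero, so the general formula cannot be used there.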
  • 69.
    Calc: Syntax Definition

        context-free syntax // numbers

          Exp = <(<Exp>)> {bracket}

          Exp.Num = NUM
          Exp.Min = <-<Exp>>
          Exp.Pow = <<Exp> ^ <Exp>> {right}
          Exp.Mul = <<Exp> * <Exp>> {left}
          Exp.Div = <<Exp> / <Exp>> {left}
          Exp.Sub = <<Exp> - <Exp>> {left, prefer}
          Exp.Add = <<Exp> + <Exp>> {left}
          Exp.Eq  = <<Exp> == <Exp>> {non-assoc}
          Exp.Neq = <<Exp> != <Exp>> {non-assoc}
          Exp.Gt  = [[Exp] > [Exp]] {non-assoc}
          Exp.Lt  = [[Exp] < [Exp]] {non-assoc}

        context-free syntax // variables and functions

          Exp.Var = ID
          Exp.Let = < let <ID> = <Exp> in <Exp> >
          Exp.Fun = < <ID+> . <Exp>>
          Exp.App = <<Exp> <Exp>> {left}
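The `{left}` and `{right}` annotations decide how a chain of the same operator is grouped. The following Python sketch illustrates that effect with a small precedence-climbing parser. It is not how SDF3 derives its parser. The relative priorities in `OPS` are an assumption (the slide shows only associativity annotations), as is letting unary minus bind tightest; the names `OPS`, `tokenize`, and `parse` are hypothetical.

```python
import re

# Assumed binding strengths (higher binds tighter) plus the associativity
# and constructor name taken from the slide's productions.
OPS = {
    "^": (3, "right", "Pow"),
    "*": (2, "left", "Mul"),
    "/": (2, "left", "Div"),
    "+": (1, "left", "Add"),
    "-": (1, "left", "Sub"),
}


def tokenize(text):
    """Split the input into numbers, operators, and parentheses."""
    return re.findall(r"\d+(?:\.\d+)?|[-+*/^()]", text)


def parse(text):
    """Parse an arithmetic expression into nested tuples such as ("Add", l, r)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = advance()
        if tok == "(":
            inner = expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner                 # {bracket}: no node for parentheses
        if tok == "-":
            return ("Min", atom())       # assumption: unary minus binds tightest
        return ("Num", tok)

    def expr(min_prec):
        left = atom()
        while peek() in OPS and OPS[peek()][0] >= min_prec:
            prec, assoc, cons = OPS[advance()]
            # Left-associative: the right operand may not contain the same
            # operator at top level; right-associative: it may.
            next_min = prec + 1 if assoc == "left" else prec
            left = (cons, left, expr(next_min))
        return left

    tree = expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("1 - 2 - 3"))   # groups to the left
print(parse("2 ^ 3 ^ 2"))   # groups to the right
```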
  • 70.
    Calc: Type System

        rules // numbers

          [[ Num(x) ^ (s) : NumT() ]].

          [[ Pow(e1, e2) ^ (s) : NumT() ]] :=
            [[ e1 ^ (s) : NumT() ]],
            [[ e2 ^ (s) : NumT() ]].

          [[ Mul(e1, e2) ^ (s) : NumT() ]] :=
            [[ e1 ^ (s) : NumT() ]],
            [[ e2 ^ (s) : NumT() ]].

          [[ Add(e1, e2) ^ (s) : NumT() ]] :=
            [[ e1 ^ (s) : NumT() ]],
            [[ e2 ^ (s) : NumT() ]].

        rules // variables and functions

          [[ Var(x) ^ (s) : ty ]] :=
            {x} -> s, {x} |-> d, d : ty.

          [[ Let(x, e1, e2) ^ (s) : ty2 ]] :=
            new s_let, {x} <- s_let, {x} : ty, s_let -P-> s,
            [[ e1 ^ (s) : ty ]],
            [[ e2 ^ (s_let) : ty2 ]].

          [[ Fun([x], e) ^ (s) : FunT(ty1, ty2) ]] :=
            new s_fun, {x} <- s_fun, {x} : ty1, s_fun -P-> s,
            [[ e ^ (s_fun) : ty2 ]].

          [[ App(e1, e2) ^ (s) : ty_res ]] :=
            [[ e1 ^ (s) : ty_fun ]],
            [[ e2 ^ (s) : ty_arg ]],
            FunT(ty_arg, ty_res) instOf ty_fun.
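The scope-related parts of these rules can be read operationally. The Python sketch below hand-codes the `Num`, `Pow`/`Mul`/`Add`, `Var`, and `Let` rules with an explicit scope chain. It is a simplification, not the NaBL2 solver: `Fun` and `App` are left out because they need constraint solving (unification and `instOf`), and the names `Scope`, `type_of`, and `NUM_T` are hypothetical. Since this fragment only has `NumT`, the operand check can never fail here; it is kept to mirror the rule premises.

```python
from dataclasses import dataclass, field
from typing import Optional

NUM_T = "NumT"  # the only type in this fragment


@dataclass
class Scope:
    """A scope holding declarations, with an optional parent (the P edge)."""
    parent: Optional["Scope"] = None
    decls: dict = field(default_factory=dict)

    def resolve(self, name):
        """Follow parent edges until a declaration of `name` is found."""
        scope = self
        while scope is not None:
            if name in scope.decls:
                return scope.decls[name]
            scope = scope.parent
        raise NameError(f"unresolved reference: {name}")


def type_of(exp, scope):
    """Return the type of `exp` (a nested tuple) in `scope`."""
    tag = exp[0]
    if tag == "Num":
        return NUM_T
    if tag in ("Pow", "Mul", "Add"):
        for operand in exp[1:]:
            if type_of(operand, scope) != NUM_T:
                raise TypeError(f"{tag} expects numeric operands")
        return NUM_T
    if tag == "Var":
        return scope.resolve(exp[1])                 # {x} -> s, {x} |-> d, d : ty
    if tag == "Let":
        _, name, bound, body = exp
        s_let = Scope(parent=scope)                  # new s_let, s_let -P-> s
        s_let.decls[name] = type_of(bound, scope)    # e1 is typed in the outer scope s
        return type_of(body, s_let)                  # e2 is typed in s_let
    raise ValueError(f"no typing rule for {tag}")


program = ("Let", "x", ("Num", "1"), ("Add", ("Var", "x"), ("Num", "2")))
print(type_of(program, Scope()))
```

Typing the bound expression in the outer scope, as the `Let` rule does, means `x` is not visible in its own definition.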
  • 71.
    Calc: Dynamic Semantics

        rules // numbers

          Num(n) --> NumV(parseB(n))

          Pow(NumV(i), NumV(j)) --> NumV(powB(i, j))
          Mul(NumV(i), NumV(j)) --> NumV(mulB(i, j))
          Div(NumV(i), NumV(j)) --> NumV(divB(i, j))
          Sub(NumV(i), NumV(j)) --> NumV(subB(i, j))
          Add(NumV(i), NumV(j)) --> NumV(addB(i, j))
          Lt(NumV(i), NumV(j)) --> BoolV(ltB(i, j))
          Eq(NumV(i), NumV(j)) --> BoolV(eqB(i, j))

        rules // variables and functions

          E |- Var(x) --> E[x]

          E |- Fun([x], e) --> ClosV(x, e, E)

          E |- Let(x, v1, e2) --> v
            where E {x |--> v1, E} |- e2 --> v

          App(ClosV(x, e, E), v_arg) --> v
            where E {x |--> v_arg, E} |- e --> v
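These rules translate almost directly into an environment-passing interpreter. The Python sketch below is a hand-written analogue, not the interpreter that DynSem derives. The rules above pattern-match on subterms that are already values, whereas the sketch makes the recursive evaluation of subterms explicit. The names `Clos`, `BINOPS`, and `evaluate` are hypothetical, and Python's `Decimal` and `bool` stand in for `NumV` and `BoolV`.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Clos:
    """Closure value: parameter, body, and the environment at definition time."""
    param: str
    body: tuple
    env: dict


# Stand-ins for the native operations powB, mulB, divB, subB, addB, ltB, eqB.
BINOPS = {
    "Pow": lambda i, j: i ** j,
    "Mul": lambda i, j: i * j,
    "Div": lambda i, j: i / j,
    "Sub": lambda i, j: i - j,
    "Add": lambda i, j: i + j,
    "Lt": lambda i, j: i < j,
    "Eq": lambda i, j: i == j,
}


def evaluate(exp, env=None):
    """Evaluate a nested-tuple expression in environment `env`."""
    env = {} if env is None else env
    tag = exp[0]
    if tag == "Num":
        return Decimal(exp[1])
    if tag in BINOPS:
        return BINOPS[tag](evaluate(exp[1], env), evaluate(exp[2], env))
    if tag == "Var":
        return env[exp[1]]                                   # E[x]
    if tag == "Fun":
        (param,), body = exp[1], exp[2]                      # Fun([x], e)
        return Clos(param, body, env)                        # ClosV(x, e, E)
    if tag == "Let":
        _, name, bound, body = exp
        value = evaluate(bound, env)
        return evaluate(body, {**env, name: value})          # {x |--> v1, E}
    if tag == "App":
        clos = evaluate(exp[1], env)
        arg = evaluate(exp[2], env)
        # The body runs in the closure's environment, not the caller's.
        return evaluate(clos.body, {**clos.env, clos.param: arg})
    raise ValueError(f"no evaluation rule for {tag}")


# let x = 1 in let f = (y . x + y) in let x = 10 in f 5
scoping = ("Let", "x", ("Num", "1"),
           ("Let", "f", ("Fun", ["y"], ("Add", ("Var", "x"), ("Var", "y"))),
            ("Let", "x", ("Num", "10"),
             ("App", ("Var", "f"), ("Num", "5")))))
print(evaluate(scoping))
```

Because the closure captures the environment in which `f` was defined, the later rebinding of `x` does not affect the result of the call.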
  • 72.
    Calc: Code Generation

        rules // numbers

          exp-to-java : Num(v) -> $[BigDecimal.valueOf([v])]

          exp-to-java : Add(e1, e2) -> $[[je1].add([je2])]
            with
              <exp-to-java> e1 => je1
            ; <exp-to-java> e2 => je2

          exp-to-java : Sub(e1, e2) -> $[[je1].subtract([je2])]
            with
              <exp-to-java> e1 => je1
            ; <exp-to-java> e2 => je2

        rules // variables and functions

          exp-to-java : Var(x) -> $[[x]]

          exp-to-java : Let(x, e1, e2) -> $[(([jty]) [x] -> [je2]).apply([je1])]
            with
              <nabl2-get-ast-type> e1 => ty1
            ; <nabl2-get-ast-type> e2 => ty2
            ; <type-to-java> FunT(ty1, ty2) => jty
            ; <exp-to-java> e1 => je1
            ; <exp-to-java> e2 => je2

          exp-to-java : f@Fun([x], e) -> $[(([jty]) [x] -> [je])]
            with
              <nabl2-get-ast-type> f => ty
            ; <type-to-java> ty => jty
            ; <exp-to-java> e => je

          // The template uses the translated subterms je1 and je2,
          // not the untranslated terms e1 and e2.
          exp-to-java : App(e1, e2) -> $[[je1].apply([je2])]
            with
              <exp-to-java> e1 => je1
            ; <exp-to-java> e2 => je2
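The translation scheme is a recursive mapping from AST nodes to Java expression text. The Python sketch below mirrors the `Num`, `Add`, `Sub`, `Var`, and `App` rules using string formatting in place of Stratego's string templates. The function name `exp_to_java` and the tuple encoding of the AST are assumptions for illustration. `Let` and `Fun` are omitted because their rules need the types computed by the static analysis (`nabl2-get-ast-type`), which this sketch does not model.

```python
def exp_to_java(exp):
    """Translate a nested-tuple Calc expression into Java expression text."""
    tag = exp[0]
    if tag == "Num":
        return f"BigDecimal.valueOf({exp[1]})"
    if tag == "Add":
        return f"{exp_to_java(exp[1])}.add({exp_to_java(exp[2])})"
    if tag == "Sub":
        return f"{exp_to_java(exp[1])}.subtract({exp_to_java(exp[2])})"
    if tag == "Var":
        return exp[1]
    if tag == "App":
        # Both subterms are translated first, then spliced into the call.
        return f"{exp_to_java(exp[1])}.apply({exp_to_java(exp[2])})"
    raise ValueError(f"no translation rule for {tag}")


ast = ("Add", ("Num", "1"), ("Sub", ("Var", "x"), ("Num", "2")))
print(exp_to_java(ast))
```

Each case translates its subterms before building its own text, which corresponds to the `with` clauses that bind `je1` and `je2` in the rules above.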
  • 73.
  • 74.
  • 75.
    The Basis
    [Diagram: Java → Parse → TypeCheck → Optimize → CodeGen → JVM bytecode]
  • 76.
    Compiler construction techniques are applicable in a wide range of software (development) applications
  • 77.
    Levels of Understanding Compilers
    Specific
    • Understanding a specific compiler
    • Understanding a programming language (MiniJava)
    • Understanding a target machine (Java Virtual Machine)
    • Understanding a compilation scheme (MiniJava to Byte Code)
    Architecture
    • Understanding architecture of compilers
    • Understanding (concepts of) programming languages
    • Understanding compilation techniques
    Domains
    • Understanding (principles of) syntax definition and parsing
    • Understanding (principles of) static semantics and type checking
    • Understanding (principles of) dynamic semantics and interpretation/code generation
    Meta
    • Understanding meta-languages and their compilation
  • 78.
    Lectures (Tentative)
    Q1
    • What is a compiler? (Introduction)
    • Syntax Definition
    • Basic Parsing
    • Term Rewriting
    • Static Semantics & Name Resolution
    • Type Constraints
    • Constraint Resolution
    Q2
    • Dynamic Semantics
    • Virtual Machines & Code Generation
    • Just-in-Time Compilation (Interpreters & Partial Evaluation)
    • Data-Flow Analysis
    • Garbage Collection
    • Advanced Parsing
    • Overview
    Lectures: Tuesday, 17:45 in Lecture Hall Pi at EWI
    Lectures are recorded with Collegerama, but only if attendance is sufficient