Lec14 Type Checking

This lecture covers type checking in compilers, focusing on judgments and inference rules. It discusses the importance of type soundness, the structure of type checkers, and the process of type inference, including the addition of booleans and arrays to a simple language. The lecture also highlights the relationship between type checking and programming language semantics, emphasizing the significance of well-typed programs.


CS153: Compilers

Lecture 14: Type Checking

Stephen Chong
https://www.seas.harvard.edu/courses/cs153
Contains content from lecture notes by Steve Zdancewic and Greg Morrisett
Announcements

•HW4 Oat v1 out
•Due Tuesday Oct 29 (12 days)


Today

•Type checking
•Judgments and inference rules



Basic Architecture
Source Code -> Parsing -> Elaboration -> Lowering -> Optimization -> Code Generation -> Target Code
(Diagram: the earlier stages are labeled “Front end” and the later stages “Back end”.)
Elaboration

Untyped Abstract Syntax Trees -> Typed Abstract Syntax Trees


Undefined Programs
•After parsing, we have AST
•We can interpret AST, or compile it and execute
•But: not all programs are well defined
•E.g., 3/0, “hello” - 7, 42(19), using a variable that isn’t in
scope, ...
•Types allow us to rule out many of these undefined behaviors
•Types can be thought of as an approximation of a computation
•E.g., if expression e has type int, then it means that e will evaluate to
some integer value
•E.g., we can ensure we never treat an integer value as if it were a function



Type Soundness
•Key idea: a well-typed program when executed does not attempt
any undefined operation
•Make a model of the source language
•i.e., an interpreter, or other semantics
•This tells us which operations are partial
•Partiality is different for different languages
• E.g., “Hi” + “ world” and “na”*16 may be meaningful in some languages
•Construct a function to check types: tc : AST -> bool
•AST includes types (or type annotations)
•If tc e returns true, then interpreting e will not result in an undefined
operation
•Prove that tc is correct



Simple Language

(* assumed: variables are represented as strings *)
type var = string

type tipe =
  Int_t
| Arrow_t of tipe*tipe
| Pair_t of tipe*tipe

type exp =
  Var of var | Int of int
| Plus_i of exp*exp
| Lambda of var * tipe * exp
| App of exp*exp
| Pair of exp * exp
| Fst of exp | Snd of exp

Note: function arguments have type annotations.


Interpreter
let rec interp (env:var->value)(e:exp) =
  match e with
  | Var x -> env x
  | Int i -> Int_v i
  | Plus_i(e1,e2) ->
    (match interp env e1, interp env e2 with
     | Int_v i, Int_v j -> Int_v(i+j)
     | _,_ -> failwith "Bad operands!")
  | Lambda(x,t,e) -> Closure_v{env=env; code=(x,e)}
  | App(e1,e2) ->
    (match (interp env e1, interp env e2) with
     | Closure_v{env=cenv; code=(x,e)},v ->
       interp (extend cenv x v) e
     | _,_ -> failwith "Bad operands!")
  (* cases for Pair, Fst, and Snd are not shown on the slide *)
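The interpreter can be completed into a runnable sketch. The definitions of value, closure, empty_env, and extend below are assumptions (the slide does not show them), as are the Pair, Fst, and Snd cases.

```ocaml
type var = string

type tipe =
  | Int_t
  | Arrow_t of tipe * tipe
  | Pair_t of tipe * tipe

type exp =
  | Var of var
  | Int of int
  | Plus_i of exp * exp
  | Lambda of var * tipe * exp
  | App of exp * exp
  | Pair of exp * exp
  | Fst of exp
  | Snd of exp

(* Assumed value representation: integers, pairs, and closures. *)
type value =
  | Int_v of int
  | Pair_v of value * value
  | Closure_v of closure
and closure = { env : var -> value; code : var * exp }

(* Assumed environment helpers: environments are functions. *)
let empty_env : var -> value = fun x -> failwith ("Unbound variable " ^ x)
let extend env x v = fun y -> if y = x then v else env y

let rec interp (env : var -> value) (e : exp) : value =
  match e with
  | Var x -> env x
  | Int i -> Int_v i
  | Plus_i (e1, e2) ->
    (match interp env e1, interp env e2 with
     | Int_v i, Int_v j -> Int_v (i + j)
     | _, _ -> failwith "Bad operands!")
  | Lambda (x, _, body) -> Closure_v { env; code = (x, body) }
  | App (e1, e2) ->
    (match interp env e1, interp env e2 with
     | Closure_v { env = cenv; code = (x, body) }, v ->
       interp (extend cenv x v) body
     | _, _ -> failwith "Bad operands!")
  | Pair (e1, e2) -> Pair_v (interp env e1, interp env e2)
  | Fst e1 ->
    (match interp env e1 with
     | Pair_v (v, _) -> v
     | _ -> failwith "Bad operand!")
  | Snd e1 ->
    (match interp env e1 with
     | Pair_v (_, v) -> v
     | _ -> failwith "Bad operand!")
```

Each failwith marks a place where the interpreter is partial; these are the undefined operations a type checker aims to rule out.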
Type Checker
let rec tc (env:var->tipe) (e:exp) =
  match e with
  | Var x -> env x
  | Int _ -> Int_t
  | Plus_i(e1,e2) ->
    (match tc env e1, tc env e2 with
     | Int_t, Int_t -> Int_t
     | _,_ -> failwith "...")
  | Lambda(x,t,e) -> Arrow_t(t,tc (extend env x t) e)
  | App(e1,e2) ->
    (match (tc env e1, tc env e2) with
     | Arrow_t(t1,t2), t ->
       (* <> is structural inequality; != would test physical identity *)
       if (t1 <> t) then failwith "..." else t2
     | _,_ -> failwith "...")
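A self-contained version of the type checker might look as follows. The helpers empty_env, extend, and well_typed are hypothetical names introduced here, and the Pair, Fst, and Snd cases and the error messages are assumptions not shown on the slide.

```ocaml
type var = string

type tipe =
  | Int_t
  | Arrow_t of tipe * tipe
  | Pair_t of tipe * tipe

type exp =
  | Var of var
  | Int of int
  | Plus_i of exp * exp
  | Lambda of var * tipe * exp
  | App of exp * exp
  | Pair of exp * exp
  | Fst of exp
  | Snd of exp

(* Assumed helpers: typing environments are functions from variables to types. *)
let empty_env : var -> tipe = fun x -> failwith ("Unbound variable " ^ x)
let extend env x t = fun y -> if y = x then t else env y

let rec tc (env : var -> tipe) (e : exp) : tipe =
  match e with
  | Var x -> env x
  | Int _ -> Int_t
  | Plus_i (e1, e2) ->
    (match tc env e1, tc env e2 with
     | Int_t, Int_t -> Int_t
     | _, _ -> failwith "Plus_i: operands must be ints")
  | Lambda (x, t, body) -> Arrow_t (t, tc (extend env x t) body)
  | App (e1, e2) ->
    (match tc env e1, tc env e2 with
     | Arrow_t (t1, t2), t ->
       (* structural comparison of types *)
       if t1 <> t then failwith "App: argument type mismatch" else t2
     | _, _ -> failwith "App: not a function")
  | Pair (e1, e2) -> Pair_t (tc env e1, tc env e2)
  | Fst e1 ->
    (match tc env e1 with
     | Pair_t (t, _) -> t
     | _ -> failwith "Fst: not a pair")
  | Snd e1 ->
    (match tc env e1 with
     | Pair_t (_, t) -> t
     | _ -> failwith "Snd: not a pair")

(* Hypothetical wrapper: true when tc accepts a closed expression. *)
let well_typed (e : exp) : bool =
  match tc empty_env e with
  | _ -> true
  | exception Failure _ -> false
```

For instance, well_typed rejects Lambda ("f", Int_t, App (Var "f", Int 1)) even though the function is never applied, because the checker always examines the function body.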
Notes

•Type checker is almost like an approximation of the interpreter!
•But interpreter evaluates function body only when function applied
•Type checker always checks body of function
•We needed to assume the input of a function had some type t1, and reflect this in type of function (t1->t2)
•At call site (e1 e2), we don’t know what closure e1 will evaluate to, but can calculate type of e1 and check that e2 has type of argument


Growing the Language

•Adding booleans...
type tipe = ... | Bool_t

type exp = ... | True | False | If of exp*exp*exp

let rec interp env e = ...
  | True -> True_v
  | False -> False_v
  | If(e1,e2,e3) -> (match interp env e1 with
                       True_v -> interp env e2
                     | False_v -> interp env e3
                     | _ -> failwith "...")
Type Checking
let rec tc (env:var->tipe) (e:exp) =
  match e with
  ...
  | True -> Bool_t
  | False -> Bool_t
  | If(e1,e2,e3) ->
    (let (t1,t2,t3) = (tc env e1,tc env e2,tc env e3)
     in
     match t1 with
     | Bool_t ->
       (* <> is structural inequality; != would test physical identity *)
       if (t2 <> t3) then failwith "..." else t2
     | _ -> failwith "...")
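The If case can be tried in isolation with a cut-down language. This is a minimal sketch that assumes only integer and boolean literals and drops the environment, since no variables are involved.

```ocaml
type tipe = Int_t | Bool_t

type exp =
  | Int of int
  | True
  | False
  | If of exp * exp * exp

let rec tc (e : exp) : tipe =
  match e with
  | Int _ -> Int_t
  | True | False -> Bool_t
  | If (e1, e2, e3) ->
    (match tc e1 with
     | Bool_t ->
       (* Both branches are checked, although only one will run. *)
       let t2 = tc e2 and t3 = tc e3 in
       if t2 <> t3 then failwith "If: branches have different types" else t2
     | _ -> failwith "If: condition must be a bool")
```

Note that If (True, Int 1, False) is rejected even though an interpreter would evaluate it to 1 without error: the checker approximates the interpreter and is therefore conservative.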


Type Inference

•Type checking is great if we already have enough type annotations
•For our simple functional language, sufficient to have type annotations for function arguments
•But what about if we tried to infer types?
•Reduce programmer burden!
•Efficient algorithms to do this: Hindley-Milner
•Essentially build constraints based on how expressions are used and try to solve constraints
•Error messages for non-well-typed programs can be challenging!
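The idea of building constraints and solving them can be sketched for a small language whose functions carry no type annotations. This is not full Hindley-Milner (there is no let-polymorphism or generalization); it only shows constraint generation followed by unification. All names here (ty, gen, unify, infer, and so on) are illustrative assumptions.

```ocaml
type ty = TInt | TArrow of ty * ty | TVar of int

type exp =
  | Var of string
  | Int of int
  | Plus of exp * exp
  | Lam of string * exp          (* no type annotation on the argument *)
  | App of exp * exp

let counter = ref 0
let fresh () = incr counter; TVar !counter

(* Constraint generation: the type of e plus a list of type equalities. *)
let rec gen env e : ty * (ty * ty) list =
  match e with
  | Var x -> (List.assoc x env, [])
  | Int _ -> (TInt, [])
  | Plus (e1, e2) ->
    let t1, c1 = gen env e1 in
    let t2, c2 = gen env e2 in
    (TInt, ((t1, TInt) :: (t2, TInt) :: c1) @ c2)
  | Lam (x, body) ->
    let a = fresh () in
    let t, c = gen ((x, a) :: env) body in
    (TArrow (a, t), c)
  | App (e1, e2) ->
    let t1, c1 = gen env e1 in
    let t2, c2 = gen env e2 in
    let r = fresh () in
    (r, ((t1, TArrow (t2, r)) :: c1) @ c2)

(* Apply a substitution (association list from variable ids to types). *)
let rec apply s t =
  match t with
  | TInt -> TInt
  | TArrow (a, b) -> TArrow (apply s a, apply s b)
  | TVar n ->
    (match List.assoc_opt n s with
     | Some t' -> apply s t'
     | None -> t)

let rec occurs n t =
  match t with
  | TInt -> false
  | TArrow (a, b) -> occurs n a || occurs n b
  | TVar m -> m = n

(* Solve the equalities by unification, extending substitution s. *)
let rec unify s cs =
  match cs with
  | [] -> s
  | (a, b) :: rest ->
    (match apply s a, apply s b with
     | TInt, TInt -> unify s rest
     | TVar n, TVar m when n = m -> unify s rest
     | TVar n, t | t, TVar n ->
       if occurs n t then failwith "occurs check failed"
       else unify ((n, t) :: s) rest
     | TArrow (a1, b1), TArrow (a2, b2) ->
       unify s ((a1, a2) :: (b1, b2) :: rest)
     | _, _ -> failwith "type mismatch")

let infer e =
  let t, cs = gen [] e in
  apply (unify [] cs) t
```

For example, infer (Lam ("x", Plus (Var "x", Int 1))) yields TArrow (TInt, TInt) without any annotation on x. A type variable left in the result (as for the identity function) marks a type that is unconstrained.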
Polymorphism and Type Inference
•Polymorphism is the ability of code to be used on values of different types.
•E.g., polymorphic function can be invoked with arguments of different types
•Polymorph means “many forms”
•OCaml has polymorphic types
•e.g., val swap : 'a ref -> 'a -> 'a = ...
•But type inference for full polymorphic types is undecidable...
•OCaml has restricted form of polymorphism that allows type inference: let-polymorphism aka prenex polymorphism
•Allow let expressions to be typed polymorphically, i.e., used at many types
•Doesn’t require copying of let expressions
•Requires clear distinction between polymorphic types and non-polymorphic types...
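A short OCaml example of let-polymorphism. The body of swap is an assumed definition that happens to have the type given on the slide; the slide itself elides it.

```ocaml
(* Assumed definition: store x in r and return the old contents.
   OCaml infers the type 'a ref -> 'a -> 'a. *)
let swap r x =
  let old = !r in
  r := x;
  old

(* A let-bound function is generalized, so id can be used at both
   int and bool in the body of the let. *)
let pair =
  let id = fun x -> x in
  (id 1, id true)

(* A lambda-bound argument is not generalized. The following
   expression is rejected by the OCaml type checker:
     (fun id -> (id 1, id true)) (fun x -> x)
*)
```

The contrast between the let-bound and the lambda-bound id is the distinction between polymorphic and non-polymorphic types that the last bullet refers to.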
Type Safety
•“Well typed programs do not go wrong.”
– Robin Milner, 1978
•Note: this is a very strong property.
•Well-typed programs cannot “go wrong” by trying to execute
undefined code (such as 3 + (fun x -> 2))
•Simply-typed lambda calculus is guaranteed to terminate! (i.e. it
isn't Turing complete)
•Depending on language, will not rule out all possible
undefined behavior
•E.g., 3/0, *NULL, ...
•More sophisticated type systems can rule out more kinds of
possible runtime errors
Judgments and Inference Rules

•We saw type checking algorithm in code
•Can express type-checking rules compactly and clearly using a type judgment and inference rules


Type Judgments
•In the judgment: E ⊢ e : t
•E is a typing environment or a type context
•E maps variables to types. It is just a set of bindings of the form:
x1 : t1, x2 : t2, …, xn : tn
•If E ⊢ e : t then expression e has type t under typing environment E
•E ⊢ e : t can be thought of as a set or relation
•For example:
x : int, b : bool ⊢ if (b) 3 else x : int
•What do we need to know to decide whether “if (b) 3 else x” has
type int in the environment x : int, b : bool?
•b must be a bool i.e. x : int, b : bool ⊢ b : bool
•3 must be an int i.e. x : int, b : bool ⊢ 3 : int
•x must be an int i.e. x : int, b : bool ⊢ x : int
Recall Inference Rules
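Axiom (no premises):

  -------
   i ⇓ i

Rule with premises and a conclusion:

  e1 ⇓ fun x -> e     e2 ⇓ v     e{v/x} ⇓ w      (premises)
  ------------------------------------------
                 e1 e2 ⇓ w                       (conclusion)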

•Inference rule
•If the premises are true, then the conclusion is true
•An axiom is a rule with no premises
•Inference rules can be instantiated by replacing
metavariables (e, e1, e2, x, i, ...) with expressions, program
variables, integers, as appropriate.
Why Inference Rules?
•Compact, precise way of specifying language properties.
•E.g. ~20 pages for full Java vs. 100’s of pages of prose Java Language Spec.
•Inference rules correspond closely to the recursive AST traversal that
implements them
•Type checking (and type inference) is nothing more than attempting to prove
a different judgment ( E ⊢ e : t ) by searching backwards through the rules.
•Compiling in a context is nothing more than a collection of inference rules
specifying yet a different judgment ( E ⊢ src ⇒ target )
•Moreover, the compilation rules are very similar in structure to the typechecking rules
•Strong mathematical foundations
•The “Curry-Howard correspondence”: Programming Language ~ Logic,
Program ~ Proof, Type ~ Proposition
•See CS152 if you’re interested in type systems!



Simply-typed Lambda Calculus
INT
  ------------
  E ⊢ i : int

VAR
  x : T ∈ E
  ----------
  E ⊢ x : T

ADD
  E ⊢ e1 : int     E ⊢ e2 : int
  ------------------------------
  E ⊢ e1 + e2 : int

FUN
  E, x : T ⊢ e : S
  ----------------------------
  E ⊢ fun (x:T) -> e : T -> S

APP
  E ⊢ e1 : T -> S     E ⊢ e2 : T
  -------------------------------
  E ⊢ e1 e2 : S

•Note how these rules correspond to the code.
Type Checking Derivations

•A derivation or proof tree is a tree where nodes are instantiations of inference rules and edges connect a premise to a conclusion
•Leaves of the tree are axioms (i.e. rules with no premises)
•Goal of the typechecker: verify that such a tree exists.
•Example: Find a tree for the following program using the inference rules on the previous slide:
⊢ (fun (x:int) -> x + 3) 5 : int


Example Derivation Tree
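                 x : int ∈ x : int
                 ----------------- VAR    ----------------- INT
                 x : int ⊢ x : int        x : int ⊢ 3 : int
                 ------------------------------------------ ADD
                          x : int ⊢ x + 3 : int
      ------------------------------------------- FUN    --------- INT
      ⊢ (fun (x:int) -> x + 3) : int -> int              ⊢ 5 : int
      ------------------------------------------------------------ APP
                   ⊢ (fun (x:int) -> x + 3) 5 : int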


•Note: the OCaml type checker tc verifies the existence of this tree. The structure of the recursive calls when running tc has the same shape as this tree!
•Note that x : int ∈ E is implemented by the function lookup
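To make the correspondence between recursive calls and the derivation tree concrete, the checker can be written so that it returns the tree instead of only the type. This is a sketch under assumptions: the deriv type and the functions derive, typ_of, and rules are hypothetical, and the environment is an association list searched with List.assoc.

```ocaml
type tipe = Int_t | Arrow_t of tipe * tipe

type exp =
  | Var of string
  | Int of int
  | Plus_i of exp * exp
  | Lambda of string * tipe * exp
  | App of exp * exp

(* A derivation node: rule name, concluded type, sub-derivations. *)
type deriv = Rule of string * tipe * deriv list

let typ_of (Rule (_, t, _)) = t

let rec derive (env : (string * tipe) list) (e : exp) : deriv =
  match e with
  | Int _ -> Rule ("INT", Int_t, [])
  | Var x -> Rule ("VAR", List.assoc x env, [])
  | Plus_i (e1, e2) ->
    let d1 = derive env e1 and d2 = derive env e2 in
    if typ_of d1 = Int_t && typ_of d2 = Int_t
    then Rule ("ADD", Int_t, [d1; d2])
    else failwith "ADD does not apply"
  | Lambda (x, t, body) ->
    let d = derive ((x, t) :: env) body in
    Rule ("FUN", Arrow_t (t, typ_of d), [d])
  | App (e1, e2) ->
    let d1 = derive env e1 and d2 = derive env e2 in
    (match typ_of d1 with
     | Arrow_t (t, s) when t = typ_of d2 -> Rule ("APP", s, [d1; d2])
     | _ -> failwith "APP does not apply")

(* Rule names of a derivation in pre-order (root first). *)
let rec rules (Rule (name, _, ds)) =
  name :: List.concat (List.map rules ds)
```

Running derive on the example program produces a tree whose root is APP, with a FUN subtree (containing ADD, VAR, INT) and an INT leaf, matching the tree above.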


Type Safety Revisited

Theorem: (simply typed lambda calculus with integers)

If ⊢ e : t then there exists a value v such that e ⇓ v .



Arrays
•Array constructs are not hard
•First: add a new type constructor: T[]

NEW
  E ⊢ e1 : int     E ⊢ e2 : T
  ----------------------------
  E ⊢ new T[e1](e2) : T[]

  e1 is the size of the newly allocated array. e2 initializes the elements of the array.

INDEX
  E ⊢ e1 : T[]     E ⊢ e2 : int
  ------------------------------
  E ⊢ e1[e2] : T

UPDATE
  E ⊢ e1 : T[]     E ⊢ e2 : int     E ⊢ e3 : T
  ---------------------------------------------
  E ⊢ e1[e2] = e3 ok

Note: These rules don’t ensure that the array index is in bounds; that should be checked dynamically.
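The three array rules translate fairly directly into checker cases. In this sketch the constructor names (Array_t, New, Index, Update) are assumptions, and the environment is an association list.

```ocaml
type tipe = Int_t | Array_t of tipe

type exp =
  | Int of int
  | Var of string
  | New of tipe * exp * exp      (* new T[e1](e2) *)
  | Index of exp * exp           (* e1[e2] *)

type stmt =
  | Update of exp * exp * exp    (* e1[e2] = e3 *)

let rec tc env e =
  match e with
  | Int _ -> Int_t
  | Var x -> List.assoc x env
  | New (t, e1, e2) ->
    if tc env e1 = Int_t && tc env e2 = t then Array_t t
    else failwith "NEW does not apply"
  | Index (e1, e2) ->
    (match tc env e1, tc env e2 with
     | Array_t t, Int_t -> t
     | _, _ -> failwith "INDEX does not apply")

(* UPDATE concludes "ok" rather than a type, so this returns unit. *)
let tc_stmt env (Update (e1, e2, e3)) =
  match tc env e1, tc env e2 with
  | Array_t t, Int_t when tc env e3 = t -> ()
  | _, _ -> failwith "UPDATE does not apply"
```

An expression such as Index (Var "a", Int 100) is accepted whatever the size of a, which is why the bounds check has to happen at run time.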
Tuples

•ML-style tuples with statically known number of products
•First: add a new type constructor: T1 * … * Tn

TUPLE
  E ⊢ e1 : T1   …   E ⊢ en : Tn
  ------------------------------
  E ⊢ (e1, …, en) : T1 * … * Tn

PROJ
  E ⊢ e : T1 * … * Tn     1 ≤ i ≤ n
  ----------------------------------
  E ⊢ #i e : Ti
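A sketch of the TUPLE and PROJ rules as checker cases, assuming a tuple type represented as a list of component types (Tuple_t) and closed expressions, so no environment is needed.

```ocaml
type tipe = Int_t | Bool_t | Tuple_t of tipe list

type exp =
  | Int of int
  | Bool of bool
  | Tuple of exp list
  | Proj of int * exp            (* #i e, with i counted from 1 *)

let rec tc (e : exp) : tipe =
  match e with
  | Int _ -> Int_t
  | Bool _ -> Bool_t
  | Tuple es -> Tuple_t (List.map tc es)
  | Proj (i, e1) ->
    (match tc e1 with
     (* The side condition 1 <= i <= n is checked statically. *)
     | Tuple_t ts when 1 <= i && i <= List.length ts -> List.nth ts (i - 1)
     | _ -> failwith "PROJ does not apply")
```

Because the number of components is part of the type, an out-of-range projection is a type error rather than a run-time error, unlike array indexing.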


References

•ML-style references (note that ML uses only expressions)
•First, add a new type constructor: T ref

REF
  E ⊢ e : T
  ------------------
  E ⊢ ref e : T ref

DEREF
  E ⊢ e : T ref
  --------------
  E ⊢ !e : T

ASSIGN
  E ⊢ e1 : T ref     E ⊢ e2 : T
  ------------------------------
  E ⊢ e1 := e2 : unit

Note the similarity with the rules for arrays…
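The reference rules can be sketched the same way. The constructor names (Ref_t, Unit_t, Ref, Deref, Assign) are assumptions chosen for this example.

```ocaml
type tipe = Int_t | Unit_t | Ref_t of tipe

type exp =
  | Int of int
  | Ref of exp                   (* ref e *)
  | Deref of exp                 (* !e *)
  | Assign of exp * exp          (* e1 := e2 *)

let rec tc (e : exp) : tipe =
  match e with
  | Int _ -> Int_t
  | Ref e1 -> Ref_t (tc e1)
  | Deref e1 ->
    (match tc e1 with
     | Ref_t t -> t
     | _ -> failwith "DEREF does not apply")
  | Assign (e1, e2) ->
    (match tc e1 with
     | Ref_t t when tc e2 = t -> Unit_t
     | _ -> failwith "ASSIGN does not apply")
```

Since ML has only expressions, assignment gets the type unit, whereas the array UPDATE rule concludes "ok" for a statement.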
Oat Type Checking

•For HW5 we will add typechecking to Oat
•And some other features
•Some of Oat’s features
•Imperative (update variables, like references)
•Distinction between statements and expressions
•More complicated control flow
• Return
• While, For, ...
•What does a type system look like for Oat?
Some Oat Judgments
•Split environment E into Globals and Locals
•Expression e has type t under context G;L
•G; L ⊢ e : t
•Statement s is well typed under context G;L. If it returns, it returns a value of type rt. After s, the local context is L’.
•G; L; rt ⊢ s ⇒ L’
•Where does G come from?
•Program is a list of global variable declarations and function declarations
•Use judgment to gather up global variable declarations
•⊢g prog ⇒ G
Example Derivation
var x1 = 0;
var x2 = x1 + x1;
x1 = x1 - x2;
return(x1);
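
A statement checker for this fragment can be sketched by threading the local context through each statement, in the style of the judgment G; L; rt ⊢ s ⇒ L’. This is a simplified model, not the actual Oat type checker: the AST, the association-list contexts, and the function names are all assumptions.

```ocaml
type ty = TInt | TBool

type exp =
  | Int of int
  | Var of string
  | Add of exp * exp
  | Sub of exp * exp

type stmt =
  | Decl of string * exp         (* var x = e; *)
  | Assn of string * exp         (* x = e; *)
  | Ret of exp                   (* return(e); *)

type ctxt = (string * ty) list

(* G; L |- e : t. Locals are searched before globals. *)
let rec tc_exp (g : ctxt) (l : ctxt) (e : exp) : ty =
  match e with
  | Int _ -> TInt
  | Var x ->
    (match List.assoc_opt x l with
     | Some t -> t
     | None -> List.assoc x g)
  | Add (e1, e2) | Sub (e1, e2) ->
    if tc_exp g l e1 = TInt && tc_exp g l e2 = TInt then TInt
    else failwith "arithmetic on non-ints"

(* G; L; rt |- s => L'. Returns the local context after s. *)
let tc_stmt (g : ctxt) (l : ctxt) (rt : ty) (s : stmt) : ctxt =
  match s with
  | Decl (x, e) -> (x, tc_exp g l e) :: l
  | Assn (x, e) ->
    if List.assoc x l = tc_exp g l e then l
    else failwith "assignment type mismatch"
  | Ret e ->
    if tc_exp g l e = rt then l else failwith "wrong return type"

(* Check a statement list, passing each resulting L' to the next one. *)
let tc_block g l rt stmts =
  List.fold_left (fun l s -> tc_stmt g l rt s) l stmts
```

For the program above, starting from empty contexts with return type int, the final local context binds both x1 and x2 to int.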



Type Safety For General Languages
Theorem: (Type Safety)

If P is a well-typed program, then either:
(a) the program terminates in a well-defined way, or
(b) the program continues computing forever

•Well-defined termination could include:
•halting with a return value
•raising an exception
•Type safety rules out undefined behaviors:
•abusing “unsafe” casts: converting pointers to integers, etc.
•treating non-code values as code (and vice-versa)
•breaking the type abstractions of the language
•What is “defined” depends on the language semantics…
Compilation As Translating Judgments

•Consider the source typing judgment for source expressions:
C ⊢ e : t
•How do we interpret this information in the target language?
⟦C ⊢ e : t⟧ = ?
•⟦C⟧ translates contexts
•⟦t⟧ is a target type
•⟦e⟧ translates to a (potentially empty) stream of instructions, that, when run, computes the result into some operand
•INVARIANT: if ⟦C ⊢ e : t⟧ = ty, operand, stream then the type (at the target level) of the operand is ty = ⟦t⟧
Example

• C ⊢ 37 + 5 : int
•What is ⟦ C ⊢ 37 + 5 : int⟧ ?

⟦C ⊢ 37 : int⟧ = (i64, Const 37, []) ⟦C ⊢ 5 : int⟧ = (i64, Const 5, [])

⟦C ⊢ 37 + 5 : int⟧ = (i64, %tmp, [%tmp = add i64 (Const 37) (Const 5)])
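
The translation of this example can be sketched as a function returning the triple (ty, operand, stream). The target representation below (ty, operand, insn) is a hypothetical, simplified stand-in and not the course's LLVM library; fresh is an assumed helper that generates temporary names.

```ocaml
(* Hypothetical, simplified target language. *)
type ty = I64
type operand = Const of int | Id of string      (* Id "tmp1" stands for %tmp1 *)
type insn = Add of string * ty * operand * operand
  (* Add (dst, ty, op1, op2) stands for: %dst = add ty op1 op2 *)

type exp = Int of int | Plus of exp * exp

let counter = ref 0
let fresh () = incr counter; "tmp" ^ string_of_int !counter

(* Translate an int-typed expression to (type, operand, stream). *)
let rec compile (e : exp) : ty * operand * insn list =
  match e with
  | Int n -> (I64, Const n, [])
  | Plus (e1, e2) ->
    let (_, op1, s1) = compile e1 in
    let (_, op2, s2) = compile e2 in
    let tmp = fresh () in
    (* Run both operand streams, then add into a fresh temporary. *)
    (I64, Id tmp, s1 @ s2 @ [Add (tmp, I64, op1, op2)])
```

Constants compile to an empty stream, so compiling 37 + 5 yields a single add instruction whose destination is the returned operand.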



What about the Context?
•What is ⟦C⟧?
•Source level C has bindings like: x:int, y:bool
•We think of it as a finite map from identifiers to types
•What is the interpretation of C at the target level?
•⟦C⟧ maps source identifiers, “x”, to target types and ⟦x⟧
•What is the interpretation of a variable ⟦x⟧ at the target level?
•How are the variables used in the type system?


typ_var: variables as expressions (which denote values)
  x : t ∈ L
  -------------
  G; L ⊢ x : t

typ_assn: variables as addresses (which can be assigned)
  x : t ∈ L     G; L ⊢ exp : t
  -----------------------------
  G; L; rt ⊢ x = exp; ⇒ L


Interpretation of Contexts
•⟦C⟧ = a map from source identifiers to types and target identifiers
•INVARIANT:
x:t ∈ C means that
(1) lookup ⟦C⟧ x = (⟦t⟧*, %id_x)
(2) the (target) type of %id_x is ⟦t⟧* (a pointer to ⟦t⟧)


Interpretation of Variables
•Establish invariant for expressions:

  typ_var: variables as expressions (which denote values)
    x : t ∈ L
    -------------
    G; L ⊢ x : t

  ⟦typ_var⟧ = (%tmp, [%tmp = load i64* %id_x])
    where (i64, %id_x) = lookup ⟦L⟧ x

•What about statements?

  typ_assn: variables as addresses (which can be assigned)
    x : t ∈ L     G; L ⊢ exp : t
    -----------------------------
    G; L; rt ⊢ x = exp; ⇒ L

  ⟦typ_assn⟧ = stream @ [store ⟦t⟧ opn, ⟦t⟧* %id_x]
    where (⟦t⟧, %id_x) = lookup ⟦L⟧ x
    and ⟦G;L ⊢ exp : t⟧ = (⟦t⟧, opn, stream)
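
A sketch of both translations, again over a hypothetical simplified target language rather than the course's LLVM library. It follows the context invariant stated earlier, so the context stores a pointer type for each variable's slot; the names ctxt, compile_exp, compile_stmt, and fresh are assumptions.

```ocaml
(* Hypothetical, simplified target language. *)
type ty = I64 | Ptr of ty
type operand = Const of int | Id of string      (* Id "id_x" stands for %id_x *)
type insn =
  | Load of string * ty * operand    (* %dst = load ty* src *)
  | Store of ty * operand * operand  (* store ty value, ty* dest *)

type exp = Int of int | Var of string
type stmt = Assn of string * exp     (* x = exp; *)

(* Translated context: source identifier -> (slot type, target identifier). *)
type ctxt = (string * (ty * string)) list

let counter = ref 0
let fresh () = incr counter; "tmp" ^ string_of_int !counter

(* A variable used as an expression denotes a value: load from its slot. *)
let compile_exp (c : ctxt) (e : exp) : ty * operand * insn list =
  match e with
  | Int n -> (I64, Const n, [])
  | Var x ->
    (match List.assoc x c with
     | (Ptr t, id) ->
       let tmp = fresh () in
       (t, Id tmp, [Load (tmp, t, Id id)])
     | _ -> failwith "context invariant violated: slot is not a pointer")

(* A variable used on the left of an assignment denotes an address:
   run the stream for the right-hand side, then store into the slot. *)
let compile_stmt (c : ctxt) (Assn (x, e)) : insn list =
  let (t, opn, stream) = compile_exp c e in
  match List.assoc x c with
  | (Ptr t', id) when t' = t -> stream @ [Store (t, opn, Id id)]
  | _ -> failwith "assignment does not match the slot type"
```

Compiling x = x; with x bound to the slot %id_x yields a load from %id_x into a temporary followed by a store of that temporary back to %id_x.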
