- Part 1 - Introduction
- Part 2 - Type Systems
- Part 3 - Introduction to the Code
-
Part 4 - Code Walk-Through
- Parser
- Resolver
-
Type Checker
- Type Check: If Expression
- Type Check: Do Expression
- Type Check: Lists
- Type Checking: Progress So Far
- Generating Fresh Type Variables
- Occurs Check - Preventing Infinite Types
- The Substitution Map
- Binding Type Variables - Updating the Substitution Map
- Finding Free Type Variables
- Instantiating Polymorphic Types
- Type Checking: Another Progress Check
- Generalise
- Unification
- đ§© Type Check: Define Expression
- đ§© Type Check: Function Call
- đ§© Type Check: Let Expression
- đ§© Type Check: Lambda Expression
- đ§© Type Check: Atom Expression
- Limitations & Future Work
- Type Checking: Summary
- Lowering
- Evaluating
- REPL
- Conclusion
Part 1 - Introduction
In this post, youâll learn how to build a lightweight HindleyâMilner type checker in Haskell. No advanced theory is required. Weâll apply it to a tiny, LISP-inspired language so you can focus on how inference works.
Hindley-Milner inference may seem intimidating, but I believe that it is much more approachable than it first appears. Each concept is quite understandable. It is just a matter of working through them and building up to the full picture.
Hopefully, youâll find this post useful if you want to implement a type system of your own, or if you wish to understand how Hindley-Milner works.
Structure of this Document
This document is structured as follows:
- Part 1: Introduction and overview.
- Part 2: Hindley-Milner type system introduction.
- Part 3: Introduction to the code.
- Part 4: Code walk-through.
- Conclusion.
The code is available on GitHub.
Conventions
- đ is used to introduce a new term or concept.
- đĄ introduces a tip or reminder.
- đ§© starts a block that goes into more detail about a specific topic.
Scope
The explanation and code focus on a practical introduction to the Hindley-Milner type system. This is not meant to be production grade, no consideration given to performance or optimisations for real-world languages.
This post is about implementing a Hindley-Milner type system in Haskell, not about LISP itself. Many of the features that make LISP interesting are not implemented or discussed in this context. It is just the S-Expression syntax being used.
I believe that itâs often easier to start with a lightweight practical approach, and then decide whether you want to dive into the theory. I hope that this post will give you enough of a practical understanding of Hindley-Milner type systems that you can then go on to learn more about the theory if you want to.
It should demonstrate that implementing the basics for a small DSL is entirely possible, and not as hard as it may seem at first glance.
Why LISP?
Iâm using a LISP-like language because it is a small language. Using a small subset of LISP means that we can focus on the type system, rather than language itself.
To keep the language minimal, Iâm not implementing many of the features of LISP. There are no macros, quoting etc.
Despite this simplicity it is still enough to demonstrate how the type system works. It could also be a starting point for a more complete language if you want to extend it.
If you are interested in seeing a fully-fledged statically typed LISP, take a look at Typed Racket.
For the remainder of this post, I will refer to the language being implemented as âLISPâ.
Quick Lisp Primer
LISP has a uniform, minimal syntax:
- Code and data are written as parenthesised lists.
- Prefix notation is used:
(func arg1 arg2 ...)
. - Functions are first-class values.
(+ 10 12)
results in22
.(prn "Hello, World!")
- same as
print("Hello, World!")
in Python - or
putStrLn "Hello, World!"
in Haskell
- same as
There are many good resources to learn LISP, so I wonât go into more detail here.
â ïž You do not need to learn LISP to understand this post. The basic concepts of LISP are simple and can be grasped quickly from the context of the code examples.
Part 2 - Type Systems
If you are reading this, you are probably already familiar with types and why you might want them, so I wonât cover that here. Instead, weâll dive straight into how HindleyâMilner inference works.
In short, for the LISP being implemented, we want to ensure that user code is well-typed.
(+ 10 12)
is well-typed.(+ 10 "Hello")
is not.
> (+ 10 "Hello")
mismatch
Type Error: Unification
expected: Int, but found: String.1: (+ 10 "Hello")
^
| +----------
You may find that switching between the theory (Part 2) and the code (Part 3) helps you understand the concepts better.
Hindley-Milner Type System
From Wikipedia:
A classical type system for the lambda calculus with parametric polymorphism. It deduces types automatically across entire modules, not just local expressions, making it the backbone of ML-style languages. - Wikipedia
The Type Checker
The type checker being discussed here implements the Hindley-Milner type inference algorithm.
What Hindley-Milner Gives You:
- Type inference â no annotations needed (but supported).
- Unification â the workhorse that solves equality between types.
- Let-generalisation â automatic polymorphism from monomorphic type bindings.
- Principal types â you get the most general type, so your functions remain reusable.
Unification
đ Unification is the algorithm that, given two type expressions,
- Determines whether they can be made equal
- And if so, constructs the most general substitution for type variables that makes the expressions identical.
For example, calling increment on an integer:
- Suppose
inc
is a function with typeInt -> Int
. - When you write
(inc 1)
:- The type checker sees that
inc
expects an argument of typeInt
. - It sees that
1
is of typeInt
. - It tries to unify the argument type of
inc
(Int
) with the type of1
(Int
). - Since both are
Int
, the unification succeeds, and the expression type checks.
- The type checker sees that
Calling increment on a string:
- If you try
(inc "hello")
- The checker sees that
inc
expectsInt
. - It sees
"hello"
is aString
. - Unification of
Int
withString
fails. - A type error is reported.
- The checker sees that
đ A type variable is a placeholder that stands for any type. E.g. a
or U0
.
đ A substitution map tracks which type variables should be replaced by other types or variables, enabling recursive resolution to the final, most specific type during type inference.
- The substitution map maps a type variable name (like
U0
,U1
) to another type variable or concrete type. - This can be type variable to type e.g.Â
U0 -> Int
, or one type variable to another, e.g.ÂU1 -> U0
.
During unification, the algorithm uses the substitution map to replace type variables with their mapped types.
(recursively resolving chains like U1 -> U0 -> Int
).
- Lookup
U1
=>U0
- Lookup
U0
=>Int
- Result:
Int
â ïž The code will prevent infinite types by checking that a type variable never appears within the structure itâs being unified with. See Infinite Types for more details.
đ§© Longer unification Example
list 1)) (concat3 () () (
- Assume
concat3
isconcat3 :: [a] -> [a] -> [a] -> [a]
. - That is, it takes three lists of the same type and returns a concatenated list of that type.
(list 1)
:list
creates a list, so this function creates a list containing the integer1
- In this LISP implementation, lists are homogeneous, meaning all elements must be of the same type.
(concat3; arg1
() ; arg2
() list 1) ; arg3
( )
Type checking:
- arg1:
() :: [U1]
-
The first
()
is inferred as an empty list with element typeU1
i.e. type[U1]
, whereU1
is a fresh type variable.
- arg1:
- arg2:
() :: [U2]
-
The second
()
is also inferred as an empty list of type[U2]
, whereU2
is a another fresh type variable.
- arg2:
- arg3:
(list 1) :: [Int]
-
arg3: The third argument
(list 1)
is type-checked and inferred as a list ofInt
, i.e.Â[Int]
.
- arg3:
đĄ A fresh type variable is a new unique type variable that has not been used before in the current type environment.
(concat3; :: [U1]
() ; :: [U2]
() list 1) ; :: [Int]
( )
Unification:
- Homogeneous list
-
All the arguments must be of the same type, so we need to unify
U1
,U2
, andInt
.
- Unify
U1 ~ U2
-
This gives us
U1 = U2
because the types match structurally (They are both fresh type variables and can unify with any type).
- Unify
U1 ~ Int
-
Next unify
U1
withInt
, yieldingU1 = Int
. -
Because
U2
was already equated toU1
, it too becomesInt
(through hierarchical lookup).
- Unify with the function type
-
concat3
has type[a] -> [a] -> [a] -> [a]
. -
Unify each argumentâs inferred type (
[U1]
,[U2]
,[Int]
) with[a]
, which after substitution becomes[Int]
for all three. -
In other words, solve
[a] ~ [U1], [a] ~ [U2], [a] ~ [Int]
-
After applying our substitutions (
U1 = Int
,U2 = Int
) -
the signature specializes to
[Int] -> [Int] -> [Int] -> [Int]
. - Everything matches
; :: [Int] -> [Int] -> [Int] -> [Int]
(concat3 ; :: [Int]
() ; :: [Int]
() list 1) ; :: [Int]
(; :: [Int] )
â
And so the unification and type checking succeeds, resulting in the final type of the expression being [Int]
.
Generalisation
đ Generalisation in Hindley-Milner, is the process of turning a monomorphic type into a polymorphic type (forall type) by quantifying its free type variables when binding a value with let.
đ The principal type is sometimes informally called the final inferred type, but technically it refers to the most general type from which all others can be derived by substitution.
Generalisation is the key to Hindley-Milnerâs polymorphism. The type system can determine when a type can be made polymorphic.
If you write a function like (λ (x) x)
- In the unification step, the type checker will infer this to have the type
U0 -> U0
, whereU0
is a fresh type variable. - Generalisation will then turn this into a polymorphic type
â U0. U0 -> U0
, meaning it can work with any type. - The
forall
quantifierâ U0
indicates thatU0
can be any type, making the function polymorphic.
Even though the example is trivial, this mechanism underlies powerful polymorphism in real-world languages like ML and Haskell.
In our checker, we generalise at let-bindings and top-level defines.
đĄ It may seem as though U0 -> U0
should âobviouslyâ be polymorphic. But the type system initially treats U0 -> U0
as a monotype.
Meaning U0
is a single unknown type, not a stand-in for any type.
- Type inference starts with monotypes by default, because the type checker doesnât yet know which type variables are local to the expression and which ones can be safely generalised.
- This is the safest default: assume nothing is polymorphic until proven otherwise.
- And it is why generalisation only happens at
let
(and define) bindings. It is the point where the type checker can safely say that a value is fully known and can be reused.
Instantiation
đ Instantiation: Replaces all quantified variables in a polymorphic type with fresh type variables.
đ Instantiation always produces a monotype.
- It operates on a polymorphic type (i.e., one with a
forall
quantifier) and replaces each quantified variable with a fresh type variable. - For example, the polytype
â a b. a -> b -> String
is instantiated to the monotypeU0 -> U1 -> String
, whereU0
andU1
are fresh type variables. - This ensures that each use of a polymorphic value is independent in type inference.
- Two type variables cannot be assumed equal just because they share a name in different contexts.
Instantiation guarantees that each use of a polymorphic value has its own fresh type variables, preventing unwanted constraints from leaking between uses.
đ We instantiate every polymorphic type at every variable use. For example, when you refer to a topâlevel identity function or a letâbound polymorphic value.
Hindley-Milnerâs power comes from its simple, elegant structure. Infer monotypes, generalise at let-bindings, instantiate at use.
đ§© Longer Generalisation Example
let ((identity (λ (z) z)))
(let ( (x
(identity 10))
(
(yidentity "Hello"))
(
)
(prn y)
) )
Without generalisation, a single monomorphic binding of identity would lock its type to the first use. This would make the second call fail.
Here is a look at how generalisation solves this problem.
- We have a nested
let
expression. - The outer
let
binds anidentity
function. - The inner
let
binds two variablesx
andy
, both using theidentity
function.
Using what we know from the previous section, we can infer the types step by step.
let ((identity (λ (z) z))) ; identity :: U0 -> U0
(let ( (x
(identity 10)) ; 10 :: Int, identity :: Int -> Int
(
(yidentity "Hello")) ; "Hello" :: String, ...
(
)
(prn y)
) )
- â While unifying
x
the type checker sees thatidentity
must beInt -> Int
because10
is anInt
. - â When checking
(identity "Hello")
, unification attemptsInt ~ String
and fails.
That is not going to work. This is exactly what generalisation is for.
Instead:
let ((identity (λ (z) z))) ; identity :: â U0. U0 -> U0
(let ( (x
(identity 10)) ; 10 :: Int, identity :: Int -> Int
(
(yidentity "Hello")) ; "Hello" :: String, identity :: String -> String
(
)
(prn y)
) )
- â
The type checker generalises
identity
toâ U0. U0 -> U0
. - â
Now
identity
can be used with any type, so it can be applied to both10
and"Hello"
without issue.
Infinite Types
đ Infinite types are types that are defined in terms of themselves, leading to recursive type definitions that cannot be resolved.
The HindleyâMilner type checker detects infinite types, using the occurs check and rejects them. Without an occurs check, you could write nonsensical types that make no sense (lead to infinite recursion).
For example:
let ((x (list x)))
( x)
- Infer: In
(list x)
, ifx :: U0
, thenlist x :: [U0]
. - Unify: Attempt to solve
U0 = [U0]
. - Detect: Because
U0
occurs within[U0]
, unification would recurse indefinitely.
Summary
So far you have seen what to expect from a Hindley-Milner type system, and youâve been introduced to
- Unification.
- The substitution map.
- Generalisation.
- Instantiation.
- Infinite types and the occurs check.
Part 3 - Introduction to the Code
With an introduction to the theory handled, letâs look at the code to implement the type system in Haskell.
Implementation Overview
As a reminder, here is what this LISP implementation handles:
- Data types:
Int
,Bool
,String
,nil
, and lists of these types. - A small set of special forms:
lambda
,let
,define
,if
,function application
. - A small set of primitive functions like
+
,-
,*
,and
,or
, etc. - A tiny standard library.
let
for defining local variables.define
for defining global variables and functions.
Compiler High Level Flow
The LISP implementation is quite shallow as discussed, but Iâve also tried to keep it relatively broad. It could have been only a parser, the type checker and a REPL. This would miss some interesting parts of actually implementing a type system, so Iâve implemented a few more real-world aspects of the pipeline.
Instead, it looks like this:
- Parsing: LispParser.hs
- Converts source code into a simple Abstract Syntax Tree (AST).
- No special form recognition. Just simple s-expressions
- Resolver: Resolver.hs
- Recognizes special forms
- Create a more structured AST.
- Type Checking: TypeChecker.hs
- Implements the Hindley-Milner type inference algorithm.
- Handles unification, generalisation, and instantiation.
- Type checks the resolved AST.
- Produces a typed AST with type annotations.
- Lowering: Eval/Lower.hs
- Converts the typed AST into a lower-level representation.
- In this implementation, its effectively just type erasure
- Evaluation: Eval/Evaluator.hs
- Evaluates the lowered AST.
- Implements a simple interpreter for the LISP language.
Finally, a REPL (Repl.hs) that ties everything together, allowing you to interactively enter LISP code and see the results.
In Part 4 Iâll walk through each of these components in more detail, explaining how they work and fit together.
Code Conventions
- Iâve tried to keep the Haskell relatively simple, but what counts as simple is very subjective.
- I have a preference for qualified imports, so youâll see many
import xxx as qualified yyy
statements. - I am using a custom prelude (
Verset
) which I prefer because it also encourages qualified imports (not surprisingly, as I created it). Switching toprotolude
or even the standard prelude should be easy enough. - Iâve used GHC 12.2 but it should work with 9.8 and later.
- Iâve tried to add a reasonable amount of code comments. Hopefully youâll be able to follow the code without too much trouble.
pass
=pure ()
- Record fields are made strict using
!
. - Iâm using overloaded-record-dot syntax, so you can use
x.field
instead offield x
for record field access. - Rather than using multiple
'
(prime) suffixes, Iâll use1
,2
, etc. Soname1 => name2 => name3
instead ofname => name' => name''
.
Monad-Transformers
â ïžIf you are not familiar with monad transformers, do not let this discourage or distract you.
The type-checker runs in StateT TcState (Except TypeError)
.
Think of it as mutable state and early exit in a pure setting. Everything inside is just modify'
for state and throwE
for errors.
You can safely ignore the plumbing and focus on the core logic.
You could remove the monad transformer entirely, e.g.
- Run entirely in IO, use a TVar for state and throwing actual exceptions for errors.
- Then the type signature would look something like IO (TypeEnv, TypedLispVal)
.
Part 4 - Code Walk-Through
Parser
The code for the parser is in LispParser.hs.
data ParsedLispVal
= PlNil !Pos
| PlAtom !Pos Text
| PlInt !Pos Int
| PlString !Pos Text
| PlBool !Pos Bool
| PlList !Pos [ParsedLispVal]
deriving (Show, Eq)
The parser is intentionally minimal. It only generates low-level s-expressions. No form recognition is done here.
For example, the following LISP code:
if #t 1 2) (
Would be parsed into the following AST (position elided):
PlList (Pos ...) [
PlAtom (Pos ...) "if",
PlBool (Pos ...) True,
PlInt (Pos ...) 1,
PlInt (Pos ...) 2
]
At this stage, there is no difference between special forms, lists, definitions, or function applications. That is all handled in the resolver.
There are tradeoffs here. A simple parser is easy to implement and understand. You might keep the parser simple like this even in a much more complete LISP implementation.
The simpler parser AST may be the homoiconic AST that youâd want the user to see and manipulate (e.g. for macros). But you might alternatively opt to recognise more forms in the parser itself.
Errors
Even a basic language can benefit from readable error messages.
By tagging each ParsedLispVal
with a Pos
we can track the exact line and column that the node was parsed from.
getSourcePos
from megaparsec makes this easy.
atPos :: LispParser Pos
= do
atPos <- M.getSourcePos
p pure $ Pos (M.unPos $ M.sourceLine p) (M.unPos $ M.sourceColumn p)
This allows us to report errors nicely since we know the line and column of the node:
Type Error: Unbound variable `z`1: (let
2: ((x 1)
3: (y z))
^
|
+----------4: #t)
The various error types (for the parser, resolver, type checker and evaluator) all implement the LispError
type class.
class LispError a where
showLispError :: a -> (Maybe Pos, Text, Text, Text) -- (position, type, name, error message)
This allows us to format errors in a consistent way, regardless of the exact error type.
The formatting of errors is done in Printer/PrintError.hs
using the prettyprinter
and prettyprinter-ansi-terminal libraries.
Resolver
The parser produces raw lists, but we want a structured AST to work with. The resolver walks each ParsedLispVal
and generates structured ResolvedLispVal
nodes.
The code for the resolver is in Resolver.hs.
data ResolvedLispVal
= RlAtom !Pos Text
| RlBool !Pos Bool
| RlDefine !Pos !Text !ResolvedLispVal
| RlDo !Pos ![ResolvedLispVal]
| RlFuncCall !Pos !ResolvedLispVal ![ResolvedLispVal]
| RlIf !Pos !ResolvedLispVal !ResolvedLispVal !ResolvedLispVal
| RlInt !Pos Int
| RlLambda !Pos ![(Pos, Text)] ![ResolvedLispVal]
| RlLet !Pos !L.LetStyle ![((Pos, Text), (Pos, ResolvedLispVal))] ![ResolvedLispVal]
| RlList !Pos [ResolvedLispVal]
| RlNil !Pos
| RlString !Pos Text
deriving (Show, Eq)
- The resolver is responsible for creating a more structured AST from the parsed s-expressions.
- It recognizes special forms, and converts the parsed s-expressions into AST nodes that are easier to work with in the type checker and evaluator.
- E.g. it is much simpler to work with a
RlIf
than just aPlList
with anif
atom and three arguments that you would then need to check each time.
The main resolver function is resolveImpl
, and it looks like this:
resolveImpl :: Bool -> P.ParsedLispVal -> Either ResolverError ResolvedLispVal
=
resolveImpl isTopLevel lv case lv of
-- No change
P.PlNil p -> Right $ RlNil p
P.PlInt p v -> Right $ RlInt p v
P.PlString p v -> Right $ RlString p v
P.PlBool p v -> Right $ RlBool p v
P.PlAtom p v -> Right $ RlAtom p v
P.PlList p allVs -> do
case allVs of
: vs) ->
(hcase h of
P.PlAtom _ "let" -> resolveLet p L.LetParallel vs
P.PlAtom _ "lambda" -> resolveLambda p vs
P.PlAtom _ "λ" -> resolveLambda p vs
P.PlAtom _ "if" -> resolveIf p vs
P.PlAtom _ "define" ->
if isTopLevel
then resolveDefine p vs
else Left . ReResolverError (Just p) $ "define can only be used at top level"
-- (list 1 2 3) => (1 2 3)
P.PlAtom _ "list" -> do
<- traverse (resolveImpl False) vs
vs' Right $ RlList p vs'
P.PlAtom _ "do" -> do
<- traverse (resolveImpl False) vs
vs' Right $ RlDo p vs'
-- Everything else is a function call
-> do
_ <- resolveImpl False h
f <- traverse (resolveImpl False) vs
args Right $ RlFuncCall p f args
-> do
[] -- Empty list is empty list not nil
Right $ RlList p []
A few things to note here:
- It recognizes the
let
,lambda
,if
,define
, anddo
, special forms as well as function application. - The resolver is recursive, it walks the parsed AST and resolves each node recursively.
- We need to track whether we are at the top-level or not, because
define
can only be used at the top-level. - Lambda can be written as
lambda
orλ
. - There is no type-checking here.
- Unlike many (most?) LISP implementations we donât substitute
nil
for empty lists or vice versa.
Letâs look at resolveDefine
to see how that works.
resolveDefine :: Pos -> [P.ParsedLispVal] -> Either ResolverError ResolvedLispVal
= do
resolveDefine pos vs' let err = "expected define in form (define name expr)\n"
case vs' of
-> do
[name1, val1] <- as' asAtom "atom" name1 $ Just (err <> "expected atom for name of define")
name2 <- resolveImpl False val1
val2 pure $ RlDefine pos name2 val2
->
_ Left . ReResolverError (Just pos) $ err <> "Invalid number of arguments for define"
asAtom :: P.ParsedLispVal -> Maybe Text
P.PlAtom _ a) = Just a
asAtom (= Nothing
asAtom _
as' :: (P.ParsedLispVal -> Maybe a) -> Text -> P.ParsedLispVal -> Maybe Text -> Either ResolverError a
=
as' f n v e' case f v of
Just v' -> Right v'
Nothing ->
let e = maybe "" ("\n" <>) e' in
Left $ ReResolverError (Just $ P.getPos v) $ "Expected " <> n <> ", but got " <> P.nameOf v <> e
The resolve functions are all fairly similar. They need to check the s-expression and see if it matches the expected form.
- The various
asXXX
functions are used to match the expected type of the node. - The
as'
function wraps anasXXX
call, and converts a mismatch (Nothing
) into an error (ResolverError
). - The
as'
function return anEither
type. Usingas'
means that an error will short-circuit the resolution and return immediately. P.nameOf
is thenameOf
function imported from theLispParser
module. It gets a human-readable name for the node.let
only supports parallel bindings. No sequential or recursive bindings here.
With every ParsedLispVal
now a ResolvedLispVal
, complete with structure and positions, weâre ready to feed these into our type checker.
Type Checker
The type checker traverses the ResolvedLispVal
tree, threading an environment and substitution state through unification, generalisation, and instantiation to produce a fully typed AST.
The code for the type checker is in TypeChecker.hs.
data TypedLispVal
= TvAtom !Pos !L.LispType !Text
| TvBool !Pos !Bool
| TvDefine !Pos !L.PolyType !Text !TypedLispVal
| TvDo !Pos !L.LispType ![TypedLispVal]
| TvFuncCall !Pos !L.LispType !TypedLispVal ![TypedLispVal] -- ^ TvFuncCall pos returnType functionVal argVals
| TvIf !Pos !L.LispType !TypedLispVal !TypedLispVal !TypedLispVal
| TvInt !Pos !Int
| TvLambda !Pos !L.LispType ![(Pos, Text)] ![TypedLispVal]
| TvLet !Pos !L.LispType !L.LetStyle ![(Pos, Text, TypedLispVal)] ![TypedLispVal]
| TvList !Pos !L.LispType ![TypedLispVal]
| TvNil !Pos
| TvString !Pos !Text
deriving (Show, Eq)
data TcState = TcState
tsTypeVarCounter :: !Int
{ tsSubst :: !(Map Text L.LispType)
, }
TypedLispVal
is the type of the AST after type checking. It is very similar toResolvedLispVal
, just with type information added.- Primitive types like
Int
,Bool
,String
, andnil
are not annotated with their type, since this is obvious from their constructor. -
TvInt {}
isTyInt
etc.
- Primitive types like
The main type checking function is typeCheckVal'
in TypeChecker.hs, which looks like this:
typeCheckVal' :: TypeEnv
-> R.ResolvedLispVal
-> StateT TcState (Except TypeError) (TypeEnv, TypedLispVal)
= do typeCheckVal' env1 rv
It takes an initial type environment and a ResolvedLispVal
, and returns a new type environment and a TypedLispVal
on success.
data TypeEnv = TypeEnv
teTypes :: !(Map.Map Text L.PolyType)
{ teParent :: !(Maybe TypeEnv)
, }
Here is the top of typeCheckVal'
typeCheckVal' :: TypeEnv
-> R.ResolvedLispVal
-> StateT TcState (Except TypeError) (TypeEnv, TypedLispVal)
= do
typeCheckVal' env1 rv -- Type check
<- go
(envFinal, topT1)
<- applyValSubstitutions topT1
topFinal pure (envFinal, topFinal)
where
=
go case rv of
-- Simple cases, no extra type checking needed
R.RlNil p -> pure (env1, TvNil p)
R.RlInt p v -> pure (env1, TvInt p v)
R.RlString p v -> pure (env1, TvString p v)
R.RlBool p v -> pure (env1, TvBool p v)
-- Type check the more complex cases
R.RlAtom p v -> typeCheckAtom p v
R.RlDefine p name val -> typeCheckDefine p name val
R.RlDo p vs -> typeCheckDo p vs
R.RlFuncCall p f as -> typeCheckFuncCall p f as
R.RlIf p cond then' else' -> typeCheckIf p cond then' else'
R.RlLambda p bindings body -> typeCheckLambda p bindings body
R.RlLet p style bindings body -> typeCheckLet p style bindings body
R.RlList p vs -> typeCheckList p vs
- The functions returns a
TypedLispVal
i.e. the result of type checking theResolvedLispVal
. - It also returns the
TypeEnv
, which is the type environment after type checking. E.g. it is updated by a(defined ...)
form - For the simple cases like
TvInt
it simply returns the correspondingTypedLispVal
Now, let us look at how those typeCheckXXX calls are implemented.
Type Check: If Expression
-- (if cond then else)
= do
typeCheckIf pos cond1 then1 else1 -- Type check the `condition`, `then`, and `else` expressions.
<- typeCheckVal' env1 cond1
(env2, cond2) <- typeCheckVal' env2 then1
(env3, then2) <- typeCheckVal' env3 else1
(env4, else2)
-- Check that the condition is a boolean.
L.TyBool)
unify (L.getPos cond2, getValType cond2) (pos,
-- Unify the types of the then and else branches.
-- I.e. the then and else branches must have the same result type.
unify (L.getPos then2, getValType then2) (L.getPos else2, getValType else2)
-- Get the final type of the body of the if expression.
let finalType = getValType then2
pure (env4, TvIf pos finalType cond2 then2 else2)
Looking at that superficially, it is not that complicated.
- Type check the
condition
,then
, andelse
expressions. - Check (unify) that the condition is a boolean.
- Check (unify) that the types of the
then
andelse
branches are the same. I.e. that the final expression in each has the same type. - Get that final type and make it the type of the
TvIf
expression.
The unify
function will be discussed in more detail below.
For now:
đĄ Unification is the algorithm that, given two type expressions,
- Determines whether they can be made equal
- And if so, constructs the most general substitution for type variables that makes the expressions identical.
unify :: (Pos, L.LispType)
-> (Pos, L.LispType)
-> StateT TcState (ExceptT TypeError Identity) ()
So for the if
expression that was roughly
- Type-check all the child expressions.
- Unify child expressions to ensure the meet expectations
- Get the final type of the expression.
- Construct the
TvIf
.
Type Check: Do Expression
-- (do ...)
= do
typeCheckDo pos vs -- Type check all expressions in the do block.
<- foldTypeCheckVals env1 vs
(env2, vs2)
-- The type of the block is the type of the last expression or nil if empty.
let lastType = case lastMay vs2 of
Nothing -> L.TyNil
Just x -> getValType x
pure (env2, TvDo pos lastType vs2)
TvDo pos lastType vs2
^ ^ ^ ^+------- typed children
| | | +------------- result type of the block
| | +-------------------- source position
| +------------------------ âdoâ node
This is very similar to the if
expression.
- Type check child expressions.
- Get the type of the body (last expression).
- Construct the
TvDo
.
Type Check: Lists
-- (list v...)
-- Type check a list of values.
-- In this implementation, lists are always homogeneous (all elements must have the same type).
=
typeCheckList pos vs case vs of
-- An empty list can have any element type. Create a fresh type variable for the element type.
-- Note that nil and the empty list `()` are not synonymous in this system (in most lisps they are).
-> do
[] <- nextTypeVar
u pure (env1, TvList pos (L.TyList $ L.TyVar u) [])
-- Lists at this point are just vectors of values, never function calls.
-- The only requirement is that all elements have the same type.
:t) -> do
(h-- Type check the head element to get its type.
<- typeCheckVal' env1 h
(env2, headVal) -- Type check the tail elements to get their types.
<- foldTypeCheckVals env2 t
(env3, tailVals)
let
-- Type and position of the first element.
= (L.getPos headVal, getValType headVal)
headType -- Type and position the tail elements.
= tailVals <&> \v -> (L.getPos v, getValType v)
tailTypes
-- Unify the type of each tail element with the head element's type,
-- to ensure all elements in the list have the same type.
$ \tt -> unify headType tt
forM_ tailTypes -- If no type mismatch is found, the list is homogeneous.
pure (env3, TvList pos (L.TyList $ snd headType) (headVal : tailVals))
A bit more happening there, but still nothing too complicated.
- Infer before unify.
- Unify takes a
Pos
and aL.LispType
, so it can report errors with the position of any type mismatch.
-- the list node
TvList pos -- type of the homogeneous list
(TyList $ snd headType) -- the typed list elements (headVal : tailVals)
đĄ Remember that lists are homogeneous in this implementation.
Non-empty Lists
For non-empty lists, the code looks similar to the if
expressionâs code.
- Infer the headâs type by checking the first element.
- Infer each tail elementâs type with
foldTypeCheckVals
. - Unify each tailâs type with the headâs to enforce homogeneity.
- Construct a
TvList pos (TyList headType) [headVal âŠ]
node.
Empty Lists
When the type checker first sees an empty list it canât know what the final concrete type will be.
So we say that the empty list can be of any type, i.e. in Haskell we say it has the type [a]
.
a
will be unified later when the list is used.
As noted in the code, the empty list is not the same as nil
in this implementation.
Type Checking: Progress So Far
In the examples above, you will have seen at a high level how some of the LISP forms are type-checked. You many want to go back and forth between the code and the theory in Part 2 to get a better feel for what its trying to achieve.
Next weâll look at some important helper functions before moving on to the core type-checking code.
Generating Fresh Type Variables
đĄ A fresh type variable is a new unique type variable that has not been used before in the current type environment.
From the unification section above, we saw that the type checker needs to be able to generate fresh type variables. Let look at the implementation of that in the type checker.
The type checker carries its state in TcState
data TcState = TcState
tsTypeVarCounter :: !Int
{ tsSubst :: !(Map Text L.LispType)
, }
tsTypeVarCounter
is used to generate fresh type variables.tsSubst
is the substitution map.
Generating a fresh type variable is done in the nextTypeVar
function, which looks like this:
typeVarPrefix :: Text
= "U"
typeVarPrefix
nextTypeVar :: (Monad m) => StateT TcState m Text
=
nextTypeVar $ \st ->
state let c' = tsTypeVarCounter st + 1
= typeVarPrefix <> show c'
name = st { tsTypeVarCounter = c' }
st' in (name, st')
It generates a fresh type variable by incrementing the counter and returning a new type variable name.
Since there is only one TcState
for the entire type-checking process, this ensures that all generated type variables are unique.
Iâm using U
as the prefix for type variables. This is entirely arbitrary as the name will always be unique thanks to substitution.
Occurs Check - Preventing Infinite Types
đĄ Infinite types are types that are defined in terms of themselves, leading to recursive type definitions that cannot be resolved.
-- | Check if a type variable occurs in a type
-- This is used to prevent infinite types
-- For example, if you this haskell type is infinite: a = [a]
occurs :: Text -> L.LispType -> Bool
=
occurs name lt case lt of
L.TyNil -> False
L.TyInt -> False
L.TyString -> False
L.TyBool -> False
L.TyVar v -> v == name
L.TyList vs -> occurs name vs
L.TyFunc fnArgsType1 fnRetType1 ->
any (occurs name) (fnArgsType1 <> [fnRetType1])
- Recursively checks if a type variable already occurs in a type.
- From above, we saw that bindVar calls occur before binding. Then throwing an infiniteâtype error if it returns True.
The Substitution Map
đĄ A substitution map tracks which type variables should be replaced by other types or variables, enabling recursive resolution to the final, most specific type during type inference.
After the unification section process binds e.g. U1 = Int
and U2 = U1
, we need to walk every type and replace U2 -> U1 -> Int
.
The substitution map is where we store the information to track which type variables have been unified with which types.
The substitution map is stored in the tsSubst
of TcState
.
The applySubstitutions
function and variants are used to apply the substitutions to types and values.
applySubstitutions :: L.LispType -> StateT TcState (Except TypeError) L.LispType
= do
applySubstitutions lt case lt of
L.TyNil -> pure lt
L.TyInt -> pure lt
L.TyString -> pure lt
L.TyBool -> pure lt
L.TyList lt1 -> do
<- applySubstitutions lt1
lt2 pure $ L.TyList lt2
L.TyFunc targs tret -> do
<- traverse applySubstitutions targs
targs2
<- applySubstitutions tret
tret2 pure $ L.TyFunc targs2 tret2
L.TyVar v -> do
<- get
st case Map.lookup v st.tsSubst of
Just lt2 -> applySubstitutions lt2
Nothing -> pure lt
Nil
,Int
,String
, andBool
have no substitutions, so they are returned as is.- For lists, functions and other complex types, it recursively applies substitutions to their components and child components.
TyVar
is the interesting case- Look up the type variable in the substitution map.
- If it exists:
- apply substitutions to the found type. I.e. recursively apply substitutions to the type.
- This is how we resolved
U2 ~ U1 ~ Int
above.
- If it does not exist in the substitution map, it is returned as is.
- Remember that in the recursive case, this means that all available substitutions have been applied to the type variable.
You can look at the code to see
applyPolyTypeSubstitutions
: callsapplySubstitutions
on child components ofPolyType
applyValSubstitutions
: callsapplySubstitutions
on child components of aTypedLispVal
Binding Type Variables - Updating the Substitution Map
đĄ Binding a type variable means associating a type variable with a specific type, allowing the type checker to resolve that variable to the bound type during type inference.
Unification is both a lookup and a binding operation.
Youâll see bindVar
invoked in the unify
functionâs TyVar
cases.
This is where two types unify a variable to a concrete type (or another variable).
-- | Bind a variable to a type
-- This is done during type unification as part of the type inference process.
bindVar :: Pos -> Text -> L.LispType -> StateT TcState (Except TypeError) ()
bindVar pos name lt-- If trying to bind a type variable to itself, do nothing.
-- E.g. `U1 = U1`
| getTVarName lt == Just name = pure ()
-- Prevent infinite types. See `occurs`
| occurs name lt = lift . throwE $ TcInfiniteType pos name lt
-- Record the substitution
-- i.e. record that `name` is now bound to `lt`
| otherwise = modify' $ \st -> st { tsSubst = Map.insert name lt st.tsSubst }
where
getTVarName :: L.LispType -> Maybe Text
L.TyVar v) = Just v
getTVarName (= Nothing getTVarName _
bindVar
is used to bind a type variable to a type.- Check if being a variable to itself.
- Check for infinite types using
occurs
(see above). - Otherwise, it updates the substitution map in
TcState
usingStateT
âsmodify'
to record that the type variable is now bound to the type.
Finding Free Type Variables
To generalise a type we quantify variables that are not already in scope. Getting free type variables in a type and in the environment is the first step.
In the unification process above, it was said:
Generalisation picks the type variables in the inferred type that are not already âin scopeâ in the environment
In the code this is done by freeTypeVars
and variants.
-- | A free type variable in a type is a type variable that is not bound by a forall in that type,
-- nor already assigned a meaning in the current environment.
freeTypeVars :: L.LispType -> Set Text
L.TyVar v) = Set.singleton v
freeTypeVars (L.TyList t) = freeTypeVars t
freeTypeVars (L.TyFunc args ret) = Set.unions $ freeTypeVars ret : (freeTypeVars <$> args)
freeTypeVars (L.TyNil = Set.empty
freeTypeVars L.TyInt = Set.empty
freeTypeVars L.TyString = Set.empty
freeTypeVars L.TyBool = Set.empty freeTypeVars
- This uses
Set
from thecontainers
package to track free type variables. - Unsurprisingly, sets make set logic easy.
- Here we use
union :: Set a -> Set a -> Set a
to combine two sets - And
unions :: [Set a] -> Set a
to combine multiple sets.
Letâs look at the TyFunc
case
freeTypeVars (L.TyFunc args ret) = Set.unions (freeTypeVars ret : (freeTypeVars <$> args))
freeTypeVars ret
- finds the free variables in the return type.freeTypeVars <$> args
- create a list of the free variables from each argument type.freeTypeVars ret
: ⊠- prepends the return typeâs free variables to the argument typesâ free variables, making a list of sets.Set.unions ...
- combines all those sets into one set, containing all free variables from the whole function type.
-- | For PtMono, the free variables are just those of the underlying monotype.
-- For PtForall vs t, the free variables are those in t excluding the ones quantified in vs.
freeTypeVarsInPoly :: L.PolyType -> Set Text
L.PtMono t) = freeTypeVars t
freeTypeVarsInPoly (L.PtForall vs t) = freeTypeVars t `Set.difference` Set.fromList vs freeTypeVarsInPoly (
The PtForall
case is interesting, as it excludes the type variables that are quantified in the vs
list.
- In other words, it finds the free type variables in a polymorphic type, excluding those that are bound by the
forall
quantifier.
Finally freeTypeVarsEnv
is used to find free type variables in the environment.
-- | traverses the environment hierarchy and accumulates free type variables
freeTypeVarsEnv :: TypeEnv -> Set Text
=
freeTypeVarsEnv env let
= fromMaybe mempty (freeTypeVarsEnv <$> env.teParent)
freeParents = freeTypeVarsInPoly <$> (Map.elems env.teTypes)
freeInThis in
: freeInThis) Set.unions (freeParents
To get the free type variables for the whole environment, you:
- Collect all free type variables in the parent environment (if there is one).
- This is done recursively.
- So it will traverse the entire environment hierarchy upwards.
- Note that the fmap here (
<$>
) is mapping over theMaybe
type, so it gets aMaybe (Set Text)
and thefromMaybe
handles theNothing
case.
- Collect all free type variables in the current environment.
- Map
freeTypeVarsInPoly
over each of the values in the environmentâs type map.
- Map
- Take the union of these sets to get all the free type variables in scope.
Note that although Set.unions
and recursive environment traversals are fine for small programs, larger codebases might require caching or a more incremental approach.
Instantiating Polymorphic Types
đĄ Instantiating a polymorphic type means replacing each quantified variable in a polymorphic type with a fresh type variable.
For example
â a b. a -> b -> a
becomesU0 -> U1 -> U0
- Where
U0
andU1
are fresh type variables. - Instantiation always produces a monotype.
This must be done recursively so that all nested polymorphic types are instantiated.
It must also be done recursively for monomorphic types. If we supported more complex types, you could have a monomorphic type that contains polymorphic. (E.g. in Haskell a monomorphic record with a polymorphic field type).
instantiate :: Pos -> L.PolyType -> StateT TcState (Except TypeError) L.LispType
= do
instantiate _pos pt1 case pt1 of
L.PtMono lt1 ->
instantiate' Map.empty lt1
L.PtForall vars1 lt1 -> do
<- for vars1 $ \v -> do
isubsts1 <- nextTypeVar
c pure (v, c)
let isubsts2 = Map.fromList isubsts1
instantiate' isubsts2 lt1
where
instantiate' :: Map Text Text -> L.LispType -> StateT TcState (Except TypeError) L.LispType
= do
instantiate' isubst lt1 case lt1 of
L.TyNil -> pure lt1
L.TyInt -> pure lt1
L.TyString -> pure lt1
L.TyBool -> pure lt1
L.TyList lt2 -> do
<- instantiate' isubst lt2
lt3 pure $ L.TyList lt3
L.TyVar v -> do
case Map.lookup v isubst of
Just v2 -> pure $ L.TyVar v2
Nothing ->
-- No substitution found, so return the original type variable
pure $ L.TyVar v
L.TyFunc targs tret -> do
<- traverse (instantiate' isubst) targs
targs2
<- instantiate' isubst tret
tret2 pure $ L.TyFunc targs2 tret2
Type Checking: Another Progress Check
You have now seen all the building blocks required for us to move on to generalisation
and unify
.
With all the pieces in place, they should be reasonably understandable.
Generalise
đĄ Generalisation in Hindley-Milner, is the process of turning a monomorphic into a polymorphic type.
generalise :: TypeEnv -> L.LispType -> L.PolyType
=
generalise env t let
-- 1. Collect all free type variables appearing in the environment.
= freeTypeVarsEnv env
envVars -- 2. Collect all free type variables in the type being generalised.
= freeTypeVars t
typeVars -- 3. Find variables that appear in the type, but not in the environment.
-- These are the variables that can be generalised (quantified over).
= Set.toList (typeVars `Set.difference` envVars)
toGen in
-- 4. If there are no variables to generalise, return a monomorphic type.
if null toGen
then L.PtMono t
else L.PtForall toGen t
- The comments in the code explain the steps pretty well.
Paraphrasing from above
- Generalisation picks the type variables in the inferred type that are not already âin scopeâ in the environment
- (free in type) - (free in type environment)
Unification
đĄ Unification is the algorithm that, given two type expressions,
- Determines whether they can be made equal
- And if so, constructs the most general substitution for type variables that makes the expressions identical.
Finally, we get to the unification function, which is the core of the type inference algorithm.
unify :: (Pos, L.LispType)
-> (Pos, L.LispType)
-> StateT TcState (ExceptT TypeError Identity) ()
= do
unify (lhsPos, lhs1) (rhsPos, rhs1) -- 1) Substitute
<- applySubstitutions lhs1
lhs2 <- applySubstitutions rhs1
rhs2
unify' lhs2 rhs2
where
unify' :: L.LispType -> L.LispType -> StateT TcState (Except TypeError) ()
-- 2) Equal?
| l == r = pure ()
unify' l r
-- 3) Type variable?
L.TyVar name) r = bindVar lhsPos name r
unify' (L.TyVar name) = bindVar rhsPos name l
unify' l (
-- 4) Lists?
L.TyList a1) (L.TyList b1) = unify' a1 b1
unify' (
-- 5) Functions?
L.TyFunc fnArgsType1 fnRetType1) (L.TyFunc fnArgsType2 fnRetType2) = do
unify' (length fnArgsType1 == length fnArgsType2) $ do
unless (. throwE $ TcArityError (Just $ lhsPos) (length fnArgsType1) (length fnArgsType2)
lift
-- unify the argument types
-> unify (lhsPos, a1) (rhsPos, a2)) fnArgsType1 fnArgsType2
zipWithM_ (\a1 a2 -- unify the return types
unify (lhsPos, fnRetType1) (rhsPos, fnRetType2)
-- 6) Mismatch
= lift . throwE $ TcTypeMismatch "Unification mismatch" lhsPos (L.PtMono l) (Just rhsPos) (L.PtMono r) unify' l r
Main steps
- Apply all substitutions to both types. (see
applySubstitutions
) - If they are equal, do nothing
- If one is a type variable, bind it to the other type (unless this creates an infinite type)
- If both are lists, unify their elements
- If both are functions, unify their argument types and return types
- Otherwise, throw a type mismatch error
With all the leg work done, the unification algorithm is concise and straightforward.
We can now look at a couple of the more complex cases from typeCheckVal'
to see how they use unification and generalisation.
đ§© Type Check: Define Expression
-- (define name val)
= do
typeCheckDefine pos name val1 -- Type check the value being defined.
<- typeCheckVal' env1 val1
(_env2, val2)
-- Apply substitutions before generalising. This gets the final (possibly concrete) type of the value.
-- This step resolves all type variables to their current bindings.
<- applySubstitutions (getValType val2)
tFinal
-- Generalise the type of the value to create a polymorphic type if possible.
-- See `generalise` for details.
let pt = generalise env1 tFinal
-- Update the *current* environment with the new binding.
-- Note: this does not create a new env layer, it is an in-place update.
let env3 = env1 { teTypes = Map.insert name pt env1.teTypes }
pure (env3, TvDefine pos pt name val2)
- The resolver already restricted
define
s to the top-level. No need to check again. - Type-check the value being defined.
- Substitute
- Generalise. We generalise for
define
and forlet
bindings. - In-place environment update with the new binding.
đ§© Type Check: Function Call
To type an application like (+ 1 2)
, we infer the functionâs type, infer each argumentâs type, and then unify against a fresh type-var return type.
-- (func args...)
-- Type check a function call (application).
-- Note that this is for when a function is called, not when it is defined.
= do
typeCheckFuncCall pos funcVal' args' -- Type check the function being called and all argument expressions.
<- typeCheckVal' env1 funcVal'
(_, funcVal) <- foldTypeCheckVals env1 args'
(_, argVals) -- Collect the types of all the arguments.
let argTypes = getValType <$> argVals
-- Create a fresh type variable for the return type.
<- L.TyVar <$> nextTypeVar
retType
-- Unify the type of the function value with a function type: (argTypes -> retType).
-- This means the value being called must be a function taking the argument types and returning the return type.
-- I.e. the value being called is a function whose type unifies with a function type constructed from the argument
-- types and a fresh return type variable.
L.TyFunc argTypes retType)
unify (L.getPos funcVal, getValType funcVal) (pos,
-- After unification, apply substitutions to get the final (possibly concrete) return type.
<- applySubstitutions retType
retTypeSubst
pure (env1, TvFuncCall pos retTypeSubst funcVal argVals)
- This is for function application, not function definition. E.g.
(+ 1 2)
. - There are no explicit types, so the return type always starts as a fresh type variable.
unify
is called to ensure that the thing being called is a function with the expected argument types and return type.- Donât forget to apply substitutions to the return type after unification.
đ§© Type Check: Let Expression
A let
introduces local bindings whose definitions may themselves be polymorphic.
To support polymorphism, we must typeâcheck each binding, generalise its monotype,
and put all new ones into a fresh environment before checking the body.
-- (let ( (n1 v1) (n2 v2)...) body...)
= do
typeCheckLet pos style bindings1 body1 <-
(bindings2', env2') case style of
-- Parallel let bindings:
-- * All bindings are evaluated in the same outer environment.
-- * No binding can refer to any other binding in the same let.
-- * Only the body sees all new bindings.
--
-- Sequential let bindings, where a binding may refer to a previous binding,
-- are not supported, but should be easy to add.
L.LetParallel -> do
-- Type check each binding in the let form.
<- traverse typeCheckLetBinding bindings1
ls let bindings2 = ls <&> \(_, p, name, val) -> (p, name, val)
-- Generalise the types of the bindings to allow polymorphism.
= bindings2 <&> \(_, name, val) -> (name, generalise env1 $ getValType val)
bindings3 -- Create a new environment layer for the let body, containing the new bindings as local variables.
= TypeEnv
env2 = Just env1
{ teParent = Map.fromList bindings3
, teTypes
}
pure (bindings2, env2)
- In LISP, there are several styles of
let
bindings.- parallel: all bindings are evaluated in the same outer environment.
- sequential: bindings are evaluated one after the other, and can refer to previous bindings.
- recursive: bindings are evaluated in the same environment but can refer to each other.
- This implementation only supports parallel
let
bindings, but it should be easy to add the others. - Generalise and create a new environment layer for the body of the
let
expression.
đ§© Type Check: Lambda Expression
To type-check (λ (x y) ...)
, we assign each parameter a fresh type variable, extend the environment with those bindings, typeâcheck the body, and then assemble a function type from the parameter and return types.
-- (λ (param1 param2 ...) body..)
= do
typeCheckLambda pos params1 body1 -- Create new (fresh) monomorphic type variables for each of the parameters.
<- for params1 $ \(p, name) -> do
params2 <- nextTypeVar
u pure (p, name, L.PtMono $ L.TyVar u)
-- Create a new environment layer for this function.
-- containing the parameter bindings as local variables.
let env2 = TypeEnv
= Just env1
{ teParent = Map.fromList $ params2 <&> \(_, name, pt) -> (name, pt)
, teTypes
}
-- Type check the lambda body in the new environment.
<- foldTypeCheckVals env2 body1
(_tenv2, body2)
-- Apply substitutions to each parameter type to resolve any type variables.
-- Use `getMonoType` to extract the monomorphic type from a PolyType.
-- This is required here because we only ever bind parameters to monotypes,
-- so it is safe to extract without ambiguity.
<- for params2 $ \(_p, _name, pt) ->
params3
applySubstitutions (getMonoType pt)
-- Get the return type of the lambda by looking at the last expression in the body.
-- Apply substitutions to get the fully resolved type.
<- case lastMay body2 of
retType Nothing -> pure L.TyNil
Just l -> applySubstitutions (getValType l)
-- Construct the final function type for the lambda.
let lambdaType = L.TyFunc params3 retType
pure (env1, TvLambda pos lambdaType params1 body2)
- There are no explicit types in the lambda expression, so we create fresh type variables for each parameter and return type.
- A lambda captures its environment, so we create a new environment layer for the lambda body.
- The body is type-checked in this new environment.
- Get the return type and apply substitutions to resolve any type variables.
The way that this implementation handles the type environment naturally supports lexical scoping and variable capture which are the foundations of closures in functional languages.
When type-checking a lambda, we construct a new TypeEnv
(env2
) that contains just the lambdaâs parameters. teParent = Just env1
then links it to the parent scope.
env1
is the environment at the point where the lambda is defined.
That is we return (env1, TvLambda...)
, not (env2, ...)
. The new bindings are scoped only within the lambda body, preserving the outer environment unmodified.
This mirrors how most functional languages implement closures.
đ Lambdas capture the environment where they are defined, not where they are called from.
đ§© Type Check: Atom Expression
Atoms are our only form of name lookup. We fetch their polymorphic type from the environment and instantiate them to fresh monotypes for use.
= do
typeCheckAtom pos v -- Try to look up the variable in the type environment.
case tlookup v env1 of
-- If found, instantiate the value (see `instantiate`).
Just pt -> do
<- instantiate pos pt
lt pure (env1, TvAtom pos lt v)
-- If not found, throw an unbound type variable error.
Nothing -> lift . throwE $ TcUnboundVariable pos v
tlookup :: Text -> TypeEnv -> Maybe L.PolyType
=
tlookup name env case Map.lookup name env.teTypes of
Just v -> Just v
Nothing -> case env.teParent of
Nothing -> Nothing
Just parent -> tlookup name parent
tlookup
is a utility function that looks recursively up a variable in the current type environment or itâs parents etc.- Instantiate if found otherwise throw an unbound variable error.
Limitations & Future Work
Quick Wins
- Type signatures for let and define
- Useful for documentation and guiding inference
- Would require parsing an extra annotation form and a simple unify call in the resolver.
- Sequential let bindings
- Evaluate bindings one by one, each seeing previous ones
- Easy to implement by threading the environment through each binding in order.
-
The
LetType
sum type and case statements are already in place.
- Floating point type support
-
Adds
TyDouble
- Could be handled either by overloading or numeric promotion.
Advanced Features
- Recursive let and mutual recursion
- Userâdefined algebraic data types
- Type classes and constraints
-
Another way to deal with
double
vsint
.
- Performance optimisations
- Current substitution and environment operations are naive
- Large programs would benefit from incremental or cached substitution and more efficient environment lookups.
- Improved error messages
Type Checking: Summary
In just a few hundred lines, we have built a complete Hindley-Milner pipeline in TypeChecker.hs. From parser through evaluator for a minimal Lisp.
We support unification, generalisation, instantiation, polymorphic let bindings, and fresh-variable generation. The code is modular and clear, making it easy to extend.
Lowering
đĄ Lowering is the process of transforming a complex AST into a simpler form
Real-world compilers often perform optimisations, desugaring, etc during lowering. In our pipeline, lowering simply erases all type and position annotations.
Including a lowering phase mirrors a full compiler architecture and sets the stage for REPL features such as runtime type queries.
Since the AST has already passed every type check, removing annotations cannot introduce runtime errors.
In GHC, types are erased before code generation, our lowering step follows the same approach.
With annotations removed, the resulting EvalVar
nodes become the input to our interpreter (see the EvalVar
type in the Evaluator next section.).
-- | Lower a typed value to 'EvalVar m'.
-- In this implementation, lowering does little other than type erasure.
-- In a more complex implementation, lowering could also perform optimisations or additional transformations.
-- Compling to bytecode or transpiling to another language would typically operate on lowered representation.
lowerToEvalVar' :: T.TypedLispVal -> Except E.EvalError (EvalVar m)
= do
lowerToEvalVar' tv case tv of
T.TvNil _ -> pure EvNil
T.TvInt _ v -> pure $ EvInt v
T.TvBool _ v -> pure $ EvBool v
T.TvString _ v -> pure $ EvString v
T.TvAtom _ _ a -> pure $ EvVar a
T.TvList _ _ vs -> do
case lowerToEvalVar vs of
Left err -> throwE err
Right vs' -> pure $ EvList vs'
T.TvFuncCall _p _t fv' argvs' -> do
<- lowerToEvalVar' fv'
fv <- traverse lowerToEvalVar' argvs'
argvs pure $ EvFuncCall fv argvs
T.TvDo _ _ vs -> do
case lowerToEvalVar vs of
Left err -> throwE err
Right vs' -> pure $ EvDo vs'
T.TvLet _ _ style bindings1 body1 -> do
<- for bindings1 $ \(_, n, v) -> do
bindings2 <- lowerToEvalVar' v
v2 pure (n, v2)
<-
body2 case lowerToEvalVar body1 of
Left err -> throwE err
Right body' -> pure body'
pure $ EvLet style bindings2 body2
T.TvLambda _pos _typ args body1 -> do
<- traverse lowerToEvalVar' body1
body2 pure $ EvLambdaForm (snd <$> args) body2
T.TvIf _pos _typ cond then' else' -> do
<- lowerToEvalVar' cond
cond2 <- lowerToEvalVar' then'
then2 <- lowerToEvalVar' else'
else2 pure $ EvIf cond2 then2 else2
T.TvDefine _pos _typ n v -> do
<- lowerToEvalVar' v
v2 pure $ EvDefine n v2
Evaluating
Having erased types, our evaluator walks the EvalVar tree, executing primitives and user-defined functions. We know that the code is well-typed, so we can safely evaluate it without worrying about type errors.
Here is the lowered AST
data EvalVar m
= EvBool !Bool
| EvDefine !Text !(EvalVar m)
| EvDo ![EvalVar m]
| EvFuncCall !(EvalVar m) ![(EvalVar m)]
| EvFunction !(EvFunc m)
| EvIf !(EvalVar m) !(EvalVar m) !(EvalVar m)
| EvInt !Int
| EvLambdaForm ![Text] ![EvalVar m]
| EvLet L.LetStyle ![(Text, EvalVar m)] ![EvalVar m]
| EvList ![EvalVar m]
| EvNil
| EvString !Text
| EvVar !Text
deriving (Eq, Show)
newtype EvFunc m = EvFunc ([EvalVar m] -> EvalEnv m -> ExceptT EvalError m (EvalVar m))
This is very similar to TypedLispVal
, but without the type information and positions.
Unlike previous types, it is parameterised by a monad m
. All the types in the evaluator and the evaluator itself follow this pattern.
The core evaluator remains pure, all IO is delegated to an EvalIO m
record. By abstracting over m, we can run in IO or swap in a pure test runner.
data EvalEnv m = EvalEnv
eeParent :: !(Maybe (EvalEnv m))
{ eeVars :: !(Map Text (EvalVar m))
,deriving (Eq, Show) }
EvFunc
is newtype wrapper for a function type. That is a function that can be applied.
EvalEnv
is the environment that the function is evaluated in, it contains the variable bindings and other state needed for evaluation. EvalEnv
is hierarchical, i.e. a stack of environments.
Interaction with the external world
The evaluator itself is pure, it has no direct way to perform and side effects.
Instead it uses EvalIO m
which provides a way to interact with the external world, assuming m
is an IO
-like monad.
data EvalIO m = EvalIO
eiPrnLn :: !(forall s. (Show s) => s -> m ())
{ eiPrnTextLn :: !(Text -> m ())
, eiPrnErrorInCode :: !(forall e. (L.LispError e) => Text -> e -> m ())
, eiLog :: !(Lg.Logger m)
, }
Of course you could also implement EvalIO
without using IO
e.g. if you wanted to use it from a pure context.
Primitive Functions
Primitive functions provide built-in operations implemented in Haskell but whose types are checked in our HM engine.
PrimFns.hs contains the primitive functionâs implementation. It is very small and incomplete but does demonstrate how the host language can provide functions to the LISP language.
E.g.
newtype PrimitiveFunctions m = PrimitiveFunctions (Map Text (EvFunc m, L.PolyType))
getPrimitiveFunctions :: forall m.
Monad m)
(=> E.EvalIO m
-> m (E.PrimitiveFunctions m)
= do
getPrimitiveFunctions eio let intIntInt = L.PtMono $ L.TyFunc [L.TyInt, L.TyInt] L.TyInt
pure . E.PrimitiveFunctions . Map.fromList $
"+", (E.EvFunc $ eMathsBin (+), intIntInt))
[ ("-", (E.EvFunc $ eMathsBin (-), intIntInt))
, (
"prn"
, ( E.EvFunc ePrn
, ( L.PtMono $ L.TyFunc [L.TyString] L.TyNil
,
)
)
]
where
ePrn :: [E.EvalVar m] -> E.EvalEnv m -> ExceptT E.EvalError m (E.EvalVar m)
= do
ePrn args _eenv case args of
-> do
[v] <- E.as' E.asString "string" $ v
s . E.eiPrnTextLn eio $ s
lift pure E.EvNil
-> do
_ . E.EeRuntimeError Nothing $ "prn: wrong number of arguments calling prn: expected 1 argument, got: " <> show (length args)
throwE
eMathsBin :: (Int -> Int -> Int) -> [E.EvalVar m] -> E.EvalEnv m -> ExceptT E.EvalError m (E.EvalVar m)
= do
eMathsBin op args _eenv case args of
-> do
[v1, v2] <- E.as' E.asInt "int" $ v1
i1 <- E.as' E.asInt "int" $ v2
i2 pure . E.EvInt $ op i1 i2
-> do
_ . E.EeRuntimeError Nothing $ "Wrong number of arguments calling integer binary function: expected 2 arguments, got: " <> show (length args) throwE
getPrimitiveFunctions
returns a map of primitive functions and their types.
The types are used during type checking to ensure that the arguments passed to the functions are of the correct type.
The evaluator will get only the EvFunc
from the (EvFunc m, L.PolyType)
pair for use in evaluation.
Typically youâd want to have as few primitive functions as possible. Then youâd provide a standard library of functions written in the target language.
Standard Library
The REPL loads a small LISP standard library at startup.
; identity function
identity (λ (x) x))
(define
; create a list with two elements
list x y)))
(define pair (λ (x y) (
; square a number
* x x))) (define square (λ (x) (
REPL
The REPL ties together parsing, type checking, lowering, and evaluation into an interactive loop. It is implemented in Repl.hs using the haskline library for line editing, history, and simple completion
The REPL is not the focus of this post, so I wonât go into too much detail here.
Startup
On launch, the REPL:
- Loads
stdLib.lisp
(the standard library). - Initializes the parser, type checker, evaluator, and environment with primitives.
- Enters a loop reading user input.
â ïž In this prototype, stdLib.lisp
must reside in the current directory.
REPL Features
Commands
- Expression Evaluation
- Typing any valid Lisp expression parses, type-checks, and evaluates it, printing the result.
:t <expr>
-
Show the inferred type of
<expr>
after evaluating it.
:ts
- List all known top-level names and their types (primitives, library definitions, and user defines).
+t <expr>
- Toggle printing of types after evaluation.
-
This is on by default, but can be toggled with
+t
.
:quit
orCtrl-D
- Exit the REPL.
+m
- Toggle multi-line mode. End a block with a line containing only ..
Multi-Line Mode
When multi-line mode is on, you can enter a block over several lines:
> +m
= on.
Multiline
let
>>> (10)
... ( (x 20)
... (y
... )* x y)
... (
... )
... .200
Known Types with :ts
:ts
shows all known types, which is useful for debugging and understanding the types in the program.
> :ts
* (: Int -> Int -> Int)
+ (: Int -> Int -> Int)
- (: Int -> Int -> Int)
/ (: Int -> Int -> Int)
< (: Int -> Int -> Bool)
> (: Int -> Int -> Bool)
and (: Bool -> Bool -> Bool)
eq_bool (: Bool -> Bool -> Bool)
eq_int (: Int -> Int -> Bool)
eq_string (: String -> String -> Bool)identity (â [U0] (: U0 -> U0))
not (: Bool -> Bool)
or (: Bool -> Bool -> Bool)
pair (â [U2] (: U2 -> U2 -> (List U2)))
prn (: String -> Nil) square (: Int -> Int)
REPL Example
> (define identity (λ (x) x))
> :t identity
identity : â [U0]. U0 -> U0
> (identity 123)
123
> (identity "hello")
"hello"
> (identity (list 1 2 3))
1 2 3)
(
> ((λ (x) (+ x x)) 10)
20
Types at Runtime?
Since Lowering erases types from the AST, you might wonder how the REPL is able to show types (for :t
and :ts
).
The answer is that evaluation returns both a value and a type environment.
The REPL retains this and is then able to display type information even after erasure.
Either EvalError (T.TypeEnv, T.TypedLispVal, EvalEnv m, EvalVar m)) m (
The REPL stores (and updates) this in its state
data REnv = REnv
ePrintType :: !Bool
{ eEvalEnv :: !(Maybe (E.EvalEnv IO))
, eEio :: !(E.EvalIO IO)
, ePrimFns :: !(E.PrimitiveFunctions IO)
, eTypeEnv :: !(T.TypeEnv) ; <<----------- HERE
, eMultiLine :: !Bool
, }
This approach separates runtime execution (EvalVar
) from type bookkeeping, while still allowing full type introspection in the interactive REPL.
Conclusion
- In this post, we walked through a HindleyâMilner type system implementation in Haskell for a minimal Lisp.
- You saw the core theory of unification, generalisation, instantiation, and principal types.
- You also explored a working code pipeline, from parsing and resolution through type checking, lowering, and evaluation.
- A key takeaway is that HM-style inference need not be daunting to implement.
- Although minimal, this example provides a solid foundation for further extensions.
I hope youâve found this walk-through both practical and inspiring.
Please feel free to clone the GitHub repo, experiment with the code, and share your improvements or questions.
Further Reading
- The Typed Racket Guide
- Typed Racket
- Robert Nystrom
- Crafting Interpreters
- Wikipedia:
- HindleyâMilner type system
- Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press.
- Types and Programming Languages
- Milner, R. (1978). A theory of type polymorphism in programming.
- Journal of Computer and System Sciences, 17(3), 348-375.
- Damas, L. & Milner, R. (1982). Principal type-schemes for functional programs.
- Principal type-schemes for functional programs