Type Checking with Eqlog: Polymorphism

August 22, 2023

This is the fifth post in a series on implementing a type checker with the Eqlog Datalog engine. In this post, we’ll extend our type system with generics, or polymorphism, and implement Hindley-Milner type inference.

You can find the other posts here:

  1. Parsing [code]
  2. Variable binding [code]
  3. Types [code]
  4. Typing [code]
  5. Polymorphism [code] (this post)

In the last post, we completed our type checker for a simply typed language. This type checker accepts a program only if it can assign a uniquely determined, concrete type to every expression and variable in the program. But this can be overly restrictive; our type checker rejects some sensible programs. Consider this example:

function id(x) {
    return x;
}

let a = id(5);
let b = id('xyz');

The problem here is a type conflict for the variable x: Our type checker infers that x must have type number due to the call id(5), but that it must also have type string due to the call id('xyz'). On the other hand, if we removed the two calls to id, so that id was never called, then our type checker would reject the program because the type of x would not be determined.

Our goal for this post is to implement polymorphism, or generics, which will make the program above type check with inferred polymorphic type id : (a) => a where a is a type variable. At the two call sites of the id function, our new version of the type checker implicitly instantiates the polymorphic type of id by plugging in number and string, respectively, for the type variable a.

Monotypes and polytypes

Polymorphism makes it necessary for us to deal with type variables. We represent these in our Eqlog program as a Type elements which cannot be obtained by applying one of the type constructors to other Type elements. Our previous checker for simple types would report an error if it found such Type elements after Eqlog evaluation, but we will relax this check later on.

In addition to the monomorphic types or monotypes that we’ve seen so far, we now also consider polymorphic types or polytypes. Both monotypes and polytypes are given by an underlying Type element.

Sort GeneralType;
Func MonoType : Type -> GeneralType;
Func PolyType : Type -> GeneralType;

The difference between monotypes and polytypes lies in how we interpret type variables occurring in the underlying Type element:

The flavor of polymorphism we’re implementing here is usually called prenex or rank 1 polymorphism. “Prenex” because types can be (implicitly) quantified over only on the top level, i.e. in front of the type. Thus forall a. (a) => a is valid, whereas (forall a. a => a) -> (forall b. b => ()) is invalid. And “rank 1” because our type system has rank 0 types (monotypes) and rank 1 types (polytypes) which quantify over rank 0 types, but no more: We could also imagine rank 2 types that are allowed to quantify over rank 1 types, rank 3 types which quantify over rank 2 types and so forth.

Since types of variables can now be polymorphic, we have to change our VarTypeInX : X -> Type function family to be valued in GeneralType:

Func VarTypeInStmts : StmtListNode -> GeneralType;
Func VarTypeInExpr : ExprNode -> GeneralType;
...

While the axioms propagating variable bindings through syntax nodes can stay the same, we have to decide for each AST node that introduces a variable binding whether the variable has monotype or polytype.

Type contexts

Which type variables in a polytype are bound depends on the context of the polytype: If the polytype appears in the global module scope, then all type variables in it are bound. But if a polytype appears in the body of function, then only those type variables that are not already introduced by arguments of the containing function are bound. Consider the following convoluted way of implementing the identity function:

function id(x) {
    function const_x(y) {
        return x;
    }
    return const_x(5);
}

Suppose that the type of x is MonoType(a) and that the type of y is MonoType(b) for type variables a and b. Then the type of id is PolyType((a) => a), and since id appears in the global module scope, a is bound. The type of const_x is PolyType((b) => a), but since it appears in the body of the id function, so that the ambient type context contains a, only b is bound.

Because of local functions such as the one above, we need to keep track of which type variables were already introduced in the body of a function, and we need to propagate this information into relevant subnodes. This is the purpose of type contexts, which we define in Eqlog as follows:

Sort TypeContext;

Pred TypeInContext : Type * TypeContext;
Pred TypesInContext : TypeList * TypeContext;

Think of TypeContext elements as (non-extensional) sets of Type elements, and the TypeInContext predicate as the membership relation. We add axioms (here omitted) so that TypesInContext(tys, ctx) holds if and only if every type in the tys type list is in ctx.

Next we introduce total functions to associate type contexts to various syntax nodes:

Func ModuleTypeContext : ModuleNode -> TypeContext;
Func FunctionTypeContext : FunctionNode -> TypeContext;
Func StmtTypeContext : StmtNode -> TypeContext;
Func ExprTypeContext : ExprNode -> TypeContext;
...

Axiom mn : ModuleNode => ModuleTypeContext(mn)!;
Axiom fn : FunctionNode => FunctionTypeContext(fn)!;
...

The type context we associate to an AST node usually agrees with the type context of its parent node. For example, the axiom

Axiom
    EqualsExprNode(expr, lhs, rhs)
    & expr_ctx = ExprTypeContext(expr)
    & lhs_ctx = ExprTypeContext(lhs)
    & rhs_ctx = ExprTypeContext(rhs)
    =>
    expr_ctx = lhs_ctx
    & lhs_ctx = rhs_ctx
    ;

propagates type contexts through equality comparisons.

Crucially, though, the type context associated to function definition literals is not the same as the type context of its parent node but an extension of it: A type context that contains all the types of the base type context it extends, but that may contain additional types. We axiomatize type context extensions as follows:

Pred ContextExtension : TypeContext * TypeContext;
Axiom
    ContextExtension(base, ext)
    & TypeInContext(sigma, base)
    =>
    TypeInContext(sigma, ext)
    ;

We can then enforce our constraints on the type contexts of functions as follows:

Axiom
    FunctionStmtNode(stmt, func)
    & ambient_ctx = StmtTypeContext(stmt)
    & func_ctx = FunctionTypeContext(func)
    =>
    ContextExtension(ambient_ctx, func_ctx)
    ;

Within the scope of a function literal, we want to treat the types of function arguments as valid monotypes, so we add them to the type context of the function:

Axiom
    ConsArgListNode(arg_list, _, head_ty_node, _)
    & ctx = ArgListContext(arg_list)
    & head_ty = SemanticOptType(head_ty_node)
    =>
    TypeInContext(head_ty, ctx)
    ;

Eqlog does not support negations, but even if it did, Datalog engines can for fundamental reasons support negations only for rules that don’t apply recursively. Because of this, we cannot check that a Type element occurring inside a polytype is a bound type variable during Eqlog evaluation: That would require us to hypothesize that a type element is not the result of applying a type constructor, and that it is not in the ambient type context. Fortunately, we will only need to check whether a type element is unbound, i.e., that the type element is either a type variable in the ambient type context, or that it is the result of applying a type constructor to unbound types. To prepare for this, we close type contexts under type constructors:

Axiom gamma: TypeContext & sigma = VoidType() => TypeInContext(sigma, gamma);
...
Axiom
    kappa = FunctionType(dom, cod)
    & TypesInContext(dom, gamma)
    & TypeInContext(cod, gamma)
    =>
    TypeInContext(kappa, gamma)
    ;

We also want type contexts to be closed under inverses of type constructors. This is relevant for functions with arguments whose types involve a type variable that they are not equal to. For example, consider the apply function:

function apply (f, x) {
    return f(x);
}

The apply function should have type ((a) => b, a) => b. Our rules for arguments of functions imply that the types of f: (a) => b and x: a are in the type context of the body of apply, but only the following rule, which closes type contexts under inverses of the function type constructor, lets us conclude that then also b must be in the type context:

Axiom
    TypeInContext(FunctionType(dom, cod), gamma)
    =>
    TypesInContext(dom, gamma)
    & TypeInContext(cod, gamma)
    ;

With all rules about type contexts in place, we can now assume that a type element in a polytype scheme is unbound if and only if it is in the ambient type context.

Instantiation

Our function VarTypeInExpr : Var -> GeneralType can result into a polytype, whereas we still expect the function ExprType : ExprNode -> Type to result into a type in the current context, i.e., into a monotype. This means that we need to revisit our typing rules for variable usages.

In case the variable has monotype, we can continue to equate the type of the variable expression with the type of the variable:

Axiom
    VariableExprNode(expr, var)
    & VarTypeInExpr(var, expr) = MonoType(ty)
    =>
    ExprType(expr) = ty
    ;

However, if the variable has PolyType(ty), then we need to instantiate the type scheme ty first, i.e., replace the type variables in ty by suitable types in the current context. We introduce the following machinery for that purpose:

Sort Instantiation;

Func Instantiate : Instantiation * Type -> Type;
Func InstantiateList : Instantiation * TypeList -> TypeList;

Each Instantiation element represents a (partial, non-extensional) map Type -> Type, and the Instantiate function represents the application of such maps to Type elements. As usual, we add rules that enforce that the InstantiateList function is given by Instantiate on each element of a TypeList.

We can now associate an Instantiation element to every usage of a variable with polytype and equate the expression type with an instantiation of the polytype scheme.

Func ExprInstantiation : ExprNode -> Instantiation;

Axiom
    VariableExprNode(expr, var)
    & VarTypeInExpr(var, expr) = PolyType(_)
    =>
    ExprInstantiation(expr)!
    ;
Axiom
    VariableExprNode(expr, var)
    & VarTypeInExpr(var, expr) = PolyType(sigma)
    & inst = ExprInstantiation(expr)
    & expr_ty = ExprType(expr)
    =>
    Instantiate(inst, sigma) = expr_ty
    ;

The typing rules that we’ve added in the previous post constrain the expression types of usages of polymorphic variables based on where the variables appear. For example, if a variable is used in position of the function in an application expression, then the instantation of the variable’s type must be a function type. However, we haven’t encoded any constraints based on the type scheme of the variable’s polytype so far, which is our next task.

Instantiation and type constructors

The instantiation of a function type should be a function type, the instantiation of the string type should be the string type, and so forth. In other words, Instantiate should commute with type constructors in its Type argument:

// Instantiate commutes with the number type constructor.
Axiom
    instance_number = Instantiate(_, NumberType())
    =>
    NumberType() = instance_number
    ;

// If a funnction type is instantiated, then also the domain and
// codomain are instantiated, and instantiate commutes with the
// function type constructor.
Axiom
    Instantiate(inst, FunctionType(dom, cod))!
    =>
    InstantiateList(inst, dom)!
    & Instantiate(inst, cod)!
    ;
Axiom
    dom_instances = InstantiateList(inst, dom)
    & cod_instance = Instantiate(inst, cod)
    & func_instance = Instantiate(inst, FunctionType(dom, cod))
    =>
    FunctionType(dom_instances, cod_instance) = func_instance
    ;
...

Note that commutativity of the Instantiate(inst, -) operation with type constructors for fixed inst implies that the operation is uniquely determined by its value on type variables.

Instantiation and unbound type elements

Unbound type variables cannot be freely instantiated because they are fixed by the ambient type context of the polytype already. In case of the convoluted identity function above, this means that every instantiation of const_x : (b) => a must map a to itself.

Since we’re not allowing higher-ranked types, we don’t allow returning values with polytypes from functions; even if we return a variable with polymorphic type, our type checker implicitly instantiates the variable’s type scheme into a monotype first. As a consequence, polytypes cannot escape the ambient type context in which they were defined: The type context of a usage site of a variable with polytype is always an iterated extension of the ambient type context of the where the polytype was defined. It follows that at every usage site of a polytype, a type element occuring in the polytype scheme is unbound if and only if it is in the ambient type context of a usage site.

Func InstantiationTarget : Instantiation -> TypeContext;
Axiom
    VariableExprNode(expr, var)
    & VarTypeInExpr(var, expr) = PolyType(_)
    & instance = ExprInstantiation(expr)
    & ctx = ExprTypeContext(expr)
    =>
    InstantiationTarget(instance) = ctx
    ;
Axiom
    sigma_instance = Instantiate(instance, sigma)
    & TypeInContext(sigma, InstantiationTarget(instance))
    =>
    sigma = sigma_instance
    ;

Type errors

Our introduction of polymorphism makes it necessary to revisit two error categories: Undetermined type errors and conflicting type constraints.

Undetermined types

Recall that in the last post we introduced a predicate

Pred DeterminedType : Type;

and added rules so that DeterminedType(ty) holds if and only if ty does not contain any type variables. If our type checker found a Type element that did not satisfy DeterminedType after Eqlog evaluation, it would report an error due to the undetermined type.

Now that we’ve added support for polymorphism, this check is overly restrictive, since type variables can now appear in polytypes and the definitions of polymorphic functions. However, we cannot remove the check entirely. Consider the following program:

function absurd() {
    return absurd();
}

let x = absurd();

The type of absurd is () => b, so its return type is entirely unrestricted. This means that also the type of x is undetermined (i.e., a type variable), but we require that all let bindings must have monotype.

To adapt the DeterminedType predicate to polymorphism, we add the following rules mirroring our rules for type contexts: The argument and return types of functions are always determined, and determined types are closed also under inverses of type constructors (for cases such as the apply function above).

Axiom
    ConsArgListNode(_, _, arg_type, _)
    & ty = SemanticOptType(arg_type)
    =>
    DeterminedType(ty)
    ;
Axiom
    Function(_, _, _, ret_type, _)
    & ty = SemanticOptType(cod_type)
    =>
    DeterminedType(ty)
    ;
Axiom
    DeterminedType(FunctionType(dom, cod))
    =>
    DeterminedTypes(dom)
    & DeterminedType(cod)
    ;

Type conflicts

Consider the following function definiton:

function foo (x) {
    x(x);
}

Our type checker infers that x must be have function type (a) => b. Since x is applied to itself, we infer a type equality a = (a) => b. Every instantiation of foo in which the argument type is a concrete type (i.e., it does not contain type variables) would result into a type conflict. But if foo is never used, then our type checker will not report an error, because our rules for type conflicts fire only for concrete types, not for type variables.

To detect errors such as the foo function, we introduce a predicate SmallerType : Type * Type such that SmallerType(sigma, tau) holds if and only if sigma is structurally strictly smaller than tau.

// SmallerType is transitive.
Axiom
    SmallerType(sigma, tau)
    & SmallerType(tau, kappa)
    =>
    SmallerType(sigma, kappa)
    ;

// A function type is structurally greater than all of its domain types and
// its codomain type.
Axiom
    kappa = FunctionType(sigmas, tau)
    =>
    SmallerTypes(sigmas, kappa)
    & SmallerType(tau, kappa)
    ;

Here SmallerTypes : TypeList * Type is a predicate that holds if and only if each type in the first argument is a SmallerType than the second argument.

Now consider again the example of the function foo above. Our rules imply that SmallerType(a, (a) => b) holds, hence due to the type equality a = (a) => b, we have SmallerType(a, a). Thus, the rule

Axiom SmallerType(sigma, sigma) => ConflictingTypes();

lets us detect this and similar errors.