August 05, 2023

This is the third post in a series on implementing a type checker with the Eqlog Datalog engine. This post deals with our semantic model of the type system, and we’ll finally see some of Eqlog’s distinctive features in action. You can find the other posts here:

- Parsing [code]
- Variable binding [code]
**Types [code] (this post)**- Typing [code]
- Polymorphism [code]

As usual, the code we discuss in this post is available as a branch in the associated repository.

Our goal is to define a *semantic* model of our type system
that complements the *syntactical* `TypeNode`

AST
nodes we’ve already discussed in the first post on parsing. Our parser
generates fresh `TypeNode`

elements for each piece of source
code representing a type. As a consequence, there can be many different
`TypeNode`

elements representing the same type. For example,
the AST of the program

```
function foo (num): boolean {
let e: boolean = num == 5;
return e;
}
```

contains two different `BooleanTypeNode`

elements, one for
each occurrence of the string `boolean`

. For type checking
and inference we will eventually have to verify that certain types
match, so `TypeNode`

elements are unsuitable for this.

Instead, we introduce a new sort `Type`

to represent
*semantic* types. In order to be useful for type checking and
inference, semantic types should have the following properties:

- There should be at most one
`Type`

element that represents`boolean`

; similarly for`string`

,`number`

and so forth. - Various AST nodes should have associated
`Type`

element, e.g.`TypeNode`

,`ExprNode`

(the type of an expression) and`FunctionNode`

(the type of a function literal). - Every variable binding should have an associated
`Type`

element.

All of these requirements are *functional* in nature: For
example, requirement 2 says that there should be a total function
`TypeNode -> Type`

.

Eqlog has a `Func`

keyword that lets us declare (partial)
functions, for example in the following Eqlog program:

```
Sort M;
Func Mul : M * M -> M;
```

Eqlog represents functions as graph
relations, i.e., as sets of value-result tuples. Thus, the binary
`Mul`

function is represented in the same way as a ternary
predicate `MulGraph : M * M * M`

, and
`MulGraph(a, b, c)`

holds if and only if
`Mul(a, b) = c`

. However, not every relation corresponds to a
function: This is only so if the last component is uniquely determined
by the other components. For the ternary `MulGraph`

predicate, this means that the following axiom must hold:

`Axiom MulGraph(x, y, z_0) & MulGraph(x, y, z_1) => z_0 = z_1;`

Eqlog implicitly adds such *functionality axioms* for every
function declaration.

The conclusion of a functionality axiom is an equality, and
equalities are not part of standard Datalog either. Eqlog implements
equalities as follows. Recall that Eqlog repeatedly matches the premise
of each axiom on current data in the model and then adds the conclusion
of the axiom for each match. An equality `x = y`

in the
premise restricts matches to those where `x`

and
`y`

are interpreted as the same element.

Equalities in the conclusion are more interesting: Eqlog models
contain union-find data structure to track equality among elements. When
Eqlog has matched the premise of an axiom and there is an equality
`x = y`

in the conclusion, then it merges the equivalence
classes of the elements that are matched for `x`

and
`y`

.

Crucially, Eqlog matches premises with respect to the equivalence
relations stored in these union-find data structures. For example, there
are two occurrences of the variable `x`

in the premise
`MulGraph(x, y, z_0) & MulGraph(x, y, z_1)`

of our
functionality axiom above. A match of this premise is allowed to
interpret these two occurrences by distinct (in the sense that they are
represented by different IDs) elements as long as the two elements are
in the same equivalence class. This means that more matches become
possible after an equality has been found.

For example, suppose we have a model of our theory above given by
four distinct elements `a, b, c, d`

and the following
`MulGraph`

entries:

```
MulGraph(a, b, c)
MulGraph(a, b, d)
MulGraph(c, a, b)
MulGraph(d, a, a)
```

Let’s walk through Eqlog evaluation. At first, the only non-trivial
(i.e. `z_0 != z_1`

) match for the premise of the
functionality axiom is `(a, b, c)`

, `(a, b, d)`

.
The conclusion of the functionality axiom then asserts
`c = d`

. This means that `(c, a, b)`

,
`(d, a, a)`

is a valid match now. From this it follows that
`b = a`

must hold. At this point there are only trivial
matches, where the conclusion already holds, so evaluation stops.

We return to defining our semantic model of types. Our type system has the following type constants, which we declare as nullary functions:

```
Func VoidType : Type;
Func BooleanType : Type;
Func NumberType : Type;
Func StringType : Type;
```

The functionality axiom for nullary functions asserts that there is
at most one such constant. For example, if
`sigma = VoidType()`

and `tau = VoidType()`

, then
`sigma = tau`

.

The only type constructor with non-trivial arguments in our toy
language is the constructor of function types: It takes as parameters a
list of types for the domain (the types of arguments) and a type for the
codomain (the result type). To model a *list* of semantic types,
we use a similar trick as for syntax nodes:

```
Sort TypeList;
Func NilTypeList : TypeList;
Func ConsTypeList : Type * TypeList -> TypeList;
```

However, here we’re using Eqlog functions instead of predicates to
represent empty lists and the cons operation. This has the effect that
Eqlog considers all `NilTypeList()`

elements as equal, and
that two type lists agree if they have equal heads and equal tails. The
analogous properties would be undesirable for AST nodes because in the
AST we want to distinguish `NilTypeListNode`

elements that
that appear at different locations in source code.

We can now declare the function type constructor:

`Func FunctionType: TypeList * Type -> Type;`

One of our requirements on the semantic `Type`

sort was
that there should be functions associating `Type`

elements to
various AST nodes. To that end, we declare the following functions:

```
Func SemanticType : TypeNode -> Type;
Func SemanticOptType : OptTypeNode -> Type;
Func SemanticArgTypes : ArgListNode -> TypeList;
// The types of expressions and function literals:
Func ExprType : ExprNode -> Type;
Func ExprTypes : ExprNodeList -> TypeList;
Func FunctionNodeType : FunctionNode -> Type;
```

However, Eqlog’s `Func`

keyword introduces
*partial* functions, whereas we required *total*
functions.

To enforce that these functions are total, we can use Eqlog’s
exclamation mark operator `!`

like so:

```
Axiom tn: TypeNode => SemanticType(tn)!;
Axiom otn: OptTypeNode => SemanticOptType(otn)!;
Axiom ag: ArgListNode => SemanticArgTypes(ag)!;
Axiom expr: ExprNode => ExprType(expr)!;
Axiom exprs: ExprListNode => ExprTypes(exprs)!;
Axiom func: FunctionNode => FunctionNodeType(func)!;
```

These axioms make use of another Eqlog construct that we haven’t seen
before: *Sort quantifiers*, for example
`tn: TypeNode`

, match every element of a given sort. Sort
quantifiers are needed in situations where a variable is not constrained
by a predicate or function in the premise.

You can read the exclamation mark operator as “is defined”. During
evaluation, when Datalog has matched the premise of an axiom and
encounters an atom `t!`

in the conclusion, it first checks
whether the expression `t`

and all its subexpressions are
already defined. If they are, then nothing needs to be done. If some
subexpression or `t`

itself is not defined yet, then Eqlog
adjoins fresh elements to the model to represent them. For example, the
axiom for `SemanticType`

causes Eqlog to adjoin a fresh
`Type`

element `t`

to the model for each
`TypeNode`

element `tn`

and insert the tuple
`(tn, t)`

into the graph of `SemanticType`

.

We have to be very careful about axioms involving the exclamation mark operator though, because such axioms can lead to non-termination. Consider for example this Eqlog program that encodes the natural numbers:

```
Sort N;
Func Z : N;
Func S : N -> N;
Axiom => Z()!;
Axiom n : N => S(n)!;
```

Evaluation on even the empty model structure does not terminate for
this Eqlog program, since Eqlog will adjoin more and more elements
`Z(), S(Z()), S(S(Z())), ...`

without halting. In our case,
termination is guaranteed because the number of `Type`

elements adjoined during evaluation is bounded by the number of AST node
elements, and AST node elements are only added to the model before Eqlog
evaluation, by the parser.

Our `SemanticType`

, `SemanticOptType`

and
`SemanticArgTypes`

functions associate semantic type (list)
elements to certain AST nodes. However, we haven’t set up any rules
governing these functions. For example, we need an axiom that enforces
that the semantic type associated to a `BooleanTypeNode`

is
the semantic `BooleanType`

. The following axiom accomplishes
this:

```
Axiom
BooleanTypeNode(tn)
& sigma = SemanticType(tn)
=>
BooleanType() = sigma
;
```

There are analogous axioms that encode the relationship between
`NumberTypeNode`

and `NumberType`

,
`VoidTypeNode`

and `VoidType`

and so forth. The
semantic type of a function type AST node depends recursively on the
semantic types associated to domain and codomain AST nodes:

```
Axiom
FunctionTypeNode(tn, args, codomain)
& dom_types = SemanticArgTypes(args)
& cod_type = SemanticType(codomain)
& function_type = SemanticType(tn)
=>
FunctionType(dom_types, cod_type) = function_type
;
```

The `OptTypeNode`

sort represents optional type node
elements. Our parser uses these nodes to represent type optional
annotations. The `SemanticOptType`

function is total and
gives us a `Type`

element from such nodes, regardless of
whether the `OptTypeNode`

corresponds to an actual
`TypeNode`

or not. One way to think about this is that the
resulting `Type`

element represents a type that is
constrained by the type annotation. If the type annotation is missing,
then the `Type`

element is entirely unconstrained, i.e., not
related to any type constructor. But on `SomeOptTypeNode`

elements, which are given by an actual type node,
`SemanticOptType`

should agree with
`SemanticType`

:

```
Axiom
SomeOptTypeNode(otn, tn)
& sigma = SemanticType(tn)
=>
SemanticOptType(otn) = sigma
;
```

The `SemanticArgTypes : ArgListNode -> TypeList`

should
be given by mapping the `SemanticTypeOpt`

function on
optional type annotations, which we enforce as follows:

```
Axiom
NilArgListNode(al)
& semantic_types = SemanticArgTypes(al)
=>
NilTypeList() = semantic_types
;
Axiom
ConsArgListNode(al, _, head_type, tail)
& semantic_head = SemanticOptType(head_type)
& semantic_tail = SemanticArgTypes(tail)
& semantic_types = SemanticArgTypes(al)
=>
semantic_types = ConsTypeList(semantic_head, semantic_tail)
;
```

In the previous post on variable binding, we set up predicates
`VarInX : Var * X`

for various AST node sorts `X`

,
for example `ExprNode`

, and we added axioms that enforce
`VarInX(var, node)`

holds precisely when the variable
`var`

is accessible within `node`

. For type
checking, this information is not enough: We need to know which type the
variable has.

Thus, we change the `VarInX`

predicates to functions
`VarTypeInX : Var * X -> Type`

: These functions should be
defined on a pair of `(var, node)`

if `var`

is in
scope for `node`

, and if so, then
`VarTypeInX(var, node)`

is the type of the variable. We also
update our rules that introduce variable bindings so they take type
annotations into consideration. For example, the axiom for function
argument variables is now as follows:

```
Axiom
ConsStmtListNode(stmts, _, tail)
& sigma = VarTypeInStmts(var, stmts)
=>
VarTypeInStmts(var, tail) = sigma
;
```

The axioms responsible for propagating existing variable bindings
also need updating so that they propagate the type of variables. For
example, the axiom that propagates variable bindings through
`StmtNodeList`

elements now looks like this:

```
Axiom
ConsStmtListNode(stmts, _, tail)
& sigma = VarTypeInStmts(var, stmts)
=>
VarTypeInStmts(var, tail) = sigma
;
```

In most type systems, type constructors have *disjoint
ranges*, and each type constructor is separately *injective*.
The disjoint range property means that types obtained using different
type constructors do not agree. In our case, this means for example that
the `boolean`

type is different from the `string`

type, and that the `void`

type is different from all function
types. Injectivity is only relevant for type constructors with arguments
(so no type constants such as `number`

). For our toy
language, injectivity thus only applies to function types, where it says
that if two functions types are equal, then they must have the same
domain types and the same codomain types.

Evaluation of our Eqlog program can currently not violate these
properties, but this will change in the next posts due to typing
constraints on expression or variables. For example, we will enforce
that the type of a variable that is used as condition for an
`if`

statement must be of type `boolean`

, and that
it must also be of type `string`

if it is initialized with a
string literal. We enforce these constraints by imposing equalities on
the `ExprType`

of expressions, which by the functionality
axioms can lead to a violation of the disjoint range or injectivity
properties. These violations only happen for ill-typed programs, so our
type checker should report a type error in these situations. Similarly
to the `VariableShadowing`

predicate that we used to report
variable shadowing, we introduce a predicate

`Pred ConflictingTypes: ();`

that our Eqlog program should populate if it detects a violation of injectivity or the disjoint ranges property.

Detecting a violation of the disjoint ranges property is fairly straightforward:

```
Axiom VoidType() = BooleanType() => ConflictingTypes();
Axiom VoidType() = NumberType() => ConflictingTypes();
Axiom VoidType() = StringType() => ConflictingTypes();
Axiom VoidType() = FunctionType(_, _) => ConflictingTypes();
Axiom BooleanType() = NumberType() => ConflictingTypes();
// ...
```

We could encode injectivity of the function type constructor like this:

```
Axiom
FunctionType(sigmas_0, tau_0) = FunctionType(sigma_1, tau_1)
=>
sigmas_0 = sigmas_1
& tau_0 = tau_1
;
```

Instead, we’ll go with an approach that will help us also later:
Axiomatizing inverse functions to `FunctionType`

in each
argument.

When we collect constraints on the types of expressions, we’ll
eventually have the need to assert that some type `kappa`

is
a function type, i.e., that there should *exist*
`domain`

and `codomain`

such that
`kappa = FunctionType(domain, kappa)`

. However, Eqlog does
intentionally not support existential quantification, since this would make Eqlog
evaluation semantically ill-behaved. To work around this, we
introduce functions `DomainTypes : Type -> TypeList`

and
`CodomainType : Type -> Type`

as inverse functions for the
two arguments of `FunctionType`

:

```
Axiom DomainTypes(tau)! => CodomainType(tau)!;
Axiom CodomainType(tau)! => DomainTypes(tau)!;
Axiom
kappa = FunctionType(sigmas, tau)
=>
DomainTypes(kappa) = sigmas
& CodomainType(kappa) = tau
;
Axiom
sigmas = DomainTypes(kappa)
& tau = CodomainType(kappa)
=>
FunctionType(sigmas, tau) = kappa
;
```

To assert that a given type is a function type, it now suffices to
assert that `DomainTypes`

or `CodomainType`

is
defined on it. Since functions are injective if and only if they have an
inverse function on their range, the presence of
`DomainTypes`

and `CodomainType`

makes the
explicit injectivity axiom above redundant.

Due to injectivity of
`FunctionType : TypeList * Type -> Type`

, it can happen
that we equate `TypeList`

elements during evaluation. If type
lists are equal, then they should contain the same types in the same
order. Our Eqlog program equates type lists with different lengths only
if the input source code contains a variable number mismatch, e.g. a
call of a function with the wrong number of arguments, so we want to
report a type error in this case. We can enforce these condition on type
lists like so:

```
Axiom NilTypeList() = ConsTypeList(_, _) => ConflictingTypes();
Axiom
ConsTypeList(head_0, tail_0) = ConsTypeList(head_1, tail_1)
=>
head_0 = head_1
& tail_0 = tail_1
;
```