Build a Compiler — Type Checker

Marc Auberer
3 min readApr 2, 2024

--

Next, let’s look how we can guarantee a semantically correct program by checking type compatibility. This job is done by a compiler component called the type checker. It resolves the types of each sub-expression e.g. it deduces the type of the expression “1 + 2” to type int, because “1” and “2” are int literals and the “+” operator combines those two integer inputs to an integer output. This process is called type inference. Some programming languages also have a type-inferred type specifier. C++ for example offers the keyword auto for that:

int main() {
auto a = 5.67; // Variable a will be of type double
}

Furthermore, the type checker does also checks types for compatibility. Here is an example:

int main() {
int a = "String";
}

This does not compile, because we have a string literal on the right hand side and try to assign it to an integer variable on the left. The Clang C compiler gives the following error message for the above code:

error: incompatible pointer to integer conversion initializing ‘int’ with an expression of type ‘char[5]’

Depending on whether the source language supports function overloads, the type checker can also do overload resolution to pick one of potentially several candidates, known under a common function name. If the type checker has the concrete argument types at hand, the matching candidate can be chosen:

int max(int, int);
int max(int, int, int);
double max(double, double);
double max(double, double, double);

int main() {
double a = 12.4;
double m = max(4.0, a); // third candidate will be chosen
}

All the gathered type and overload information needs to be stored somewhere. The data types for variables can be attached to their respective symbols in the symbol table and we can also enrich the AST with this information if we want to consume it in later compile stages.

Further checks (Static Analysis)

There are several other checks that can be performed withing the semantic analysis, but not necessarily by the type checker. Often times they exist in a simple form in the compiler frontend but mainly in the optimizing middle-end. Here are a few:

Data Flow Analysis & Control Flow Analysis
Compilers are usually checking control and data flow to detect potential issues like e.g. uninitialized variable, unreachable or dead code. This can be realized by constructing a Control Flow Graph (CFG) to grasp the correlation between different program parts. This can reveal unreachable code paths, that the compiler can remove. For loops, the compiler can sometimes statically see, if is an infinite loop or not.
Another approach is tracking the lifetime of all variables throughout the entire program. If a variable is not initialized before it is referenced, this is indicating a potential bug. If the variable is declared but never read or written to, it is a candidate for safe removal.
Also, a Call Graph can reveal valuable insight about the program. Functions that the compiler frontend already can proof unused, can directly be dropped to reduce load on later compile stages.

Memory Safety Analysis
For memory safe languages, the compiler is responsible to make sure a certain set of rules is never violated. Rust for example does this with a so-called borrow-checker, that enforces those rules. There are also simpler analysis approaches like bounds checking for arrays or escape analysis for pointers.

Exercise

Please copy the files ASTVisitor.h and ASTVisitor.cpp over from the solution, over to the src/ast/ subdirectory.

Please also create the files TypeChecker.h and TypeChecker.cpp in the subdirectory src/typechecker/. In the header file, add the struct ExprResult, which contains a SymbolType and an optional SymbolTableEntry for a sub-expression. Also create the TypeChecker class, that inherits from CompilerPass and ASTVisitor and implement all visit methods in the cpp file. Throw an std::runtime_error whenever the types do not match. Also, use the setEvaluatedSymbolType(SymbolType) method to attach the symbol type to the corresponding AST node for each sub-expression. If something is not clear, take a look at the TypeChecker solution here.

Continue reading the next article

--

--

Marc Auberer

C++ Compiler Developer at SAP. Passionate software developer and OpenSource enthusiast.