Build a Compiler — Introduction

Marc Auberer
3 min readApr 2, 2024

--

As programmers, we often forget about the technology that our software is built upon: Compilers are the binding bridge between the programmer and the hardware of a computer.

Considering the fact, that I am going to be teaching compiler design lecture at the corporative state university in Karlsruhe as of April 2024, I thought of breaking my knowledge about compilers down to a few concise articles to prepare for that challenge. The articles are conceived and structured to be first and foremost understandable for beginners in the field of compiler design. They serve as extension to my lecture, but are hopefully also comprehensible stand-alone.

Throughout the course of the article series, we will develop an own little ahead of time compiler for mathematical expressions. For that we are going to use the following tools:

  • C++ as programming language
  • The LLVM compiler infrastructure as compiler middle- and backend
  • CMake as our build system

Although we could create lexer and parser (more about that later in the series) using a parser generator like Bison or ANTLR, we will write them manually for the sake of example.

In order for you to follow along the explanations, the following articles are illustrated with examples in C or C++ to establish a connection between theory and practice. Occasionally, there is additional information linked for some examples, marked with “[CE]” for Compiler Explorer or similar.

Focus

There are four major different kinds of compilers: ahead of time compilers (AOT) like in C++ or Rust, just in time compilers (JIT) like in Java or Java Script, transpilers like in Typescript and lastly interpreters like in Python.

Modern compilers are usually structured in compile phases that are executed sequentially one after another for each compile unit. The Clang compiler for C++, for example, uses compile phases to process each individual compile unit and produce an object file for it. On the other hand, there are compilers like the Rust compiler, that work query-based. This means, that the compilation of an enclosed program part, e.g. a function, is only triggered as soon as it is needed by another part of the program. We will focus on the former, “traditional” design with sequential compile phases, along with the ahead of time compilation model.

Typically, compilers have at least the following compile phases:

We will cover the first five phases in the following parts of the series. The latter three are provided by LLVM for us and are covered in the last two articles.

Subtopics

Apart from this introduction article, there are seven more articles to read. If you are only interested in a particular compiler component, feel free to jump to the respective article.

  1. Lexer
  2. Parser
  3. Symbol Table
  4. Type Checker
  5. IR Generator
  6. Optimization
  7. Backend

Let’s do it!

So without further ado, let’s start with implementing our first compiler to evaluate simple mathematical expressions.

Continue reading the first article

--

--

Marc Auberer

C++ Compiler Developer at SAP. Passionate software developer and OpenSource enthusiast.