Build a Compiler — Optimization

Marc Auberer
4 min read · Apr 2, 2024

--

Now that we have transformed our program into intermediate representation, we can hand it off to an optimization pipeline that performs one or more optimizations on it. Optimizations can target a variety of metrics. Most of the time, we want the best possible execution speed, i.e. performance. But we could, for example, also optimize for small binary size or low memory usage. Often, the preferable approach is to find the right balance between these metrics. Therefore, many compilers offer parameterization to give the user the option to decide. GCC and Clang, for example, support the following optimization options via command-line arguments: -O0, -O1, -O2, -O3, -Os and -Oz. -O0 to -O3 are optimization modes for performance, while -Os and -Oz are optimization modes for binary size. Typically, performance optimization is mostly about removing and simplifying code, which automatically benefits the binary size. But some techniques used to aggressively optimize for performance (e.g. with -O3), such as loop unrolling, usually impact code size negatively. If the compiler supports multiple optimization pipelines for multiple metrics, the programmer can decide which metric is important, based on the given use case.

Many compilers use so-called optimization passes to do the actual work. This has the advantage that optimization happens in a sequential and modular manner and can be flexibly reorganized and reordered. Each optimization pass (at least in the compiler middle-end) takes IR as input, performs some analysis or transformation on it, and in turn produces IR as output. This IR can then be fed to the next pass, and so on.

Compiler toolchains like LLVM offer many optimization passes out of the box that can be applied directly to unoptimized IR and produce an equivalent but better version with respect to the chosen metric. This makes it relatively easy to craft industry-standard optimizing compilers: we only have to care about the compiler frontend, and LLVM does the middle-end and backend magic. Using LLVM to incorporate advanced optimization techniques also has its cost. LLVM does not have the best reputation for optimizing quickly, but compile time is just another metric we can optimize for if we choose to.

Here are a few commonly used optimization passes, offered by LLVM:

  • InstCombine:
    Reformulates expressions in a shorter and/or more machine-efficient way. E.g., InstCombine would transform the expression x * 2 to x << 1 [Alive2][CE].
  • Function Inlining:
    Functions that have only a few callers and/or are relatively small can be inlined into their callers to save the function call overhead [CE].
  • Loop Invariant Code Motion (LICM):
    Invariant code within a loop can be pulled out of the loop so that the same result is not calculated over and over again. This saves execution time while still yielding the same result, and is therefore equivalent [CE].

In general, optimization passes in LLVM work stand-alone, but may share information that analysis passes come up with. In the middle-end as well as the backend, the general approach is to first canonicalize, then analyze, and finally transform the IR. Some passes can appear multiple times in the same optimization pipeline. Imagine a function inliner: we first want to simplify a function, then check whether inlining is worthwhile, and if the function was inlined, simplify the call-site code even further. For such reasons, the default optimization pipelines of compilers for general-purpose languages are often built on experience and cost-benefit estimations. Domain-specific languages usually opt for hand-crafted optimization pipelines tailored to their individual use cases.

Exercise

Create the two source files IROptimizer.h and IROptimizer.cpp in the subdirectory src/iroptimizer/. This time, the IROptimizer class only needs to inherit from CompilerPass. To feed the unoptimized IR to LLVM, we need a few things, which you can add as members to the class:

  • llvm::LoopAnalysisManager loopAnalysisMgr
  • llvm::FunctionAnalysisManager functionAnalysisMgr
  • llvm::CGSCCAnalysisManager cgsccAnalysisMgr
  • llvm::ModuleAnalysisManager moduleAnalysisMgr
  • std::unique_ptr<llvm::PassBuilder> passBuilder
  • llvm::Module* llvmModule
  • llvm::TargetMachine* targetMachine

This code is required to initialize the analysis managers:

passBuilder = std::make_unique<llvm::PassBuilder>(targetMachine);

functionAnalysisMgr.registerPass([&] { return passBuilder->buildDefaultAAPipeline(); });

passBuilder->registerModuleAnalyses(moduleAnalysisMgr);
passBuilder->registerCGSCCAnalyses(cgsccAnalysisMgr);
passBuilder->registerFunctionAnalyses(functionAnalysisMgr);
passBuilder->registerLoopAnalyses(loopAnalysisMgr);
passBuilder->crossRegisterProxies(loopAnalysisMgr, functionAnalysisMgr, cgsccAnalysisMgr, moduleAnalysisMgr);

To optimize the IR module, this code can be used:

llvm::OptimizationLevel llvmOptLevel = llvm::OptimizationLevel::O2; // Adjust here if you want to change the optimization level
llvm::ModulePassManager modulePassMgr = passBuilder->buildPerModuleDefaultPipeline(llvmOptLevel);
modulePassMgr.run(*llvmModule, moduleAnalysisMgr);

After running the chosen optimization pipeline we can pass our optimized IR over to the compiler backend, which is covered in the next article.

Continue reading the next article


Marc Auberer

C++ Compiler Developer at SAP. Passionate software developer and OpenSource enthusiast.