# Why are my builds so slow?

## Intro

I write C++ code almost every day, be it for work or for fun (yes, we exist). While I have great tools for benchmarking runtime performance in my toolbox, I had never really tried *"benchmarking compile times"*.

With build times continuously creeping up regardless of the build caches I throw at the problem, I wanted to understand what really takes so long to compile.

I don't intend to showcase specific problems, but rather show how one can analyze their own codebase to find the pain points.

## How do we do this?

The idea is to use the `-ftime-trace` compiler flag available in Clang to generate a JSON trace of the compilation process and then visualize it. Thanks to Aras Pranckevičius and Anton Afanasyev, this has been part of [Clang since version 9](https://github.com/llvm/llvm-project/commit/d880de2d19d46f7cfea7aa593602458440bc1e8d).

You can use this with any build system that supports passing custom flags to the compiler. If you are using Bazel and building within the sandbox, have a look at this [Bazel issue](https://github.com/bazelbuild/bazel/issues/9047) as you might need to jump through some hoops to get the trace file out of the sandbox. Enabling the [`--sandbox_debug` option](https://github.com/bazelbuild/bazel/issues/9047#issuecomment-3691815154) is what I did.

### Example program

I will use a simple example program that uses `std::variant` and `std::visit` with multiple types to demonstrate what the generated trace looks like.

Don't read too much into what the code is doing; it's just an example of a variant with a couple of types, not doing anything fancy or important.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

struct Sizer
{
    std::size_t operator()(int) const { return sizeof(int); }
    std::size_t operator()(double) const { return sizeof(double); }
    std::size_t operator()(const std::string &v) const { return v.size(); }
    std::size_t operator()(bool) const { return sizeof(bool); }
    std::size_t operator()(const std::vector<int> &v) const { return v.size() * sizeof(int); }
};

int main()
{
    using Value = std::variant<int, double, std::string, bool, std::vector<int>>;
    std::vector<Value> values{42, 3.14, std::string("hello"), true, std::vector{1, 2, 3}};

    std::size_t total{0};
    for (const auto &v : values)
    {
        total += std::visit(Sizer{}, v);
    }
    return static_cast<int>(total);
}
```

> [Compiler Explorer](https://godbolt.org/z/Kq3bvjEYs)

### Generating the trace

First, make sure you are using Clang as your compiler. Then, add the `-ftime-trace` flag to your compiler flags. By default, this generates a JSON file named `<name>.json` right next to the object file, so an object file `main.o` gets a `main.json` beside it.

### Visualizing the trace

Once you have the trace file, you can use either the `chrome://tracing` tool in a Chromium-based browser or the [Perfetto UI](https://ui.perfetto.dev/) to visualize it.

![A flame graph that illustrates various compiler stages. Color-coded segments display functions and tasks such as ParseFunctionDefinition and CodeGenPasses, indicating different execution phases in a vertical layout.](https://cdn.hashnode.com/res/hashnode/image/upload/v1766708589140/2f4394fc-667f-4741-a67c-188b56324738.png align="center")

Flame graphs are great. Clicking one of the sections shows details about the operation it represents.

Unsurprisingly, given how simple our program is, a big portion of the time is spent parsing the included files.

![Flame graph with one of the largest sections selected, showing that it corresponds to an included header file](https://cdn.hashnode.com/res/hashnode/image/upload/v1766709406719/8034cc08-50ed-4339-a734-6aa004a58645.png align="center")

In a larger program, if you see headers this expensive, it is worth checking whether the header in question can be split into smaller headers that are included only where needed.

Tools like [Include What You Use](https://include-what-you-use.org/) are also great for identifying unnecessary includes across your codebase. Of course, in our example case, there is not much to improve on this side.
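
As a rough illustration of thinning down a header, one common technique is to forward-declare a type instead of including its full definition. The sketch below squeezes three hypothetical files (`report.h`, `database.h`, `report.cpp`) into one listing so it compiles on its own; `Database` and `count_rows` are made-up names and not part of the example program above.

```cpp
#include <cstddef>

// --- report.h (hypothetical): a forward declaration replaces the include ---
class Database; // instead of #include "database.h"

// References and pointers to an incomplete type are fine in declarations.
std::size_t count_rows(const Database &db);

// --- database.h (hypothetical): imagine this header is expensive to parse ---
class Database
{
public:
    explicit Database(std::size_t rows) : rows_{rows} {}
    std::size_t rows() const { return rows_; }

private:
    std::size_t rows_;
};

// --- report.cpp (hypothetical): only this file needs the full definition ---
std::size_t count_rows(const Database &db) { return db.rows(); }
```

With a split like this, files that include `report.h` no longer have to parse `database.h`; only `report.cpp` pays for it. Whether that helps in practice depends on the header, so it is worth comparing the traces before and after.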

Going further in the timeline, we can also clearly see the time spent instantiating our variant and the vector of variants.

![Flame graph showing one of the variant instantiation sections](https://cdn.hashnode.com/res/hashnode/image/upload/v1766710625829/56d167b4-d958-4866-a14c-dca7e77aac64.png align="center")

![Flame graph showing one of the vector of variants instantiation sections](https://cdn.hashnode.com/res/hashnode/image/upload/v1766710672482/78e46c03-ab27-46f3-a112-523f46d48700.png align="center")

## What now?

Go ahead and try this out on your own codebase! If you are looking for more details, check out the Clang developer documentation on [Performance Investigation](https://clang.llvm.org/docs/analyzer/developer-docs/PerformanceInvestigation.html) and the [Perfetto documentation](https://perfetto.dev/docs).
