Generating source-level Control Flow Graph using Clang 4.0

Introduction

I needed to find all possible paths a program could traverse during its execution. I also needed to modify a few of the basic blocks in the given source code. Therefore, IR or something similar wasn’t suitable.
I was looking through different tools for a few days to find the proper one but I found nothing useful. Finally, I decided to read up on Clang.

Clang is a C/C++/Objective-C compiler front-end. It is a sub-project of LLVM and uses LLVM for code optimization and back-end support. Figure 1 shows how source code, clang, LLVM IR and LLVM are connected. You can read more about this project on LLVM Website and LLVM Wikipedia.

Figure 1: Relation between Clang and LLVM

Clang is not only a compiler frontend with many options, but also a very powerful C++ library for having control over the abstract syntax tree (AST), automated editing source code, etc. I initially read about a method in Clang’s library, buildCFG, and I came to the conclusion that Clang is my guardian angel. : ) I found out that it is possible to write a new tool using Clang’s library.

In the rest of this post, I will explain step-by-step what I did to achieve what I need.

Using Clang as a library

Clang provides different interfaces that we can choose from to write a clang-tool. We will use LibTooling which is an unstable and backward-incompatible C++ API and gives us full control over the AST.

It may seem counter intuitive to write an unstable and backward-incompatible API, however:

One could argue that instability is a design goal.

—Eli Bendersky

So we can compromise backward-compatibility to keep moving forward, to refactor our design as needed, and to be up to date in general.

We are going to write our clang-tool using Clang-4.0 and this tool might not be compatible with more recent versions (e.g. Clang-8).

How to write this tool?

The first step in generating a Control Flow Graph (CFG) for each function is to answer this question: Where are function declarations in the code? Since Clang is a compiler front-end, it generates and uses AST. To answer the question, we can search in the AST to find some nodes representing function declarations. Clang holds AST nodes in ASTContext which is defined in ASTContext.h

But what is our interface to read this AST? Clang’s library has prepared an abstract interface, ASTConsumer, which is defined in ASTConsumer.h. We have to define a new object that inherits ASTConsumer and overrides a fews methods of it (and probably adds a few variables) to read AST in our desired way.

ASTConsumer has many methods for reading the AST. We are using HandleTranslationUnit(clang::ASTContext&) which is invoked when AST of each translation unit is parsed. This method is empty and we must override it. We could probably also use HandelTopLevelDecl(clang::DeclGroupRef) which is invoked at every top-level declaration but I did not test this method.

So far, we know that we have to define a new object that inherits ASTConsumer and overrides HandleTranslationUnit(clang::ASTContext&). In our case, this method should look for function declarations.
Clang has two other classes: clang::ast_matchers::MatchFinder and clang::ast_matchers::MatchFinder::MatchCallback. The first one, checks if there is a “matching node“ in the context or not, and the second is a “callback“ object which is called when the “matching node” that is registered for it is successfully found in the AST.

So, an instance of clang::ast_matchers::MatchFinder (call it “Finder“) searches for the specified nodes and upon finding a “matching node”, it calls an instance of the corresponding “callback” object. In fact, all “callback” objects have a method named run(const clang::ast_matchers::MatchFinder::MatchResult&). “Finder” calls this method and passes the information of the “matching node” as an argument with type const clang::ast_matchers::MatchFinder::MatchResult& to it.

How to specify some nodes? This document is a reference for all ASTMatchers. So we can specify function declaration nodes as below:

1
2
3
4
5
const auto matching_node =  
clang::ast_matchers::functionDecl(isExpansionInMainFile()).bind("fn");
/*`bind()` binds a function declaration node to an ID.
Callback object can distinguish results(matched nodes) by this ID.
Here, all "functionDecl" nodes are bound to ID="fn".*/

Up here, we have this code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class MyCallback : public clang::ast_matchers::MatchFinder::MatchCallback {
MyCallback(){}
void run(const clang::ast_matchers::MatchFinder::MatchResult &Result){
// do something with this result
}
};

class MyConsumer : public clang::ASTConsumer {
public:
explicit MyConsumer() : handler() {
const auto matching_node = clang::ast_matchers::functionDecl(isExpansionInMainFile()).bind("fn");
match_finder.addMatcher(matching_node, &handler);
}
void HandleTranslationUnit(clang::ASTContext& ctx) {
match_finder.matchAST(ctx);
}
private:
MyCallback handler;
clang::ast_matchers::MatchFinder match_finder;
};

and we can implement run as below:

1
2
3
4
5
6
7
8
9
10
11
void run(const clang::ast_matchers::MatchFinder::MatchResult &Result){
const auto* Function = Result.Nodes.getNodeAs<clang::FunctionDecl>("fn");
const auto CFG = clang::CFG::buildCFG(Function,
Function->getBody(),
Result.Context,
clang::CFG::BuildOptions());
for (const auto* blk : *CFG){
blk->dump(); // Prints Basic Blocks.
// do something more.
}
}

Putting it all together, we have:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
// Clang includes
#include <clang/AST/ASTConsumer.h>
#include <clang/AST/ASTContext.h>
#include <clang/ASTMatchers/ASTMatchFinder.h>
#include <clang/ASTMatchers/ASTMatchers.h>
#include <clang/Analysis/CFG.h>
#include <clang/Basic/Diagnostic.h>
#include <clang/Basic/LangOptions.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Frontend/FrontendAction.h>
#include <clang/Tooling/CommonOptionsParser.h>
#include <clang/Tooling/Tooling.h>

// LLVM includes
#include <llvm/ADT/StringRef.h>
#include <llvm/Support/CommandLine.h>
#include <llvm/Support/raw_ostream.h>

class MyCallback : public clang::ast_matchers::MatchFinder::MatchCallback {
public:
MyCallback(){}
void run(const clang::ast_matchers::MatchFinder::MatchResult &Result){
const auto* Function = Result.Nodes.getNodeAs<clang::FunctionDecl>("fn");
const auto CFG = clang::CFG::buildCFG(Function,
Function->getBody(),
Result.Context,
clang::CFG::BuildOptions());
for (const auto* blk : *CFG){
blk->dump(); // Prints Basic Blocks.
// do something more.
}
}
};

class MyConsumer : public clang::ASTConsumer {
public:
explicit MyConsumer() : handler() {
const auto matching_node = clang::ast_matchers::functionDecl(clang::ast_matchers::isExpansionInMainFile()).bind("fn");
match_finder.addMatcher(matching_node, &handler);
}
void HandleTranslationUnit(clang::ASTContext& ctx) {
match_finder.matchAST(ctx);
}
private:
MyCallback handler;
clang::ast_matchers::MatchFinder match_finder;
};

class MyFrontendAction : public clang::ASTFrontendAction {
public:
std::unique_ptr<clang::ASTConsumer>
CreateASTConsumer(clang::CompilerInstance&, llvm::StringRef) override {
return std::make_unique<MyConsumer>();
}
};

struct ToolFactory : public clang::tooling::FrontendActionFactory {
clang::FrontendAction* create() override {
return new MyFrontendAction();
}
};

int main(int argc, const char **argv) {
auto CFGCategory = llvm::cl::OptionCategory("CFG");
clang::tooling::CommonOptionsParser OptionsParser(argc, argv, CFGCategory);
clang::tooling::ClangTool Tool(OptionsParser.getCompilations(), OptionsParser.getSourcePathList());
// run the Clang Tool, creating a new FrontendAction.
return Tool.run(new ToolFactory);
}

To build this tool (with name cfg.cpp), use this Makefile:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
TARGET := cfg
HEADERS := -isystem `llvm-config-4.0 --includedir`
WARNINGS := -Wall -Wextra -pedantic -Wno-unused-parameter
CXXFLAGS := $(WARNINGS) -std=c++14 -fno-exceptions -fno-rtti -O3 -Os
LDFLAGS := `llvm-config-4.0 --ldflags`

CLANG_LIBS := \
-lclangFrontendTool \
-lclangRewriteFrontend \
-lclangDynamicASTMatchers \
-lclangToolingCore \
-lclangTooling \
-lclangFrontend \
-lclangASTMatchers \
-lclangParse \
-lclangDriver \
-lclangSerialization \
-lclangRewrite \
-lclangSema \
-lclangEdit \
-lclangAnalysis \
-lclangAST \
-lclangLex \
-lclangBasic

LIBS := $(CLANG_LIBS) `llvm-config-4.0 --libs --system-libs`

all: cfg

.phony: clean
.phony: run

clean:
rm $(TARGET) || echo -n ""

cfg: $(TARGET).cpp
$(CXX) $(HEADERS) $(LDFLAGS) $(CXXFLAGS) $(TARGET).cpp $(LIBS) -o $(TARGET)

If you have not installed LLVM-4 and Clang-4 on your system, use this:

1
sudo apt install llvm-4.0 llvm-4.0-dev clang-4.0 libclang-4.0-dev python-clang-4.0

then simply:

1
2
cd path/to/file
make

finally, use this tool:

1
./cfg path/to/target/code.cpp --

Example

As an example, you can take a look at the code below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
void print_hello(void){
printf("Hello\n");
}

void print_good(void){
printf("Goodbye\n");
}

void print_bad(void){
printf("Oh no!\n");
}
void darkerthandark(void){
printf("DARK\n");
}

int main(){
int i;
int a = 10;

if (a < 10)
print_hello();
if (0)
print_good();
if (1)
print_bad();

for ( i = 0 , a = 3 ; i < 99 ; i++){
darkerthandark();
}
return 0;

}

The output for the main function is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
[B0 (EXIT)]
Preds (1): B1


[B1]
1: return 0;
Preds (1): B4
Succs (1): B0


[B2]
1: i++
Preds (1): B3
Succs (1): B4


[B3]
1: darkerthandark()
Preds (1): B4
Succs (1): B2


[B4]
1: i < 99
T: for (...; [B4.1]; ...)
Preds (2): B2 B5
Succs (2): B3 B1


[B5]
1: i = 0
2: a = 3
3: ... , [B5.2]
Preds (2): B6 B7(Unreachable)
Succs (1): B4


[B6]
1: print_bad()
Preds (1): B7
Succs (1): B5


[B7]
1: 1
T: if [B7.1]
Preds (2): B8 B9
Succs (2): B6 B5(Unreachable)


[B8]
1: print_good()
Preds (1): B9(Unreachable)
Succs (1): B7


[B9]
1: 0
T: if [B9.1]
Preds (2): B10 B11
Succs (2): B8(Unreachable) B7


[B10]
1: print_hello()
Preds (1): B11
Succs (1): B9


[B11]
1: int i;
2: int a = 10;
3: a < 10
T: if [B11.3]
Preds (1): B12
Succs (2): B10 B9


[B12 (ENTRY)]
Succs (1): B11

Conclusion

I was initially just searching for a proper tool to generate the CFG of any function. Finally, by using Clang’s LibTooling, not only did I succeed to generate the CFG, but also it was possible for me to manipulate basic blocks easily.

As a result, after working with Clang, I was enthusiastic to try developing different clang-tools using the AST. The reason was that Clang has a very well-designed architecture. Also, the comments written in Clang’s source code are very useful and instructive. Above all, Clang is a very powerful tool for having control over the AST. After learning some basics, you can write many different tools and this will cause a very enjoyable feeling! :)

Thanks to Peter Goldsborough who helped me to write my own clang-tool by his useful talk at C++Now 2017.

[1], LLVM for Grad Students

References

[1], Clang’s source code!

[2], Clang’s documentation

[3], Peter Goldsborough’s talk at C++Now 2017.

[4], Kevin Boos’s blog post.

Post by: Hossein Moghaddas