Automated Binary Analysis (a short intro)

Automated program analysis is the practice of analyzing computer programs with other programs, as opposed to having a developer audit the code by hand. The objective can be to find security vulnerabilities, to optimize the program, or to reverse engineer an obfuscated program and learn about its flow. There are several benefits to this approach. It scales to the huge number of programs in circulation, which would be impossible to audit “manually”. Code auditing is a skill that is not gained easily, so a developer who is good at the job doesn’t come cheap. Some settings also call for online, live monitoring of code, which is hard to do by hand.
Automated program analysis comes in two flavors. Static analysis examines the code without actually executing the program. Dynamic analysis executes the program and takes its runtime behavior into account. The input can also be either the source code of the program or the compiled binary. Each approach has its benefits and disadvantages. Typical tasks for an analysis platform include inferring the call graph of a program, tracking API calls, and recovering the program's control flow graph.
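To make the idea of inferring a call graph statically a bit more tangible, here is a minimal sketch in Python. It works on source code rather than a binary, uses only the standard `ast` module, and never executes the analyzed program. The sample program in `SOURCE` and the helper name `build_call_graph` are made up for illustration, and the sketch only sees direct calls by plain name.

```python
import ast
from collections import defaultdict

# A small made-up program to analyze. It is parsed, never executed.
SOURCE = """
def parse(packet):
    return decode(packet)

def decode(packet):
    return len(packet)

def main():
    data = read_input()
    return parse(data)
"""

def build_call_graph(source):
    """Map each top-level function name to the set of names it calls directly."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # touch the key so call-free functions still appear
            for inner in ast.walk(node):
                # Only direct calls by plain name are recorded; method calls
                # and calls through variables are ignored in this sketch.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)

call_graph = build_call_graph(SOURCE)
for caller in sorted(call_graph):
    print(caller, "->", sorted(call_graph[caller]))
```

Doing the same on a compiled binary is much harder, since function boundaries and call targets have to be recovered first, which is exactly what binary analysis platforms are for.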
A technique often used alongside these is symbolic execution, which walks through the program with symbolic (rather than concrete) values for its variables, recording the conditions under which each path is taken, to gain further knowledge about the flow of the program.
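As a rough illustration of symbolic execution, the sketch below explores every path of a tiny made-up program and collects the branch conditions along each path. Real engines hand those conditions to an SMT solver; here a brute-force search over a small integer range stands in for the solver. The program encoding, `explore`, and `solve` are all assumptions of this sketch, not the API of any tool.

```python
from itertools import product

# A toy program as a tree: ("if", condition, then_branch, else_branch)
# or ("leaf", label). A condition is (variable, operator, constant).
PROGRAM = ("if", ("x", ">", 10),
           ("if", ("y", "==", 3), ("leaf", "bug"), ("leaf", "ok")),
           ("leaf", "ok"))

OPS = {
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
}
NEGATE = {">": "<=", "<=": ">", "==": "!=", "!=": "=="}

def explore(node, path=()):
    """Yield (label, path_condition) for every path through the program."""
    if node[0] == "leaf":
        yield node[1], path
        return
    _, (var, op, const), then_branch, else_branch = node
    # Fork: one path assumes the condition holds, the other assumes it fails.
    yield from explore(then_branch, path + ((var, op, const),))
    yield from explore(else_branch, path + ((var, NEGATE[op], const),))

def solve(path, domain=range(0, 32)):
    """Brute-force stand-in for a solver: find x, y that satisfy the path."""
    for x, y in product(domain, repeat=2):
        env = {"x": x, "y": y}
        if all(OPS[op](env[var], const) for var, op, const in path):
            return env
    return None

results = [(label, path, solve(path)) for label, path in explore(PROGRAM)]
for label, path, model in results:
    print(label, list(path), model)
```

The path that ends in the "bug" leaf comes with concrete inputs that drive the program there, which is the kind of knowledge a symbolic executor is after.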

A concrete example


Trail of Bits sketches a way to detect Heartbleed. It is based on a characterization of the vulnerability: the return value of a byte-order conversion call such as ntohl or ntohs comes from the network and is marked as tainted, and a tainted value is then passed to memcpy without bounds checking.
To this end, they used the Clang Static Analyzer, the static analysis tool that ships with Clang/LLVM and is usually driven through scan-build. To perform a custom analysis, everything goes into a C++ checker class that is registered as a plugin. Here’s how the part of the checker that marks taint looks:

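    // Post-call hook: runs after a call has been modeled, so the return
    // value of the call is available.
    void NetworkTaintChecker::checkPostCall(const CallEvent &Call,
                                            CheckerContext &C) const {
      const IdentifierInfo *ID = Call.getCalleeIdentifier();
      if (ID == NULL) {
        return;  // callee has no plain identifier, nothing to match on
      }

      // Values returned by the byte-order conversions come from the network.
      if (ID->getName() == "ntohl" || ID->getName() == "ntohs") {
        ProgramStateRef State = C.getState();
        SymbolRef Sym = Call.getReturnValue().getAsSymbol();

        if (Sym) {
          // addTaint as a ProgramState method is kept from the original
          // snippet; other Clang releases may expose it differently.
          ProgramStateRef newState = State->addTaint(Sym);
          C.addTransition(newState);
        }
      }
    }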
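The checker above only covers the first half of the characterization, marking taint. To show the whole idea end to end outside of Clang, here is a minimal sketch of taint tracking over a made-up, straight-line mini instruction format. The statement encoding, `find_tainted_memcpy`, and the choice to let a bounds check clear the taint are simplifying assumptions of this sketch, not how the Clang Static Analyzer works internally.

```python
# Functions whose return value is treated as attacker-controlled
# (an assumption of this sketch, mirroring the checker's two names).
TAINT_SOURCES = {"ntohl", "ntohs"}

def find_tainted_memcpy(statements):
    """Return (index, size_variable) for each memcpy with a tainted size.

    Each statement is one of:
      ("call", dst_or_None, function_name, [argument names])
      ("assign", dst, src)
      ("bounds_check", variable)
    """
    tainted = set()
    findings = []
    for index, stmt in enumerate(statements):
        kind = stmt[0]
        if kind == "call":
            _, dst, func, args = stmt
            # Sink: memcpy(dst, src, size) with a tainted size argument.
            if func == "memcpy" and len(args) == 3 and args[2] in tainted:
                findings.append((index, args[2]))
            if dst is not None:
                if func in TAINT_SOURCES:
                    tainted.add(dst)      # source: the result is tainted
                else:
                    tainted.discard(dst)  # overwritten by an untracked value
        elif kind == "assign":
            _, dst, src = stmt
            if src in tainted:
                tainted.add(dst)          # taint propagates through copies
            else:
                tainted.discard(dst)
        elif kind == "bounds_check":
            tainted.discard(stmt[1])      # simplification: a check sanitizes
    return findings

VULNERABLE = [
    ("call", "payload", "ntohs", ["raw"]),
    ("assign", "length", "payload"),
    ("call", None, "memcpy", ["out", "in", "length"]),
]
PATCHED = VULNERABLE[:2] + [("bounds_check", "length")] + VULNERABLE[2:]

print(find_tainted_memcpy(VULNERABLE))
print(find_tainted_memcpy(PATCHED))
```

A real checker has to follow taint across branches, function calls, and memory, which is what the analyzer's program state machinery provides.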

Similarly, to check for function calls, we could have used BAP (the Binary Analysis Platform). In OCaml, its original and preferred language, it looks like:

module CG = Graphs.Callgraph
module CFG = Graphs.Tid


This gives access to the call graph and the CFG. Now imagine we want a function that takes the call graph cg, the target function, and the subroutine term sub, and returns the sequence of blocks in sub that contain a direct call to a function from which target is reachable in the call graph. BAP provides nifty methods for this.

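    (* true when target is reachable from callee in the call graph; this
       helper is not part of the snippet as posted and assumes that
       Graphlib.is_reachable is in scope (open Graphlib.Std) *)
    let reaches cg callee target =
      Graphlib.is_reachable (module CG) cg callee target

    (* blocks of sub that contain a direct call to a function reaching target *)
    let callsites cg target sub =
      Term.enum blk_t sub |>
      Seq.concat_map ~f:(fun blk ->
          Term.enum jmp_t blk |> Seq.filter_map ~f:(fun j ->
              match Jmp.kind j with
              | Goto _ | Ret _ | Int (_,_) -> None
              | Call dst -> match Call.target dst with
                | Direct tid when reaches cg tid target ->
                  Some (Term.tid blk)
                | _ -> None))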

In a similar manner, every custom analysis is written in an OCaml file (or through one of the other language bindings) and then registered as a plugin to BAP (instructions here).
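For readers who do not speak OCaml, the logic of callsites can be mimicked in a few lines of Python. This is only a sketch under made-up assumptions: the call graph is a plain dictionary, a subroutine is a list of (block id, jumps) pairs, and none of the names below belong to BAP or its bindings.

```python
from collections import deque

# Hypothetical call graph: caller -> set of direct callees.
CALL_GRAPH = {
    "main": {"parse", "log"},
    "parse": {"decode"},
    "decode": {"memcpy"},
    "log": {"printf"},
}

def reaches(cg, start, target):
    """Breadth-first search: is target reachable from start in the call graph?

    A function is considered to reach itself.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for callee in cg.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

def callsites(cg, target, sub):
    """Return ids of blocks in sub with a direct call that reaches target.

    sub is a list of (block_id, jumps); a jump is ("call", callee),
    ("goto", label) or ("ret",). Only calls are of interest here.
    """
    return [
        block_id
        for block_id, jumps in sub
        for jump in jumps
        if jump[0] == "call" and reaches(cg, jump[1], target)
    ]

# A made-up subroutine with three blocks.
SUB = [
    ("b0", [("call", "log"), ("goto", "b1")]),
    ("b1", [("call", "parse")]),
    ("b2", [("ret",)]),
]

print(callsites(CALL_GRAPH, "memcpy", SUB))
```

Only the block calling parse is reported, because parse leads to memcpy through decode while log does not.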

So many tools!


There are also other tools built with different motivations in mind, among them angr, ROSE, and radare2.
radare2 is better suited to CTFs. ROSE started out as a source analysis tool, so it supports source analysis as well as binary analysis. angr was used to build Mechanical Phish, which won 3rd place at DARPA’s Cyber Grand Challenge. ROSE is developed in C++; it is fast and great for research, but when you want to work on small programs (like a CTF challenge) you’d rather take other options, like BAP, which is based on OCaml and comes with baptop, a REPL environment where you can interactively run commands and play around with the binary.

The most recent work published using angr (at the time of writing) is HeapHopper: Bringing Bounded Model Checking to Heap Implementation Security (USENIX Security ’18), which builds on angr to implement a tool called HeapHopper that inspects heap implementations for vulnerabilities. The method was shown to work on a ptmalloc vulnerability; it uses angr as its symbolic execution engine, checking compiled programs for security violations after an attempted exploit.
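To give a feel for what "bounded" checking of a heap implementation means, here is a toy sketch that has nothing to do with HeapHopper's actual code or with angr. It enumerates every sequence of malloc and free operations up to a fixed length against a deliberately buggy, made-up allocator and reports the first sequence after which malloc hands out memory that is still in use. Explicit enumeration stands in for symbolic execution here.

```python
from itertools import product

class ToyAllocator:
    """Fixed-size chunks with a LIFO free list and no double-free check."""

    def __init__(self):
        self.free_list = []
        self.next_fresh = 0

    def malloc(self):
        if self.free_list:
            return self.free_list.pop()
        addr = self.next_fresh
        self.next_fresh += 1
        return addr

    def free(self, addr):
        # Bug on purpose: no check that addr is currently allocated.
        self.free_list.append(addr)

def violates(sequence):
    """Replay a sequence; True if malloc returns an address still in use."""
    heap = ToyAllocator()
    pointers = []  # address behind each pointer the program ever received
    in_use = []    # in_use[i] stays True until the program frees pointer i
    for op in sequence:
        if op == "malloc":
            addr = heap.malloc()
            if any(used and held == addr for held, used in zip(pointers, in_use)):
                return True  # overlapping allocation
            pointers.append(addr)
            in_use.append(True)
        else:
            _, i = op
            if i >= len(pointers):
                return False  # frees a pointer that does not exist yet: skip
            heap.free(pointers[i])  # may be a stale pointer (misuse)
            in_use[i] = False
    return False

def find_counterexample(max_depth):
    """Search all operation sequences up to max_depth, shortest first."""
    operations = ["malloc"] + [("free", i) for i in range(max_depth)]
    for depth in range(1, max_depth + 1):
        for sequence in product(operations, repeat=depth):
            if violates(sequence):
                return list(sequence)
    return None

print(find_counterexample(4))
print(find_counterexample(5))
```

No violation exists within four operations, while a bound of five is enough to expose the missing double-free check, which is why the choice of bound matters in this style of analysis.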

Post by: Iman Hosseini