Installing the Capstone disassembler (with commentary)

The premise

I came across Capstone while trying to build a simple binary analysis tool and have some fun with homegrown tooling. The first module I implemented was a parser that reads a PE (.exe) file and walks its headers to find where the sections, tables, and other metadata live, which tells you where the data is and where the code is. Then I thought: let’s find the basic blocks and plot the CFG. What would that need? Understanding the binary code. It looks simple: just make a big hashmap mapping bytes to mnemonics and operands, right? Then I realized it is not that simple, because the ISA we are talking about is not small. Along the way I found Capstone, which does what I needed, does it faster, supports many architectures, and comes with helper functions out of the box: point it at the beginning of a code section and it disassembles all the way.

So Capstone is a disassembly framework: it helps you disassemble bytes into human-readable instructions for [almost] any ISA on [almost] any platform. You can view the full feature list on their website, but the bottom line is that this tool is ubiquitous in reversing tools: the website lists 402 projects that use Capstone, among them radare2, which you might recall from another post.

Another feature is the bindings for many languages: Go, JavaScript, .NET (C#), Java, Python, Haskell, Rust, PHP and more. This means you can easily integrate it into whatever else you are doing in any language, a big bonus. In this tutorial let’s first get it to work on Windows with C. This is the toughest option; if you want to use it in JavaScript there’s no hitch: just get the minified js and you can use it. Couldn’t be easier!

Spending a few hours stuck trying to get Capstone to play nice with Visual Studio led me to learn more about how the C runtime works in the Windows world. This is useful for other Visual Studio endeavours as well: say you want to do a project with OpenGL and want glm or freeglut (or any other C-related project, really), you will see a lot of this. Another benefit of looking into Capstone’s repository is that Capstone is a well-crafted tool. Writers say the best way to learn writing is to read good prose; similarly, Capstone conforms to good coding practices and reading its code is instructive. It is also a relatively big C project that works on Linux, Windows and Mac, so it is worth checking out if you are interested in making a multi-platform C project: a class in using CMake. One more feature of Capstone is that you can build a customized version, trimming the features you do not need, which again is a pattern that is useful in various projects.

Windows - Staring MSVC in the eye

On Windows, the problem is that building C with MSVC has its quirks. To try out the sample test from the website, we set up a new console project in Visual Studio, and then, of course, we need to tell Visual Studio how to actually use Capstone: where to look for the headers, the libraries, and so on.

The Windows distribution, when extracted, includes a folder called include where the headers are, along with .lib files, a .dll, and cstool.exe, which is a command-line tool. The readme file in the cstool directory shows how to use it to disassemble a string of hex bytes straight from the command line.

The _-d_ option shows details. Pretty cool for a quick query. Moving on: a .lib that accompanies a .dll is an import library. It tells the linker which functions the .dll exports, so the linker can fill in the import table of your executable; the .dll contains the actual code and is loaded at runtime. So with no .lib, the linker spews errors saying it needs a function but cannot find it. With no .dll, the program builds but fails at startup, when the loader looks for the .dll and cannot find it. And if there is a .dll but it does not match the .lib, the loader cannot resolve the imported functions and you hit some weird errors instead.

First we tell Visual Studio (I am on 2017, but the instructions should be similar for other versions) where that include folder is located. My project is named caps, so go to Project > caps Properties…

Head to C/C++ > General, where there is a field for Additional Include Directories, and add the path of the include folder (the one containing the relevant headers). Don’t forget the _;_ delimiter between entries. Now we put capstone.dll and the .lib in the project directory. By default it looks for DLL files there, but you can also specify their location in the VC++ Directories tab instead: you might want to be fancy and put the .lib and .dll files in separate folders rather than the root directory of the project. You can put them anywhere, really, but it is better if they are not outside your project folder, e.g. imagine you want to release the code.

And finally we specify what .lib files we need using: Linker > Input > Additional Dependencies

It should now build without errors, right? That’s what I thought, but I got a ton of errors, like LNK2001 unresolved external symbol vsnprintf. After searching for the names showing up here, I realized the linker was having problems even with memcpy. On Linux these are all part of libc, the [shared] library for standard C, so we probably have a .lib problem here. TL;DR: to get the C runtime, it turns out we need to add these .lib files as well (the same way as before): _ucrt.lib; libvcruntime.lib; libcmt.lib;_

Adding these should fix the problem. Basically, ucrt.lib is the Universal CRT. Back in the old days it used to be different, but in Visual Studio 2015 Microsoft refactored the CRT: the standard C library, POSIX extensions and some Microsoft-specific macros and variables were moved into the UCRT, while the compiler-specific parts (as opposed to the Universal in UCRT) were moved to libvcruntime.lib and libcmt.lib. (LIBC.LIB, LIBCMT.LIB and MSVCRT.LIB are all flavours of the same runtime. The MT is for multithreaded, and a trailing D, as in LIBCMTD.LIB, marks the debug build. MSVCRT[D].LIB is the import library for the DLL version of the runtime, which is not statically linked and thus also requires the corresponding DLL.) You can find out which symbols are in which library by omitting that lib and trying to build; the linker errors will name the missing symbols.

The better way, though, is to use Microsoft’s DUMPBIN tool to see info on each library. As an example, some runtime check functions (you can spot them as they have RTC in their names) are in libcmt.lib, and the famous free, malloc, strlen, … that we know and love are in ucrt.lib. And if you see weird long symbols, particularly with at signs in the name, that is due to name mangling.

Linux - Goodbye .dll ; hello .so

On Linux things are easier, really. The fastest way is to just build from source: get the GitHub repo and (as the instructions say):

$ sudo ./make.sh 
$ sudo ./make.sh install

And you are good to go. If you prefer not to build from source for some reason, use the packaged version:

$ sudo apt-get install libcapstone-dev

This is the development package and pulls in libcapstone3 as a dependency, but you can apt-get libcapstone3 directly if you wish. At the time of writing, the Capstone website is outdated and instructs you to apt-get libcapstone2, but that package is now extinct (you won’t find it), so you should use version 3. (If you build from source you get the latest, which is version 4.)

If you got the repo from GitHub, there are examples in capstone/tests/; you can run make and then try out the generated binaries. The makefile there will give you an idea of how it works, but it is a bit complicated, so if you are in a separate directory and just trying to compile your own code, do this:

$ gcc test.c -lcapstone

The _-l[LIBRARY_NAME]_ switch tells the compiler (actually the linker, ld) that you are linking against the [LIBRARY_NAME] library (here capstone), which you have installed. If you leave this option out, you will not have a problem with the header, since gcc knows where to find it, but the linker will then spew undefined reference errors. Compared to Windows, this is what happens in place of all that .dll and .lib work we did. Using:

$ dpkg -L libcapstone3

You can view which files the package installed (the -L switch is for list) and see that the installation puts Capstone files in directories like /usr/share, /usr/lib, … (one for headers, one for shared libraries, one for docs, etc.). So when you specify -lcapstone to gcc, it looks for libcapstone.so or libcapstone.a in its library directories. Slick!

Post by: Iman Hosseini