Enter the craft

I don’t know where I got this piece of advice, but one should consider that a superhero might be someone’s first superhero. In other words, before talking about something in depth one should make sure the audience has enough context to follow. Speaking of context, this is my second post in a series about the reverse engineering of xadec.dll, a library used by an open source game I happen to like. The library is missing one crucial thing: its source code.

So after a lengthy origin story that could be summed up as “I used to play fake drums on a home console and then on a Windows desktop but these days I only use a laptop running Fedora”, it is now time to briefly introduce reverse engineering, assuming that the reader may know little or nothing about the topic.

Reverse engineers may not be superheroes, but after spending 30 hours overall on a small and simple library I can confirm this is no easy task. Much like Batman would get nowhere without all his bat tools, it would take forever to figure anything out without standing on the shoulders of giants.

Speaking of Batman, I love the irony behind the name. While it’s obviously the combination of “bat” and “man” it also results in an actual word. This turns Alfred into Batman’s batman, considering Bruce’s war on crime and Alfred’s Britishness and military background. But I digress…

Compilation is a lossy process

It’s easier to go from source code to target code. Going the other way requires additional effort to guess everything that was lost during compilation: your processor cares little about type systems or variables, and whatever high level instructions one may write have to fit into a limited set of instructions and registers. Turn optimizations on and you end up with even more obfuscation of the original code.
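The loss can be observed without leaving a Python prompt. This is only an analogy, since Python byte code is far less lossy than native code (variable names survive, for instance), but it shows two different source texts collapsing into the same target code. The `compiled_form` helper and the sample statements are made up for the illustration.

```python
def compiled_form(source: str) -> tuple:
    """Return what survives compilation: instructions, names and constants."""
    code = compile(source, "<example>", "exec")
    return (code.co_code, code.co_names, code.co_consts)


# Same computation written twice: spacing, redundant parentheses and the
# comment differ, and none of that reaches the byte code.
original = "total = (price + tax) + fee  # tax is per item"
rewritten = "total=price+tax+fee"

# A statement that really evaluates in a different order, for contrast.
regrouped = "total = price + (tax + fee)"

same = compiled_form(original) == compiled_form(rewritten)
different = compiled_form(original) != compiled_form(regrouped)
print(same, different)
```

Given only the compiled form, there is no way to tell which of the first two statements was written, and a native compiler discards a lot more than comments and parentheses.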

Disassembly is very easy: it’s only a matter of finding where the code is in a container. On Linux that would usually be somewhere in an ELF (Executable and Linkable Format) file, while on Windows it’s more likely to be found in a PE (Portable Executable) file. This doesn’t only apply to native code though: a JVM will look for byte code in class files, and in essence any target code can easily be disassembled.
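Telling those containers apart is mostly a matter of looking at the first few bytes. Here is a minimal sketch, assuming the whole file has been read into memory. The `identify_container` name is mine, and real tools such as `file` check far more than this.

```python
import struct


def identify_container(data: bytes) -> str:
    """Guess the container format from well-known magic numbers."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    # Java class files start with 0xCAFEBABE (a few other formats reuse
    # this value, so treat the answer as a guess).
    if data[:4] == b"\xca\xfe\xba\xbe":
        return "Java class"
    # A PE file starts with a DOS header ("MZ") whose field at offset 0x3C
    # holds the position of the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"
```

It could be used as `identify_container(open("xadec.dll", "rb").read())`, which should report a PE container for a Windows DLL.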

Decompilation on the other hand is very hard. The more we lose during the compilation process, the harder it becomes to perform the inverse operation. Debug information would probably help a great deal, but xadec.dll doesn’t include any, so going from native code to C code requires a fair amount of guessing. As an x86 library, its code maps 1-1 with x86 assembly, which is what makes disassembly easy. Java byte code almost maps 1-1 with Java source code, which usually makes it easy to decompile as well. Now try decompiling Scala code for example, and you will probably get some Java pidgin you’ll need to mentally map to proper Scala (unless tools have made progress in this area).
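To give an idea of what “maps 1-1” means, here is a toy disassembler that only knows a handful of single-byte x86 instructions. It is a sketch for illustration: real x86 instructions have variable lengths, prefixes and operands, which is why actual disassemblers are much larger, and the `disassemble` function is not taken from any tool mentioned in this post.

```python
from typing import List

# 32-bit registers in the order x86 encodes them (0 through 7).
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# A few instructions that fit in a single byte with no operand.
FIXED = {0x90: "nop", 0xC3: "ret", 0xC9: "leave", 0xCC: "int3"}


def disassemble(code: bytes) -> List[str]:
    """Translate each known byte to its mnemonic, one line per byte."""
    listing = []
    for byte in code:
        if byte in FIXED:
            listing.append(FIXED[byte])
        elif 0x50 <= byte <= 0x57:  # push r32, register in the low 3 bits
            listing.append("push " + REGISTERS[byte - 0x50])
        elif 0x58 <= byte <= 0x5F:  # pop r32, register in the low 3 bits
            listing.append("pop " + REGISTERS[byte - 0x58])
        else:  # anything else is out of scope for this sketch
            listing.append(".byte 0x%02x" % byte)
    return listing


print(disassemble(bytes([0x55, 0x90, 0x5D, 0xC3])))
# ['push ebp', 'nop', 'pop ebp', 'ret']
```

Each byte translates to exactly one line of assembly and nothing has to be guessed. Going from that listing back to the C that produced it is another story.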

Hammer time

I’ve been involved in the Varnish Cache project for a fair number of years now and it has become my solution to most problems. Granted, most problems my day job brings to the table tend to be directly related, but at some point any problem looks like it could be solved with a cache. Amusingly, one of my customers reached out to support with interesting questions involving a disassembly of code generated by Varnish while I was knee-deep in this topic. Having spent some time reading assembly listings sure was fortunate: I was able not only to follow but also to contribute to the support case. But I digress…

When all you have is a hammer, all problems look like nails, eh? One reason for the reverse engineering was also the prospect of doing something new and safely leaving the comfort zone. Failing would bear no consequences other than losing hobby time, and succeeding would be quite the payoff (although in retrospect it wasn’t that much).

It took me around two weeks to evaluate different tools, but evaluate may be an overstatement. The first one I tried was an online tool called ODA that scared and misled me into thinking I needed to know things like the endianness and word size of the library in addition to its calling convention. Maybe because I answered those questions incorrectly, the interface would show me many functions and symbols but nothing very useful. At this point I was already thinking that I was heading straight for failure.
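In hindsight, two of those questions can be answered by the file itself, since a PE header records the target machine and the image format. The sketch below is an assumption-laden illustration rather than something I used at the time: `describe_pe` is a made-up helper, it only knows two machine types, and the calling convention is not stored in the header at all.

```python
import struct

# COFF machine types: name, word size in bits, endianness (partial list).
MACHINES = {
    0x014C: ("Intel 80386", 32, "little"),
    0x8664: ("x86-64", 64, "little"),
}
# Magic number found at the start of the optional header.
FORMATS = {0x10B: "PE32", 0x20B: "PE32+"}
IMAGE_FILE_DLL = 0x2000  # flag in the COFF characteristics field


def describe_pe(data: bytes) -> dict:
    """Extract machine, word size, endianness and format from a PE header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # The 20-byte COFF header follows the 4-byte signature.
    machine, _, _, _, _, _, characteristics = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    # The optional header comes right after and starts with its magic.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    name, bits, endianness = MACHINES.get(machine, (hex(machine), None, None))
    return {
        "format": FORMATS.get(magic, hex(magic)),
        "machine": name,
        "word_size": bits,
        "endianness": endianness,
        "dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

For a 32-bit x86 DLL this would report a little-endian machine with 32-bit words, which is the kind of answer ODA was asking for.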

I then searched for disassemblers in my package manager and found a couple of them available on Fedora. I tried examiner, a tool that adds value on top of objdump, but it would only give me this:

Binary file is not a known executable type.
xadec.dll: PE32 executable (DLL) (GUI) Intel 80386, for MS Windows

So I gave up on both of them and went back online to look for more tools.

Next I searched GitHub, because most of the results on Fedora were either libraries or frameworks for disassembly, so maybe I would find actual programs making use of them that nobody maintains in Fedora. I found half a dozen projects, some looking more alive than others. The only one I managed to build and run is called Plasma. Hobby time is scarce, and after spending so much time looking for tools and trying to learn this one I turned to the dark side again and looked for more online tools.

Giving up on tools

The first thing I found was a blog post explaining that a recent-enough version of objdump could indeed disassemble PE32 executables. Since that post had been published years ago, I decided to give the packaged version a try:

$ objdump --disassemble-all xadec.dll | wc -l
9435

Good… I guess? That sure looks like a lot of work. At least I don’t need to install a custom build of GNU binutils.

Then I shifted to decompilers and found one available online: the Retargetable Decompiler, aka retdec. Not only was it able to produce C code, it also came with SVG graphs to follow the code flow, and it was smart enough to give their proper names to 3 out of 4 of the public functions. Even better, I could download the entire archive with all the goodies from my web browser. At the time of writing, retdec is no longer available online, so I was very lucky!

However, the largest graphs (incidentally most of them) were deliberately not built, probably to save resources (and I won’t blame a free service for that), so as soon as I noticed the GitHub link I went ahead and cloned the repository to build a copy locally.

During the lengthy build process I read some of the documentation and watched a video presenting the project. It is very interesting, and I recommend at least looking at the architecture. Once the build was done, I was able to run it locally and produce the C code. Bonus point: the master branch was able to map the names of all 4 public functions. I had also been able to figure that much out with objdump, but it’s nice to get a confirmation from retdec. On the other hand, I didn’t get any fancy graphs. I gave up on them and kept the C code around just in case. The code is understandably horrible because of the loss of information, but it would sometimes help to compare my observations with retdec’s deductions. I barely used the decompiled C, and in the end never made use of those nifty SVG graphs…

And that’s how, after spending a good 10 hours over the course of two weeks, I decided to stop searching and proceed, because I could otherwise spend many more hours trying to find a tool I’d ultimately need to spend even more time learning. Leaving the comfort zone sucks sometimes, but in the next post I will show that I had a couple of tricks up my sleeve that kept the whole endeavor from being too uncomfortable.