Learning new languages

Last October I wrote about my progress and the first release of bjxa. I published a reference implementation of a BandJAM XA decoder in C able to reproduce bit-for-bit the same audio output as xa.exe. After that I reviewed DTXMania’s code, which to this date I still can’t build completely on Linux with Mono.

I remember seeing years ago a piece of advice for developers, encouraging them to learn one new language each year, to challenge their comfort zone. While I never followed this advice, I think that learning a new language can be counter-productive if there isn’t an actionable project to back the learning. Porting bjxa to C# was a perfect excuse, since the scope is small enough and the language is somewhat similar to Java, which I already know.

My conclusion after conducting this work is that C# is indeed a much more pleasant language than Java. I’m still not satisfied with the code I came up with, but it was enough to even challenge my C implementation and drive some improvements.

I then sent an email to the DTXMania maintainer in November to raise his awareness of BJXA and was greeted with deafening silence. This was of course my own fault for not creating an account and creating a ticket on the project tracker, but that’s a consequence of a distinct lack of confidence.

That made me wonder about the languages I learned over the years and whether the total number would average around one per year. Here is a list I could produce off the top of my head, in roughly alphabetical order:

  • AWK
  • Ada
  • C#
  • C99
  • Go
  • Groovy
  • Java
  • JavaScript
  • POSIX shell
  • Perl
  • PHP
  • Python
  • Ruby
  • Rust
  • SED
  • SQL
  • VCL
  • m4
  • vimscript
  • x86 assembler (Intel syntax)

That’s a fair number of languages, but I’m a beginner in most of them. Some I only learned to write exactly one program or library for a transient need and never touched again. Others I use out of necessity every once in a while without ever becoming proficient, but I digress…

Reverse script kidding

Tools keep making astonishing progress in all areas, and even a beginner can get results pretty fast with the help of IDEs or similar tools. And yet, as I settled into programming, my personal trend was to move away from such tools and instead learn the underlying tools they tend to abstract away.

Using IDEs is fine, but learning the fundamental tools is critical. Keeping IDEs despite being proficient with the underlying tools is also fine, although I personally prefer to avoid IDEs.

But as a beginner it is sometimes much easier to use graphical tools that try to be more user friendly by hiding the dirty details. Despite my success in uncovering the BJXA codec and documenting it thoroughly there are still unknown parts. The reverse engineering of xadec.dll was easy even for an x86 beginner because it came with a header file that gave a significant head start. The fact that it was a shared library also gave me by definition access in the assembly to all the public functions.

Now to find the answers to the remaining mysteries of the codec I would need to decompile xa.exe but there is a catch: it’s a statically linked program and the only starting point I have is the entry point of the program, which is responsible for calling the main() function of a C program. In addition it’s a Win32 executable, which I’m not familiar with.

Just when I decided to live with those mysteries, the NSA knocked on my proverbial door and convinced me to give a try to their IDE.

Here be dragons

After decompiling xadec.dll I was convinced that it was originally written in C and I suspected xa.exe would be too. Finding the main() function would in theory allow me to unravel the encoding code and see how it treats the undefined behavior of the codec.

Trying to use retdec didn’t work out because it failed to identify the calling convention and I gave up (without reaching out for help) when I couldn’t figure out how to let it know. The result was even worse with xa.exe.

Trying to decompile the assembly by hand resulted in a total failure to even locate the main() function. Even knowing xa.exe’s command line arguments, I failed to use a reference to the usage text to find my way back to the argv parsing:

$ wine xa.exe
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
Usage  : xa.exe <option> [filename<.wav/.xa>]
Option : -e[n]    Encode[WAV->XA](Default) / n:BitCount(4/6/8)
         -d       Decode[XA->WAV]
         -p       Play File
         -o[dir]  Output Directory
         -u       Update

Then the NSA announced it would open source one of their reverse engineering tools and eventually made it available, although the source code isn’t yet. Ghidra is a graphical tool that is to reverse engineering what an IDE is to development.

In less than 2 minutes (I timed it!) I was able to find the main() function and confirm that it used a C-like signature taking the two famous arguments int argc and char **argv.

I think that Ghidra’s main advantage over retdec was its ability to find the right calling convention from the get go. I managed to find the equivalent of xadec.dll’s xaDecodeOpen by luck and found the decompiled code to be quite elegant.

Overall, I think the NSA deserves to be congrat… to be cong… I think the NSA did an OK job.

Unfortunate obfuscation

One reason why I had a hard time finding the main() function and its command line parsing is again an incarnation of the robustness principle. With no hint from the usage description I couldn’t guess that options were case-insensitive and that they also supported the Windows style for options:

$ wine xa.exe -d square-mono-8.xa
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
square-mono-8.xa -> square-mono-8.wav : 44100Hz / monaural / 8bits.
1 Files Decoded.

$ wine xa.exe /d square-mono-8.xa
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
square-mono-8.xa -> square-mono-8.wav : 44100Hz / monaural / 8bits.
1 Files Decoded.

$ wine xa.exe /D square-mono-8.xa
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
square-mono-8.xa -> square-mono-8.wav : 44100Hz / monaural / 8bits.
1 Files Decoded.
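To illustrate the kind of lenient matching the three transcripts above suggest, here is a minimal sketch in C. It is not the decompiled code from xa.exe: the option_letter() helper is a hypothetical name made up for this example, and it only assumes that a leading - or / is accepted and that the letter is compared without regard to case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from xa.exe: returns the lowercase
 * option letter when arg starts with '-' or '/' followed by a letter,
 * or 0 when arg does not look like an option (for example a file name).
 */
static int
option_letter(const char *arg)
{
	assert(arg != NULL);

	/* accept both the Unix style (-d) and the Windows style (/d) */
	if (arg[0] != '-' && arg[0] != '/')
		return (0);
	if (!isalpha((unsigned char)arg[1]))
		return (0);
	/* fold the case so that -d, -D, /d and /D are all equivalent */
	return (tolower((unsigned char)arg[1]));
}
```

With this sketch, "-d", "/d" and "/D" all map to 'd', while "square-mono-8.xa" is left alone. A side effect of accepting / is that an absolute Unix path such as /tmp/file.wav would be mistaken for an option, which this sketch makes no attempt to handle.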

I also couldn’t have guessed that the command line parsing would intertwine options parsing and a lot of processing. Thanks to Ghidra I have a semi-clean switch statement and was able to figure out that while some processing is done immediately when an option is identified, some checks are deferred until needed. For example I can ask to encode something and ultimately say that I want to do decoding instead and it won’t complain:

$ wine xa.exe -e5 square-mono-8.wav
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
bitcount error.

$ wine xa.exe -e5 -d square-mono-8.xa
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
bitcount error.

$ wine xa.exe -e8 square-mono-8.wav
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
square-mono-8.wav -> square-mono-8.xa : 44100Hz / 16bit / monaural
1 Files Encoded.

$ wine xa.exe -e8 -d square-mono-8.xa
WAV to XA v1.22 Copyright 2000-2001 bandjam.net
square-mono-8.xa -> square-mono-8.wav : 44100Hz / monaural / 8bits.
1 Files Decoded.
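A parsing loop that would behave like the four transcripts above could look like the following sketch. Everything in it is an assumption made for illustration (the xa_mode and xa_opts types, the parse_opts() function, and the use of 0 for an unspecified bit count); it only models what the transcripts show, namely that the bit count is rejected as soon as -e is seen and that a later -d quietly replaces the encoding request. It is not the switch statement recovered with Ghidra.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical types, made up for this sketch. */
enum xa_mode { XA_ENCODE, XA_DECODE };

struct xa_opts {
	enum xa_mode mode;
	int bits;	/* 0 means "not specified on the command line" */
};

/* Returns 0 on success and -1 on a bit count error. */
static int
parse_opts(int argc, char **argv, struct xa_opts *opts)
{
	int i;

	assert(argv != NULL && opts != NULL);
	opts->mode = XA_ENCODE;	/* the usage text says -e is the default */
	opts->bits = 0;

	for (i = 1; i < argc; i++) {
		const char *arg = argv[i];

		if (arg[0] != '-' && arg[0] != '/')
			continue;	/* not an option, e.g. a file name */

		switch (tolower((unsigned char)arg[1])) {
		case 'e':
			opts->mode = XA_ENCODE;
			if (arg[2] == '\0')
				break;	/* plain -e: bit count unspecified */
			/* checked right away, even if -d follows later */
			if (arg[3] != '\0' ||
			    (arg[2] != '4' && arg[2] != '6' && arg[2] != '8'))
				return (-1);
			opts->bits = arg[2] - '0';
			break;
		case 'd':
			/* the last mode wins, no conflict is reported */
			opts->mode = XA_DECODE;
			break;
		default:
			break;	/* other options are ignored here */
		}
	}
	return (0);
}
```

In this model, -e5 fails whether or not -d follows, while -e8 -d succeeds and ends up in decoding mode with a bit count that is never used.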

I also found an interesting undocumented -l option and its effects are puzzling to say the least… Looking at the shape of the code, it looks as though xa.exe was written in C++ and if I’m right that’s probably Visual C++.

I hope to solve the remaining mysteries in the BandJAM XA codec with the help of Ghidra and so far it looks much more within reach. Unfortunately that’s a project I will have to shelve for later, but I digress.

19 years of xadec.dll

Eventually, I received a response from the DTXMania maintainer regarding my struggles when it comes to running it on Linux using Wine, and especially the problem of xadec.dll. The response was overwhelmingly positive and I learned that they were struggling too because of it and the fact that so many songs playable for the game use that ancient codec.

They wanted to maintain the capability to read BandJAM XA files, but that prevented 64-bit builds of the game. Out of nowhere came bjxa and it got them closer to solving the problem for good (they still have ancient OGG and MP3 decoders in the same situation as xadec.dll: no source code).

I also learned in the process that the original DTXMania maintainer was working on a DTXMania2 but unfortunately that one is even harder to run with Wine. Without much help from my side they managed to integrate libbjxa in their projects and the fact that bjxa.exe was available to see how to use the library probably helped.

When I got the first response, the maintainer was apologetic because it had taken a couple months to reply to my inquiry. Working on free and open source software I have experienced the demanding tone of some users. I have found myself in demanding positions too, and have probably given that impression more than once even when it wasn’t the case. I find it a bit sad though when maintainers apologize when they shouldn’t and keep doing so even after being told that all is fine, but I digress…

No matter how I thanked them they would thank me even more. I’d like to thank them here once more and I hope to hit them hard with more demanding patches as I hope to iron out the remaining problems inside Wine that I identified.

19 years of buffer overflows

I found two interesting classes of bugs in xadec.dll, one of which could lead to a security vulnerability. The first one is that most tainted pointer dereferences are done without a proper null check. The second one is a buffer overflow that can be triggered easily with a specially crafted XA file. Can it result in arbitrary code execution? I don’t know and even if I managed to own a DTXMania process running in Wine that wouldn’t prove that the same exploit could be used on Windows and vice versa. And considering that I won’t install Windows to run the game and introduced myself to reverse engineering just for the sake of running the game on Fedora, I won’t shave that yak.

If you would like to see what real reverse engineering looks like, I strongly recommend this conference from people that could have probably done in minutes what initially took me hours. My plan is to level up my game (all kinds of puns intended) by cheating. I will rely on Ghidra, my new power tool, but before that I have other things to tend to and hopefully I will write on other topics in the future.