Cracking the codec open
I may have complained in a past blog post that retdec, as a decompiler, didn’t
produce readable or helpful code. I may have even said that the code was
horrible, but I also showed that it was rather good at figuring out library
calls.
So in order to understand that the four Windows function calls could be
reduced to a plain free(3) call, I had to look at both the decompiled code for
xaDecodeOpen and the documentation of the functions and their flags on MSDN.
These days I’m way more comfortable with the man utility, but in a sense I
also think that online docs are very important. In this regard, I have always
found Microsoft’s MSDN a very good place to browse documentation. It has always
been a terrible place to start browsing, though. If it weren’t for search
engines, I would never find the starting point.
In order to decompile xaDecodeClose, I needed to look at xaDecodeOpen to
enumerate all the memory management functions:
GlobalAlloc
GlobalFree
GlobalHandle
GlobalLock
GlobalUnlock
In portable C code, these can be safely reduced to calloc(3) and free(3). No
big surprise here: this is the memory management of the opaque structure, the
“handle”. In other words, stuff hidden under the compiler rug that I will need
to uncover, gathering clues like a detective and trying to fit all the pieces
together. And the first clue was waiting there in the open to be picked up:
xaDecodeOpen’s second argument.
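The calloc(3) and free(3) reduction of the Global* memory functions listed above can be sketched in portable C. This assumes the handle was allocated zero-initialized, which is what choosing calloc(3) over malloc(3) implies. Allocation then becomes a single call, and so does release, since there is no separate memory handle to look up, unlock and free. The structure and function names below are hypothetical placeholders, not what xadec.dll uses, and the real fields of the handle are still to be uncovered.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the opaque decoder "handle": the field names
 * are placeholders, the real layout is still unknown at this point. */
struct xa_decoder {
    uint32_t data_len;
    uint32_t consumed;
};

/* Stands in for the GlobalAlloc and GlobalLock pair: calloc(3) returns
 * zero-filled memory that is directly usable, no locking required. */
struct xa_decoder *
xa_decoder_new(void)
{
    return calloc(1, sizeof(struct xa_decoder));
}

/* Stands in for the GlobalHandle, GlobalUnlock and GlobalFree calls:
 * there is no handle to recover or unlock, so free(3) is enough.
 * Like free(3), it accepts NULL. */
void
xa_decoder_free(struct xa_decoder *dec)
{
    free(dec);
}
```

A caller would pair xa_decoder_new with xa_decoder_free the same way xaDecodeOpen and xaDecodeClose bracket the lifetime of a handle.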
The big bad WAV file
One misconception about WAV files is that they are big because they are uncompressed; another is that they are WAV files. WAV files are really one kind of RIFF file, a generic container format that was specialized for audio (WAV) and video (AVI). A RIFF file is made of chunks, and a WAV file is a RIFF file of type WAVE with an “fmt ” chunk describing how its audio data is encoded, followed by a data chunk containing the aforementioned audio data. You could for example use an MP3 codec in a WAV file, and you would get a file approximately the same size as an MP3 file. It’s even possible to add things like track information to a RIFF file, similar to MP3’s ID3 tags, and probably subtitles too, but at this point, I digress…
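To make the chunk structure concrete, here is a minimal sketch, unrelated to the XA code, that walks the top-level chunks of a WAV file loaded in memory. It only relies on the RIFF layout: a 12-byte preamble (“RIFF”, a size, “WAVE”) followed by chunks made of a four-character id, a 32-bit little endian size and a payload padded to an even length. The function name list_wav_chunks is hypothetical, and nested LIST chunks are not descended into.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* RIFF sizes are 32-bit little endian integers. */
static uint32_t
le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
        (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Prints the id and size of each top-level chunk of an in-memory WAV
 * file. Returns the number of chunks seen, or -1 if the buffer does not
 * start with a RIFF/WAVE preamble. */
int
list_wav_chunks(const unsigned char *buf, size_t len)
{
    size_t off = 12; /* skip "RIFF", the RIFF size and "WAVE" */
    int count = 0;

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    while (off + 8 <= len) {
        uint32_t size = le32(buf + off + 4);

        printf("%.4s: %lu bytes\n", (const char *)(buf + off),
            (unsigned long)size);
        count++;
        if (size > len - off - 8)
            break; /* truncated chunk, stop here */
        /* Payloads are padded to an even number of bytes. */
        off += 8 + (size_t)size + (size & 1);
    }
    return count;
}
```

On a typical PCM WAV file this prints an “fmt ” chunk of 16 bytes followed by the data chunk, possibly with a LIST chunk in between.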
The reason most WAV files are big is their number one (literally: its format tag is 1) and probably most widely used codec: PCM, or Pulse-Code Modulation. To keep it short (or digression-free), the signal is sampled at a fixed rate and each sample is stored as a fixed-size value, without any compression. In stereo mode, samples are interleaved, with one left sample followed by one right sample for each point in time. Interleaving is streaming-friendly because you can read the audio sequentially, without jumping back and forth in the audio stream, which avoids pseudo-random accesses. Arguably, picking the smallest granularity (individual samples) may not be that efficient, but I can’t help it, I digress.
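A back-of-the-envelope sketch of why PCM is big, and of what interleaving looks like in memory. The function names are hypothetical, and the size computation assumes byte-aligned sample sizes (8, 16, 24 or 32 bits).

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of uncompressed PCM audio: every sample of every channel
 * takes bits_per_sample / 8 bytes, whatever the signal sounds like. */
uint64_t
pcm_data_size(uint32_t rate, uint16_t channels, uint16_t bits_per_sample,
    uint32_t seconds)
{
    return (uint64_t)rate * channels * (bits_per_sample / 8) * seconds;
}

/* Splits interleaved 16-bit stereo (L R L R ...) into two buffers.
 * frames is the number of points in time, i.e. left/right pairs. */
void
deinterleave_stereo(const int16_t *in, size_t frames, int16_t *left,
    int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i] = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```

For CD-quality audio (44100 Hz, stereo, 16 bits), pcm_data_size(44100, 2, 16, 60) gives 10584000 bytes, a bit over 10 MB per minute of sound.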
From XA to PCM
Looking at sample.c, xadec.h and some MSDN documentation, we roughly see this:
typedef struct _XAHEADER {
    ULONG  id;
    ULONG  nDataLen;
    ULONG  nSamples;
    USHORT nSamplesPerSec;
    UCHAR  nBits;
    UCHAR  nChannels;
    ULONG  nLoopPtr;
    SHORT  befL[2];
    SHORT  befR[2];
    UCHAR  pad[4];
} XAHEADER;

typedef struct {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
    WORD  wBitsPerSample;
    WORD  cbSize;
} WAVEFORMATEX;

int
main(void)
{
    FILE *fp;
    XAHEADER xah;
    WAVEFORMATEX wfx;
    XASTREAMHEADER xash;

    fp = fopen("sample.xa", "rb");
    fread(&xah, 1, sizeof(XAHEADER), fp);
    /* hxas is the decoder handle; its declaration is elided here */
    hxas = xaDecodeOpen(&xah, &wfx);
    /* ... */
}
At this point we could assume that xadec.dll will turn an XA file into a WAV
file, but no. While I claimed that this XA file format doesn’t look like
something the Bandjam author came up with, the use of WAVEFORMATEX, on the
other hand, looks like a Windows-oriented choice in API design. The wfx
structure is passed as an “output” parameter, only to neatly pack information
about the resulting audio in a single data structure. We technically already
have all we need in the XAHEADER structure, provided that we know one more
secret about xaDecodeOpen (hint: it’s hidden in the assembly).
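As an illustration of what such an output parameter carries, here is how the fields of a PCM WAVEFORMATEX relate to each other according to MSDN: a block holds one sample for every channel, and the byte rate follows from the block size. This is a portable sketch, not the decoder’s code. struct wave_format and pcm_format are hypothetical names, and which channel count, rate and bit depth xaDecodeOpen actually reports is precisely what remains to be found.

```c
#include <stdint.h>

/* Portable mirror of the WAVEFORMATEX fields, using fixed-size types. */
struct wave_format {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
};

#define FORMAT_PCM 1 /* same value as WAVE_FORMAT_PCM */

/* Fills in a PCM format description from the three free parameters. */
struct wave_format
pcm_format(uint16_t channels, uint32_t rate, uint16_t bits)
{
    struct wave_format wf;

    wf.wFormatTag = FORMAT_PCM;
    wf.nChannels = channels;
    wf.nSamplesPerSec = rate;
    wf.wBitsPerSample = bits;
    /* One block = one sample for each channel. */
    wf.nBlockAlign = (uint16_t)(channels * bits / 8);
    /* Bytes per second = blocks per second * block size. */
    wf.nAvgBytesPerSec = rate * wf.nBlockAlign;
    wf.cbSize = 0; /* no extra format bytes for plain PCM */
    return wf;
}
```

Only the channel count, the sample rate and the bit depth are really free; everything else is derived, which is why the XAHEADER fields could in principle be enough.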
The XA header
The fread(3) call gives us one vital piece of information: the XA header is
located at the beginning of the file. And since the structure is read as-is by
a DLL running on little endian x86 hardware, its integers are serialized in
little endian order. It means that at this point I’m able to dump the header
of an XA file and keep track of how the header fields are used by their
offset, which should help me dig further.
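A minimal sketch of such a dump, assuming the XAHEADER layout from sample.c and xadec.h is accurate: each field is decoded from its offset in the raw 32 bytes rather than by reading the file straight into a struct. The field names come from XAHEADER; struct xa_header, xa_header_parse and xa_header_dump are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Portable mirror of XAHEADER; the comments give each field's offset. */
struct xa_header {
    uint32_t id;             /* 0x00 */
    uint32_t nDataLen;       /* 0x04 */
    uint32_t nSamples;       /* 0x08 */
    uint16_t nSamplesPerSec; /* 0x0c */
    uint8_t  nBits;          /* 0x0e */
    uint8_t  nChannels;      /* 0x0f */
    uint32_t nLoopPtr;       /* 0x10 */
    int16_t  befL[2];        /* 0x14 */
    int16_t  befR[2];        /* 0x18 */
                             /* 0x1c: 4 bytes of padding */
};

static uint16_t
le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t
le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
        (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decodes the little endian header field by field, so the result does
 * not depend on the host byte order or on struct padding. */
void
xa_header_parse(const unsigned char raw[32], struct xa_header *h)
{
    h->id = le32(raw);
    h->nDataLen = le32(raw + 0x04);
    h->nSamples = le32(raw + 0x08);
    h->nSamplesPerSec = le16(raw + 0x0c);
    h->nBits = raw[0x0e];
    h->nChannels = raw[0x0f];
    h->nLoopPtr = le32(raw + 0x10);
    h->befL[0] = (int16_t)le16(raw + 0x14);
    h->befL[1] = (int16_t)le16(raw + 0x16);
    h->befR[0] = (int16_t)le16(raw + 0x18);
    h->befR[1] = (int16_t)le16(raw + 0x1a);
}

void
xa_header_dump(const struct xa_header *h)
{
    printf("id             0x%08lx\n", (unsigned long)h->id);
    printf("nDataLen       %lu\n", (unsigned long)h->nDataLen);
    printf("nSamples       %lu\n", (unsigned long)h->nSamples);
    printf("nSamplesPerSec %u\n", (unsigned)h->nSamplesPerSec);
    printf("nBits          %u\n", (unsigned)h->nBits);
    printf("nChannels      %u\n", (unsigned)h->nChannels);
    printf("nLoopPtr       %lu\n", (unsigned long)h->nLoopPtr);
    printf("befL           %d %d\n", h->befL[0], h->befL[1]);
    printf("befR           %d %d\n", h->befR[0], h->befR[1]);
}
```

Reading the first 32 bytes of a file into an unsigned char array with fread(3), then passing them to xa_header_parse and xa_header_dump, is enough to start comparing headers across XA files.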
I’m always baffled that ISO C99 doesn’t offer anything to deal with the byte order of a given architecture. The byte orders, big and little endian, define in which order the octets of a “word” are laid out in memory. Confusingly enough, endian refers to one end or the other of the “word” once broken down into octets, so in this case values from the XA header start from the little end: the least significant octet comes first. It’s as confusing as finding the shutdown function in the start menu on Windows.
Apparently the name was borrowed from Jonathan Swift’s Gulliver’s Travels, in which rebels would break their eggs from the big end in political opposition to the Lilliputian emperor, whose tyrannical rule imposed breaking them from the little end. So in computing we find architectures relying on either ordering (and in some cases even both, I’m told). It can get even messier with “word” order.
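Here is a small illustration of both orderings, storing the same 32-bit value octet by octet. The put_le32 and put_be32 names are hypothetical. Because shifts and masks operate on values rather than on their memory representation, the result is the same on any host.

```c
#include <stdint.h>

/* Stores a 32-bit value starting from the little end: least significant
 * octet at the lowest address, as in the XA header. */
void
put_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Stores the same value starting from the big end: most significant
 * octet at the lowest address. */
void
put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (v >> 24) & 0xff;
    p[1] = (v >> 16) & 0xff;
    p[2] = (v >> 8) & 0xff;
    p[3] = v & 0xff;
}
```

Storing 0x11223344 gives the octets 44 33 22 11 with put_le32 and 11 22 33 44 with put_be32.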
Network protocols tend to favor big endian representation over the wire, so
it’s often referred to as network byte order too. And for no apparent reason, C
was left with nothing to deal with architectural differences besides the usual
undefined behavior. Instead we have cryptic function names from another time,
inherited from the BSD sockets API, like ntohs and ntohl (and their htons and
htonl counterparts) to convert numbers back and forth between network and host
(CPU architecture) byte order. Nothing in the C standard itself goes back and
forth between both byte orders. Either those functions are no-ops on a big
endian host, or they swap the ordering. But I digress.
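For the curious, this is roughly what ntohl boils down to, written with nothing but ISO C. It is a sketch assuming the host is either big or little endian (no exotic mixed orders), and host_is_little_endian, swap32 and net_to_host32 are hypothetical names, not a standard API.

```c
#include <stdint.h>
#include <string.h>

/* Detects the host byte order at run time by looking at the first
 * octet of the value 1 as it is laid out in memory. */
int
host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}

/* Reverses the order of the four octets of a 32-bit value. */
uint32_t
swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00) |
        ((v << 8) & 0x00ff0000) | (v << 24);
}

/* Equivalent of ntohl: a no-op on a big endian host, a byte swap on a
 * little endian one. */
uint32_t
net_to_host32(uint32_t v)
{
    return host_is_little_endian() ? swap32(v) : v;
}
```

When the data is instead read octet by octet and assembled with shifts, no detection is needed at all, which is arguably the more portable approach.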
The header looks mostly straightforward: a magic number, audio information,
some padding to align to a power-of-2 byte size (32) and the number of bytes
(nDataLen) after the header. Then there are less obvious fields like nSamples,
which can probably be computed from (and sanity-checked against) the other
values, and the befL and befR arrays. These arrays will turn out to be very
important clues that I only understood in retrospect. I’m not sure whether I
could have figured out their purpose beforehand, but they definitely
confirmed my interpretation of the codec.
However, this is a topic for later. In the next post I will describe the
decompilation of xaDecodeOpen and how encouraging it was.