Thinking like a compiler
In the previous post I came to the
conclusion that xadec.dll
turns an XA file into a PCM stream and describes
that PCM stream with a WAVEFORMATEX
data structure. After looking at the
valuable insights present in xadec.h
it’s now time to look at the assembly
and for the first time encounter machine code that _really_ doesn’t look
like what a sane human being would come up with.
The xaDecodeOpen
function does two main things: set up the decoder state and
fill the WAVEFORMATEX
structure. And that’s where the riddle begins: given
an opaque data structure, how to make sense of the values it holds?
10001000: push ebx ; esp offset to eip is 4
10001001: push ebp ; esp offset to eip is 8
10001002: mov ebp,DWORD PTR [esp+0xc] ; ebp = pxah; /* arg0 */
10001006: push esi ; esp offset to eip is 12
10001007: cmp DWORD PTR [ebp+0x0],0x3144574b ; if (pxah->id != _XAID)
1000100e: jne 0x1000110a ; return (0); /* error */
10001014: push 0x34
10001016: push 0x40
10001018: call DWORD PTR ds:0x10006010 ; eax = GlobalAlloc(64, 52);
1000101e: push eax
1000101f: call DWORD PTR ds:0x1000600c ; eax = GlobalLock(eax); /* no-op */
10001025: mov ebx,eax ; ebx = eax; /* hxas = calloc(1, 52); */
10001027: test ebx,ebx ; if (ebx == NULL)
10001029: je 0x1000110a ; return (0); /* error */
1000102f: [...]
Being completely new to this, and lacking any clear methodology I made things
up as I was making progress. I would for example call arg0
the first
argument to a function, or replace it with its name (if it’s known or I
understand its purpose). In this case arg0
is called pxah
in xadec.h
and
its id
field known, otherwise I would have called written down something
like arg0->dw0
in the if
statement where dw0
means a 32-bit integer at
the offset 0
.
I have yet to make sense of what the opaque data structure fields mean, but at least I know its size: 52 octets. Yikes, that’s a lot of bytes to keep track of.
I note that a failing GlobalLock
after a succeeding GlobalAlloc
will leak
52 bytes of memory (plus the allocator’s overhead for alignment and tracking)
but I also note something interesting. I once read that the first rule of
compiler optimizations is that the compiler is smarter than you, and the
second rule is that the compiler is smarter than you. Apparently xadec.dll
was either compiled without (much) optimizations or comes from a time where
compilers were less smart. In this case I’m not sure how moving eax
’s value
to ebx
helps. Testing a null eax
would have the same effect of unwinding
the stack and returning an error:
10001104: [...]
1000110a: pop esi ; esp offset to eip is 8
1000110b: pop ebp ; esp offset to eip is 4
1000110c: xor eax,eax ; eax = 0;
1000110e: pop ebx ; esp offset to eip is 0
1000110f: ret ; return (eax); /* error */
You could even save some processing by moving the xor
operation before the
other pop
operations and jump straight after the xor
when GlobalLock
fails and eax
is already null. Unless pop esi
or pop ebp
has a side
effect I’m not aware of on eax
, which sounds highly unlikely to me. I’ve had
this idea for a while that most software problems can be reduced to graph
theory problems. The use (and reuse) of a limited set of registers is akin to
graph coloring and that’s a tough nut to crack, but I digress…
Thinking more like a compiler
Let’s move on and stay focused on what’s happening in the code. It’s already complicated enough, and too early too think about optimizing anything. It is more likely that I may have to mentally deoptimize code unless I’m lucky…
10001029: [...]
1000102f: xor eax,eax ; eax = 0;
10001031: mov al,BYTE PTR [ebp+0xe] ; eax = pxah->nBits;
10001034: sub eax,0x4 ; eax -= 4;
10001037: je 0x10001063 ; if (eax == 0) goto 0x10001063;
10001039: sub eax,0x2 ; eax -= 2;
1000103c: je 0x10001053 ; if (eax == 0) goto 0x10001053;
1000103e: sub eax,0x2 ; eax -= 2;
10001041: jne 0x10001071 ; if (eax != 0) goto 0x10001071;
10001043: mov DWORD PTR [ebx+0x28],0x21 ; hxas->dw40 = 31;
1000104a: mov DWORD PTR [ebx+0x2c],0x10001450 ; hxas->dw44 = 0x10001450;
10001051: jmp 0x10001071 ; goto 0x10001071;
10001053: mov DWORD PTR [ebx+0x28],0x19 ; hxas->dw40 = 25;
1000105a: mov DWORD PTR [ebx+0x2c],0x100013e0 ; hxas->dw44 = 0x100013e0;
10001061: jmp 0x10001071 ; goto 0x10001071;
10001063: mov DWORD PTR [ebx+0x28],0x11 ; hxas->dw40 = 17;
1000106a: mov DWORD PTR [ebx+0x2c],0x100013a0 ; hxas->dw44 = 0x100013a0;
10001071: [...]
So now the compiler is outsmarting us for a greater good, unless xadec.dll
was coded in this style which I highly doubt. It is commonly known that
computers are good at comparing things to zero and sometimes simply turning a
for
loop upside down so that it converges towards zero can provide a
noticeable performance boost. An explanation could be that some architectures
like x86 may set the zero flag (ZF) for instructions not primarily designed to
make comparisons. Arithmetic and bitwise operations set the flag if the result
is zero and unset it otherwise, and sub
can be followed by a conditional
jump.
So conceptually we get this:
int nBits = pxah->nBits;
if ((nBits -= 4) == 0) {
hxas->dw40 = 31;
hxas->dw44 = 0x10001450;
}
else if ((nBits -= 2) == 0) {
hxas->dw40 = 25;
hxas->dw44 = 0x100013e0;
}
else if ((nBits -= 2) == 0) {
hxas->dw40 = 17;
hxas->dw44 = 0x100013a0;
}
else {
/* hxas->dw40 and hxas->dw44 are both zero */
}
After a mental deoptimization I may write this:
struct xastream *
xaDecodeOpen(struct xaheader *pxah, struct waveformatex *wfx)
{
struct xastream *hxas = NULL;
if (pxah->id == _XAID)
hxas = calloc(1, 52);
if (hxas == NULL)
return (NULL);
if (pxah->nBits == 8) {
hxas->dw40 = 31;
hxas->dw44 = 0x10001450;
}
else if (pxah->nBits == 6) {
hxas->dw40 = 25;
hxas->dw44 = 0x100013e0;
}
else if (pxah->nBits == 4) {
hxas->dw40 = 17;
hxas->dw44 = 0x100013a0;
}
/* ... */
}
On the other hand retdec ended up with something confusing, looking roughly like this once stripped (really really) down:
// Address range: 0x10001000 - 0x1000110f
char * xaDecodeOpen(char * a1, int16_t * a2, int32_t a3) {
int32_t v1 = (int32_t)a1;
if (*(int32_t *)a1 != 0x3144574b) {
return NULL;
}
char * v2 = GlobalLock(GlobalAlloc(64, 52));
int32_t pMem2 = (int32_t)v2;
if (v2 != NULL) {
unsigned char v3 = *(char *)(v1 + 14);
int32_t v4 = v3;
if (v3 != 4) {
if (v4 == 6) {
*(int32_t *)(pMem2 + 40) = 25;
*(int32_t *)(pMem2 + 44) = 0x100013e0;
} else {
if (v4 == 8) {
*(int32_t *)(pMem2 + 40) = 33;
*(int32_t *)(pMem2 + 44) = 0x10001450;
}
}
/* ... */
return (char *)pMem2;
}
*(int32_t *)(pMem2 + 40) = 17;
*(int32_t *)(pMem2 + 44) = 0x100013a0;
/* ... */
return (char *)pMem2;
}
return NULL;
You may note the presence of a third argument, and this is likely a problem
caused by the lack of known calling convention and my inability to find how to
tell retdec which one it is. It also has the tendency to follow the assembly
too closely and for example when pxah->nBits
is moved as a byte and then
computed as a dword, we end up with two distinct variables v3
and v4
. And
that’s when we’re lucky: not accounting for the /* ... */
code duplication
there were 12 other local variables and in total the library decompilation
grew 119 global variables.
Breaking the spell
With a partial function decompilation we already end up with a handful of
magic numbers, so it’s best to start now. I’m leaving 0x3144574b
out because
it was ostensibly waiting to be picked up in xadec.h
and seemed like the
first thing you’d check in a function called xaDecodeOpen
. Well, don’t mind
me, that would be the second thing to check, right? Did you notice the lack of
null check of arg0
aka pxah
?
Next, the mysterious numbers 17, 25 and 33. In order to understand their meaning, I need to solve a couple equations and for the sake of brevity I will mention when I get to that code what confirmed the supposed meaning.
f(x) = ???
f(8) = 33
f(6) = 25
f(4) = 17
It doesn’t take much time to notice the pattern, at least it didn’t take me long to realize that those figures were “almost” recognizable:
blockSize(nBits) = 4 * nBits + 1
blockSize(8) = 33
blockSize(6) = 25
blockSize(4) = 17
Praise my mighty arithmetic powers!
Next I need to make sense of 0x10001450
, 0x100013e0
and 0x100013a0
. They
look awfully like function addresses, and tracking the use of this 44
offset
confirms it:
100011c4: push edx
100011c5: push ebx
100011c6: call DWORD PTR [edi+0x2c]
One can assume that nBits
refers to the number of bits per samples, so the
addresses probably refer to callbacks that uncompress the samples, and it
looks like the callbacks take two arguments.
Moving forward in the decompilation I find more magic numbers, but fortunately less magic and more straightforward from now on:
1000106a: [...]
10001071: xor eax,eax ; eax = 0;
10001073: mov al,BYTE PTR [ebp+0xf] ; eax = pxah->nChannels;
10001076: dec eax ; eax--;
10001077: je 0x10001085 ; if (eax == 0) goto 0x10001085;
10001079: dec eax ; eax--;
1000107a: jne 0x1000108c ; if (eax != 0) goto 0x1000108c;
1000107c: mov DWORD PTR [ebx+0x30],0x10001210 ; hxas->dw48 = 0x10001210;
10001083: jmp 0x1000108c ; goto 0x1000108c;
10001085: mov DWORD PTR [ebx+0x30],0x10001190 ; hxas->dw48 = 0x10001190;
1000108c: [...]
This is quite similar to the nBits
callbacks, here we have what looks like
one callback for mono and one callback for stereo conversion. Again, using the
same trick with dec
instead of sub
to compare pxah->nChannels
to 1
and
2
. xadec.dll
is a 32bit library, so it’s no surprise to see function
pointers fit in a dword.
Boardom is bliss
I’m a big fan of The IT Crowd, a silly British sitcom that manages to be hilarious at times. There’s an episode on board games, and they make it really sound like bored games, and sometimes boredom is the best thing that could happen to you, but I digress. Thankfully at this point it’s all very boring.
There is nothing complicated in this function and the decompilation is very straightforward. Except maybe for a couple magic numbers that were easy to understand, there was really only one surprising thing. The next boring step is to check that we have both callbacks, so in other words a valid combination of sample size and mono or stereo sound.
10001085: [...]
1000108c: mov eax,DWORD PTR [ebx+0x2c] ; eax = hxas->dw44;
1000108f: test eax,eax ; if (hxas->dw44 == NULL)
10001091: je 0x100010f0 ; goto 0x100010f0;
10001093: mov eax,DWORD PTR [ebx+0x30] ; eax = hxas->dw48;
10001096: test eax,eax ; if (hxas->dw48 == NULL)
10001098: je 0x100010f0 ; goto 0x100010f0;
1000109a: [...]
I’m omitting the error handing at 0x100010f0
but it is positioned just
before the bit I already decompiled and frees hxas
before returning zero
(aka an error in this case).
Next is a bit of a surprise, although not puzzling. Thanks to prior exposure to this construct I was able to follow the logic:
10001098: [...]
1000109a: push edi ; esp offset to eip is 16
1000109b: mov ecx,0x8 ; ecx = 8;
100010a0: mov esi,ebp ; esi = pxah
100010a2: mov edi,ebx ; edi = hxas
100010a4: rep movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
; memcpy(hxas, pxah, sizeof *pxah);
100010a6: [...]
Copy 8 dwords (32 bytes) from pxah
to hxas
, in other words embed a copy of
the XA header in the opaque data structure, removing at once 32 bytes from the
52 bytes puzzle. What. A. Relief.
Surf the WAVE
Surf is one of those board games for which I wonder how it all started. At some point someone thought it would be fun to ride waves, and turned out to be correct. But how does such an idea come to someone’s mind? In a sense, this reverse engineering journey is that kind of improbable idea, and after a poor start I now feel like everything is going by itself according to plan. World domination will ensue.
The last bit of good news came from the last bit of decompiled code. At this point where everything succeeded, all we need (barring one more missing null check) is to fill in the wave file format:
100010a4: [...]
100010a6: mov ecx,DWORD PTR [esp+0x18] ; ecx = wfx; /* arg1 */
100010aa: xor edx,edx ; edx = 0;
100010ac: pop edi ; esp offset to eip is 12
100010ad: pop esi ; esp offset to eip is 8
100010ae: mov WORD PTR [ecx],0x1 ; wfx->wFormatTag = WAVE_FORMAT_PCM;
100010b3: movzx ax,BYTE PTR [ebp+0xf] ; ax = pxah->nChannels;
100010b8: mov WORD PTR [ecx+0x2],ax ; wfx->nChannels = pxah->nChannels;
100010bc: mov dx,WORD PTR [ebp+0xc] ; dx = pxah->nSamplesPerSec;
100010c0: mov DWORD PTR [ecx+0x4],edx ; wfx->nSamplesPerSec = pxah->nSamplesPerSec;
100010c3: xor edx,edx ; edx = 0;
100010c5: movzx ax,BYTE PTR [ebp+0xf] ; ax = pxah->nChannels;
100010ca: shl eax,1 ; eax /= 2;
100010cc: mov WORD PTR [ecx+0xc],ax ; wfx->nBlockAlign = pxah->nChannels / 2;
100010d0: mov dx,WORD PTR [ebp+0xc] ; dx = pxah->nSamplesPerSec;
100010d4: and eax,0xffff ; eax = ax; /* no prior xor */
100010d9: pop ebp ; esp offset to eip is 4
100010da: imul eax,edx ; eax = pxah->nSamplesPerSec * pxah->nChannels / 2;
100010dd: mov DWORD PTR [ecx+0x8],eax ; wfx->nAvgBytesPerSec = eax;
100010e0: mov eax,ebx ; eax = hxas;
100010e2: mov WORD PTR [ecx+0xe],0x10 ; wfx->wBitsPerSample = 16;
100010e8: mov WORD PTR [ecx+0x10],0x0 ; wfx->cbSize = 0;
100010ee: pop ebx ; esp offset to eip is 0
100010ef: ret ; return (hxas);
100010f0: [...]
The decoder always produces 16bit PCM at the same rate as the XA file, and
this code confirms that the WAVEFORMATEX
structure pointer is an “output”
argument, not a regular argument asking to produce “that kind of WAVE data”.
It means that the rest of the code should remain simple as well and that’s
quite motivating to follow this through the end.
Collecting my precious
With all that, my first pieces of the puzzle:
typedef TODO xa_inflate_f(TODO, TODO);
typedef TODO xa_convert_f(TODO, TODO);
struct xastream {
struct xaheader hdr;
uint32_t dw32;
uint32_t dw36;
uint32_t blk_sz; /* 40 */
xa_inflate_f *inflate_cb; /* 44 */
xa_convert_f *convert_cb; /* 48 */
}; /* 52 */
Unfortunately that uncovered more pieces, but most of the opaque structure is known and that is a very significant progress. In the next post I will take a step back and look at more trivial matters.