Thinking like a compiler

In the previous post I came to the conclusion that xadec.dll turns an XA file into a PCM stream and describes that PCM stream with a WAVEFORMATEX data structure. After looking at the valuable insights present in xadec.h it’s now time to look at the assembly and for the first time encounter machine code that _really_ doesn’t look like what a sane human being would come up with.

The xaDecodeOpen function does two main things: set up the decoder state and fill the WAVEFORMATEX structure. And that’s where the riddle begins: given an opaque data structure, how to make sense of the values it holds?

10001000: push   ebx                             ; esp offset to eip is 4
10001001: push   ebp                             ; esp offset to eip is 8
10001002: mov    ebp,DWORD PTR [esp+0xc]         ; ebp = pxah; /* arg0 */
10001006: push   esi                             ; esp offset to eip is 12
10001007: cmp    DWORD PTR [ebp+0x0],0x3144574b  ; if (pxah->id != _XAID)
1000100e: jne    0x1000110a                      ;         return (0); /* error */
10001014: push   0x34
10001016: push   0x40
10001018: call   DWORD PTR ds:0x10006010         ; eax = GlobalAlloc(64, 52);
1000101e: push   eax
1000101f: call   DWORD PTR ds:0x1000600c         ; eax = GlobalLock(eax); /* no-op */
10001025: mov    ebx,eax                         ; ebx = eax; /* hxas = calloc(1, 52); */
10001027: test   ebx,ebx                         ; if (ebx == NULL)
10001029: je     0x1000110a                      ;         return (0); /* error */
1000102f: [...]

Being completely new to this, and lacking any clear methodology I made things up as I was making progress. I would for example call arg0 the first argument to a function, or replace it with its name (if it’s known or I understand its purpose). In this case arg0 is called pxah in xadec.h and its id field known, otherwise I would have called written down something like arg0->dw0 in the if statement where dw0 means a 32-bit integer at the offset 0.

I have yet to make sense of what the opaque data structure fields mean, but at least I know its size: 52 octets. Yikes, that’s a lot of bytes to keep track of.

I note that a failing GlobalLock after a succeeding GlobalAlloc will leak 52 bytes of memory (plus the allocator’s overhead for alignment and tracking) but I also note something interesting. I once read that the first rule of compiler optimizations is that the compiler is smarter than you, and the second rule is that the compiler is smarter than you. Apparently xadec.dll was either compiled without (much) optimizations or comes from a time where compilers were less smart. In this case I’m not sure how moving eax’s value to ebx helps. Testing a null eax would have the same effect of unwinding the stack and returning an error:

10001104: [...]
1000110a: pop    esi                              ; esp offset to eip is 8
1000110b: pop    ebp                              ; esp offset to eip is 4
1000110c: xor    eax,eax                          ; eax = 0;
1000110e: pop    ebx                              ; esp offset to eip is 0
1000110f: ret                                     ; return (eax); /* error */

You could even save some processing by moving the xor operation before the other pop operations and jump straight after the xor when GlobalLock fails and eax is already null. Unless pop esi or pop ebp has a side effect I’m not aware of on eax, which sounds highly unlikely to me. I’ve had this idea for a while that most software problems can be reduced to graph theory problems. The use (and reuse) of a limited set of registers is akin to graph coloring and that’s a tough nut to crack, but I digress…

Thinking more like a compiler

Let’s move on and stay focused on what’s happening in the code. It’s already complicated enough, and too early too think about optimizing anything. It is more likely that I may have to mentally deoptimize code unless I’m lucky…

10001029: [...]
1000102f: xor    eax,eax                         ; eax = 0;
10001031: mov    al,BYTE PTR [ebp+0xe]           ; eax = pxah->nBits;
10001034: sub    eax,0x4                         ; eax -= 4;
10001037: je     0x10001063                      ; if (eax == 0) goto 0x10001063;
10001039: sub    eax,0x2                         ; eax -= 2;
1000103c: je     0x10001053                      ; if (eax == 0) goto 0x10001053;
1000103e: sub    eax,0x2                         ; eax -= 2;
10001041: jne    0x10001071                      ; if (eax != 0) goto 0x10001071;
10001043: mov    DWORD PTR [ebx+0x28],0x21       ; hxas->dw40 = 31;
1000104a: mov    DWORD PTR [ebx+0x2c],0x10001450 ; hxas->dw44 = 0x10001450;
10001051: jmp    0x10001071                      ; goto 0x10001071;
10001053: mov    DWORD PTR [ebx+0x28],0x19       ; hxas->dw40 = 25;
1000105a: mov    DWORD PTR [ebx+0x2c],0x100013e0 ; hxas->dw44 = 0x100013e0;
10001061: jmp    0x10001071                      ; goto 0x10001071;
10001063: mov    DWORD PTR [ebx+0x28],0x11       ; hxas->dw40 = 17;
1000106a: mov    DWORD PTR [ebx+0x2c],0x100013a0 ; hxas->dw44 = 0x100013a0;
10001071: [...]

So now the compiler is outsmarting us for a greater good, unless xadec.dll was coded in this style which I highly doubt. It is commonly known that computers are good at comparing things to zero and sometimes simply turning a for loop upside down so that it converges towards zero can provide a noticeable performance boost. An explanation could be that some architectures like x86 may set the zero flag (ZF) for instructions not primarily designed to make comparisons. Arithmetic and bitwise operations set the flag if the result is zero and unset it otherwise, and sub can be followed by a conditional jump.

So conceptually we get this:

int nBits = pxah->nBits;

if ((nBits -= 4) == 0) {
	hxas->dw40 = 31;
	hxas->dw44 = 0x10001450;
}
else if ((nBits -= 2) == 0) {
	hxas->dw40 = 25;
	hxas->dw44 = 0x100013e0;
}
else if ((nBits -= 2) == 0) {
	hxas->dw40 = 17;
	hxas->dw44 = 0x100013a0;
}
else {
	/* hxas->dw40 and hxas->dw44 are both zero */
}

After a mental deoptimization I may write this:

struct xastream *
xaDecodeOpen(struct xaheader *pxah, struct waveformatex *wfx)
{
	struct xastream *hxas = NULL;

	if (pxah->id == _XAID)
		hxas = calloc(1, 52);

	if (hxas == NULL)
		return (NULL);

	if (pxah->nBits == 8) {
		hxas->dw40 = 31;
		hxas->dw44 = 0x10001450;
	}
	else if (pxah->nBits == 6) {
		hxas->dw40 = 25;
		hxas->dw44 = 0x100013e0;
	}
	else if (pxah->nBits == 4) {
		hxas->dw40 = 17;
		hxas->dw44 = 0x100013a0;
	}

	/* ... */
}

On the other hand retdec ended up with something confusing, looking roughly like this once stripped (really really) down:

// Address range: 0x10001000 - 0x1000110f
char * xaDecodeOpen(char * a1, int16_t * a2, int32_t a3) {
    int32_t v1 = (int32_t)a1;
    if (*(int32_t *)a1 != 0x3144574b) {
        return NULL;
    }
    char * v2 = GlobalLock(GlobalAlloc(64, 52));
    int32_t pMem2 = (int32_t)v2;
    if (v2 != NULL) {
        unsigned char v3 = *(char *)(v1 + 14);
        int32_t v4 = v3;
        if (v3 != 4) {
            if (v4 == 6) {
                *(int32_t *)(pMem2 + 40) = 25;
                *(int32_t *)(pMem2 + 44) = 0x100013e0;
            } else {
                if (v4 == 8) {
                    *(int32_t *)(pMem2 + 40) = 33;
                    *(int32_t *)(pMem2 + 44) = 0x10001450;
                }
            }
            /* ... */
            return (char *)pMem2;
        }
        *(int32_t *)(pMem2 + 40) = 17;
        *(int32_t *)(pMem2 + 44) = 0x100013a0;
        /* ... */
        return (char *)pMem2;
    }
    return NULL;

You may note the presence of a third argument, and this is likely a problem caused by the lack of known calling convention and my inability to find how to tell retdec which one it is. It also has the tendency to follow the assembly too closely and for example when pxah->nBits is moved as a byte and then computed as a dword, we end up with two distinct variables v3 and v4. And that’s when we’re lucky: not accounting for the /* ... */ code duplication there were 12 other local variables and in total the library decompilation grew 119 global variables.

Breaking the spell

With a partial function decompilation we already end up with a handful of magic numbers, so it’s best to start now. I’m leaving 0x3144574b out because it was ostensibly waiting to be picked up in xadec.h and seemed like the first thing you’d check in a function called xaDecodeOpen. Well, don’t mind me, that would be the second thing to check, right? Did you notice the lack of null check of arg0 aka pxah?

Next, the mysterious numbers 17, 25 and 33. In order to understand their meaning, I need to solve a couple equations and for the sake of brevity I will mention when I get to that code what confirmed the supposed meaning.

f(x) = ???
f(8) = 33
f(6) = 25
f(4) = 17

It doesn’t take much time to notice the pattern, at least it didn’t take me long to realize that those figures were “almost” recognizable:

blockSize(nBits) = 4 * nBits + 1
blockSize(8) = 33
blockSize(6) = 25
blockSize(4) = 17

Praise my mighty arithmetic powers!

Next I need to make sense of 0x10001450, 0x100013e0 and 0x100013a0. They look awfully like function addresses, and tracking the use of this 44 offset confirms it:

100011c4: push   edx
100011c5: push   ebx
100011c6: call   DWORD PTR [edi+0x2c]

One can assume that nBits refers to the number of bits per samples, so the addresses probably refer to callbacks that uncompress the samples, and it looks like the callbacks take two arguments.

Moving forward in the decompilation I find more magic numbers, but fortunately less magic and more straightforward from now on:

1000106a: [...]
10001071: xor    eax,eax                         ; eax = 0;
10001073: mov    al,BYTE PTR [ebp+0xf]           ; eax = pxah->nChannels;
10001076: dec    eax                             ; eax--;
10001077: je     0x10001085                      ; if (eax == 0) goto 0x10001085;
10001079: dec    eax                             ; eax--;
1000107a: jne    0x1000108c                      ; if (eax != 0) goto 0x1000108c;
1000107c: mov    DWORD PTR [ebx+0x30],0x10001210 ; hxas->dw48 = 0x10001210;
10001083: jmp    0x1000108c                      ; goto 0x1000108c;
10001085: mov    DWORD PTR [ebx+0x30],0x10001190 ; hxas->dw48 = 0x10001190;
1000108c: [...]

This is quite similar to the nBits callbacks, here we have what looks like one callback for mono and one callback for stereo conversion. Again, using the same trick with dec instead of sub to compare pxah->nChannels to 1 and 2. xadec.dll is a 32bit library, so it’s no surprise to see function pointers fit in a dword.

Boardom is bliss

I’m a big fan of The IT Crowd, a silly British sitcom that manages to be hilarious at times. There’s an episode on board games, and they make it really sound like bored games, and sometimes boredom is the best thing that could happen to you, but I digress. Thankfully at this point it’s all very boring.

There is nothing complicated in this function and the decompilation is very straightforward. Except maybe for a couple magic numbers that were easy to understand, there was really only one surprising thing. The next boring step is to check that we have both callbacks, so in other words a valid combination of sample size and mono or stereo sound.

10001085: [...]
1000108c: mov    eax,DWORD PTR [ebx+0x2c]        ; eax = hxas->dw44;
1000108f: test   eax,eax                         ; if (hxas->dw44 == NULL)
10001091: je     0x100010f0                      ;         goto 0x100010f0;
10001093: mov    eax,DWORD PTR [ebx+0x30]        ; eax = hxas->dw48;
10001096: test   eax,eax                         ; if (hxas->dw48 == NULL)
10001098: je     0x100010f0                      ;         goto 0x100010f0;
1000109a: [...]

I’m omitting the error handing at 0x100010f0 but it is positioned just before the bit I already decompiled and frees hxas before returning zero (aka an error in this case).

Next is a bit of a surprise, although not puzzling. Thanks to prior exposure to this construct I was able to follow the logic:

10001098: [...]
1000109a: push   edi                             ; esp offset to eip is 16
1000109b: mov    ecx,0x8                         ; ecx = 8;
100010a0: mov    esi,ebp                         ; esi = pxah
100010a2: mov    edi,ebx                         ; edi = hxas
100010a4: rep movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
                                                 ; memcpy(hxas, pxah, sizeof *pxah);
100010a6: [...]

Copy 8 dwords (32 bytes) from pxah to hxas, in other words embed a copy of the XA header in the opaque data structure, removing at once 32 bytes from the 52 bytes puzzle. What. A. Relief.

Surf the WAVE

Surf is one of those board games for which I wonder how it all started. At some point someone thought it would be fun to ride waves, and turned out to be correct. But how does such an idea come to someone’s mind? In a sense, this reverse engineering journey is that kind of improbable idea, and after a poor start I now feel like everything is going by itself according to plan. World domination will ensue.

The last bit of good news came from the last bit of decompiled code. At this point where everything succeeded, all we need (barring one more missing null check) is to fill in the wave file format:

100010a4: [...]
100010a6: mov    ecx,DWORD PTR [esp+0x18]        ; ecx = wfx; /* arg1 */
100010aa: xor    edx,edx                         ; edx = 0;
100010ac: pop    edi                             ; esp offset to eip is 12
100010ad: pop    esi                             ; esp offset to eip is 8
100010ae: mov    WORD PTR [ecx],0x1              ; wfx->wFormatTag = WAVE_FORMAT_PCM;
100010b3: movzx  ax,BYTE PTR [ebp+0xf]           ; ax = pxah->nChannels;
100010b8: mov    WORD PTR [ecx+0x2],ax           ; wfx->nChannels = pxah->nChannels;
100010bc: mov    dx,WORD PTR [ebp+0xc]           ; dx = pxah->nSamplesPerSec;
100010c0: mov    DWORD PTR [ecx+0x4],edx         ; wfx->nSamplesPerSec = pxah->nSamplesPerSec;
100010c3: xor    edx,edx                         ; edx = 0;
100010c5: movzx  ax,BYTE PTR [ebp+0xf]           ; ax = pxah->nChannels;
100010ca: shl    eax,1                           ; eax /= 2;
100010cc: mov    WORD PTR [ecx+0xc],ax           ; wfx->nBlockAlign = pxah->nChannels / 2;
100010d0: mov    dx,WORD PTR [ebp+0xc]           ; dx = pxah->nSamplesPerSec;
100010d4: and    eax,0xffff                      ; eax = ax; /* no prior xor */
100010d9: pop    ebp                             ; esp offset to eip is 4
100010da: imul   eax,edx                         ; eax = pxah->nSamplesPerSec * pxah->nChannels / 2;
100010dd: mov    DWORD PTR [ecx+0x8],eax         ; wfx->nAvgBytesPerSec = eax;
100010e0: mov    eax,ebx                         ; eax = hxas;
100010e2: mov    WORD PTR [ecx+0xe],0x10         ; wfx->wBitsPerSample = 16;
100010e8: mov    WORD PTR [ecx+0x10],0x0         ; wfx->cbSize = 0;
100010ee: pop    ebx                             ; esp offset to eip is 0
100010ef: ret                                    ; return (hxas);
100010f0: [...]

The decoder always produces 16bit PCM at the same rate as the XA file, and this code confirms that the WAVEFORMATEX structure pointer is an “output” argument, not a regular argument asking to produce “that kind of WAVE data”. It means that the rest of the code should remain simple as well and that’s quite motivating to follow this through the end.

Collecting my precious

With all that, my first pieces of the puzzle:

typedef TODO xa_inflate_f(TODO, TODO);
typedef TODO xa_convert_f(TODO, TODO);

struct xastream {
        struct xaheader hdr;
        uint32_t        dw32;
        uint32_t        dw36;
        uint32_t        blk_sz; /* 40 */
        xa_inflate_f    *inflate_cb; /* 44 */
        xa_convert_f    *convert_cb; /* 48 */
}; /* 52 */

Unfortunately that uncovered more pieces, but most of the opaque structure is known and that is a very significant progress. In the next post I will take a step back and look at more trivial matters.