Beware: a bug in memcpy() in MacOS 10.7 Kernel

I was creating a truncated port of STLPort for the MacOS kernel environment for one of our Mac drivers and stumbled upon a nasty bug: any attempt to initialize an std::string immediately caused a kernel panic.

Investigating the problem revealed that the memcpy() function, which is hand-coded in assembly, never sets its return value. That almost makes sense: how often do you use the return value of memcpy()? I never did, until I found out that STLPort relies on it heavily and crashes when it is wrong.
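To illustrate why a wrong return value is fatal, here is a minimal user-space sketch of the pattern. It is not STLPort's actual source: copy_trivial(), broken_memcpy() and end_pointer_ok() are hypothetical names, and broken_memcpy() only imitates a memcpy() that copies correctly but returns garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical stand-in for a faulty memcpy(): the bytes are copied,
 * but an arbitrary value is returned instead of dst. */
static void *broken_memcpy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return (void *)(uintptr_t)0x1000;   /* arbitrary garbage */
}

/* Returns the end of the destination range, derived from whatever the
 * copy routine returns. A bad return value gives a bad end pointer. */
static void *copy_trivial(copy_fn copy, const void *first,
                          const void *last, void *result)
{
    size_t n = (size_t)((const char *)last - (const char *)first);
    if (n == 0)
        return result;
    /* Integer arithmetic keeps the sketch well-defined even when the
     * returned pointer is garbage. */
    return (void *)((uintptr_t)copy(result, first, n) + n);
}

/* Returns 1 if the computed end pointer is where it should be. */
static int end_pointer_ok(copy_fn copy)
{
    const char src[] = "hello";
    char dst[sizeof src];
    void *end = copy_trivial(copy, src, src + sizeof src, dst);

    assert(memcmp(dst, src, sizeof src) == 0);   /* the bytes always arrive */
    return end == (void *)(dst + sizeof src);
}
```

With the standard memcpy() the end pointer is correct. With the imitation it points nowhere useful even though the data was copied, and any container code that trusts it will eventually dereference a bad address.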

I’ve created a simple test case to reproduce the bug:

The __ucopy_trivial() function, which is supposed to return a pointer to the end of the destination array, actually returns 0x02. Looking more closely at memcpy() shows that the authors simply forgot to do anything with the %rax register, which holds the return value.

That’s consistent with the contents of the xnu-1699.22.81/osfmk/x86_64/bcopy.s file:

/* void *memcpy((void *) to, (const void *) from, (size_t) bcount) */
/*            rdi,              rsi,          rdx   */
/*
 * Note: memcpy does not support overlapping copies
 */
ENTRY(memcpy)
    movq    %rdx,%rcx
    shrq    $3,%rcx                /* copy by 64-bit words */
    cld                    /* copy forwards */
    rep
    movsq
    movq    %rdx,%rcx
    andq    $7,%rcx                /* any bytes left? */
    rep
    movsb
    ret

Oops, what %rax? 🙂

I believe one of the reasons this bug was not discovered immediately is that in most cases when you call memcpy(), gcc uses the %rax register to hold the first argument before placing it into %rdi. So if you try to reproduce the bug with a simple call to memcpy() alone, you won’t see any problem.

The solution

The solution is simple: just make a wrapper around memcpy() and put it somewhere in your global header files so that the memcpy()-related code will actually use it:

static inline void *memcpy_workaround(void *dst,
                                      const void *src, size_t len)
{
    memcpy(dst, src, len);
    return dst;
}

#define memcpy memcpy_workaround
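
To confirm the wrapper does its job, it is enough to check the one property the bug violates: the returned pointer must equal the destination. The following is a user-space sketch only; check_memcpy_workaround() is a hypothetical helper, not part of any kernel API, and in a real driver you would likely replace assert() with your own logging or panic call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The wrapper from the post: ignore memcpy()'s return value and
 * always hand back the destination pointer. */
static inline void *memcpy_workaround(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return dst;
}

/* Hypothetical self-check: verifies the returned pointer and the
 * copied bytes. */
static void check_memcpy_workaround(void)
{
    const char src[16] = "0123456789abcde";   /* 15 chars + NUL */
    char dst[sizeof src];
    void *ret = memcpy_workaround(dst, src, sizeof src);

    assert(ret == (void *)dst);                  /* return value is dst */
    assert(memcmp(dst, src, sizeof src) == 0);   /* data was copied */
}
```

Calling such a check once at driver load time would catch the problem early instead of in the middle of a container operation.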



Making breakpoints work with VMware gdb stub

If you frequently debug Linux or MacOS kernel drivers, running the guest OS under VMware and using the VMware gdb stub can save you a lot of time: debugging is fast and reliable because it is handled by VMware itself rather than by the guest operating system.

There’s just one “feature” that can cost you a lot of time if you’re not aware of it; it took me a whole evening to figure out.

When you add debug stub support to your VM, you add something like this to your VMX file:

debugStub.listen.guest32.remote = "TRUE" 
debugStub.listen.guest64.remote = "TRUE"

Then you start debugging your kernel, happy with the speed and reliability, and suddenly notice that you can’t set more than 4 breakpoints.

The problem is that VMware 8 uses the “hide breakpoints” mode by default. Instead of implementing your breakpoints with “int 3” instructions like any normal debugger would, it uses the scarce hardware debug registers, which limits you to 4 breakpoints. This happens regardless of the GDB command you use to set the breakpoint: a normal “break” is still treated as “hbreak”.

The solution is very simple. Just add this line to your VMX file:

debugStub.hideBreakpoints = "FALSE"

This disables the breakpoint hiding mode and lets VMware use the good old int 3 instruction to set breakpoints, so you can debug normally again.