All posts by Ivan Shcherbakov

Beware: a bug in memcpy() in MacOS 10.7 Kernel

I was creating a trimmed-down port of STLPort for the MacOS kernel environment for one of our Mac drivers and stumbled upon a nasty bug. Any attempt to initialize a std::string immediately caused a kernel panic.

Investigating the problem revealed that the memcpy() function, which is hand-coded in assembly, does not set its return value at all. That is understandable: how often do you use the return value of memcpy()? I never did, until I found out that STLPort relies on it heavily and crashes when it is wrong.
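
To make that dependency concrete, here is a minimal user-space sketch of the pattern. It is not STLPort's actual source: ucopy_trivial_sketch() is a hypothetical name, and the code assumes a standards-conforming memcpy() that returns its first argument. With a memcpy() that leaves garbage in the return register, the computed end pointer would be garbage as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not STLPort's real code): copies the bytes in
 * [first, last) to result and returns a pointer one past the last byte
 * written. The end pointer is derived from memcpy()'s RETURN VALUE, so a
 * memcpy() that does not return its destination breaks this function. */
static void *ucopy_trivial_sketch(const void *first, const void *last,
                                  void *result)
{
    /* Precondition: the source range must not be reversed. */
    assert((const char *)last >= (const char *)first);

    size_t n = (size_t)((const char *)last - (const char *)first);
    if (n == 0)
        return result;              /* nothing to copy */

    return (char *)memcpy(result, first, n) + n;
}
```

A caller that uses the returned pointer as the new end of a string buffer will write through whatever value memcpy() happened to leave behind, which in kernel mode is enough to cause a panic.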

I’ve created a simple test case to reproduce the bug:

The __ucopy_trivial() function, which is supposed to return the pointer to the end of the destination array, actually returns 0x02. Looking more closely at the memcpy() function shows that the authors simply forgot to do anything with the %rax register holding the return value:

That’s consistent with the contents of the xnu-1699.22.81/osfmk/x86_64/bcopy.s file:

/* void *memcpy((void *) to, (const void *) from, (size_t) bcount) */
/*            rdi,              rsi,          rdx   */
/*
 * Note: memcpy does not support overlapping copies
 */
ENTRY(memcpy)
    movq    %rdx,%rcx
    shrq    $3,%rcx                /* copy by 64-bit words */
    cld                    /* copy forwards */
    rep
    movsq
    movq    %rdx,%rcx
    andq    $7,%rcx                /* any bytes left? */
    rep
    movsb
    ret

Oops, what %rax? 🙂

I believe one of the reasons why this bug was not discovered immediately is that in most cases when you call memcpy(), gcc uses the %rax register to hold the first argument before placing it in %rdi. Thus if you try to reproduce the bug with a simple call to memcpy() alone, you won’t see any problem:

The solution

The solution is simple: just make a wrapper around memcpy() and put it somewhere in your global header files so that the memcpy()-related code will actually use it:

static inline void *memcpy_workaround(void *dst, 
                                      const void *src, size_t len)
{
    memcpy(dst, src, len);
    return dst;
}

#define memcpy memcpy_workaround


Making breakpoints work with VMWare gdb stub

If you frequently debug Linux or MacOS kernel drivers, running the guest OS under VMWare and using the VMWare gdb stub can save you a lot of time: debugging is fast and reliable because VMWare handles it itself, from outside the guest operating system.

There’s just one “feature” that can cost you a lot of time if you’re not aware of it, and it just took me a whole evening to figure out.

When you add debug stub support to your VM, you add something like this to your VMX file:

debugStub.listen.guest32=1
debugStub.listen.guest64=1
debugStub.listen.guest32.remote = "TRUE" 
debugStub.listen.guest64.remote = "TRUE"

Then you start debugging your kernel, happy about the speed and reliability, and suddenly notice that you can’t set more than 4 breakpoints.

The problem is that VMWare 8 uses the “hide breakpoints” mode by default. That is, instead of implementing your breakpoints with “int 3” instructions like any normal debugger would, it uses the scarce hardware debug registers, limiting the number of breakpoints you can have to 4. This happens regardless of the GDB command you use to set the breakpoint: even a normal “break” is interpreted as “hbreak”.
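
The difference between the two mechanisms can be sketched in a few lines of C. This is a toy user-space model, not VMWare's or gdb's implementation: the names sw_breakpoint, set_sw_breakpoint(), clear_sw_breakpoint() and set_hw_breakpoint() are hypothetical, and the "code" is just a byte array. A software breakpoint saves the original byte and overwrites it with the one-byte int 3 opcode (0xCC), so the number of breakpoints is bounded only by memory. Hardware breakpoints are limited by the four x86 debug address registers, DR0 to DR3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* x86 one-byte "int 3" instruction */
#define HW_BP_SLOTS 4       /* x86 debug address registers DR0..DR3 */

/* Toy model of a software breakpoint: remembers the byte it replaced. */
struct sw_breakpoint {
    size_t  offset;
    uint8_t saved_byte;
};

/* Patch int 3 into the code image and remember the original byte. */
static struct sw_breakpoint set_sw_breakpoint(uint8_t *code, size_t code_len,
                                              size_t offset)
{
    assert(offset < code_len);
    struct sw_breakpoint bp = { offset, code[offset] };
    code[offset] = INT3_OPCODE;
    return bp;
}

/* Restore the original instruction byte. */
static void clear_sw_breakpoint(uint8_t *code, const struct sw_breakpoint *bp)
{
    code[bp->offset] = bp->saved_byte;
}

/* Toy model of hardware breakpoints: a fixed table of addresses.
 * Returns the slot index used, or -1 once all four slots are taken. */
static int set_hw_breakpoint(uintptr_t slots[HW_BP_SLOTS], int *used,
                             uintptr_t address)
{
    if (*used >= HW_BP_SLOTS)
        return -1;
    slots[*used] = address;
    return (*used)++;
}
```

In this model a fifth call to set_hw_breakpoint() fails, which mirrors the limit described above, while set_sw_breakpoint() can be applied to any number of offsets.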

The solution is very simple. Just add this line to your VMX file:

debugStub.hideBreakpoints = "FALSE"

This disables the breakpoint hiding mode and lets VMWare use the good old int 3 instruction to set the breakpoints so that you can debug normally again.

A GDB update for Android-NDK fixes many bugs

If you were previously debugging some native Android code with ndk-gdb, you have probably noticed some instabilities and annoying bugs. But would you ever assume that a whole bunch of those problems is caused by a horribly outdated version of gdb and gdbserver, its counterpart, provided with Android NDK r8?

I would not, until one of our VisualGDB customers casually asked why he couldn’t see the contents of NEON registers in Visual Studio when hovering the mouse over such a register. A quick investigation showed that the Android port of gdb and gdbserver has been maintained over the last years, with features added and bugs fixed, but for some unknown reason the NDK is shipped with an ancient gdb 6.6 released back in 2006.

So I built gdb 7.4.1 and gdbserver 7.4.1 from sources with the Android patches, fixed some minor compilation and script errors, and replaced the original NDK binaries with them. The result is quite inspiring:

  • The newer GDB shows the NEON registers (such as $d0) normally.
  • The THUMB code is disassembled normally even without debug symbols.
  • I’ve just got a feeling that it is somewhat faster and crashes quite a bit less.
  • It supports pending breakpoints (although it requires the sharedlibrary command to rebind them).

I have no clue why the Android NDK maintainers put the ancient build of gdb into the NDK releases and totally ignore the new versions. Maybe they did not have time to test them, maybe there are compatibility issues with some devices… Unfortunately, we can only guess. One thing is for sure, though: the 7.4.1 build looks a lot better than the prehistoric 6.6 and has already saved us a lot of time and nerves. Maybe it can do the same for you.

Just to conclude, we have uploaded the pre-built gdb/gdbserver 7.4.1 to our website along with the installation instructions. If you are debugging NDK code on a daily basis, give it a try… BTW, we’ve also included it into the VisualGDB installer and enabled it by default.

Android ndk-build.cmd bug

Just discovered a minor bug in the ndk-build.cmd script provided with Android NDK r8.

If you’re calling ndk-build.cmd from Visual Studio, or any other IDE or script, beware that if your build fails, the script will still return an exit code of 0, as if no problem happened. This happens due to an error in the last line of the script:

%NDK_ROOT%\prebuilt\windows\bin\make.exe -f %NDK_ROOT%build/core/build-local.mk SHELL=cmd %* || exit /b %ERRORLEVEL%

The %ERRORLEVEL% from running make.exe is not yet available when exit /b is executed: cmd expands %ERRORLEVEL% when it parses the whole line, that is, before make.exe has even run. So the script returns 0 despite the error returned by make.

The fix is very simple: just remove the || exit /b %ERRORLEVEL% from the last line. As the return value of the .cmd script is the return value of the last command, the error returned by make will be propagated by the script and your IDE will successfully detect it.

You can get a fixed version of the script here: http://visualgdb.com/KB/ndk-build.zip.