Building a Win32 GCC cross-compiler for Debian systems

We’ve just finished building a Win32 toolchain for building Raspberry PI applications (BTW, if you have not checked Raspberry PI yet, do it, as it’s a pretty amazing gadget for it price). The build process was far away from being straight-forward and slightly different from the way barebone cross-compilers are built, so this post summarizes all problems and gives workarounds for them.

The build

The main difference between building a cross-compiler for a Linux system and a barebone system (such as arm-eabi) is the immense amount of libraries already available on the target system, each of them having include and library files under /usr/include and /usr/lib. Building separate versions of them on Windows would not only be tricky and time-consuming, but would also cause troubles in case our Windows binaries end up slightly different from the ones on the target Linux machine. So we simply import the headers and the libraries from the Linux machine before building the toolchain.

Importing the files

Identifying what to import is easy: just run this command line on your Linux box:

 echo | gcc -v -E -

You will get something like that:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-8+rpi1' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-8+rpi1) 
COLLECT_GCC_OPTIONS='-v' '-E' '-march=armv6' '-mfloat-abi=hard' '-mfpu=vfp'
 /usr/lib/gcc/arm-linux-gnueabihf/4.6/cc1 -E -quiet -v -imultilib . -imultiarch arm-linux-gnueabihf - -march=armv6 -mfloat-abi=hard -mfpu=vfp
ignoring nonexistent directory "/usr/local/include/arm-linux-gnueabihf"
ignoring nonexistent directory "/usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../../arm-linux-gnueabihf/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/arm-linux-gnueabihf/4.6/include
 /usr/local/include
 /usr/lib/gcc/arm-linux-gnueabihf/4.6/include-fixed
 /usr/include/arm-linux-gnueabihf
 /usr/include
End of search list.
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "<stdin>"
COMPILER_PATH=/usr/lib/gcc/arm-linux-gnueabihf/4.6/:/usr/lib/gcc/arm-linux-gnueabihf/4.6/:/usr/lib/gcc/arm-linux-gnueabihf/:/usr/lib/gcc/arm-linux-gnueabihf/4.6/:/usr/lib/gcc/arm-linux-gnueabihf/
LIBRARY_PATH=/usr/lib/gcc/arm-linux-gnueabihf/4.6/:/usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/:/usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../:/lib/arm-linux-gnueabihf/:/lib/:/usr/lib/arm-linux-gnueabihf/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-E' '-march=armv6' '-mfloat-abi=hard' '-mfpu=vfp'

The keywords here are  #include and LIBRARY_PATH. Most of the directories are empty or do not exist, so on Raspberry PI you end up with 3 key directories to import:

  • /usr/include
  • /usr/lib
  • /lib

Transfer them to your Windows machine and save somewhere preserving the relative paths.

Another important point is to look at the “Configured with” message that the target gcc shows, as we will need to replicate those arguments when building our cross-compiler.

Sysroot vs. Prefix

The usual way of building a barebone cross-compiler is to provide the target and prefix parameters to the configure script and let it decide where to put the includes and libraries:

../gcc-X.Y/configure --prefix=/c/my-gcc-folder/ --target=arm-eabi

This won’t be work with cross-compilers for Linux due to different locations of headers and libraries, so the configuration process is slightly different:

  1. Go to the usual target directory for your toolchain (e.g. c:\folder\arm-linux-gnueabihf)
  2. Create a subdirectory there (e.g. called sysroot, the name is arbitrary)
  3. Copy the imported Linux files to that directory, preserving the paths (e.g. /usr/include becomes c:\folder\arm-linux-gnueabihf\sysroot\usr\include).
  4. When configuring binutils, GCC and GDB provide an additional argument:
--with-sysroot=/c/folder/arm-linux-gnueabihf/sysroot

That’s not all though. If you are impatient enough to try building your GCC now, you will encounter multiple missing header/library errors when it comes up to building libgcc and other target libraries. In fact, the compiler you’ll get won’t ever find most of the headers. This can be diagnosed by running xgcc.exe the same way we did for Linux gcc:

echo | xgcc -v -E -

You might find some really strange paths there: e.g. c:/folder/sysrootc:/mingw/msys/1.0/include, or some of the Linux library paths (e.g. /usr/lib/arm-linux-gnueabihf) will be missing.

There are two totally different reasons behind this problem:

  1. Trying to maximize compatibility with Windows, MinGW/MSys silently replaces /usr/include with the MSys include directory. While this is useful when building Windows software, it has one inadvertent side effect: it replaces the NATIVE_SYSTEM_HEADER_DIR definition passed to GCC via command line with the MSys include path! Thus gcc ends up trying to append c:/mingw/msys/1.0/include to your sysroot, as it believes it’s the standard include file location.
  2. The Debian Linux distro (that Raspberry PI is based on) uses a slightly different include/lib path structure (e.g. /usr/include/<target>) and building GCC without the Debian patches would result in a crippled GCC.

The second one can be easily fixed by using the sources from Debian repositories (run apt-get source gcc-<version> on your Linux machine or apply Debian patches manually). The first one can be resolved by adding the following code to cppdefault.c:

#if defined(__MINGW32__) &amp;&amp; defined(TARGET_SYSTEM_ROOT)
#define NATIVE_SYSTEM_HEADER_DIR "/usr/include"
#endif

Building target libs

Another surprise will be waiting for you when your GCC build process tries to link libgcc. The immense amount of object files stuffed into just one link command line will exceed the Windows maximum command line length and will cause a strange CreateProcess() error to appear. The following command causes the overflow:

    $(subst @multilib_flags@,$(CFLAGS) -B./,$(subst \
        @multilib_dir@,$(MULTIDIR),$(subst \
        @shlib_objs@,$(objects),$(subst \
        @shlib_base_name@,libgcc_s,$(subst \
        @shlib_map_file@,$(mapfile),$(subst \
        @shlib_slibdir_qual@,$(MULTIOSSUBDIR),$(subst \
        @shlib_slibdir@,$(shlib_slibdir),$(SHLIB_LINK))))))))

The following simple hack changes it to use a response file instead:

    echo $(objects) > __objects.txt
    $(subst @multilib_flags@,$(CFLAGS) -B./,$(subst \
        @multilib_dir@,$(MULTIDIR),$(subst \
        @shlib_objs@,@__objects.txt,$(subst \
        @shlib_base_name@,libgcc_s,$(subst \
        @shlib_map_file@,$(mapfile),$(subst \
        @shlib_slibdir_qual@,$(MULTIOSSUBDIR),$(subst \
        @shlib_slibdir@,$(shlib_slibdir),$(SHLIB_LINK))))))))

Another problem would be related to wrong definition of caddr_t caused by conflicting Windows and Linux definitions and is resolved by adding -DUSED_FOR_TARGET to CFLAGS used for building libgcc.

Debugging with GDB

The sysroot trick is vital if you are planning to debug your apps with gdb running on Windows via gdbserver. Unless your sysroot structure exactly replicates the Linux directory structure and unless gdb is also configured with the –with-sysroot argument, you’ll get the following error message when trying to debug your programs:

warning: Unable to find dynamic linker breakpoint function.

This would mean a failure trying to load symbols for ld-linux-armhf.so.3 rendering most of module-related functionality inoperable.  Needless to say, your debugging won’t be very usable afterwards…

The summary

Summarizing all previously said, here’s a checklist for building your Linux cross-compiler on Windows:

  • If you are targeting Debian, apply Debian patches to GCC.
  • Fix NATIVE_SYSTEM_HEADER_DIR inside cppdefault.c
  • Use –with-sysroot when configuring binutils, gcc and gdb
  • Import Linux includes and libs before building gcc
  • Fix $(objects) in libgcc/Makefile
  • Add -DUSED_FOR_TARGET to libgcc CFLAGS
  • Test your toolchain with some C++ code (including STL)
  • Test debugging with gdbserver to ensure all symbols are usable

If you like it simple…

If you don’t feel like reliving the adventurous enterprise of building your own Windows toolchain for Raspberry PI, you can download our pre-built version.

 

 

 

Beware: a bug in memcpy() in MacOS 10.7 Kernel

I was just creating a truncated port of STLPort for MacOS kernel environment for one of our Mac drivers and stumbled upon a nasty bug. Any attempt to initialize an std::string immediately caused a kernel panic.

Investigating the problem revealed that the memcpy() function that is manually coded in assembly does not actually care about the return value. Makes sense, how often did you use the return value of memcpy()? I never did. Just until finding out that STLPort heavily does and crashes in case it’s wrong.

I’ve created a simple test case to reproduce the bug:

The __ucopy_trivial() function that is supposed to return the pointer to the end of the destination array actually returns 0x02. Looking more into the memcpy() function shows that the authors have simply forgotten to do anything with the $rax register holding the return value:

That’s consistent wit the contents of the xnu-1699.22.81/osfmk/x86_64/bcopy.s file:

/* void *memcpy((void *) to, (const void *) from, (size_t) bcount) */
/*            rdi,              rsi,          rdx   */
/*
 * Note: memcpy does not support overlapping copies
 */
ENTRY(memcpy)
    movq    %rdx,%rcx
    shrq    $3,%rcx                /* copy by 64-bit words */
    cld                    /* copy forwards */
    rep
    movsq
    movq    %rdx,%rcx
    andq    $7,%rcx                /* any bytes left? */
    rep
    movsb
    ret

Oops, what %rax? 🙂

I believe one of the reasons reason why this bug has not been immediately discovered is because in most of the cases when you call memcpy(), gcc will use the $rax register to hold the first argument before placing it to $rdi. Thus if you try to reproduce the bug with a simple call to memcpy() alone, you won’t see any problem:

The solution

The solution is simple: just make a wrapper around memcpy() and put it somewhere in your global header files so that the memcpy()-related code will actually use it:

static inline void *memcpy_workaround(void *dst, 
                                      const void *src, size_t len)
{
    memcpy(dst, src, len);
    return dst;
}

#define memcpy memcpy_workaround

 

 

Making breakpoints work with VMWare gdb stub

If you are debugging your Linux or MacOS kernel drivers frequently, running the guest OS under VMWare and using the VMWare gdb stub can save you a lot of time: fast reliable debugging experience handled by VMWare on top of the operating system itself.

There’s just one “feature” that can cost you a lot of time if you’re and just took me a whole evening to figure out.

When you add debug stub support to your VM, you add something like this to your VMX file:

debugStub.listen.guest32=1
debugStub.listen.guest64=1
debugStub.listen.guest32.remote = "TRUE" 
debugStub.listen.guest64.remote = "TRUE"

Then you start debugging your kernel being so happy about the speed and reliability and then suddenly notice that you can’t set more than 4 breakpoints.

The problem is that VMWare 8 uses the “hide breakpoints” mode by default. I.e. instead of making your breakpoints with the “int 3” instructions like any normal debugger would do, it uses the scarce hardware debugging registers, limiting the amount of the breakpoints you can have to 4. It happens regardless of the GDB command you use to set the breakpoint. I.e. the normal “break” will still be interpreted as “hbreak”.

The solution is very simple. Just add this line to your VMX file:

debugStub.hideBreakpoints = "FALSE"

This disables the breakpoint hiding mode and lets VMWare use the good old int 3 instruction to set the breakpoints so that you can debug normally again.

A GDB update for Android-NDK fixes many bugs

If you were previously debugging some native Android code with ndk-gdb, you have probably noticed some instabilities and annoying bugs. But would you ever assume that a whole bunch of those problems is caused by a horribly outdated version of gdb and gdbserver, its counterpart, provided with Android NDK r8?

I would not. Just until one of our VisualGDB customers casually asked why can’t he see the contents of NEON registers in Visual Studio when he hovers the mouse over such a register. A quick investigation showed that the Android port of gdb and gdbserver has been maintained over the last years with features added and bugs fixed, but due to some unknown reason the NDK is shipped with an ancient gdb 6.6 released back in 2006.

So what I just did is built gdb 7.4.1 and gdbserver 7.4.1 from sources with Android patches, fixed some minor compilation and script errors and replaced the original NDK binaries with them. The result is quite inspiring:

  • The newer GDB shows the NEON registers (such as $d0) normally.
  • The THUMB code is disassembled normally even without debug symbols.
  • I’ve just got a feeling that it is somewhat faster and crashes quite a bit less.
  • It supports pending breakpoints (althrough requires the sharedlibrary command to rebind them).

I have no clue why the Android NDK maintainers put the ancient build of gdb into the NDK releases and totally ignore the new versions. Maybe, they did not have time to test them, maybe there are compatibility issues with some devices… Unfortunately, we could only guess. There’s one thing for sure though, the 7.4.1 build looks a lot better than the prehistoric 6.6 and has saved us a lot of time and nerve already. Maybe it can do the same for you.

Just to conclude, we have uploaded the pre-built gdb/gdbserver 7.4.1 to our website along with the installation instructions. If you are debugging NDK code on a daily basis, give it a try… BTW, we’ve also included it into the VisualGDB installer and enabled it by default.

Android ndk-build.cmd bug

Just discovered a minor bug in the ndk-build.cmd script provided with Android NDK r8.

If you’re calling ndk-build.cmd from Visual Studio, or any other IDE or script, beware that if your build fails, the script will still return an exit code of 0, as if no problem happened. This happens due to en error in the last line of the script:

%NDK_ROOT%\prebuilt\windows\bin\make.exe -f %NDK_ROOT%build/core/build-local.mk SHELL=cmd %* || exit /b %ERRORLEVEL%

The %ERRORLEVEL% from running make.exe is not yet available when exit /b is executed, so the script returns 0 despite the error returned by make.

The fix is very simple: just remove the || exit /b %ERRORLEVEL% from the last line. As the return value of the .cmd script is the return value of the last command, the error returned by make will be propagated by the script and your IDE will successfully detect it.

You can get a fixed version of the script here: http://visualgdb.com/KB/ndk-build.zip.

View earlier posts...