Resolving library symbol load errors when debugging with cross-toolchains

Using a cross-compilation toolchain to build code for embedded Linux boards (such as the Raspberry Pi) can be cool. It’s faster than building your code on a slow embedded box with little RAM and disk space, it’s easier to edit files directly, and it’s more comfortable to have all the necessary files at hand.

There’s one common problem, however. If you use your cross-compilation toolchain to debug a remote Linux box using gdbserver, you may end up in a situation where GDB silently fails to load symbols for your libraries unless you manually execute the sharedlibrary command. This article explains why this happens and how to resolve it.

Shared library load events on Linux

Unlike Windows, where library load/unload events are generated by the operating system and caught via WaitForDebugEvent(), Linux does not offer an OS-wide mechanism for handling those events. Instead, the dynamic linker (typically called ld-linux.so) provides a special function called _dl_debug_state() that is called each time a library load/unload event occurs. In Glibc 2.17 the function is defined with an empty body: it exists solely so that a debugger can set a breakpoint on it.

For example, when _dl_map_object_from_fd() loads a library, it sets the r_state field of the linker’s r_debug structure to RT_ADD and then calls _dl_debug_state() to notify the debugger that the list of loaded libraries is changing.
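The notification sequence can be modelled in a few lines of plain C. This is a simplified sketch, not glibc code: every name prefixed with demo_ is hypothetical, the structure keeps only the state field, and the hook records its calls purely so the example can be checked (the real _dl_debug_state() does nothing). The second call, made once the list is consistent again, is assumed here to belong to the same protocol even though glibc issues it from a different function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RT_CONSISTENT, RT_ADD and RT_DELETE from <link.h>. */
enum demo_state { DEMO_CONSISTENT, DEMO_ADD, DEMO_DELETE };

/* Cut-down analogue of struct r_debug: only the state field is modelled. */
struct demo_r_debug {
    enum demo_state r_state;
};

static struct demo_r_debug demo_debug = { DEMO_CONSISTENT };

/* Bookkeeping for this example only; glibc keeps nothing like it. */
static int demo_hook_calls = 0;
static enum demo_state demo_first_state_seen = DEMO_CONSISTENT;

/* Model of _dl_debug_state(). The glibc function has an empty body; this
   one records what a debugger stopped on it would observe. */
void demo_debug_state(void)
{
    if (demo_hook_calls == 0)
        demo_first_state_seen = demo_debug.r_state;
    demo_hook_calls++;
}

/* Model of the load path: announce the change, map, announce completion. */
void demo_map_object(const char *name)
{
    assert(name != NULL);

    demo_debug.r_state = DEMO_ADD;        /* the library list is about to change */
    demo_debug_state();                   /* a debugger breakpoint would fire here */

    /* ... the real linker maps the file and links it into its list here ... */

    demo_debug.r_state = DEMO_CONSISTENT; /* the list is stable again */
    demo_debug_state();                   /* second stop: safe to re-read the list */
}
```

Calling demo_map_object() once triggers the hook twice, which is why a debugger that misses the hook address never learns that a library appeared.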

If a debugger (such as GDB) sets a breakpoint on the _dl_debug_state() function, it should be able to detect all library load/unload events and act accordingly (e.g. load the necessary symbols). Let’s see how GDB accomplishes this.

The GDB approach

To see how exactly GDB handles the shared library events, we’ll use a Windows cross-build of GDB 7.5.1 to debug a Linux program using gdbserver.

In order to distinguish the breakpoints it creates internally to watch for shared library events, GDB creates them with a special tag using the create_solib_event_breakpoint() function. If we set a breakpoint on that function and let a properly configured GDB attach to a remote Linux target, we’ll see that it is called by the enable_break() function in solib-svr4.c, which is in turn called by the svr4_solib_create_inferior_hook() function.

The enable_break() function checks the debug_base field of the svr4_info structure. The value comes from parsing the .dynamic section of the executable (the elf_locate_base() function does the job) and is typically NULL on Linux. In this case GDB will use a different algorithm to find the address:

First of all, GDB will find the name of the dynamic linker library (e.g. /lib/ld-linux.so.2) by calling find_program_interpreter(), which reads it from the main executable file’s headers.
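To make this step concrete, here is a minimal sketch of how the interpreter path can be read from an ELF file. It is not GDB’s implementation (GDB goes through BFD); it uses the Linux <elf.h> definitions directly and looks for the PT_INTERP program header. The function name demo_read_interp() is hypothetical, and the sketch assumes the file has the same byte order as the machine running the code and that its offsets fit in a long.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copies the PT_INTERP string of the ELF file at `path` into `out`.
   Returns 0 on success, -1 if the file cannot be read, is not ELF,
   or has no interpreter entry (e.g. a statically linked program). */
int demo_read_interp(const char *path, char *out, size_t out_size)
{
    unsigned char ident[EI_NIDENT];
    unsigned long phoff = 0, interp_off = 0, interp_len = 0;
    unsigned phentsize = 0, phnum = 0;
    int rc = -1;
    FILE *f;

    assert(path != NULL && out != NULL);
    if (out_size == 0 || (f = fopen(path, "rb")) == NULL)
        return -1;

    /* The identification bytes tell us whether this is ELF32 or ELF64. */
    if (fread(ident, 1, EI_NIDENT, f) != EI_NIDENT ||
        memcmp(ident, ELFMAG, SELFMAG) != 0)
        goto done;
    rewind(f);

    if (ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1) goto done;
        phoff = eh.e_phoff; phentsize = eh.e_phentsize; phnum = eh.e_phnum;
    } else if (ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1) goto done;
        phoff = eh.e_phoff; phentsize = eh.e_phentsize; phnum = eh.e_phnum;
    } else {
        goto done;
    }

    /* Walk the program headers until the PT_INTERP entry is found. */
    for (unsigned i = 0; i < phnum && interp_len == 0; i++) {
        if (fseek(f, (long)(phoff + (unsigned long)i * phentsize), SEEK_SET) != 0)
            goto done;
        if (ident[EI_CLASS] == ELFCLASS32) {
            Elf32_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) goto done;
            if (ph.p_type == PT_INTERP) { interp_off = ph.p_offset; interp_len = ph.p_filesz; }
        } else {
            Elf64_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) goto done;
            if (ph.p_type == PT_INTERP) { interp_off = ph.p_offset; interp_len = ph.p_filesz; }
        }
    }
    if (interp_len == 0)
        goto done;

    /* Copy the path, truncating if the caller's buffer is too small. */
    if (interp_len > out_size - 1)
        interp_len = out_size - 1;
    if (fseek(f, (long)interp_off, SEEK_SET) != 0 ||
        fread(out, 1, interp_len, f) != interp_len)
        goto done;
    out[interp_len] = '\0';
    rc = 0;

done:
    fclose(f);
    return rc;
}
```

Run against a dynamically linked executable from your sysroot, a function like this should yield the same path that GDB then tries to open locally.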

Then GDB will use its symbol engine to open the ld-linux.so file (via a call to solib_bfd_open()). Once the file is opened locally, GDB will use the symbol information from the file to find the address of the function used by the dynamic linker to report library events. GDB does this by looking up each of several predefined function names in the symbol table and taking the first one it finds.

Here’s the list of the predefined symbol names that GDB expects to find:
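The sketch below models that lookup in plain C. The candidate names follow the solib_break_names table in GDB’s solib-svr4.c as best we can reproduce it here, so verify them against your GDB sources. The demo_symbol type and the demo_find_break_address() function are hypothetical stand-ins for the symbol table access that GDB performs through BFD.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Candidate hook names, tried in this order (assumed to match GDB's table). */
static const char *const demo_break_names[] = {
    "r_debug_state",
    "_r_debug_state",
    "_dl_debug_state",
    "rtld_db_dlactivity",
    "__dl_rtld_db_dlactivity",
    "_rtld_debug_state",
    NULL
};

/* Hypothetical, minimal view of one symbol table entry. */
struct demo_symbol {
    const char *name;
    unsigned long value;   /* address relative to the library's load base */
};

/* Returns the value of the first candidate name present in the symbol
   table, or 0 if none of the candidates is found. */
unsigned long demo_find_break_address(const struct demo_symbol *symtab,
                                      size_t count)
{
    assert(symtab != NULL || count == 0);

    for (const char *const *cand = demo_break_names; *cand != NULL; cand++) {
        for (size_t i = 0; i < count; i++) {
            if (strcmp(symtab[i].name, *cand) == 0)
                return symtab[i].value;
        }
    }
    return 0;
}
```

Note that the lookup runs entirely against the locally opened file, so the returned value is only as good as that file.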

The _dl_debug_state() function mentioned above is the third entry in the list.

The problem

The big problem with this approach is that GDB uses the local copy of ld-linux.so to determine the address of _dl_debug_state() and then sets a breakpoint at this address inside the remote copy of ld-linux.so. Everything works like a charm if those copies are identical. However, if your local copy gets out of sync with your Linux box (e.g. after updating your Linux system), things get nasty: GDB will put a breakpoint at the wrong place without issuing any warning, and your library load events will be silently ignored. The easiest way to tell whether your files are out of sync is to run readelf -s on both the local and the remote copy of ld-linux.so and filter the output for _dl_debug_state, for example: readelf -s /lib/ld-linux.so.2 | grep _dl_debug_state

The readelf -s command dumps the symbols listed in the ld-linux.so file. The grep tool filters out everything except the line for _dl_debug_state, whose Value column holds the address of the function.

If the address of _dl_debug_state (for instance, 0000f9f0) turns out to be different on your remote Linux box and on the machine with the cross-toolchain, you have found the cause of your problem.
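The arithmetic behind the failure is simple enough to sketch. The model below assumes that the breakpoint address is the load base of the remote ld-linux.so plus the symbol value read from the local copy. All demo_ names are hypothetical, and apart from 0xf9f0, which is the address quoted above, the numbers used with it are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What we know about one copy of ld-linux.so: the value that
   readelf -s reports for _dl_debug_state in that copy. */
struct demo_ld_copy {
    unsigned long dl_debug_state_value;
};

/* Address where the debugger plants its breakpoint: the remote load
   base combined with the symbol value taken from the LOCAL copy. */
unsigned long demo_breakpoint_address(unsigned long remote_load_base,
                                      const struct demo_ld_copy *local)
{
    assert(local != NULL);
    return remote_load_base + local->dl_debug_state_value;
}

/* The breakpoint is only hit if it coincides with the function's real
   location, which is determined by the REMOTE copy. */
bool demo_breakpoint_is_valid(unsigned long remote_load_base,
                              const struct demo_ld_copy *local,
                              const struct demo_ld_copy *remote)
{
    assert(local != NULL && remote != NULL);
    return demo_breakpoint_address(remote_load_base, local) ==
           remote_load_base + remote->dl_debug_state_value;
}
```

With identical copies the two addresses coincide. As soon as the remote value differs, the breakpoint lands somewhere else in the linker and the load events pass unnoticed.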

Known solutions

One way of solving this would be to keep your cross-toolchain sysroot folder synchronized with the remote Linux box to ensure that the versions of ld-linux.so on both sides are binary identical. If you don’t want to do this, you can tell GDB to automatically download the binaries from your remote machine instead of using the local copies in the cross-toolchain sysroot directory. This is achieved with the set sysroot command, e.g. set sysroot remote:/ (newer GDB releases spell the prefix target: instead of remote:).

This will let GDB use the root directory (/) on the remote machine (running gdbserver) as the source of the symbol files. Note that you will need to have a fresh build of gdbserver and a fast connection between the two machines to use this option efficiently.