Trouble debugging sample kernel module (RPi 1 + J-Link via JTAG)



This topic contains 8 replies, has 2 voices, and was last updated by support 2 months, 1 week ago.

Viewing 9 posts - 1 through 9 (of 9 total)
    #20252

    maiorfi
    Participant

    Hi.

    Although I sometimes manage to run a debugging session successfully (I am still trying to understand what that success depends on), most of the time the target halts and I get the following errors in the GDB log (the gdb client runs on an Ubuntu 17.10 VirtualBox virtual machine; its path is /opt/KernelCache/4.9.80+_0.kernel/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/bin/arm-linux-gnueabihf-gdb --interpreter mi):

    Kernel debug data block is corrupt (wrong signature)
    Failed to build/install the kernel debug helper module. Enumerating modules and loading symbols will be very slow.
    Cannot query module base address for module at 0xbf1ef980

    What can be wrong?

    (I guess the bold lines are the most relevant here, but it would also be useful to understand how to get rid of the other, less important warnings.)

    Thanks!


    #20255

    support
    Keymaster

    Hi,

    It looks like VisualKernel fails to use the fast mechanism for querying the module list. This could happen if the JTAG connection was unreliable and memory reads were not succeeding. Could you please attach the full GDB log (in the “All GDB Interaction” mode) and the full output from OpenOCD? They may contain important clues.

    Another thing to try would be to lower the JTAG frequency (try going as low as 500 kHz first). If this solves the problem, the JTAG wiring is likely to blame, and double-checking or re-soldering it should fix the issue.

    #20257

    maiorfi
    Participant

    I tried both 500 kHz and 100 kHz, but with no luck.

    I have attached the gdb and OpenOCD session logs (from the 500 kHz session).

    Thanks.

    • This reply was modified 2 months, 3 weeks ago by maiorfi.
    #20278

    support
    Keymaster

    Hi,

    Thanks for the logs. We encountered this issue during our internal testing, although only about once every 10-20 debug sessions, and it did not interfere with debugging (VisualKernel fell back to the slower symbol-based module listing mechanism, which worked without any noticeable side effects).

    This could be caused by a race condition during OpenOCD initialization. We can help you pinpoint and work around it, although this may take a few iterations, as we cannot reproduce the behavior you are observing on our side. Please try adding the “mon sleep 3000” command under VisualKernel Project Properties -> Debug Settings -> Kernel Connection -> Advanced Settings (before and/or after the target remote command). OpenOCD's sleep command takes milliseconds, so this adds a 3-second delay. If this resolves the problem, please experiment with reducing the delay.

    If it doesn’t help, please try halting the target via Debug->Break All and then manually reading the block that VisualKernel tried to read during initialization (e.g. x/16xb 0xbf05f940) by repeating the command shown in the GDB Session window. Does this produce a valid (non-zero) result after you halt the target manually?
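    If the read has to be repeated over many sessions, the "is it all zeros?" check can be scripted. The following is a minimal Python sketch that assumes gdb's console format for x/16xb (an address, optionally followed by <symbol+offset>, then a colon and tab-separated 0xNN byte values); parse_x_bytes and block_is_empty are hypothetical helper names, not part of VisualKernel or gdb.

```python
import re

# One byte as gdb prints it with the /xb format, e.g. "0x00" or "0x4b".
_BYTE = re.compile(r"0x([0-9a-fA-F]{2})\b")


def parse_x_bytes(output: str) -> bytes:
    """Collect the byte values from gdb 'x/Nxb' console output (format assumed)."""
    data = bytearray()
    for line in output.splitlines():
        if ":" not in line:
            continue  # skip prompts, blank lines and colon-free error messages
        values = line.rsplit(":", 1)[1]  # drop "0xADDR <sym+off>" before the colon
        data.extend(int(m.group(1), 16) for m in _BYTE.finditer(values))
    return bytes(data)


def block_is_empty(output: str) -> bool:
    """True if no bytes were read at all or every byte read is zero."""
    return not any(parse_x_bytes(output))
```

    Treating "no bytes at all" as empty is a deliberate choice: a failed read and an all-zero read both mean the fast module listing cannot work.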

    #20313

    maiorfi
    Participant

    Hi. Sorry for the delay.

    Sadly, “mon sleep 3000” produces the following in the GDB session log: &"\"monitor\" command not supported by this target.\n"

    #20314

    maiorfi
    Participant

    When I put “mon sleep 3000” only after the target remote command, the “command not supported by this target” error goes away, but I still get the original issue.

    I have attached the gdb and OpenOCD logs.

    Thanks.

    #20325

    support
    Keymaster

    Hi,

    Thanks for checking this. It looks like this could be a configuration issue rather than a race condition, then. To pinpoint it, please follow the steps below:

    1. Note the module list block address from the gdb log (e.g. x/16xb 0xbf1e8940).
    2. Stop the VisualKernel session first, then start OpenOCD manually using the command line shown in the OpenOCD window in VisualKernel.
    3. Start gdb manually on the Linux machine and connect it to OpenOCD (target remote …).
    4. Try reading the module list block by running the x/16xb <address> command manually. Does this result in non-zero values?

    If the manual setup yields non-zero block contents, please compare the initialization commands used in the manual session with those in the VisualKernel GDB session log.

    If the manual setup also reproduces the problem, please check whether other commands (e.g. bt, disassemble) work and whether reading memory with OpenOCD commands (e.g. mon mdw <address>) works. This should help us understand which part of the debugging pipeline is affected by the bug and devise a workaround.
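    The manual checks described above (connect, read the block through gdb, read it through OpenOCD, try another command) can also be run non-interactively so they are easy to repeat. This is a minimal Python sketch that only builds a gdb command line using gdb's -batch and -ex options; the gdb binary name, the vmlinux path, the host, port 3333 (OpenOCD's usual default gdb port, assumed here) and the function manual_check_argv are hypothetical placeholders, not VisualKernel settings.

```python
import shlex

# Hypothetical: replace with the full gdb path from your own setup.
GDB = "arm-linux-gnueabihf-gdb"


def manual_check_argv(vmlinux: str, host: str, port: int,
                      block_addr: int, words: int = 4) -> list:
    """Build a gdb command line that repeats the manual checks in batch mode.

    All arguments are placeholders; OpenOCD must already be running
    (started from the command line shown in VisualKernel's OpenOCD window).
    """
    addr = f"0x{block_addr:08x}"
    return [
        GDB, "-batch", vmlinux,
        "-ex", f"target remote {host}:{port}",  # connect to OpenOCD's gdb server
        "-ex", f"x/16xb {addr}",                # read the block through gdb
        "-ex", f"monitor mdw {addr} {words}",   # read it through OpenOCD directly
        "-ex", "bt",                            # check that other commands work
        "-ex", "detach",                        # disconnect from OpenOCD cleanly
    ]


if __name__ == "__main__":
    argv = manual_check_argv("vmlinux", "localhost", 3333, 0xbf1e8940)
    print(shlex.join(argv))  # inspect the command before running it
    # To run it against the live target: subprocess.run(argv)
```

    Printing the command first makes it easy to paste into a terminal on the Linux machine; comparing its output with the VisualKernel GDB log then shows whether gdb, OpenOCD, or both read zeros.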

    #20365

    maiorfi
    Participant

    Hi. Unfortunately, the manual setup also reads only zeros when I run the “x/16xb” gdb command. I also tried some “mon mdw <address> <count>” commands, but the module list block contains nothing but zeros:

    0xbf1e4940: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4960: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4980: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e49a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e49c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e49e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4a00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4a20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4a40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4a60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4a80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4aa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    0xbf1e4ac0: 00000000 00000000 00000000 00000000

    To make sure JTAG is working as expected, I also ran mon mdw at a different address I spotted in the gdb trace, and there I do get non-zero content:

    0xc001d4d4: e1a00000 e1a00000 e1a00000 1afffff7 e12fff1e ee070f3a e2800020 e2511020
    0xc001d4f4: 8afffffb e12fff1e e3a02000 e5911158 e3800008 ee072fd5 ee072f9a ee020f10
    0xc001d514: ee0d1f30 e12fff1e 00000000 00000040 00000008 0000000c 00000004 00000000
    0xc001d534: 00000000 0000004c 00000000 00000040 00000000 0000000c
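    Longer dumps like these are easier to compare programmatically. A small Python sketch, assuming the OpenOCD mdw output format shown above ("0xADDRESS:" followed by 32-bit hex words); parse_mdw and first_nonzero are hypothetical helpers. For what it's worth, the second dump looks like ARM code (e1a00000 is mov r0, r0 and e12fff1e is bx lr), which is consistent with JTAG reads of kernel text working while the module list block reads back as zeros.

```python
def parse_mdw(dump: str) -> dict:
    """Map each address to its 32-bit word from OpenOCD 'mdw' output (format assumed)."""
    words = {}
    for line in dump.splitlines():
        if ":" not in line:
            continue  # ignore anything that is not an "address: words" row
        addr_text, values = line.split(":", 1)
        base = int(addr_text.strip(), 16)  # int() accepts the "0x" prefix with base 16
        for i, token in enumerate(values.split()):
            words[base + 4 * i] = int(token, 16)  # each word is 4 bytes further on
    return words


def first_nonzero(words: dict):
    """Return the lowest address holding a non-zero word, or None if all are zero."""
    hits = [addr for addr, value in words.items() if value != 0]
    return min(hits) if hits else None
```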

    Is there any other board I could try? I’m using an RPi 1 because the RPi 3 does not seem to be supported by my J-Link (V8), but I have plenty of other ARM-based embedded Linux devices I could try.

    Thanks!

    #20368

    support
    Keymaster

    Hi,

    This could be caused by OpenOCD not handling some memory-protection-related issues correctly. As a quick workaround, please try disabling the “use helper module for listing symbols” option to fall back to the slower logic that scans the regular kernel symbols.

    BTW, as VisualKernel uses OpenOCD for JTAG debugging, it should work with J-Link (we did a quick test with a Raspberry Pi 3 and a J-Link 9.1 and could not find any problems). Another alternative would be to get a fairly inexpensive Olimex ARM-USB-OCD-H probe. It essentially gives OpenOCD low-level control over the JTAG protocol, and hence should work with any target that OpenOCD supports.

