Why is the LinuxKernelDebugHelper module not getting built correctly?


This topic contains 10 replies, has 2 voices, and was last updated by  support 1 month ago.

Viewing 11 posts - 1 through 11 (of 11 total)
  • #26794

    Goerge255
    Participant

    I am at a stage where I can successfully:

    1. Build the “Hello World” sample
    2. Run this sample on the debuggee/target with a breakpoint enabled
    3. The “output” tab displays the printk output
    4. The debugger correctly stops at the breakpoint in the LinuxKernelModule1_init() function
    5. I can single step with StepOver (F10) and Step Into (F11)

    However, I get the following error message in the “GDB Session” tab after the status “Building LinuxKernelDebugHelper module” ends:

    I click on “Yes” and proceed…
    …after which, the following message appears in the “GDB Session” tab/subwindow:

    Cannot determine the address of the kernel debug block address
    Failed to build/install the kernel debug helper module. Enumerating modules and loading symbols will be very slow.
    Failed to query the address of next module. Your module list can be incomplete.
    Cannot query module base address for module at 0xa000019
    Warning: LinuxKernelModule1.ko has no .text section.

    Note that the status “Building LinuxKernelDebugHelper module” appears every time I run the debugger.

     

    QUESTION: Why is the “Kernel Debug Helper Module” not getting built?

    I’m running Debian v10 (armhf) as the debuggee/target and my Debug settings look like this:

    P.S.

    What is this option “Run a GDB stub:” ?
    Should I put anything in it ?

    • This topic was modified 1 month, 2 weeks ago by  Goerge255.
    #26801

    Goerge255
    Participant

    If you look at this message box below, you can see that “Detected kernel version:” is followed by GARBAGE CHARACTERS, which is an indicator that something has gone awry.

    I found some goodies displayed in the console.  See the log below.

    This log indicates that the Debug Helper's Loadable Kernel Module (LKM) has been built, but not correctly.

    Specifically, the log shows that the Debug Helper's LKM crashed inside its LinuxKernelDebugHelper_init() function. It called the kernel function tracepoint_probe_register() with a bad struct tracepoint *tp pointer, because it could not find the kernel symbol __tracepoint_module_load in /proc/kallsyms.
    See this article about this undefined kernel symbol.
    As a result, the kernel function tracepoint_probe_register_prio() dereferenced this bad pointer 48 (0x30) bytes into the function, which caused the crash.

    The log:
    [ 329.695667] LinuxKernelDebugHelper: loading out-of-tree module taints kernel.
    [ 329.815043] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
    [ 329.819438] pgd = 7135cf40
    [ 329.820714] [0000000c] *pgd=483f6003, *pmd=00000000
    [ 329.823712] Internal error: Oops: 206 [#1] SMP ARM
    [ 329.826062] Modules linked in: LinuxKernelDebugHelper(O+) evdev ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb virtio_net net_failover virtio_blk failover virtio_mmio virtio_ring virtio
    [ 329.835266] CPU: 0 PID: 1831 Comm: insmod Tainted: G O 4.19.0-6-armmp-lpae #1 Debian 4.19.67-2+deb10u2
    [ 329.839768] Hardware name: Generic DT based system
    [ 329.842623] PC is at tracepoint_probe_register_prio+0x30/0x2e0
    [ 329.845218] LR is at __schedule+0x310/0xa38
    [ 329.847016] pc : [] lr : [] psr: 60010113
    [ 329.849655] sp : c814dd20 ip : c814dc68 fp : c814dd5c
    [ 329.851847] r10: 0000000a r9 : 00000000 r8 : bf03b080
    [ 329.854055] r7 : bf03945c r6 : bf03b000 r5 : 00000000 r4 : bf03b2c0
    [ 329.856788] r3 : c8403480 r2 : 00000000 r1 : 00000000 r0 : 00000001
    [ 329.859676] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
    [ 329.862725] Control: 30c5387d Table: 4830d7c0 DAC: fffffffd
    [ 329.865232] Process insmod (pid: 1831, stack limit = 0x943ad6d6)
    [ 329.867919] Stack: (0xc814dd20 to 0xc814e000)
    [ 329.869906] dd20: c0c4a660 c0c47e3c c1440940 c049b5f8 c814dd5c bf03b2c0 bf03b2c0 bf03b000
    [ 329.873420] dd40: 00000000 bf03b080 c814df38 c1405dd8 c814dd6c c814dd60 c0554d40 c0554a50
    [ 329.877228] dd60: c814dd8c c814dd70 bf03e09c c0554d30 bf03b080 bf03e000 c1405dd8 00000000
    [ 329.880731] dd80: c814de04 c814dd90 c04035dc bf03e00c eff5c928 c0c47e64 00000000 00000002
    [ 329.884200] dda0: c814ddc4 c814ddb0 c0c47e64 c04f7370 006000c0 c064c060 c814de04 c814ddc8
    [ 329.887725] ddc0: c064c060 c0664818 00000001 c062c8f0 0000000c c0521b44 c8012c80 15d5c67e
    [ 329.891278] dde0: 00000002 bf03b080 00000002 c83cb180 c83cb740 bf03b080 c814de2c c814de08
    [ 329.894764] de00: c0521b80 c0403598 c814de2c c814de18 00000002 00000002 c83cb700 c83cb740
    [ 329.898296] de20: c814df14 c814de30 c0524334 c0521b18 bf03b08c 00007fff bf03b080 c052105c
    [ 329.901811] de40: c06711b4 c1405dd8 c0f8cae4 c0f8caf8 c0f8cad8 c1045f18 c0e09ebc bf03b1b0
    [ 329.905465] de60: bf03b27c 00000000 bf03b194 c05202b0 bf03b0c8 c83cb708 c1405dd8 c1061874
    [ 329.908982] de80: c814df30 000312fc c814dee4 c814de98 00000000 00000000 00000000 00000000
    [ 329.912548] dea0: 00000000 00000000 6e72656b 00006c65 00000000 00000000 00000000 00000000
    [ 329.916090] dec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [ 329.919553] dee0: 00000000 15d5c67e 7fffffff c1405dd8 00000000 00000003 005027e0 c0401324
    [ 329.923014] df00: c814c000 0000017b c814dfa4 c814df18 c052489c c0521e28 7fffffff 00000000
    [ 329.926489] df20: 00000003 00000002 c1405dd8 f0bd6000 000312fc 00000000 f0bd69de f0bd6b40
    [ 329.929892] df40: f0bd6000 000312fc f0c06a64 f0c06850 f0bfc3bc 00003000 00003170 00000000
    [ 329.933389] df60: 00000000 00000000 00001cac 00000034 00000035 0000001b 0000001f 00000014
    [ 329.936853] df80: 00000000 15d5c67e b3e83700 00000000 00000000 0000017b 00000000 c814dfa8
    [ 329.940365] dfa0: c0401120 c05247e8 b3e83700 00000000 00000003 005027e0 00000000 be976b58
    [ 329.943885] dfc0: b3e83700 00000000 00000000 0000017b 01ad8118 00000000 be976cd8 00000000
    [ 329.947379] dfe0: be976b08 be976af8 004fae41 b6d8ed92 40030030 00000003 00000000 00000000
    [ 329.951807] [] (tracepoint_probe_register_prio) from [] (tracepoint_probe_register+0x1c/0x20)
    [ 329.956839] [] (tracepoint_probe_register) from [] (LinuxKernelDebugHelper_init+0x9c/0x1000 [LinuxKernelDebugHelper])
    [ 329.962792] [] (LinuxKernelDebugHelper_init [LinuxKernelDebugHelper]) from [] (do_one_initcall+0x50/0x21c)
    [ 329.967725] [] (do_one_initcall) from [] (do_init_module+0x74/0x244)
    [ 329.971243] [] (do_init_module) from [] (load_module.constprop.14+0x2518/0x27f0)
    [ 329.975152] [] (load_module.constprop.14) from [] (sys_finit_module+0xc0/0x110)
    [ 329.978966] [] (sys_finit_module) from [] (ret_fast_syscall+0x0/0x4c)
    [ 329.982512] Exception stack(0xc814dfa8 to 0xc814dff0)
    [ 329.984682] dfa0: b3e83700 00000000 00000003 005027e0 00000000 be976b58
    [ 329.988121] dfc0: b3e83700 00000000 00000000 0000017b 01ad8118 00000000 be976cd8 00000000
    [ 329.991641] dfe0: be976b08 be976af8 004fae41 b6d8ed92
    [ 329.994098] Code: e1a0a003 e1a07001 e1a09002 eb1bd295 (e595300c)
    [ 329.997746] ---[ end trace 7aceba8a4492f3df ]---
    [ 338.455862] LinuxKernelModule1: Hello, world!
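
    The “PC is at” line in the log above uses the kernel's symbol+offset/size notation: the faulting instruction lies 0x30 bytes into tracepoint_probe_register_prio, whose total size is 0x2e0 bytes. Below is a minimal sketch in user-space C of how such a location string can be split into its parts. The struct oops_location type and the parse_oops_location() helper are hypothetical names introduced only for this illustration.

```c
#include <stdio.h>

/* Hypothetical container for one "symbol+0xoffset/0xsize" location. */
struct oops_location {
    char symbol[128];      /* function name, e.g. tracepoint_probe_register_prio */
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
};

/* Parses text such as "tracepoint_probe_register_prio+0x30/0x2e0".
 * Returns 0 on success, -1 if the text does not match the pattern. */
static int parse_oops_location(const char *text, struct oops_location *loc)
{
    /* %127[^+] reads everything up to the '+'; %lx accepts the 0x prefix. */
    int fields = sscanf(text, "%127[^+]+%lx/%lx",
                        loc->symbol, &loc->offset, &loc->size);
    return fields == 3 ? 0 : -1;
}
```

    For the string taken from the log, this yields the symbol tracepoint_probe_register_prio, offset 48 and size 736.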

     

    Below is the printout of my kernel symbols containing the keyword “tracepoint”

    root@debian:~# cat /proc/kallsyms | grep "tracepoint"

    c0554a44 T tracepoint_probe_register_prio
    c0554d24 T tracepoint_probe_register
    c0554d44 T tracepoint_probe_unregister
    c0554f44 T register_tracepoint_module_notifier
    c0554fc0 T unregister_tracepoint_module_notifier
    c055503c t tracepoint_module_notify
    c055521c T for_each_kernel_tracepoint
    c0569ef8 T tracepoint_printk_sysctl
    c0585004 T bpf_find_raw_tracepoint
    c05962f8 t bpf_raw_tracepoint_release
    c05964c8 t bpf_raw_tracepoint_open
    c1225728 t init_tracepoints
    c1225fe8 t set_tracepoint_printk

    • This reply was modified 1 month, 2 weeks ago by  Goerge255.
    #26859

    support
    Keymaster

    The message about mismatching kernel symbols is shown when VisualKernel tries to read the kernel version from the target memory, using the address stored in the vmlinux file. If the result doesn’t match the version stored in the vmlinux file itself, the warning is shown. If the debugging works despite ignoring the warning, most likely the kernel contains some patches that remove the version information from RAM. In that case, the warning can be safely ignored (e.g. it does trigger incorrectly for STM32MP1 devices).

    The LinuxKernelDebugHelper module uses tracepoints to trace the loading/unloading of kernel modules. If the module loading tracepoints are disabled in the Linux kernel configuration, LinuxKernelDebugHelper will indeed not work. This can be worked around in one of two ways:

    1. Simply disabling the LinuxKernelDebugHelper module via VisualKernel Project Properties. This will revert to gathering module information by querying the module-related structures in the debugger. This is slower than LinuxKernelDebugHelper, but will not require any configuration changes.
    2. Enabling the module loading tracepoints via the kernel configuration and rebuilding the kernel. This will enable the missing __tracepoint_module_load symbol, allowing LinuxKernelDebugHelper to process module load events directly on the target, reducing the debug overhead.
    #26861

    Goerge255
    Participant

    2. Enabling the module loading tracepoints via the kernel configuration and rebuilding the kernel. This will enable the missing __tracepoint_module_load symbol, allowing LinuxKernelDebugHelper to process module load events directly on the target, reducing the debug overhead.

    I disagree.
    In this scenario, the missing __tracepoint_module_load symbol is caused by the missing kernel configuration option CONFIG_KALLSYMS_ALL.
    This symbol is declared by the statement extern struct tracepoint __tracepoint_##name; in the macro __DECLARE_TRACE defined in tracepoint.h, and it is allocated in the initialized data section.

    According to this documentation, when the kernel is compiled without the option CONFIG_KALLSYMS_ALL, only symbols allocated in the text and inittext sections appear in /proc/kallsyms. Thus, the symbol __tracepoint_module_load will not appear in /proc/kallsyms even if the kernel is compiled with CONFIG_TRACEPOINTS=y, because it is allocated in the initialized data section rather than in the text or inittext section.

    This is easy to verify by issuing the following command:
    root@debian:~# cat /proc/kallsyms | grep "__tracepoint_module_load"
    c16f7b20 D __tracepoint_module_load

    Note the capital letter “D”, which means that the symbol is located in the  initialized data section – not in the text section.
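
    The type letter can also be checked programmatically. Below is a minimal sketch in user-space C, assuming each /proc/kallsyms line has the form "address type name" as in the output above. The helpers kallsyms_line_type() and is_text_symbol() are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Parses one /proc/kallsyms line ("address type name [module]").
 * Returns the type letter if the symbol name equals `wanted`, else 0. */
static char kallsyms_line_type(const char *line, const char *wanted)
{
    unsigned long addr;
    char type;
    char name[128];

    if (sscanf(line, "%lx %c %127s", &addr, &type, name) != 3)
        return 0;
    return strcmp(name, wanted) == 0 ? type : 0;
}

/* Returns 1 for text-section symbols ('T' global, 't' local), else 0. */
static int is_text_symbol(char type)
{
    return type == 'T' || type == 't';
}
```

    Applied to the line shown above, kallsyms_line_type() returns 'D' and is_text_symbol() returns 0, which matches the observation that the symbol lives in the data section rather than in the text section.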

     

    To add insult to injury, the standard distribution of Debian Linux comes with CONFIG_KALLSYMS_ALL disabled!

    Does VisualKernel’s documentation contain any guidelines about the configuration options that the Linux kernel should be compiled with in order to work optimally with VisualKernel?

     

    #26862

    Goerge255
    Participant

    The message about mismatching kernel symbols is shown when VisualKernel tries to read the kernel version from the target memory, using the address stored in the vmlinux file. If the result doesn’t match the version stored in the vmlinux file itself, the warning is shown.

    I disagree in this scenario.
    The garbage characters are caused by the LinuxKernelDebugHelper.ko module triggering an oops (crashing the system) because it is buggy.

    The bug is in the source file LinuxKernelDebugHelper_main.c in the function LinuxKernelDebugHelper_init in the line:
    tracepoint_probe_register((struct tracepoint *)kallsyms_lookup_name("__tracepoint_module_load"), hook_module_load, NULL);
    …which over-optimistically assumes that kallsyms_lookup_name("__tracepoint_module_load") always returns a valid address.  Since this is not always true, often the function called by tracepoint_probe_register dereferences a NULL pointer in the kernel code, which crashes the system and generates the garbage characters.

    As I noted in the previous post, the __tracepoint_module_load symbol does not appear in /proc/kallsyms even in the current standard Debian v10 distro, because the kernel in this distro is compiled without the option CONFIG_KALLSYMS_ALL by default!

    To prevent the  LinuxKernelDebugHelper.ko module from crashing on Debian (and probably many other distros), the offending line should be changed to:

    /* tp must be declared at the top of LinuxKernelDebugHelper_init(): struct tracepoint *tp; */
    tp = (struct tracepoint *)kallsyms_lookup_name("__tracepoint_module_load");
    if (!tp) {
        /* Symbol not listed in kallsyms, e.g. the kernel was built without CONFIG_KALLSYMS_ALL. */
        pr_err("LinuxKernelDebugHelper: __tracepoint_module_load not found; check the kernel configuration\n");
        return -EFAULT;
    }
    tracepoint_probe_register(tp, hook_module_load, NULL);

     

     

    #26918

    Goerge255
    Participant

    I asked this question before but it remained unanswered for over a week.

    “Does VisualKernel’s documentation contain any guidelines about the configuration options that the Linux Kernel should be compiled with, in order to work with all features of Visual Kernel  optimally ?”

     

    I already figured out that:

    CONFIG_KALLSYMS
    CONFIG_KALLSYMS_ALL
    CONFIG_TRACEPOINTS

    …need to be configured to “yes” or VK’s Debug Helper’s module will not work …or will crash the kernel.
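
    One way to check such options on a given target is to look for them in the kernel configuration file. Below is a minimal sketch in user-space C, assuming the configuration text (for example the contents of a /boot/config-* file, where the distro ships one) has already been read into a string. config_option_enabled() is a hypothetical helper and not part of VisualKernel.

```c
#include <string.h>

/* Returns 1 if `config` (the text of a kernel configuration file)
 * contains a line that is exactly "<option>=y", else 0.
 * Lines such as "# CONFIG_FOO is not set" therefore count as disabled. */
static int config_option_enabled(const char *config, const char *option)
{
    size_t optlen = strlen(option);
    const char *line = config;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* The whole line must match, so CONFIG_KALLSYMS=y does not
         * satisfy a query for CONFIG_KALLSYMS_ALL. */
        if (len == optlen + 2 &&
            strncmp(line, option, optlen) == 0 &&
            line[optlen] == '=' && line[optlen + 1] == 'y')
            return 1;

        line = eol ? eol + 1 : NULL;
    }
    return 0;
}
```

    Matching the full line rather than a prefix matters here, because CONFIG_KALLSYMS is itself a prefix of CONFIG_KALLSYMS_ALL.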

    WHAT ELSE needs to be configured in the kernel ?  ….anything on that list ?

    • This reply was modified 1 month ago by  Goerge255.
    #26921

    support
    Keymaster

    We have quickly rechecked the latest Debian 10.2 and the default kernel shipped with it is built with the CONFIG_KALLSYMS_ALL option.

    Most likely, you are using an unsupported Linux distro, which would explain the configuration errors. You can find a list of supported distros here: https://visualkernel.com/history/.
    If you are not sure, please share the URL to the ISO file you are using and we can check if it is supported.

    Also, VisualKernel would normally detect tracepoint issues at compile time and would not attempt to load LinuxKernelDebugHelper. However, if you are using an unsupported distro that was never tested with VisualKernel, this check may indeed not work as expected. Either way, it is highly unlikely that dereferencing a null pointer would replace the kernel version string with random characters, while keeping the system otherwise usable.

    #26922

    Goerge255
    Participant

    If you are not sure, please share the URL to the ISO file you are using and we can check if it is supported.

    I installed Debian using their standard debian-installer without a graphical interface (text-only mode):
    http://http.us.debian.org/debian/dists/buster/main/installer-armhf/current/images/netboot/

    Either way, it is highly unlikely that dereferencing a null pointer would replace the kernel version string with random characters, while keeping the system otherwise usable.

    Dereferencing a bad pointer in kernel code is a catastrophe that can cause any side effect. Random characters after a kernel crash are not surprising…

    Also, as far as “keeping the system otherwise usable” – I have never reported that.
    Instead, I’ve reported that the gdb still remains functional in the kernel after the oops and allows for single stepping.

    Regardless how unlikely, it happens nonetheless.
    Just look at the oops log I posted before. This happened because CONFIG_KALLSYMS_ALL was not defined, and consequently the symbol __tracepoint_module_load was not available, either.

    The obvious bug in LinuxKernelDebugHelper_main.c, which I have documented, and the oops trace are not something that I have hallucinated. We are all programmers here – please do not treat me like a luser.

     

    PLEASE provide guidelines about the configuration options that the Linux kernel should be compiled with in order to work optimally with all features of VisualKernel, e.g. anything else on that list.

    • This reply was modified 1 month ago by  Goerge255.
    #26924

    support
    Keymaster

    Sorry, VisualKernel does not support the generic armhf build of Debian as it is not as popular as other distros.

    VisualKernel does support the following distros for desktop (x86 and amd64 targets):

    • Ubuntu
    • Debian
    • Mint
    • CentOS
    • Fedora

    See the history page for a list of specific versions supported by a particular VisualKernel release.

    VisualKernel also supports the following ARM targets:

    Generally, you should be able to target an unsupported distro as well, however you would need to research the compatible flags and settings on your side. As directly supporting each distro requires considerable non-trivial troubleshooting, we are only able to offer out-of-the-box support for a handful of most popular distributions.

    Regarding LinuxKernelDebugHelper, we will try to address this in one of the next releases of VisualKernel, however as the issue is only triggered by a specific kernel configuration on an unsupported distro, this has a relatively low priority.

    #26925

    Goerge255
    Participant

    …you would need to research the compatible flags and settings on your side.

    Are you serious ?!

    Do you expect me to read through the source code of VisualKernel’s components, document your source code and on that basis determine what Linux Kernel configuration options it requires ? ..or do you expect me to brute-force Kernel configurations options until VK’s components stop crashing my system and start working properly?
    I can be expected to read and analyze my source code or even Linux Kernel’s source code ….BUT NOT to read and analyze your source code if I am going to be paying for it !

    It is a grave omission that VisualKernel does not offer public documentation plainly listing all required Linux Kernel configuration options.  These options are COMMON to all distros because they all use the same kernel written by Linus Torvalds.

    If such a list were available, then any client of yours could compile the Linux kernel so that it is compatible with VisualKernel, and the onus to do so would be on them, REGARDLESS of the specific distro’s default settings. Also, people often have custom kernel configurations, which can be incompatible with VisualKernel when they don’t know which options are required by VK and which are not allowed. This incompatibility can happen even with custom kernels in the distros which you officially support.

    Sincerely!

    • This reply was modified 1 month ago by  Goerge255.
    #26928

    support
    Keymaster

    The mechanisms used by VisualKernel to manage the modules vary between different Linux kernel versions. To make things more complicated, some of the distros have previously backported some trace-related changes into their kernel fork (so that the same kernel version used by different distros would require a different configuration).

    Furthermore, those options are also different between hardware platforms, hence a configuration that works for one distro/kernel version will very likely not work with others (we have previously encountered that).

    This is exactly why we chose to support a handful of most popular distros and configurations (i.e. test everything on our side and make sure VisualKernel sets all necessary parameters automatically) and provide sources for all low-level components so that they could be tweaked to target an unsupported configuration.

    We understand you are using an unsupported distro and expect the level of support we would normally provide for a supported distro. Unfortunately, this is not something we can offer at the price of an off-the-shelf product, as it is only possible to directly support popular distros/targets that are relevant to many users at the same time.
