Problem debugging custom kernel
- This topic has 13 replies, 2 voices, and was last updated 4 years, 10 months ago by support.
January 15, 2020 at 10:52 #27065 Mischo, Participant
Hello,
I’m trying to debug a custom kernel for i.MX7 over Ethernet. The kernel comes from the fsl-community-bsp BSP, with sources and configuration generated by Yocto.
I copied the kernel sources to another folder and set up a VisualKernel advanced kernel project. For the build I’m using the toolchain from the Yocto-generated SDK for the target image. The build finished without errors.
Next, I deployed the image to a running device, originally flashed with the Yocto result image, by replacing zImage in the boot partition. The device boots correctly and uname -v displays the correct build time and version.
When I attach the debugger and try to unload a module with a breakpoint set in print_modules(), as shown in the tutorial, I get a “Received a SIGTRAP: Trace/breakpoint trap” error from Visual Studio 2015 Enterprise. Output from the debug console:
LinuxKernelDebugHelper: loading out-of-tree module taints kernel.
netpoll: kgdboe: local IP 10.0.7.134
kgdboe: single-core mode enabled. Shutting down all cores except #0. This is slower, but safer.
kgdboe: you can try using multi-core mode by specifying the following argument:
    insmod kgdboe.ko force_single_core = 0
CPU1: shutdown
kgdboe: Successfully initialized. Use the following gdb command to attach:
    target remote udp:10.0.7.134:31337
The same thing happened when I tried to debug an image built directly by Yocto, loading its symbols with the VisualKernel “Quick Debug Linux Kernel” option, probably at the point where it should hit the automatic breakpoint after attach.
What could be the problem?
Thanks in advance.
January 15, 2020 at 22:41 #27071 support, Keymaster
Hi,
This might indicate that the KGDBoE module (debugging over Ethernet) is not fully compatible with your target, or it could indicate a symbol problem.
If you could attach a gdb log from the debug session, we should be able to tell what is going on. Please also consider debugging the target via JTAG, as it is generally less fragile.
January 16, 2020 at 11:07 #27077 Mischo, Participant
Thanks for the quick response. The log and .config file are attached. KGDBoE shows no errors during the build and appears to be installed successfully. In the meantime, I’m trying to get debugging running over JTAG using an LPC-Link 2 probe with CMSIS-DAP firmware.
January 16, 2020 at 12:10 #27079 Mischo, Participant
January 16, 2020 at 12:38 #27082 Mischo, Participant
A kernel module project running on the kernel built from VisualKernel, with “Kernel type” set to it (so the symbols should match), throws the same error. Output from the debug UART on the device:
kgdboe: loading out-of-tree module taints kernel.
netpoll: kgdboe: local IP 10.0.7.134
kgdboe: single-core mode enabled. Shutting down all cores except #0. This is slower, but safer.
kgdboe: you can try using multi-core mode by specifying the following argument:
    insmod kgdboe.ko force_single_core = 0
CPU1: shutdown
KGDB: Registered I/O driver kgdboe
kgdboe: Successfully initialized. Use the following gdb command to attach:
    target remote udp:10.0.7.134:31337
LinuxKernelModule1: module license 'Proprietary' taints kernel.
Disabling lock debugging due to kernel taint
LinuxKernelModule1: Hello, world!
LinuxKernelModule1: Goodbye, world!
The log is attached.
January 16, 2020 at 14:57 #27084 Mischo, Participant
I tried to debug a kernel module project (I assume it is easier for a first debugger setup than kernel debugging) with J-Link using the OpenOCD method, and it fails with a “Failed to connect to debug stub” error. Test connection passes for the J-Link. The GDB stub log and a screenshot of the VisualKernel settings are attached.
January 16, 2020 at 20:09 #27095 support, Keymaster
Hi,
Thanks for providing the detailed description.
It looks like the KGDBoE-based debug session works. The stop happens inside the entry-common.S file and is likely by design:
[ 140315 ms] ~"284\t\tb\tret_slow_syscall\n"
[ 140315 ms] *stopped,reason="signal-received",signal-name="SIGTRAP",signal-meaning="Trace/breakpoint trap",frame={addr="0x80108504",func="sys_call_table",args=[],file="/home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd/kernel-source/arch/arm/kernel/entry-common.S",fullname="/home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd/kernel-source/arch/arm/kernel/entry-common.S",line="284"},thread-id="121",stopped-threads="all"
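For anyone reading such logs by hand, the stop record above can be picked apart with a few lines of Python. This is only a rough sketch: it flattens the nested frame={...} tuple into one dictionary, which is enough for a quick look but is not a full GDB/MI parser. The function name and the SAMPLE constant are made up for illustration, and the fullname field is left out of the sample for brevity.

```python
import re

# Stop record from the log above, split across lines only for readability.
SAMPLE = (
    '[ 140315 ms] *stopped,reason="signal-received",signal-name="SIGTRAP",'
    'signal-meaning="Trace/breakpoint trap",frame={addr="0x80108504",'
    'func="sys_call_table",args=[],'
    'file="/home/build/fsl-community-bsp/buildOutput/tmp/work-shared/'
    'imx7dsabresd/kernel-source/arch/arm/kernel/entry-common.S",'
    'line="284"},thread-id="121",stopped-threads="all"'
)


def parse_stopped_record(line):
    """Return the key="value" fields of a GDB/MI *stopped record as a flat dict.

    Returns None when the line is not a *stopped record.
    """
    if "*stopped" not in line:
        return None
    # Every result is key="value"; nested tuples such as frame={...} are
    # flattened, and empty lists such as args=[] are simply skipped.
    return dict(re.findall(r'([\w-]+)="((?:[^"\\]|\\.)*)"', line))


fields = parse_stopped_record(SAMPLE)
print(fields["signal-name"], fields["file"].rsplit("/", 1)[-1], fields["line"])
# prints: SIGTRAP entry-common.S 284
```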
Normally, VisualKernel would open the entry-common.S file in Visual Studio once that stop happened; however, depending on how you imported the kernel, it may not know where to locate the file. If this is the case, you can set up a manual mapping between the paths reported by gdb (e.g. /home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd) and the paths on the Windows machine via VisualKernel Project Properties -> Path Mapping.
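To make the idea of a path mapping concrete, here is a small Python sketch of the prefix substitution it amounts to. It is not how VisualKernel implements the feature; the function name is invented, and the Windows folder C:\kernels\imx7 is a hypothetical location for the copied sources.

```python
from pathlib import PurePosixPath, PureWindowsPath


def map_gdb_path(gdb_path, remote_prefix, local_prefix):
    """Translate a path reported by gdb into the matching Windows path.

    Returns None when gdb_path does not live under remote_prefix.
    """
    try:
        relative = PurePosixPath(gdb_path).relative_to(remote_prefix)
    except ValueError:
        return None
    # Re-root the remaining path components under the Windows-side folder.
    return str(PureWindowsPath(local_prefix, *relative.parts))


# Prefix taken from the gdb log; the Windows folder is a hypothetical example.
REMOTE = "/home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd"
LOCAL = r"C:\kernels\imx7"

print(map_gdb_path(REMOTE + "/kernel-source/arch/arm/kernel/entry-common.S", REMOTE, LOCAL))
```

A path outside the mapped prefix returns None, which corresponds to the case where the debugger cannot locate the source file.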
Regarding JTAG, most likely your firewall is blocking the connection (gdb running on the build machine needs to connect to OpenOCD running on the Windows machine), or the build machine is not able to resolve the Windows machine’s host name due to missing DNS entries. Please double-check the firewall settings and the gdb log (search the gdb log for “remote” to find out the host/port used by VisualKernel). You can override the “remote” command via VisualKernel Project Properties -> Debug Settings -> Advanced.
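A quick way to tell the two failure modes apart (blocked port versus unresolvable host name) is to try the TCP connection by hand from the build machine. The sketch below assumes Python 3 is available there. The function name is made up, and the host and port in the commented usage are placeholders: take the real values from the “remote” line in the gdb log.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Try a TCP connection and report (ok, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except socket.gaierror as exc:
        # Host name could not be resolved (e.g. missing DNS entry).
        return False, f"name resolution failed: {exc}"
    except OSError as exc:
        # Refused, timed out, or otherwise blocked (e.g. by a firewall).
        return False, f"connection failed: {exc}"


# Hypothetical usage; "windows-pc" and 3333 are assumptions, not values
# taken from this thread:
# ok, reason = can_reach("windows-pc", 3333)
# print("reachable" if ok else reason)
```

If name resolution is the problem, retrying with the Windows machine's IP address instead of its host name should succeed.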
January 16, 2020 at 21:58 #27105 Mischo, Participant
Thanks for the response.
I forgot to mention that the debugger shows the entry-common.S file in the editor at the correct line, so the symbols are probably working. The problem is that I cannot continue: when I try to, it immediately stops again on the same line (284), and I have no breakpoint there. The code looks like this:
__sys_trace_return_nosave:
    enable_irq_notrace
    mov r0, sp
    bl  syscall_trace_exit
    b   ret_slow_syscall
It breaks on b ret_slow_syscall.
I’m attaching the whole file (renamed, because the forum does not allow uploading it under its original name).
For JTAG, I will try adjusting the network settings on the Windows machine.
January 16, 2020 at 22:38 #27109 support, Keymaster
Thanks for the clarification. It looks like some patches or configuration in the kernel you are using make it incompatible with KGDBoE.
Generally KGDBoE is less reliable than other debug methods as it relies on several assumptions about the network driver implementation that don’t always hold.
If your board has JTAG pins available, using JTAG instead of KGDBoE should result in a much more consistent and reliable experience. Let us know if you need help understanding the connectivity issues between gdb and OpenOCD.
January 17, 2020 at 13:59 #27114 Mischo, Participant
Thanks.
I got past the GDB error for the JTAG connection (it was a GDB-side problem on the build machine). Now VisualKernel attaches to the kernel module without any error notifications, but it does not hit the breakpoint in the LinuxKernelModule1_init() function, even though “LinuxKernelModule1: Hello, world!” is printed correctly to the debug UART. The breakpoint turns transparent, as it does in regular Visual Studio debugging when no symbol files are loaded, but it would be really strange if VisualKernel could not load symbols for a module it just built and deployed to the device. I’m attaching the log; line 487 was the last one GDB printed before I killed the debug session.
January 17, 2020 at 17:52 #27118 Mischo, Participant
I tried changing “Obtain module information via:” to “Optimized helper module”. It now at least detects the module in the “Modules” tab, and the symbols tab points to the correct object, but I get an error in the UART debug console and the module does not run. Without the optimized helper module, I cannot see the module in the “Modules” tab (lsmod displays it correctly on the device), so that is probably why the debugging symbols were not loaded in my previous post.
LinuxKernelModule1: module license 'Proprietary' taints kernel.
Disabling lock debugging due to kernel taint
Unhandled prefetch abort: breakpoint debug exception (0x002) at 0x7f02a16c
Internal error: : 2 [#1] PREEMPT SMP ARM
Modules linked in: LinuxKernelModule1(PO+) LinuxKernelDebugHelper(O) ov5640_camera_mipi mxc_mipi_csi evbug mx6s_capture
CPU: 1 PID: 503 Comm: insmod Tainted: P O 4.9.67-fslc+g953c6e30c970 #23
Hardware name: Freescale i.MX7 Dual (Device Tree)
task: a8460580 task.stack: a8606000
PC is at ModuleEventCallback+0x0/0x18 [LinuxKernelDebugHelper]
LR is at ModuleNotificationCallback+0x144/0x164 [LinuxKernelDebugHelper]
pc : [<7f02a16c>] lr : [<7f02a40c>] psr: 600b0013
sp : a8607e30 ip : 00000000 fp : a8607e4c
r10: a8880da4 r9 : 00000000 r8 : 00000001
r7 : 00000000 r6 : 7f02e0c0 r5 : 7f02aa40 r4 : 7f02a7f4
r3 : 7f02a16c r2 : 00000054 r1 : 5456454d r0 : 00000001
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c53c7d Table: a887806a DAC: 00000051
Process insmod (pid: 503, stack limit = 0xa8606210)
Stack: (0xa8607e30 to 0xa8608000)
7e20:                                     a93673c8 00000001 a8880d80 7f02e0c0
7e40: a8607e5c a8607e50 7f02a448 7f02a2d4 a8607f1c a8607e60 801b21c0 7f02a438
7e60: 7f02e0cc 00007fff 7f02e0c0 801ae804 0000f1e8 00000000 80d567d4 80d56910
7e80: 80d567fc 7f02e0cc a8607f44 7f02e0cc 80b039e0 7f02e2a4 7f02e108 024000c0
7ea0: a8607eec a8607eb0 80259170 80258fcc c0ab1000 00000000 00006c65 00000000
7ec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7ee0: 00000000 00000000 00000000 00000000 7fffffff 00000000 00000003 0045c378
7f00: 0000017b 80108504 a8606000 00000000 a8607fa4 a8607f20 801b260c 801b00f0
7f20: 7fffffff 00000000 00000003 a8607f38 8021e104 c0ab1000 0000f1e8 00000000
7f40: 00000000 c0ab1000 0000f1e8 c0abfbf8 c0abfa70 c0abcba4 000002c0 00000390
7f60: 00000000 00000000 00000000 00000490 00000023 00000024 0000000e 00000012
7f80: 00000008 00000000 00000000 00000000 00c7d190 00000000 00000000 a8607fa8
7fa0: 80108340 801b2564 00000000 00c7d190 00000003 0045c378 00000000 7edfdbfc
7fc0: 00000000 00c7d190 00000000 0000017b 00000002 00000002 0046ee28 00000000
7fe0: 7edfdc00 7edfdbf0 004551e7 76e7b5f2 60030030 00000003 0a746961 0000000a
[<7f02a16c>] (ModuleEventCallback [LinuxKernelDebugHelper]) from [<7f02a448>] (hook_module_load+0x1c/0x20 [LinuxKernelDebugHelper])
[<7f02a448>] (hook_module_load [LinuxKernelDebugHelper]) from [<801b21c0>] (load_module+0x20dc/0x2300)
[<801b21c0>] (load_module) from [<801b260c>] (SyS_finit_module+0xb4/0xcc)
[<801b260c>] (SyS_finit_module) from [<80108340>] (ret_fast_syscall+0x0/0x3c)
Code: e1a00004 eb487df8 e89da830 e7f001f2 (e1200171)
---[ end trace 98d462b56ca2f876 ]---
note: insmod[503] exited with preempt_count 1
January 17, 2020 at 19:37 #27120 support, Keymaster
Hi,
It looks like instead of passing the breakpoint event to the JTAG debugger, the kernel tries to handle it directly. This might also be the cause of the kgdboe error you encountered before.
Please try adding the following startup command to VisualKernel Project Properties -> Startup Commands -> Before Connecting to Target:
monitor gdb_breakpoint_override hard
This will force OpenOCD to use hardware breakpoints instead of software breakpoints. Depending on the way exception handling is implemented in this kernel port, it may resolve the issue.
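As a side note for readers, the “Code:” line of the kernel dump earlier in the thread can be checked against this explanation. In an ARM oops, the word in parentheses is the instruction at the faulting PC, and the A32 BKPT instruction has a fixed bit pattern (cond, 0001 0010, imm12, 0111, imm4). The Python sketch below decodes that word. The function names are invented for illustration, and reading the result as a breakpoint planted by the debugger is an inference from the dump, not something the log states outright.

```python
import re


def faulting_word(code_line):
    """Return the parenthesized instruction word from an oops 'Code:' line."""
    match = re.search(r"\(([0-9a-fA-F]{8})\)", code_line)
    return int(match.group(1), 16) if match else None


def decode_arm_bkpt(word):
    """Return the 16-bit immediate if word is an A32 BKPT, else None."""
    # Fixed bits of BKPT: bits 27..20 = 0001 0010, bits 7..4 = 0111.
    if (word & 0x0FF000F0) != 0x01200070:
        return None
    # imm16 = imm12 (bits 19..8) concatenated with imm4 (bits 3..0).
    return ((word >> 4) & 0xFFF0) | (word & 0xF)


code = "Code: e1a00004 eb487df8 e89da830 e7f001f2 (e1200171)"
imm = decode_arm_bkpt(faulting_word(code))
print("BKPT #%#x" % imm if imm is not None else "not a BKPT")
```

The faulting word decodes as a BKPT, which matches the “breakpoint debug exception” reported at the same address: a software breakpoint instruction was executed and the kernel, not the debugger, ended up handling it.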
January 20, 2020 at 10:01 #27149 Mischo, Participant
Hi,
I switched to hardware breakpoints, and it now breaks with SIGTRAP on the first line of:
/*
 * This is where the real work happens.
 *
 * Keep it uninlined to provide a reliable breakpoint target, e.g. for the gdb
 * helper command 'lx-symbols'.
 */
static noinline int do_init_module(struct module *mod)
{
    int ret = 0;
    struct mod_initfree *freeinit;

    freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
    if (!freeinit) {
        ret = -ENOMEM;
        goto fail;
    }
    freeinit->module_init = mod->init_layout.base;

    /*
     * We want to find out whether @mod uses async during init. Clear
     * PF_USED_ASYNC. async_schedule*() will set it.
     */
    current->flags &= ~PF_USED_ASYNC;

    do_mod_ctors(mod);
    /* Start the module */
    if (mod->init != NULL)
        ret = do_one_initcall(mod->init);
    if (ret < 0) {
        goto fail_free_freeinit;
    }
    if (ret > 0) {
        pr_warn("%s: '%s'->init suspiciously returned %d, it should "
            "follow 0/-E convention\n"
            "%s: loading module anyway...\n",
            __func__, mod->name, ret, __func__);
        dump_stack();
    }

    /* Now it's a first class citizen! */
    mod->state = MODULE_STATE_LIVE;
    blocking_notifier_call_chain(&module_notify_list,
                     MODULE_STATE_LIVE, mod);

    /*
     * We need to finish all async code before the module init sequence
     * is done. This has potential to deadlock. For example, a newly
     * detected block device can trigger request_module() of the
     * default iosched from async probing task. Once userland helper
     * reaches here, async_synchronize_full() will wait on the async
     * task waiting on request_module() and deadlock.
     *
     * This deadlock is avoided by perfomring async_synchronize_full()
     * iff module init queued any async jobs. This isn't a full
     * solution as it will deadlock the same if module loading from
     * async jobs nests more than once; however, due to the various
     * constraints, this hack seems to be the best option for now.
     * Please refer to the following thread for details.
     *
     * http://thread.gmane.org/gmane.linux.kernel/1420814
     */
    if (!mod->async_probe_requested && (current->flags & PF_USED_ASYNC))
        async_synchronize_full();

    mutex_lock(&module_mutex);
    /* Drop initial reference. */
    module_put(mod);
    trim_init_extable(mod);
#ifdef CONFIG_KALLSYMS
    /* Switch to core kallsyms now init is done: kallsyms may be walking! */
    rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms);
#endif
    module_enable_ro(mod, true);
    mod_tree_remove_init(mod);
    disable_ro_nx(&mod->init_layout);
    module_arch_freeing_init(mod);
    mod->init_layout.base = NULL;
    mod->init_layout.size = 0;
    mod->init_layout.ro_size = 0;
    mod->init_layout.ro_after_init_size = 0;
    mod->init_layout.text_size = 0;
    /*
     * We want to free module_init, but be aware that kallsyms may be
     * walking this with preempt disabled. In all the failure paths, we
     * call synchronize_sched(), but we don't want to slow down the success
     * path, so use actual RCU here.
     */
    call_rcu_sched(&freeinit->rcu, do_free_init);
    mutex_unlock(&module_mutex);
    wake_up_all(&module_wq);

    return 0;

fail_free_freeinit:
    kfree(freeinit);
fail:
    /* Try to protect us from buggy refcounters. */
    mod->state = MODULE_STATE_GOING;
    synchronize_sched();
    module_put(mod);
    blocking_notifier_call_chain(&module_notify_list,
                     MODULE_STATE_GOING, mod);
    klp_module_going(mod);
    ftrace_release_mod(mod);
    free_module(mod);
    wake_up_all(&module_wq);
    return ret;
}
However, I was able to start debugging correctly with a workaround, using J-Link with the official driver and J-Link GDB server. I set up a “Custom kernel” connection on the Host/Port provided by the GDB server and selected “Before debugging, target is: Crashed/frozen” in the Kernel session tweaking category, because the GDB server automatically puts the MCU into the halted state. The kernel module is not loaded automatically this way, so it must be loaded manually over SSH or from the GDB session tab.
Thanks for all the support.
January 21, 2020 at 04:15 #27173 support, Keymaster
No problem. Most likely, the chip you are using handles the exceptions slightly differently from the way OpenOCD would expect them and hence prevents the breakpoints from being handled correctly. Normally, this should be fixed in one of the upcoming OpenOCD updates.
Either way, if J-Link software works better, you can use the following workaround to avoid the “Attach to crashed/frozen target”. Try creating a gdb script with the following contents:
target remote :2331
monitor go
disconnect
quit
It instructs gdb to connect to the J-Link gdb stub, resume the target and disconnect. You can run it via command line as shown below:
<VisualKernel directory>\KernelTools\arm\arm-linux-gnu-gdb.exe -x <script file>
Or alternatively, add it to VisualKernel Project Properties -> Custom Debug Steps -> Before Debugging. This will allow VisualKernel to connect to the target via SSH and handle deployment/module enumeration as it does with regular debug sessions.
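If you prefer to generate the script rather than create it by hand, the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names and the file name resume.gdb are made up, the gdb executable path must be supplied by you, and the command line relies on gdb's standard -x option for running commands from a file. Port 2331 is the one used in the script above.

```python
from pathlib import Path

# The four gdb commands from the script above, one per line.
GDB_SCRIPT = "target remote :2331\nmonitor go\ndisconnect\nquit\n"


def write_resume_script(path):
    """Write the resume-and-disconnect gdb script and return its path."""
    script = Path(path)
    script.write_text(GDB_SCRIPT)
    return script


def gdb_command(gdb_exe, script_path):
    """Build the argument list that runs gdb with the given command file."""
    # -x tells gdb to execute the commands found in the file.
    return [str(gdb_exe), "-x", str(script_path)]


# Hypothetical usage (paths are placeholders, not taken from this thread):
# import subprocess
# script = write_resume_script("resume.gdb")
# subprocess.run(gdb_command(r"<path to arm-linux-gnu-gdb.exe>", script), check=True)
```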