Problem debugging custom kernel

Sysprogs forums / VisualKernel

    #27065
    Mischo
    Participant

    Hello,

    I’m trying to debug, over Ethernet, a custom kernel for the i.MX7 from the fsl-community-bsp BSP, with sources and configuration generated by Yocto.
    I copied the kernel sources to another folder and set up a VisualKernel advanced kernel project. For the build I’m using the toolchain from the Yocto-generated SDK for the target image. The build finished without errors.
    Next, I deployed the image to a running device (originally flashed with the Yocto-built image) by replacing zImage in the boot partition. The device boots correctly, and uname -v displays the correct build time and version.
    When I attach the debugger and try to unload a module with a breakpoint set in print_modules(), as shown in the tutorial, Visual Studio 2015 Enterprise reports “Received a SIGTRAP: Trace/breakpoint trap”.

    Output from debug console:

    LinuxKernelDebugHelper: loading out-of-tree module taints kernel.
    netpoll: kgdboe: local IP 10.0.7.134
    kgdboe: single-core mode enabled. Shutting down all cores except #0. This is slower, but safer.
    kgdboe: you can try using multi-core mode by specifying the following argument:
    insmod kgdboe.ko force_single_core = 0
    CPU1: shutdown
    kgdboe: Successfully initialized. Use the following gdb command to attach:
    target remote udp:10.0.7.134:31337

    The same thing happened when I tried to debug the image built directly by Yocto, loading its symbols via the VisualKernel “Quick Debug Linux Kernel” option, apparently at the point where it should hit the automatic breakpoint after attaching.

    What could be the problem?
    Thanks in advance.

    #27071
    support
    Keymaster

    Hi,

    This might indicate that the KGDBoE module (debugging over Ethernet) is not fully compatible with your target, or it could indicate a symbol problem.

    If you could attach a gdb log from the debug session, we should be able to tell what is going on. Please also consider debugging the target via JTAG, as it is generally less fragile.

    #27077
    Mischo
    Participant

    Thanks for the quick response. The log and .config file are attached. KGDBoE builds without errors and appears to be installed successfully. In the meantime, I’m trying to get debugging running over JTAG using an LPC-Link 2 probe with CMSIS-DAP firmware.

    #27079
    Mischo
    Participant

    Attaching the .config file.

    #27082
    Mischo
    Participant

    A kernel module project running on the kernel built by VisualKernel, with “Kernel type” set to that build (so the symbols should match), throws the same error. Output from the debug UART on the device:

    kgdboe: loading out-of-tree module taints kernel.
    netpoll: kgdboe: local IP 10.0.7.134
    kgdboe: single-core mode enabled. Shutting down all cores except #0. This is slower, but safer.
    kgdboe: you can try using multi-core mode by specifying the following argument:
    	insmod kgdboe.ko force_single_core = 0
    CPU1: shutdown
    KGDB: Registered I/O driver kgdboe
    kgdboe: Successfully initialized. Use the following gdb command to attach:
    	target remote udp:10.0.7.134:31337
    LinuxKernelModule1: module license 'Proprietary' taints kernel.
    Disabling lock debugging due to kernel taint
    LinuxKernelModule1: Hello, world!
    LinuxKernelModule1: Goodbye, world!

    The log is attached.

    #27084
    Mischo
    Participant

    I tried debugging the kernel module project (I assume it is an easier first setup for the debugger than kernel debugging) with a J-Link using the OpenOCD method, and it fails with a “Failed to connect to debug stub” error. The connection test passes for the J-Link. The GDB stub log and a screenshot of the VisualKernel settings are attached.

    #27095
    support
    Keymaster

    Hi,

    Thanks for providing the detailed description.

    It looks like the KGDBoE-based debug session works. The stop happens inside the entry-common.S file and is likely by design:

    [  140315 ms] ~"284\t\tb\tret_slow_syscall\n"
    [  140315 ms] *stopped,reason="signal-received",signal-name="SIGTRAP",signal-meaning="Trace/breakpoint trap",frame={addr="0x80108504",func="sys_call_table",args=[],file="/home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd/kernel-source/arch/arm/kernel/entry-common.S",fullname="/home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd/kernel-source/arch/arm/kernel/entry-common.S",line="284"},thread-id="121",stopped-threads="all"
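    When scanning a long gdb log for stops like the one above, it can help to pull individual fields (reason, signal-name, line) out of the *stopped record. The following is a minimal C sketch, assuming the usual gdb/MI key="value" syntax; the function name mi_get_field is hypothetical (not part of gdb or VisualKernel), and escaped quotes inside values are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of key="..." from a gdb/MI record into out.
 * The key must be preceded by ',', '{' or the start of the record, so that
 * "name" does not match inside "signal-name" or "fullname".
 * Naive sketch: \" escapes inside values are not unescaped.
 * Returns 0 on success, -1 if the key is missing or out is too small.
 */
int mi_get_field(const char *record, const char *key, char *out, size_t outlen)
{
    assert(record != NULL && key != NULL && out != NULL);
    size_t keylen = strlen(key);
    if (keylen == 0)
        return -1;

    for (const char *p = strstr(record, key); p != NULL;
         p = strstr(p + keylen, key)) {
        int at_boundary = (p == record) || p[-1] == ',' || p[-1] == '{';
        if (!at_boundary || p[keylen] != '=' || p[keylen + 1] != '"')
            continue;                       /* not a real key=" occurrence */

        const char *start = p + keylen + 2; /* first char of the value */
        const char *end = strchr(start, '"');
        if (end == NULL)
            return -1;                      /* unterminated value */

        size_t len = (size_t)(end - start);
        if (len + 1 > outlen)
            return -1;                      /* buffer too small */
        memcpy(out, start, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

    For the record above, asking for "reason" yields signal-received and "line" yields 284, which is how the stop can be identified as a SIGTRAP at entry-common.S:284 rather than at a user breakpoint.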

    Normally, VisualKernel would open entry-common.S in Visual Studio when that stop happens; however, depending on how you imported the kernel, it may not know where to find the file. In that case, you can set up a manual mapping between the paths reported by gdb (e.g. /home/build/fsl-community-bsp/buildOutput/tmp/work-shared/imx7dsabresd) and the paths on the Windows machine via VisualKernel Project Properties -> Path Mapping.
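    Conceptually, such a mapping is a prefix substitution plus a change of path separators. As an illustration only (this is not VisualKernel's code), here is a C sketch that rewrites a gdb-reported path onto a hypothetical Windows folder, C:\kernel\imx7dsabresd; the function name map_gdb_path is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map a path reported by gdb on the build machine onto a Windows folder:
 * replace remote_prefix with local_prefix and turn '/' into '\'.
 * Illustration only; this is not how VisualKernel implements Path Mapping.
 * Returns 0 on success, -1 if the prefix does not match or out is too small.
 */
int map_gdb_path(const char *gdb_path, const char *remote_prefix,
                 const char *local_prefix, char *out, size_t outlen)
{
    assert(gdb_path != NULL && remote_prefix != NULL);
    assert(local_prefix != NULL && out != NULL);

    size_t rlen = strlen(remote_prefix);
    size_t llen = strlen(local_prefix);

    /* Match whole directories only: "/home/build" must not match "/home/build2". */
    if (strncmp(gdb_path, remote_prefix, rlen) != 0)
        return -1;
    if (gdb_path[rlen] != '/' && gdb_path[rlen] != '\0')
        return -1;

    const char *rest = gdb_path + rlen;
    if (llen + strlen(rest) + 1 > outlen)
        return -1;

    memcpy(out, local_prefix, llen);
    char *dst = out + llen;
    for (; *rest != '\0'; rest++)
        *dst++ = (*rest == '/') ? '\\' : *rest;   /* Unix -> Windows separators */
    *dst = '\0';
    return 0;
}
```

    Mapping the directory that contains kernel-source, rather than each file, keeps one rule valid for every source file gdb reports from that tree.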

    Regarding JTAG, most likely your firewall is blocking the connection (gdb running on the build machine needs to connect to OpenOCD running on the Windows machine), or the build machine cannot resolve the Windows machine’s host name because of missing DNS entries. Please double-check the firewall settings and the gdb log (search the gdb log for “remote” to find the host/port used by VisualKernel). You can override the “remote” command via VisualKernel Project Properties -> Debug Settings -> Advanced.
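    To make the “search the gdb log for remote” step concrete, here is a minimal C sketch that extracts the protocol, host and port from a line containing a gdb remote command, such as the attach command printed by kgdboe (target remote udp:10.0.7.134:31337). The function name parse_remote_target is hypothetical; IPv6 literals and serial-port targets are deliberately not handled. An empty host, as in target remote :2331, means gdb connects to the machine it is running on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract protocol, host and port from a log line such as
 *   "target remote udp:10.0.7.134:31337" or "-target-select remote :2331".
 * proto receives "udp"/"tcp", or "" if no prefix was given.
 * Sketch only: IPv6 addresses and serial devices are not handled.
 * Returns 0 on success, -1 otherwise.
 */
int parse_remote_target(const char *line, char *proto, size_t protolen,
                        char *host, size_t hostlen, unsigned *port)
{
    assert(line != NULL && proto != NULL && host != NULL && port != NULL);
    const char *p = strstr(line, "remote ");
    if (p == NULL)
        return -1;
    p += strlen("remote ");
    while (*p == ' ')
        p++;

    /* Optional "udp:" / "tcp:" prefix. */
    if (protolen < 4)
        return -1;
    proto[0] = '\0';
    if (strncmp(p, "udp:", 4) == 0 || strncmp(p, "tcp:", 4) == 0) {
        memcpy(proto, p, 3);
        proto[3] = '\0';
        p += 4;
    }

    /* The port follows the last ':' of the host:port token. */
    size_t toklen = strcspn(p, " \r\n");
    const char *colon = NULL;
    for (const char *q = p; q < p + toklen; q++)
        if (*q == ':')
            colon = q;
    if (colon == NULL)
        return -1;

    size_t hlen = (size_t)(colon - p);
    if (hlen + 1 > hostlen)
        return -1;
    memcpy(host, p, hlen);
    host[hlen] = '\0';

    char *end;
    unsigned long v = strtoul(colon + 1, &end, 10);
    if (end == colon + 1 || v == 0 || v > 65535)
        return -1;
    *port = (unsigned)v;
    return 0;
}
```

    If the host extracted this way is a Windows host name, it is worth checking that the build machine can resolve it and that the port is allowed through the Windows firewall.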

    #27105
    Mischo
    Participant

    Thanks for response.

    I forgot to mention that the debugger shows entry-common.S in the editor at the correct line, so the symbols are probably working. The problem is that I cannot continue: whenever I try, it immediately stops again on the same line (284), even though I have no breakpoint there. The code looks like this:

    __sys_trace_return_nosave:
    	enable_irq_notrace
    	mov	r0, sp
    	bl	syscall_trace_exit
    	b	ret_slow_syscall

    It breaks on
    b ret_slow_syscall

    I’m attaching the whole file (renamed, because the forum does not allow uploading it under its original name).

    For JTAG, I will try adjusting the network settings on the Windows machine.

    #27109
    support
    Keymaster

    Thanks for the clarification. It looks like some patches or configuration in the kernel you are using make it incompatible with KGDBoE.

    Generally, KGDBoE is less reliable than other debug methods, as it relies on several assumptions about the network driver implementation that don’t always hold.

    If your board has JTAG pins available, using JTAG instead of KGDBoE should give a much more consistent and reliable experience. Let us know if you need help understanding the connectivity issues between gdb and OpenOCD.

    #27114
    Mischo
    Participant

    Thanks.

    I got past the GDB error on the JTAG connection (it was a GDB-side problem on the build machine). Now VisualKernel attaches to the kernel module without any error notifications, but it does not hit the breakpoint in the LinuxKernelModule1_init() function, and “LinuxKernelModule1: Hello, world!” is printed correctly to the debug UART. The breakpoint turns transparent, as it does in normal Visual Studio debugging when no symbols are loaded, but it would be really strange if VisualKernel could not load symbols for a module it had just built and deployed to the device. I’m attaching the log; the last line GDB printed before I killed the debug session was line 487.

    #27118
    Mischo
    Participant

    I tried changing “Obtain module information via:” to “Optimized helper module”. With that setting it at least detects the module in the “Modules” tab, and its symbols point to the correct object file, but now I get an error on the UART debug console and the module does not run. Without the optimized helper module I cannot see the module in the “Modules” tab (lsmod shows it correctly on the device), which is probably why the debugging symbols were not loaded in my previous post.
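    Since lsmod reads /proc/modules, one way to cross-check from the device side what a debugger needs in order to place a module's symbols is the base address listed there. The C sketch below parses one line of that file; the helper name and the sample line in the comment are hypothetical, and it says nothing about how VisualKernel itself obtains module information. Note that this is the base of the module's core allocation, not necessarily its .text address (per-section addresses are under /sys/module/<name>/sections/), and unprivileged readers may see an address of 0 when kptr_restrict is set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/modules, e.g. (hypothetical values)
 *   "mymod 16384 0 - Live 0x7f000000 (O)"
 * Columns: name, size, refcount, dependencies, state, base address, taints.
 * Fills name (>= 64 bytes), state (>= 16 bytes) and addr.
 * Returns 0 on success, -1 otherwise.
 */
int parse_proc_modules_line(const char *line, char *name, char *state,
                            unsigned long *addr)
{
    assert(line != NULL && name != NULL && state != NULL && addr != NULL);

    /* Size, refcount and dependency columns are read but discarded (%*). */
    if (sscanf(line, "%63s %*lu %*s %*s %15s %lx", name, state, addr) != 3)
        return -1;

    /* Reject lines whose state column is not one the kernel prints. */
    if (strcmp(state, "Live") != 0 && strcmp(state, "Loading") != 0 &&
        strcmp(state, "Unloading") != 0)
        return -1;
    return 0;
}
```

    Comparing this address with what the debugger reports for the module is a quick way to tell a module-enumeration problem from a symbol-file problem.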

    LinuxKernelModule1: module license 'Proprietary' taints kernel.
    Disabling lock debugging due to kernel taint
    Unhandled prefetch abort: breakpoint debug exception (0x002) at 0x7f02a16c
    Internal error: : 2 [#1] PREEMPT SMP ARM
    Modules linked in: LinuxKernelModule1(PO+) LinuxKernelDebugHelper(O) ov5640_camera_mipi mxc_mipi_csi evbug mx6s_capture
    CPU: 1 PID: 503 Comm: insmod Tainted: P           O    4.9.67-fslc+g953c6e30c970 #23
    Hardware name: Freescale i.MX7 Dual (Device Tree)
    task: a8460580 task.stack: a8606000
    PC is at ModuleEventCallback+0x0/0x18 [LinuxKernelDebugHelper]
    LR is at ModuleNotificationCallback+0x144/0x164 [LinuxKernelDebugHelper]
    pc : [<7f02a16c>]    lr : [<7f02a40c>]    psr: 600b0013
    sp : a8607e30  ip : 00000000  fp : a8607e4c
    r10: a8880da4  r9 : 00000000  r8 : 00000001
    r7 : 00000000  r6 : 7f02e0c0  r5 : 7f02aa40  r4 : 7f02a7f4
    r3 : 7f02a16c  r2 : 00000054  r1 : 5456454d  r0 : 00000001
    Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    Control: 10c53c7d  Table: a887806a  DAC: 00000051
    Process insmod (pid: 503, stack limit = 0xa8606210)
    Stack: (0xa8607e30 to 0xa8608000)
    7e20:                                     a93673c8 00000001 a8880d80 7f02e0c0
    7e40: a8607e5c a8607e50 7f02a448 7f02a2d4 a8607f1c a8607e60 801b21c0 7f02a438
    7e60: 7f02e0cc 00007fff 7f02e0c0 801ae804 0000f1e8 00000000 80d567d4 80d56910
    7e80: 80d567fc 7f02e0cc a8607f44 7f02e0cc 80b039e0 7f02e2a4 7f02e108 024000c0
    7ea0: a8607eec a8607eb0 80259170 80258fcc c0ab1000 00000000 00006c65 00000000
    7ec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    7ee0: 00000000 00000000 00000000 00000000 7fffffff 00000000 00000003 0045c378
    7f00: 0000017b 80108504 a8606000 00000000 a8607fa4 a8607f20 801b260c 801b00f0
    7f20: 7fffffff 00000000 00000003 a8607f38 8021e104 c0ab1000 0000f1e8 00000000
    7f40: 00000000 c0ab1000 0000f1e8 c0abfbf8 c0abfa70 c0abcba4 000002c0 00000390
    7f60: 00000000 00000000 00000000 00000490 00000023 00000024 0000000e 00000012
    7f80: 00000008 00000000 00000000 00000000 00c7d190 00000000 00000000 a8607fa8
    7fa0: 80108340 801b2564 00000000 00c7d190 00000003 0045c378 00000000 7edfdbfc
    7fc0: 00000000 00c7d190 00000000 0000017b 00000002 00000002 0046ee28 00000000
    7fe0: 7edfdc00 7edfdbf0 004551e7 76e7b5f2 60030030 00000003 0a746961 0000000a
    [<7f02a16c>] (ModuleEventCallback [LinuxKernelDebugHelper]) from [<7f02a448>] (hook_module_load+0x1c/0x20 [LinuxKernelDebugHelper])
    [<7f02a448>] (hook_module_load [LinuxKernelDebugHelper]) from [<801b21c0>] (load_module+0x20dc/0x2300)
    [<801b21c0>] (load_module) from [<801b260c>] (SyS_finit_module+0xb4/0xcc)
    [<801b260c>] (SyS_finit_module) from [<80108340>] (ret_fast_syscall+0x0/0x3c)
    Code: e1a00004 eb487df8 e89da830 e7f001f2 (e1200171) 
    ---[ end trace 98d462b56ca2f876 ]---
    note: insmod[503] exited with preempt_count 1
    #27120
    support
    Keymaster

    Hi,

    It looks like, instead of passing the breakpoint event to the JTAG debugger, the kernel tries to handle it directly. This might also be the cause of the KGDBoE error you encountered before.

    Please try adding the following startup command to VisualKernel Project Properties -> Startup Commands -> Before Connecting to Target:

    monitor gdb_breakpoint_override hard

    This will force OpenOCD to use hardware breakpoints instead of software breakpoints. Depending on the way exception handling is implemented in this kernel port, it may resolve the issue.

    #27149
    Mischo
    Participant

    Hi,

    I switched to hardware breakpoints, and now it breaks with a SIGTRAP on the first line of do_init_module():

    /*
     * This is where the real work happens.
     *
     * Keep it uninlined to provide a reliable breakpoint target, e.g. for the gdb
     * helper command 'lx-symbols'.
     */
    static noinline int do_init_module(struct module *mod)
    {
    	int ret = 0;
    	struct mod_initfree *freeinit;
    
    	freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
    	if (!freeinit) {
    		ret = -ENOMEM;
    		goto fail;
    	}
    	freeinit->module_init = mod->init_layout.base;
    
    	/*
    	 * We want to find out whether @mod uses async during init.  Clear
    	 * PF_USED_ASYNC.  async_schedule*() will set it.
    	 */
    	current->flags &= ~PF_USED_ASYNC;
    
    	do_mod_ctors(mod);
    	/* Start the module */
    	if (mod->init != NULL)
    		ret = do_one_initcall(mod->init);
    	if (ret < 0) {
    		goto fail_free_freeinit;
    	}
    	if (ret > 0) {
    		pr_warn("%s: '%s'->init suspiciously returned %d, it should "
    			"follow 0/-E convention\n"
    			"%s: loading module anyway...\n",
    			__func__, mod->name, ret, __func__);
    		dump_stack();
    	}
    
    	/* Now it's a first class citizen! */
    	mod->state = MODULE_STATE_LIVE;
    	blocking_notifier_call_chain(&module_notify_list,
    				     MODULE_STATE_LIVE, mod);
    
    	/*
    	 * We need to finish all async code before the module init sequence
    	 * is done.  This has potential to deadlock.  For example, a newly
    	 * detected block device can trigger request_module() of the
    	 * default iosched from async probing task.  Once userland helper
    	 * reaches here, async_synchronize_full() will wait on the async
    	 * task waiting on request_module() and deadlock.
    	 *
    	 * This deadlock is avoided by perfomring async_synchronize_full()
    	 * iff module init queued any async jobs.  This isn't a full
    	 * solution as it will deadlock the same if module loading from
    	 * async jobs nests more than once; however, due to the various
    	 * constraints, this hack seems to be the best option for now.
    	 * Please refer to the following thread for details.
    	 *
    	 * http://thread.gmane.org/gmane.linux.kernel/1420814
    	 */
    	if (!mod->async_probe_requested && (current->flags & PF_USED_ASYNC))
    		async_synchronize_full();
    
    	mutex_lock(&module_mutex);
    	/* Drop initial reference. */
    	module_put(mod);
    	trim_init_extable(mod);
    #ifdef CONFIG_KALLSYMS
    	/* Switch to core kallsyms now init is done: kallsyms may be walking! */
    	rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms);
    #endif
    	module_enable_ro(mod, true);
    	mod_tree_remove_init(mod);
    	disable_ro_nx(&mod->init_layout);
    	module_arch_freeing_init(mod);
    	mod->init_layout.base = NULL;
    	mod->init_layout.size = 0;
    	mod->init_layout.ro_size = 0;
    	mod->init_layout.ro_after_init_size = 0;
    	mod->init_layout.text_size = 0;
    	/*
    	 * We want to free module_init, but be aware that kallsyms may be
    	 * walking this with preempt disabled.  In all the failure paths, we
    	 * call synchronize_sched(), but we don't want to slow down the success
    	 * path, so use actual RCU here.
    	 */
    	call_rcu_sched(&freeinit->rcu, do_free_init);
    	mutex_unlock(&module_mutex);
    	wake_up_all(&module_wq);
    
    	return 0;
    
    fail_free_freeinit:
    	kfree(freeinit);
    fail:
    	/* Try to protect us from buggy refcounters. */
    	mod->state = MODULE_STATE_GOING;
    	synchronize_sched();
    	module_put(mod);
    	blocking_notifier_call_chain(&module_notify_list,
    				     MODULE_STATE_GOING, mod);
    	klp_module_going(mod);
    	ftrace_release_mod(mod);
    	free_module(mod);
    	wake_up_all(&module_wq);
    	return ret;
    }

    However, I was able to debug correctly with a workaround: using the J-Link with the official driver and the J-Link GDB Server. I set up a custom kernel connection to the host/port provided by the GDB server and selected “Before debugging, target is: Crashed/frozen” in the Kernel session tweaking category, because the GDB server halts the MCU automatically. With this setup the kernel module is not loaded automatically, so it must be loaded manually over SSH or from the GDB Session tab.

    Thanks for all the support.

    #27173
    support
    Keymaster

    No problem. Most likely, the chip you are using handles exceptions slightly differently from the way OpenOCD expects, which prevents the breakpoints from being handled correctly. This should normally be fixed in one of the upcoming OpenOCD updates.

    Either way, if the J-Link software works better for you, you can use the following workaround to avoid the “Attach to crashed/frozen target” setting. Create a gdb script with the following contents:

    target remote :2331
    monitor go
    disconnect
    quit

    It instructs gdb to connect to the J-Link gdb stub, resume the target and disconnect. You can run it via command line as shown below:

    <VisualKernel directory>\KernelTools\arm\arm-linux-gnu-gdb.exe -s <script file>

    Alternatively, add it to VisualKernel Project Properties -> Custom Debug Steps -> Before Debugging. This allows VisualKernel to connect to the target via SSH and handle deployment and module enumeration as it does in regular debug sessions.
