SYSRQ Issue


Viewing 15 posts - 1 through 15 (of 23 total)
  • #34126
    MST
    Participant

    Hi,

    I am working with a custom kernel based on Linux 5.10.174 and have some issues using VisualKernel and its sysrq-g call. VisualKernel fails to configure the KGDBOC interface (option: “before debugging the system is: Accessible via SSH”; gdb log is attached), although the settings should be fine (at least on the debuggee). To me it looks like VisualKernel disables the KGDBOC interface after loading the helper module, then fails to reconfigure it, and hence the sysrq-g call can no longer be issued (see VKDebug.png; I configured the KGDBOC interface myself to make sure it was online before starting the debug session).

    I am fairly sure about the settings because I am able to debug the kernel “manually” by setting up the kgdboc interface and calling sysrq-g from a terminal directly on the debuggee (see screenshot “ManualDebug.png”). At this point, I am able to do some debugging via a remote gdb running on my compile/debug VM, which I also use as the compile/debug machine for VisualKernel.
    When I use VisualKernel at this point (system in debug state -> option: “before debugging the system is: Crashed/Frozen”), I can connect to the debuggee, but I cannot access it via ssh. This prevents me from loading any kernel module onto the debuggee (I was testing your Hello World package), including the HelperModule. This cripples the speed and somehow prevents debugging kernel modules via breakpoints (breakpoints cannot be found; I am pretty sure this is an issue with wrong memory addresses). So this is not a workaround for me.

    I hope you can understand my issues and help me.

    Greets,
    MST

    #34132
    support
    Keymaster

    Hi,

    It looks like due to the low speed of the COM port, gdb times out before it manages to connect to the target. We have updated our KGDBoC tutorial, showing how to increase the timeout.

    Also, feel free to try this build [VisualKernel-4.0.101.2354.msi], it will automatically apply increased timeout for KGDBoC connections.

    If it doesn’t help, please try adding the “set debug remote 1” command as shown in the tutorial and share the GDB log so that we could recheck what is going on.

    #34133
    MST
    Participant

    Hi,

    thanks for your support, but the problem remains after increasing the timeout (even to 100).

    Why and at what point does VK try to reset the kgdboc interface? This is the issue as I can observe it in the gdb log (“Cannot configure KGDB. Please try configuring it manually by writing to /sys/module/kgdboc/parameters/kgdboc.”) and in dmesg:

    [21107.877453] KGDB: Unregistered I/O driver kgdboc, debugger disabled
    [21108.350362] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)

    and at this moment the sysrq-g command is disabled and hence the machine cannot enter debug mode. I also tried to build a workaround by adding these calls manually before the start of the debugger (via the VisualKernel options menu), but unfortunately without success. The kgdboc interface is always killed by VisualKernel as soon as it enters gdb. You can see my commands in the VisualKernel launcher output (LauncherOutput.log). The “SetKGDBOC.sh” script simply runs “echo ttyS4,115200 > /sys/module/kgdboc/parameters/kgdboc” and enables the kgdboc interface. “SendSYSRQ-g.sh” is also a self-describing one-line script and a leftover from my tests (just ignore the line, it’s not executed 😉 ).
    With this setup dmesg looks like this:

    [24847.242012] KGDB: Registered I/O driver kgdboc
    [24853.202471] KGDB: Unregistered I/O driver kgdboc, debugger disabled
    [24853.671324] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w) dump-ftrace-buffer(z)

    The first line of this dmesg log (activation of kgdboc) is produced by “SetKGDBOC.sh”. The unregistration after that is caused by VisualKernel (as explained at the top). The sysrq error is caused by the missing kgdboc.
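
    The register/unregister sequence in the log above can be checked mechanically. The following is a minimal sketch, assuming the message wording shown in this thread; `kgdboc_state` is a hypothetical helper name, not part of VisualKernel or the kernel tools.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints the most
# recent kgdboc state, based on the message texts quoted in this thread.
kgdboc_state() {
    awk '
        /KGDB: Registered I\/O driver kgdboc/   { state = "registered" }
        /KGDB: Unregistered I\/O driver kgdboc/ { state = "unregistered" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

    On the debuggee it could be used as `dmesg | kgdboc_state` right before triggering sysrq-g; if it prints anything other than "registered", sysrq-g can be expected to print only the help line.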

    I hope this helps you to have a further look at this issue. Would it be an option for VisualKernel to leave the kgdboc interface untouched (or to make this optional)?

    greetings,
    MST

    #34136
    support
    Keymaster

    Hi,

    Thanks for your clarification. VisualKernel handles KGDBoC as follows:

    1. Runs “cat /sys/module/kgdboc/parameters/kgdboc” on the target and compares the output with the contents of the <KgdbocArguments> element in the .vkrnlproj file.
    2. If they are different, runs “sh -c “echo <args>” > /sys/module/kgdboc/parameters/kgdboc”. If this command fails, it displays the “cannot configure KGDB” error.

    Based on what you are describing, it looks like the KGDBoC parameters configured via the VisualKernel GUI are rejected by the kernel, so it ends up with KGDBoC disabled. You can verify this very easily by comparing the contents of the <KgdbocArguments> element to “ttyS4,115200”. If it’s different, you can just patch it manually and reopen the solution; VisualKernel will then try to use the KGDBoC configuration you entered manually.
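
    The two steps above can be reproduced by hand to see exactly where the configuration fails. This is a minimal sketch of the same compare-then-write logic, not VisualKernel's actual code; `sync_kgdboc` and the `KGDBOC_PARAM` override are hypothetical names introduced here so the logic can be dry-run against a regular file.

```shell
#!/bin/sh
# Hypothetical helper mirroring the compare-then-write steps described above.
# KGDBOC_PARAM may be pointed at a regular file for a dry run.
KGDBOC_PARAM="${KGDBOC_PARAM:-/sys/module/kgdboc/parameters/kgdboc}"

sync_kgdboc() {
    desired="$1"
    # Step 1: read the current setting and compare it with the desired one.
    current="$(cat "$KGDBOC_PARAM" 2>/dev/null)"
    if [ "$current" = "$desired" ]; then
        echo "unchanged"
        return 0
    fi
    # Step 2: write the new setting; the kernel rejects invalid values,
    # which makes the write (and therefore echo) fail.
    if echo "$desired" > "$KGDBOC_PARAM"; then
        echo "updated"
    else
        echo "cannot configure KGDB (tried to write: $desired)" >&2
        return 1
    fi
}
```

    Run as root on the target, `sync_kgdboc ttyS4,115200` prints "unchanged" or "updated"; a rejected value produces the error together with the exact string that was written.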

     

     

    #34137
    MST
    Participant

    Hi,

    thank you very much for these hints. With them I was able to fix the problem.
    In my case the issue was that I entered the kgdboc port in the options GUI (this is the same as the <KgdbocArguments> element in the .vkrnlproj file) as “/dev/ttyS4”, and this is not accepted by the “echo <args> > /sys/module/kgdboc/parameters/kgdboc” command. Dropping the path (“/dev/ttyS4” -> “ttyS4”) fixed the kgdboc registration errors! Maybe you could add this as a hint to the UI/documentation, or check for the path when running the scripts and remove it?

    I tested this also manually by running
    echo /dev/ttyS4,115200 > /sys/module/kgdboc/parameters/kgdboc
    which returns
    -bash: echo: write error: No such device

    This is the error message when adding the path to the tty. At the same time the kgdboc interface is unregistered.
    Removing the path from the command (echo ttyS4,115200 > /sys/module/kgdboc/parameters/kgdboc) gives no message at all, but one can see the unregistration and re-registration of the kgdboc interface in dmesg.
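
    The path check suggested above could look like the following minimal sketch. `normalize_kgdboc_args` is a hypothetical helper name; it only strips a leading "/dev/" and leaves everything else (device name, baud rate) untouched.

```shell
#!/bin/sh
# Hypothetical helper: strip a leading "/dev/" from a kgdboc argument string,
# since the kgdboc parameter expects a bare tty name such as "ttyS4".
normalize_kgdboc_args() {
    printf '%s\n' "${1#/dev/}"
}
```

    For example, `normalize_kgdboc_args /dev/ttyS4,115200` yields `ttyS4,115200`, and an argument without the prefix is passed through unchanged.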

    thank you very much again, 🙂

    MST

    • This reply was modified 1 year, 7 months ago by MST.
    #34147
    support
    Keymaster

    Hi,

    Thanks, this explains it. VisualKernel automatically shows suggestions for COM ports up to COM4 (ttyS0..ttyS3), however as you needed to use ttyS4, you likely entered it manually, with the “/dev/” prefix, confusing the KGDBoC module.

    We have updated VisualKernel to explicitly mention the parameters echoed to the parameters file in the error message, to make it easier to diagnose this type of issue. We have also added COM ports up to COM8 (ttyS7) to the selection list.

    Feel free to try this build: VisualKernel-4.0.101.2357.msi

    #34152
    MST
    Participant

    Hi,

    I tried your new build and I like the updates in the interfaces.

    But I get a compilation error when starting debugging with the helper module option switched on (stdarg.h not found, in TraceEngine.c). I am pretty sure it’s related to the changes described e.g. at https://github.com/bondagit/aes67-linux-daemon/issues/64 . My systems are kernel 5.10.173 (debuggee, custom config) and kernel 5.19.0-38 (debugger, Ubuntu 22.04 LTS).

    greets,
    MST

    #34154
    MST
    Participant

    Hi,

    a second question with the new build (can’t edit my previous post). There is an option to “Explicitly switch from KDB to KGDB” in the debug options. This is a great idea imo because I’m often getting stuck in KDB when debugging –  especially after restarting my debuggee. Usually I have to start debugging, stop it and restart it with the “crashed/frozen” setting, stop the debugging again. At this point my machine can usually be accessed via ssh and I can start a normal debug session.
    Should this option improve this interaction? Unfortunately I don’t see any effect of checking or unchecking it yet.

    greets,
    MST

    #34163
    support
    Keymaster

    Hi,

    Sorry about the build error. We have indeed recently replaced #include <stdarg.h> with #include <linux/stdarg.h>, resolving build issues on some kernels, and did not encounter any problems with the new version.

    Can you confirm that changing it back to #include <stdarg.h> in the VisualKernel directory solves the issue? If yes, we can easily add auto-detection logic for this.
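
    One way such a detection could work is to test whether the kernel tree ships its own header. This is a minimal sketch and not necessarily how VisualKernel does it; `stdarg_include` is a hypothetical helper, and the argument is assumed to be a kernel source or headers directory (for example the `build` directory under `/lib/modules/<version>/`). To my knowledge `linux/stdarg.h` only appeared around kernel 5.15, which would be consistent with a 5.10.x tree lacking it.

```shell
#!/bin/sh
# Hypothetical helper: print the include line that matches the given kernel tree.
# $1 is assumed to be a kernel source/headers directory.
stdarg_include() {
    if [ -f "$1/include/linux/stdarg.h" ]; then
        echo '#include <linux/stdarg.h>'
    else
        echo '#include <stdarg.h>'
    fi
}
```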

    Regarding KDB->KGDB, we added this feature while investigating the KGDBoC timeouts, however it turned out that it should not be normally needed – if you connect GDB to KGDBoC that is stuck in KDB mode, it should automatically detect it and switch, as long as the GDB timeouts are sufficient. We kept the new setting in place in case it helps work around other issues, however it currently only works when the COM port is connected to the Windows machine directly. If you are running GDB on Linux and connecting to /dev/pts/3, you can force the KDB->KGDB switch by running the following command line before connecting gdb:

    echo kgdb > /dev/pts/3

    This needs to be run on the gdb machine, so the target does not need to be accessible via SSH at this point. If this works better, let us know and we will update the switching logic to run this automatically.

    #34164
    MST
    Participant

    Hi,

    the forum “ate” the first version of my reply. I hope I don’t forget anything in this second try …

    1) Changing the include works, but there are compilation errors after that. I attached the error log. There seem to be some issues with the live tracing in the TraceEngine. I’m using kernel 5.10.174 without any modification to the source code.

    2) Concerning KDB -> KGDB: I am testing two setups: one including a VM to build and debug (“three machine mode”), and one debugging directly from Windows and building the modules on the debuggee (“two machine mode”).
    2.1) In the “three machine mode” I am able to multiplex the serial port via KDMX and hence I am able to access the serial console even while gdb is running. This allows me to manually enter “kgdb” if required (it does not happen all the time). Your echo command works too 🙂
    2.2) I have more issues with the “two machine mode” because here I can only monitor the serial port (via “serial monitor”) and cannot access it once gdb is running. Disconnecting gdb quite often kills my debuggee (or at least its networking…), which requires restarting the debuggee. I tried adding the echo as a “custom debug step” “before launching the debugger”, but this seems to be executed prior to the sysrq-g and hence before the debuggee enters KDB. In some cases the debugger is able to switch to kgdb, but far from all the time. After that the debuggee is most of the time rather unstable or at least frozen. In this state no ssh connection is possible. Usually (in the “three machine setup”) I can recover from this state by changing the debug setting “before debugging, the target is” to “Crashed/frozen”, but in the “two machine setup” there is another issue at this point: after changing the setting, VisualKernel tries to synchronize its settings to the debuggee via ssh, which is not available! This means that I cannot do it that way and have to restart the debuggee.
    All in all the “three machine setup” seems to be more reliable and stable for me. Will it be quicker when using the HelperModule? I am fighting to improve the performance/reliability right now.

    greets,
    MST

    #34176
    support
    Keymaster

    Hi,

    Sorry about the forum glitch. It appears to happen very rarely due to some cookie issues, however we were never able to reproduce it reliably.

    Please try this build: VisualKernel-4.0.101.2360.msi. We have updated VisualKernel to automatically detect #include <stdarg.h> vs #include <linux/stdarg.h>. Regarding the error with ftrace_ops, could you please check whether your ftrace.h file defines ftrace_ops? You can include it from any kernel module (or by temporarily editing any of the kernel sources) and IntelliSense will conveniently highlight the parts that are being parsed.

    We have also updated VisualKernel to explicitly log where it issues the KDB->KGDB command. You can try enabling View->Other Windows->VisualKernel Diagnostic Console, start a debug session and double-check that it contains the “Sending ‘kgdb’ to COMx @<baud>” line. If it helps, we can also add a setting to display a message box after doing a sysrq trigger, so that you could try sending it manually in a terminal program and see if it works.

    Regarding the frozen mode, enabling this mode updates the project file, so Visual Studio considers it outdated and tries rebuilding it, triggering a connection error. It should normally fail to connect, then ask you whether you want to try debugging the previous build result, and then it should work just fine. You can also try setting Tools->Options->Projects and Solutions->Build and Run->On Run -> Prompt to Build. This way Visual Studio won’t try to automatically build the outdated projects and you can choose to run it without having to wait for a connection timeout.

    P.S. We have recently added a new feature that allows tracing arbitrary kernel functions and source lines (i.e. instead of stopping in a debugger, it just records the variables you selected and lets you review them later). It currently requires a regular debugging connection, however we are working on an update that will allow doing it with just a network connection (no KGDBoC/KGDBoE/JTAG required). Because of the delayed nature of tracing (the actual data transfer happens after everything has been recorded), it doesn’t cause any instability and works much faster than most debug methods. Let us know if you would like to try it out.

    #34178
    MST
    Participant

    Hi,

    thanks for the help again. It would be great if you could make it possible to send the KDB->KGDB trigger manually, because I’m still stuck in KDB quite often. I had a short glimpse at the diagnostic log (love the feature), but it is spammed by synchronization commands. I will look at it further next week.
    I couldn’t test your other remarks yet.

    This new feature sounds nice. How can I test it and see if it’s feasible with our machines?

    greets,
    MST

    #34186
    support
    Keymaster

    No problem, please try this build: VisualKernel-4.0.101.2366.msi

    In order to experiment with KDB->KGDB switching, please try disabling the explicit KDB->KGDB switch via settings, then open the .vkrnlproj file manually, and set the following elements:

    • ExplicitSwitchFromKDB to false
    • ManualSwitchFromKDB to true

    VisualKernel will then display a message box asking you to do the switch manually after issuing a sysrq, but before connecting to the COM port. If this works, we can add a proper setting to run an arbitrary command line.

    The new build also supports tracing without debugging. You can enable it as shown below or via Debug->Start VisualKernel Tracing Session. Tracing sessions run directly over UDP, do not require any other connection, however they do not stop the target, and instead record the selected variables in the background. You can use the techniques shown in our tracing tutorial to see what is going on at the target.

    P.S. The tracing engine is a brand new feature, we are open to feedback on it and should be able to iterate on it very fast.

    #34188
    MST
    Participant

    Hi again,

    I did a testing session with your new version today. Unfortunately, it was unsuccessful. Some things seem to have gotten worse with this new version.

    1) I like the prompt that you added for when there is an issue while compiling the TraceEngine as part of the HelperModule. My kernel (5.10.174) seems to be incompatible with your code (still the same error as before: “/tmp/LinuxKernelDebugHelper-444e4d9f-85c5-4f74-a164-a74130863526/TraceEngine.c:1172:61: error: dereferencing pointer to incomplete type ‘struct ftrace_ops’
    1172 | struct TracepointObject *tp = (struct TracepointObject *)op->private;”), but I can skip this part (“continue without tracing”) and still use the HelperModule. Since the trace engine is required for live tracing (I just noticed the “error message” “System.Exception: The LinuxKernelDebugHelper module was built without tracing support. Tracing will be disabled until the target is restarted.”), I am not able to test this feature right now. Nevertheless I tried scanning the list of traceable functions and repeatedly got stuck at 99 %, at 4372 of 4374 files (the file varies, which is strange). I once even waited for more than an hour, but there was no further progress (it takes ~12 minutes to get to this point).

    2) Regarding the manual switch to kgdb: I can enable this feature in the project file, and the window appears and waits for my input to continue. Unfortunately kdb does not switch to kgdb when I enter “kgdb” in the console, but returns to the bash command prompt (= leaves debug mode), and consequently the debug session crashes. This does not happen when I call sysrq-g manually; in that case it enters the kgdb state. Maybe you are running some additional commands that my system does not like. I will try to fix this whole KDB/KGDB issue the “brute force” way: I asked my colleague for a new kernel build without kdb. I hope to get it tomorrow and see if this fixes my issues in this area.

    greets,
    MST

    #34189
    support
    Keymaster

    Hi,

    Sorry about that. The ftrace_ops error looks like our previous fix was incomplete. It’s not about the kernel version, but rather about the kernel being built without ftrace. We have fixed it in this build: VisualKernel-4.0.101.2367.msi
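
    Whether a kernel was built with ftrace can be checked from its config file. This is a minimal sketch, assuming that the full definition of `struct ftrace_ops` is guarded by `CONFIG_FUNCTION_TRACER` (which would match the "incomplete type" error); `has_function_tracer` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel config file enables the
# function tracer. $1 is a plain-text config (e.g. the .config of the build).
has_function_tracer() {
    grep -q '^CONFIG_FUNCTION_TRACER=y' "$1"
}
```

    For a custom kernel, the argument would typically be the `.config` file in the build directory; if only `/proc/config.gz` is available on the target, it has to be decompressed first.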

    The issue with the list of functions is different: most likely your custom kernel uses some parameters that confuse some of VisualKernel’s parsing logic. We have tested it against several different platforms and kernel versions, but never encountered this error. Would you be able to share your vmlinux file with us so that we could retest it on our side? If not, we can add some extra logging to pinpoint the root cause, but it would take longer. For what it’s worth, you can also try hitting “cancel” after most modules within the kernel have been indexed. VisualKernel will proceed with what it has indexed so far, allowing you to set tracepoints there.

    As for performance, the initial indexing could indeed take long (it takes about a minute on an i9-12900KS), however once it is done, VisualKernel saves the indexed symbols in its own optimized format, and loads them next time almost instantaneously.

    With KDB, VisualKernel just runs this command line as root:

    sh -c "echo g > /proc/sysrq-trigger"

    If you are triggering sysrq differently, please let us know. You can also try checking VisualKernel Diagnostics Console (you may need to grep the log for SSH). It shows all SSH commands issued by VisualKernel, so you can always compare them against the manual list. But if the version without KDB works, it could be indeed an easier solution.

    Edit: the traceable symbol indexing should not normally show the number of processed files unless you are looking into the debug view. Are you sure you did not accidentally start IntelliSense indexing instead (e.g. via CodeJumps)?

    • This reply was modified 1 year, 6 months ago by support.