Description
I wanted to debug the RIOT thread creation code on the SiFive HiFive1. For this purpose, I enabled ENABLE_DEBUG in core/thread.c:
Lines 34 to 35 in 1dde0f4
#define ENABLE_DEBUG 0
#include "debug.h"
Interestingly, some functions from thread.c are not executed on any thread stack but instead use the ISR/Exception stack as defined by _sp in the riscv_common ldscript:
RIOT/cpu/riscv_common/ldscripts/riscv_base.ld
Lines 210 to 215 in b3b04fa
.stack ORIGIN(ram) + LENGTH(ram) - __stack_size :
{
    PROVIDE( _eheap = . );
    . = __stack_size;
    PROVIDE( _sp = . );
} >ram AT>ram :ram
For example, kernel_init, which is called from start.S and in turn calls thread_create, is executed on this stack. Unfortunately, the default stack size is quite small at 256 bytes:
__stack_size = DEFINED(__stack_size) ? __stack_size : 256;
This seems to be too small for the debugging macros from debug.h, which on RISC-V normally require an additional 256 bytes of stack space:
RIOT/cpu/riscv_common/include/cpu_conf_common.h
Lines 29 to 31 in b3b04fa
#ifndef THREAD_EXTRA_STACKSIZE_PRINTF
#define THREAD_EXTRA_STACKSIZE_PRINTF (256)
#endif
Unfortunately, the DEBUG_EXTRA_STACKSIZE macro, which is often used to increase a thread's stack space when ENABLE_DEBUG is activated, can't really be used here: the size of the stack region in the ldscript would need to be increased instead. Since the ISR stack does not belong to a normal RIOT thread, the following sanity check from debug.h does not work correctly either. It only inspects the stack_size of the active thread and calls printf unconditionally when no thread is active:
Lines 49 to 56 in 2df29a6
if ((thread_get_active() == NULL) || \
    (thread_get_active()->stack_size >= \
     THREAD_EXTRA_STACKSIZE_PRINTF)) { \
    printf(__VA_ARGS__); \
} \
else { \
    puts("Cannot debug, stack too small. Consider using DEBUG_PUTS()."); \
} \
As a result, I encountered several stack overflows when using ENABLE_DEBUG in core/thread.c on the HiFive1. This can lead to weird behavior during debugging, and it would be nice if it could somehow be detected to avoid this pitfall. Note that the heuristic employed by the scheduler (SCHED_TEST_STACK) to detect stack overflows does not work here for the same reason. Maybe the sanity check in debug.h could be improved by adding a special case for code running on the ISR stack? It would also be neat to extend SCHED_TEST_STACK to cover the ISR stack.
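To make the suggested special case more concrete, here is a rough sketch of what an ISR-stack-aware check could look like. It is not RIOT code: `stack_headroom` and `debug_printf_allowed` are hypothetical names, and the stack pointer and the lower bound of the ISR stack are passed in as plain integers so the logic can be tried on a host. On the target, the lower bound would presumably come from the `_eheap` linker symbol and the stack pointer from the `sp` register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the riscv_common default in cpu_conf_common.h. */
#define THREAD_EXTRA_STACKSIZE_PRINTF (256)

/* Hypothetical helper: bytes left between the current stack pointer and the
 * lower bound of a downward-growing stack. Returns 0 if sp is already at or
 * below the bound, i.e. the stack has overflowed. */
static inline size_t stack_headroom(uintptr_t sp, uintptr_t lower_bound)
{
    return (sp > lower_bound) ? (size_t)(sp - lower_bound) : 0;
}

/* Hypothetical replacement for the condition in the debug.h check.
 *
 * on_isr_stack:      true if the caller runs on the ISR stack (no active
 *                    thread yet, or inside an interrupt handler)
 * sp:                current stack pointer
 * isr_stack_bottom:  lower bound of the ISR stack (assumed to be _eheap)
 * thread_stack_size: stack_size of the active thread, ignored on the ISR stack
 */
static inline bool debug_printf_allowed(bool on_isr_stack, uintptr_t sp,
                                        uintptr_t isr_stack_bottom,
                                        size_t thread_stack_size)
{
    if (on_isr_stack) {
        /* Compare the space that is actually left, not the total size. */
        return stack_headroom(sp, isr_stack_bottom)
               >= THREAD_EXTRA_STACKSIZE_PRINTF;
    }
    /* Unchanged behavior for regular threads. */
    return thread_stack_size >= THREAD_EXTRA_STACKSIZE_PRINTF;
}
```

With the values from the GDB session in this report (sp = 0x80003f70, _eheap = 0x80003f00), `stack_headroom` returns 112, so this check would refuse to call printf instead of overflowing the stack.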
Steps to reproduce the issue
This issue should be reproducible using the following GDB script:
set height 0
break thread_create
continue
break _vfprintf_r
continue
while ($sp > 0x80003f00)
  p/x $sp
  stepi
end
printf "stack overflow at pc: %x\n", $pc
This script can, for instance, be used with the hello-world application. Assuming ENABLE_DEBUG has been enabled in core/thread.c, the application can be compiled as follows:
$ BOARD=hifive1 make -C examples/hello-world/
Afterwards, extract the lower bound of the ISR stack (the ISR stack grows downwards towards _eheap) and add it to the while condition of the gdb script. The lower bound of the ISR stack should match the address of the _eheap symbol which can be extracted as follows:
$ riscv32-unknown-elf-nm examples/hello-world/bin/hifive1/hello-world.elf | grep _eheap
80003f00 B _eheap
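As a cross-check, the bounds of the ISR stack follow from the linker script excerpt above: `_eheap` marks the start of the .stack section and `_sp` lies `__stack_size` bytes above it. The sketch below works this out for the default size of 256 bytes. The helper names `isr_stack_top` and `on_isr_stack` are made up for illustration and are not part of RIOT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default from riscv_base.ld when __stack_size is not defined elsewhere. */
#define ISR_STACK_SIZE 256u

/* Hypothetical helper: upper bound of the ISR stack (_sp) for a given
 * lower bound (_eheap). */
static inline uintptr_t isr_stack_top(uintptr_t eheap)
{
    return eheap + ISR_STACK_SIZE;
}

/* Hypothetical helper: true if addr lies within [_eheap, _sp). */
static inline bool on_isr_stack(uintptr_t addr, uintptr_t eheap)
{
    return (addr >= eheap) && (addr < isr_stack_top(eheap));
}
```

For `_eheap` = 0x80003f00 this gives an upper bound of 0x80004000. Addresses such as ap = 0x80003fa4 from the GDB output in this report fall inside that range, while idle_stack at 0x80000090 does not.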
Furthermore, the script requires debug symbols in newlib to find the internal _vfprintf_r symbol. Executing the GDB script results in the following output on my system:
Reading symbols from examples/hello-world/bin/hifive1/hello-world.elf...
_start () at /home/nmeum/src/RIOT/cpu/riscv_common/start.S:17
17 la gp, __global_pointer$
Breakpoint 1 at 0x204002c6: file /home/nmeum/src/RIOT/core/thread.c, line 196.
Breakpoint 1, thread_create (stack=stack@entry=0x80000090 <idle_stack> "", stacksize=stacksize@entry=256,
priority=priority@entry=15 '\017', flags=flags@entry=12, function=function@entry=0x204000de <idle_thread>,
arg=arg@entry=0x0, name=name@entry=0x20401c98 "idle") at /home/nmeum/src/RIOT/core/thread.c:196
196 if (priority >= SCHED_PRIO_LEVELS) {
Breakpoint 2 at 0x204014fc: file /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c, line 487.
Breakpoint 2, _vfprintf_r (data=0x80000000 <impure_data>, fp=0x800008b4,
fmt0=fmt0@entry=0x20401df5 "Created thread %s. PID: %hi. Priority: %u.\n", ap=ap@entry=0x80003fa4)
at /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c:487
487 /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c: No such file or directory.
$1 = 0x80003f70
0x204014fe 487 in /opt/riscv-gnu-toolchain/riscv-newlib/newlib/libc/stdio/nano-vfprintf.c
stack overflow at pc: 204014fe
This is a bit hacky, though, so the issue might not be easily reproducible with the script. However, I hope the outlined problem is fairly obvious from the description above.