zebra: zebra core with v6 RA #19000
The following core dump and backtrace were seen in internal code:
Program terminated with signal SIGSEGV, Segmentation fault.
[Current thread is 1 (Thread 0x7fcd750c9540 (LWP 30999))]
(gdb) bt
#0 0x00007fcd7596feec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fcd75920fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fcd75d008dc in core_handler (signo=11, siginfo=0x7ffd92dcb4f0, context=<optimized out>) at ../lib/sigevent.c:261
#3 <signal handler called>
#4 process_rtadv (arg=0x560287b66120) at ../zebra/rtadv.c:511
#5 0x00007fcd75d1fa37 in wheel_timer_thread (t=<optimized out>) at ../lib/wheel.c:42
#6 0x00007fcd75d13681 in event_call (thread=thread@entry=0x7ffd92dcbb60) at ../lib/event.c:2034
#7 0x00007fcd75cbcb00 in frr_run (master=0x56028789ce00) at ../lib/libfrr.c:1242
#8 0x0000560272e3945d in main (argc=14, argv=0x7ffd92dcbe88) at ../zebra/main.c:584
(gdb)
Path to the crash (from a different occurrence):
Interface uplink_2 is added to the wheel timer for the first time, at the end of rtadv_start_interface_events():
1) 2025-06-07T05:01:23.802459+00:00 mlx-5600-33 zebra[229165]: [SEY8W-2M6VH] debug rtadv_start_interface_events, loc 2>>>>>::ifp::0x55a3281b1990::uplink_2
About once per second, the wheel timer processes interface uplink_2.
Log from process_rtadv():
2025-06-07T05:01:29.870749+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:30.870767+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:31.870783+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:32.870794+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:33.870809+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:34.870836+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
Now the same interface uplink_2 is added to the wheel timer a second time, in rtadv_start_interface_events():
    if (adv_if != NULL) {
        rtadv_send_packet(zvrf->rtadv.sock, zif->ifp, RA_ENABLE);
        wheel_add_item(zrouter.ra_wheel, zif->ifp); /* <<< duplicate gets added */
        return; /* Already added */
    }
2) 2025-06-07T05:03:44.642871+00:00 mlx-5600-33 zebra[229165]: [G63V5-AKC5D] debug in rtadv_start_interface_events, loc 1 >>>>>::ifp::0x55a3281b1990::uplink_2
Now, about once per second, the wheel timer processes interface uplink_2 twice, back to back, which shows that there are indeed duplicate entries for uplink_2 in the wheel timer.
Log from process_rtadv():
2025-06-07T05:03:44.878999+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:44.879076+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:45.879096+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:45.879169+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:46.879187+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:46.879240+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA] debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
3) Now suppose interface uplink_2 is shut down or removed. This removes one instance of the interface from the wheel timer; the other stays there.
4) The memory for interface uplink_2 is freed.
5) The wheel timer then tries to process uplink_2 through the stale entry, and zebra crashes.
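A minimal sketch of steps 1) to 5), assuming a toy array-backed container in place of the real wheel timer. The names `toy_wheel`, `toy_wheel_add`, `toy_wheel_remove`, `toy_wheel_count` and `toy_stale_refs_after_double_add` are hypothetical and are not part of lib/wheel.c; the only behaviour modelled is the one described above (add does not check for duplicates, remove drops a single instance).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a wheel slot. NOT the FRR lib/wheel.c implementation. */
#define TOY_WHEEL_MAX 8

struct toy_wheel {
	void *items[TOY_WHEEL_MAX];
	size_t count;
};

/* Append an item without checking whether it is already present. */
static int toy_wheel_add(struct toy_wheel *w, void *item)
{
	assert(w->count <= TOY_WHEEL_MAX);
	if (w->count == TOY_WHEEL_MAX)
		return -1;
	w->items[w->count++] = item;
	return 0;
}

/* Remove only the first matching entry, as in step 3). */
static int toy_wheel_remove(struct toy_wheel *w, void *item)
{
	for (size_t i = 0; i < w->count; i++) {
		if (w->items[i] == item) {
			/* Fill the hole with the last entry. */
			w->items[i] = w->items[--w->count];
			return 0;
		}
	}
	return -1;
}

/* Count how many entries still point at the given item. */
static size_t toy_wheel_count(const struct toy_wheel *w, const void *item)
{
	size_t n = 0;

	for (size_t i = 0; i < w->count; i++)
		if (w->items[i] == item)
			n++;
	return n;
}

/* add, add, remove: how many references to the interface are left? */
static size_t toy_stale_refs_after_double_add(void)
{
	struct toy_wheel w = { .count = 0 };
	int ifp = 0; /* stand-in for the interface object */

	toy_wheel_add(&w, &ifp);    /* step 1: first add */
	toy_wheel_add(&w, &ifp);    /* step 2: duplicate add */
	toy_wheel_remove(&w, &ifp); /* step 3: single remove on shutdown */
	return toy_wheel_count(&w, &ifp);
}
```

In this model one reference survives the removal. If the object it points to is freed afterwards (step 4), the next timer pass dereferences freed memory (step 5).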
Signed-off-by: Soumya Roy <[email protected]>
ci:rerun
mjstapp
left a comment
are we sure this is right now? we did have the question about double-adds in an earlier round of this work
zif = ifp->info;
zvrf = rtadv_interface_get_zvrf(ifp);
adv_if = adv_if_del(zvrf, ifp->name);
so this will prevent this interface from appearing in the show output that uses this hash - is that going to be ok? isn't that a behavior change?
With this change, when an interface is shut down it will no longer appear in the output of the "show ipv6 nd ra-interfaces ..." command, and it will appear again when it is unshut. That should be fine, I guess? When the interface is shut down, RA is also deactivated for it, so it should not be part of the list for "show ipv6 nd ra-interfaces ...".
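A minimal sketch of the behaviour described in this comment, under the assumption that stopping RA on an interface drops both its membership entry and its wheel entry, so that a later start always takes the fresh-add path. `toy_if`, `toy_ra_start`, `toy_ra_stop` and the two helper functions are hypothetical names for illustration; this is a toy model, not the actual patch.

```c
#include <stdbool.h>

/* Hypothetical toy model, not the zebra data structures. */
struct toy_if {
	bool in_adv_set; /* stands in for membership in the adv_if hash */
	int wheel_refs;  /* how many wheel entries point at this interface */
};

/* Start RA events: add to the wheel only when not already tracked. */
static void toy_ra_start(struct toy_if *ifp)
{
	if (ifp->in_adv_set)
		return; /* already tracked: no second wheel entry */
	ifp->in_adv_set = true;
	ifp->wheel_refs++;
}

/* Stop RA events: drop membership and the wheel entry together. */
static void toy_ra_stop(struct toy_if *ifp)
{
	if (!ifp->in_adv_set)
		return;
	ifp->in_adv_set = false;
	ifp->wheel_refs--;
}

/* start, start, stop: wheel references left after the interface goes down. */
static int toy_refs_after_shutdown(void)
{
	struct toy_if ifp = { false, 0 };

	toy_ra_start(&ifp);
	toy_ra_start(&ifp); /* repeated start is a no-op */
	toy_ra_stop(&ifp);
	return ifp.wheel_refs;
}

/* start, stop, start: wheel references after shut followed by unshut. */
static int toy_refs_after_unshut(void)
{
	struct toy_if ifp = { false, 0 };

	toy_ra_start(&ifp);
	toy_ra_stop(&ifp);  /* shut: would no longer be listed */
	toy_ra_start(&ifp); /* unshut: listed again, exactly one wheel entry */
	return ifp.wheel_refs;
}
```

In this model, `in_adv_set` being false while the interface is down corresponds to the interface disappearing from the "show ipv6 nd ra-interfaces ..." list until it is unshut.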
I don't know that I would expect a down interface to show up when looking for interfaces with feature x configured ... does show ip interface show shut down interfaces?
I can't find a command like "show ip interface" that lists shut-down interfaces, but the "show interface" output does include shut-down interfaces. There, though, it continues to show the RA-related output, like this:
ND router advertisements sent: 22 rcvd: 21
ND router advertisements are sent every 10 seconds
True, there was a concern about double-adds, but there was no practical way to prove it before. Even now, the trigger/behavior is slightly different in upstream FRR (in particular, on a network manager restart we don't get calls to if_up()/if_down() in upstream FRR, but we do in internal code), and I could not reproduce exactly the same signature upstream: there, add/delete gets balanced out by other triggers. But if there is any path that calls rtadv_start_interface_events() with adv_if != NULL, it should cause the issue in upstream FRR too. The current fix should close off any such trigger, known or unknown, that could cause this crash in the future. Also, I had earlier been modifying the wheel library to provide an option to check whether an item already exists before adding it. We decided not to add that code because of the performance cost of a linear list walk in the wheel timer.
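For illustration, the check-before-add option that was considered and dropped could look roughly like the sketch below. It assumes a toy array-backed slot; `demo_slot`, `demo_slot_add_unique`, `demo_compare_steps` and `demo_count_after_double_add` are hypothetical names, not functions from the wheel library. The `steps` counter only exists to make the cost of the linear walk visible.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SLOT_MAX 16

/* Toy slot: NOT the FRR wheel data structure. */
struct demo_slot {
	void *items[DEMO_SLOT_MAX];
	size_t count;
};

/* Add an item only if it is not already present.
 * Every add walks the existing entries, so the cost grows with slot size. */
static bool demo_slot_add_unique(struct demo_slot *s, void *item, size_t *steps)
{
	for (size_t i = 0; i < s->count; i++) {
		(*steps)++;
		if (s->items[i] == item)
			return false; /* duplicate rejected */
	}
	if (s->count >= DEMO_SLOT_MAX)
		return false; /* slot full */
	s->items[s->count++] = item;
	return true;
}

/* Total comparisons needed to add n distinct items to an empty slot. */
static size_t demo_compare_steps(size_t n)
{
	static int objs[DEMO_SLOT_MAX]; /* distinct addresses to insert */
	struct demo_slot s = { .count = 0 };
	size_t steps = 0;

	if (n > DEMO_SLOT_MAX)
		n = DEMO_SLOT_MAX;
	for (size_t i = 0; i < n; i++)
		demo_slot_add_unique(&s, &objs[i], &steps);
	return steps;
}

/* Adding the same item twice leaves a single entry. */
static size_t demo_count_after_double_add(void)
{
	struct demo_slot s = { .count = 0 };
	size_t steps = 0;
	int ifp = 0; /* stand-in for the interface object */

	demo_slot_add_unique(&s, &ifp, &steps);
	demo_slot_add_unique(&s, &ifp, &steps);
	return s.count;
}
```

Adding n distinct items this way costs n(n-1)/2 comparisons in total, which is the linear-walk overhead mentioned above.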
riw777
left a comment
looks good
mjstapp
left a comment
I'm ok with this now
@Mergifyio backport dev/10.4
✅ Backports have been created
zebra: zebra core with v6 RA (backport #19000)