@soumyar-roy
Contributor

The following core dump backtrace (BT) was seen in internal code:

Program terminated with signal SIGSEGV, Segmentation fault.
[Current thread is 1 (Thread 0x7fcd750c9540 (LWP 30999))]
(gdb) bt
0  0x00007fcd7596feec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
1  0x00007fcd75920fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
2  0x00007fcd75d008dc in core_handler (signo=11, siginfo=0x7ffd92dcb4f0, context=<optimized out>) at ../lib/sigevent.c:261
3  <signal handler called>
4  process_rtadv (arg=0x560287b66120) at ../zebra/rtadv.c:511
5  0x00007fcd75d1fa37 in wheel_timer_thread (t=<optimized out>) at ../lib/wheel.c:42
6  0x00007fcd75d13681 in event_call (thread=thread@entry=0x7ffd92dcbb60) at ../lib/event.c:2034
7  0x00007fcd75cbcb00 in frr_run (master=0x56028789ce00) at ../lib/libfrr.c:1242
8  0x0000560272e3945d in main (argc=14, argv=0x7ffd92dcbe88) at ../zebra/main.c:584
(gdb)

Path to the crash (taken from a different occurrence):
Interface uplink_2 is added to the wheel timer for the first time, at the end of rtadv_start_interface_events():
1) 2025-06-07T05:01:23.802459+00:00 mlx-5600-33 zebra[229165]: [SEY8W-2M6VH]  debug rtadv_start_interface_events, loc 2>>>>>::ifp::0x55a3281b1990::uplink_2

About once per second, the wheel timer processes interface uplink_2.
Log from process_rtadv():
2025-06-07T05:01:29.870749+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:30.870767+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:31.870783+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:32.870794+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:33.870809+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:01:34.870836+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2

Next, the same interface uplink_2 is added to the wheel timer a second time, in rtadv_start_interface_events():
    if (adv_if != NULL) {
        rtadv_send_packet(zvrf->rtadv.sock, zif->ifp, RA_ENABLE);
        wheel_add_item(zrouter.ra_wheel, zif->ifp); /* <<< duplicate gets added */
        return; /* Already added */
    }

2) 2025-06-07T05:03:44.642871+00:00 mlx-5600-33 zebra[229165]: [G63V5-AKC5D]  debug in rtadv_start_interface_events, loc 1 >>>>>::ifp::0x55a3281b1990::uplink_2

From this point on, about once per second, the wheel timer processes interface uplink_2 twice back to back, which shows that there are indeed duplicate entries for uplink_2 in the wheel timer.
Log from process_rtadv():
2025-06-07T05:03:44.878999+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:44.879076+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:45.879096+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:45.879169+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:46.879187+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2
2025-06-07T05:03:46.879240+00:00 mlx-5600-33 zebra[229165]: [H1EZX-2D8SA]  debug <<>>>>>>::ifp::0x55a3281b1990::uplink_2

3) Now suppose interface uplink_2 is shut down or removed. This removes only one instance of the interface from the wheel timer; the other instance stays there.
4) The memory for interface uplink_2 is freed.
5) The wheel timer then tries to process the stale uplink_2 entry and crashes.

Signed-off-by: Soumya Roy <[email protected]>
@soumyar-roy
Contributor Author

ci:rerun

Contributor

@mjstapp mjstapp left a comment

are we sure this is right now? we did have the question about double-adds in an earlier round of this work


zif = ifp->info;
zvrf = rtadv_interface_get_zvrf(ifp);
adv_if = adv_if_del(zvrf, ifp->name);
Contributor

so this will prevent this interface from appearing in the show output that uses this hash - is that going to be ok? isn't that a behavior change?

Contributor Author

With this change, when an interface is shut down it no longer appears in the output of the "show ipv6 nd ra-interfaces ..." command, and it appears again when it is unshut. That should be fine, I guess? When the interface is shut down, RA is also deactivated for it, so it should not be part of the list for "show ipv6 nd ra-interfaces ...".

Member

I don't know that I would expect a down interface to show up when looking for interfaces with feature x configured ... does show ip interface show shut down interfaces?

Contributor Author

I can't find a command similar to "show ip interface" that shows shut-down interfaces, but the "show interface" output does include shut-down interfaces. There, though, it continues to show the RA-related output, like this:
ND router advertisements sent: 22 rcvd: 21
ND router advertisements are sent every 10 seconds

@soumyar-roy
Contributor Author

soumyar-roy commented Jun 16, 2025

> are we sure this is right now? we did have the question about double-adds in an earlier round of this work

True, there was a concern about double adds, but there was no practical way to prove it before. Even the current trigger behaves slightly differently in upstream FRR: in particular, on a network manager restart we don't get calls to if_up()/if_down() in upstream FRR, but we do in internal code. I could not reproduce exactly the same signature in upstream FRR, where add/delete gets balanced out by other triggers. But if there is any path that calls rtadv_start_interface_events() with adv_if != NULL, it should cause the issue in FRR too. The current fix should eliminate any trigger of this kind, known or unknown, that could cause this crash in the future.

Also, I was previously modifying the wheel library to provide an option to check whether an item already exists before adding it. We decided not to add that code because of the performance cost of a linear list walk in the wheel timer.

Member

@riw777 riw777 left a comment

looks good

Contributor

@mjstapp mjstapp left a comment

I'm ok with this now

@mjstapp mjstapp merged commit 5dfc95b into FRRouting:master Jul 8, 2025
17 checks passed
@mjstapp
Contributor

mjstapp commented Jul 8, 2025

@Mergifyio backport dev/10.4

@mergify

mergify bot commented Jul 8, 2025

backport dev/10.4

✅ Backports have been created

mjstapp added a commit that referenced this pull request Jul 8, 2025