
Conversation

@nishant111
Contributor

Fix frr scheduling loop to not lose any scheduling precision in ppoll

fd_poll does not honor a timer_wait of less than 1000 usec, which causes ppoll to spin until timer_wait ticks down to 0, causing high CPU usage.
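For illustration, a minimal sketch of the kind of truncation being described, assuming an fd_poll-style helper that receives the wait as a struct timeval (the names and shape are assumptions, not the literal lib/event.c code):

```c
/*
 * Sketch of the problem: the wait is first truncated to whole
 * milliseconds, so any timer_wait below 1000 usec becomes a 0 timeout
 * and the event loop spins until the timer actually expires.
 */
#include <poll.h>
#include <stddef.h>
#include <sys/time.h>

static int fd_poll_sketch(struct pollfd *pfds, nfds_t nfds,
			  const struct timeval *timer_wait)
{
	int timeout = -1;

	if (timer_wait != NULL)
		/* e.g. 0 sec + 500 usec -> timeout == 0: return immediately */
		timeout = (timer_wait->tv_sec * 1000)
			  + (timer_wait->tv_usec / 1000);

	/* In the ppoll() path the timespec is presumably rebuilt from this
	 * same truncated millisecond value, hence the same precision loss. */
	return poll(pfds, nfds, timeout);
}
```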

@nishant111 nishant111 force-pushed the nishant/frrEventSchedulingFix branch 3 times, most recently from 5b82a10 to 5e68313 on October 9, 2025 15:48
@nishant111 nishant111 changed the title from "lib: Fix frr scheduling loop to not lose any scheduling precision in ppoll" to "lib: Set the correct timeout attribute in ppoll" on Oct 9, 2025
@eqvinox
Contributor

eqvinox commented Oct 14, 2025

I'm in favor of this (consider it a bugfix really), and think we should defer on trying to be clever (e.g. introduce some minimum sleep time like #19598 does.) The entire thing here is probably a holdover from using poll(), which only has millisecond precision, and someone might've gone "oh but I need this 0.5ms timer to work, I'll just busy loop".

For reference, context switches on modern OSes take single digit µs times, cf. https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/ (might be even less 7 years later). And note that when calling ppoll(), we're done doing things and likely won't have anything valuable in CPU caches. With that in mind, it feels entirely reasonable to pass small timeouts into ppoll(). The kernel people aren't stupid either, there's probably some minimum timeout below which it will just spin in the kernel (or rather, "short-sleep" the CPU core, rather than spin) and not task switch.

While there certainly could be gains had from doing some accumulation/batching of timers, there can also be unforeseen effects, even to the network level in terms of microbursts. (Unlikely, but possible.) Without data showing us any benefit of that, let's just stick to the simple.

@mjstapp
Contributor

mjstapp commented Oct 14, 2025

I guess I don't think the FRR issue is with context-switching, or CPU caches, or sub-microsecond packet-processing in a dataplane somewhere. I think we have two use-cases: control-plane timers, and BFD.

The control-plane protocols (and some of our own internal components/IPCs/etc.) use timers. Those timers are usually at second scale - there's no value at all in nanosecond precision for those timers.

BFD is sort of a special case, because we seem to be under some pressure to support quite tight timers for BFD in user-space, and at some scale in terms of number of peers/sessions. Those timers are at tens-of-milliseconds scale or thereabouts. Even there, we have only seen that those kinds of timers hit a ceiling when the system is busy.

But what we've been doing is not "simple": the library code does extra work to sort the timer list at sub-millisecond precision, and we try very hard to distinguish between multiple timers that are scheduled microseconds apart. As Donald has shown, that only wastes CPU time without benefitting network stability at all. So my preference would be to be "simple": we make it clear that our APIs support millisecond resolution, for example, and we plumb that through the various layers so we aren't doing extra work to support a precision that we can't hope to reliably achieve.


@nishant111 nishant111 force-pushed the nishant/frrEventSchedulingFix branch from 5e68313 to a4a34a3 on October 17, 2025 11:49
@github-actions github-actions bot added the size/S and rebase (PR needs rebase) labels and removed the size/M label on Oct 17, 2025
@nishant111
Contributor Author

As discussed in the community meeting, I have not removed the selectpoll_timeout code. I have just recalculated the correct tsp for ppoll inside "#if defined HAVE_PPOLL"; poll() continues to use the old timeout calculation.

@nishant111 nishant111 force-pushed the nishant/frrEventSchedulingFix branch from a4a34a3 to 820e1f8 on October 23, 2025 07:07
Contributor

@mjstapp mjstapp left a comment


Thanks, that looks clearer to me

Member

@donaldsharp donaldsharp left a comment


LGTM

@nishant111 nishant111 force-pushed the nishant/frrEventSchedulingFix branch from 820e1f8 to 39fb939 on October 28, 2025 05:57
@github-actions github-actions bot added size/M and removed size/S labels Oct 28, 2025
@nishant111 nishant111 requested a review from choppsv1 October 28, 2025 09:04
fd_poll does not honor timer_wait less than 1000 usec, which causes
ppoll to spin until timer_wait ticks down to 0, causing high CPU.
Also, set a 1 msec floor for poll() so that poll() does not spin
either for 0 < tv_usec < 1000.
Also move the timeout-related if-else ladder into the #else (poll)
compilation path to avoid the clang SA dead-code warning.

Signed-off-by: Nishant <[email protected]>
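For reference, a hedged sketch of the shape the commit message describes: a full-precision timespec for ppoll(), with the old millisecond ladder plus a 1 msec floor kept inside the #else branch for poll(). The function signature, HAVE_PPOLL usage, and surrounding details are assumptions for illustration, not the actual diff.

```c
#define _GNU_SOURCE /* for ppoll() on glibc */
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

static int fd_poll_sketch(struct pollfd *pfds, nfds_t nfds,
			  const struct timeval *timer_wait,
			  const sigset_t *sigmask)
{
#if defined(HAVE_PPOLL)
	struct timespec ts, *tsp = NULL;

	if (timer_wait != NULL) {
		/* Build the timespec straight from timer_wait so ppoll()
		 * keeps full sub-millisecond precision. */
		ts.tv_sec = timer_wait->tv_sec;
		ts.tv_nsec = timer_wait->tv_usec * 1000;
		tsp = &ts;
	}
	return ppoll(pfds, nfds, tsp, sigmask);
#else
	/* poll() only understands milliseconds, so the old if-else ladder
	 * lives in this branch only (keeping it out of the ppoll build
	 * avoids the dead-code warning), with a 1 msec floor so that
	 * 0 < tv_usec < 1000 cannot turn into a spinning zero timeout. */
	int timeout = -1;

	if (timer_wait != NULL) {
		timeout = (timer_wait->tv_sec * 1000)
			  + (timer_wait->tv_usec / 1000);
		if (timeout == 0 && timer_wait->tv_usec > 0)
			timeout = 1;
	}
	(void)sigmask; /* plain poll() cannot apply a signal mask */
	return poll(pfds, nfds, timeout);
#endif
}
```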
@nishant111 nishant111 force-pushed the nishant/frrEventSchedulingFix branch from 39fb939 to 85e459f on October 28, 2025 09:46
Contributor

@choppsv1 choppsv1 left a comment


LGTM

@choppsv1 choppsv1 merged commit cff2068 into FRRouting:master Oct 28, 2025
10 checks passed