ames: consolidate dead flows to a single behn timer #6738
Conversation
This is a nice, simple approach, and should dramatically reduce the number of retry timers we have to maintain (and therefore the events logged for retries). At some point, I think we should develop "offline" heuristics for a peer and back off even further.
A couple of things:
```hoon
=^  moz  u.cached-state
?.  ?=(%15 -.u.cached-state)  [~ u.cached-state]
~>  %slog.0^leaf/"ames: init dead flow consolidation timer"
:-  [[/ames]~ %pass /dead-flow %b %wait `@da`(add now ~m2)]~
```
Do we need to duplicate this timer initialization somewhere else to catch fresh boot?
I see no better way of doing this than state and +on-born. See a75a083 for how I decided to implement it. Note that the recork timer a few lines above this suffers from the same problem: it never gets initialized on new ships.
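To make the consolidation idea concrete, here is a minimal Python sketch (not the actual Hoon) of the single-timer approach discussed above: instead of one behn timer per dead flow, one periodic timer sweeps every flow whose retransmit timeout has reached the dead-flow interval. All names and data shapes here are illustrative, not Ames internals.

```python
# Hypothetical sketch of dead-flow consolidation: one timer fires every
# two minutes and wakes every flow that has backed off to that interval,
# instead of each flow holding its own timer.
from dataclasses import dataclass, field

DEAD_FLOW_INTERVAL = 120  # seconds, i.e. ~m2

@dataclass
class Flow:
    bone: int
    rto: int  # current retransmit timeout, in seconds

@dataclass
class Peer:
    flows: list[Flow] = field(default_factory=list)

def on_dead_flow_timer(peers: list[Peer]) -> list[int]:
    """One timer event: retry every flow already in the dead (~m2) regime."""
    woken = []
    for peer in peers:
        for flow in peer.flows:
            if flow.rto == DEAD_FLOW_INTERVAL:  # flow is "dead"
                woken.append(flow.bone)         # re-send its oldest packet
    return woken

peers = [Peer([Flow(1, 120), Flow(2, 30)]), Peer([Flow(3, 120)])]
print(on_dead_flow_timer(peers))  # [1, 3]
```

Flows still backing off (rto below the cap, like bone 2 above) keep their own timers; only flows that have already hit the two-minute ceiling are swept by the shared timer.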
```hoon
::  set new timer if non-null and not at max-backoff
::
=?  peer-core  ?=(^ new-wake)
  ?:  =(~m2 rto.metrics.state)
```
Worth noting here (and above) that the ~m2 literal is important: we don't want to actually use max-backoff and accidentally consolidate the :ping app flow.
Added a comment in a3e7595
Force-pushed from fb47490 to b7354eb.
pkg/arvo/sys/vane/ames.hoon
Outdated
```hoon
|=  [[=ship =ship-state] core=_event-core]
^+  event-core
=/  peer-state=(unit peer-state)  (get-peer-state:core ship)
?~  peer-state  core
%-  ~(rep by snd.u.peer-state)
|=  [[=bone =message-pump-state] cor=_core]
?.  =(~m2 rto.metrics.packet-pump-state.message-pump-state)
  cor
abet:(on-wake:(abed-peer:pe:cor ship u.peer-state) bone error)
```
Style nit: there are a couple extraneous layers of indentation here.
Fixed
Force-pushed from 2178c75 to 27fe522.
Force-pushed from a3e7595 to 82d4e2a.
this will do
As discussed with @yosoyubik and @joemfb out of band.
I tested this with 50,000 dead flows. Without consolidation, these flows resulted in a constant 30% CPU usage. Consolidating the timers reduced CPU usage to almost zero, with a 100% spike for a few seconds every two minutes.
The retry interval is the normal ~m2; we can make it configurable later.
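The event-rate arithmetic behind those numbers can be checked directly. With 50,000 dead flows each holding its own two-minute timer, behn fires hundreds of events per second on average; a single consolidated timer fires once per interval (the spike in the test above is that one sweep visiting every flow at once):

```python
flows = 50_000
interval_s = 120  # ~m2

per_flow_events_per_sec = flows / interval_s
consolidated_events_per_sec = 1 / interval_s

print(per_flow_events_per_sec)      # 416.6666666666667 timer fires/second
print(consolidated_events_per_sec)  # one fire every two minutes
```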