chore: implement oom/ood processing component #16436
I don't have any blocking comments, but I feel we could probably do some slight re-shuffling and simplify this a bit.
Obligatory reminder to check migration number before merging! (Sorry, you're probably sick of hearing this 😅)
Better yet: can we add a linter to check this on merge to obviate the reminder?
LGTM 👍 I'll defer to @dannykopping on final approval though!
Nothing blocking from my side; I see a bit of scope for future enhancements though.
I really appreciate how clearly the business logic & tests are written, well done.
usageDatapoints = append(usageDatapoints, usage)
}

usageStates := resourcesmonitor.CalculateVolumeUsageStates(monitor, usageDatapoints)
On second thought, given that we're having to do this for both memory & volumes, I think we should seriously consider updating the agent to send back a bool indicating that the monitor is enabled but failed to collect; that maps 1:1 onto your "unknown" logic here.
This can be in a follow-up.
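To illustrate the suggestion, here is a sketch of what an explicit flag could look like. `Datapoint`, its `Collected` field, `classify`, and the rule that usage at or above the threshold counts as NOK are all hypothetical, not the agent's actual payload or the PR's code. The point is that a single classification path could then serve both memory and volume monitors, with "enabled but failed to collect" reported as UNKNOWN directly instead of being inferred on the server:

```go
package main

import "fmt"

// UsageState is the per-datapoint classification.
type UsageState int

const (
	UsageOK UsageState = iota
	UsageNOK
	UsageUnknown
)

func (s UsageState) String() string {
	return [...]string{"OK", "NOK", "UNKNOWN"}[s]
}

// Datapoint is a HYPOTHETICAL agent payload. Collected=false lets the agent
// say "monitor enabled, but collection failed" explicitly.
type Datapoint struct {
	Collected bool
	Used      int64
	Total     int64
}

// classify maps one datapoint to a UsageState.
// ASSUMPTION: usage at or above thresholdPercent counts as NOK.
func classify(dp Datapoint, thresholdPercent int64) UsageState {
	if !dp.Collected || dp.Total <= 0 {
		return UsageUnknown // nothing trustworthy to compare against
	}
	// Integer comparison of Used/Total against thresholdPercent/100.
	if dp.Used*100 >= dp.Total*thresholdPercent {
		return UsageNOK
	}
	return UsageOK
}

func main() {
	points := []Datapoint{
		{Collected: true, Used: 50, Total: 100},
		{Collected: true, Used: 90, Total: 100},
		{Collected: false}, // enabled but failed to collect
	}
	for _, dp := range points {
		fmt.Println(classify(dp, 80))
	}
}
```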
@@ -527,3 +527,31 @@ func (k CryptoKey) CanVerify(now time.Time) bool {
func (r GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisionerRow) RBACObject() rbac.Object {
return r.ProvisionerJob.RBACObject()
}

func (m WorkspaceAgentMemoryResourceMonitor) Debounce(
I'd rather abstract this logic to prevent drift.
}{
{
name: "WhenOK/NeverExceedsThreshold",
memoryUsage: []int64{2, 3, 2, 4, 2, 3, 2, 1, 2, 3, 4, 4, 1, 2, 3, 1, 2},
Curious: why 17 datapoints, specifically?
No specific reason; I'm happy to make it 20 to match our expected size, though.
return states
}

func NextState(c Config, oldState database.WorkspaceAgentMonitorState, states []State) database.WorkspaceAgentMonitorState {
I think we should consider moving the monitor state into UNKNOWN if all datapoints are unknown. Currently, if a monitor gets into OK/NOK and only receives empty datapoints from then on, its state will never change. That will be difficult for operators/users to reason about unless they specifically look at the agent logs, which are hard to get to.
This can be in a follow-up.
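A sketch of the proposed change follows. It reuses the `NextState` name from the diff but substitutes local stand-in types for `database.WorkspaceAgentMonitorState`. The all-unknown branch is the suggestion from this comment; the `Config.MinimumNOKsPercent` field and the OK/NOK thresholds after that branch are hypothetical placeholders, not the RFC's actual rules:

```go
package main

import "fmt"

// State is the per-datapoint classification.
type State int

const (
	StateOK State = iota
	StateNOK
	StateUnknown
)

// MonitorState is a HYPOTHETICAL stand-in for database.WorkspaceAgentMonitorState.
type MonitorState string

const (
	MonitorOK      MonitorState = "OK"
	MonitorNOK     MonitorState = "NOK"
	MonitorUnknown MonitorState = "UNKNOWN"
)

// Config is HYPOTHETICAL; only the field used below is shown.
type Config struct {
	MinimumNOKsPercent int
}

func NextState(c Config, oldState MonitorState, states []State) MonitorState {
	if len(states) == 0 {
		return oldState // no datapoints at all: nothing to decide on
	}
	nok, unknown := 0, 0
	for _, s := range states {
		switch s {
		case StateNOK:
			nok++
		case StateUnknown:
			unknown++
		}
	}
	// Proposed in review: if every datapoint is unknown, surface UNKNOWN
	// instead of keeping a stale OK/NOK indefinitely.
	if unknown == len(states) {
		return MonitorUnknown
	}
	// HYPOTHETICAL thresholds below; the real rules are defined in the RFC.
	if nok == 0 {
		return MonitorOK
	}
	if nok*100 >= c.MinimumNOKsPercent*len(states) {
		return MonitorNOK
	}
	return oldState
}

func main() {
	c := Config{MinimumNOKsPercent: 20}
	fmt.Println(NextState(c, MonitorOK, []State{StateUnknown, StateUnknown}))
}
```

Whether an empty slice should also count as "all unknown" is a judgment call; the sketch keeps the previous state so that a single missed collection cycle doesn't flip the monitor.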
I agree there should be more visibility if all datapoints are unknown. I don't believe we currently surface the monitor state anywhere, so, at least for now, an UNKNOWN state wouldn't be visible to operators/users. We'd probably need to have a discussion on how to surface this information.
Let's document this in a follow-up issue.
👍 All good for me! Really cool to see everything connected.
Just a question, but not a blocker.
Closes coder/internal#248
This PR implements the processing logic as set out in the RFC.