chore: implement oom/ood processing component #16436

Merged
merged 44 commits into from
Feb 17, 2025

Conversation

Contributor

@DanielleMaywood DanielleMaywood commented Feb 4, 2025

Closes coder/internal#248

This PR implements the processing logic as set out in the RFC.

@DanielleMaywood DanielleMaywood marked this pull request as ready for review February 14, 2025 12:55
Member

@johnstcn johnstcn left a comment

I don't have any blocking comments, but I feel we could probably do some slight re-shuffling and simplify this a bit.

Member

Obligatory reminder to check migration number before merging! (Sorry, you're probably sick of hearing this 😅)

Contributor

Better yet: can we add a linter to check this on merge to obviate the reminder?
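
A check along these lines could be sketched as follows. This is only an illustration, not the project's tooling: the file naming scheme (`<number>_<name>.up.sql` / `<number>_<name>.down.sql`) and the function name `duplicateMigrationNumbers` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// duplicateMigrationNumbers returns every numeric prefix that is used by more
// than one distinct migration name. The naming scheme
// "<number>_<name>.up.sql" / "<number>_<name>.down.sql" is an assumption.
func duplicateMigrationNumbers(files []string) []string {
	namesByNumber := map[string]map[string]struct{}{}
	for _, f := range files {
		number, rest, ok := strings.Cut(f, "_")
		if !ok {
			continue // not a migration file under the assumed scheme
		}
		// Strip the direction suffix so an up/down pair counts as one name.
		name := strings.TrimSuffix(strings.TrimSuffix(rest, ".up.sql"), ".down.sql")
		if namesByNumber[number] == nil {
			namesByNumber[number] = map[string]struct{}{}
		}
		namesByNumber[number][name] = struct{}{}
	}

	var dups []string
	for number, names := range namesByNumber {
		if len(names) > 1 {
			dups = append(dups, number)
		}
	}
	sort.Strings(dups) // map iteration order is random; sort for stable output
	return dups
}

func main() {
	files := []string{
		"000001_init.up.sql", "000001_init.down.sql",
		"000002_add_monitors.up.sql", "000002_add_monitors.down.sql",
		"000002_other_change.up.sql", "000002_other_change.down.sql",
	}
	fmt.Println(duplicateMigrationNumbers(files))
}
```

Run against the merged tree in CI, a non-empty result would fail the build, which would replace the manual reminder.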

Member

@johnstcn johnstcn left a comment

LGTM 👍 I'll defer to @dannykopping on final approval though!

Contributor

@dannykopping dannykopping left a comment

Nothing blocking from my side; I see a bit of scope for future enhancements though.

I really appreciate how clearly the business logic & tests are written, well done.

usageDatapoints = append(usageDatapoints, usage)
}

usageStates := resourcesmonitor.CalculateVolumeUsageStates(monitor, usageDatapoints)
Contributor

On second thought, given that we're having to do this for both memory & volumes, I think we should seriously consider updating the agent to send back that bool to indicate enabled but failed to collect; that's a 1:1 with your "unknown" logic here.

This can be in a follow-up.
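
To illustrate the suggestion, here is a minimal sketch of how an explicit flag from the agent could map onto an "unknown" state. The `datapoint` struct, its `CollectionFailed` field, and the `classify` function are all hypothetical; they are not taken from the agent protocol or from this PR.

```go
package main

import "fmt"

type state string

const (
	stateOK      state = "OK"
	stateNOK     state = "NOK"
	stateUnknown state = "unknown"
)

// datapoint is a hypothetical shape for one usage sample. CollectionFailed
// stands in for the proposed bool: monitoring is enabled, but the agent
// could not collect a value.
type datapoint struct {
	Used             int64
	Total            int64
	CollectionFailed bool
}

// classify maps a datapoint to a state given a usage threshold in percent.
func classify(dp datapoint, thresholdPercent int64) state {
	if dp.CollectionFailed || dp.Total <= 0 {
		return stateUnknown
	}
	if dp.Used*100/dp.Total >= thresholdPercent {
		return stateNOK
	}
	return stateOK
}

func main() {
	fmt.Println(classify(datapoint{Used: 90, Total: 100}, 80))
	fmt.Println(classify(datapoint{CollectionFailed: true}, 80))
}
```

With an explicit flag, the server would no longer need to infer "unknown" from a missing value, and memory and volume datapoints could share the same mapping.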

@@ -527,3 +527,31 @@ func (k CryptoKey) CanVerify(now time.Time) bool {
func (r GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisionerRow) RBACObject() rbac.Object {
	return r.ProvisionerJob.RBACObject()
}

func (m WorkspaceAgentMemoryResourceMonitor) Debounce(
Contributor

Rather abstract this logic to prevent drift.

}{
{
name: "WhenOK/NeverExceedsThreshold",
memoryUsage: []int64{2, 3, 2, 4, 2, 3, 2, 1, 2, 3, 4, 4, 1, 2, 3, 1, 2},
Contributor

Curious: why 17 datapoints, specifically?

Contributor Author

No specific reason. I'm happy to make this 20 to match our expected size, though.

	return states
}

func NextState(c Config, oldState database.WorkspaceAgentMonitorState, states []State) database.WorkspaceAgentMonitorState {
Contributor

I think we should consider moving the monitor state into UNKNOWN if all datapoints are unknown. Currently if a monitor gets into OK/NOK, and only receives empty datapoints from then on out, the state will never change and this will be difficult for operators/users to reason about unless they look at the agent logs specifically - which are hard to get to.

This can be in a follow-up.

Contributor Author

I agree there should be more visibility if all datapoints are unknown. I don't believe we currently surface the monitor state anywhere so, at least for now, an UNKNOWN state wouldn't be visible to operators/users. We'd probably need to have a discussion on how to surface this information.

Contributor

Let's document this in a follow-up issue.
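
For the follow-up issue, the proposed behaviour could look roughly like the sketch below. It is not the `NextState` implementation from this PR: the types are simplified stand-ins, the `UNKNOWN` monitor state is the hypothetical addition under discussion, and a simple majority rule replaces whatever thresholds the real `Config` carries.

```go
package main

import "fmt"

type datapointState int

const (
	datapointOK datapointState = iota
	datapointNOK
	datapointUnknown
)

type monitorState string

const (
	monitorOK      monitorState = "OK"
	monitorNOK     monitorState = "NOK"
	monitorUnknown monitorState = "UNKNOWN" // hypothetical new state
)

// nextState is an illustrative stand-in: a simple majority of known
// datapoints decides OK/NOK, ties keep the old state, and a window made up
// entirely of unknown datapoints moves the monitor to UNKNOWN.
func nextState(old monitorState, states []datapointState) monitorState {
	if len(states) == 0 {
		return old // nothing to judge; keep the previous state
	}
	ok, nok := 0, 0
	for _, s := range states {
		switch s {
		case datapointOK:
			ok++
		case datapointNOK:
			nok++
		}
	}
	switch {
	case ok == 0 && nok == 0:
		return monitorUnknown // every datapoint failed to collect
	case nok > ok:
		return monitorNOK
	case ok > nok:
		return monitorOK
	default:
		return old
	}
}

func main() {
	allUnknown := []datapointState{datapointUnknown, datapointUnknown, datapointUnknown}
	fmt.Println(nextState(monitorNOK, allUnknown))
}
```

As noted in the thread, this only helps operators once the monitor state is surfaced somewhere, so the state change and its visibility would need to be designed together.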

Contributor

@defelmnq defelmnq left a comment

👍 All good for me! Really cool to see everything connected.

Just a question but no blocker.

@DanielleMaywood DanielleMaywood merged commit d6b9806 into main Feb 17, 2025
33 checks passed
@DanielleMaywood DanielleMaywood deleted the dm-internal-248 branch February 17, 2025 16:56
@github-actions github-actions bot locked and limited conversation to collaborators Feb 17, 2025
Development

Successfully merging this pull request may close these issues.

Processing component
4 participants