TAS: Filter out Node updates with LastHeartbeatTime changes #6570
Conversation
Force-pushed from 53831d4 to 250b44d
/assign
// checkNodeSchedulingPropertiesChanged checks if the node update affects TAS scheduling.
func checkNodeSchedulingPropertiesChanged(oldNode, newNode *corev1.Node) bool {
	oldCopy := oldNode.DeepCopy()
	newCopy := newNode.DeepCopy()
nit: do we need to deepcopy the entire node? Wondering if there's a subset of the node that scheduling cares about.
Added a nit here: #6570 (comment)
/approve
LGTM label has been added. Git tree hash: 2f8a14316703c502dc7c05e70f3ab3e332300fbc
/assign @tenzen-y
Thank you for working on this.
newNode: func() *corev1.Node {
	n := baseNode.DeepCopy()
	n.Status.Conditions[0].LastHeartbeatTime = later
	n.Status.Conditions[1].LastHeartbeatTime = later
	return n
}(),
Instead of anonymous functions, we can clone the base node and chain new parameters.
/release-note-edit
Force-pushed from 250b44d to 362d96e
},
// Handle resource.Quantity comparison to avoid panic on unexported fields
func(a, b resource.Quantity) bool {
	return a.Cmp(b) == 0
- return a.Cmp(b) == 0
+ return a.Equal(b)

We have the Equal function.
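The reason a custom comparer is needed at all: resource.Quantity caches unexported state, so structural equality can disagree with semantic equality (and go-cmp panics on unexported fields without an option). A self-contained sketch of the distinction, using a simplified stand-in Quantity rather than the real type from k8s.io/apimachinery:

```go
package main

import "fmt"

// Stand-in for resource.Quantity: a milli-unit value plus a cached
// string form, so two semantically equal values can differ structurally.
type Quantity struct {
	milliValue int64
	s          string // cached formatting, irrelevant to equality
}

func (q Quantity) Cmp(other Quantity) int {
	switch {
	case q.milliValue < other.milliValue:
		return -1
	case q.milliValue > other.milliValue:
		return 1
	}
	return 0
}

// Equal is the readable wrapper the review suggests over Cmp(...) == 0.
func (q Quantity) Equal(other Quantity) bool { return q.Cmp(other) == 0 }

func main() {
	a := Quantity{milliValue: 1000, s: "1"}
	b := Quantity{milliValue: 1000, s: "1000m"}
	fmt.Println(a == b)     // false: struct comparison sees the cached string
	fmt.Println(a.Equal(b)) // true: semantic equality
}
```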
@utam0k This is still not fixed.
🙏
for name, tc := range testCases {
	t.Run(name, func(t *testing.T) {
		got := checkNodeSchedulingPropertiesChanged(tc.oldNode, tc.newNode)
Oh, I got it. Ideally, we want to call the Update function here. But we would need to implement a fake workqueue RateLimiter for that. So I'm OK with testing checkNodeSchedulingPropertiesChanged, since implementing a fake workqueue is extra work from this PR's point of view.
Should I create the issue before creating the extra PR?
If you can create an issue, I would appreciate that.
"Node Ready status changed": {
	oldNode: baseNode.Clone().Obj(),
	newNode: baseNode.Clone().
		ConditionStatus(corev1.NodeReady, corev1.ConditionFalse, later).
- ConditionStatus(corev1.NodeReady, corev1.ConditionFalse, later).
+ StatusConditions(
+     corev1.NodeCondition{
+         Type:               corev1.NodeReady,
+         Status:             corev1.ConditionFalse,
+         LastHeartbeatTime:  later,
+         LastTransitionTime: later,
+     },
+ ).
We want to remove the StatusConditions call from baseNode and instead call it in every test case. We follow the DAMP principle to make tests as descriptive as possible. Note: ConditionHeartbeat is still better since it overwrites only one field and is less programmable.
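The layout this comment asks for can be sketched as a table-driven test where each case declares its conditions in full rather than mutating a shared base: more repetition, but every case reads on its own (DAMP over DRY). The names below are illustrative stand-ins for the real test:

```go
package main

import "fmt"

// Simplified stand-in for corev1.NodeCondition.
type Condition struct {
	Type   string
	Status string
}

func conditionsChanged(a, b []Condition) bool {
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	// Each case spells out its full conditions inline, so a reader
	// never has to consult a shared fixture to understand it.
	cases := map[string]struct {
		oldConds, newConds []Condition
		wantChanged        bool
	}{
		"Ready status changed": {
			oldConds:    []Condition{{Type: "Ready", Status: "True"}},
			newConds:    []Condition{{Type: "Ready", Status: "False"}},
			wantChanged: true,
		},
		"no change": {
			oldConds:    []Condition{{Type: "Ready", Status: "True"}},
			newConds:    []Condition{{Type: "Ready", Status: "True"}},
			wantChanged: false,
		},
	}
	for name, tc := range cases {
		got := conditionsChanged(tc.oldConds, tc.newConds)
		fmt.Printf("%s: pass=%v\n", name, got == tc.wantChanged)
	}
}
```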
corev1.NodeCondition{
	Type:               corev1.NodeReady,
	Status:             corev1.ConditionTrue,
	LastHeartbeatTime:  now,
	LastTransitionTime: now,
},
corev1.NodeCondition{
	Type:               corev1.NodeMemoryPressure,
	Status:             corev1.ConditionFalse,
	LastHeartbeatTime:  now,
	LastTransitionTime: now,
},
Do we need multiple conditions?
	TimeAdded: &later,
}).Obj(),
wantChanged: true,
},
Additionally, I want to add a case where oldNode has a nil TimeAdded and newNode has a proper TimeAdded.
Force-pushed from 362d96e to 985ea92
// ConditionStatus updates the Status and LastTransitionTime of an existing condition.
func (n *NodeWrapper) ConditionStatus(conditionType corev1.NodeConditionType, status corev1.ConditionStatus, transitionTime metav1.Time) *NodeWrapper {
	for i := range n.Status.Conditions {
		if n.Status.Conditions[i].Type == conditionType {
			n.Status.Conditions[i].Status = status
			n.Status.Conditions[i].LastTransitionTime = transitionTime
			break
		}
	}
	return n
}
This is no longer used anywhere.
@utam0k Remaining unresolved tasks are #6570 (comment) and #6570 (comment).
Signed-off-by: utam0k <[email protected]>
Force-pushed from 985ea92 to 04fa579
Thanks for your quick review 🙏
Thank you!
/lgtm
/approve
LGTM label has been added. Git tree hash: 855c723ab1cd9737637a48024c0f32c0ed1a71b3
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: amy, tenzen-y, utam0k
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
/cherry-pick release-0.12
@tenzen-y: once the present PR merges, I will cherry-pick it on top of release-0.12.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@tenzen-y: new pull request created: #6636
@tenzen-y: new pull request created: #6637
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR adds filtering for Node update events in the TAS (Topology-Aware Scheduling) controller to skip reconciliation when only LastHeartbeatTime is updated. Previously, the TAS controller would trigger reconciliation for every Node update, including the periodic heartbeat updates.
Which issue(s) this PR fixes:
Ref: #6551
Special notes for your reviewer:
Does this PR introduce a user-facing change?