Add GuestAgentInfo load metrics #14879
Hey @machadovilaca - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
pkg/monitoring/metrics/virt-handler/domainstats/guest_metrics.go
02231c6 to 9b1e901
Hi, where did this request come from?
docs/observability/metrics.md
Guest hostname reported by the guest agent. The value is always 1. Type: Gauge.

### kubevirt_vmi_guest_load_15m
Guest system load average over 15 minutes as reported by the guest agent. Type: Gauge.
What does 'load' mean in this case?
average length of the cpu queue over the period of time
Please update this info in the metrics description
guest system load is the more accurate description, as used by the kernel
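To make the "load" discussion concrete: on Linux the kernel exposes the 1, 5 and 15 minute load averages as the first three fields of /proc/loadavg. The sketch below parses such a line. `parseLoadAvg` is a hypothetical helper for illustration only; it is not part of this PR, and how the guest agent actually collects the values is not shown in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLoadAvg is a hypothetical helper (not from this PR). It extracts the
// 1, 5 and 15 minute load averages from a /proc/loadavg-style line such as
// "0.50 1.25 2.00 1/234 5678".
func parseLoadAvg(line string) (load1m, load5m, load15m float64, err error) {
	fields := strings.Fields(line)
	if len(fields) < 3 {
		return 0, 0, 0, fmt.Errorf("expected at least 3 fields, got %d", len(fields))
	}
	vals := make([]float64, 3)
	for i := range vals {
		vals[i], err = strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return 0, 0, 0, fmt.Errorf("field %d: %w", i, err)
		}
	}
	return vals[0], vals[1], vals[2], nil
}

func main() {
	l1, l5, l15, err := parseLoadAvg("0.50 1.25 2.00 1/234 5678")
	if err != nil {
		panic(err)
	}
	fmt.Printf("load1m=%.2f load5m=%.2f load15m=%.2f\n", l1, l5, l15)
}
```

A value near the number of vCPUs roughly means the CPUs are fully occupied; sustained higher values mean tasks are waiting.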
hidden in the comment section of the PR description again, moved out
docs/observability/metrics.md
### kubevirt_vmi_filesystem_used_bytes
Used VM filesystem capacity in bytes. Type: Gauge.

### kubevirt_vmi_guest_hostname_info
Why should this be a separate metric and not add it to the kubevirt_vm/i_info metrics?
9b1e901 to f348447
}

guestInfo := vmiReport.vmiStats.GuestAgentInfo
Here we also need to check that guestInfo.Load isn't nil, since it's a pointer:
// Load contains the system load averages (1M, 5M, 15M) from the guest agent
Load *VirtualMachineInstanceGuestOSLoad `json:"load,omitempty"`
}
L87 checks if vmiReport.vmiStats.GuestAgentInfo is not nil, but it doesn't check if vmiReport.vmiStats.GuestAgentInfo.Load is not nil.
you are right, missed it, updated
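The guard discussed in this thread can be sketched as follows. The types are simplified stand-ins with only the fields needed here, and `load1m` is a hypothetical accessor, not the code in the PR.

```go
package main

import "fmt"

// GuestOSLoad and GuestAgentInfo are simplified stand-ins for the real
// types; only the fields needed for the illustration are included.
type GuestOSLoad struct {
	Load1m float64
}

type GuestAgentInfo struct {
	Hostname string
	Load     *GuestOSLoad // pointer: nil when the agent reported no load
}

// load1m (hypothetical) returns the 1 minute load and true only when both
// the guest agent info and its Load pointer are non-nil.
func load1m(info *GuestAgentInfo) (float64, bool) {
	if info == nil || info.Load == nil {
		return 0, false
	}
	return info.Load.Load1m, true
}

func main() {
	// Info is present but Load is nil: a check on info alone would not
	// prevent a nil pointer dereference here.
	v, ok := load1m(&GuestAgentInfo{Hostname: "vm-a"})
	fmt.Println(v, ok)

	v, ok = load1m(&GuestAgentInfo{Load: &GuestOSLoad{Load1m: 0.75}})
	fmt.Println(v, ok)
}
```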
pkg/monitoring/metrics/virt-handler/domainstats/guest_agent_info_scraper.go
/approve Let's add the tests and fix the
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: enp0s3.
})

Context("Deep copy behavior", func() {
    It("should store a deep copy of guest info in cache", func() {
Not sure about the value of this test. DeepCopy is generated and could be tested without going through the cache, but I don't think we need to do that.
removed
scraper.mutex.Lock()
scraper.cache[socketFile] = &guestAgentInfoCache{
    timestamp: time.Now(),
    info:      guestInfo.DeepCopy(),
}
scraper.mutex.Unlock()
nit: You could extract blocks like this using the mutexes into helpers
added
05987f1 to 9b02250
0xFelix left a comment
Are the failures in lane sig-compute related?
By("checking if entry is within timeout")
cached, exists := getCacheEntry(socketFile)
withinTimeout := exists && time.Since(cached.timestamp) < cacheTimeout
This does not verify controller logic? I'd like to see a test that verifies that the controller doesn't update if the cached value is still within the timeout range.
I understand what you mean. It was making the requests to the scraper cache,
but the validation had to be local because of the way the functions were built.
I refactored the scraper to allow us to test each function individually.
By("checking if entry is outside timeout")
cached, exists := getCacheEntry(socketFile)
withinTimeout := exists && time.Since(cached.timestamp) < cacheTimeout
Same, this doesn't test controller logic, but rather a condition in the test code. Was this generated by AI?
It("should not clean up non-expired cache entries", func() {
    By("adding a fresh cache entry")
    addCacheEntry(socketFile, time.Now(), guestInfo)

    By("calling cleanup immediately")
    scraper.cleanupExpiredCache()

    By("verifying cache entry still exists")
    exists := cacheEntryExists(socketFile)
    Expect(exists).To(BeTrue())
})
This case is a duplicate of the case above?
removed
// Load average over 1 minute
Load1m float64 `json:"load1m,omitempty"`
// Load5mSet indicates whether the 5 minute load average is set
Load5mSet bool `json:"load5mSet,omitempty"`
What does set mean? Why can't the actual field be a ptr instead?
We use the same format as libvirt.DomainGuestInfoLoad, which sets it like that.
We could still make the conversion to a pointer,
but in every other domainstat where this is the case we've been using the Set flag,
so I think it would be better to keep it consistent.
Can you point to an example?
That's internal API though. Which brings me back to I don't think we should litter the external API too much. Didn't we want to add this to the metrics endpoint provided by virt-launcher?
I see, this struct isn't used in the VMI's status.
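For readers comparing the two styles debated above, here is a small sketch. It assumes a `Load5m float64` field sits next to `Load5mSet` (only `Load1m` and `Load5mSet` are visible in the snippet); the type names and `toPointers` are hypothetical.

```go
package main

import "fmt"

// loadWithFlags mirrors the "value plus Set flag" style from the snippet
// under review. Load5m is assumed; it is not visible in the quoted diff.
type loadWithFlags struct {
	Load5m    float64
	Load5mSet bool
}

// loadWithPointers is the pointer style the reviewer asked about:
// a nil pointer means "not reported".
type loadWithPointers struct {
	Load5m *float64
}

// toPointers (hypothetical) converts the flag style to the pointer style.
func toPointers(in loadWithFlags) loadWithPointers {
	var out loadWithPointers
	if in.Load5mSet {
		v := in.Load5m
		out.Load5m = &v
	}
	return out
}

func main() {
	absent := toPointers(loadWithFlags{})
	zero := toPointers(loadWithFlags{Load5m: 0, Load5mSet: true})
	fmt.Println(absent.Load5m == nil, zero.Load5m != nil)
}
```

Both forms can tell a reported load of 0 apart from a missing value; the choice is mostly about consistency with neighbouring types versus a smaller API surface.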
566d98f to 9d407d7
/retest
@0xFelix it was related, fixed
func (d *GuestAgentInfoScraper) getCacheEntry(socketFile string) (*VirtualMachineInstanceStats, time.Time, bool) {
    now := time.Now()

    d.mutex.RLock()
You call RLock but then you write to d.cache?
true, it does not make sense to have a RW mutex I think
By("checking if entry is outside timeout")
_, timestamp, exists := scraper.getCacheEntry(socketFile)
Expect(exists).To(BeFalse())
Expect(time.Since(timestamp)).To(BeNumerically(">", cacheTimeout))
timestamp should be an empty time.Time{}?
can you clarify?
got it, updated
if vmStats.GuestAgentInfo != nil && vmStats.GuestAgentInfo.Hostname != "" {
    d.addCacheEntry(socketFile, time.Now(), vmStats.GuestAgentInfo)
}
Can you add a unit test for this behavior?
added
9d407d7 to a2524fd
Signed-off-by: João Vilaça <[email protected]>
a2524fd to 1f0dfc1
@machadovilaca: The following tests failed, say
0xFelix left a comment
/lgtm
Required labels detected, running phase 2 presubmits:
/unhold /cc @enp0s3
/retest-required
What this PR does
Before this PR:
No GuestAgentInfo metrics reported
After this PR:
GuestAgentInfo cpu load metrics reported
kubevirt/enhancements#67
jira-ticket: https://issues.redhat.com/browse/CNV-50883
References