Add VM in unhealthy state alerts runbooks #300
@vladikr @enp0s3 @machadovilaca @avlitman please review the following runbooks and see if they are accurate.
Force-pushed from df79bf0 to 88456dd.
Hi @0xFelix, thank you for the review. I updated the runbooks based on your comments. Can you please have another look?
0xFelix left a comment
Thank you for your efforts. I appreciate that you used LLMs to create these documents, but in my opinion, they are quite verbose, unspecific, and even incorrect in several places.
| ### 3. Check Image Availability (for containerDisk)
| ```bash
| # If using containerDisk, test image accessibility
| kubectl run test-pull --image=<vm-disk-image> --dry-run=client
That looks wrong to me. What is it supposed to do exactly?
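For context on the question above: `--dry-run=client` only builds the Pod object locally without sending it to the cluster, so the command never contacts the registry and cannot show whether the image can be pulled. A minimal sketch of one alternative, assuming `skopeo` is installed on the machine running the check (the `image_reachable` helper name is hypothetical and not part of the runbook):

```shell
# Hypothetical helper: return 0 if the registry serves a manifest for the
# given image reference, non-zero otherwise. Assumes skopeo is installed.
image_reachable() {
  # "docker://" is skopeo's transport prefix for container registries.
  skopeo inspect "docker://$1" >/dev/null 2>&1
}
```

Call it as `image_reachable <vm-disk-image>`. It runs from wherever it is invoked, so it only shows that the registry serves the image; node-local problems are not covered, and a private registry would additionally need credentials.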
| 1. **Clear image cache** on the problematic node:
| ```bash
| # On the node (requires node access):
| crictl rmi <problematic-image>
Same, not sure that always applies
Please review the updated version
Force-pushed from 88456dd to 6cb0ab4.
Force-pushed from 27b1776 to e2cf4cc.
| # If using containerDisk, test image accessibility
| docker pull <vm-disk-image>
yeah, there could be local node issues.
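The two snippets above use different tools (`crictl` and `docker`), and which one exists depends on the node's container runtime. A rough sketch, assuming shell access to the affected node, that tries whichever of `crictl`, `podman`, or `docker` is present (the `node_pull` helper is hypothetical and not part of the runbook):

```shell
# Hypothetical helper: pull an image with the first runtime CLI found on
# this node, so the same step works across different container runtimes.
node_pull() {
  for tool in crictl podman docker; do
    if command -v "$tool" >/dev/null 2>&1; then
      # All three CLIs accept "pull <image>".
      "$tool" pull "$1"
      return
    fi
  done
  echo "no container runtime CLI found on this node" >&2
  return 127
}
```

Running `node_pull <vm-disk-image>` on the node where the VM is stuck exercises that node's own network path, credentials, and storage, which is where local issues would show up.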
| ### 5. Review VM Specification
| ```bash
| # Validate VM spec for common issues
| kubectl get vm <vm-name> -n <namespace> \
| -o jsonpath='{.spec.template.spec}'
|
| # Check for resource requests/limits
| ### 4. Verify KubeVirt Configuration
| ```bash
| # Check KubeVirt installation status
Maybe check if CR status is Available/Ready instead?
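One way to act on this suggestion, as a sketch: read the `Available` condition from the KubeVirt CR. The namespace and CR name below are assumptions based on a default upstream install (`kubevirt`/`kubevirt`) and will differ on other distributions; the helper names are hypothetical.

```shell
# Assumed defaults from an upstream install; override for your cluster.
KUBEVIRT_NS="${KUBEVIRT_NS:-kubevirt}"
KUBEVIRT_CR="${KUBEVIRT_CR:-kubevirt}"

# Print the status of the Available condition ("True", "False", or empty).
kubevirt_available_status() {
  kubectl get kubevirt "$KUBEVIRT_CR" -n "$KUBEVIRT_NS" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Return 0 only when the CR reports Available=True.
kubevirt_is_available() {
  [ "$(kubevirt_available_status)" = "True" ]
}
```

When blocking is acceptable, `kubectl wait kubevirt "$KUBEVIRT_CR" -n "$KUBEVIRT_NS" --for=condition=Available --timeout=60s` performs a similar check in one command.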
Overall, this is a very nice, well-organized PR! I just had a few comments.
Force-pushed from e2cf4cc to 0c24031.
Signed-off-by: Shirly Radco <[email protected]>
Force-pushed from 0c24031 to 811476a.
Thanks @enp0s3. Based on your question, I added a line to the Escalation section: "- You don't have enough permissions to run the diagnosis and/or mitigation steps."
Thank you all. Your reviews were important in making the runbooks accurate and meaningful, and they are much appreciated!
…vm_stuck_in_status_alerts Add VM in unhealthy state alerts runbooks
What this PR does / why we need it:
This PR adds runbooks for the VirtualMachineStuckInUnhealthyState and VirtualMachineStuckOnNode alerts, which were added in kubevirt/kubevirt#15227.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
https://issues.redhat.com/browse/CNV-49530
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: