Conversation

@sradco
Contributor

sradco commented Jul 18, 2025

What this PR does

Before this PR:

No indication in alerts when a VM is stuck in an unhealthy status.

After this PR:

This PR adds two alerts:

  1. A VM that has been in an unhealthy or transitional status for more than 10 minutes but has had no VMI for more than 5 minutes.
  2. A VM that is in an unhealthy or transitional status and has a VMI for more than 5 minutes.
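
The two conditions above can be sketched as a small Python model. This is only an illustration of the description, not the PromQL merged in this PR. The `VMObservation` type and `firing_alert` function are hypothetical, the mapping of alert names to conditions assumes the release note lists them in the same order as the description, and reading the second condition as "unhealthy with a VMI, both for more than 5 minutes" is an interpretation.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the PR description, in minutes.
NO_VMI_UNHEALTHY_MINUTES = 10
NO_VMI_ABSENT_MINUTES = 5
WITH_VMI_MINUTES = 5


@dataclass
class VMObservation:
    """Hypothetical snapshot of one VM; not a KubeVirt type."""
    unhealthy_minutes: float  # time spent continuously in an unhealthy or transitional status
    has_vmi: bool             # whether a VMI currently exists for the VM
    vmi_state_minutes: float  # how long has_vmi has held its current value


def firing_alert(obs: VMObservation) -> Optional[str]:
    """Return the name of the alert that would fire, or None."""
    if obs.has_vmi:
        # Assumed reading: the VM has been unhealthy AND has had a VMI,
        # both for more than 5 minutes.
        stuck_minutes = min(obs.unhealthy_minutes, obs.vmi_state_minutes)
        if stuck_minutes > WITH_VMI_MINUTES:
            return "VirtualMachineStuckOnNode"
        return None
    # No VMI: unhealthy for more than 10 minutes and VMI missing for more than 5.
    if (obs.unhealthy_minutes > NO_VMI_UNHEALTHY_MINUTES
            and obs.vmi_state_minutes > NO_VMI_ABSENT_MINUTES):
        return "VirtualMachineStuckInUnhealthyState"
    return None
```

For example, under these assumptions a VM that has been unhealthy for 12 minutes with no VMI for 6 minutes would map to `VirtualMachineStuckInUnhealthyState`, while the same VM with a VMI missing for only 3 minutes would not fire either alert.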

Signed-off-by: Shirly Radco [email protected]

References

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Special notes for your reviewer

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

New VM alerts - VirtualMachineStuckInUnhealthyState, VirtualMachineStuckOnNode

@kubevirt-bot added the dco-signoff: yes and do-not-merge/release-note-label-needed labels Jul 18, 2025
@kubevirt-bot requested review from avlitman and orenc1 July 18, 2025 10:40
@kubevirt-bot added the size/L, sig/buildsystem, and sig/observability labels Jul 18, 2025

@sourcery-ai bot left a comment

Hey @sradco - I've reviewed your changes - here's some feedback:

  • The first alert’s PromQL only enforces 10m of VM unhealthy state without checking for 5m of missing VMI as described – consider adding a time-based absence condition on kubevirt_vmi_info or adjusting the For period accordingly.
  • Both alert expressions are quite complex; factoring repeated sub-expressions into recording rules could improve readability and maintainability.
  • The test YAML introduces a lot of repeated series definitions—consider abstracting or reusing common blocks to reduce duplication and keep the tests concise.

@kubevirt-bot added the release-note label and removed the do-not-merge/release-note-label-needed label Jul 18, 2025
@sradco force-pushed the add_vm_in_error_alert branch from 8bf9a0d to 59731f7 July 20, 2025 07:53
@kubevirt-bot added the needs-rebase label Aug 6, 2025
@sradco force-pushed the add_vm_in_error_alert branch from 59731f7 to abb0edb August 20, 2025 15:07
@kubevirt-bot removed the needs-rebase label Aug 20, 2025
@sradco force-pushed the add_vm_in_error_alert branch 3 times, most recently from 53f66ab to 85c96d5 August 20, 2025 15:19
PR adds 2 alerts:
1. VM that is in error status or in transitional
status for more than 10 minutes, but doesn't have a VMI
for more than 5 minutes
2. VM that is in error status or in transitional
status and has a VMI for more than 5 minutes.

Signed-off-by: Shirly Radco <[email protected]>
@sradco force-pushed the add_vm_in_error_alert branch from 85c96d5 to 3e4d379 August 21, 2025 13:13
@sradco
Contributor Author

sradco commented Aug 24, 2025

@machadovilaca @avlitman please review this PR

@sradco
Contributor Author

sradco commented Aug 24, 2025

/retest-required

@sradco
Contributor Author

sradco commented Aug 26, 2025

@avlitman @machadovilaca please review again

@avlitman
Contributor

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: avlitman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot added the approved label Aug 28, 2025
@machadovilaca
Member

/lgtm

@kubevirt-bot added the lgtm label Aug 28, 2025
@kubevirt-commenter-bot

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-k8s-1.31-windows2016
/test pull-kubevirt-e2e-kind-1.33-vgpu
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.33-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.31-sig-network
/test pull-kubevirt-e2e-k8s-1.31-sig-storage
/test pull-kubevirt-e2e-k8s-1.31-sig-compute
/test pull-kubevirt-e2e-k8s-1.31-sig-operator
/test pull-kubevirt-e2e-k8s-1.32-sig-network
/test pull-kubevirt-e2e-k8s-1.32-sig-storage
/test pull-kubevirt-e2e-k8s-1.32-sig-compute
/test pull-kubevirt-e2e-k8s-1.32-sig-operator

@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment

@kubevirt-commenter-bot

✋🧢

/hold

Dear @sradco

⚠️ this pull request exceeds the number of retests that are allowed per individual commit.

🔎 Please check that the changes you committed are fine and that there are no infrastructure issues present!

Details Checklist:

💬 How we calculate the number of retests: The number of retest comments are the number of /test or /retest comments after the latest commit only.

👌 After all issues have been resolved, you can remove the hold on this pull request by commenting /unhold on it.

🙇 Thank you, your friendly referee automation, on behalf of the @sig-buildsystem and the KubeVirt community!

@kubevirt-bot added the do-not-merge/hold label Aug 28, 2025
@sradco
Contributor Author

sradco commented Sep 1, 2025

@enp0s3 Hi, this PR has lgtm and approve. How can we move it forward?

@sradco
Contributor Author

sradco commented Sep 1, 2025

/unhold

@kubevirt-bot removed the do-not-merge/hold label Sep 1, 2025
@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-commenter-bot

✋🧢

/hold

Dear @sradco

⚠️ this pull request exceeds the number of retests that are allowed per individual commit.

🔎 Please check that the changes you committed are fine and that there are no infrastructure issues present!

Details Checklist:

💬 How we calculate the number of retests: The number of retest comments are the number of /test or /retest comments after the latest commit only.

👌 After all issues have been resolved, you can remove the hold on this pull request by commenting /unhold on it.

🙇 Thank you, your friendly referee automation, on behalf of the @sig-buildsystem and the KubeVirt community!

@kubevirt-bot added the do-not-merge/hold label Sep 1, 2025
@sradco
Contributor Author

sradco commented Sep 2, 2025

@brianmcarey @dhiller Can you please check why this is blocked?

@avlitman
Contributor

avlitman commented Sep 2, 2025

/unhold

@kubevirt-bot removed the do-not-merge/hold label Sep 2, 2025
@kubevirt-bot merged commit a83abf8 into kubevirt:main Sep 2, 2025
44 checks passed

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/monitoring
dco-signoff: yes - Indicates the PR's author has DCO signed all their commits.
lgtm - Indicates that a PR is ready to be merged.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.
sig/buildsystem - Denotes an issue or PR that relates to changes in the build system.
sig/observability - Denotes an issue or PR that relates to observability.
size/L

Projects

None yet

5 participants