Conversation

@noamasu
Contributor

@noamasu noamasu commented Jun 25, 2025

VM snapshot with qemu-guest-agent fails when the domain is paused, because virt-controller sends a freeze/unfreeze request during snapshotting. This freeze/unfreeze is handled by virt-launcher, but it currently fails on paused VMs. Since paused VMs have a consistent filesystem state (similar to frozen), this change makes virt-launcher treat freeze as successful when the domain is paused. This allows snapshotting to continue safely.

What this PR does

Before this PR:

  • VM snapshot with qemu-guest-agent fails when the domain is paused. The snapshot flow triggers a freeze, but since the VM is paused the freeze fails, causing the snapshot flow to fail as well. Since a paused VM is considered a safe state for taking a snapshot, we expect the snapshot to pass, but it doesn't.
  • It is possible to unpause a VM during a snapshot, which is not desired.
  • A false warning message is printed saying it is not safe to snapshot a running VM (without qemu-guest-agent), even though the VM is paused.
  • virt-launcher's agent poller spams qemu-guest-agent command failures when the VM is paused.

After this PR:

  • Possible to complete a VM (with qemu-guest-agent) snapshot successfully even if the VM is paused.
  • Make sure unpause will be rejected if vm.Status.SnapshotInProgress is not nil.
  • Prevent the false warning about snapshotting a running VM (without qemu-guest-agent) from being printed when the VM is actually paused.
  • virt-launcher's agent poller skips qemu-guest-agent commands if the VM is paused.

References

Fixes #10759

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Special notes for your reviewer

@ShellyKa13
@codingben (check out the change for agent_poller.go)

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

bugfix: Enable vmsnapshot for paused VMs

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 25, 2025
@kubevirt-bot
Contributor

Hi @noamasu. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sourcery-ai sourcery-ai bot left a comment

Hey @noamasu - I've reviewed your changes and they look great!



Member

@0xFelix 0xFelix left a comment

Thanks for working on this, I've added some comments.

} else {
return err
}
Member

Suggested change
- } else {
- return err
- }
+ }
+ return err

nit: additional else not required

Contributor

why do we need to check for not found and do a new error? why not return the error which is not found anyways?

Comment on lines 1914 to 1915
} else {
return err
}
Member

Same

} else {
vmi = libvmifact.NewAlpine(libnet.WithMasqueradeNetworking())
}
vmi.Namespace = testsuite.GetTestNamespace(nil)
Member

There's also libvmi.WithNamespace. Can you put both libvmi Options into an opts slice?
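A minimal sketch of what collecting both options into one slice could look like. The `VMI` and `Option` types and the factory functions below are simplified stand-ins written for this example; they only imitate the functional-option shape and are not the real libvmi API.

```go
package main

import "fmt"

// VMI is a simplified stand-in for the real VirtualMachineInstance type.
type VMI struct {
	Namespace  string
	Image      string
	Masquerade bool
}

// Option imitates a functional option (assumed shape, not the real libvmi type).
type Option func(*VMI)

func WithNamespace(ns string) Option { return func(v *VMI) { v.Namespace = ns } }

func WithMasqueradeNetworking() Option { return func(v *VMI) { v.Masquerade = true } }

func newVMI(image string, opts ...Option) *VMI {
	v := &VMI{Image: image}
	for _, o := range opts {
		o(v)
	}
	return v
}

// NewFedora and NewAlpine are hypothetical factories for this sketch.
func NewFedora(opts ...Option) *VMI { return newVMI("fedora", opts...) }

func NewAlpine(opts ...Option) *VMI { return newVMI("alpine", opts...) }

// buildVMI collects the shared options once and hands them to either factory,
// so the namespace no longer has to be assigned after construction.
func buildVMI(withGuestAgent bool, namespace string) *VMI {
	opts := []Option{WithMasqueradeNetworking(), WithNamespace(namespace)}
	if withGuestAgent {
		return NewFedora(opts...)
	}
	return NewAlpine(opts...)
}

func main() {
	fmt.Printf("%+v\n", *buildVMI(true, "test-ns"))
}
```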

Comment on lines 591 to 606
vm.Spec.Template.Spec.Domain.Devices.Disks = append(vm.Spec.Template.Spec.Domain.Devices.Disks, v1.Disk{
	Name: "blank",
	DiskDevice: v1.DiskDevice{
		Disk: &v1.DiskTarget{
			Bus: v1.DiskBusVirtio,
		},
	},
})
vm.Spec.Template.Spec.Volumes = append(vm.Spec.Template.Spec.Volumes, v1.Volume{
	Name: "blank",
	VolumeSource: v1.VolumeSource{
		DataVolume: &v1.DataVolumeSource{
			Name: "dv-" + vm.Name,
		},
	},
})
Member

Add that DV to the vmi first (with libvmi) and then create a VM from it?

},
})

vm = libvmi.NewVirtualMachine(vmi)
Member

You call this func twice, also in L577

Comment on lines 635 to 636
Entry("[test_id:7000] with guest-agent", true),
Entry("[test_id:7001] no guest-agent", false),
Member

Provide the libvmifact function here instead of a boolean?

Contributor

+1

Contributor Author

@0xFelix , i tried to do:

	Entry("[test_id:7000] with guest-agent", libvmifact.NewFedora),
	Entry("[test_id:7001] without guest-agent", libvmifact.NewAlpine),

instead of a boolean,
but because I need this condition to wait for the guest agent:

		if imageFunc == libvmifact.NewFedora {
			Eventually(matcher.ThisVMI(vmi), 12*time.Minute, 2*time.Second).
				Should(matcher.HaveConditionTrue(v1.VirtualMachineInstanceAgentConnected))
		}

I cannot compare functions like imageFunc == libvmifact.NewFedora ... so it seems I have to pass a boolean anyway?
Wondering how to handle this if statement if I don't have the boolean.
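One way around this, sketched under assumptions: Go only allows comparing a function value to nil, so each table entry can carry the factory and an explicit flag side by side. The `VMI` type, `factory` signature, and helper names below are hypothetical stand-ins; the real test would pass the libvmifact function plus the boolean as Entry parameters.

```go
package main

import "fmt"

// VMI is a simplified stand-in for the real VMI type.
type VMI struct{ Image string }

// factory is an assumed stand-in for the signature of the libvmifact helpers.
type factory func() *VMI

func newFedora() *VMI { return &VMI{Image: "fedora"} }

func newAlpine() *VMI { return &VMI{Image: "alpine"} }

// entry pairs the factory with an explicit flag, so the test body never
// needs to compare function values.
type entry struct {
	name         string
	newVMI       factory
	expectsAgent bool
}

func entries() []entry {
	return []entry{
		{"with guest-agent", newFedora, true},
		{"without guest-agent", newAlpine, false},
	}
}

// run builds the VMI and reports whether the caller should wait for the agent.
func run(e entry) (image string, waitForAgent bool) {
	vmi := e.newVMI()
	return vmi.Image, e.expectsAgent
}

func main() {
	for _, e := range entries() {
		image, wait := run(e)
		fmt.Println(e.name, image, wait)
	}
}
```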

// GET_AGENT - According to libvirt engineers this command shouldn't be used
// by KubeVirt, because it provides irrelevant information (version and supported commands).
func executeAgentCommands(commands []AgentCommand, agentPoller *AgentPoller) {
dom, err := agentPoller.Connection.LookupDomainByName(agentPoller.domainName)
Member

Is it correct that these changes are for "virt-launcher's agent poller is spamming qemu-guest-agent command failures when VM is paused"? If so, can you put them into a separate commit please?

Member

Is there a way to test this behavior?

Contributor Author

Yes, i will make a separate commit for this, and add relevant tests to it :)

Contributor

nice addition to the PR, thanks

Contributor

@ShellyKa13 ShellyKa13 left a comment

Looks good! A couple of comments. I would also appreciate it if you divided the PR into several commits:
1. snapshot source change
2. manager freeze/unfreeze changes + unit tests
3. lifecycle change + unit tests
4. extra change for running agent commands + unit tests
5. functional test


}

func (app *SubresourceAPIApp) UnpauseVMIRequestHandler(request *restful.Request, response *restful.Response) {
Contributor

should add this case to the subresource unit test

Contributor

Don't we also need to prevent pausing during a snapshot? It might fail, but should we even pass it along if it will just fail?

Member

The more I think about it, the more conflicted I am about doing anything special in the pause/unpause subresource, since the VMI is not only paused/unpaused by users; it may also happen when an internal error occurs, such as a disk I/O error.

Contributor

@mhenriks I think we should at least handle cases of unpause done by users..

Member

Yeah I guess unpause is fine

writeError(statusErr, response)
return
}
if vm.Status.SnapshotInProgress != nil {
Contributor

I wonder if we want to add such a check in every state-change subresource. I assume it is possible that the subresource will be used directly without modifying the VM yaml (where we deny with the admitter). @mhenriks wdyt?

Member

hmm as far as I know, the subresources usually update something in the status and then a controller applies the change to the spec. When that happens the webhook will be invoked, no?

Contributor

I verified with Cursor :) and, as I thought, in these cases the subresource call itself doesn't change the VM/VMI, so the webhook isn't called; the subresource passes the request directly to virt-handler, which passes it right away through the gRPC connection to libvirt.

Member

far be it from me to question the AI but I'm pretty sure not "every state change subresource" is processed like that. That may be the case for pause because that is fundamentally an operation on the VMI

Contributor

@mhenriks technically any VM state change can come from virtctl (and I would assume the UI also), bypassing the vm/vmi yaml change by using the subresource directly. I think in such cases the scenarios mentioned in the commit should be avoided if possible. We already do that at the webhook level, so why not also at the subresource level?

Member

Can you be more specific about what subresource calls? I know that VM start/stop subresource updates the VM spec/status which should invoke the validating webhook

Contributor

right I see you are right, they call the subresource directly and the subresource patches the vm with start and stop. So I guess only pause/unpause does not go through the yaml
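Since pause/unpause does not pass through the webhook, the guard has to live in the subresource handler itself. A minimal sketch of that check, assuming reduced stand-in types: `vm`, `vmStatus`, `validateUnpause`, and the error text are illustrative names for this example, and only the `SnapshotInProgress != nil` condition comes from the diff under review.

```go
package main

import (
	"errors"
	"fmt"
)

// vmStatus and vm are simplified stand-ins for the KubeVirt VirtualMachine types.
type vmStatus struct {
	// SnapshotInProgress is assumed here to name the running snapshot, if any.
	SnapshotInProgress *string
}

type vm struct {
	Name   string
	Status vmStatus
}

// errSnapshotInProgress is a hypothetical sentinel error for this sketch.
var errSnapshotInProgress = errors.New("cannot unpause VMI while a snapshot is in progress")

// validateUnpause rejects the request while a snapshot is in progress,
// mirroring the vm.Status.SnapshotInProgress != nil check in the handler.
func validateUnpause(v *vm) error {
	if v.Status.SnapshotInProgress != nil {
		return fmt.Errorf("%w: %s", errSnapshotInProgress, *v.Status.SnapshotInProgress)
	}
	return nil
}

func main() {
	snap := "snap-1"
	fmt.Println(validateUnpause(&vm{Name: "a"}))
	fmt.Println(validateUnpause(&vm{Name: "b", Status: vmStatus{SnapshotInProgress: &snap}}))
}
```

In the real handler the returned error would be turned into a status error and written to the response before returning.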


if paused, err := util.DomainIsPaused(dom); err != nil {
log.Log.Errorf("cannot determine domain state: %v", err)
return
} else if paused {
Contributor

no need for the else
just do if paused


}
defer domain.Free()

if paused, err := util.DomainIsPaused(domain); err != nil {
Contributor

Since you added this function, you can also use it in other places that check whether the domain is paused, like:
UnpauseVMI



online := exists

running := exists && s.vm.Status.PrintableStatus == kubevirtv1.VirtualMachineStatusRunning
Contributor

@mhenriks what do you think about using PrintableStatus? It's not being relied upon for anything big, just for whether or not to print the error message, but usually we prefer not to rely on it.

Member

It's probably okay for Running but I wouldn't rely on it for anything else

Member

maybe it would be best to just check the VMI conditions instead

}
defer dom.Free()

if paused, err := util.DomainIsPaused(dom); err != nil {
Member

Why did you add these changes only to executeAgentCommands()? Because of the relevant QEMU commands in this function?

Contributor Author

Thanks for the response!
I added it because the agent poller's executeAgentCommands spams the logs with qemu guest agent command failures when the VM is paused (the qemu guest agent cannot execute commands if the VM is not running). Do you think we need to add it to other places?

Member

There is also fetchAndStoreGuestInfo(), which uses the libvirt API (an abstraction above QEMU) to get the relevant info instead of talking to QEMU directly.

Contributor Author

It seems that since fetchAndStoreGuestInfo() doesn't rely on QEMU guest agent commands, it's acceptable to retrieve information via the libvirt API even when the VM is paused.

Contributor Author

actually, i take it back, you are right.
domain.GetGuestInfo(infoTypes, 0) will also fail when the VM is paused, as these messages are also being spammed

{"component":"virt-launcher","level":"info","msg":"Polling API operations: 1","pos":"agent_poller.go:417","timestamp":"2025-06-29T18:45:50.316563Z"}
{"component":"virt-launcher","level":"error","msg":"Fetching guest info failed: virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","pos":"agent_poller.go:432","times

i will work on a fix that will cover both fetchAndStoreGuestInfo and executeAgentCommands
sorry for the confusion :)
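A rough sketch of a single guard covering both polling paths, under stated assumptions: the `domain` interface, `fakeDomain`, and the `poller` type are hypothetical stand-ins so the example runs without libvirt, and the two strings only name the real functions discussed above rather than calling them.

```go
package main

import "fmt"

// domain is a hypothetical, minimal view of a libvirt domain for this sketch.
type domain interface {
	IsPaused() (bool, error)
}

// fakeDomain lets the sketch run without a libvirt connection.
type fakeDomain struct{ paused bool }

func (d fakeDomain) IsPaused() (bool, error) { return d.paused, nil }

// poller records which operations ran; the real poller would call the guest agent.
type poller struct {
	dom domain
	ran []string
}

// shouldPoll reports whether guest-agent backed operations make sense right now.
func (p *poller) shouldPoll() bool {
	paused, err := p.dom.IsPaused()
	if err != nil {
		// State unknown: skip this round rather than issue calls that may fail.
		return false
	}
	return !paused
}

// pollOnce applies one guard in front of both paths discussed above.
func (p *poller) pollOnce() {
	if !p.shouldPoll() {
		return
	}
	p.ran = append(p.ran, "executeAgentCommands", "fetchAndStoreGuestInfo")
}

func main() {
	running := &poller{dom: fakeDomain{paused: false}}
	running.pollOnce()
	paused := &poller{dom: fakeDomain{paused: true}}
	paused.pollOnce()
	fmt.Println(len(running.ran), len(paused.ran))
}
```

Keeping the check in one helper means a later polling path only needs to call the same guard instead of repeating the state lookup.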

@noamasu noamasu force-pushed the bugfix/snapshot_when_vm_paused_fixes branch from c0cf59f to 680daf9 Compare June 30, 2025 13:30
@mhenriks
Member

mhenriks commented Jul 1, 2025

"Since paused VMs have a consistent filesystem state (similar to frozen),"

@noamasu can you supply some documentation for this? I don't think buffers are flushed when a VM is paused. Consider that a VM may be paused automatically by an I/O error


s.state = &sourceState{
online: online,
running: running,
Member

can we just have online be false if the VM is paused and not add a new state?

Member

asked another way, why can't we simply treat a paused VM as offline?

@noamasu
Contributor Author

noamasu commented Jul 2, 2025

Hey @mhenriks, thank you for the feedback.

"Since paused VMs have a consistent filesystem state (similar to frozen),"

That assumption came from the Jira ticket's expected result: "VMSnapshot completed skipping freeze if VM is paused, and not fail".

After looking into this (sorry for the inaccuracy), you are right: pausing leaves in-memory buffers unflushed, which risks an inconsistent snapshot of the filesystem.

It looks like simply flushing the paused VM is not possible either, since a flush requires a running VM.

So a snapshot of a paused VM is indeed not safe, much like a snapshot of a running VM (memory not flushed to disk).

Given that, what do you think is the best way to deal with this? I see two options:

  1. Treat "paused" the same as "running": check whether the VM is paused/running (online) and print the existing warning message, "not safe to snapshot a paused/running VM". Do you think a QuiesceFailed indication should be presented in that case?
  2. In the freeze function (snapshot/source.go): if the VM is paused, unpause it before the freeze, then do the freeze -> snapshot -> unfreeze as usual, then pause the VM again. (I don't think it's OK to change the VM state, to be honest.)

@mhenriks
Member

mhenriks commented Jul 2, 2025

  1. Treat "paused" the same as "running": check whether the VM is paused/running (online) and print the existing warning message, "not safe to snapshot a paused/running VM". Do you think a QuiesceFailed indication should be presented in that case?

@noamasu I think this is the better option, may even want to create a new indication for this case

@noamasu noamasu mentioned this pull request Jul 3, 2025
8 tasks
@noamasu noamasu force-pushed the bugfix/snapshot_when_vm_paused_fixes branch from 680daf9 to 44c254b Compare July 6, 2025 21:06
@kubevirt-bot kubevirt-bot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jul 6, 2025
@noamasu noamasu force-pushed the bugfix/snapshot_when_vm_paused_fixes branch 2 times, most recently from ec76470 to d0da165 Compare July 6, 2025 21:45
@mhenriks
Member

mhenriks commented Jul 8, 2025

I think this is looking pretty good, what are your thoughts @ShellyKa13?

Contributor

@ShellyKa13 ShellyKa13 left a comment

Couple of suggestions

}

// Check for paused VM
if source.Paused() {
Contributor

What do you say about checking this before the guest agent and, in case of paused, not adding the GuestAgent indication, since it doesn't affect the snapshot? i.e.:
if source.Paused() {
	...
} else if source.GuestAgent() {
	...
} else {
	...
}
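The ordering suggested here can be sketched as follows. The indication strings match the constants quoted in this review, but the `source` struct and `indicationsFor` function are hypothetical simplifications, and the merged code may combine indications differently.

```go
package main

import "fmt"

// Indication values mirror the constants quoted in this review; the
// surrounding types and function are simplified stand-ins for this sketch.
type Indication string

const (
	NoGuestAgent Indication = "NoGuestAgent"
	GuestAgent   Indication = "GuestAgent"
	Paused       Indication = "Paused"
)

// source is a hypothetical, reduced view of the snapshot source.
type source struct {
	online     bool
	paused     bool
	guestAgent bool
}

// indicationsFor checks the paused state first, so a paused VM never
// reports a guest-agent indication that did not influence the snapshot.
func indicationsFor(s source) []Indication {
	if !s.online {
		return nil
	}
	if s.paused {
		return []Indication{Paused}
	}
	if s.guestAgent {
		return []Indication{GuestAgent}
	}
	return []Indication{NoGuestAgent}
}

func main() {
	fmt.Println(indicationsFor(source{online: true, paused: true, guestAgent: true}))
	fmt.Println(indicationsFor(source{online: true, guestAgent: true}))
	fmt.Println(indicationsFor(source{online: true}))
}
```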

return nil
}

if s.Paused() {
Contributor

why not add it to the first if:
if !s.Locked() || !s.GuestAgent() || s.Paused() {
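A small sketch of folding the paused check into the existing early return. The `source` struct and its fields are hypothetical stand-ins for the real snapshot source, and `freezeCalls` exists only so the example can show when a freeze would actually be issued.

```go
package main

import "fmt"

// source is a hypothetical stand-in for the snapshot source in this sketch.
type source struct {
	locked      bool
	guestAgent  bool
	paused      bool
	freezeCalls int
}

// Freeze skips the guest freeze when the source is not locked, has no guest
// agent, or is paused; all three cases share a single early return.
func (s *source) Freeze() error {
	if !s.locked || !s.guestAgent || s.paused {
		return nil
	}
	// The real code would ask the guest agent to freeze filesystems here.
	s.freezeCalls++
	return nil
}

func main() {
	running := &source{locked: true, guestAgent: true}
	paused := &source{locked: true, guestAgent: true, paused: true}
	_ = running.Freeze()
	_ = paused.Freeze()
	fmt.Println(running.freezeCalls, paused.freezeCalls)
}
```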

VMSnapshotNoGuestAgentIndication Indication = "NoGuestAgent"
VMSnapshotGuestAgentIndication Indication = "GuestAgent"
VMSnapshotQuiesceFailedIndication Indication = "QuiesceFailed"
VMSnapshotPausedIndication Indication = "Paused"
Contributor

This should probably be in the previous commit, or move the indications part from the previous commit to this commit.


@noamasu noamasu force-pushed the bugfix/snapshot_when_vm_paused_fixes branch from 35e2247 to ef98e10 Compare July 15, 2025 13:32
return nil
}

if s.Paused() {
Contributor

probably should first check the paused condition before the guestagent one

return updatedVMI.Status.FSFreezeStatus
}, 30*time.Second, 2*time.Second).Should(BeEmpty())
})
DescribeTable("should succeed snapshot when VM is paused with Paused indication",
Contributor

missing new line before this block

checkOnlineSnapshotExpectedContentSource(vm, contentName, true)
},

Entry("[test_id:7000] with guest-agent", libvmifact.NewFedora, true),
Contributor

not sure if we really need to run this test twice for with or without guest agent.. I think with guest agent is enough


if source.GuestAgent() {
if source.Paused() {
indications = sets.Insert(indications, snapshotv1.VMSnapshotPausedIndication)
Contributor

what about a unit test for this?



writeError(statusErr, response)
return
}
if vm.Status.SnapshotInProgress != nil {
Contributor

Don't think we can do this in the unfreeze, since the snapshot is still in progress when you call unfreeze to complete the snapshot. Also, there is an automatic unfreeze mechanism that we use to prevent the VM from staying frozen if something happens to the snapshot and for some reason it doesn't unfreeze.

}
}
- _, err := app.fetchVirtualMachine(name, namespace)
+ vm, err := app.fetchVirtualMachine(name, namespace)
Contributor

Also, in this case I'm not sure we want to prevent the migrate VM call. I believe the snapshot will just fail if the VM was migrated in the process, and if not, I don't think it will cause data change anyway.

@noamasu noamasu force-pushed the bugfix/snapshot_when_vm_paused_fixes branch from ef98e10 to 4d6d98b Compare July 17, 2025 12:45
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
@ShellyKa13
Contributor

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 27, 2025
@kubevirt-bot
Contributor

Pull requests that are marked with lgtm should receive a review
from an approver within 1 week.

After that period the bot marks them with the label needs-approver-review.

/label needs-approver-review

@kubevirt-bot kubevirt-bot added the needs-approver-review Indicates that a PR requires a review from an approver. label Aug 3, 2025
@mhenriks
Member

mhenriks commented Aug 4, 2025

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mhenriks

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2025
@kubevirt-commenter-bot

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-k8s-1.31-windows2016
/test pull-kubevirt-e2e-kind-1.33-vgpu
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.33-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.31-sig-network
/test pull-kubevirt-e2e-k8s-1.31-sig-storage
/test pull-kubevirt-e2e-k8s-1.31-sig-compute
/test pull-kubevirt-e2e-k8s-1.31-sig-operator
/test pull-kubevirt-e2e-k8s-1.32-sig-network
/test pull-kubevirt-e2e-k8s-1.32-sig-storage
/test pull-kubevirt-e2e-k8s-1.32-sig-compute
/test pull-kubevirt-e2e-k8s-1.32-sig-operator

@kubevirt-bot kubevirt-bot merged commit 1d23e32 into kubevirt:main Aug 4, 2025
41 checks passed
@kubevirt-bot
Contributor

/remove-label needs-approver-review

@kubevirt-bot kubevirt-bot removed the needs-approver-review Indicates that a PR requires a review from an approver. label Aug 5, 2025
@ShellyKa13
Contributor

I think we should backport it; the relevant issue #10759 was opened several versions ago.

@noamasu
Contributor Author

noamasu commented Aug 6, 2025

/cherry-pick release-1.6

@kubevirt-bot
Contributor

@noamasu: only kubevirt org members may request cherry picks. If you are already part of the org, make sure to change your membership to public. Otherwise you can still do the cherry-pick manually.

Details

In response to this:

/cherry-pick release-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ShellyKa13
Contributor

/cherry-pick release-1.6

@kubevirt-bot
Contributor

@ShellyKa13: new pull request created: #15385

Details

In response to this:

/cherry-pick release-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ShellyKa13
Contributor

/cherry-pick release-1.5

@kubevirt-bot
Contributor

@ShellyKa13: #15001 failed to apply on top of branch "release-1.5":

Applying: Treat paused VMs as unsafe for snapshots
Applying: Add Paused indication for VM snapshots
Using index info to reconstruct a base tree...
M	pkg/storage/snapshot/snapshot.go
M	pkg/storage/snapshot/snapshot_test.go
M	staging/src/kubevirt.io/api/snapshot/v1beta1/types.go
M	tests/storage/snapshot.go
Falling back to patching base and 3-way merge...
Auto-merging tests/storage/snapshot.go
CONFLICT (content): Merge conflict in tests/storage/snapshot.go
Auto-merging staging/src/kubevirt.io/api/snapshot/v1beta1/types.go
Auto-merging pkg/storage/snapshot/snapshot_test.go
Auto-merging pkg/storage/snapshot/snapshot.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 Add Paused indication for VM snapshots

Details

In response to this:

/cherry-pick release-1.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

noamasu added a commit to noamasu/kubevirt that referenced this pull request Sep 2, 2025
Manual backport of kubevirt#15001

Validates vm.Status.SnapshotInProgress and refuses operations
if it's non-nil, preventing VMs from being modified mid-snapshot.

This ensures snapshot consistency by preventing state changes during
the snapshot process, which could lead to undefined behavior or
inconsistent snapshots.

Operations protected:
- UnpauseVMIRequestHandler

Signed-off-by: Noam Assouline <[email protected]>
noamasu added a commit to noamasu/kubevirt that referenced this pull request Sep 9, 2025
Manual backport of kubevirt#15001

Validates vm.Status.SnapshotInProgress and refuses operations
if it's non-nil, preventing VMs from being modified mid-snapshot.

This ensures snapshot consistency by preventing state changes during
the snapshot process, which could lead to undefined behavior or
inconsistent snapshots.

Operations protected:
- UnpauseVMIRequestHandler

Signed-off-by: Noam Assouline <[email protected]>
noamasu added a commit to noamasu/kubevirt that referenced this pull request Sep 14, 2025
manual backport of kubevirt#15001

Introduces VMSnapshotPausedIndication to provide programmatic
indication when snapshots are taken of paused VMs. This allows
users and tooling to identify snapshots that may have consistency
issues due to unflushed memory buffers.

The indication appears in the VirtualMachineSnapshot status,
consistent with existing indications like QuiesceFailed and
NoGuestAgent. Includes functional tests to verify the indication
is properly set.

Signed-off-by: Noam Assouline <[email protected]>

lala
noamasu added a commit to noamasu/kubevirt that referenced this pull request Sep 14, 2025
Manual backport of kubevirt#15001

Validates vm.Status.SnapshotInProgress and refuses operations
if it's non-nil, preventing VMs from being modified mid-snapshot.

This ensures snapshot consistency by preventing state changes during
the snapshot process, which could lead to undefined behavior or
inconsistent snapshots.

Operations protected:
- UnpauseVMIRequestHandler

Signed-off-by: Noam Assouline <[email protected]>
noamasu added a commit to noamasu/kubevirt that referenced this pull request Sep 16, 2025
manual backport of kubevirt#15001

Introduces VMSnapshotPausedIndication to provide programmatic
indication when snapshots are taken of paused VMs. This allows
users and tooling to identify snapshots that may have consistency
issues due to unflushed memory buffers.

The indication appears in the VirtualMachineSnapshot status,
consistent with existing indications like QuiesceFailed and
NoGuestAgent. Includes functional tests to verify the indication
is properly set.

Signed-off-by: Noam Assouline <[email protected]>
noamasu added a commit to noamasu/kubevirt that referenced this pull request Sep 16, 2025
Manual backport of kubevirt#15001

Validates vm.Status.SnapshotInProgress and refuses operations
if it's non-nil, preventing VMs from being modified mid-snapshot.

This ensures snapshot consistency by preventing state changes during
the snapshot process, which could lead to undefined behavior or
inconsistent snapshots.

Operations protected:
- UnpauseVMIRequestHandler

Signed-off-by: Noam Assouline <[email protected]>

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/launcher dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/compute sig/network sig/storage size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Paused vm can't make a VirtualMachineSnapshot

7 participants