[WIP] Remove allocated snapshots / vm snapshots on start #8452

Draft
wants to merge 1 commit into
base: 4.19
from

Conversation

sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Jan 5, 2024

Description

The most recent in-progress snapshots / VM snapshots can be left stuck in the Allocated state when the management server (MS) is stopped. They are still listed in the UI, but cannot be deleted. Remove them when the MS starts.

This PR removes allocated snapshots / vm snapshots on start.

Fixes #8424

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Manually tested: Take VM Snapshot -> Stop MS -> Start MS

VM Snapshot record after MS stopped =>

                 id: 1
               uuid: 07542708-27b7-47c4-959a-f7f92828f43a
               name: i-2-3-VM_VS_20240108075812
       display_name: testvm01-snap
        description: NULL
              vm_id: 3
         account_id: 2
          domain_id: 1
service_offering_id: 1
   vm_snapshot_type: DiskAndMemory
              state: Allocated
             parent: NULL
            current: NULL
       update_count: 0
            updated: NULL
            created: 2024-01-08 07:58:12
            removed: NULL

VM Snapshot record after MS started (Not listed / shown in the UI) =>

                 id: 1
               uuid: 07542708-27b7-47c4-959a-f7f92828f43a
               name: i-2-3-VM_VS_20240108075812
       display_name: testvm01-snap
        description: NULL
              vm_id: 3
         account_id: 2
          domain_id: 1
service_offering_id: 1
   vm_snapshot_type: DiskAndMemory
              state: Allocated
             parent: NULL
            current: NULL
       update_count: 0
            updated: NULL
            created: 2024-01-08 07:58:12
            removed: 2024-01-08 07:59:26
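
The two records differ only in `removed`: the row stays in the table with state Allocated and is hidden from listings because it carries a removal timestamp. A minimal sketch of that soft-delete behaviour follows, assuming an in-memory stand-in for the DAO. `VmSnapshotRow`, `InMemoryVmSnapshotDao` and their methods are hypothetical names used for illustration, not CloudStack classes.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type holding only the columns relevant to the records above.
class VmSnapshotRow {
    final long id;
    final String state;
    LocalDateTime removed; // null while the row is live

    VmSnapshotRow(long id, String state) {
        this.id = id;
        this.state = state;
    }
}

// Hypothetical in-memory stand-in for a DAO with soft deletion.
class InMemoryVmSnapshotDao {
    private final List<VmSnapshotRow> rows = new ArrayList<>();

    void persist(VmSnapshotRow row) {
        rows.add(row);
    }

    // Soft delete: stamp `removed` and leave `state` untouched.
    boolean remove(long id, LocalDateTime when) {
        for (VmSnapshotRow row : rows) {
            if (row.id == id && row.removed == null) {
                row.removed = when;
                return true;
            }
        }
        return false;
    }

    // Rows a listing would return: those without a removal timestamp.
    List<VmSnapshotRow> listActive() {
        List<VmSnapshotRow> active = new ArrayList<>();
        for (VmSnapshotRow row : rows) {
            if (row.removed == null) {
                active.add(row);
            }
        }
        return active;
    }

    // Lookup that ignores the removal timestamp, like a direct table query.
    VmSnapshotRow findIncludingRemoved(long id) {
        for (VmSnapshotRow row : rows) {
            if (row.id == id) {
                return row;
            }
        }
        return null;
    }
}

class SoftDeleteSketch {
    public static void main(String[] args) {
        InMemoryVmSnapshotDao dao = new InMemoryVmSnapshotDao();
        dao.persist(new VmSnapshotRow(1L, "Allocated"));
        dao.remove(1L, LocalDateTime.of(2024, 1, 8, 7, 59, 26));

        VmSnapshotRow row = dao.findIncludingRemoved(1L);
        System.out.println("listed: " + dao.listActive().size());
        System.out.println("state: " + row.state + ", removed: " + row.removed);
    }
}
```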

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

codecov bot commented Jan 5, 2024

Codecov Report

Attention: Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.

Project coverage is 22.99%. Comparing base (6d916ca) to head (a00734b).
Report is 627 commits behind head on 4.19.

Files with missing lines                                 Patch %   Lines
...om/cloud/storage/snapshot/SnapshotManagerImpl.java    25.00%    2 Missing and 1 partial ⚠️
...a/com/cloud/vm/snapshot/VMSnapshotManagerImpl.java    25.00%    2 Missing and 1 partial ⚠️

❗ The number of uploaded reports differs between BASE (6d916ca) and HEAD (a00734b): HEAD has 2 fewer uploads than BASE.

Flag                     BASE (6d916ca)   HEAD (a00734b)
simulator-marvin-tests   17               16
unit-tests               1                0
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19    #8452      +/-   ##
============================================
- Coverage     30.85%   22.99%   -7.86%     
+ Complexity    34048    23743   -10305     
============================================
  Files          5341     5209     -132     
  Lines        374861   352833   -22028     
  Branches      54518    50531    -3987     
============================================
- Hits         115659    81140   -34519     
- Misses       243973   260029   +16056     
+ Partials      15229    11664    -3565     
Flag Coverage Δ
simulator-marvin-tests 24.63% <45.45%> (-0.12%) ⬇️
uitests 4.39% <ø> (+<0.01%) ⬆️
unit-tests ?

@weizhouapache
Member

@sureshanaparti
Would it be better to clean up these snapshots in the storage garbage collector, which runs periodically?

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8207

@sureshanaparti
Contributor Author

@weizhouapache The issue is that some in-progress snapshots / VM snapshots are left in the Allocated state when the MS is stopped. There is no point in keeping them until the storage garbage collector cleans them up (what if storage cleanup is disabled?), since they are also listed in the UI, where they cannot be deleted. So I think it is better to remove them on start itself. Any other thoughts / suggestions?

@weizhouapache
Member

@sureshanaparti
The storage garbage collector cleans up primary and secondary storage, including snapshots, volumes and templates:

                    // remove snapshots in Error state
                    List<SnapshotVO> snapshots = _snapshotDao.listAllByStatus(Snapshot.State.Error);

We could add cleanup of Allocated snapshots and Allocated/Error VM snapshots to that process.

IMHO, management server start is not the right place for this cleanup, as the management server might run for several days without a restart.
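
The alternative proposed here can be sketched as one extra pass in a periodic cleanup task. This is only an illustration under stated assumptions: `SnapshotRecord` and `StaleAllocatedCleanup` are hypothetical names, and the minimum-age threshold is an assumption added so that a snapshot still being created on a running server is not removed. The thread does not say how stale Allocated snapshots should be identified.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical snapshot record: id, state and creation time only.
class SnapshotRecord {
    final long id;
    final String state;
    final Instant created;

    SnapshotRecord(long id, String state, Instant created) {
        this.id = id;
        this.state = state;
        this.created = created;
    }
}

class StaleAllocatedCleanup {
    // Assumption: only Allocated records at least minAge old count as stale.
    // Removes them from the list and returns their ids.
    static List<Long> cleanupPass(List<SnapshotRecord> records, Instant now, Duration minAge) {
        List<Long> removedIds = new ArrayList<>();
        Iterator<SnapshotRecord> it = records.iterator();
        while (it.hasNext()) {
            SnapshotRecord r = it.next();
            boolean oldEnough = Duration.between(r.created, now).compareTo(minAge) >= 0;
            if ("Allocated".equals(r.state) && oldEnough) {
                it.remove();
                removedIds.add(r.id);
            }
        }
        return removedIds;
    }

    public static void main(String[] args) throws InterruptedException {
        List<SnapshotRecord> records = new ArrayList<>();
        Instant start = Instant.now();
        records.add(new SnapshotRecord(1, "Allocated", start.minus(Duration.ofHours(2))));
        records.add(new SnapshotRecord(2, "Allocated", start));

        // Run the pass periodically, the way a garbage collector thread would.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                () -> System.out.println("removed: "
                        + cleanupPass(records, Instant.now(), Duration.ofHours(1))),
                0, 1, TimeUnit.SECONDS);

        Thread.sleep(100);
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because the pass runs repeatedly, a leftover record is picked up even if the server is never restarted, which is the concern raised above. The objection from the earlier reply still applies: nothing runs if periodic cleanup is disabled.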

@sureshanaparti sureshanaparti marked this pull request as ready for review January 8, 2024 08:08
@sureshanaparti
Contributor Author

sureshanaparti commented Jan 8, 2024

@weizhouapache Agreed for resources in the Error state while the MS is running, as there might be things to reset / clean up. Snapshots / VM snapshots in the Allocated state are mostly the ones that were in progress just before the MS was stopped, so they need no cleanup and it is better to just mark them as removed (any event sent through the state transition would still keep them in Allocated). Otherwise, we should at least not list them in the UI, or allow explicit deletion from the UI.

Snapshots in the Destroying state are already deleted here, on MS start, so I thought it better to remove the Allocated ones there as well.

//destroy snapshots in destroying state
List<SnapshotVO> snapshots = _snapshotDao.listAllByStatus(Snapshot.State.Destroying);
for (SnapshotVO snapshotVO : snapshots) {
    try {
        if (!deleteSnapshot(snapshotVO.getId(), null)) {

@weizhouapache
Member

@sureshanaparti
The management service is a key service for some users, especially in production, and it can run for several days without being stopped.
IMHO it is not good to clean up resources during start ...

(I am OK with cleaning up resources in both the start process and the garbage collector.)

@sureshanaparti
Contributor Author

OK, will check and update. Thanks @weizhouapache

@sureshanaparti sureshanaparti marked this pull request as draft January 8, 2024 10:16
@DaanHoogland DaanHoogland added this to the 4.19.1.0 milestone Jan 22, 2024

github-actions bot commented Feb 8, 2024

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland
Contributor

@sureshanaparti, does this need rebasing?

@DaanHoogland
Contributor

On another note, would it perhaps be safer to first mark it as State == Error and then delete it? (I think this is the same as what @weizhouapache suggests.)
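
A rough sketch of that two-step idea, assuming the existing Error cleanup is reused for the actual removal. `SketchState`, `SketchSnapshot` and `MarkThenCollect` are hypothetical names, and the state is set directly here; whether the real state machine permits an Allocated to Error transition is not established in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, reduced set of states for illustration.
enum SketchState { ALLOCATED, ERROR, BACKED_UP }

class SketchSnapshot {
    final long id;
    SketchState state;

    SketchSnapshot(long id, SketchState state) {
        this.id = id;
        this.state = state;
    }
}

class MarkThenCollect {
    // Step 1: flag leftover Allocated snapshots as Error instead of removing them directly.
    static int markAllocatedAsError(List<SketchSnapshot> snapshots) {
        int marked = 0;
        for (SketchSnapshot s : snapshots) {
            if (s.state == SketchState.ALLOCATED) {
                s.state = SketchState.ERROR;
                marked++;
            }
        }
        return marked;
    }

    // Step 2: a cleanup pass that removes everything in Error.
    static int removeErrored(List<SketchSnapshot> snapshots) {
        int removed = 0;
        Iterator<SketchSnapshot> it = snapshots.iterator();
        while (it.hasNext()) {
            if (it.next().state == SketchState.ERROR) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<SketchSnapshot> snapshots = new ArrayList<>();
        snapshots.add(new SketchSnapshot(1, SketchState.ALLOCATED));
        snapshots.add(new SketchSnapshot(2, SketchState.BACKED_UP));

        System.out.println("marked: " + markAllocatedAsError(snapshots));
        System.out.println("removed: " + removeErrored(snapshots));
        System.out.println("remaining: " + snapshots.size());
    }
}
```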

@sureshanaparti sureshanaparti changed the title Remove allocated snapshots / vm snapshots on start [WIP] Remove allocated snapshots / vm snapshots on start Jun 25, 2024
@sureshanaparti sureshanaparti modified the milestones: 4.19.1.0, 4.19.2.0 Jun 25, 2024
@DaanHoogland
Contributor

@sureshanaparti, any progress on this?

@DaanHoogland DaanHoogland changed the base branch from main to 4.19 February 4, 2025 14:12
@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 12329

@DaanHoogland DaanHoogland modified the milestones: 4.19.2, 4.19.3 Feb 7, 2025
@rohityadavcloud
Member

@sureshanaparti are you working on this?

Contributor

@shwstppr shwstppr left a comment

@sureshanaparti what is the status of this? Does it need more changes, or does it need testing?

Comment on lines +1519 to +1523
//remove snapshots in allocated state
List<SnapshotVO> allocatedSnapshots = _snapshotDao.listAllByStatus(Snapshot.State.Allocated);
for (SnapshotVO snapshot : allocatedSnapshots) {
    _snapshotDao.remove(snapshot.getId());
}
Contributor

Could this be a single call to the DB, e.g. removeAllByStatus?

Comment on lines +203 to +207
//Remove VM Snapshots in allocated state
List<VMSnapshotVO> allocatedVMSnapshots = _vmSnapshotDao.listAllByStatus(VMSnapshot.State.Allocated);
for (VMSnapshotVO vmSnapshot : allocatedVMSnapshots) {
    _vmSnapshotDao.remove(vmSnapshot.getId());
}
Contributor

Could this be a single call to the DB, e.g. removeAllByStatus?
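
The difference between the loop in the diff and the suggested bulk call can be illustrated with a small in-memory DAO that counts simulated round trips. This is a sketch, not the project's DAO: `CountingSnapshotDao` is hypothetical, and `removeAllByStatus` is only the name proposed in the review comments, not a method confirmed to exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory DAO that counts simulated database round trips.
class CountingSnapshotDao {
    private static class Row {
        final long id;
        final String state;
        boolean removed;

        Row(long id, String state) {
            this.id = id;
            this.state = state;
        }
    }

    private final List<Row> rows = new ArrayList<>();
    int roundTrips;

    void add(long id, String state) {
        rows.add(new Row(id, state));
    }

    // One round trip to list the ids of live rows in the given state.
    List<Long> listIdsByStatus(String state) {
        roundTrips++;
        List<Long> ids = new ArrayList<>();
        for (Row row : rows) {
            if (!row.removed && row.state.equals(state)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    // One round trip per removed row.
    boolean remove(long id) {
        roundTrips++;
        for (Row row : rows) {
            if (row.id == id && !row.removed) {
                row.removed = true;
                return true;
            }
        }
        return false;
    }

    // Hypothetical bulk operation, named after the review suggestion:
    // one round trip regardless of how many rows match.
    int removeAllByStatus(String state) {
        roundTrips++;
        int count = 0;
        for (Row row : rows) {
            if (!row.removed && row.state.equals(state)) {
                row.removed = true;
                count++;
            }
        }
        return count;
    }
}

class BulkRemoveSketch {
    static CountingSnapshotDao seeded() {
        CountingSnapshotDao dao = new CountingSnapshotDao();
        dao.add(1, "Allocated");
        dao.add(2, "Allocated");
        dao.add(3, "BackedUp");
        return dao;
    }

    public static void main(String[] args) {
        // Pattern from the diff: list, then remove row by row.
        CountingSnapshotDao perRow = seeded();
        for (long id : perRow.listIdsByStatus("Allocated")) {
            perRow.remove(id);
        }

        // Pattern from the review: one bulk call.
        CountingSnapshotDao bulk = seeded();
        bulk.removeAllByStatus("Allocated");

        System.out.println("per-row round trips: " + perRow.roundTrips);
        System.out.println("bulk round trips: " + bulk.roundTrips);
    }
}
```

With N matching rows the per-row pattern costs N + 1 round trips in this model, while the bulk call costs one.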

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Vm Snapshot left in allocated state are not cleaned up
6 participants