Updates to capacity management #2499
Conversation
@mike-tutkowski a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
I plan to write a Marvin test for this, but - in the meanwhile - wanted to get this PR opened so reviewers could provide comments on the production-focused code.
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1801
Force-pushed from 2b57d32 to 0ab13da
@blueorangutan package
@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: ✔centos6 ✔centos7 ✖debian. JID-1860
@blueorangutan test
@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
Force-pushed from 0ab13da to f32fe94
Trillian test result (tid-2446)
I'm pretty sure none of those test failures has to do with this PR. The PR code relates only to managed storage (which none of those tests test). On top of it, the code is really concerned with somewhat of a corner case in managed storage (which none of those tests would test either).
Yes @mike-tutkowski I think that's absolutely valid. It makes me sad to see these random failures occasionally... :(
@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
Force-pushed from f32fe94 to 42ef44b
Everything seems to be OK here. There is room for improvement (code extraction and unit tests), though.
// This next call leads to CloudStack asking how many more bytes it will need for the template (if the template is
// already stored on the primary storage, then the answer is 0).
-            if (clusterId != null && _clusterDao.getSupportsResigning(clusterId)) {
+            if (clusterId != null) {
                 totalAskingSize += getBytesRequiredForTemplate(tmpl, pool);
Would you mind extracting the block of this IF condition to a method? This would allow proper documentation and unit tests. If resigning is not supported, this new method can return 0 as the value to be added to the totalAskingSize variable.
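For illustration only, a minimal sketch of the kind of helper being suggested, assuming the existing DAO fields of StorageManagerImpl; the method name, parameters, and exact conditions are assumptions rather than the merged code:

// Hypothetical extracted helper (name and signature are illustrative only).
// Returns the additional bytes to add to totalAskingSize for the template,
// or 0 when the template does not need to be accounted for.
private long getAskingSizeForTemplate(Long clusterId, VMTemplateVO tmpl, StoragePool pool) {
    if (clusterId == null) {
        return 0;
    }

    ClusterVO cluster = _clusterDao.findById(clusterId);

    // Per the PR description: on XenServer the template only counts when the cluster
    // supports UUID resigning; on VMware and KVM it is always counted.
    if (HypervisorType.XenServer.equals(cluster.getHypervisorType()) && !_clusterDao.getSupportsResigning(clusterId)) {
        return 0;
    }

    return getBytesRequiredForTemplate(tmpl, pool);
}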
Done
Force-pushed from 42ef44b to 6047333
@blueorangutan package
@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
@blueorangutan test
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1878
Trillian test result (tid-2464)
Force-pushed from a569d4f to 8c4a69b
I've added an integration test.
All test errors seem inapplicable to this PR. Here are some examples:
test_primary_storage.py: errorText: Failed to add data store: Storage pool nfs://10.2.0.16/acs/primary/pr2499-t2464-kvm-centos7/marvin_pri1 already in use by another pod (id=1)
test_snapshots.py: errorText: Failed to add data store: Storage pool nfs://10.2.0.16/acs/primary/pr2499-t2464-kvm-centos7/nfs2 already in use by another pod (id=1)
test_templates.py: AssertionError: Extract Template Failed with invalid URL http://192.168.100.96/userdata/99b8334e-ecaa-405b-9168-e902981a3c40.qcow2 (template id: 8cc43b7f-00e7-4250-acbc-53be1de58627)
test_vm_life_cycle.py: errortext: Cannot migrate VM, destination host is not in correct state, has status: Up, state: Disabled (accountid: c600e427-38a5-11e8-a6b6-06db8e010701)
test_volumes.py: AssertionError: Extract Volume Failed with invalid URL http://192.168.100.96/userdata/c146f89d-12e8-4a34-8087-79e66e110239.qcow2 (vol id: ab60d379-a5d3-471a-b17f-7df204e48e53)
Trillian test result (tid-2470)
Force-pushed from 8c4a69b to f527eae
These test failures do not seem to be related to this PR. Here are some examples of the errors:
test_primary_storage.py:
test_snapshots.py:
test_vm_life_cycle.py:
Thanks for the integration test @mike-tutkowski, let me repackage and run them again
@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1899
@blueorangutan test
@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
Trillian test result (tid-2490)
The test environment is having an issue when we try to put an NFS-based primary storage in maintenance mode. In test_primary_storage.py, the first error is related to that, and then we later see other errors where adding a new primary storage with the same name fails because it's already in use (presumably we were originally going to delete the primary storage after putting it in maintenance mode, but putting it in maintenance mode failed). Is this error scenario unique to this PR? It seems like the code in this PR wouldn't be responsible for such a situation. On the up side, both Jenkins and Travis passed.
test_primary_storage.py: errorText: Primary storage with id 5 cannot be disabled. Storage pool state: Maintenance
test_routers.py: AssertionError: Check uptime is less than 3 mins or not
test_snapshots.py: errorText: Failed to add data store: Storage pool nfs://10.2.0.16/acs/primary/pr2499-t2490-kvm-centos7/nfs2 already in use by another pod (id=1)
test_vm_life_cycle.py: errortext: Cannot migrate VM, destination host is not in correct state, has status: Up, state: Disabled
I looked at several of the error messages for a recent test run of #2486 and it seems the list is quite similar to the list of error messages for this PR. As such, I suggest it is likely that none of the errors listed for this test run are related to this PR.
LGTM
Two LGTMs and regression tests looking good, so merging.
@mike-tutkowski a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1927
@mike-tutkowski I scanned PRs merged on master but not on 4.11 and came across this PR. Do you think this would be useful for 4.11? If so, can you help create a backport PR for 4.11?
Yes, @rhtyd, you are correct that this PR is a good candidate for 4.11.1. I can create a PR to backport it.
Thanks @mike-tutkowski, looking forward to your port-PR.
Hey @rhtyd - What's the official process for backporting a PR like this that has already been merged? Should I just cherry-pick the commit to 4.11.1? Clearly there's not much tracking going on that way, but I'm not sure how we officially do this. Thanks!
@mike-tutkowski yes, please cherry-pick or manually port your changes to the 4.11 branch. You can then either create a new PR or push to the backport PR #2621 (on this PR both rafael and I can push changes). There are no official guidelines around this; generally, we should send bugfix PRs towards the LTS branch or the previous release's branch.
I went ahead and pushed the cherry-picked commit to #2621.
Description
In StorageManagerImpl.storagePoolHasEnoughSpace, we need to update a couple of areas of the algorithm that calculates whether enough space is present when dealing with managed storage:
We can no longer rely on managed storage being exclusively at the zone level. Check whether the storage is managed (not whether it is at the zone level).
Invoke getBytesRequiredForTemplate not only for XenServer when getSupportsResigning resolves to true, but also when using VMware or KVM.
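As a rough illustration only (the field names and the helper sketched in the review discussion are assumptions, not the verbatim merged code), the updated checks inside storagePoolHasEnoughSpace look roughly like this:

// Sketch of the two adjustments described above, inside
// StorageManagerImpl.storagePoolHasEnoughSpace (illustrative, not verbatim):
StoragePoolVO poolVO = _storagePoolDao.findById(pool.getId());

// 1) Key the special handling off the pool being managed, rather than assuming
//    managed storage only exists at the zone level.
if (poolVO.isManaged()) {
    // 2) Ask how many more bytes the template will need on this pool; the helper
    //    returns 0 if the template is already stored there, and applies the
    //    XenServer-resigning / VMware / KVM rules described above.
    totalAskingSize += getAskingSizeForTemplate(clusterId, tmpl, poolVO);
}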
https://issues.apache.org/jira/browse/CLOUDSTACK-10335
Types of changes
How Has This Been Tested?
Initially I noticed on VMware and KVM that templates were not being included in the space used for primary storage when that storage is managed. I made the necessary changes (included in this PR) and then checked the space used to verify that the newly calculated number was accurate for managed storage when using those hypervisor types.
Checklist:
@blueorangutan package