Suggestion: When a Global Setting such as network.loadbalancer.haproxy.max.conn is changed, mark the VR as 'Requires Upgrade' instead of marking it as a failed healthcheck. #9800

Open
btzq opened this issue Oct 15, 2024 · 10 comments · May be fixed by #10710

@btzq
btzq commented Oct 15, 2024

ISSUE TYPE
  • Improvement Request
COMPONENT NAME
Virtual Router, HA Proxy
CLOUDSTACK VERSION
4.19.1
CONFIGURATION
OS / ENVIRONMENT
SUMMARY

One of our customers required a larger HAProxy max connections value, as they have many users connecting at the same time.

So we changed the parameter below in Global Settings from its default to a new value:

  • network.loadbalancer.haproxy.max.conn = 500,000 (previously 4096, the default value)

After we applied the change and restarted the CloudStack server, we got a whole bunch of healthcheck failures.

Screenshot below:
Screenshot 2024-10-15 at 10 29 55 PM

Screenshot 2024-10-15 at 10 31 06 PM

In this case, I don't think this should be counted as a healthcheck issue, because the service seems to be working fine.

I think a better experience for the operator would be to mark the router as 'Requires Upgrade'.

The VR does not need to be re-created; it just needs a forced reboot. (FYI, a normal reboot doesn't seem to make the VR load the new maxconn value.)
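
For illustration only, the kind of value mismatch that seems to be behind these failures can be sketched in Python. This is an assumption about what is being compared, not the actual VR health check script: the helper names `parse_global_maxconn` and `maxconn_is_stale` and the sample config text are made up. The only real syntax relied on is the `maxconn` keyword in the `global` section of an HAProxy config.

```python
from typing import Optional

# HAProxy section keywords that start a new configuration section.
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}

# Hypothetical config as it might look on a router that has not yet
# picked up the new Global Setting.
SAMPLE_CFG = """\
global
    log 127.0.0.1:3914 local0 warning
    maxconn 4096
defaults
    maxconn 100
"""


def parse_global_maxconn(cfg_text: str) -> Optional[int]:
    """Return the maxconn value from the 'global' section, or None if absent."""
    section = None
    for raw in cfg_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated tokens.
        tokens = raw.split("#", 1)[0].split()
        if not tokens:
            continue
        if tokens[0] in SECTION_KEYWORDS:
            section = tokens[0]
        elif section == "global" and tokens[0] == "maxconn" and len(tokens) > 1:
            return int(tokens[1])
    return None


def maxconn_is_stale(cfg_text: str, expected: int) -> bool:
    """True when the router's config does not match the expected setting."""
    return parse_global_maxconn(cfg_text) != expected
```

With the sample above, `maxconn_is_stale(SAMPLE_CFG, 500000)` is true even though HAProxy itself is still serving traffic, which is the situation described here: stale configuration rather than a broken service.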

As an operator, we rely on the 'Alert' section to ensure all customer VRs are working normally. The current behavior creates a lot of noise.

Even better would be for each customer to be able to set their own network.loadbalancer.haproxy.max.conn value and additional settings, because not all customers require such large values.

STEPS TO REPRODUCE
Refer above
EXPECTED RESULTS
Mark the router as 'Requires Upgrade' when a Global Setting such as network.loadbalancer.haproxy.max.conn is changed.
ACTUAL RESULTS
Bombarded with health check failures for all VRs created, which require a manual force reboot or VR cleanup (a normal reboot doesn't work).
@weizhouapache
Member

seems to be a valid bug

@DaanHoogland
Contributor

@btzq (cc @weizhouapache ) it makes sense to mark the VR for "Requires Upgrade", but as long as that is not done the health check failure is genuine, isn't it?

@DaanHoogland
Contributor

In fact is "upgrade" really the "required" thing? I would think it requires a restart/cleanup. I am not sure if it makes sense to add that as a flag, but at first sight it seems more appropriate.

@weizhouapache
Member

+1
maybe create another flag

Alternatively, we could introduce a new level of health check result?

  • Success (everything is good)
  • Failure (service is down, VM config is missing, etc)
  • Alert (some value mismatch but do not really cause a failure)
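
To make the three levels concrete, here is a minimal Python sketch. It is an illustration under assumptions, not CloudStack code: `HealthCheckResult` and `classify_haproxy_check` are hypothetical names, and the rule that a maxconn mismatch maps to Alert while a stopped service maps to Failure is just one reading of the list above.

```python
from enum import Enum


class HealthCheckResult(Enum):
    """Hypothetical three-level result, mirroring the list above."""
    SUCCESS = "success"  # everything is good
    FAILURE = "failure"  # service is down, VM config is missing, etc.
    ALERT = "alert"      # a value mismatch that does not cause a failure


def classify_haproxy_check(
    service_running: bool, actual_maxconn: int, expected_maxconn: int
) -> HealthCheckResult:
    """Classify an HAProxy check; the service state takes priority over mismatches."""
    if not service_running:
        return HealthCheckResult.FAILURE
    if actual_maxconn != expected_maxconn:
        return HealthCheckResult.ALERT
    return HealthCheckResult.SUCCESS
```

Under this sketch, a router still running HAProxy with maxconn 4096 after the setting was raised to 500000 would be reported as Alert rather than Failure.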

@DaanHoogland
Contributor

> Alert (some value mismatch but do not really cause a failure)

with a yellow point (as opposed to green or red)

@weizhouapache
Member

Yeah, makes sense?

@btzq
Author

btzq commented Apr 14, 2025

@weizhouapache @DaanHoogland yup, I think it makes sense. Less misleading, clearer, fewer questions.

@DaanHoogland
Contributor

@btzq, I have the basic mechanisms in place.

@btzq
Author

btzq commented Apr 25, 2025

Hey @DaanHoogland ,

The 'Warning' state for when a health check needs attention (but is not in an error/failed state) makes sense. The UI in the attached ticket makes sense too, but I see the screenshot only covers the health check page.

I believe there are a few more screens to account for:

  • Alerts Page - What will the alert message look like if a health check is in a warning state? What about webhooks?
  • Events Page - I believe this message output will be the same as what's written on the Healthcheck or Alerts page?
  • Virtual Router Page - Will the 'Requires Upgrade' field turn to 'Yes' when any healthcheck is in a warning state?

@DaanHoogland
Contributor

@btzq, it all makes sense, but I will not take this into scope for now. I am fine if this evolves as we go along, though. For now I will focus on the points I mentioned (in #9800 (comment)).

Labels
Projects
Status: Dev In Progress
3 participants