Is your improvement request related to a feature? Please describe (👍 if you like this request)
Several Longhorn Prometheus metrics (CPU/memory usage of nodes, instance manager pods, and manager pods) depend on the Kubernetes Metrics Server API (metrics.k8s.io). However, on clusters where metrics-server is not installed, longhorn-manager keeps issuing these API calls without any awareness that the API is unavailable.
Problems
- On every Prometheus scrape (typically every 30s), calls to NodeMetricses().List() / PodMetricses().List() fail and produce an "Error during scrape" warning. These warnings accumulate continuously.
- While each failure is graceful (collectors log and skip), the result is persistent log noise and unnecessary API traffic in production environments without metrics-server.
- There is currently no user-facing way to disable this behavior.
Call sites (4 locations)
metrics_collector/node_collector.go → collectNodeActualCPUMemoryUsage
metrics_collector/instance_manager_collector.go → collectActualUsage
metrics_collector/manager_collector.go → Collect
controller/setting_controller.go → collectResourceUsage (invoked by the upgrade checker when sending cluster usage along with the version check request)
Describe the solution you'd like
- Setting name: kubernetes-metrics-server-metrics-enabled
- Type: Bool, Default: true (preserves existing behavior / backward compatible)
- Category: General
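To make the proposed defaults concrete, here is a minimal sketch of how such a setting could be described and resolved. The `settingDefinition` struct and the `resolveBool` helper are hypothetical, simplified stand-ins written for this issue, not Longhorn's actual types. Only the setting name, type, default, and category come from the proposal above.

```go
package main

import (
	"fmt"
	"strconv"
)

// settingDefinition is a hypothetical, simplified description of a setting.
// It is NOT Longhorn's real setting type; it only mirrors the fields proposed above.
type settingDefinition struct {
	Name     string
	Category string
	Type     string
	Default  string
}

// metricsServerMetricsEnabled carries the values proposed in this issue.
var metricsServerMetricsEnabled = settingDefinition{
	Name:     "kubernetes-metrics-server-metrics-enabled",
	Category: "general",
	Type:     "bool",
	Default:  "true",
}

// resolveBool parses the stored value as a bool. If the stored value is empty
// or unparsable, it falls back to the definition's default, so an unset
// setting keeps the existing behavior (metrics enabled).
func resolveBool(def settingDefinition, stored string) bool {
	if v, err := strconv.ParseBool(stored); err == nil {
		return v
	}
	d, err := strconv.ParseBool(def.Default)
	if err != nil {
		// An invalid default is treated as false in this sketch.
		return false
	}
	return d
}

func main() {
	fmt.Println(resolveBool(metricsServerMetricsEnabled, ""))
	fmt.Println(resolveBool(metricsServerMetricsEnabled, "false"))
}
```

Falling back to the default when the value is missing is what keeps upgrades backward compatible: clusters that never touch the setting continue to collect the metrics-server-based metrics.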
When this setting is false, all four call sites above short-circuit with an early return and skip the metrics-server API call. As a result, only the following 6 metrics stop being exposed:
longhorn_node_cpu_usage_millicpu, longhorn_node_memory_usage_bytes
longhorn_instance_manager_cpu_usage_millicpu, longhorn_instance_manager_memory_usage_bytes
longhorn_manager_cpu_usage_millicpu, longhorn_manager_memory_usage_bytes
All other metrics that do not depend on metrics-server (capacity, resource requests, storage, status, counts, gRPC connections, etc.) continue to be exposed unchanged. This is a partial, surgical disable rather than turning off entire collectors.
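The early-return behavior described above could look roughly like the following sketch. The `usageLister` interface, the `countingLister` fake, and `collectNodeCPUUsage` are hypothetical names introduced only for illustration; the real call sites use the metrics-server client (NodeMetricses().List() / PodMetricses().List()) and would read the setting from Longhorn's datastore.

```go
package main

import (
	"errors"
	"fmt"
)

// usageLister is a hypothetical stand-in for the metrics-server client calls.
type usageLister interface {
	ListNodeUsage() (map[string]int64, error)
}

// countingLister is a fake that records how many API calls were attempted.
type countingLister struct {
	calls int
	err   error
}

func (c *countingLister) ListNodeUsage() (map[string]int64, error) {
	c.calls++
	if c.err != nil {
		return nil, c.err
	}
	return map[string]int64{"node-1": 250}, nil
}

// collectNodeCPUUsage skips the metrics-server call entirely when the setting
// is disabled, so no request is sent and no scrape warning is logged.
// When enabled, it behaves as before and returns whatever the API returns.
func collectNodeCPUUsage(enabled bool, lister usageLister) (map[string]int64, error) {
	if !enabled {
		return nil, nil
	}
	return lister.ListNodeUsage()
}

func main() {
	// Simulate a cluster without metrics-server: every call fails.
	lister := &countingLister{err: errors.New("metrics.k8s.io is unavailable")}

	usage, err := collectNodeCPUUsage(false, lister)
	fmt.Println("disabled:", usage, err, "API calls:", lister.calls)

	_, err = collectNodeCPUUsage(true, lister)
	fmt.Println("enabled:", err, "API calls:", lister.calls)
}
```

Because the guard sits in front of the usage call only, the rest of each collector's Collect path is unaffected, which matches the partial disable described above.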
Describe alternatives you've considered
No response
Additional context
No response