[Fleet Server] Fleet Server Observability #812

@mostlyjason

Let's add a dashboard for Fleet Server operators to help them scale the server when needed and troubleshoot issues.

It should show:

  • System metrics such as CPU and memory usage by host over time. These should be accurate for both VMs and containers. Helps identify infrastructure limits. This should work for agents with system monitoring enabled and for hosted agents with only internal monitoring.
  • Fleet Server process metrics such as CPU and memory usage by host over time. These should be accurate for both VMs and containers. Shows how much capacity Fleet Server uses compared to other processes.
  • Status by host over time. Provides a history of when Fleet Server was offline, updating, unhealthy, etc.
  • Log stream component showing errors. Useful for troubleshooting.
  • A note with a link to the Stack Monitoring app, where users can monitor APM Server and standalone Filebeat/Metricbeat (FB/MB), which run in the same container.
  • Filter on hostname. Lets operators isolate metrics from particular Fleet Server hosts.
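
To make "by host over time" and the hostname filter concrete, here is a rough Python sketch of the aggregation the panels would need. The sample shape `(timestamp_seconds, hostname, cpu_pct)`, the host names, and the function `cpu_by_host_over_time` are all hypothetical, made up for illustration. This is not how Fleet Server or Kibana actually stores or queries these metrics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (timestamp_seconds, hostname, cpu_pct).
# Real field names and units would come from the monitoring data.
samples = [
    (0, "fleet-a", 40.0),
    (30, "fleet-a", 60.0),
    (10, "fleet-b", 20.0),
    (70, "fleet-a", 80.0),
]


def cpu_by_host_over_time(samples, bucket_seconds=60, hostname=None):
    """Average CPU per (host, time bucket), optionally limited to one host."""
    grouped = defaultdict(list)
    for ts, host, cpu in samples:
        # Hostname filter: skip samples from other hosts when one is selected.
        if hostname is not None and host != hostname:
            continue
        # Align each sample to the start of its time bucket.
        bucket_start = (ts // bucket_seconds) * bucket_seconds
        grouped[(host, bucket_start)].append(cpu)
    return {key: mean(values) for key, values in sorted(grouped.items())}


print(cpu_by_host_over_time(samples))
print(cpu_by_host_over_time(samples, hostname="fleet-a"))
```

The same grouping would apply to memory usage, status, or connection counts. Only the value being aggregated changes.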

Stretch:

  • Number of active connections by host over time. Lets operators see resource usage as a function of capacity.
  • Number of rejected connections by host over time. Lets operators see when limits are reached and the impact on clients.

Related issues:

Open questions:

  1. Should we have a separate dashboard for Fleet Server or combine it with the Elastic Agent dashboard?
    • They should have separate dashboards. They have different metrics to visualize; for example, only Fleet Server has a connection count. System metrics are particularly useful for Fleet Server because the goal is to maximize utilization and observe when it's necessary to scale the infrastructure. The Elastic Agent running on an endpoint should have low utilization, so it will be easier to visualize these use cases separately. Also, it's a standard pattern to include dashboards for each integration, so the dashboard will be more discoverable as part of the Fleet Server integration.
  2. Confirm system metrics are enabled on cloud
  3. How can we filter the Fleet Server hosts from the other hosts in the dashboards?
