RFC: Insights page for Coder admins #8109
Replies: 10 comments 22 replies
-
I like this first iteration. The first steps and the next steps are pretty clear to me. What do you think about also switching the order of items in the sidebar, so that related items sit together (failed actions next to failed builds, active users next to the DAU chart)?
-
Should this be per-template instead of global? Or maybe we allow for a filter...
-
Here's a query that returns the connection latency in milliseconds for all users, grouped by template:
SELECT
    user_id,
    template_id,
    coalesce((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY connection_median_latency_ms)), -1)::FLOAT AS workspace_connection_latency_50,
    coalesce((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY connection_median_latency_ms)), -1)::FLOAT AS workspace_connection_latency_95
FROM
    workspace_agent_stats
WHERE
    connection_median_latency_ms > 0
GROUP BY
    user_id, template_id;
This would let us easily understand which users are having a bad experience with specific templates. It's important to group by template because some templates might be region limited.
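For anyone who wants to sanity-check the numbers outside the database, here is a minimal Python sketch of what the query computes. It assumes the rows have already been fetched as `(user_id, template_id, connection_median_latency_ms)` tuples; the function names and the sample data are hypothetical and only illustrate the interpolation that `PERCENTILE_CONT` performs.

```python
from collections import defaultdict


def percentile_cont(values, fraction):
    """Linear-interpolation percentile, mirroring PostgreSQL's PERCENTILE_CONT."""
    ordered = sorted(values)
    if not ordered:
        return -1.0  # same sentinel the query's COALESCE falls back to
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def latency_by_user_and_template(rows):
    """Group latencies by (user_id, template_id) and report P50 and P95."""
    groups = defaultdict(list)
    for user_id, template_id, latency_ms in rows:
        if latency_ms > 0:  # WHERE connection_median_latency_ms > 0
            groups[(user_id, template_id)].append(latency_ms)
    return {
        key: {
            "p50": percentile_cont(latencies, 0.5),
            "p95": percentile_cont(latencies, 0.95),
        }
        for key, latencies in groups.items()
    }


# Hypothetical sample data, not taken from a real deployment.
sample_rows = [
    ("alice", "tpl-a", 10.0),
    ("alice", "tpl-a", 20.0),
    ("alice", "tpl-a", 30.0),
    ("alice", "tpl-a", 40.0),
    ("alice", "tpl-a", -1.0),  # dropped by the latency filter
    ("bob", "tpl-b", 7.5),
]
stats = latency_by_user_and_template(sample_rows)
```

A user with a single sample gets the same value for both percentiles, so the P95 column is only informative once a user has a reasonable number of stats rows for a template.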
-
I like the initial concept for the Insights page. We may decide to add more visualizations later, but the graphs you placed on the mockup are good candidates for the MVP of the feature. I'm wondering if we should make them draggable, so DevOps can rearrange them or show/hide a few of them.
What you presented here is good enough to start drafting these requirements. I'm curious if we should pack all relevant endpoints behind the API family
-
So what I'm understanding is that we want two Insights pages: one for the deployment and another for a given template. Is that correct, @bpmct @kylecarbs? If yes, do you think we could start by developing the deployment one, since we already have the mock?
-
I prefer bars instead of lines for the 1st and 3rd plots.
-
As part of this RFC, I'd like to see a draft for public backend APIs as early as possible. (Posting here so I don't forget.)
-
This is a proposal for the backend API of RFC: Insights page for Coder admins. It introduces a single endpoint for reporting template insights (or deployment-wide insights, given no template filter).
The motivation is to simplify the API and reduce the number of requests needed to get all the data for the Insights page. Although outside the scope of the proposal, this format can also help ensure data consistency between weekly and daily intervals (for instance, when viewing this week and new data arrives between the two requests). It also lets us handle concurrency on the server side instead of having the client perform multiple concurrent requests.
We would introduce the following endpoint, request and response:
{
"report": {
"start_time": "2023-07-01T00:00:00.000000Z",
"end_time": "2023-07-08T00:00:00.000000Z",
"templates": ["uuid1", "uuid2"],
"active_users": 22,
"user_latency": [
{
"user_id": "fcb9f5c7-ad6d-4515-b12e-496bc04ca116", // Optional, useful for linking.
"name": "John Doe",
"connection_latency_ms": {
"P50": 5.601,
"P95": 16.352049999999984
}
},
{
"user_id": "aee4bef9-479f-488e-abb4-b2bce2bf9e0d",
"name": "Jane Doe",
"connection_latency_ms": {
"P50": 31.312,
"P95": 119.832
}
}
],
"usage_builtin": {
"vscode": {
// TODO: Name + icon here too, to simplify the UI?
"seconds": 54000
},
"jetbrains": {
"seconds": 900
},
"web-terminal": {
"seconds": 5400
},
"ssh": {
"seconds": 10800
}
},
"usage_apps": [
{
// As long as name/slug/icon match, we can merge these between multiple templates.
"display_name": "code-server",
"slug": "code-server",
"icon": "/icon/code.svg",
"seconds": 10800
}
],
"usage_parameters": [
{
// As long as name/slug match, we can merge these between multiple templates.
"display_name": "Coder Repository Directory",
"name": "coder_repository_directory",
"values": [
{
"value": "~/coder",
"icon": "",
"count": 10
},
{
"value": "~/coder.com",
"icon": "",
"count": 2
}
]
},
{
"display_name": "Dotfiles URL",
"name": "dotfiles_url",
"values": [
{
"value": "~/usr/.file",
"icon": "",
"count": 10
},
{
"value": null,
"icon": "",
"count": 2
}
]
},
{
"display_name": "Region",
"name": "region",
"values": [
{
"value": "Pittsburgh",
"icon": "/icon/flag1.svg",
"count": 8
},
{
"value": "Helsinki",
"icon": "/icon/flag1.svg",
"count": 2
},
{
"value": "Sydney",
"icon": "/icon/flag3.svg",
"count": 1
},
{
"value": "Sao Paulo",
"icon": "/icon/flag4.svg",
"count": 1
}
]
}
]
},
"interval_reports": [
{
"start_time": "2023-07-01T00:00:00.000000Z",
"end_time": "2023-07-02T00:00:00.000000Z",
"templates": ["uuid1", "uuid2"],
"interval": "day",
"active_users": 19
},
{
"start_time": "2023-07-02T00:00:00.000000Z",
"end_time": "2023-07-03T00:00:00.000000Z",
...
},
{ ... },
{ ... },
{ ... },
{ ... },
{ ... }
]
}
Note: One logical split that could be done here is to separate
For now, our interval reporting requirements are slim, and we only need this data for
We can introduce this endpoint in stages, starting with a single KPI or a few KPIs and expanding from there. The first stage would be to introduce the endpoint with the following KPIs (they are all based on the same existing data source):
This data is available, but we need to write queries to pull it out:
We currently don't track the following, which will require storing the data and querying it:
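As a rough illustration of the merge rule mentioned in the `usage_apps` comment (entries can be combined across templates as long as name, slug and icon match), here is a minimal Python sketch. The `merge_app_usage` helper and the per-template inputs are hypothetical; the proposal does not say where the merge would happen, so treat this as one possible reading rather than the intended implementation.

```python
def merge_app_usage(per_template_apps):
    """Sum "seconds" for apps whose display_name, slug and icon all match."""
    merged = {}
    for apps in per_template_apps:
        for app in apps:
            key = (app["display_name"], app["slug"], app["icon"])
            if key not in merged:
                # Copy the entry so the caller's input is not mutated.
                merged[key] = {**app, "seconds": 0}
            merged[key]["seconds"] += app["seconds"]
    return list(merged.values())


# Hypothetical per-template inputs shaped like the "usage_apps" entries.
template_a = [
    {"display_name": "code-server", "slug": "code-server",
     "icon": "/icon/code.svg", "seconds": 7200},
]
template_b = [
    {"display_name": "code-server", "slug": "code-server",
     "icon": "/icon/code.svg", "seconds": 3600},
    {"display_name": "Jupyter", "slug": "jupyter",
     "icon": "/icon/jupyter.svg", "seconds": 900},
]
merged_apps = merge_app_usage([template_a, template_b])
```

Keying on all three fields means that two templates using the same slug with different icons stay as separate rows, which matches the "as long as name/slug/icon match" wording but may produce near-duplicates in the UI.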
-
We already export many metrics to Prometheus, so why are we adding this natively to Coder? Can't a user create their own dashboard in Grafana using our Prometheus metrics?
-
Sounds reasonable, so we can paginate these results.
-
I have talked with a few users about which data they would find useful. Primarily, they care about Coder engagement and about detecting failures and errors. To improve visibility in these areas, we can have a dashboard with the following components:
In terms of error/failure detection, we can implement the following features:
These additions should help our customers identify engagement patterns and potential issues. A preview of how it should look:
These are just the initial features we want to have; in a second version, we also expect to add the following:
The second version should further improve the insights and the overall user experience.
Back-end
I will wait until we have approval from @bpmct and @mtojek regarding the feature proposal to describe the requirements from the back-end (BE) to develop this screen.