Description
Problem Statement
When a Coder admin upgrades Coder, a database migration can sometimes take up to an hour to complete. During that time, users and admins alike lack clear visibility into the following:
- "Why is Coder not upgrading?"
- "Where can I see the status of this migration?"
- "Is everything going as planned?"
Note
For full context, see our most recent R&D retro.
Research Needed
We need to investigate further how Coder behaves during long migrations in a multi-replica deployment:
- When a Helm upgrade occurs, does it perform a rolling upgrade (e.g., 4 replicas remain on the old version while 1 attempts to upgrade and migrate, often leading to database locks held by the 4 old replicas)?
  - If so, we should prevent this scenario.
- In Kubernetes, is it possible to identify which replica is performing the migration, so that all other replicas can log the name of the pod running it?
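To make the replica-identification question concrete, here is a minimal Go sketch under stated assumptions. It assumes each replica knows its own pod name, via a hypothetical `POD_NAME` environment variable populated through the Kubernetes downward API, falling back to the hostname (which Kubernetes sets to the pod name by default). It also assumes the migration lock records its holder. `MigrationLock`, `replicaID`, and `waitingMessage` are hypothetical names, and the in-memory lock only stands in for a real database lock; none of this is taken from coderd. In Postgres, one possible real-world equivalent is an advisory lock combined with each replica setting `application_name` to its pod name, so the holder can be found by joining `pg_locks` with `pg_stat_activity` on `pid`.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// replicaID returns a name identifying this replica. It assumes POD_NAME is
// injected via the Kubernetes downward API (hypothetical setup); otherwise it
// falls back to the hostname, which Kubernetes sets to the pod name by default.
func replicaID() string {
	if name := os.Getenv("POD_NAME"); name != "" {
		return name
	}
	if host, err := os.Hostname(); err == nil {
		return host
	}
	return "unknown"
}

// MigrationLock is a hypothetical in-memory stand-in for a database-level lock
// (for example a Postgres advisory lock) that also records its holder.
type MigrationLock struct {
	mu     sync.Mutex
	holder string
}

// TryAcquire claims the lock for id if it is free (and is idempotent for the
// current holder). It always returns the current holder, so replicas that
// lose the race can report which pod is running migrations.
func (l *MigrationLock) TryAcquire(id string) (holder string, acquired bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" {
		l.holder = id
		return id, true
	}
	return l.holder, l.holder == id
}

// Release frees the lock if id currently holds it.
func (l *MigrationLock) Release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == id {
		l.holder = ""
	}
}

// waitingMessage is the (hypothetical) log line a non-migrating replica emits.
func waitingMessage(holder string) string {
	return fmt.Sprintf("waiting for database migrations; run by pod %q, see its logs for progress", holder)
}

func main() {
	lock := &MigrationLock{}
	// Simulate three replicas starting at once; only the first wins the lock.
	for _, id := range []string{"coder-0", "coder-1", "coder-2"} {
		if holder, ok := lock.TryAcquire(id); ok {
			fmt.Printf("%s: acquired migration lock, running migrations\n", id)
		} else {
			fmt.Printf("%s: %s\n", id, waitingMessage(holder))
		}
	}
	fmt.Println("this replica:", replicaID())
}
```

Returning the holder from the acquire attempt, rather than only a boolean, is what lets every waiting replica log a pointer to the pod whose logs show migration progress.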
Proposed Solution
- While a database migration is running, emit clear coderd logs on all replicas:
  - For the replica performing the migration, log the number of migrations completed (e.g., `Performing database migrations 11/15`).
  - For pods not performing the migration, ideally identify which pod is running it so the user can view that pod's logs.
- While a database migration is running, ensure that Coder functions and replicas are blocked from proceeding. The dashboard UI should show that a migration is in progress, ideally with a basic counter or progress indicator.
Out of scope
- Reducing the duration of long migrations (Add support for retention policy on large tables #20743)
- Showing time estimates for migrations (difficult to estimate)