set up a status page for registry #182

Closed
5 of 11 tasks
matifali opened this issue Oct 30, 2024 · 6 comments

@matifali
Member

matifali commented Oct 30, 2024

Problem Description

The registry’s status page at coder.instatus.com needs enhancements for SaaS-grade reliability and transparency. Current monitoring relies on a basic 404-checking script. With the registry now deployed on Google Cloud Run and Instatus supporting Prometheus and Google Cloud integrations, this issue focuses on modernizing monitoring, alerting, and user access.
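
For context, a basic 404-checking script of the kind mentioned above might look roughly like the following. This is a minimal sketch, not the script the registry actually uses: the function names, the `opener` parameter (added only so the check can be exercised without network access), and the rule that anything other than HTTP 200 counts as a failure are all assumptions.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def check_url(url, timeout=10.0, opener=urlopen):
    """Return (ok, detail) for one URL.

    Assumption: only HTTP 200 counts as healthy; a 404, any other
    HTTP error, or a network failure is reported as a failed check.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses such as 404.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return False, f"unreachable: {err}"
    return status == 200, f"HTTP {status}"


def check_all(urls, **kwargs):
    """Check every URL and return {url: (ok, detail)}."""
    return {url: check_url(url, **kwargs) for url in urls}
```

A check like this only reports whether a page answers at all. It says nothing about latency, error rates, or container health, which is the gap the monitoring items below are meant to close.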


Desired Solution

Integration

  • Add a prominent link to the status page in the registry footer.
  • Update status categories (e.g., uptime, latency) for better user insights. (Nice to have)

Monitoring

  • Use Google Cloud Run metrics to track container health, CPU and memory usage, and latency, feeding the data into Prometheus.
  • Integrate Prometheus with Instatus for real-time metrics (e.g., success rates, errors, latency). (Nice to have)

Alerting

  • Configure multi-channel alerts (email, Slack, PagerDuty) with thresholds for high-priority (outage) and low-priority (e.g., elevated latency) issues.
  • Ensure alerts are logged and visible on the status page for incident transparency.
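
The high/low split above could be sketched as a small classification step. Everything specific here is an assumption made for illustration: the threshold values, the choice of success ratio and p95 latency as inputs, and the mapping of priorities to channels. In practice these rules would probably live in the alerting tool's configuration rather than in application code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed thresholds; the issue does not specify any values.
OUTAGE_SUCCESS_RATIO = 0.5   # below this, treat the registry as down
LATENCY_WARN_MS = 1000.0     # above this, raise a low-priority alert

# Assumed routing: page only for outages.
CHANNELS = {
    "high": ["pagerduty", "slack", "email"],
    "low": ["slack", "email"],
}


@dataclass(frozen=True)
class HealthSample:
    success_ratio: float    # fraction of successful requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th percentile latency in milliseconds


def classify(sample: HealthSample) -> Optional[str]:
    """Return "high", "low", or None when no alert is needed."""
    if sample.success_ratio < OUTAGE_SUCCESS_RATIO:
        return "high"
    if sample.p95_latency_ms > LATENCY_WARN_MS:
        return "low"
    return None


def channels_for(sample: HealthSample) -> List[str]:
    """Return the channels to notify for this sample (empty if healthy)."""
    priority = classify(sample)
    if priority is None:
        return []
    return CHANNELS[priority]
```

Checking the outage condition first means a sample that is both failing and slow is reported once, at the higher priority.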

Documentation

  • Provide a playbook for handling alerts and troubleshooting incidents.
  • Document Prometheus, Cloud Run, and Instatus setup steps.

Definition of Done

  • Status page displays real-time, accurate health data and is linked in the registry UI.
  • Monitoring (Prometheus + Cloud Run) and alerting pipelines are fully tested.
  • All setup and maintenance documentation is complete and accessible.
@matifali matifali added this to the Registry Stability milestone Oct 30, 2024
@coder-labeler coder-labeler bot added the docs Improvements or additions to documentation label Oct 30, 2024
@matifali matifali self-assigned this Nov 18, 2024
@matifali
Member Author

I am setting one up. See coder/modules#342

@matifali
Member Author

Instatus supports Prometheus as a source.

@bpmct
Member

bpmct commented Jan 2, 2025

Looks like we're currently hosting this on https://coder.instatus.com/.

@matifali - can you work with Kira and Ben to create a good "definition of done" or product requirements for this issue? Some ideas, not well baked:

  • The status page is clearly linked on the bottom of the registry
  • We are confident in the "checker" script and do not attribute any issues to it
  • Alerting is properly set up

@matifali
Member Author

matifali commented Jan 2, 2025

@Kira-Pilot, @bpmct, I updated the issue body to include the requirements.

@Parkreiner
Member

From the discussions last week, it sounds like we're okay with just updating the registry to link to the Instatus page. In that case, I can make a quick PR for it in a few minutes.

@matifali
Member Author

Yes. We can link it to the footer. 🚀

4 participants