If this repository is useful, please consider starring it.
Items marked with π are open source projects.
AUTO-GENERATED FILE - DO NOT EDIT MANUALLY. Auto-generated by CI workflow or pre-commit hooks using `node generate-readme.js`.
Jump to: Incident Response | Observability | AIOps | IDP | IaC | Security | Deployment

## Incident Response

| Name | Summary | Deployment | Links |
|---|---|---|---|
| Agent SRE | AgentSRE is built for enterprises that can't afford downtime. A fleet of AI agents automates detection, root cause analysis, and remediation - delivering faster recovery, lower cloud costs, and resilient operations. | Hybrid | |
| AlertD | AlertD is an agentic AI teammate for SRE and DevOps on AWS, cutting alert noise and dashboard fatigue while delivering contextual answers and automated actions. | SaaS | |
| AutonomOps AI | Autonomous operations platform that applies AI to improve SRE and incident management workflows. | SaaS | |
| Azure SRE Agent | AI-powered reliability assistant for Azure that automates incident response, root-cause analysis, and mitigation workflows. | SaaS | |
| Bacca.ai | AI SRE for high-scale platforms that uses tribal knowledge to triage and mitigate incidents accurately. | SaaS | |
| Beeps | AI-powered operations assistant focused on helping teams handle alerts and incident workflows faster. | SaaS | |
| Cleric | Cleric is an autonomous AI SRE that helps engineering teams quickly diagnose production issues in complex cloud-native environments. | SaaS | |
| DrDroid | AI that understands your production system, infrastructure, applications, and business context to investigate incidents and explain root causes. | SaaS | |
| FireHydrant | All-in-one incident management software for modern teams. FireHydrant helps you plan, respond, and resolve faster with smart alerting and on-call scheduling. | SaaS | |
| Harness AI-SRE | Most incidents start with change, so why manage them in isolation? Learn how Harness AI-SRE connects the dots between alerts, changes, and workflows. | SaaS | |
| incident.io | incident.io is an all-in-one incident management platform unifying on-call scheduling, real-time incident response, and integrated status pages. | SaaS | |
| IncidentFox | AI incident response platform designed to help teams investigate and resolve operational issues. | SaaS | |
| NeuBird AI | NeuBird AI's agentic AI SRE delivers autonomous incident resolution, helping teams cut MTTR by up to 90% and reclaim engineering hours lost to troubleshooting. | SaaS | |
| NOFire AI | NOFire handles alerts, flags risky changes, and turns incidents and tribal knowledge into lasting reliability memory. | SaaS | |
| OpsCompanion | OpsCompanion is the AI-driven Operations Intelligence Engine that automates root cause analysis, resolves alerts, and unifies observability across your stack. | SaaS | |
| PagerDuty SRE Agent | Transform critical operations with PagerDuty's AI-first Operations Platform. Harness agentic AI and automation to accelerate work and build resilience. | SaaS | |
| Phoebe | The immune system for your software. AI agents that continuously investigate live data, diagnose emerging issues and generate preemptive fixes. | SaaS | |
| ProdRescue AI | Automates incident reports and evidence-backed RCA for SRE teams from Slack war rooms or logs in minutes. | SaaS | |
| Resolve AI | Resolve AI handles all alerts, performs root cause analysis, and troubleshoots incidents within minutes. | SaaS | |
| RobinRelay | AI on-call copilot for Slack that cuts MTTR by 75%. Reduce alert noise, recall past incident fixes, and save thousands of engineering hours yearly. | SaaS | |
| Rootly | The all-in-one incident management platform, including AI SRE agents - built for fast-moving engineering teams to detect, manage, learn from, and resolve incidents faster. | SaaS | |
| RunLLM | The AI SRE for mission-critical systems that delivers transparent investigations, evidence-backed root cause analysis, and continuous runbook improvement. | SaaS | |
| Scoutflo | Your AI SRE for incident response and debugging. AI handles alerts, finds root causes, and fixes issues in minutes. | SaaS | |
| Sherlocks.ai | Cut MTTR by 10x with AI SREs that investigate incidents 24/7, automate root cause analysis, and prevent outages before they happen. Try Sherlocks.ai free. | SaaS | |
| Steadwing | Steadwing is an autonomous on-call engineer that finds root causes in under 5 minutes and fixes them. It correlates logs, metrics, traces, and code to deliver actionable RCAs and real remediation (PRs, rollbacks, config changes, and more) with 20+ integrations. | SaaS | |
| TierZero AI | TierZero's AI agents investigate incidents, triage alerts, and fix production problems automatically - so your engineers can ship faster. | SaaS | |
| πTracer | OS-level AI SRE platform for high-compute workloads that accelerates alert investigation, root-cause analysis, and mitigation inside your environment. | On-Prem | |
| Traversal | Traversal cuts through alert noise, surfaces root causes, and guides your team to remediation - so incidents get fixed in minutes, not hours. | SaaS | |
| Vibranium Labs | AI reliability tooling company focused on incident response automation and operations intelligence. | SaaS | |
| Vigiles | Incident management platform for modern teams with outage detection, on-call alerting, response coordination, status pages, and AI postmortems. | SaaS | |
| Wild Moose | Wild Moose helps developers solve production issues faster, kicking off any root cause investigation automatically. | SaaS | |

## Observability

| Name | Summary | Deployment | Links |
|---|---|---|---|
| Agent0 by Dash0 | Dash0's agentic AI platform for observability that helps engineers with incident triage, query building, instrumentation guidance, trace analysis, and dashboard creation. | SaaS | |
| Better Stack | Observability and incident management platform with AI SRE, eBPF-based tracing, logs, metrics, uptime monitoring, and on-call workflows. | SaaS | |
| Causely | Causely pinpoints the root cause of errors so that you can consistently meet reliability expectations of application users in complex, cloud native environments. | SaaS | |
| DagKnows, Inc | AI operations company focused on improving incident diagnostics and reliability workflows. | SaaS | |
| Datadog (Bits AI) | See metrics from all of your apps, tools & services in one place with Datadog's cloud monitoring as a service solution. Try it for free. | SaaS | |
| Deductive AI | Deductive AI transforms your root-causing process by effortlessly understanding your entire codebase along with the telemetry data. | SaaS | |
| Deeptrace | Automate and cut your on-call/debugging time in half with AI. | SaaS | |
| Edge Delta | Observability pipeline and AI analytics platform for processing telemetry at scale and accelerating incident investigation. | SaaS | |
| Elastic | Elastic Observability resolves problems faster at reduced cost with open source, AI-powered observability. | SaaS | |
| Lightrun | Lightrun's AI SRE handles alerts, prevents issues early with live runtime context during development, and resolves alerts in minutes with verified RCA. | SaaS | |
| Logz.io | Stop Chasing Alerts. Get Ahead of Problems with AI-Powered Observability. | SaaS | |
| Mezmo | Combine intelligent telemetry with AI-driven observability to detect issues, pinpoint root cause, and power agentic operations across logs, metrics, and traces. | SaaS | |
| Observe, Inc. | Observe is a modern observability platform built on a streaming data lake, for faster search and correlation at lower cost. | SaaS | |
| Sentry | Application performance monitoring for developers and software teams to see errors more clearly, solve issues faster, and improve reliability continuously. | SaaS | |
| SIXTA | AI-powered root cause analysis for database reliability. | SaaS | |

## AIOps

| Name | Summary | Deployment | Links |
|---|---|---|---|
| BigPanda | AIOps platform for event correlation, incident detection, and response orchestration across modern IT operations. | SaaS | |
| Ciroos | Ciroos transforms SRE with AI-driven automation, reducing toil, detecting anomalies early, and accelerating incident investigations. | SaaS | |
| Cloudship AI | AI platform for cloud and platform engineering workflows focused on reliability and operations. | SaaS | |
| Cokpit | Cokpit scales with your needs - from startups to global enterprises. | SaaS | |
| πHolmesGPT | Open source AI SRE agent that iteratively investigates incidents using data from your Kubernetes and observability stack. | Hybrid | |
| πK8sGPT | K8sGPT is an AI-powered tool that helps diagnose and fix Kubernetes issues with intelligent insights and automated troubleshooting. | Hybrid | |
| πKagent | Open-source Kubernetes-native framework for building and running AI agents that automate DevOps operations and troubleshooting tasks. | Hybrid | |
| Komodor | Komodor automatically detects, investigates and remediates complex issues to proactively reduce cloud costs, slash MTTR and vanquish TicketOps. | SaaS | |
| Kura | AI platform for engineering operations and incident response automation in modern infrastructure environments. | SaaS | |
| NudgeBee | Agentic AI platform for SRE & CloudOps, troubleshooting, cost optimization, and no-code workflow automation. | SaaS | |
| πObot | Open source agent platform for creating, running, and integrating autonomous assistants across workflows. | Hybrid | |
| Opsy | AI-powered reliability operations platform for faster incident response and SRE workflow automation. | SaaS | |
| Robusta Dev | Robusta's AI assistant empowers teams to troubleshoot Prometheus and Kubernetes alerts faster, leading to reduced MTTR and enhanced engineering productivity. | Multi | |
| RunWhen | RunWhen is committed to simplifying troubleshooting for complex cloud systems with the help of AI-powered Engineering Assistants capable of suggesting what to run. | SaaS | |
| SRE Bench | Evaluation and benchmarking platform for SRE agents and operational AI reliability workflows. | SaaS | |
| SRE.ai | SRE.ai is the most advanced natural language DevOps platform, powering automation and software delivery for fast-moving organizations at scale, freeing up teams to build. | SaaS | |
| πStakpak | An open source agent that lives on your machines 24/7, keeps your apps running, and only pings when it needs a human. | SaaS | |
| StarSling | Multi-agent automation platform that orchestrates AI workflows for operations, troubleshooting, and remediation. | SaaS | |

## IDP

| Name | Summary | Deployment | Links |
|---|---|---|---|
| Rebase | Every company needs to become an AI company. Rebase is the infrastructure to get there - connect all your systems, access any LLMs, and deploy AI agents. | SaaS | |
| StackGen | Autonomous infrastructure platform powered by Aiden for platform engineering, DevOps, and SRE teams to automate provisioning, governance, and operations. | Hybrid | |

## IaC

| Name | Summary | Deployment | Links |
|---|---|---|---|
| Ops0 | ops0 automates how infrastructure is created, managed, and operated. Turn intent into IaC, apply updates intelligently, and resolve issues before they happen. | SaaS | |

## Security

| Name | Summary | Deployment | Links |
|---|---|---|---|
| Cloudgeni | AI-powered cloud infrastructure platform that detects misconfigurations, remediates security and compliance issues, and generates reviewable infrastructure changes through deterministic workflows. | SaaS | |
| Infrabase | Infrabase scans code and organizational context to surface security gaps, cost spikes, and policy breaks before they ever hit your cloud. | SaaS | |

## Deployment

| Name | Summary | Deployment | Links |
|---|---|---|---|
| Cutover | Cutover's cloud-hosted Collaborative Automation platform connects teams and technology, helping you manage disaster recovery, migration, and release. | SaaS | |
| Lens K8s IDE | Kubernetes IDE for cluster operations and troubleshooting with AI-assisted diagnostics via Lens Prism. | Hybrid | |
| πSkyflo.ai | Skyflo is an open-source AI agent for DevOps and cloud operations. It plans, executes, and verifies infrastructure changes across Kubernetes, CI/CD, and cloud platforms. | Hybrid | |
