imran1509/infraguard

🛡️ InfraGuard

AI-Powered Autonomous Infrastructure Incident Response System

🎯 Overview

InfraGuard is an intelligent incident response platform that automatically detects, diagnoses, and fixes infrastructure issues using AI agents. It was built for the AI Agents Assemble hackathon.

The Problem

Infrastructure incidents can cost companies millions in downtime. With traditional monitoring, an alert goes to a human, who must:

  1. Wake up at 3 AM
  2. Manually diagnose the issue
  3. Research solutions
  4. Apply fixes
  5. Verify resolution

Average mean time to resolution (MTTR): 30-60 minutes per incident
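For reference, MTTR is the average time from detection to resolution across incidents. A minimal Python sketch of that arithmetic (the timestamps are made up for illustration and are not InfraGuard measurements):

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Return MTTR: the average of (resolved_at - detected_at) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical (detected_at, resolved_at) pairs for three incidents.
incidents = [
    (datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 1, 3, 45)),     # 45 min
    (datetime(2024, 1, 2, 14, 10), datetime(2024, 1, 2, 14, 40)),  # 30 min
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),     # 60 min
]
print(mean_time_to_resolution(incidents))  # 0:45:00
```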

Our Solution

InfraGuard reduces this to seconds by:

  1. 🔍 Detecting anomalies in real time with Prometheus
  2. 🧠 Analyzing root causes with the Kestra AI Agent
  3. 🔧 Generating fixes automatically with Cline CLI
  4. ✅ Reviewing code quality with CodeRabbit
  5. 📊 Visualizing everything on a Vercel dashboard
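The steps above can be modeled as an incident moving through ordered stages. This is a hypothetical Python sketch: the `Stage` and `Incident` names are illustrative and not taken from the InfraGuard code, and it assumes each incident advances one stage at a time and ends once the fix is verified. The recorded history is the kind of timeline the dashboard (step 5) would display.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical incident lifecycle mirroring the steps above."""
    DETECTED = 1       # Prometheus alert fired
    ANALYZED = 2       # Kestra AI Agent produced a root-cause summary
    FIX_GENERATED = 3  # Cline produced a patch and opened a PR
    REVIEWED = 4       # CodeRabbit reviewed the PR
    RESOLVED = 5       # fix applied and verified (assumed final stage)


@dataclass
class Incident:
    name: str
    stage: Stage = Stage.DETECTED
    history: list[Stage] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the starting stage so a dashboard can show the full timeline.
        self.history.append(self.stage)

    def advance(self) -> Stage:
        """Move to the next stage; stages only progress forward, one at a time."""
        if self.stage is Stage.RESOLVED:
            raise RuntimeError(f"{self.name} is already resolved")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


incident = Incident("sample-app crash loop")  # hypothetical incident
while incident.stage is not Stage.RESOLVED:
    incident.advance()
print([s.name for s in incident.history])
# ['DETECTED', 'ANALYZED', 'FIX_GENERATED', 'REVIEWED', 'RESOLVED']
```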

🏆 Sponsor Technologies

| Technology | Usage | Prize Track |
|---|---|---|
| Cline CLI | Autonomous code generation for fixes | Infinity Build ($5K) |
| Kestra | AI Agent for data summarization & decisions | Wakanda Data ($4K) |
| Oumi | RL fine-tuned action selection model | Iron Intelligence ($3K) |
| Vercel | Production dashboard deployment | Stormbreaker ($2K) |
| CodeRabbit | AI code review on all PRs | Captain Code ($1K) |

🎬 Demo Video

Watch 2-minute demo →

🖥️ Live Dashboard

infraguard.vercel.app

[Screenshot: InfraGuard dashboard]

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                     GITHUB + CODERABBIT                     │
│            Reviews ALL PRs (human + AI-generated)           │
└─────────────────────────────────────────────────────────────┘
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                      LOCAL ENVIRONMENT                      │
│                                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │ Minikube │→ │Prometheus│→ │  Kestra  │→ │  Cline   │     │
│  │ Cluster  │  │ +Grafana │  │ AI Agent │  │   CLI    │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                               │
       ┌───────────────────────┼───────────────────────┐
       ▼                       ▼                       ▼
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│ AWS Bedrock │         │Google Colab │         │   Vercel    │
│ (LLM API)   │         │(Oumi Train) │         │ (Dashboard) │
└─────────────┘         └─────────────┘         └─────────────┘

✨ Key Features

🔍 Real-Time Detection

  • Monitors Kubernetes pod status, CPU, memory, and restart counts
  • Custom Prometheus alerts for common issues
  • Sub-minute incident detection
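Detection of this kind usually comes down to PromQL queries against the Prometheus HTTP API. A minimal Python sketch, assuming Prometheus is port-forwarded to `localhost:9090` and that kube-state-metrics (bundled with kube-prometheus-stack) exports `kube_pod_container_status_restarts_total`. The query and threshold are illustrative, not the repository's actual alert rules.

```python
import json
from urllib.parse import urlencode

# Assumptions (not from the InfraGuard repo): Prometheus is port-forwarded to
# localhost:9090, and kube-state-metrics (part of kube-prometheus-stack) is running.
PROMETHEUS_URL = "http://localhost:9090"
# Illustrative PromQL: container restarts over the last 5 minutes.
RESTART_QUERY = "increase(kube_pod_container_status_restarts_total[5m])"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def pods_over_threshold(response_body: str, threshold: float) -> list[tuple[str, str, float]]:
    """Return (namespace, pod, value) for every sample in the response above threshold."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    flagged = []
    for series in payload["data"]["result"]:
        # Vector samples look like {"metric": {labels}, "value": [timestamp, "value"]}.
        value = float(series["value"][1])
        if value > threshold:
            labels = series["metric"]
            flagged.append((labels.get("namespace", ""), labels.get("pod", ""), value))
    return flagged


# Live use (needs a reachable Prometheus):
#   import urllib.request
#   body = urllib.request.urlopen(instant_query_url(PROMETHEUS_URL, RESTART_QUERY)).read()
# Offline example with a hand-written response in the same JSON shape:
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"namespace": "default", "pod": "web-1"}, "value": [1700000000, "4"]},
        {"metric": {"namespace": "default", "pod": "web-2"}, "value": [1700000000, "0"]},
    ]},
})
print(pods_over_threshold(sample, threshold=3))  # [('default', 'web-1', 4.0)]
```

Parsing is kept separate from fetching so the threshold logic can be exercised without a running cluster.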

🧠 AI-Powered Analysis

  • Kestra AI Agent summarizes system state
  • Correlates metrics from multiple sources
  • Identifies root causes automatically
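In InfraGuard this correlation happens inside a Kestra flow, which isn't reproduced here. The Python sketch below only illustrates the idea: readings from several sources are joined per pod, and pods breaching a limit are rendered as compact text that could be handed to an LLM. The metric names and limits are hypothetical.

```python
from collections import defaultdict

PodKey = tuple[str, str]  # (namespace, pod)


def correlate(sources: dict[str, dict[PodKey, float]]) -> dict[PodKey, dict[str, float]]:
    """Join per-pod readings from several metric sources into one record per pod."""
    merged: dict[PodKey, dict[str, float]] = defaultdict(dict)
    for metric_name, readings in sources.items():
        for pod_key, value in readings.items():
            merged[pod_key][metric_name] = value
    return dict(merged)


def summarize(merged: dict[PodKey, dict[str, float]], limits: dict[str, float]) -> str:
    """Render pods with at least one metric above its limit as compact, prompt-ready text."""
    lines = []
    for (namespace, pod), metrics in sorted(merged.items()):
        breaches = [
            f"{name}={value:g} (limit {limits[name]:g})"
            for name, value in sorted(metrics.items())
            if name in limits and value > limits[name]
        ]
        if breaches:
            lines.append(f"{namespace}/{pod}: " + ", ".join(breaches))
    return "\n".join(lines) or "All pods within limits."


# Hypothetical readings from two sources, keyed by (namespace, pod).
sources = {
    "restarts_5m": {("default", "web-1"): 4, ("default", "web-2"): 0},
    "memory_ratio": {("default", "web-1"): 0.97, ("default", "web-2"): 0.40},
}
limits = {"restarts_5m": 3, "memory_ratio": 0.9}
print(summarize(correlate(sources), limits))
# default/web-1: memory_ratio=0.97 (limit 0.9), restarts_5m=4 (limit 3)
```

Sending only the breaching pods keeps the prompt short and focused on likely root causes.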

🔧 Autonomous Fixes

  • Cline generates targeted code patches
  • Creates K8s manifest updates
  • Opens PRs with proper documentation
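As an illustration of a targeted manifest change, here is a hypothetical sketch (not Cline output) of a fix for an out-of-memory container: a Kubernetes strategic merge patch that raises a single container's memory limit. The container name and limit are assumptions. In InfraGuard, the equivalent change would land as an edit to a manifest in a PR.

```python
import json


def memory_limit_patch(container: str, new_limit: str) -> dict:
    """Build a strategic merge patch that raises one container's memory limit.

    In a strategic merge patch, entries in `containers` are matched by `name`,
    so only the named container's limit changes; other fields are left alone.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container, "resources": {"limits": {"memory": new_limit}}}
                    ]
                }
            }
        }
    }


# Hypothetical container name and limit.
patch = memory_limit_patch("web", "512Mi")
# The printed JSON could be applied with:
#   kubectl patch deployment <name> --type strategic -p '<json>'
print(json.dumps(patch))
```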

🐰 Quality Assurance

  • CodeRabbit reviews all generated code
  • Catches bugs before they reach production
  • Flags deviations from best practices

📊 Beautiful Dashboard

  • Real-time incident feed
  • System health at a glance
  • Action log with PR links

🚀 Quick Start

Prerequisites

  • Docker
  • Minikube
  • kubectl
  • Helm
  • Node.js 18+
  • Python 3.10+

Setup

# Clone the repo
git clone https://github.com/imran1509/infraguard
cd infraguard

# Start Minikube
minikube start --cpus=4 --memory=8192

# Deploy sample apps
kubectl apply -f k8s/manifests/sample-apps.yaml

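# Install monitoring stack (add the chart repository first)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace \
  -f k8s/manifests/prometheus-values.yaml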

# Start Kestra
docker run -d --name kestra -p 8080:8080 kestra/kestra:latest server local

# Start Metrics API
python scripts/metrics-api.py

# Install dashboard dependencies and start the dev server
npm install --prefix dashboard
npm run dev --prefix dashboard

Inject Test Incident

# Inject a crash loop
./scripts/inject-incident.sh crash-loop

# Watch the dashboard as InfraGuard detects and fixes the incident

# Cleanup
./scripts/inject-incident.sh cleanup
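The internals of `inject-incident.sh` aren't shown here. One way to confirm from outside that an injected crash loop is visible to the cluster is to look for containers waiting in `CrashLoopBackOff` in the output of `kubectl get pods -o json`. A minimal Python sketch with a hand-written sample (pod names are hypothetical):

```python
import json


def crash_looping_pods(pods_json: str) -> list[str]:
    """Return names of pods that have a container waiting in CrashLoopBackOff.

    Expects the JSON printed by `kubectl get pods -o json` (a List of Pod objects).
    """
    pods = json.loads(pods_json)
    flagged = []
    for pod in pods.get("items", []):
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                flagged.append(pod["metadata"]["name"])
                break  # one crash-looping container is enough to flag the pod
    return flagged


# Live use:
#   import subprocess
#   out = subprocess.run(["kubectl", "get", "pods", "-o", "json"],
#                        capture_output=True, text=True, check=True).stdout
# Offline example with hypothetical pods:
sample = json.dumps({"items": [
    {"metadata": {"name": "crashy-7d9f"},
     "status": {"containerStatuses": [
         {"name": "app", "restartCount": 5,
          "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"name": "healthy-1a2b"},
     "status": {"containerStatuses": [
         {"name": "app", "restartCount": 0,
          "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}}}]}},
]})
print(crash_looping_pods(sample))  # ['crashy-7d9f']
```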

🐰 CodeRabbit Integration

CodeRabbit reviews every PR in this repository:

[Screenshot: CodeRabbit review]

CodeRabbit Highlights

  • ✅ Reviewed 20+ PRs during development
  • ✅ Caught 5 potential bugs
  • ✅ Improved documentation quality
  • ✅ Reviews AI-generated fixes from Cline

🤖 Oumi Training

We fine-tuned an action-selection model with GRPO (Group Relative Policy Optimization) using Oumi:

  • Base Model: SmolLM2-360M-Instruct
  • Training Data: 500 synthetic incident scenarios
  • Reward Function: +10 (resolved), -10 (failed)
  • Training Time: ~30 minutes on a T4 GPU
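The reward described above plugs into GRPO, which scores each sampled completion relative to the other samples drawn for the same prompt. A minimal sketch of both pieces, assuming a completion "resolves" a synthetic scenario when its chosen action matches the scenario's labeled fix. The action names and this matching rule are hypothetical, and this is not Oumi's API. It normalizes with the population standard deviation; implementations differ on that detail.

```python
from statistics import mean, pstdev

# Reward values from the list above: +10 if the incident is resolved, -10 if the fix failed.
RESOLVED_REWARD = 10.0
FAILED_REWARD = -10.0


def reward(chosen_action: str, correct_action: str) -> float:
    """Hypothetical check: an action 'resolves' a scenario if it matches the labeled fix."""
    return RESOLVED_REWARD if chosen_action.strip() == correct_action else FAILED_REWARD


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: center each reward on its group mean, scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample in the group got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one scenario whose labeled fix is "restart_pod".
samples = ["restart_pod", "scale_up", "restart_pod", "rollback"]
rewards = [reward(s, "restart_pod") for s in samples]
print(rewards)                             # [10.0, -10.0, 10.0, -10.0]
print(group_relative_advantages(rewards))  # roughly [1.0, -1.0, 1.0, -1.0]
```

Because advantages are relative to the group, a batch where every sample succeeds (or every sample fails) contributes no learning signal.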

See oumi/training/ for details.

📁 Project Structure

infraguard/
├── k8s/
│   └── manifests/          # Kubernetes configurations
├── kestra/
│   └── workflows/          # Kestra flow definitions
├── dashboard/              # Next.js Vercel app
├── oumi/
│   └── training/           # Oumi training scripts
├── scripts/
│   ├── metrics-api.py      # Prometheus API wrapper
│   ├── inject-incident.sh  # Demo incident injection
│   └── cline-incident-fix.py
├── cline-tasks/            # Auto-generated Cline tasks
├── .coderabbit.yaml        # CodeRabbit configuration
└── README.md

📄 License

MIT License - see LICENSE


Built with ❤️ for AI Agents Assemble