wonkday/sre-agent

SRE Agent

A multi-agent system for Site Reliability Engineering (SRE) operations with an interactive chat interface. It lets engineers query infrastructure, Kubernetes clusters, and monitoring systems in natural language.

📋 Summary

This project is a multi-agent AI system that provides a conversational interface for SRE operations. It consists of:

  • SRE Agent: A main orchestrating agent that intelligently routes queries to specialized agents
  • Kubectl Agent: Handles Kubernetes-related queries through the kubectl-ai MCP server (pods, deployments, services, logs, etc.)
  • Prometheus Agent: Manages metrics and monitoring queries (HTTP request metrics, error rates, performance data)
  • Web Interface: A modern React-based chat UI for interacting with the agents

The system is built with LangGraph for agent orchestration and supports multiple LLM providers. It provides real-time insights into infrastructure health and application performance, and helps correlate issues across different systems.
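
To make the routing idea concrete, here is a minimal sketch of how a query could be mapped to the specialized agents. It is not the project's implementation: the real SRE Agent decides with an LLM, whereas this sketch swaps in simple keyword matching. The names `AgentName`, `KEYWORDS`, and `routeQuery` are hypothetical.

```typescript
// Hypothetical sketch: keyword-based routing as a stand-in for the
// LLM-driven routing that the SRE Agent performs.
type AgentName = "kubectl" | "prometheus";

// Assumed keyword lists, loosely based on the agent descriptions above.
const KEYWORDS: Record<AgentName, string[]> = {
  kubectl: ["pod", "deployment", "service", "namespace", "logs", "version"],
  prometheus: ["metric", "error rate", "request rate", "success", "failure"],
};

// Returns every agent whose keywords appear in the query.
// An empty result means no specialized agent matched.
function routeQuery(query: string): AgentName[] {
  const text = query.toLowerCase();
  const names = Object.keys(KEYWORDS) as AgentName[];
  return names.filter((name) => KEYWORDS[name].some((kw) => text.includes(kw)));
}
```

A query that mentions both pods and error rates matches both agents, which is the case where the orchestrator would need to correlate the two answers.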

Key Features

  • 🤖 Intelligent Query Routing: Automatically determines which specialized agents to invoke based on user queries
  • 🔍 Multi-Agent Coordination: Correlates data from multiple sources (Kubernetes, Prometheus) for unified insights
  • 📊 Rich Visualizations: Displays metrics, pod status, and error rates in formatted tables and summaries
  • 🌐 Web UI: Modern, responsive chat interface for seamless interaction
  • 🔄 Graceful Degradation: Falls back to mock data when external services are unavailable
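
The graceful degradation behavior can be sketched as a small wrapper that tries a live call and returns mock data if it fails. This is an illustration under assumptions, not the project's code: `Sourced` and `withMockFallback` are hypothetical names, and the real agents may detect unavailability differently.

```typescript
// Hypothetical sketch: wrap a live call and fall back to mock data on failure.
// The result records which source was used so the UI can flag mock answers.
type Sourced<T> = { data: T; source: "live" | "mock" };

async function withMockFallback<T>(
  fetchLive: () => Promise<T>,
  mock: T,
): Promise<Sourced<T>> {
  try {
    return { data: await fetchLive(), source: "live" };
  } catch {
    // Any error (network failure, timeout, bad response) triggers the fallback.
    return { data: mock, source: "mock" };
  }
}
```

Tagging the result with its source is a design choice worth keeping in any real version, since silently mixing mock and live data can mislead an on-call engineer.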

🚀 Getting Started

Prerequisites

  • Node.js 20 or higher
  • npm 11.2.1 or higher
  • Access to a Kubernetes cluster (optional, for the kubectl agent)
  • A Prometheus endpoint (optional, for metrics queries)

Initialization Steps

1. Clone the Repository

git clone <repository-url>
cd sre-agent

2. Install Dependencies

Install all project dependencies (this will install dependencies for both the root workspace and all apps):

npm install

3. Environment Configuration

Create a .env file in the project root with the following configuration:

# Google Gemini API Key (required)
GOOGLE_API_KEY=your-gemini-api-key-here

# Kubectl-AI MCP Server Configuration (optional)
# Note: Default port in start-kubectl-ai-mcp.sh is 8180
KUBECTL_AI_MCP_ENDPOINT=http://localhost:8180
KUBECTL_AI_API_KEY=your_api_key_here

# Prometheus Configuration (optional)
# Use localhost:9090 when using port forwarding
PROMETHEUS_ENDPOINT=http://localhost:9090
PROMETHEUS_API_KEY=your_prometheus_api_key

# Default LLM Model (optional)
DEFAULT_MODEL=google/gemini-2.0-flash-lite

# Proxy Configuration (if behind corporate firewall)
# HTTP_PROXY=http://your-proxy-server:port
# HTTPS_PROXY=http://your-proxy-server:port
# NO_PROXY=localhost,127.0.0.1,.local,.internal

Note:

  • Refer to the .env.example file for the full configuration. Copy its contents to .env and update the values for your environment
  • See KUBECTL_AI_SETUP.md for kubectl-ai MCP server setup and Prometheus port forwarding instructions
  • See PROXY_SETUP.md for detailed proxy configuration if you're behind a corporate firewall
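
As an illustration of how these variables might be consumed, the sketch below reads them from an environment object, fails fast when the required key is missing, and applies defaults for the optional ones. `AgentConfig` and `loadConfig` are hypothetical names, and treating the example values above as defaults is an assumption. The project's actual loading code may differ.

```typescript
// Hypothetical sketch of reading the .env values described above.
interface AgentConfig {
  googleApiKey: string;
  kubectlAiMcpEndpoint: string;
  prometheusEndpoint: string;
  defaultModel: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AgentConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    // GOOGLE_API_KEY is the only variable marked as required.
    throw new Error("GOOGLE_API_KEY is required");
  }
  return {
    googleApiKey,
    // Assumed defaults, taken from the example values in this section.
    kubectlAiMcpEndpoint: env["KUBECTL_AI_MCP_ENDPOINT"] ?? "http://localhost:8180",
    prometheusEndpoint: env["PROMETHEUS_ENDPOINT"] ?? "http://localhost:9090",
    defaultModel: env["DEFAULT_MODEL"] ?? "google/gemini-2.0-flash-lite",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` after the .env file has been loaded.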

4. Start the Development Servers

You have two options to run the application:

Option A: Run both Agent and Web together (Recommended)

npm run dev

This command starts both:

  • Agent Server: Runs on port 2024 (default)
  • Web UI: Runs on port 5173 (default Vite port)

Option B: Run with simplified agent setup

npm run dev:simple

This uses a simplified agent configuration without full LangGraph Studio integration.

5. Access the Application

Once the servers are running:

  • Web UI: Open your browser and navigate to http://localhost:5173
  • Agent API: Available at http://localhost:2024

Starting Components Individually

If you need to start components separately:

Start Agent Server Only

cd apps/agents
npm run dev

Or with simplified mode:

cd apps/agents
npm run dev:simple

Start Web UI Only

cd apps/web
npm run dev

The web UI will be available at http://localhost:5173.

Using the Application

Before starting the agent, ensure:

  1. kubectl-ai MCP Server is running (if using Kubernetes features):

    ./scripts/start-kubectl-ai-mcp.sh
  2. Prometheus port forwarding is active (if using metrics features):

    ./scripts/accessP8sViaPortFwd.sh
    # Or run in background: nohup ./scripts/accessP8sViaPortFwd.sh &

See KUBECTL_AI_SETUP.md for detailed setup instructions.

Then:

  1. Access the Web UI: Open http://localhost:5173 in your browser (the default Vite port)
  2. Configure Connection: Enter the following:
    • Deployment URL: http://localhost:2024 (for local development)
    • Assistant/Graph ID: sre_agent
    • LangSmith API Key: (Optional, only required for deployed servers)
  3. Start Chatting: Begin asking questions about your infrastructure!
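
The connection fields above can be checked before they are submitted. The following is a minimal sketch, not part of the web UI: `ConnectionSettings` and `validateConnection` are hypothetical names, and treating any non-local host as a "deployed server" is an assumption.

```typescript
// Hypothetical sketch: validate the three connection fields from the list above.
interface ConnectionSettings {
  deploymentUrl: string;
  assistantId: string;
  langSmithApiKey?: string;
}

// Returns a list of human-readable problems; an empty list means the
// settings look usable.
function validateConnection(settings: ConnectionSettings): string[] {
  const problems: string[] = [];
  let host = "";
  try {
    host = new URL(settings.deploymentUrl).hostname;
  } catch {
    problems.push("Deployment URL is not a valid URL");
  }
  if (settings.assistantId.trim() === "") {
    problems.push("Assistant/Graph ID is empty");
  }
  // Assumption: anything other than localhost counts as a deployed server.
  const isLocal = host === "localhost" || host === "127.0.0.1";
  if (host !== "" && !isLocal && !settings.langSmithApiKey) {
    problems.push("LangSmith API Key is required for deployed servers");
  }
  return problems;
}
```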

Example Queries

Kubernetes Queries:

  • "What's the status of the user-service deployment?"
  • "Show me pods with high CPU usage"
  • "List all pods in the default namespace"
  • "What's the version of metrics-demo microservice?"

Prometheus Queries:

  • "What's the current HTTP request rate for the API?"
  • "Show me the error rate for metrics-demo in the last hour"
  • "What's the success/failure rate for my application?"
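
For context, a question like the error-rate example above ultimately has to become a PromQL expression sent to the Prometheus HTTP API (`/api/v1/query`). The sketch below shows one way that translation could look. The metric name `http_requests_total` and the labels `app` and `status` are assumptions about how the application is instrumented, and `errorRatioQuery` and `instantQueryUrl` are hypothetical helpers, not the Prometheus Agent's actual code.

```typescript
// Hypothetical sketch: build a PromQL error-ratio expression and the
// instant-query URL for it.
function errorRatioQuery(app: string, window: string): string {
  // Assumed metric and label names; adjust to match real instrumentation.
  const errors = `sum(rate(http_requests_total{app="${app}",status=~"5.."}[${window}]))`;
  const total = `sum(rate(http_requests_total{app="${app}"}[${window}]))`;
  return `${errors} / ${total}`;
}

function instantQueryUrl(endpoint: string, promql: string): string {
  // Strip trailing slashes so the path is not doubled.
  const base = endpoint.replace(/\/+$/, "");
  return `${base}/api/v1/query?query=${encodeURIComponent(promql)}`;
}
```

With `PROMETHEUS_ENDPOINT=http://localhost:9090`, calling `instantQueryUrl(endpoint, errorRatioQuery("metrics-demo", "1h"))` yields a URL that can be fetched with any HTTP client.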

Multi-Agent Queries:

  • "Check ms version, pod health, success/failure metrics for metrics-demo ms in default namespace"
  • "Is the high error rate related to pod failures?"

📸 Sample Run Outputs

Example Query and Response

Input Query - Example #1:

check ms version, pod health, success/failure metrics for metrics-demo ms in default namespace

Response output #1: (screenshot: Sample Output)

Input Query - Example #2:

perform same check again

Response output #2: (screenshot: Sample Output 2)

The agent provides a comprehensive analysis including:

  1. 📊 Summary Section
  2. 📋 Detailed Analysis
  3. 🎯 Recommendations
  4. 🔍 Suggested Next Steps

See SampleOutput.md for the complete formatted output example.

🏗️ Project Structure

SRE-Agent/
├── apps/
│   ├── agents/          # LangGraph agents (SRE Agent, Research Agent)
│   │   └── src/
│   │       ├── sre-agent/      # Main SRE agent with kubectl & prometheus tools
│   │       └── research-agent/ # Research agent for document retrieval
│   └── web/             # React-based web UI
│       └── src/
│           ├── components/   # UI components
│           ├── providers/    # LangGraph client providers
│           └── hooks/        # React hooks
├── langgraph.json       # LangGraph configuration
├── package.json         # Root workspace configuration
└── .env                 # Environment variables (create this)

📚 Additional Documentation

  • Agent Documentation: See apps/agents/src/sre-agent/README.md for detailed SRE agent architecture
  • Web UI Documentation: See apps/web/README.md for web interface details
  • kubectl-ai Setup: See KUBECTL_AI_SETUP.md for kubectl-ai MCP server setup and configuration
  • Proxy Setup: See PROXY_SETUP.md for corporate firewall configuration
  • Sample Output: See SampleOutput.md for a complete example response

🔧 Development

Building the Project

npm run build

Linting

npm run lint
npm run lint:fix  # Auto-fix linting issues

Formatting

npm run format

🐛 Troubleshooting

Common Issues

  1. Agent Server Won't Start

    • Verify that the .env file exists and contains the required API keys
    • Check that port 2024 is not already in use
    • Review the logs in apps/agents/logs/sre-agent.log
  2. Web UI Can't Connect to Agent

    • Ensure the agent server is running on port 2024
    • Check that the Deployment URL in the web UI is http://localhost:2024
    • Verify the CORS configuration if accessing from a different origin
  3. MCP Server Connection Failed

    • Verify that the kubectl-ai MCP server is running
    • Check KUBECTL_AI_MCP_ENDPOINT in .env
    • The agent uses mock data if the MCP server is unavailable
  4. Prometheus Connection Failed

    • Verify that the Prometheus server is accessible
    • Check PROMETHEUS_ENDPOINT in .env
    • Ensure PROMETHEUS_API_KEY is set if the server requires authentication

📝 License

Private project - All rights reserved
