A sophisticated multi-agent system for Site Reliability Engineering (SRE) operations with an interactive chat interface. This application enables engineers to query infrastructure, Kubernetes clusters, and monitoring systems using natural language.
This project is a multi-agent AI system that provides a conversational interface for SRE operations. It consists of:
- SRE Agent: A main orchestrating agent that intelligently routes queries to specialized agents
- Kubectl Agent: Handles Kubernetes-related queries through kubectl-ai MCP server (pods, deployments, services, logs, etc.)
- Prometheus Agent: Manages metrics and monitoring queries (HTTP request metrics, error rates, performance data)
- Web Interface: A modern React-based chat UI for interacting with the agents
The system is built with LangGraph for agent orchestration and supports multiple LLM providers. It provides real-time insight into infrastructure health and application performance, and helps correlate issues across different systems.
- **Intelligent Query Routing**: Automatically determines which specialized agents to invoke based on user queries
- **Multi-Agent Coordination**: Correlates data from multiple sources (Kubernetes, Prometheus) for unified insights
- **Rich Visualizations**: Displays metrics, pod status, and error rates in formatted tables and summaries
- **Web UI**: Modern, responsive chat interface for seamless interaction
- **Graceful Degradation**: Falls back to mock data when external services are unavailable
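The routing step can be pictured as a small classifier over the query text. The sketch below is illustrative only; the agent names and keyword heuristics are assumptions for demonstration, not the project's actual implementation:

```typescript
// Illustrative query-routing sketch (not the project's actual code).
// Agent names and keyword heuristics are assumptions.
type AgentName = "kubectl" | "prometheus";

function routeQuery(query: string): AgentName[] {
  const q = query.toLowerCase();
  const agents = new Set<AgentName>();
  // Kubernetes-flavored terms go to the kubectl agent.
  if (/\b(pods?|deployments?|services?|namespaces?|kubectl|logs?)\b/.test(q)) {
    agents.add("kubectl");
  }
  // Metrics-flavored terms go to the Prometheus agent.
  if (/\b(metrics?|error rate|latency|request rate|prometheus)\b/.test(q)) {
    agents.add("prometheus");
  }
  // Default: ask both so the orchestrator can correlate results.
  if (agents.size === 0) return ["kubectl", "prometheus"];
  return [...agents];
}
```

A real router would typically let the LLM make this decision, but the effect is the same: one query may fan out to several specialized agents.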
- Node.js 20 or higher
- npm 11.2.1 or higher
- Access to Kubernetes cluster (optional, for kubectl agent)
- Prometheus endpoint (optional, for metrics queries)
```bash
git clone <repository-url>
cd SRE-Agent
```

Install all project dependencies (this will install dependencies for both the root workspace and all apps):

```bash
npm install
```

Create a `.env` file in the project root with the following configuration:

```bash
# Google Gemini API Key (required)
GOOGLE_API_KEY=your-gemini-api-key-here

# Kubectl-AI MCP Server Configuration (optional)
# Note: Default port in start-kubectl-ai-mcp.sh is 8180
KUBECTL_AI_MCP_ENDPOINT=http://localhost:8180
KUBECTL_AI_API_KEY=your_api_key_here

# Prometheus Configuration (optional)
# Use localhost:9090 when using port forwarding
PROMETHEUS_ENDPOINT=http://localhost:9090
PROMETHEUS_API_KEY=your_prometheus_api_key

# Default LLM Model (optional)
DEFAULT_MODEL=google/gemini-2.0-flash-lite

# Proxy Configuration (if behind corporate firewall)
# HTTP_PROXY=http://your-proxy-server:port
# HTTPS_PROXY=http://your-proxy-server:port
# NO_PROXY=localhost,127.0.0.1,.local,.internal
```

**Note:**
- Refer to the `.env.example` file for the full configuration. Copy its contents to `.env` and update with relevant values.
- See `KUBECTL_AI_SETUP.md` for kubectl-ai MCP server setup and Prometheus port-forwarding instructions.
- See `PROXY_SETUP.md` for detailed proxy configuration if you're behind a corporate firewall.
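Since only the Gemini key is mandatory, configuration loading can fall back to the defaults shown above. The sketch below illustrates this; the function and field names are assumptions, not the project's actual code:

```typescript
// Illustrative config-loading sketch (names and structure are assumptions).
// Defaults mirror the .env example above.
interface SreAgentConfig {
  googleApiKey: string;
  kubectlMcpEndpoint: string;
  prometheusEndpoint: string;
  defaultModel: string;
}

function loadConfig(
  env: Record<string, string | undefined> = process.env
): SreAgentConfig {
  const googleApiKey = env.GOOGLE_API_KEY;
  if (!googleApiKey) {
    // Only the Gemini key is required; everything else has a default.
    throw new Error("GOOGLE_API_KEY is required");
  }
  return {
    googleApiKey,
    kubectlMcpEndpoint: env.KUBECTL_AI_MCP_ENDPOINT ?? "http://localhost:8180",
    prometheusEndpoint: env.PROMETHEUS_ENDPOINT ?? "http://localhost:9090",
    defaultModel: env.DEFAULT_MODEL ?? "google/gemini-2.0-flash-lite",
  };
}
```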
You have two options to run the application:

**Option A: Run both Agent and Web together (Recommended)**

```bash
npm run dev
```

This command starts both:
- **Agent Server**: Runs on port 2024 (default)
- **Web UI**: Runs on port 5173 (default Vite port)

**Option B: Run with simplified agent setup**

```bash
npm run dev:simple
```

This uses a simplified agent configuration without full LangGraph Studio integration.
Once the servers are running:
- **Web UI**: Open your browser and navigate to `http://localhost:5173`
- **Agent API**: Available at `http://localhost:2024`
If you need to start components separately:

```bash
cd apps/agents
npm run dev
```

Or with simplified mode:

```bash
cd apps/agents
npm run dev:simple
```

```bash
cd apps/web
npm run dev
```

The web UI will be available at `http://localhost:5173`.
Before starting the agent, ensure:

- **kubectl-ai MCP Server** is running (if using Kubernetes features):

  ```bash
  ./scripts/start-kubectl-ai-mcp.sh
  ```

- **Prometheus port forwarding** is active (if using metrics features):

  ```bash
  ./scripts/accessP8sViaPortFwd.sh
  # Or run in background:
  nohup ./scripts/accessP8sViaPortFwd.sh &
  ```

See `KUBECTL_AI_SETUP.md` for detailed setup instructions.
Then:

- **Access the Web UI**: Open `http://localhost:5173` in your browser
- **Configure Connection**: Enter the following:
  - Deployment URL: `http://localhost:2024` (for local development)
  - Assistant/Graph ID: `sre_agent`
  - LangSmith API Key: (Optional, only required for deployed servers)
- **Start Chatting**: Begin asking questions about your infrastructure!
Kubernetes Queries:
- "What's the status of the user-service deployment?"
- "Show me pods with high CPU usage"
- "List all pods in the default namespace"
- "What's the version of metrics-demo microservice?"
Prometheus Queries:
- "What's the current HTTP request rate for the API?"
- "Show me the error rate for metrics-demo in the last hour"
- "What's the success/failure rate for my application?"
Multi-Agent Queries:
- "Check ms version, pod health, success/failure metrics for metrics-demo ms in default namespace"
- "Is the high error rate related to pod failures?"
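For multi-agent queries like the ones above, correlation amounts to merging each agent's findings into one report. A minimal sketch, assuming hypothetical result shapes (all field names are illustrative, not the project's actual types):

```typescript
// Minimal correlation sketch (field names are illustrative assumptions).
interface KubectlFindings {
  unhealthyPods: number;
  version: string;
}
interface PromFindings {
  errorRatePct: number;
}

function correlate(k8s: KubectlFindings, prom: PromFindings): string {
  const lines = [
    `metrics-demo version: ${k8s.version}`,
    `Unhealthy pods: ${k8s.unhealthyPods}`,
    `Error rate: ${prom.errorRatePct.toFixed(1)}%`,
  ];
  // Flag a likely link between pod failures and elevated errors.
  if (k8s.unhealthyPods > 0 && prom.errorRatePct > 5) {
    lines.push("Possible correlation: pod failures may explain the high error rate.");
  }
  return lines.join("\n");
}
```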
**Input Query, Example #1:**

```
check ms version, pod health, success/failure metrics for metrics-demo ms in default namespace
```

**Response output #1:**

**Input Query, Example #2:**

```
perform same check again
```
The agent provides a comprehensive analysis including:
- Summary Section
- Detailed Analysis
- Recommendations
- Suggested Next Steps
See `SampleOutput.md` for the complete formatted output example.
```
SRE-Agent/
├── apps/
│   ├── agents/                  # LangGraph agents (SRE Agent, Research Agent)
│   │   └── src/
│   │       ├── sre-agent/       # Main SRE agent with kubectl & prometheus tools
│   │       └── research-agent/  # Research agent for document retrieval
│   └── web/                     # React-based web UI
│       └── src/
│           ├── components/      # UI components
│           ├── providers/       # LangGraph client providers
│           └── hooks/           # React hooks
├── langgraph.json               # LangGraph configuration
├── package.json                 # Root workspace configuration
└── .env                         # Environment variables (create this)
```
- **Agent Documentation**: See `apps/agents/src/sre-agent/README.md` for detailed SRE agent architecture
- **Web UI Documentation**: See `apps/web/README.md` for web interface details
- **kubectl-ai Setup**: See `KUBECTL_AI_SETUP.md` for kubectl-ai MCP server setup and configuration
- **Proxy Setup**: See `PROXY_SETUP.md` for corporate firewall configuration
- **Sample Output**: See `SampleOutput.md` for a complete example response
```bash
npm run build
```

```bash
npm run lint
npm run lint:fix  # Auto-fix linting issues
npm run format
```
- **Agent Server Won't Start**
  - Verify the `.env` file exists and contains the required API keys
  - Check that port 2024 is not already in use
  - Review logs in `apps/agents/logs/sre-agent.log`
- **Web UI Can't Connect to Agent**
  - Ensure the agent server is running on port 2024
  - Check that the Deployment URL in the web UI is `http://localhost:2024`
  - Verify the CORS configuration if accessing from a different origin
- **MCP Server Connection Failed**
  - Verify the kubectl-ai MCP server is running
  - Check `KUBECTL_AI_MCP_ENDPOINT` in `.env`
  - The agent will use mock data if the MCP server is unavailable
- **Prometheus Connection Failed**
  - Verify the Prometheus server is accessible
  - Check `PROMETHEUS_ENDPOINT` in `.env`
  - Ensure proper authentication if required
Private project - All rights reserved

