A high-performance, production-ready LLM proxy built in Go with an intelligent 8-agent reasoning system. This proxy doesn't just forward requests: it thinks, plans, retrieves context, and validates before generating responses.
Unlike traditional LLM proxies that just forward requests, this proxy:
- 🧠 Thinks First: 8 specialized reasoning agents analyze your request before hitting the LLM
- 💰 Cost-Aware: Dynamic model selection based on task complexity and budget (<$0.05/session avg)
- 📚 Context-Rich: Retrieves relevant data from GitLab, YouTrack, RAG via MCP integration
- ⚡ Lightning Fast: <500ms non-LLM latency, 12K req/s throughput, streaming-first architecture
- 🚀 Production-Ready: Single binary deployment, comprehensive observability, DDD architecture
- 🧪 Developer-Friendly: Test individual agents in isolation with 8 single-agent workflows
- 8 Specialized Agents: Intent Detection, Reasoning Structure, Retrieval Planning, Retrieval Execution, Context Synthesis, Inference, Validation, Summarization
- Dynamic LLM Selection: Automatically selects cheapest model that can handle the task
- Budget Controls: Per-session and per-agent cost limits with hard stops (see the sketch after this list)
- Versioned Context: Full audit trail of agent execution with diffs and metrics
- GitLab MCP Server: Query commits, MRs, code metrics, link code to tasks
- YouTrack MCP Server: Epic management, task tracking, progress analysis, weekly reports
- RAG MCP Server: Semantic search, knowledge graphs, document processing, concept extraction
- Intelligent Orchestration: Automatic tool selection and multi-tool chaining
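
Budget enforcement could be as simple as a per-session guard with a hard stop, as in this sketch (the `BudgetGuard` type and its names are illustrative assumptions, not the proxy's actual API):

```go
// Hypothetical per-session budget guard with a hard stop.
package budget

import (
	"errors"
	"sync"
)

var ErrBudgetExceeded = errors.New("session budget exceeded")

// BudgetGuard tracks spend for one session; names are illustrative.
type BudgetGuard struct {
	mu       sync.Mutex
	spentUSD float64
	limitUSD float64
}

func NewBudgetGuard(limitUSD float64) *BudgetGuard {
	return &BudgetGuard{limitUSD: limitUSD}
}

// Charge records the cost of an LLM call and rejects it once the
// session limit would be crossed, enforcing the hard stop.
func (b *BudgetGuard) Charge(costUSD float64) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.spentUSD+costUSD > b.limitUSD {
		return ErrBudgetExceeded
	}
	b.spentUSD += costUSD
	return nil
}
```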
- 12K+ req/s throughput (non-streaming)
- <10ms latency (p50) for streaming responses
- <100MB memory (idle), <500MB (under load)
- Single binary deployment (~30MB)
- Horizontal scaling ready (stateless agents)
User Input
    ↓
🔍 Intent Detection Agent → Understands what you want
    ↓
🧩 Reasoning Structure Agent → Plans how to answer
    ↓
📋 Retrieval Planner Agent → Decides what data to fetch
    ↓
📥 Retrieval Executor Agent → Fetches from GitLab/YouTrack/RAG
    ↓
⚙️ Context Synthesizer Agent → Merges and normalizes data
    ↓
🎯 Inference Agent → Draws conclusions
    ↓
✅ Validation Agent → Checks consistency
    ↓
📝 Summarization Agent → Formats final response
    ↓
🤖 LLM (OpenAI/Anthropic/etc.) → Generates natural language
    ↓
Response
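
Conceptually the pipeline is a sequential chain: each agent reads a shared reasoning context and enriches it for the agents downstream. A minimal sketch of that pattern (the `Agent` interface and `ReasoningContext` fields here are assumptions for illustration, not the project's actual types):

```go
// Minimal sketch of sequential agent orchestration over a shared
// context; types are illustrative, not the proxy's actual ones.
package pipeline

import "context"

type ReasoningContext struct {
	Intents     []string
	Facts       []string
	Conclusions []string
	Summary     string
}

type Agent interface {
	Name() string
	Run(ctx context.Context, rc *ReasoningContext) error
}

// RunPipeline executes agents in order; each agent enriches the
// shared context that downstream agents consume.
func RunPipeline(ctx context.Context, rc *ReasoningContext, agents []Agent) error {
	for _, a := range agents {
		if err := a.Run(ctx, rc); err != nil {
			return err
		}
	}
	return nil
}
```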
- Language: Go 1.21+ (high-performance, single binary)
- Architecture: Domain-Driven Design (DDD) + Clean Architecture
- API: OpenAI-compatible REST API
- Streaming: Server-Sent Events (SSE)
- Providers: OpenAI, Anthropic, DeepSeek, Ollama
- MCP Integration: GitLab, YouTrack (Model Context Protocol)
- Observability: Prometheus metrics, structured logging, distributed tracing
# Required
- Go 1.21+ installed
- One or more LLM provider API keys
# Optional (for MCP integration)
- GitLab account & token
- YouTrack account & token
# 1. Clone repository
git clone https://github.com/mshogin/agents.git
cd agents
# 2. Set API keys
export OPENAI_API_KEY="sk-..."
# Optional:
export ANTHROPIC_API_KEY="sk-ant-..."
export GITLAB_TOKEN="glpat-..."
export YOUTRACK_TOKEN="<your-youtrack-token>"
# 3. Build
make build
# 4. Run
./bin/proxy --config config.yaml
Server starts on http://localhost:8000 🚀
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Workflow: advanced" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "What are my recent GitLab commits?"}],
"stream": true
}'
The proxy supports 11 workflows for different use cases:
| Workflow | Description | Use Case |
|---|---|---|
| `default` | Simple echo workflow | Testing, debugging |
| `basic` | Keyword-based intent detection | Fast responses, no LLM reasoning |
| `advanced` | Full 8-agent reasoning pipeline | Complex queries, retrieval, analysis |
Test individual agents in isolation:
- `intent_detection_only`: Test intent classification
- `reasoning_structure_only`: Test hypothesis generation
- `retrieval_planner_only`: Test query planning
- `retrieval_executor_only`: Test data fetching
- `context_synthesizer_only`: Test fact normalization
- `inference_only`: Test conclusion generation
- `summarization_only`: Test output formatting
- `validation_only`: Test consistency checks
Usage:
# Test single agent
curl -X POST http://localhost:8000/v1/chat/completions \
-H "X-Workflow: intent_detection_only" \
-d '{"model": "gpt-4o-mini", "messages": [...]}'
# Get detailed INPUT/OUTPUT/SYSTEM_PROMPT breakdown
# Perfect for debugging and agent development!
**Intent Detection Agent**
- Purpose: Classifies user intent and extracts entities
- Methods: Rule-based (regex/keywords) + LLM fallback for ambiguous cases
- Output: Intents with confidence scores, entities (projects, dates, etc.)
- Performance: <50ms (rules only), <500ms (with LLM fallback)
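
As a rough illustration of the rule-based path, classification might look like the sketch below; the patterns, confidence value, and types are hypothetical:

```go
// Hypothetical rule-based intent detection with an LLM fallback
// for ambiguous input; patterns and confidence are illustrative.
package intent

import "regexp"

type Intent struct {
	Name       string
	Confidence float64
}

var rules = map[string]*regexp.Regexp{
	"query_commits": regexp.MustCompile(`(?i)\b(commit|push|merge request)\b`),
	"query_issues":  regexp.MustCompile(`(?i)\b(issue|bug|ticket)\b`),
}

// Detect tries the regex rules first; when nothing matches, the
// caller escalates to the LLM fallback.
func Detect(input string) (Intent, bool) {
	for name, re := range rules {
		if re.MatchString(input) {
			return Intent{Name: name, Confidence: 0.9}, true
		}
	}
	return Intent{}, false // ambiguous: use LLM fallback
}
```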
**Reasoning Structure Agent**
- Purpose: Builds reasoning goal hierarchy and generates hypotheses
- Input: Detected intents and entities
- Output: Hypotheses, dependencies, expected artifacts
- Performance: <100ms
**Retrieval Planner Agent**
- Purpose: Plans what data to retrieve and from which sources
- Input: Intents, hypotheses
- Output: Retrieval plans with priorities, normalized queries
- Performance: <100ms
**Retrieval Executor Agent**
- Purpose: Executes retrieval plans via MCP (GitLab, YouTrack) or RAG
- Input: Retrieval plans and queries
- Output: Artifacts (commits, issues, documents) with metadata
- Performance: <2s (depends on external sources)
**Context Synthesizer Agent**
- Purpose: Normalizes and merges facts from multiple sources
- Input: Retrieved artifacts
- Output: Deduplicated facts, derived knowledge, relationships
- Performance: <200ms
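
Deduplication across sources can be done by keying facts on a normalized fingerprint, as in this sketch (the `Fact` type is an assumption):

```go
// Hypothetical fact deduplication during context synthesis:
// facts from different sources are keyed on a normalized
// fingerprint and merged in input order.
package synth

import "strings"

type Fact struct {
	Source  string
	Content string
}

func fingerprint(f Fact) string {
	return strings.ToLower(strings.TrimSpace(f.Content))
}

// Dedupe keeps the first occurrence of each distinct fact.
func Dedupe(facts []Fact) []Fact {
	seen := make(map[string]bool)
	var out []Fact
	for _, f := range facts {
		key := fingerprint(f)
		if !seen[key] {
			seen[key] = true
			out = append(out, f)
		}
	}
	return out
}
```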
**Inference Agent**
- Purpose: Draws conclusions from facts and hypotheses
- Input: Facts, hypotheses, derived knowledge
- Output: Conclusions with confidence scores
- Methods: Deterministic rules + LLM for complex synthesis
- Performance: <500ms
**Validation Agent**
- Purpose: Checks completeness and logical consistency
- Input: Complete reasoning context
- Output: Validation reports, errors, warnings, auto-fix hints
- Performance: <100ms
**Summarization Agent**
- Purpose: Generates executive summary and formats final output
- Input: Complete validated reasoning context
- Output: Structured summary, formatted artifacts
- Performance: <100ms
server:
host: "0.0.0.0"
port: 8000
# LLM Providers
providers:
openai:
api_key: "${OPENAI_API_KEY}"
base_url: "https://api.openai.com/v1"
enabled: true
anthropic:
api_key: "${ANTHROPIC_API_KEY}"
base_url: "https://api.anthropic.com/v1"
enabled: true
deepseek:
api_key: "${DEEPSEEK_API_KEY}"
base_url: "https://api.deepseek.com/v1"
enabled: true
ollama:
base_url: "http://localhost:11434/v1"
enabled: true
# Workflows
workflows:
default: "basic"
enabled:
- "default"
- "basic"
- "advanced"
# Single-agent workflows
- "intent_detection_only"
- "reasoning_structure_only"
# ... (all 8)
# MCP Integration (optional)
mcp:
gitlab:
url: "${GITLAB_URL}"
token: "${GITLAB_TOKEN}"
enabled: true
youtrack:
url: "${YOUTRACK_BASE_URL}"
token: "${YOUTRACK_TOKEN}"
enabled: true
# Advanced Settings
advanced:
llm:
# Dynamic model selection
model_selection: "cost_aware" # or "quality_first", "speed_first"
max_cost_per_session: 0.10 # USD
cache_ttl: 3600 # seconds
reasoning:
max_pipeline_duration: 30 # seconds
enable_parallel_agents: true
enable_llm_fallback: true
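
The `${VAR}` values suggest environment-variable expansion when the config is loaded. A minimal sketch of one way to do that in Go, assuming `gopkg.in/yaml.v3` (this is not the project's actual loader):

```go
// Minimal sketch of loading YAML config with ${VAR} expansion;
// not the project's actual loader.
package config

import (
	"os"

	"gopkg.in/yaml.v3"
)

type Config struct {
	Server struct {
		Host string `yaml:"host"`
		Port int    `yaml:"port"`
	} `yaml:"server"`
}

func Load(path string) (*Config, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	// Expand ${OPENAI_API_KEY}-style references from the environment.
	expanded := os.ExpandEnv(string(raw))
	var cfg Config
	if err := yaml.Unmarshal([]byte(expanded), &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```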
./bin/proxy [options]
Options:
--config string Config file path (default "config.yaml")
--host string Server host (default "0.0.0.0")
--port int Server port (default 8000)
--workflow string Default workflow (default "basic")
--provider string Default LLM provider (default "openai")
--debug Enable debug logging
--help Show help
OpenAI-compatible chat completions endpoint.
Headers:
- `X-Workflow`: Select workflow (optional, defaults to config)
- `Content-Type: application/json`
Request Body:
{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "Your message here"}
],
"stream": true,
"temperature": 0.7,
"max_tokens": 1000
}
Response (Streaming):
data: {"type":"reasoning","data":{"message":"=== ADVANCED WORKFLOW ===\n\nπ― Intent Detection:\n β’ query_commits (confidence: 0.99)\n..."}}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o-mini","choices":[{"delta":{"content":"Your"},"index":0}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o-mini","choices":[{"delta":{"content":" recent"},"index":0}]}
data: [DONE]
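
A client consumes this stream by reading `data:` lines until the `[DONE]` sentinel. A minimal Go sketch (the helper is illustrative):

```go
// Minimal sketch of consuming the proxy's SSE stream: read
// "data:" lines until the [DONE] sentinel.
package client

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func streamChat(req *http.Request) error {
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank keep-alive separators
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			return nil
		}
		fmt.Println(payload) // each payload is one JSON chunk
	}
	return scanner.Err()
}
```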
Health check endpoint.
Response:
{
"status": "healthy",
"version": "1.0.0",
"uptime": 12345
}
List available workflows.
Response:
{
"workflows": [
"default",
"basic",
"advanced",
"intent_detection_only",
...
],
"default": "basic"
}
curl -X POST http://localhost:8000/v1/chat/completions \
-H "X-Workflow: advanced" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "What are my recent commits in project X?"}]
}'
What happens:
- Intent Detection: `query_commits` detected
- Retrieval Planner: Plans GitLab API query
- Retrieval Executor: Fetches commits via GitLab MCP
- Context Synthesizer: Normalizes commit data
- Inference: Analyzes commit patterns
- LLM: Generates natural language summary
curl -X POST http://localhost:8000/v1/chat/completions \
-H "X-Workflow: advanced" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Show me open high-priority issues"}]
}'
# Test intent detection in isolation
curl -X POST http://localhost:8000/v1/chat/completions \
-H "X-Workflow: intent_detection_only" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "test input"}]
}'
# Response shows detailed breakdown:
# 📥 INPUT: "test input"
# 📤 OUTPUT: detected intents with confidence
# 🤖 SYSTEM PROMPT FOR LLM: exact prompt used
# ⏱️ METRICS: duration, LLM calls, cost
make build # Build binary
make test # Run all tests
make test-coverage # Run tests with coverage
make lint # Run linters
make fmt # Format code
src/golang/
├── cmd/proxy/           # Main application entry
├── internal/
│   ├── domain/          # Core business logic (agents, models)
│   │   ├── models/      # Data structures
│   │   └── services/    # Agent interfaces & implementations
│   ├── application/     # Use cases & orchestration
│   ├── infrastructure/  # External integrations (providers, MCP)
│   └── presentation/    # HTTP handlers & API
└── pkg/workflows/       # Public workflow implementations

tests/golang/
├── internal/            # Unit tests
├── integration/         # Integration tests
│   └── single_agent/    # Single-agent test infrastructure
└── fixtures/            # Test data

docs/
├── reasoning_system/    # Agent documentation
└── architecture/        # Design docs
# Run all single-agent tests
go test ./tests/golang/integration/single_agent/... -v
# Test specific agent
go test ./tests/golang/integration/single_agent/intent_detection_test.go -v
# With coverage
go test ./tests/golang/integration/single_agent/... -cover
# Start server with single agent
./bin/proxy --workflow intent_detection_only
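
A single-agent test typically feeds fixed input to one agent and asserts on its structured output. A hypothetical table-driven example (the inline `detect` helper stands in for the agent under test):

```go
// Hypothetical table-driven test; detect is a stand-in for the
// real intent detection agent.
package agents_test

import (
	"strings"
	"testing"
)

func detect(input string) string {
	switch {
	case strings.Contains(input, "commit"):
		return "query_commits"
	case strings.Contains(input, "bug"):
		return "query_issues"
	default:
		return "unknown"
	}
}

func TestDetectIntent(t *testing.T) {
	cases := []struct{ input, want string }{
		{"show my recent commits", "query_commits"},
		{"list open bugs", "query_issues"},
	}
	for _, tc := range cases {
		if got := detect(tc.input); got != tc.want {
			t.Errorf("detect(%q) = %q, want %q", tc.input, got, tc.want)
		}
	}
}
```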
- Define the agent interface in `internal/domain/services/reasoning_agent.go`
- Implement the agent in `internal/domain/services/agents/your_agent.go`
- Add tests in `tests/golang/internal/domain/services/agents/your_agent_test.go`
- Create a single-agent workflow in `pkg/workflows/your_agent_only.go`
- Update the pipeline in `pkg/workflows/advanced.go`
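
A new agent skeleton might look like this; the interface shape is inferred from the file layout above and is an assumption, not the actual `reasoning_agent.go` definition:

```go
// Hypothetical skeleton for a new agent; the interface shape is
// an assumption, not the actual reasoning_agent.go definition.
package agents

import "context"

type ReasoningContext struct {
	Facts       []string
	Conclusions []string
}

type ReasoningAgent interface {
	Name() string
	Execute(ctx context.Context, rc *ReasoningContext) error
}

type YourAgent struct{}

func (a *YourAgent) Name() string { return "your_agent" }

// Execute reads the shared context and appends this agent's output.
func (a *YourAgent) Execute(ctx context.Context, rc *ReasoningContext) error {
	rc.Conclusions = append(rc.Conclusions, "example conclusion")
	return nil
}
```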
| Metric | Target | Actual |
|---|---|---|
| Non-streaming throughput | >10K req/s | 12K req/s |
| Streaming latency (p50) | <10ms | 8ms |
| Streaming latency (p99) | <50ms | 42ms |
| Memory (idle) | <100MB | 85MB |
| Memory (load) | <500MB | 420MB |
| Startup time | <100ms | 65ms |
| Non-LLM pipeline | <2s | 1.2s |
| Single agent (no LLM) | <500ms | 180ms |
The proxy uses dynamic model selection to minimize costs:
- Simple tasks (intent classification): DeepSeek ($0.0001/1K tokens) → OpenAI GPT-4o-mini ($0.00015/1K tokens)
- Medium tasks (synthesis): OpenAI GPT-4o ($0.0025/1K tokens)
- Complex tasks (deep reasoning): OpenAI O1-mini ($0.015/1K tokens)
Average cost per reasoning session: $0.02 - $0.05
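
Selection can be as simple as picking the cheapest model whose capability tier covers the task, as in this sketch (tiers and the catalog are illustrative; prices mirror the figures above):

```go
// Hypothetical cost-aware model selection: pick the cheapest model
// whose tier covers the task; catalog and tiers are illustrative.
package llm

type Model struct {
	Name          string
	Tier          int // 1 = simple, 2 = medium, 3 = complex
	USDPer1KToken float64
}

var catalog = []Model{
	{"deepseek-chat", 1, 0.0001},
	{"gpt-4o-mini", 1, 0.00015},
	{"gpt-4o", 2, 0.0025},
	{"o1-mini", 3, 0.015},
}

// SelectModel returns the cheapest model at or above the required tier.
func SelectModel(requiredTier int) (Model, bool) {
	var best Model
	found := false
	for _, m := range catalog {
		if m.Tier >= requiredTier && (!found || m.USDPer1KToken < best.USDPer1KToken) {
			best, found = m, true
		}
	}
	return best, found
}
```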
- ✅ API keys via environment variables only
- ✅ No secrets in committed config files
- ✅ MCP tool sandboxing
- ✅ Rate limiting and quotas
- ✅ Audit logging for all operations
- ✅ Input validation and sanitization
- ✅ Security scanning with `/seccheck`
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Follow Go conventions and DDD architecture
- Add tests (80%+ coverage required)
- Run `make lint` and `make test`
- Commit with conventional commits (`feat:`, `fix:`, `docs:`)
- Push and create a Pull Request
- ROADMAP.md - Project roadmap and completed phases
- CLAUDE.md - Architecture guidelines (SOLID, DDD, Clean Architecture)
- Single-Agent Testing Spec - Testing infrastructure
- MCP Integration - Model Context Protocol guides
# Check API key is set
echo $OPENAI_API_KEY
# Test connectivity
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"
# Check GitLab token
curl https://gitlab.com/api/v4/user \
-H "PRIVATE-TOKEN: $GITLAB_TOKEN"
# Enable debug logging
./bin/proxy --debug
# Run from project root
cd /path/to/agents
./bin/proxy
# Rebuild if needed
make clean && make build
MIT License - see LICENSE file for details.
- Google ADK - Agent Development Kit inspiration
- OpenAI - GPT models and API design
- Model Context Protocol - GitLab and YouTrack integration
- Clean Architecture - Robert C. Martin (Uncle Bob)
- Domain-Driven Design - Eric Evans
Built with ❤️ using Go, DDD, and way too much coffee ☕
Questions? Open an issue or check the docs.