
🧠 Intelligent LLM Proxy with Multi-Agent Reasoning

A high-performance, production-ready LLM proxy built in Go with an intelligent 8-agent reasoning system. This proxy doesn't just forward requests: it thinks, plans, retrieves context, and validates before generating responses.


🎯 What Makes This Special

Unlike traditional LLM proxies that just forward requests, this proxy:

  • 🧠 Thinks First: 8 specialized reasoning agents analyze your request before hitting the LLM
  • 💰 Cost-Aware: Dynamic model selection based on task complexity and budget (<$0.05/session avg)
  • 🔍 Context-Rich: Retrieves relevant data from GitLab, YouTrack, and RAG via MCP integration
  • ⚡ Lightning Fast: <500ms non-LLM latency, 12K req/s throughput, streaming-first architecture
  • 🔒 Production-Ready: Single binary deployment, comprehensive observability, DDD architecture
  • 🧪 Developer-Friendly: Test individual agents in isolation with 8 single-agent workflows

✨ Key Features

Multi-Agent Reasoning System

  • 8 Specialized Agents: Intent Detection, Reasoning Structure, Retrieval Planning, Retrieval Execution, Context Synthesis, Inference, Validation, Summarization
  • Dynamic LLM Selection: Automatically selects the cheapest model that can handle the task (see the sketch after this list)
  • Budget Controls: Per-session and per-agent cost limits with hard stops
  • Versioned Context: Full audit trail of agent execution with diffs and metrics
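
A minimal Go sketch of how cost-aware selection with a hard per-session budget could work. The ModelOption type, tiers, and prices are illustrative assumptions, not the proxy's actual API:

package main

import (
    "errors"
    "fmt"
    "sort"
)

// ModelOption is a candidate model with a capability tier and an
// approximate cost per 1K tokens (illustrative values).
type ModelOption struct {
    Name      string
    Tier      int     // minimum task complexity it can handle
    CostPer1K float64 // USD
}

// Budget tracks per-session spend and enforces a hard stop.
type Budget struct {
    Limit, Spent float64
}

func (b *Budget) Charge(cost float64) error {
    if b.Spent+cost > b.Limit {
        return errors.New("session budget exceeded")
    }
    b.Spent += cost
    return nil
}

// cheapestFor returns the lowest-cost model whose tier covers the task.
func cheapestFor(models []ModelOption, complexity int) (ModelOption, error) {
    sort.Slice(models, func(i, j int) bool { return models[i].CostPer1K < models[j].CostPer1K })
    for _, m := range models {
        if m.Tier >= complexity {
            return m, nil
        }
    }
    return ModelOption{}, errors.New("no model can handle this task")
}

func main() {
    models := []ModelOption{
        {"deepseek-chat", 1, 0.0001},
        {"gpt-4o-mini", 1, 0.00015},
        {"gpt-4o", 2, 0.0025},
    }
    budget := &Budget{Limit: 0.05}
    m, _ := cheapestFor(models, 1) // a simple classification task
    cost := 2000.0 / 1000 * m.CostPer1K
    if err := budget.Charge(cost); err != nil {
        fmt.Println("hard stop:", err)
        return
    }
    fmt.Printf("selected %s, spent $%.4f of $%.2f\n", m.Name, budget.Spent, budget.Limit)
}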

Model Context Protocol (MCP) Integration

  • GitLab MCP Server: Query commits, MRs, code metrics, link code to tasks
  • YouTrack MCP Server: Epic management, task tracking, progress analysis, weekly reports
  • RAG MCP Server: Semantic search, knowledge graphs, document processing, concept extraction
  • Intelligent Orchestration: Automatic tool selection and multi-tool chaining (wire format sketched after this list)
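
Under the hood, an MCP tool invocation is a JSON-RPC 2.0 tools/call request. Below is a Go sketch of building one; the tool name list_commits and its arguments are hypothetical stand-ins for whatever tools the GitLab MCP server actually exposes:

package main

import (
    "encoding/json"
    "fmt"
)

// toolCall mirrors an MCP JSON-RPC 2.0 tools/call request envelope.
type toolCall struct {
    JSONRPC string `json:"jsonrpc"`
    ID      int    `json:"id"`
    Method  string `json:"method"`
    Params  struct {
        Name      string         `json:"name"`
        Arguments map[string]any `json:"arguments"`
    } `json:"params"`
}

func main() {
    call := toolCall{JSONRPC: "2.0", ID: 1, Method: "tools/call"}
    call.Params.Name = "list_commits" // hypothetical tool name
    call.Params.Arguments = map[string]any{"project": "group/project", "since": "2024-01-01"}
    out, _ := json.MarshalIndent(call, "", "  ")
    fmt.Println(string(out))
}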

High Performance & Scalability

  • 12K+ req/s throughput (non-streaming)
  • <10ms latency (p50) for streaming responses
  • <100MB memory (idle), <500MB (under load)
  • Single binary deployment (~30MB)
  • Horizontal scaling ready (stateless agents)

πŸ—οΈ Architecture

Multi-Agent Reasoning Pipeline

User Input
    ↓
📊 Intent Detection Agent      → Understands what you want
    ↓
🧩 Reasoning Structure Agent   → Plans how to answer
    ↓
🔍 Retrieval Planner Agent     → Decides what data to fetch
    ↓
📥 Retrieval Executor Agent    → Fetches from GitLab/YouTrack/RAG
    ↓
⚗️ Context Synthesizer Agent   → Merges and normalizes data
    ↓
🎯 Inference Agent             → Draws conclusions
    ↓
✅ Validation Agent            → Checks consistency
    ↓
📝 Summarization Agent         → Formats final response
    ↓
🤖 LLM (OpenAI/Anthropic/etc.) → Generates natural language
    ↓
Response
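
A minimal Go sketch of how such a sequential pipeline can be wired, with each agent enriching a shared context. The Agent interface and ReasoningContext fields are illustrative, not the repository's actual types:

package main

import (
    "context"
    "fmt"
)

// ReasoningContext is the shared state each agent reads and enriches
// (illustrative fields).
type ReasoningContext struct {
    Input     string
    Intents   []string
    Artifacts []string
    Summary   string
}

// Agent is one stage of the pipeline.
type Agent interface {
    Name() string
    Execute(ctx context.Context, rc *ReasoningContext) error
}

// runPipeline executes agents in order, stopping at the first failure.
func runPipeline(ctx context.Context, agents []Agent, rc *ReasoningContext) error {
    for _, a := range agents {
        if err := a.Execute(ctx, rc); err != nil {
            return fmt.Errorf("agent %s: %w", a.Name(), err)
        }
    }
    return nil
}

// intentDetection is a stub standing in for the first stage.
type intentDetection struct{}

func (intentDetection) Name() string { return "intent_detection" }

func (intentDetection) Execute(_ context.Context, rc *ReasoningContext) error {
    rc.Intents = append(rc.Intents, "query_commits")
    return nil
}

func main() {
    rc := &ReasoningContext{Input: "What are my recent GitLab commits?"}
    if err := runPipeline(context.Background(), []Agent{intentDetection{}}, rc); err != nil {
        panic(err)
    }
    fmt.Println(rc.Intents) // [query_commits]
}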

Technology Stack

  • Language: Go 1.21+ (high-performance, single binary)
  • Architecture: Domain-Driven Design (DDD) + Clean Architecture
  • API: OpenAI-compatible REST API
  • Streaming: Server-Sent Events (SSE)
  • Providers: OpenAI, Anthropic, DeepSeek, Ollama
  • MCP Integration: GitLab, YouTrack (Model Context Protocol)
  • Observability: Prometheus metrics, structured logging, distributed tracing

🚀 Quick Start

Prerequisites

# Required
- Go 1.21+ installed
- One or more LLM provider API keys

# Optional (for MCP integration)
- GitLab account & token
- YouTrack account & token

Installation

# 1. Clone repository
git clone https://github.com/mshogin/agents.git
cd agents

# 2. Set API keys
export OPENAI_API_KEY="sk-..."
# Optional:
export ANTHROPIC_API_KEY="sk-ant-..."
export GITLAB_TOKEN="glpat-..."
export YOUTRACK_TOKEN="<your-youtrack-token>"

# 3. Build
make build

# 4. Run
./bin/proxy --config config.yaml

Server starts on http://localhost:8000 🎉

First Request

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Workflow: advanced" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What are my recent GitLab commits?"}],
    "stream": true
  }'

🎛️ Workflows

The proxy supports 11 workflows for different use cases:

Main Workflows

Workflow   Description                       Use Case
default    Simple echo workflow              Testing, debugging
basic      Keyword-based intent detection    Fast responses, no LLM reasoning
advanced   Full 8-agent reasoning pipeline   Complex queries, retrieval, analysis

Single-Agent Workflows (Testing & Development)

Test individual agents in isolation:

  • intent_detection_only - Test intent classification
  • reasoning_structure_only - Test hypothesis generation
  • retrieval_planner_only - Test query planning
  • retrieval_executor_only - Test data fetching
  • context_synthesizer_only - Test fact normalization
  • inference_only - Test conclusion generation
  • summarization_only - Test output formatting
  • validation_only - Test consistency checks

Usage:

# Test single agent
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "X-Workflow: intent_detection_only" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'

# Get detailed INPUT/OUTPUT/SYSTEM_PROMPT breakdown
# Perfect for debugging and agent development!

📊 The 8 Reasoning Agents

1. Intent Detection Agent

  • Purpose: Classifies user intent and extracts entities
  • Methods: Rule-based (regex/keywords) + LLM fallback for ambiguous cases (see the sketch after this list)
  • Output: Intents with confidence scores, entities (projects, dates, etc.)
  • Performance: <50ms (rules only), <500ms (with LLM fallback)
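
A minimal sketch of the rules-first approach in Go; the patterns, intent labels, and confidence values are illustrative assumptions:

package main

import (
    "fmt"
    "regexp"
)

// intentRule pairs a pattern with an intent label and the confidence
// assigned when it fires (illustrative values).
type intentRule struct {
    pattern    *regexp.Regexp
    intent     string
    confidence float64
}

var rules = []intentRule{
    {regexp.MustCompile(`(?i)\bcommits?\b`), "query_commits", 0.95},
    {regexp.MustCompile(`(?i)\bissues?\b`), "query_issues", 0.95},
}

// detectIntent tries rules first; when none match, the caller would
// fall back to an LLM classification (not shown).
func detectIntent(input string) (intent string, confidence float64, matched bool) {
    for _, r := range rules {
        if r.pattern.MatchString(input) {
            return r.intent, r.confidence, true
        }
    }
    return "", 0, false // ambiguous: escalate to the LLM fallback
}

func main() {
    intent, conf, ok := detectIntent("What are my recent GitLab commits?")
    fmt.Println(intent, conf, ok) // query_commits 0.95 true
}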

2. Reasoning Structure Agent

  • Purpose: Builds reasoning goal hierarchy and generates hypotheses
  • Input: Detected intents and entities
  • Output: Hypotheses, dependencies, expected artifacts
  • Performance: <100ms

3. Retrieval Planner Agent

  • Purpose: Plans what data to retrieve and from which sources
  • Input: Intents, hypotheses
  • Output: Retrieval plans with priorities, normalized queries
  • Performance: <100ms

4. Retrieval Executor Agent

  • Purpose: Executes retrieval plans via MCP (GitLab, YouTrack) or RAG
  • Input: Retrieval plans and queries
  • Output: Artifacts (commits, issues, documents) with metadata
  • Performance: <2s (depends on external sources)

5. Context Synthesizer Agent

  • Purpose: Normalizes and merges facts from multiple sources
  • Input: Retrieved artifacts
  • Output: Deduplicated facts, derived knowledge, relationships
  • Performance: <200ms

6. Inference Agent

  • Purpose: Draws conclusions from facts and hypotheses
  • Input: Facts, hypotheses, derived knowledge
  • Output: Conclusions with confidence scores
  • Methods: Deterministic rules + LLM for complex synthesis
  • Performance: <500ms

7. Validation Agent

  • Purpose: Checks completeness and logical consistency
  • Input: Complete reasoning context
  • Output: Validation reports, errors, warnings, auto-fix hints
  • Performance: <100ms

8. Summarization Agent

  • Purpose: Generates executive summary and formats final output
  • Input: Complete validated reasoning context
  • Output: Structured summary, formatted artifacts
  • Performance: <100ms

🔧 Configuration

config.yaml

server:
  host: "0.0.0.0"
  port: 8000

# LLM Providers
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"
    enabled: true

  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    base_url: "https://api.anthropic.com/v1"
    enabled: true

  deepseek:
    api_key: "${DEEPSEEK_API_KEY}"
    base_url: "https://api.deepseek.com/v1"
    enabled: true

  ollama:
    base_url: "http://localhost:11434/v1"
    enabled: true

# Workflows
workflows:
  default: "basic"
  enabled:
    - "default"
    - "basic"
    - "advanced"
    # Single-agent workflows
    - "intent_detection_only"
    - "reasoning_structure_only"
    # ... (all 8)

# MCP Integration (optional)
mcp:
  gitlab:
    url: "${GITLAB_URL}"
    token: "${GITLAB_TOKEN}"
    enabled: true

  youtrack:
    url: "${YOUTRACK_BASE_URL}"
    token: "${YOUTRACK_TOKEN}"
    enabled: true

# Advanced Settings
advanced:
  llm:
    # Dynamic model selection
    model_selection: "cost_aware"  # or "quality_first", "speed_first"
    max_cost_per_session: 0.10     # USD
    cache_ttl: 3600                # seconds

  reasoning:
    max_pipeline_duration: 30      # seconds
    enable_parallel_agents: true
    enable_llm_fallback: true
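
The ${VAR} placeholders imply environment-variable expansion when the config is loaded. A minimal sketch of the common Go pattern (an assumption about this repository's loader, which may differ):

package main

import (
    "fmt"
    "os"
)

// expandConfig substitutes ${VAR} placeholders with environment values
// before the YAML is parsed; os.ExpandEnv also expands bare $VAR.
func expandConfig(raw []byte) []byte {
    return []byte(os.ExpandEnv(string(raw)))
}

func main() {
    raw, err := os.ReadFile("config.yaml")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(expandConfig(raw)))
}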

CLI Options

./bin/proxy [options]

Options:
  --config string     Config file path (default "config.yaml")
  --host string       Server host (default "0.0.0.0")
  --port int          Server port (default 8000)
  --workflow string   Default workflow (default "basic")
  --provider string   Default LLM provider (default "openai")
  --debug             Enable debug logging
  --help              Show help

📚 API Endpoints

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint.

Headers:

  • X-Workflow: Select workflow (optional, defaults to config)
  • Content-Type: application/json

Request Body:

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "Your message here"}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1000
}

Response (Streaming):

data: {"type":"reasoning","data":{"message":"=== ADVANCED WORKFLOW ===\n\n🎯 Intent Detection:\n  • query_commits (confidence: 0.99)\n..."}}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o-mini","choices":[{"delta":{"content":"Your"},"index":0}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o-mini","choices":[{"delta":{"content":" recent"},"index":0}]}

data: [DONE]
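
A minimal Go client sketch for consuming this stream, separating the proxy's reasoning events from standard completion chunks; it assumes exactly the event shapes shown above:

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

func main() {
    body := strings.NewReader(`{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}],"stream":true}`)
    req, err := http.NewRequest("POST", "http://localhost:8000/v1/chat/completions", body)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-Workflow", "advanced")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := strings.TrimPrefix(scanner.Text(), "data: ")
        if line == "" || line == "[DONE]" {
            continue
        }
        var event map[string]any
        if err := json.Unmarshal([]byte(line), &event); err != nil {
            continue // skip non-JSON keep-alives
        }
        if event["type"] == "reasoning" {
            continue // agent trace: log it separately from completion text
        }
        // OpenAI-style chunk: extract the delta content, if any.
        choices, ok := event["choices"].([]any)
        if !ok || len(choices) == 0 {
            continue
        }
        choice, ok := choices[0].(map[string]any)
        if !ok {
            continue
        }
        delta, ok := choice["delta"].(map[string]any)
        if !ok {
            continue
        }
        if content, ok := delta["content"].(string); ok {
            fmt.Print(content)
        }
    }
}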

GET /health

Health check endpoint.

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 12345
}

GET /workflows

List available workflows.

Response:

{
  "workflows": [
    "default",
    "basic",
    "advanced",
    "intent_detection_only",
    ...
  ],
  "default": "basic"
}

🎯 Use Cases

1. GitLab Commit Analysis

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Workflow: advanced" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What are my recent commits in project X?"}]
  }'

What happens:

  1. Intent Detection: query_commits detected
  2. Retrieval Planner: Plans GitLab API query
  3. Retrieval Executor: Fetches commits via GitLab MCP
  4. Context Synthesizer: Normalizes commit data
  5. Inference: Analyzes commit patterns
  6. LLM: Generates natural language summary

2. YouTrack Issue Tracking

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Workflow: advanced" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Show me open high-priority issues"}]
  }'

3. Agent Development & Debugging

# Test intent detection in isolation
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Workflow: intent_detection_only" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "test input"}]
  }'

# Response shows detailed breakdown:
# 📥 INPUT: "test input"
# 📤 OUTPUT: detected intents with confidence
# 📤 SYSTEM PROMPT FOR LLM: exact prompt used
# ⏱️ METRICS: duration, LLM calls, cost

🧪 Development

Build

make build          # Build binary
make test           # Run all tests
make test-coverage  # Run tests with coverage
make lint           # Run linters
make fmt            # Format code

Project Structure

src/golang/
├── cmd/proxy/              # Main application entry
├── internal/
│   ├── domain/            # Core business logic (agents, models)
│   │   ├── models/        # Data structures
│   │   └── services/      # Agent interfaces & implementations
│   ├── application/       # Use cases & orchestration
│   ├── infrastructure/    # External integrations (providers, MCP)
│   └── presentation/      # HTTP handlers & API
└── pkg/workflows/         # Public workflow implementations

tests/golang/
├── internal/              # Unit tests
├── integration/           # Integration tests
│   └── single_agent/      # Single-agent test infrastructure
└── fixtures/              # Test data

docs/
├── reasoning_system/      # Agent documentation
└── architecture/          # Design docs

Testing Single Agents

# Run all single-agent tests
go test ./tests/golang/integration/single_agent/... -v

# Test specific agent
go test ./tests/golang/integration/single_agent/intent_detection_test.go -v

# With coverage
go test ./tests/golang/integration/single_agent/... -cover

# Start server with single agent
./bin/proxy --workflow intent_detection_only

Adding a New Agent

  1. Define agent interface in internal/domain/services/reasoning_agent.go
  2. Implement agent in internal/domain/services/agents/your_agent.go (a skeleton sketch follows these steps)
  3. Add tests in tests/golang/internal/domain/services/agents/your_agent_test.go
  4. Create single-agent workflow in pkg/workflows/your_agent_only.go
  5. Update pipeline in pkg/workflows/advanced.go
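
A skeleton sketch for steps 1-2. The interface shape is an assumption modeled on the illustrative pipeline sketch earlier in this README; the real contract lives in internal/domain/services/reasoning_agent.go and may differ:

// Package agents hosts pipeline stages (step 2).
package agents

import "context"

// ReasoningContext stands in for the shared pipeline state (assumed).
type ReasoningContext struct {
    Facts       []string
    Conclusions []string
}

// ReasoningAgent is the assumed contract every stage satisfies (step 1).
type ReasoningAgent interface {
    Name() string
    Execute(ctx context.Context, rc *ReasoningContext) error
}

// YourAgent is the skeleton for a new stage.
type YourAgent struct{}

func (YourAgent) Name() string { return "your_agent" }

func (YourAgent) Execute(ctx context.Context, rc *ReasoningContext) error {
    // Read what upstream agents produced, then append your outputs.
    rc.Conclusions = append(rc.Conclusions, "example conclusion")
    return nil
}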

📊 Performance

Benchmarks

Metric                     Target       Actual
Non-streaming throughput   >10K req/s   12K req/s
Streaming latency (p50)    <10ms        8ms
Streaming latency (p99)    <50ms        42ms
Memory (idle)              <100MB       85MB
Memory (load)              <500MB       420MB
Startup time               <100ms       65ms
Non-LLM pipeline           <2s          1.2s
Single agent (no LLM)      <500ms       180ms

Cost Optimization

The proxy uses dynamic model selection to minimize costs:

  • Simple tasks (intent classification): DeepSeek ($0.0001/1K tokens) → OpenAI GPT-4o-mini ($0.00015/1K tokens)
  • Medium tasks (synthesis): OpenAI GPT-4o ($0.0025/1K tokens)
  • Complex tasks (deep reasoning): OpenAI O1-mini ($0.015/1K tokens)

Average cost per reasoning session: $0.02 - $0.05
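
For a sense of scale under these rates: a session spending ~20K tokens on cheap-tier classification with GPT-4o-mini (20 × $0.00015 ≈ $0.003) plus ~10K tokens on GPT-4o synthesis (10 × $0.0025 = $0.025) totals roughly $0.028, within the quoted range. (Illustrative arithmetic, not measured usage.)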

🔒 Security

  • ✅ API keys via environment variables only
  • ✅ No secrets in committed config files
  • ✅ MCP tool sandboxing
  • ✅ Rate limiting and quotas
  • ✅ Audit logging for all operations
  • ✅ Input validation and sanitization
  • ✅ Security scanning with /seccheck

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Follow Go conventions and DDD architecture
  4. Add tests (80%+ coverage required)
  5. Run make lint and make test
  6. Commit with conventional commits (feat:, fix:, docs:)
  7. Push and create a Pull Request

📖 Documentation

See the docs/ directory: reasoning_system/ for agent documentation and architecture/ for design docs.

πŸ› Troubleshooting

"Cannot connect to LLM provider"

# Check API key is set
echo $OPENAI_API_KEY

# Test connectivity
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

"MCP server not responding"

# Check GitLab token
curl https://gitlab.com/api/v4/user \
  -H "PRIVATE-TOKEN: $GITLAB_TOKEN"

# Enable debug logging
./bin/proxy --debug

"Import errors"

# Run from project root
cd /path/to/agents
./bin/proxy

# Rebuild if needed
make clean && make build

πŸ“ License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • Google ADK - Agent Development Kit inspiration
  • OpenAI - GPT models and API design
  • Model Context Protocol - GitLab and YouTrack integration
  • Clean Architecture - Robert C. Martin (Uncle Bob)
  • Domain-Driven Design - Eric Evans

Built with ❤️ using Go, DDD, and way too much coffee ☕

Questions? Open an issue or check the docs.
