
Conversation

@thushan
Owner

@thushan thushan commented Jul 26, 2025

This PR takes Olla and adds an intelligent routing system that understands model capabilities and can make smart decisions about where to send requests based on model requirements and endpoint capabilities.

It's not quite at Scout's level of complexity nor Sherpa's level of polish (yet).

  • Model routing with capabilities - added interception and inspection of the request body to extract model information from requests and implemented capability-based routing (GetModelsByCapability). This took quite a few attempts to get right and will continue to be tweaked.
  • Overhauled the profile system - extended platform profiles with model capabilities and resource requirements, and there's now a more flexible InferenceProfile interface for different LLM providers, which should make it easier to add new ones later.
  • Request header improvements - added proper response headers (X-Olla-Endpoint, X-Olla-Model, etc.) to track which backend and model handled each request. These are now surfaced in the test scripts (test/scripts/logic/*.sh).
  • Perf updates - implemented reservoir sampling for efficient percentile tracking in stats collection (see the sketch after this list).
  • Config changes - fixed path matching with prefix configuration and refactored the handler for better maintainability.
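
As a rough illustration of the reservoir-sampling idea behind the percentile tracking, here's a minimal, self-contained Go sketch. It is not the actual ReservoirSampler in internal/adapter/stats/percentile_tracker.go; the names and numbers are illustrative only.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// reservoir keeps a fixed-size random sample of an unbounded stream of latencies,
// letting us estimate percentiles without storing every observation.
type reservoir struct {
	samples []float64
	size    int
	seen    int64
}

func newReservoir(size int) *reservoir {
	return &reservoir{samples: make([]float64, 0, size), size: size}
}

// add records one observation; once full, each new value replaces a random
// existing sample with probability size/seen (classic Algorithm R).
func (r *reservoir) add(v float64) {
	r.seen++
	if len(r.samples) < r.size {
		r.samples = append(r.samples, v)
		return
	}
	if j := rand.Int63n(r.seen); j < int64(r.size) {
		r.samples[j] = v
	}
}

// percentile sorts a copy of the current sample and returns the p-th percentile.
func (r *reservoir) percentile(p float64) float64 {
	if len(r.samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), r.samples...)
	sort.Float64s(sorted)
	idx := int(float64(len(sorted)-1) * p / 100)
	return sorted[idx]
}

func main() {
	r := newReservoir(100)
	for i := 0; i < 10000; i++ {
		r.add(rand.Float64() * 500) // simulated request latencies in ms
	}
	fmt.Printf("p50=%.1f p95=%.1f p99=%.1f\n", r.percentile(50), r.percentile(95), r.percentile(99))
}
```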

Summary by CodeRabbit

New Features

  • Added advanced model-aware routing and capability-based endpoint filtering for AI model requests.
  • Introduced detailed model and endpoint statistics with a new API endpoint and enhanced response headers for routing transparency.
  • Added support for configurable inference profiles via YAML, simplifying integration of new AI platforms.
  • Included Docker Compose configuration and improved configuration management with new default and profile config files.
  • Added a comprehensive test script for validating model routing across capabilities.

Bug Fixes

  • Improved error handling and robustness in service initialisation and dependency resolution.
  • Enhanced response header enforcement to prevent upstream overrides.

Documentation

  • Major updates and new guides covering configuration, inference profiles, model routing, usage examples, and API query formats.
  • Added detailed user, technical, capability filtering, and header documentation.

Refactor

  • Modularised proxy request handling for clearer logic and maintainability.
  • Replaced panic-based control flow with explicit error returns across services.
  • Centralised profile management with hot-reload support and interface abstractions.
  • Enhanced concurrency and statistics tracking with improved atomic counters and percentile tracking.

Tests

  • Added extensive unit, integration, and benchmark tests for model routing, capability filtering, statistics collection, profile logic, and proxy headers.
  • Introduced integration tests for model routing and endpoint selection.
  • Added tests for configurable profiles and percentile tracking implementations.

Chores

  • Updated .gitignore, Makefile, and installer script for improved build, release, and local development workflows.
  • Added new Docker Compose file and refined Docker usage instructions.
  • Enhanced logging and configuration file tracking.

thushan added 11 commits July 18, 2025 21:17
initial ProfileConfig struct for yaml-based configuration

create ConfigurableProfile to bridge yaml to interfaces

implement ProfileLoader with built-in defaults

forgot to add mutex to ProfileLoader... rookie mistake

add model response parsers for ollama/lmstudio/openai

lmstudio parser not extracting MaxContextLength field

debugging why lmstudio tests are failing

lmstudio uses publisher field as parent_model, weird but ok

create yaml config files for existing profiles

fix broken tests :(

wrong discovery path for lmstudio - should be /api/v0/models

extract common parser logic to reduce duplication

Merge branch 'main' into feature/olla-profiles

update ProfileFactory to use new loader

factory tests failing due to signature change

add NewFactoryLegacy for backward compatibility

update comments to explain why not what

simplify loader error handling

windows path issues in domain file creation

remove duplicate type declarations in parsers.go

update all tests to use NewFactoryLegacy

built-in profiles not loading when config dir missing

cleanup ollama paths and remove unused endpoints

add hot-reload support to ProfileLoader

Merge branch 'feature/profile-validation' into feature/olla-profiles

openai parser not handling nil created timestamp

clarify models.yaml is for unifier not profiles

make human-like comments per review feedback

typo in configurable_profile.go comment

ensure backward compatibility maintained

run gofmt on profile package

phase 1 complete - ready for model-aware routing
slightly faster way of extracting model, without breaking existing payload

add tests for inspector

add tests for proxy handler

update handler for showing model in request context

add a benchmark test for inspector chain for later testing
track model stats with percentiles and per-endpoint metrics

handle platform quirks, LM Studio single threading, Ollama load times

switch to proper switch statements per linter

detect vision/embedding/code models from naming patterns

calculate memory requirements from parameter count and quantization

add model-aware routing with capability matching

add coderabbit

cleanup old model stats after 24h retention

track routing effectiveness - hits, misses, fallbacks

respect concurrency limits per platform

transform model names for LM Studio publisher format

bigger timeouts for 70B models on Ollama

fix context length detection for llama3:8b

implement InfenceProfile for all three platforms

tests for model capabilities and resource calculations

handle panics across the board a bit better

update litepool to handle errors
add GetHealthyEndpointsForModel

update tests

fix broken tests

lint and fix complexity
… implementation)

refactor model collector

update registry profiles to avoid lcase

remove old modelcollector methods
adds capabilities to body inspector & registries

WIP: capabilities

Fix capabilities for lmstudio

fix vllm issues

update proxies to handle capabilities
@coderabbitai

coderabbitai bot commented Jul 26, 2025

Walkthrough

This update introduces a major refactor of the model registry, inference profile, and stats collection subsystems, with a new YAML-driven profile system, improved error handling, and enhanced model-aware routing and statistics. Numerous new documentation files, configuration files, and comprehensive test suites are added. Proxy handlers and services are modularised and now support detailed model and endpoint statistics, capability-based routing, and robust config management.

Changes

| File(s) / Group | Change Summary |
| --- | --- |
| .coderabbit.yaml, .gitignore, docker-compose.yaml, main.go, readme.md, makefile, install.sh | Added/updated configuration files, Docker Compose, installer script, and documentation for improved clarity, local config handling, and Docker workflows. |
| config/config.yaml, config/profiles/ollama.yaml, config/profiles/lmstudio.yaml, config/profiles/openai.yaml, config/profiles/README.md, default.yaml | Added/updated core and profile YAML config files for Olla, including detailed platform profiles and documentation. |
| docs/overview.md, docs/headers.md, docs/technical.md, docs/user-guide.md, CLAUDE.md | Added/rewrote documentation: overview, headers, technical details, user guide, and summary. |
| test/scripts/logic/README.md, test/scripts/logic/test-model-routing.sh | Added a comprehensive model routing test script and documentation. |
| internal/config/config.go, internal/config/types.go | Added Filename field to config struct and logic to track loaded config file source. |
| internal/core/domain/inference_profile.go, internal/core/domain/profile_config.go, internal/core/domain/model.go, internal/core/domain/routing.go | Introduced model-aware inference profile interface, profile config struct, model registry capability query, and extended request profile metadata. |
| internal/core/ports/stats.go | Extended stats collector interface and types for model and endpoint statistics tracking. |
| internal/adapter/registry/profile/configurable_profile.go, internal/adapter/registry/profile/parsers.go, internal/adapter/registry/profile/loader.go | Added YAML-driven configurable profile implementation, response parsers, and thread-safe profile loader with built-in and custom profile support. |
| internal/adapter/registry/profile/factory.go | Refactored factory to use loader, added reload and interface abstraction, and support for inference profiles. |
| internal/adapter/registry/profile/ollama.go, internal/adapter/registry/profile/lmstudio.go, internal/adapter/registry/profile/openai_compatible.go | Removed hardcoded profile implementations, retaining only response data structures. |
| internal/adapter/registry/profile/configurable_profile_test.go, internal/adapter/registry/profile/configurable_profile_extended_test.go, internal/adapter/registry/profile/profile_test.go, internal/adapter/registry/profile/factory_test.go, internal/adapter/registry/profile/inference_profile_test.go | Added/updated comprehensive tests for configurable profile, resource requirements, concurrency, timeout scaling, and profile factory. |
| internal/adapter/registry/unified_memory_registry.go, internal/adapter/registry/unified_memory_registry_test.go, internal/adapter/registry/unified_memory_registry_benchmark_test.go | Enhanced unified memory registry with healthy endpoint/model capability queries, added unit and benchmark tests. |
| internal/adapter/registry/memory_registry.go | Added stub for model capability query to memory registry. |
| internal/adapter/discovery/http_client_test.go, internal/adapter/discovery/integration_test.go, internal/adapter/discovery/repository.go, internal/adapter/discovery/service_test.go | Refactored tests and repo to use new profile factory with error handling; added mock capability method. |
| internal/adapter/inspector/body_inspector.go, internal/adapter/inspector/body_inspector_test.go, internal/adapter/inspector/chain_benchmark_test.go, internal/adapter/inspector/factory.go, internal/adapter/inspector/factory_test.go, internal/adapter/inspector/path_inspector_test.go | Added body inspector for model/capability extraction, supporting tests and benchmarks, and factory method for creation. |
| internal/adapter/proxy/proxy_olla.go, internal/adapter/proxy/proxy_sherpa.go | Enhanced proxy services: added error handling, model-aware metrics, custom response headers (including model info, request ID, backend type), and response time trailer. |
| internal/adapter/proxy/proxy_headers_test.go, internal/adapter/proxy/proxy_test.go | Added/updated tests for proxy response headers and endpoint creation. |
| internal/adapter/proxy/config.go | Added default value logic to config getters for proxy settings. |
| internal/adapter/stats/collector.go, internal/adapter/stats/model_collector.go, internal/adapter/stats/model_collector_config.go, internal/adapter/stats/percentile_tracker.go, internal/adapter/stats/percentile_tracker_test.go | Refactored stats collector to use xsync counters, added model and endpoint stats, percentile tracking, configs, and comprehensive tests/benchmarks. |
| internal/app/handlers/application.go, internal/app/handlers/handler_proxy.go, internal/app/handlers/handler_proxy_capability_test.go, internal/app/handlers/handler_proxy_model_test.go, internal/app/handlers/handler_stats_models.go, internal/app/handlers/server.go, internal/app/model_routing_integration_test.go | Refactored application and proxy handler for modularity, model/capability-aware routing, detailed stats endpoint, and added comprehensive tests and integration tests. |
| internal/app/services/discovery.go, internal/app/services/http.go, internal/app/services/proxy.go, internal/app/services/security.go, internal/app/services/stats.go | Improved error handling in service initialisation and accessors, replacing panics with error returns. |
| pkg/pool/lite_pool.go | Updated pool constructor to return error on invalid input. |
| internal/adapter/unifier/model_config.go | Updated comment for model unification config struct. |
| internal/adapter/unifier/default_unifier_test.go | Removed stale model cleanup test (now handled internally). |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTPHandler
    participant InspectorChain
    participant ModelRegistry
    participant EndpointRepo
    participant ProxyService
    participant StatsCollector

    Client->>HTTPHandler: Sends LLM request
    HTTPHandler->>InspectorChain: Inspect request (path/body)
    InspectorChain->>HTTPHandler: Extract model/capabilities
    HTTPHandler->>ModelRegistry: Get endpoints by model/capability
    ModelRegistry->>EndpointRepo: Get healthy endpoints
    EndpointRepo-->>ModelRegistry: Return endpoints
    ModelRegistry-->>HTTPHandler: Return compatible endpoints
    HTTPHandler->>ProxyService: Proxy request to endpoint(s)
    ProxyService->>StatsCollector: Record request (model/endpoint stats)
    ProxyService-->>HTTPHandler: Return response
    HTTPHandler-->>Client: Respond with headers (model, endpoint, timing)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

Complexity: Extensive, high-complexity changes spanning configuration, core interfaces, registry, stats, proxy, and handler logic, with major new features, documentation, and comprehensive tests. Review requires attention to interface changes, backward compatibility, error handling, and the new YAML-driven profile system.

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
  • 🧪 Generate unit tests
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch feature/olla-profiles

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

🔭 Outside diff range comments (1)
internal/adapter/stats/collector.go (1)

262-274: Fix potential race in recordRateLimitedIP

This method accesses the map without consistent locking - it reads before taking the lock.

The cutoff calculation should be inside the lock to ensure consistency:

 func (c *Collector) recordRateLimitedIP(clientIP string) {
-	now := time.Now().UnixNano()
-	cutoff := now - int64(time.Hour)
-
 	c.securityMu.Lock()
+	now := time.Now().UnixNano()
+	cutoff := now - int64(time.Hour)
+	
 	c.uniqueRateLimitedIPs[clientIP] = now
 	for ip, ts := range c.uniqueRateLimitedIPs {
 		if ts < cutoff {
 			delete(c.uniqueRateLimitedIPs, ip)
 		}
 	}
 	c.securityMu.Unlock()
 }
🧹 Nitpick comments (45)
internal/adapter/discovery/repository.go (1)

27-31: Consider logging the fallback scenario and handling fallback errors.

The graceful degradation pattern is good, but silently ignoring the error from profile.NewFactory("") on line 30 could mask configuration issues. Consider logging when the fallback occurs and handling potential errors from the fallback creation.

func NewStaticEndpointRepository() *StaticEndpointRepository {
	profileFactory, err := profile.NewFactoryWithDefaults()
	if err != nil {
		// For tests, use empty profile dir to get built-in profiles
-		profileFactory, _ = profile.NewFactory("")
+		var fallbackErr error
+		profileFactory, fallbackErr = profile.NewFactory("") // plain assignment so the outer profileFactory isn't shadowed
+		if fallbackErr != nil {
+			// This should not happen with built-in profiles, but handle gracefully
+			panic(fmt.Sprintf("failed to create profile factory with built-in profiles: %v", fallbackErr))
+		}
+		// Could add logging here if logger was available
	}
	return &StaticEndpointRepository{
		endpoints:      make(map[string]*domain.Endpoint),
		profileFactory: profileFactory,
	}
}
internal/adapter/proxy/config.go (1)

42-46: Confirm zero timeout behaviour is intentional.

Returning 0 for response timeout means no timeout by default. This could lead to hanging connections if not handled properly upstream.

Consider documenting this behaviour and ensuring upstream code handles infinite timeouts appropriately.
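
One hedged way for upstream code to treat the zero value explicitly; this is an illustrative sketch only, and the helper and package names are assumptions rather than Olla's actual proxy code:

```go
package proxyutil // illustrative package name

import (
	"context"
	"time"
)

// applyResponseTimeout wraps ctx with the configured response timeout, treating
// a zero or negative value as "no timeout" rather than an immediate deadline.
func applyResponseTimeout(ctx context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
	if timeout <= 0 {
		return ctx, func() {} // zero means unlimited; leave the parent context untouched
	}
	return context.WithTimeout(ctx, timeout)
}
```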

docs/overview.md (1)

20-24: Add language specification to code block.

The static analysis tool correctly identified that the fenced code block lacks a language specification.

-```
+```text
 Your App → Olla → [Workstation Ollama]
                 → [Laptop LM Studio]
                 → [Cloud API]
config-base/profiles/lmstudio.yaml (1)

1-58: Fix YAML formatting issues.

The static analysis tool identified several formatting issues that should be addressed to maintain code quality standards.

+# LM Studio inference platform profile
+name: lm-studio
+version: "1.0"
+display_name: "LM Studio"
+description: "LM Studio local inference server"
+
+# API compatibility
+api:
+  openai_compatible: true
+  paths:
+    - /v1/models          # 0: health check & models
+    - /v1/chat/completions # 1: chat completions
+    - /v1/completions     # 2: completions
+    - /v1/embeddings      # 3: embeddings
+    - /api/v0/models      # 4: legacy models endpoint
+  model_discovery_path: /api/v0/models
+  health_check_path: /v1/models
+
+# Platform characteristics
+characteristics:
+  timeout: 3m
+  max_concurrent_requests: 1  # LM Studio typically handles one at a time
+  default_priority: 90
+  streaming_support: true
+
+# Detection hints for auto-discovery
+detection:
+  path_indicators:
+    - "/v1/models"
+    - "/api/v0/models"
+  default_ports:
+    - 1234
+
+# Model handling
+models:
+  name_format: "{{.Name}}"
+  capability_patterns:
+    chat:
+      - "*"  # All models support chat in LM Studio
+
+# Request/response handling
+request:
+  model_field_paths:
+    - "model"
+  response_format: "lmstudio"
+  parsing_rules:
+    chat_completions_path: "/v1/chat/completions"
+    completions_path: "/v1/completions"
+    model_field_name: "model"
+    supports_streaming: true
+
-# Path indices for specific functions  
+# Path indices for specific functions
+path_indices:
+  health: 0
+  models: 0
+  chat_completions: 1
+  completions: 2
+  embeddings: 3
+

This addresses:

  1. Ensures proper line endings (will be handled by the file system)
  2. Removes trailing spaces on line 52
  3. Adds missing newline at end of file
CLAUDE.md (2)

56-61: Fix list indentation and compound adjective.

The benchmark test list has incorrect indentation, and "rate limiting" should be hyphenated when used as a compound adjective.

Apply this diff to fix the formatting issues:

-3. **Benchmark Tests**:
-  - Performance of critical paths
-  - Proxy engine comparisons
-  - Connection pooling efficiency
-  - Circuit breaker behavior
+3. **Benchmark Tests**:
+- Performance of critical paths
+- Proxy engine comparisons
+- Connection pooling efficiency
+- Circuit breaker behaviour
 4. **Security Tests**: Validate rate-limiting and size restrictions (see `/test/scripts/security/`)

66-71: Add language specification to code block.

The code block should specify the shell language for proper syntax highlighting.

Apply this diff:

-```
+```bash
 # Run proxy engine tests
 go test -v ./internal/adapter/proxy -run TestAllProxies
 go test -v ./internal/adapter/proxy -run TestSherpa
 go test -v ./internal/adapter/proxy -run TestOlla

default.yaml (1)

75-75: Add missing newline at end of file.

YAML files should end with a newline character as per YAML standards.

Add a newline after the last line:

```diff
 engineering:
-  show_nerdstats: false
+  show_nerdstats: false
+
config-base/profiles/openai.yaml (1)

1-1: Fix newline character issues.

The file has incorrect newline characters and is missing a final newline, which should be corrected for proper YAML formatting.

Ensure the file uses Unix-style line endings (\n) and add a newline at the end of the file.

Also applies to: 61-61

internal/core/domain/profile_config.go (1)

9-12: Consider adding validation tags for capability patterns.

The CapabilityPatterns field uses map[string][]string which is appropriate, but consider adding validation to ensure pattern keys are from a known set of capabilities.

type ProfileConfig struct {
	Models struct {
-		CapabilityPatterns map[string][]string `yaml:"capability_patterns"`
+		CapabilityPatterns map[string][]string `yaml:"capability_patterns" validate:"dive,keys,oneof=vision embeddings code chat streaming"`
		NameFormat         string              `yaml:"name_format"`
	} `yaml:"models"`
docs/user-guide.md (2)

276-282: Add language specification to fenced code block.

The response headers example should specify a language for proper syntax highlighting.

-```
+```http
X-Olla-Endpoint: gaming-pc        # Which backend handled it
X-Olla-Model: llama3.2:3b         # Actual model used
X-Olla-Backend-Type: ollama       # Backend type
X-Olla-Request-ID: req_abc123     # For log correlation
X-Olla-Response-Time: 523ms       # Total time

354-364: Fix grammatical issues in troubleshooting section.

There are some grammatical issues that should be corrected for clarity.

-1. Check endpoints are running: `curl http://endpoint-url/api/tags`
+1. Check that endpoints are running: `curl http://endpoint-url/api/tags`
2. Verify URLs in config.yaml
3. Check Olla logs for health check failures
4. Ensure network connectivity

### "Model not found"
-1. List available models: `curl http://localhost:40114/olla/models`
+1. List available models: `curl http://localhost:40114/olla/models`
2. Verify model is loaded on at least one endpoint
3. Check model name spelling and format

### High latency
1. Check response time header: `X-Olla-Response-Time`
2. Verify primary endpoints are healthy
-3. Consider switching to Olla engine for better performance
+3. Consider switching to the Olla engine for better performance
4. Check connection pool settings
config-base/profiles/ollama.yaml (1)

1-79: Address YAML formatting issues.

The YAML linter has identified several formatting issues that should be corrected for consistency.

-# Ollama inference platform profile
+# Ollama inference platform profile
 name: ollama
 version: "1.0"
 display_name: "Ollama"
 description: "Local Ollama instance for running GGUF models"

 # API compatibility
 api:
   openai_compatible: true
   paths:
     - /                    # 0: health check
     - /api/generate        # 1: text completion
-    - /api/chat            # 2: chat completion  
+    - /api/chat            # 2: chat completion
     - /api/embeddings      # 3: generate embeddings
     - /api/tags            # 4: list local models
     - /api/show            # 5: show model info
     - /v1/models           # 6: OpenAI compat
     - /v1/chat/completions # 7: OpenAI compat
     - /v1/completions      # 8: OpenAI compat
     - /v1/embeddings       # 9: OpenAI compat
   model_discovery_path: /api/tags
   health_check_path: /

 # Platform characteristics
 characteristics:
   timeout: 5m  # Ollama can be slow for large models
   max_concurrent_requests: 10
   default_priority: 100
   streaming_support: true
-  
+
 # Detection hints for auto-discovery
 detection:
   user_agent_patterns:
     - "ollama/"
   headers:
     - "X-ProfileOllama-Version"
   path_indicators:
     - "/"
     - "/api/tags"
   default_ports:
     - 11434

 # Model handling
 models:
   name_format: "{{.Name}}"  # e.g., "llama3:latest"
   capability_patterns:
     vision:
       - "*llava*"
       - "*vision*"
       - "*bakllava*"
     embeddings:
       - "*embed*"
       - "nomic-embed-text"
       - "mxbai-embed-large"
     code:
       - "*code*"
       - "codellama*"
       - "deepseek-coder*"
       - "qwen*coder*"
-      
+
 # Request/response handling
 request:
   model_field_paths:
     - "model"
   response_format: "ollama"
   parsing_rules:
     chat_completions_path: "/api/chat"
     completions_path: "/api/generate"
     generate_path: "/api/generate"
     model_field_name: "model"
     supports_streaming: true

 # Path indices for specific functions
 path_indices:
   health: 0
   completions: 1
   chat_completions: 2
   embeddings: 3
   models: 4
+
internal/adapter/registry/unified_memory_registry_benchmark_test.go (1)

14-14: Consider adding benchmark verification for test setup.

The benchmarks assume createTestUnifiedRegistry() and mockEndpointRepository exist but don't verify their creation succeeded.

Consider adding nil checks after creating test dependencies:

func BenchmarkGetHealthyEndpointsForModel(b *testing.B) {
	ctx := context.Background()
	registry := createTestUnifiedRegistry()
+	if registry == nil {
+		b.Fatal("failed to create test registry")
+	}

	// Create mock endpoint repository
	endpoints := make([]*domain.Endpoint, 100)
	// ... endpoint setup ...

	mockRepo := &mockEndpointRepository{endpoints: endpoints}
+	if mockRepo == nil {
+		b.Fatal("failed to create mock repository")
+	}

Also applies to: 30-30

config-base/profiles/README.md (1)

59-66: Consider improving the wording for better readability.

The static analysis tool suggests that "with success" might be wordy in a couple of places. Consider these more concise alternatives:

- Real-time test results with success/failure indicators
+ Real-time test results showing success/failure status

- Summary statistics with success rate  
+ Summary statistics including success rate
internal/adapter/inspector/body_inspector_test.go (1)

578-607: Mock logger implementation could be simplified.

The mock logger has many empty method implementations. Consider creating a shared mock or using a testing library that provides null implementations to reduce boilerplate code.

You could create a shared mock in a test utilities package:

// internal/testutil/mocks.go
type NoOpStyledLogger struct{}

func (n *NoOpStyledLogger) Debug(msg string, args ...any) {}
func (n *NoOpStyledLogger) Info(msg string, args ...any) {}
// ... other methods

This would eliminate duplicate mock implementations across test files.

internal/app/handlers/handler_proxy_model_test.go (1)

174-259: Consider consolidating mock implementations.

The mock logger and model registry implementations are duplicated across multiple test files. Consider creating shared mocks in a test utilities package to reduce code duplication and improve maintainability.

Create a shared test utilities package:

// internal/testutil/mocks.go
package testutil

// Shared mock implementations that can be reused across test files
type MockStyledLogger struct { /* implementation */ }
type MockModelRegistry struct { /* implementation */ }

This would eliminate duplicate mock code and make tests more maintainable.

docs/technical.md (3)

94-94: Add missing determiner for grammatical correctness.

The phrase "endpoint with fewest active connections" should include "the" for proper grammar.

-**Least Connections**
-- Routes to endpoint with fewest active connections
+**Least Connections**
+- Routes to endpoint with the fewest active connections

7-7: Add language specifications to fenced code blocks for better documentation.

Multiple code blocks lack language specifications, which affects syntax highlighting and accessibility. Consider adding appropriate language identifiers.

For example:

-```
+```text
 ┌─────────────────────────────────────────────────────────┐

And for configuration examples:

-```
+```http
 X-Olla-Endpoint: workstation-ollama

Also applies to: 50-50, 100-100, 109-109, 122-122, 208-208, 253-253, 258-258, 265-265


89-89: Convert emphasis to proper headings for better document structure.

Several sections use emphasis (**text**) instead of proper markdown headings, which affects document hierarchy and navigation.

-**Round Robin**
+#### Round Robin

-**Least Connections**
+#### Least Connections

-**High latency**
+#### High latency

Also applies to: 93-93, 285-285, 290-290, 295-295

test/scripts/logic/test-model-routing.sh (2)

24-24: Remove unused BLUE variable.

The BLUE colour variable is declared but never used in the script, as indicated by shellcheck.

-BLUE='\033[0;34m'

110-110: Consider separating variable declaration and assignment.

Shellcheck recommends separating declaration and assignment to avoid masking return values, which can help with debugging if the command fails.

For better error handling:

-    local model_data=$(echo "$response" | sed '$d')
+    local model_data
+    model_data=$(echo "$response" | sed '$d')

-    local curl_output=$(curl -s -w "\n%{http_code}\n%{time_total}" \
+    local curl_output
+    curl_output=$(curl -s -w "\n%{http_code}\n%{time_total}" \

Also applies to: 155-155

internal/adapter/stats/percentile_tracker.go (2)

28-37: Consider validating sample size upper bound

While the code validates that sampleSize is positive, there's no upper bound check. Very large sample sizes could consume excessive memory.

 func NewReservoirSampler(sampleSize int) *ReservoirSampler {
 	if sampleSize <= 0 {
 		sampleSize = 100 // Default to 100 samples
+	} else if sampleSize > 10000 {
+		sampleSize = 10000 // Cap at reasonable maximum
 	}
 	return &ReservoirSampler{

76-90: Percentile index calculation could be more accurate

The current percentile calculation uses integer division which truncates. Consider using proper rounding for more accurate percentile indices.

 	// Calculate percentile indices
-	p50Idx := len(sorted) * 50 / 100
-	p95Idx := len(sorted) * 95 / 100
-	p99Idx := len(sorted) * 99 / 100
+	p50Idx := (len(sorted)-1) * 50 / 100
+	p95Idx := (len(sorted)-1) * 95 / 100
+	p99Idx := (len(sorted)-1) * 99 / 100
internal/adapter/registry/unified_memory_registry_test.go (1)

241-242: Consider testing function_calling alias

The test uses "function" as the capability but the comment mentions testing "function_calling" capability. According to the code, both should work as aliases.

 		{
 			name:        "Function calling capability",
-			capability:  "function",
+			capability:  "function_calling",
 			expectedIDs: []string{"code-model"},
 		},
internal/adapter/stats/model_collector.go (1)

109-110: Consider bounds checking for consecutive errors

The consecutive errors counter uses atomic operations but doesn't have an upper bound, which could lead to integer overflow in pathological cases.

 	if data, ok := endpointMap.Load(endpoint.Name); ok {
-		atomic.AddInt32(&data.consecutiveErrors, 1)
+		// Cap at reasonable maximum to prevent overflow
+		current := atomic.LoadInt32(&data.consecutiveErrors)
+		if current < 1000000 {
+			atomic.AddInt32(&data.consecutiveErrors, 1)
+		}
 	}
internal/app/handlers/handler_proxy_capability_test.go (5)

51-116: Consider adding edge cases to the mock data

The mock registry setup is comprehensive, but consider adding test cases for:

  • Models with empty capabilities arrays
  • Models with nil SourceEndpoints
  • Models with outdated LastSeen timestamps

This would help ensure the filtering logic handles edge cases gracefully.


130-139: Fix the comment accuracy

The comment states "endpoints 1, 2, 3 have vision models", but according to the mock data:

  • Endpoints 1, 2 have llava:13b (vision)
  • Endpoint 3 has gpt-4-vision (vision)

So the comment is technically correct but could be clearer about which specific models are on each endpoint.

-			expectedCount: 3, // endpoints 1, 2, 3 have vision models
+			expectedCount: 3, // endpoints 1, 2 (llava:13b), 3 (gpt-4-vision) have vision models

282-297: Test name could be more specific

The test verifies that when both model name and capabilities are specified, only endpoints matching both criteria are returned. Consider renaming to better reflect this behavior.

-	t.Run("capability filtering takes precedence over model name", func(t *testing.T) {
+	t.Run("filters by both model name and capabilities when both specified", func(t *testing.T) {

321-326: Add nil check in GetModelsByCapability

The mock implementation should handle nil context gracefully to match real implementations.

 func (m *mockCapabilityModelRegistry) GetModelsByCapability(ctx context.Context, capability string) ([]*domain.UnifiedModel, error) {
+	if ctx == nil {
+		return nil, fmt.Errorf("context cannot be nil")
+	}
 	if models, ok := m.modelsByCapability[capability]; ok {
 		return models, nil
 	}
 	return []*domain.UnifiedModel{}, nil
 }

374-380: Inconsistent return values in mock methods

The methods ModelsToString and ModelsToStrings return empty values, but a more realistic mock might return formatted strings based on the input models.

 func (m *mockFullModelRegistry) ModelsToString(models []*domain.ModelInfo) string {
-	return ""
+	if len(models) == 0 {
+		return ""
+	}
+	names := make([]string, len(models))
+	for i, model := range models {
+		names[i] = model.Name
+	}
+	return strings.Join(names, ", ")
 }

 func (m *mockFullModelRegistry) ModelsToStrings(models []*domain.ModelInfo) []string {
-	return []string{}
+	names := make([]string, len(models))
+	for i, model := range models {
+		if model != nil {
+			names[i] = model.Name
+		}
+	}
+	return names
 }

Don't forget to add the import:

 import (
 	"context"
 	"net/url"
+	"strings"
 	"testing"
 	"time"
internal/core/domain/inference_profile.go (2)

36-38: Consider adding context parameter to capability methods

The methods GetModelCapabilities and IsModelSupported take a registry parameter but not a context. For consistency with other registry operations and to support cancellation/timeouts, consider adding context.

-	GetModelCapabilities(modelName string, registry ModelRegistry) ModelCapabilities
-	IsModelSupported(modelName string, registry ModelRegistry) bool
+	GetModelCapabilities(ctx context.Context, modelName string, registry ModelRegistry) ModelCapabilities
+	IsModelSupported(ctx context.Context, modelName string, registry ModelRegistry) bool

69-76: Consider adding CPU requirements to ResourceRequirements

The struct covers GPU and memory requirements well, but modern inference can also be CPU-bound. Consider adding CPU core requirements.

 type ResourceRequirements struct {
 	MinMemoryGB         float64
 	RecommendedMemoryGB float64
 	RequiresGPU         bool
 	MinGPUMemoryGB      float64
 	EstimatedLoadTimeMS int64
+	MinCPUCores         int
+	RecommendedCPUCores int
 }
internal/adapter/registry/profile/configurable_profile.go (3)

119-130: Validation could be more comprehensive

The validation only checks URL presence and scheme. Consider validating other endpoint properties that might be profile-specific.

 func (p *ConfigurableProfile) ValidateEndpoint(endpoint *domain.Endpoint) error {
 	if endpoint == nil {
 		return fmt.Errorf("%s endpoint cannot be nil", p.config.Name)
 	}
 	if endpoint.URL == nil {
 		return fmt.Errorf("%s endpoint requires URL", p.config.Name)
 	}
 
 	// ollama defaults to http, but we need to be explicit for safety
 	if endpoint.URL.Scheme == "" {
 		return fmt.Errorf("%s endpoint URL must include scheme (http:// or https://)", p.config.Name)
 	}
+
+	// Validate endpoint type matches profile if specified
+	if endpoint.Type != "" && endpoint.Type != p.config.Name {
+		return fmt.Errorf("%s endpoint type mismatch: expected %s, got %s", p.config.Name, p.config.Name, endpoint.Type)
+	}
 
 	return nil
 }

134-143: Consider using configuration for default capabilities

The method returns hardcoded capabilities but could leverage the ProfileConfig's Models.CapabilityPatterns to provide more accurate defaults based on model name patterns.

 func (p *ConfigurableProfile) GetModelCapabilities(modelName string, registry domain.ModelRegistry) domain.ModelCapabilities {
-	// Default capabilities for configurable profiles
-	return domain.ModelCapabilities{
+	caps := domain.ModelCapabilities{
 		ChatCompletion:   true,
 		TextGeneration:   true,
 		StreamingSupport: p.config.Request.ParsingRules.SupportsStreaming,
 		MaxContextLength: 4096,
 		MaxOutputTokens:  2048,
 	}
+
+	// Check capability patterns from config
+	if p.config.Models.CapabilityPatterns != nil {
+		lowerName := strings.ToLower(modelName)
+		for capability, patterns := range p.config.Models.CapabilityPatterns {
+			for _, pattern := range patterns {
+				if matched, _ := filepath.Match(pattern, lowerName); matched {
+					switch capability {
+					case "vision":
+						caps.VisionUnderstanding = true
+					case "embeddings":
+						caps.Embeddings = true
+					case "code":
+						caps.CodeGeneration = true
+					case "function_calling":
+						caps.FunctionCalling = true
+					}
+					break
+				}
+			}
+		}
+	}
+
+	return caps
 }

Add the required import:

 import (
 	"fmt"
+	"path/filepath"
+	"strings"
 	"time"

169-176: TODO comment could reference configuration field

The TODO suggests adding IsLocal to config, which would be useful for routing decisions. Consider implementing this now since it affects routing strategy.

Would you like me to help implement the IsLocal configuration field and update the routing strategy accordingly?

internal/adapter/registry/profile/lmstudio.go (2)

242-284: Model capability detection could be more robust

The capability detection logic is good but could benefit from regex patterns for more accurate matching.

+import (
+	"regexp"
+)
+
+var (
+	embedPattern = regexp.MustCompile(`(?i)(embed|bge-|e5-)`)
+	visionPattern = regexp.MustCompile(`(?i)(vision|llava|bakllava|cogvlm)`)
+	codePattern = regexp.MustCompile(`(?i)(code|starcoder|deepseek-coder)`)
+)

 func (p *LMStudioProfile) GetModelCapabilities(modelName string, registry domain.ModelRegistry) domain.ModelCapabilities {
 	caps := domain.ModelCapabilities{
 		ChatCompletion:   true,
 		TextGeneration:   true,
 		StreamingSupport: true,
 		MaxContextLength: 4096, // Conservative default
 		MaxOutputTokens:  2048,
 	}
 
-	lowerName := strings.ToLower(modelName)
-
 	// Check for embeddings models
-	if strings.Contains(lowerName, "embed") ||
-		strings.Contains(lowerName, "bge-") ||
-		strings.Contains(lowerName, "e5-") {
+	if embedPattern.MatchString(modelName) {
 		caps.Embeddings = true
 		caps.ChatCompletion = false
 		caps.TextGeneration = false
 	}
 
 	// Check for vision models
-	if strings.Contains(lowerName, "vision") ||
-		strings.Contains(lowerName, "llava") ||
-		strings.Contains(lowerName, "cogvlm") {
+	if visionPattern.MatchString(modelName) {
 		caps.VisionUnderstanding = true
 	}
 
 	// Check for code models
-	if strings.Contains(lowerName, "code") ||
-		strings.Contains(lowerName, "starcoder") {
+	if codePattern.MatchString(modelName) {
 		caps.CodeGeneration = true
 	}

276-277: TODO could be addressed with registry context

The TODO mentions using registry to get actual context length. Since the registry is already passed in, consider attempting to fetch this information.

Would you like me to help implement fetching actual model details from the registry when available?

internal/adapter/registry/profile/openai_compatible.go (2)

149-152: Consider making concurrent request limit configurable.

The hardcoded value of 100 concurrent requests might not be suitable for all deployments. Consider making this configurable through the profile configuration.


299-304: Consider rate limit configuration per model tier.

Different model tiers (GPT-4 vs GPT-3.5) often have different rate limits. The current concurrency limits might not accurately reflect provider-specific rate limits.

Consider implementing a more sophisticated rate limiting strategy that accounts for:

  • Provider-specific rate limits
  • Model tier differences
  • Token-based rate limiting (not just request count) - see the sketch below
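
For the last point, a minimal token-bucket sketch that limits by token cost rather than request count; this is purely illustrative, not Olla's limiter, and the capacities and refill rates are made up:

```go
package ratelimit

import (
	"math"
	"sync"
	"time"
)

// tokenBucket allows a request of a given token cost if enough budget has
// accrued; refill is expressed in tokens per second.
type tokenBucket struct {
	mu       sync.Mutex
	capacity float64
	tokens   float64
	refill   float64
	last     time.Time
}

func newTokenBucket(capacity, refillPerSec float64) *tokenBucket {
	return &tokenBucket{capacity: capacity, tokens: capacity, refill: refillPerSec, last: time.Now()}
}

// allow tops the bucket up for the elapsed time, then deducts cost tokens if available.
func (b *tokenBucket) allow(cost float64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens = math.Min(b.capacity, b.tokens+b.refill*now.Sub(b.last).Seconds())
	b.last = now
	if b.tokens < cost {
		return false
	}
	b.tokens -= cost
	return true
}
```

A per-provider map of such buckets (say, one sized for a GPT-4 tier and another for GPT-3.5) could then be keyed by profile name, with the cost taken from the request's estimated token count.
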
internal/adapter/registry/profile/ollama.go (1)

228-230: Consider making timeout configurable.

The 5-minute timeout is reasonable for large model loading, but different deployments might need different values based on their hardware and model sizes.

internal/adapter/registry/profile/parsers.go (1)

146-148: Document the LM Studio quirk.

The code mentions "lmstudio quirk" but doesn't explain why publisher is mapped to ParentModel. Consider adding a more detailed comment.

 		if model.Publisher != nil {
 			details.Publisher = model.Publisher
-			details.ParentModel = model.Publisher // lmstudio quirk
+			// LM Studio uses "publisher" field to indicate the parent model
+			// (e.g., "meta-llama" for Llama models), so we map it to ParentModel
+			details.ParentModel = model.Publisher
 			hasDetails = true
 		}
internal/adapter/inspector/body_inspector.go (2)

19-20: Consider making max body size configurable.

The 1MB limit might be too restrictive for some use cases (e.g., requests with embedded images or large contexts).
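
A small sketch of what a configurable limit might look like; the maxBodyBytes parameter and package name are assumptions for illustration, not existing fields of the body inspector:

```go
package inspectorsketch // illustrative package, not the real inspector

import (
	"fmt"
	"io"
	"net/http"
)

// readLimitedBody reads at most maxBodyBytes of the request body, falling back
// to a 1MB default when the limit is unset, and errors when the body is larger.
func readLimitedBody(r *http.Request, maxBodyBytes int64) ([]byte, error) {
	if maxBodyBytes <= 0 {
		maxBodyBytes = 1 << 20 // default 1MB
	}
	// Read one byte past the limit so "exactly at the limit" and "too big" are distinguishable.
	body, err := io.ReadAll(io.LimitReader(r.Body, maxBodyBytes+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > maxBodyBytes {
		return nil, fmt.Errorf("request body exceeds %d bytes", maxBodyBytes)
	}
	return body, nil
}
```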


309-317: Code keyword detection is fragile.

The keyword-based detection for code generation could produce many false positives. Words like "function" and "class" appear in many non-code contexts.

Consider:

  1. Making the keywords configurable
  2. Using more specific patterns or combinations
  3. Weighting keywords differently based on context (sketched below)
  4. Allowing users to explicitly specify code generation intent
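
A rough sketch combining points 2 and 3: regex patterns with weights and a threshold, so a single common word doesn't trigger a false positive. The patterns, weights, and threshold here are assumptions, not the inspector's actual heuristics:

```go
package inspectorsketch

import "regexp"

// codeHints pairs each pattern with a weight; stronger signals score higher.
var codeHints = []struct {
	re     *regexp.Regexp
	weight int
}{
	{regexp.MustCompile(`(?i)\bwrite (a|some) (function|class|script|test)\b`), 3},
	{regexp.MustCompile("```"), 2}, // a fenced code block in the prompt
	{regexp.MustCompile(`(?i)\b(refactor|unit test|stack trace)\b`), 1},
}

// looksLikeCodeRequest sums the weights of matching hints and compares against
// a threshold, rather than firing on any single keyword.
func looksLikeCodeRequest(prompt string) bool {
	score := 0
	for _, h := range codeHints {
		if h.re.MatchString(prompt) {
			score += h.weight
		}
	}
	return score >= 3 // threshold is arbitrary in this sketch
}
```
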
internal/app/handlers/handler_proxy.go (2)

336-342: Handle GetModelsByCapability errors more gracefully.

Currently, errors are logged but processing continues. This could lead to incorrect filtering if a critical capability check fails.

Consider:

  1. Tracking which capabilities failed to query
  2. Returning an error if all capability queries fail (see the sketch below)
  3. Adding metrics for capability query failures
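
A hedged sketch of the first two points: track failed capability queries and error out only when every query failed. The capabilityQuerier interface below is a local stand-in for illustration, not Olla's domain.ModelRegistry:

```go
package handlersketch

import (
	"context"
	"fmt"
)

// capabilityQuerier is a stand-in for the subset of the registry used here.
type capabilityQuerier interface {
	GetModelsByCapability(ctx context.Context, capability string) ([]string, error)
}

// modelsForCapabilities records which capability queries failed instead of
// silently continuing, and surfaces an error when none of them succeeded.
func modelsForCapabilities(ctx context.Context, reg capabilityQuerier, caps []string) (map[string]bool, error) {
	matched := make(map[string]bool)
	failed := 0
	for _, c := range caps {
		names, err := reg.GetModelsByCapability(ctx, c)
		if err != nil {
			failed++ // a metric could also be incremented here
			continue
		}
		for _, n := range names {
			matched[n] = true
		}
	}
	if len(caps) > 0 && failed == len(caps) {
		return nil, fmt.Errorf("all %d capability queries failed", failed)
	}
	return matched, nil
}
```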

356-359: Document the nil return semantics.

The distinction between nil (no capability support) and empty map (no matches) is important but subtle. Consider adding more detailed documentation.

 	// nil means "don't filter", empty map means "no matches found"
 	if !hasCapabilitySupport {
+		// Return nil to indicate the registry doesn't support capability queries
+		// This is different from returning an empty map, which would mean
+		// "we queried but found no matching models"
 		return nil
 	}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 595b6ad and e5b2425.

📒 Files selected for processing (67)
  • .coderabbit.yaml (1 hunks)
  • .goreleaser.yml (1 hunks)
  • CLAUDE.md (2 hunks)
  • config-base/profiles/README.md (1 hunks)
  • config-base/profiles/lmstudio.yaml (1 hunks)
  • config-base/profiles/ollama.yaml (1 hunks)
  • config-base/profiles/openai.yaml (1 hunks)
  • default.yaml (1 hunks)
  • docs/headers.md (1 hunks)
  • docs/overview.md (1 hunks)
  • docs/technical.md (1 hunks)
  • docs/user-guide.md (1 hunks)
  • internal/adapter/discovery/http_client_test.go (7 hunks)
  • internal/adapter/discovery/integration_test.go (4 hunks)
  • internal/adapter/discovery/repository.go (1 hunks)
  • internal/adapter/discovery/service_test.go (1 hunks)
  • internal/adapter/inspector/body_inspector.go (1 hunks)
  • internal/adapter/inspector/body_inspector_test.go (1 hunks)
  • internal/adapter/inspector/chain_benchmark_test.go (1 hunks)
  • internal/adapter/inspector/factory.go (1 hunks)
  • internal/adapter/inspector/factory_test.go (11 hunks)
  • internal/adapter/inspector/path_inspector_test.go (1 hunks)
  • internal/adapter/proxy/config.go (2 hunks)
  • internal/adapter/proxy/proxy_headers_test.go (1 hunks)
  • internal/adapter/proxy/proxy_olla.go (10 hunks)
  • internal/adapter/proxy/proxy_sherpa.go (9 hunks)
  • internal/adapter/proxy/proxy_test.go (1 hunks)
  • internal/adapter/registry/memory_registry.go (1 hunks)
  • internal/adapter/registry/profile/configurable_profile.go (1 hunks)
  • internal/adapter/registry/profile/factory.go (3 hunks)
  • internal/adapter/registry/profile/factory_test.go (5 hunks)
  • internal/adapter/registry/profile/inference_profile_test.go (1 hunks)
  • internal/adapter/registry/profile/lmstudio.go (3 hunks)
  • internal/adapter/registry/profile/loader.go (1 hunks)
  • internal/adapter/registry/profile/ollama.go (4 hunks)
  • internal/adapter/registry/profile/openai_compatible.go (3 hunks)
  • internal/adapter/registry/profile/parsers.go (1 hunks)
  • internal/adapter/registry/unified_memory_registry.go (2 hunks)
  • internal/adapter/registry/unified_memory_registry_benchmark_test.go (1 hunks)
  • internal/adapter/registry/unified_memory_registry_test.go (1 hunks)
  • internal/adapter/stats/collector.go (9 hunks)
  • internal/adapter/stats/model_collector.go (1 hunks)
  • internal/adapter/stats/model_collector_config.go (1 hunks)
  • internal/adapter/stats/percentile_tracker.go (1 hunks)
  • internal/adapter/stats/percentile_tracker_test.go (1 hunks)
  • internal/adapter/unifier/default_unifier_test.go (0 hunks)
  • internal/adapter/unifier/model_config.go (1 hunks)
  • internal/app/handlers/application.go (3 hunks)
  • internal/app/handlers/handler_proxy.go (1 hunks)
  • internal/app/handlers/handler_proxy_capability_test.go (1 hunks)
  • internal/app/handlers/handler_proxy_model_test.go (1 hunks)
  • internal/app/handlers/handler_stats_models.go (1 hunks)
  • internal/app/handlers/server.go (1 hunks)
  • internal/app/model_routing_integration_test.go (1 hunks)
  • internal/app/services/discovery.go (3 hunks)
  • internal/app/services/http.go (3 hunks)
  • internal/app/services/proxy.go (2 hunks)
  • internal/app/services/security.go (3 hunks)
  • internal/app/services/stats.go (2 hunks)
  • internal/core/domain/inference_profile.go (1 hunks)
  • internal/core/domain/model.go (1 hunks)
  • internal/core/domain/profile_config.go (1 hunks)
  • internal/core/domain/routing.go (1 hunks)
  • internal/core/ports/stats.go (2 hunks)
  • pkg/pool/lite_pool.go (3 hunks)
  • test/scripts/logic/README.md (1 hunks)
  • test/scripts/logic/test-model-routing.sh (1 hunks)
💤 Files with no reviewable changes (1)
  • internal/adapter/unifier/default_unifier_test.go
🧰 Additional context used
🧬 Code Graph Analysis (26)
internal/adapter/inspector/factory.go (1)
internal/adapter/inspector/body_inspector.go (2)
  • BodyInspector (27-31)
  • NewBodyInspector (33-46)
internal/adapter/discovery/service_test.go (1)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
internal/adapter/inspector/factory_test.go (2)
internal/adapter/inspector/factory.go (1)
  • Factory (9-12)
internal/adapter/registry/profile/factory.go (2)
  • Factory (18-21)
  • NewFactoryWithDefaults (37-39)
internal/core/domain/model.go (1)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
internal/app/services/stats.go (1)
internal/core/ports/stats.go (1)
  • StatsCollector (9-25)
internal/adapter/discovery/repository.go (3)
internal/adapter/registry/profile/factory.go (2)
  • NewFactoryWithDefaults (37-39)
  • NewFactory (25-35)
internal/adapter/inspector/factory.go (1)
  • NewFactory (14-19)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/adapter/proxy/config.go (2)
internal/core/constants/context.go (1)
  • ProxyPathPrefix (4-4)
internal/adapter/proxy/proxy_sherpa.go (4)
  • DefaultTimeout (77-77)
  • DefaultKeepAlive (78-78)
  • DefaultReadTimeout (71-71)
  • DefaultStreamBufferSize (72-72)
internal/app/services/security.go (2)
internal/core/ports/security.go (2)
  • SecurityChain (46-48)
  • SecurityMetricsService (71-74)
internal/adapter/security/factory.go (1)
  • Adapters (16-21)
internal/adapter/registry/memory_registry.go (2)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
internal/core/domain/model.go (1)
  • NewModelRegistryError (79-86)
internal/core/domain/routing.go (1)
internal/core/domain/inference_profile.go (2)
  • ModelCapabilities (53-63)
  • ResourceRequirements (69-75)
internal/adapter/registry/unified_memory_registry_benchmark_test.go (3)
internal/core/domain/endpoint.go (3)
  • Endpoint (21-40)
  • StatusHealthy (53-53)
  • StatusUnhealthy (57-57)
internal/core/domain/model.go (1)
  • ModelInfo (26-33)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
internal/app/services/http.go (1)
internal/app/handlers/application.go (1)
  • NewApplication (79-145)
internal/app/services/proxy.go (2)
internal/core/ports/proxy.go (1)
  • ProxyService (14-19)
internal/core/domain/endpoint.go (1)
  • EndpointSelector (104-109)
internal/app/handlers/application.go (1)
internal/adapter/registry/profile/factory.go (2)
  • NewFactoryWithDefaults (37-39)
  • NewFactory (25-35)
internal/adapter/stats/percentile_tracker_test.go (1)
internal/adapter/stats/percentile_tracker.go (3)
  • NewReservoirSampler (29-37)
  • NewSimpleStatsTracker (122-126)
  • PercentileTracker (10-15)
internal/adapter/registry/profile/loader.go (3)
internal/core/domain/profile_config.go (1)
  • ProfileConfig (8-61)
internal/adapter/registry/profile/configurable_profile.go (1)
  • NewConfigurableProfile (19-24)
internal/core/domain/profile.go (3)
  • ProfileOllama (4-4)
  • ProfileLmStudio (5-5)
  • ProfileOpenAICompatible (6-6)
internal/app/services/discovery.go (4)
internal/adapter/registry/profile/factory.go (1)
  • NewFactoryWithDefaults (37-39)
internal/core/domain/model.go (1)
  • ModelRegistry (41-54)
internal/core/domain/endpoint.go (1)
  • EndpointRepository (95-102)
internal/adapter/health/checker.go (1)
  • HTTPHealthChecker (24-31)
internal/adapter/registry/profile/inference_profile_test.go (4)
internal/adapter/registry/profile/ollama.go (1)
  • NewOllamaProfile (86-88)
internal/core/domain/inference_profile.go (1)
  • ModelCapabilities (53-63)
internal/adapter/registry/profile/lmstudio.go (1)
  • NewLMStudioProfile (60-62)
internal/adapter/registry/profile/openai_compatible.go (1)
  • NewOpenAICompatibleProfile (45-47)
internal/adapter/proxy/proxy_sherpa.go (3)
internal/version/version.go (1)
  • Name (13-13)
pkg/format/format.go (2)
  • Latency (65-76)
  • Duration (31-46)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/adapter/registry/profile/factory.go (3)
internal/core/domain/inference_profile.go (1)
  • InferenceProfile (8-48)
internal/adapter/registry/profile/loader.go (2)
  • ProfileLoader (16-20)
  • NewProfileLoader (22-27)
internal/core/domain/profile.go (2)
  • ProfileOpenAICompatible (6-6)
  • PlatformProfile (11-33)
internal/adapter/proxy/proxy_olla.go (2)
pkg/pool/lite_pool.go (1)
  • NewLitePool (44-68)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/adapter/registry/profile/lmstudio.go (3)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/core/domain/profile_config.go (1)
  • ProfileConfig (8-61)
internal/core/domain/inference_profile.go (3)
  • ModelCapabilities (53-63)
  • ResourceRequirements (69-75)
  • RoutingStrategy (80-85)
internal/adapter/stats/collector.go (5)
internal/adapter/stats/model_collector.go (2)
  • ModelCollector (15-23)
  • NewModelCollectorWithConfig (58-70)
internal/logger/styled.go (1)
  • StyledLogger (12-35)
internal/adapter/stats/model_collector_config.go (2)
  • DefaultModelCollectorConfig (34-45)
  • ModelCollectorConfig (6-31)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/core/ports/stats.go (2)
  • ModelStats (49-66)
  • EndpointModelStats (69-77)
internal/adapter/registry/profile/parsers.go (5)
internal/core/domain/model.go (2)
  • ModelInfo (26-33)
  • ModelDetails (11-24)
internal/adapter/registry/profile/ollama.go (2)
  • OllamaResponse (12-14)
  • OllamaModel (16-23)
internal/util/parsing.go (1)
  • ParseTime (8-18)
internal/adapter/registry/profile/lmstudio.go (1)
  • LMStudioResponse (16-19)
internal/adapter/registry/profile/openai_compatible.go (1)
  • OpenAICompatibleResponse (12-15)
internal/app/handlers/handler_proxy.go (9)
internal/core/ports/proxy.go (1)
  • RequestStats (46-62)
internal/logger/styled.go (1)
  • StyledLogger (12-35)
internal/core/domain/routing.go (1)
  • RequestProfile (20-34)
internal/app/handlers/application.go (1)
  • Application (61-76)
internal/util/request.go (3)
  • GenerateRequestID (12-29)
  • GetClientIP (31-58)
  • StripRoutePrefix (67-78)
internal/core/constants/context.go (3)
  • RequestIDKey (5-5)
  • RequestTimeKey (6-6)
  • ProxyPathPrefix (4-4)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/core/domain/inference_profile.go (1)
  • ModelCapabilities (53-63)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
internal/adapter/registry/profile/ollama.go (4)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/core/domain/profile_config.go (1)
  • ProfileConfig (8-61)
internal/core/domain/model.go (1)
  • ModelRegistry (41-54)
internal/core/domain/inference_profile.go (3)
  • ModelCapabilities (53-63)
  • ResourceRequirements (69-75)
  • RoutingStrategy (80-85)
🪛 YAMLlint (1.37.1)
.coderabbit.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 15-15: no new line character at the end of file

(new-line-at-end-of-file)

default.yaml

[error] 75-75: no new line character at the end of file

(new-line-at-end-of-file)

config-base/profiles/lmstudio.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 52-52: trailing spaces

(trailing-spaces)


[error] 58-58: no new line character at the end of file

(new-line-at-end-of-file)

config-base/profiles/openai.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 61-61: no new line character at the end of file

(new-line-at-end-of-file)

config-base/profiles/ollama.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 30-30: trailing spaces

(trailing-spaces)


[error] 60-60: trailing spaces

(trailing-spaces)


[error] 79-79: no new line character at the end of file

(new-line-at-end-of-file)

🪛 markdownlint-cli2 (0.17.2)
docs/overview.md

20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/user-guide.md

276-276: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

CLAUDE.md

57-57: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


58-58: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


59-59: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


60-60: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


66-66: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/technical.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


50-50: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


89-89: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


93-93: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


100-100: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


122-122: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


208-208: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


253-253: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


258-258: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


265-265: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


285-285: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


290-290: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


295-295: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 LanguageTool
test/scripts/logic/README.md

[style] ~59-~59: ‘with success’ might be wordy. Consider a shorter alternative.
Context: ...ipt provides:

  • Real-time test results with success/failure indicators
  • HTTP response cod...

(EN_WORDINESS_PREMIUM_WITH_SUCCESS)


[style] ~66-~66: ‘with success’ might be wordy. Consider a shorter alternative.
Context: ...r non-JSON errors

  • Summary statistics with success rate
  • Common troubleshooting tips for...

(EN_WORDINESS_PREMIUM_WITH_SUCCESS)

docs/user-guide.md

[uncategorized] ~356-~356: Possible missing preposition found.
Context: ...ivity

"Model not found"

  1. List available models: `curl http://localhost:40114/ol...

(AI_HYDRA_LEO_MISSING_OF)


[grammar] ~363-~363: It seems that “to” is missing before the verb.
Context: ...erify primary endpoints are healthy
3. Consider switching to Olla engine for better per...

(MISSING_TO_BETWEEN_BE_AND_VB)

CLAUDE.md

[uncategorized] ~61-~61: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...havior
4. Security Tests: Validate rate limiting and size restrictions (see `/test/scrip...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

docs/technical.md

[grammar] ~94-~94: A determiner may be missing.
Context: ...onnections**

  • Routes to endpoint with fewest active connections
  • Best for varying ...

(THE_SUPERLATIVE)


[uncategorized] ~235-~235: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...``

Security Considerations

Rate Limiting

  • Token bucket algorithm
  • Per-IP and...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Shellcheck (0.10.0)
test/scripts/logic/test-model-routing.sh

[warning] 24-24: BLUE appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 110-110: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 155-155: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (156)
.coderabbit.yaml (1)

2-15: Configuration choices look appropriate for the project.

The Australian English setting and "chill" review profile align well with the project objectives. The disabled features (poem generation, draft reviews) help focus reviews on essential feedback.

internal/adapter/inspector/factory.go (1)

40-42: Well-implemented factory method following established patterns.

The new CreateBodyInspector method maintains consistency with existing factory methods and properly handles error propagation from the NewBodyInspector constructor.

internal/app/handlers/server.go (1)

96-96: Good addition that follows established routing patterns.

The new model statistics route maintains consistency with existing internal endpoints and provides a logical extension to the stats API surface.

internal/adapter/inspector/path_inspector_test.go (1)

261-261: Excellent refactoring that improves test maintainability.

Using the centralised createTestProfileFactory(t) helper function provides better error handling and consistency across test files. This change aligns with similar improvements in other test files.

internal/core/domain/model.go (1)

53-53: Well-designed interface method that enables capability-aware model routing.

The new GetModelsByCapability method provides a clean API for filtering models by their capabilities, supporting the intelligent routing system mentioned in the PR objectives. The method signature follows established patterns with proper context handling and error return.
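
As a rough illustration of the shape this gives callers, here is a self-contained sketch; the parameter and return types are assumptions for illustration, not the PR's actual signatures:

```go
package main

import (
	"context"
	"fmt"
)

// Minimal stand-ins for the domain types mentioned above; the real
// ModelRegistry and UnifiedModel carry far more detail.
type UnifiedModel struct {
	ID           string
	Capabilities []string
}

type ModelRegistry interface {
	// Assumed shape: filter models by capability, honour context
	// cancellation, and surface failures as errors.
	GetModelsByCapability(ctx context.Context, capability string) ([]*UnifiedModel, error)
}

type memoryRegistry struct{ models []*UnifiedModel }

func (r *memoryRegistry) GetModelsByCapability(ctx context.Context, capability string) ([]*UnifiedModel, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	var matches []*UnifiedModel
	for _, m := range r.models {
		for _, c := range m.Capabilities {
			if c == capability {
				matches = append(matches, m)
				break
			}
		}
	}
	return matches, nil
}

func main() {
	reg := &memoryRegistry{models: []*UnifiedModel{
		{ID: "llama3", Capabilities: []string{"chat", "streaming"}},
		{ID: "nomic-embed-text", Capabilities: []string{"embeddings"}},
	}}
	chatModels, _ := reg.GetModelsByCapability(context.Background(), "chat")
	for _, m := range chatModels {
		fmt.Println(m.ID) // llama3
	}
}
```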

.goreleaser.yml (1)

14-14: LGTM! Profile directory setup correctly implemented.

The changes properly support the new inference profile system by ensuring the config/profiles directory is created and populated with profile configurations during the release process. This aligns well with the broader profile system enhancements described in the PR objectives.

Also applies to: 19-19

internal/adapter/discovery/integration_test.go (1)

302-302: LGTM! Consistent test factory refactoring.

The replacement of direct profile.NewFactory() calls with the createTestProfileFactory(t) helper function centralises factory creation and ensures proper error handling in tests. This change maintains consistency across the test suite and aligns with the updated profile factory API.

Also applies to: 361-361, 461-461, 577-577

internal/adapter/discovery/service_test.go (1)

539-542: LGTM! Appropriate mock implementation for capability support.

The new GetModelsByCapability method provides a proper stub implementation for the enhanced model registry interface. The empty return value and clear documentation appropriately indicate that the mock doesn't support capabilities, allowing tests to run without errors whilst maintaining interface compliance.

internal/adapter/registry/memory_registry.go (1)

437-448: LGTM! Well-implemented capability method with graceful degradation.

The GetModelsByCapability method follows established patterns in the codebase with proper context cancellation handling and appropriate error wrapping. The empty return value with clear documentation allows for graceful degradation, which is appropriate for the basic memory registry that doesn't track unified model capabilities.

internal/adapter/registry/profile/factory_test.go (2)

9-17: LGTM! Excellent test helper implementation.

The testFactory helper function properly centralises factory creation for tests with appropriate error handling and uses t.Helper() for correct test failure attribution. This encapsulates the new factory API complexity and ensures consistent initialisation across all tests.


20-20: LGTM! Consistent adoption of test helper.

The consistent replacement of direct NewFactory() calls with the testFactory(t) helper improves test maintainability and ensures uniform error handling across all test cases. This refactoring aligns well with the updated factory API requirements.

Also applies to: 44-44, 64-64, 96-96, 130-130

internal/adapter/unifier/model_config.go (1)

13-16: Well-structured documentation improvement.

The updated comment effectively clarifies the distinction between model unification config (name normalisation) and profile configs (platform behaviour), which aligns well with the broader profile system introduced in this PR.

internal/app/services/stats.go (2)

5-5: Import addition supports error handling enhancement.

The fmt import is correctly added to support the new error formatting in the method below.


55-60: Excellent error handling improvement.

Replacing the panic with a proper error return follows Go best practices and makes the service more robust. This change enables callers to handle an uninitialised collector gracefully rather than experiencing a runtime crash.

internal/adapter/proxy/proxy_test.go (1)

143-152: Enhanced test endpoint data supports profile system.

The addition of URLString and Type fields enriches the test endpoints with metadata that supports the new profile-based routing system. Using domain.ProfileOllama as the default type is a sensible choice for test consistency.

internal/adapter/inspector/factory_test.go (2)

13-13: Consistent use of test helper improves maintainability.

Replacing direct factory creation with the helper function centralises error handling and makes tests more robust.


353-363: Well-implemented test helper function.

The helper function properly follows Go testing conventions with t.Helper() and provides consistent error handling for profile factory creation across all tests. The use of NewFactoryWithDefaults() aligns with the updated factory pattern.

docs/headers.md (1)

1-59: Comprehensive and well-structured header documentation.

This documentation clearly explains all custom response headers with practical examples and use cases. The structure is logical, moving from individual header descriptions to usage scenarios and testing guidance. This enhances system observability and aids in debugging and monitoring efforts.

internal/adapter/proxy/config.go (5)

21-25: LGTM! Robust default value handling.

The implementation correctly returns a sensible default when the field is empty, following the zero-value pattern consistently.
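
A minimal sketch of that zero-value pattern, using illustrative field and constant names rather than the PR's actual config.go contents:

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative stand-ins; only the zero-value-to-default pattern matters.
const defaultConnectionTimeout = 60 * time.Second

type Configuration struct {
	ConnectionTimeout time.Duration
}

// An unset (zero-value) field falls back to a sensible package default.
func (c *Configuration) GetConnectionTimeout() time.Duration {
	if c.ConnectionTimeout <= 0 {
		return defaultConnectionTimeout
	}
	return c.ConnectionTimeout
}

func main() {
	var cfg Configuration // zero value: timeout never set
	fmt.Println(cfg.GetConnectionTimeout()) // 1m0s
}
```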


35-39: DefaultKeepAlive constant confirmed

The DefaultKeepAlive constant is defined in internal/adapter/proxy/proxy_sherpa.go as 60 * time.Second within the same package, so it’s accessible. No further changes required.


28-32: DefaultTimeout constant is defined and accessible

DefaultTimeout is declared in the same internal/adapter/proxy package, so the reference in config.go is valid:

  • Defined in internal/adapter/proxy/proxy_sherpa.go as
    DefaultTimeout = 60 * time.Second
  • Also appears in internal/adapter/discovery/http_client.go, confirming the naming is consistent

No further action required.


49-53: DefaultReadTimeout constant confirmed
The DefaultReadTimeout constant is defined in:

  • internal/app/config.go (300 * time.Second)
  • internal/adapter/proxy/proxy_sherpa.go (60 * time.Second)

No further action required.


56-60: DefaultStreamBufferSize constant is properly defined

The DefaultStreamBufferSize constant is declared in both locations below, so no further action is needed:

  • internal/app/config.go – line 17 (DefaultStreamBufferSize = 8 * 1024 // 8KB)
  • internal/adapter/proxy/proxy_sherpa.go – line 72 (DefaultStreamBufferSize = 8 * 1024)
internal/adapter/discovery/http_client_test.go (2)

227-227: Excellent refactoring for consistent test setup.

The replacement of direct profile.NewFactory() calls with createTestProfileFactory(t) centralises test setup and ensures proper error handling across all tests.

Also applies to: 456-456, 499-499, 551-551, 566-566, 617-617


660-670: Well-implemented test helper function.

The helper function follows Go testing best practices (a minimal sketch follows the list):

  • Uses t.Helper() to mark it as a helper
  • Provides clear error messages
  • Fails fast with t.Fatalf() on errors
  • Uses profile.NewFactoryWithDefaults() for consistency
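
For readers outside the codebase, a self-contained version of such a helper could look like this; the Factory stub stands in for the real profile package:

```go
package profiletest

import "testing"

// Stand-ins so the sketch compiles on its own; in the PR the factory comes
// from internal/adapter/registry/profile.
type Factory struct{}

func NewFactoryWithDefaults() (*Factory, error) {
	return &Factory{}, nil // the real constructor can fail, hence the error return
}

// createTestProfileFactory mirrors the helper described above: mark the
// caller as a test helper, fail fast on error, and hand back a ready factory.
func createTestProfileFactory(t *testing.T) *Factory {
	t.Helper()
	factory, err := NewFactoryWithDefaults()
	if err != nil {
		t.Fatalf("failed to create profile factory: %v", err)
	}
	return factory
}
```
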
pkg/pool/lite_pool.go (3)

44-52: Excellent error handling improvement.

The function now properly validates inputs and returns descriptive errors instead of panicking. The early validation of the constructor result is particularly good practice.


59-62: Good defensive programming with descriptive panic.

While the early validation should prevent this scenario, keeping the panic as a last resort with a descriptive message is good defensive programming.


16-22: Documentation updated appropriately.

The example code has been updated to reflect the new error-returning signature, helping users understand the proper usage pattern.

internal/app/services/http.go (5)

61-66: Robust error handling for stats collector.

The error handling properly wraps the error with context and prevents service startup if the stats collector cannot be obtained.


68-73: Consistent error handling pattern.

Good consistent application of the error handling pattern with proper error wrapping.


75-92: Comprehensive discovery service validation.

The code properly validates all three discovery service dependencies (registry, discovery service, and repository) with appropriate error handling.


94-99: Security chain validation completed.

The security chain validation follows the same robust pattern as other dependencies.


105-119: Application handler creation with proper error handling.

The application handler creation now properly handles errors, which aligns with the broader pattern of replacing panics with error returns throughout the codebase.

internal/app/services/proxy.go (4)

54-58: Excellent error handling improvement.

The addition of proper error handling when retrieving the stats collector significantly improves the robustness of the service initialisation. This change aligns well with the broader pattern of replacing panics with explicit error returns across the codebase.


61-71: Improved dependency resolution with proper error handling.

The enhanced error handling for endpoint repository and discovery service retrieval is well-implemented. The error messages are descriptive and use proper error wrapping, which will aid in debugging service startup issues.


137-142: Method signature change improves error handling.

The updated method signature for GetProxyService is a significant improvement over the previous panic-based approach. Returning an explicit error allows callers to handle uninitialized state gracefully.


145-150: Consistent error handling pattern.

The GetLoadBalancer method follows the same improved error handling pattern as GetProxyService, providing consistency across the service's API. The descriptive error message clearly indicates the issue when the load balancer is not initialised.

internal/adapter/registry/unified_memory_registry.go (3)

245-283: Well-implemented healthy endpoint filtering.

The GetHealthyEndpointsForModel method is well-structured with proper context handling, clear error messages, and efficient filtering logic using a map for O(1) lookups. The method correctly handles edge cases like empty endpoint lists.


285-315: Comprehensive capability matching logic.

The capabilityMatches function provides thorough alias support for different capability types. The implementation correctly handles various naming conventions that different AI platforms might use.


317-339: Efficient capability-based model filtering.

The GetModelsByCapability method implements efficient filtering with proper context cancellation handling. The early return in the Range function optimises performance by avoiding unnecessary iterations once a match is found.

internal/app/services/security.go (5)

5-5: Necessary import addition.

The addition of the fmt package is required for the new error formatting functionality introduced in the updated error handling.


49-54: Consistent error handling improvement.

The error handling for stats collector retrieval matches the pattern established in the proxy service, providing consistent behaviour across services. The error wrapping with descriptive context is well-implemented.


88-93: Improved getter method with proper error handling.

The GetSecurityChain method now provides explicit error handling instead of potential panics, improving the robustness of the security service API.


96-101: Consistent error handling pattern continued.

The GetAdapters method follows the same improved error handling pattern, maintaining consistency across the service's API surface.


104-109: Complete error handling coverage.

The GetMetrics method completes the error handling improvements for all getter methods in the security service, ensuring comprehensive protection against uninitialised state access.

docs/overview.md (1)

1-84: Excellent comprehensive overview documentation.

This documentation provides a clear, well-structured introduction to Olla that effectively communicates its value proposition. The content is well-organised with:

  • Clear problem definition and solution explanation
  • Practical benefits categorised by user type (developers, operations, end users)
  • Realistic use cases that readers can relate to
  • Technical transparency through response header documentation

The writing is accessible and avoids unnecessary jargon whilst maintaining technical accuracy.

config-base/profiles/lmstudio.yaml (2)

20-24: Well-configured platform characteristics.

The platform characteristics are appropriately configured for LM Studio's typical behaviour, particularly the max_concurrent_requests: 1 setting which reflects LM Studio's single-request processing model. The 3-minute timeout and priority of 90 are reasonable defaults.


34-51: Comprehensive model and request handling configuration.

The model handling and request configuration provides thorough support for LM Studio's API structure. The wildcard pattern for chat capabilities ("*") correctly reflects that all models in LM Studio support chat functionality. The parsing rules comprehensively cover the various endpoint types.

internal/core/domain/routing.go (3)

9-18: LGTM! Well-structured enum implementation.

The RequestType enumeration follows Go conventions properly with meaningful values for LLM request classification.


23-33: Excellent extension of RequestProfile for intelligent routing.

The addition of rich metadata fields enables capability-based routing as described in the PR objectives. Using pointers for ModelCapabilities and ResourceRequirements is memory-efficient, allowing nil values when capabilities aren't needed.


36-42: Constructor follows appropriate incremental population pattern.

The constructor initialises essential fields while leaving rich metadata fields for later population by inspection logic. This design supports the request inspection workflow described in the PR.
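
To make the pattern concrete, a simplified sketch of such an enum, profile struct and constructor; field names and values are illustrative, not the PR's actual routing.go definitions:

```go
package main

import "fmt"

type RequestType int

const (
	RequestTypeUnknown RequestType = iota
	RequestTypeChat
	RequestTypeCompletion
	RequestTypeEmbedding
)

type ModelCapabilities struct {
	ChatCompletion   bool
	StreamingSupport bool
}

type RequestProfile struct {
	Type      RequestType
	Path      string
	ModelName string
	// Pointer so "no capability requirements" can be expressed as nil,
	// matching the memory-efficiency point made above.
	Capabilities *ModelCapabilities
}

// NewRequestProfile fills in the essentials; inspection logic populates the
// richer fields later, as described above.
func NewRequestProfile(path string) *RequestProfile {
	return &RequestProfile{Type: RequestTypeUnknown, Path: path}
}

func main() {
	p := NewRequestProfile("/olla/v1/chat/completions")
	fmt.Println(p.ModelName == "", p.Capabilities == nil) // true true
}
```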

CLAUDE.md (1)

3-6: Documentation restructuring improves readability.

The new concise format with clear sections makes the documentation much more accessible compared to the previous verbose version.

default.yaml (1)

1-35: Comprehensive and well-structured default configuration.

The configuration covers all necessary components with sensible defaults. The rate limiting, request size limits, and timeout values are appropriate for production use.

internal/adapter/stats/model_collector_config.go (3)

5-31: Well-designed configuration struct with comprehensive options.

The ModelCollectorConfig provides excellent configurability for statistics collection with clear field documentation and sensible parameter groupings.


33-45: Good default values with helpful comments.

The defaults are well-chosen with explanatory comments about reductions from previous values, showing thoughtful tuning for performance.


47-71: Robust validation with defensive programming.

The Validate method ensures all configuration fields have reasonable values and provides fallbacks, following defensive programming principles.
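
A compact sketch of that fallback-style validation, with assumed field names standing in for the real ModelCollectorConfig:

```go
package main

import (
	"fmt"
	"time"
)

type ModelCollectorConfig struct {
	MaxModels       int
	ReservoirSize   int
	CleanupInterval time.Duration
}

func DefaultModelCollectorConfig() ModelCollectorConfig {
	return ModelCollectorConfig{MaxModels: 100, ReservoirSize: 256, CleanupInterval: 5 * time.Minute}
}

// Validate falls back to defaults for any non-positive field rather than
// failing, mirroring the defensive approach noted above.
func (c *ModelCollectorConfig) Validate() {
	def := DefaultModelCollectorConfig()
	if c.MaxModels <= 0 {
		c.MaxModels = def.MaxModels
	}
	if c.ReservoirSize <= 0 {
		c.ReservoirSize = def.ReservoirSize
	}
	if c.CleanupInterval <= 0 {
		c.CleanupInterval = def.CleanupInterval
	}
}

func main() {
	cfg := ModelCollectorConfig{MaxModels: -1}
	cfg.Validate()
	fmt.Println(cfg.MaxModels, cfg.ReservoirSize, cfg.CleanupInterval) // 100 256 5m0s
}
```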

config-base/profiles/openai.yaml (2)

1-17: Comprehensive OpenAI-compatible profile configuration.

The profile provides excellent coverage of OpenAI API endpoints and characteristics, enabling proper integration with OpenAI-style services.


31-42: Well-defined model capability patterns.

The capability patterns effectively categorise models by their functional capabilities (chat, embeddings, vision) using logical naming patterns.

internal/core/domain/profile_config.go (1)

5-8: Excellent documentation and design approach.

The comment clearly explains the purpose of this YAML-based configuration system, which enables dynamic platform support without code changes. This is a solid architectural decision for extensibility.

docs/user-guide.md (1)

1-53: Comprehensive installation section covers all major platforms.

The installation instructions are clear and cover Docker, binary installation, and building from source. The examples are practical and well-formatted.

config-base/profiles/ollama.yaml (2)

1-6: Well-structured profile header with clear metadata.

The profile metadata is comprehensive and clearly identifies the Ollama platform configuration.


8-22: Comprehensive API path configuration.

The path definitions cover all necessary endpoints with clear comments indicating their purposes. The OpenAI compatibility flag is correctly set.

internal/app/handlers/application.go (4)

89-89: Proper error handling addition to constructor.

The function signature change to return an error is appropriate for robust initialization with proper error propagation.


91-100: Excellent fallback strategy for profile factory creation.

The error handling implements a sensible fallback strategy: first attempt to load from the default profiles directory, then fallback to built-in profiles if that fails. The logging provides appropriate visibility into which path was taken.


106-111: Proper error handling for body inspector creation.

The body inspector creation includes appropriate error handling with early return on failure, following Go error handling best practices.


144-144: All NewApplication call sites properly handle errors

I’ve verified the sole caller in internal/app/services/http.go checks err != nil and returns a wrapped error:

  • internal/app/services/http.go:105–112

No other call sites were found. No changes required.

internal/adapter/registry/unified_memory_registry_benchmark_test.go (4)

12-88: Comprehensive endpoint retrieval benchmarks with realistic scenarios.

The benchmarks test both existing and non-existent model scenarios, plus parallel execution. The mock data setup with varied endpoint health states (2/3 healthy) provides realistic test conditions.


90-147: Well-designed capability benchmarks with diverse test data.

The capability benchmarks test various capability types with 1000 models, providing good scale testing. The parallel benchmark ensures thread safety under concurrent access.


149-198: Effective concurrency benchmarks using xsync primitives.

The concurrent registration and read benchmarks properly test the xsync-based synchronization under high concurrency conditions. The pre-population in the read benchmark ensures meaningful test data.


200-233: Thorough memory usage analysis across different scales.

The memory benchmarks test three different scales (100, 1000, 10000 models) and include forcing stats updates to capture full memory impact. The model data includes realistic size and description fields.

config-base/profiles/README.md (1)

1-194: Excellent comprehensive documentation for the inference profile system.

This README provides thorough coverage of the new YAML-based inference profile system, including practical examples, configuration templates, and troubleshooting guidance. The documentation effectively explains how users can configure, customise, and extend Olla's multi-platform AI model support.

The structure is logical, progressing from overview to practical implementation details, which will help both users and developers understand the system.

test/scripts/logic/README.md (1)

1-85: Well-structured documentation for the model routing test script.

This README effectively documents the test-model-routing.sh script, providing clear explanations of its purpose, features, usage patterns, and error handling. The documentation will help users understand how to validate Olla's model routing capabilities comprehensively.

The progression from basic usage to advanced scenarios and troubleshooting is logical and user-friendly.

internal/adapter/inspector/body_inspector_test.go (2)

18-140: Comprehensive test coverage for model extraction scenarios.

The test suite effectively covers various model extraction scenarios including different JSON formats (OpenAI, Ollama, LM Studio), edge cases (empty fields, missing fields, invalid JSON), and content type handling. The table-driven approach makes the tests maintainable and easy to extend.


244-509: Excellent capability detection test coverage.

The capability detection tests thoroughly validate the inspector's ability to identify model capabilities based on request content, including:

  • Vision understanding (image_url, base64 images)
  • Function calling (tools, functions, tool_choice)
  • Streaming support
  • Embeddings detection
  • Code generation capabilities

The test cases cover realistic scenarios and edge cases, ensuring robust capability detection logic.

internal/adapter/registry/profile/inference_profile_test.go (3)

11-109: Comprehensive Ollama profile testing with good model coverage.

The test suite effectively validates Ollama profile behaviour across different model types (chat, embedding, vision, code models) with appropriate capability mappings and resource requirements. The test cases cover realistic scenarios and validate key profile characteristics.


111-143: Good validation of LMStudio's single-threaded nature.

The tests correctly validate LMStudio's key characteristics:

  • Single concurrent request limitation
  • Model name transformation logic
  • Optimal concurrency behaviour

The model name transformation tests cover various scenarios including already-formatted names.


145-178: Thorough OpenAI compatible profile testing.

The test suite validates OpenAI profile capabilities across different model types with appropriate vision and function calling capabilities. The resource requirements test correctly reflects that cloud APIs don't require local resources.

internal/app/handlers/handler_proxy_model_test.go (2)

20-104: Well-designed endpoint filtering tests with realistic scenarios.

The test suite effectively validates the model-based endpoint filtering logic with three important scenarios:

  1. Filtering by available models
  2. Fallback when model not found
  3. No filtering when model not specified

The mock registry setup realistically simulates model-to-endpoint mappings, ensuring the filtering logic works correctly.


106-172: Excellent integration test for body inspector chain.

The integration test validates that the inspector chain correctly:

  • Extracts models from different request formats (OpenAI, Ollama)
  • Preserves request body readability after inspection
  • Handles requests without model specifications

This ensures the body inspector integrates properly with the path inspector in the chain.

internal/app/model_routing_integration_test.go (3)

19-62: LGTM! Well-structured mock implementations.

The mock components provide good flexibility for testing different routing scenarios.


292-332: LGTM! Clean mock repository implementation.

The mock correctly implements the required interface methods with appropriate filtering logic.


98-106: Fix string conversion and inefficient concatenation.

The current implementation has issues:

  1. Converting i to rune produces control characters, not digits
  2. String concatenation in a loop is inefficient

Apply this fix:

-largeMessages := make([]string, 10)
-for i := range largeMessages {
-    largeMessages[i] = `{"role": "user", "content": "This is message ` + string(rune(i+'0')) + ` in a multi-turn conversation"}`
-}
-largeBody := `{"model": "claude-3-opus", "messages": [` + bytes.NewBufferString(largeMessages[0]).String()
-for i := 1; i < len(largeMessages); i++ {
-    largeBody += "," + largeMessages[i]
-}
+largeMessages := make([]string, 10)
+for i := range largeMessages {
+    largeMessages[i] = fmt.Sprintf(`{"role": "user", "content": "This is message %d in a multi-turn conversation"}`, i)
+}
+largeBody := fmt.Sprintf(`{"model": "claude-3-opus", "messages": [%s]}`, strings.Join(largeMessages, ","))

Also add the required import:

+import (
+    "fmt"
+    "strings"
+    // ... other imports
+)

Likely an incorrect or invalid review comment.

internal/adapter/inspector/chain_benchmark_test.go (1)

14-75: LGTM! Well-structured performance comparison benchmark.

The benchmark effectively measures the overhead of adding body inspection to the chain while validating correct behavior.

internal/app/services/discovery.go (6)

59-63: LGTM! Proper error handling for stats collector.

Good improvement to return errors instead of panicking.


120-123: LGTM! Robust error handling for profile factory creation.

The error handling ensures profile loading failures are properly surfaced.


179-184: LGTM! Safe error handling in GetRegistry.

Returning an error instead of panicking improves service reliability.


187-192: LGTM! Consistent error handling in GetEndpointRepository.

Maintains the same pattern as other getters.


195-200: LGTM! Proper error handling in GetHealthChecker.

Consistent with the improved error handling pattern.


203-208: LGTM! Comprehensive validation in GetDiscoveryService.

Good practice to check both required components before returning the service.

internal/adapter/proxy/proxy_sherpa.go (5)

179-179: LGTM! Consistent context passing for model-aware metrics.

All recordFailure calls correctly pass the context parameter.

Also applies to: 204-204, 221-221, 253-253, 279-279, 331-331


292-318: LGTM! Well-implemented response headers with proper ordering.

Headers are correctly added before upstream headers to prevent override, and the trailer is properly declared for clients that support it.
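
A runnable sketch of that ordering, heavily simplified relative to the actual proxy plumbing: Olla's headers go on first, the response-time trailer is declared before the body, and upstream headers are copied afterwards so they cannot clobber the routing metadata. Header names follow the PR description; the helper name is ours.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

func writeProxyResponse(w http.ResponseWriter, upstream http.Header, endpoint, model string, start time.Time, body []byte) {
	h := w.Header()
	h.Set("X-Olla-Endpoint", endpoint)
	if model != "" {
		h.Set("X-Olla-Model", model)
	}
	h.Set("Trailer", "X-Olla-Response-Time") // must be declared before WriteHeader

	for k, vals := range upstream {
		if len(h.Values(k)) > 0 {
			continue // never let the backend override an Olla header
		}
		for _, v := range vals {
			h.Add(k, v)
		}
	}

	w.WriteHeader(http.StatusOK)
	w.Write(body)
	// Trailers are set on the same header map after the body has been written.
	h.Set("X-Olla-Response-Time", strconv.FormatInt(time.Since(start).Milliseconds(), 10)+"ms")
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		upstream := http.Header{"X-Olla-Endpoint": {"spoofed"}, "Content-Type": {"application/json"}}
		writeProxyResponse(w, upstream, "local-ollama", "llama3", time.Now(), []byte(`{}`))
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	io.ReadAll(resp.Body)
	fmt.Println(resp.Header.Get("X-Olla-Endpoint"))         // local-ollama, not "spoofed"
	fmt.Println(resp.Trailer.Get("X-Olla-Response-Time"))   // e.g. 0ms
}
```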


343-348: LGTM! Proper trailer implementation for response time.

The response time is correctly sent as a trailer after streaming completes.


354-375: LGTM! Clean model-aware metrics recording.

The implementation properly handles both model-specific and generic metrics.


390-407: LGTM! Consistent implementation with success recording.

The failure recording properly mirrors the success recording logic.

internal/adapter/proxy/proxy_olla.go (5)

413-430: LGTM! Consistent with Sherpa proxy implementation.

The model-aware failure recording matches the pattern in proxy_sherpa.


456-456: LGTM! All recordFailure calls correctly updated.

Context parameter consistently added to all failure recording calls.

Also applies to: 472-472, 498-498, 537-537, 567-567, 622-622


583-609: LGTM! Headers implementation matches Sherpa proxy.

Good consistency between proxy implementations with proper header ordering.


635-640: LGTM! Consistent trailer implementation.

Response time trailer properly implemented matching Sherpa proxy.


645-666: LGTM! Model-aware success metrics properly implemented.

Consistent with the failure recording and Sherpa proxy implementation.

docs/technical.md (1)

1-310: Excellent comprehensive technical documentation.

This documentation provides outstanding coverage of the Olla proxy system architecture, including detailed explanations of the dual proxy engines, request flow, and troubleshooting guidance. The inclusion of ASCII diagrams, configuration examples, and performance benchmarks makes it highly valuable for both developers and operators.

internal/adapter/proxy/proxy_headers_test.go (2)

14-103: Well-structured comprehensive header tests.

The test implementation excellently validates the proxy header functionality with good coverage of both proxy engines. The table-driven approach for testing both Sherpa and Olla implementations is appropriate, and the separation of test cases for different scenarios (with/without model context) provides thorough validation.


106-137: Excellent security test for header override prevention.

This test properly validates that upstream services cannot override Olla's custom headers, which is crucial for maintaining the integrity of routing metadata and preventing potential security issues or misleading information.

internal/adapter/stats/percentile_tracker_test.go (3)

7-69: Comprehensive ReservoirSampler test coverage.

The test suite provides excellent coverage of the ReservoirSampler functionality, including proper validation of percentile ordering, edge cases with empty and single-value scenarios, and reset behaviour. The assertions correctly validate the expected statistical properties.


71-132: Thorough SimpleStatsTracker validation.

The tests properly validate all aspects of the SimpleStatsTracker including average calculations, min/max tracking, count accuracy, and reset functionality. The test logic correctly handles the expected behaviour for empty trackers.


134-223: Valuable performance benchmarks for implementation comparison.

The benchmark suite provides excellent insights into the performance characteristics of different statistical tracking approaches. The memory allocation benchmarks particularly demonstrate the efficiency gains of the new reservoir sampling approach over large array-based implementations.
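
For context, a minimal reservoir sampler (Algorithm R) of the kind these tests exercise; the PR's ReservoirSampler adds locking and more statistics than this sketch:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

type reservoir struct {
	capacity int
	seen     int64
	values   []int64
	rng      *rand.Rand
}

func newReservoir(capacity int) *reservoir {
	return &reservoir{capacity: capacity, rng: rand.New(rand.NewSource(1))}
}

func (r *reservoir) Add(v int64) {
	r.seen++
	if len(r.values) < r.capacity {
		r.values = append(r.values, v)
		return
	}
	// Replace an existing sample with probability capacity/seen so every
	// observation has an equal chance of being retained.
	if j := r.rng.Int63n(r.seen); j < int64(r.capacity) {
		r.values[j] = v
	}
}

func (r *reservoir) Percentile(p float64) int64 {
	if len(r.values) == 0 {
		return 0
	}
	sorted := append([]int64(nil), r.values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[int(p*float64(len(sorted)-1))]
}

func main() {
	r := newReservoir(100)
	for i := int64(1); i <= 10000; i++ {
		r.Add(i) // pretend these are latencies in ms
	}
	fmt.Println(r.Percentile(0.50), r.Percentile(0.95), r.Percentile(0.99))
}
```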

test/scripts/logic/test-model-routing.sh (1)

1-420: Excellent comprehensive model routing test script.

This script provides outstanding end-to-end testing capabilities for the Olla model routing functionality. The implementation demonstrates excellent bash scripting practices with proper error handling, comprehensive help documentation, colour-coded output, and thorough validation of routing behaviour through response header parsing. The script effectively tests various model capabilities and provides detailed feedback on routing decisions.

internal/adapter/registry/profile/loader.go (4)

16-27: Well-designed thread-safe ProfileLoader structure.

The ProfileLoader implementation properly uses sync.RWMutex for thread-safe access to the profiles map, and the constructor initialises all necessary fields correctly.


32-68: Robust profile loading with excellent error handling.

The LoadProfiles method demonstrates excellent error handling practices by continuing to load other profiles even when individual files fail to parse. The approach of loading built-ins first and then allowing YAML files to override them provides good flexibility whilst ensuring the system works out of the box.


107-223: Comprehensive built-in profile configurations.

The built-in profiles for Ollama, LM Studio, and OpenAI-compatible APIs are thoroughly configured with appropriate timeouts, concurrency limits, API paths, and detection heuristics. This ensures excellent out-of-the-box functionality for the most common inference platforms.


225-245: Thread-safe profile access methods.

The GetProfile and GetAllProfiles methods properly implement thread-safe access patterns. The GetAllProfiles method correctly returns a copy of the profiles map to prevent external mutation, which is an excellent defensive programming practice.
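
A sketch of that copy-on-read pattern; the type internals are stand-ins so the snippet compiles on its own:

```go
package profile

import "sync"

type ProfileConfig struct{ Name string }

type ProfileLoader struct {
	mu       sync.RWMutex
	profiles map[string]*ProfileConfig
}

// GetAllProfiles hands back a copy of the map so callers cannot mutate the
// loader's internal state, matching the defensive pattern noted above.
func (l *ProfileLoader) GetAllProfiles() map[string]*ProfileConfig {
	l.mu.RLock()
	defer l.mu.RUnlock()
	out := make(map[string]*ProfileConfig, len(l.profiles))
	for name, p := range l.profiles {
		out[name] = p
	}
	return out
}
```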

internal/core/ports/stats.go (3)

15-20: Interface extension looks good

The new model-specific tracking methods follow consistent naming patterns and provide a clear API for model statistics collection.


48-66: Well-structured model statistics types

The ModelStats struct is comprehensive with all necessary metrics for tracking model performance and routing effectiveness. The JSON tags are consistent and the field names are descriptive.


69-77: Clear endpoint-model statistics structure

The EndpointModelStats struct effectively captures the relationship between models and endpoints with appropriate metrics for monitoring performance per endpoint.

internal/adapter/stats/percentile_tracker.go (1)

137-142: Potential issue with negative latency values

The min field is initialised to -1 as a sentinel value, but this could cause issues if legitimate latency values are 0.

 	if st.min == -1 || value < st.min {
 		st.min = value
 	}

Consider using a boolean flag to track if the tracker has been initialised:

 type SimpleStatsTracker struct {
 	count   int64
 	sum     int64
 	min     int64
 	max     int64
 	sumOfSq int64 // For standard deviation calculation if needed
 	mu      sync.Mutex
+	initialized bool
 }

Likely an incorrect or invalid review comment.

internal/adapter/registry/unified_memory_registry_test.go (2)

13-52: Well-structured mock implementation

The mock endpoint repository correctly implements the interface and provides good test isolation.


282-295: Context cancellation tests are concise and effective

Good coverage of context cancellation scenarios for both main methods.

internal/adapter/registry/profile/factory.go (3)

23-35: Good refactoring with explicit configuration

The change to require an explicit profiles directory makes the configuration more testable and explicit. The error handling is appropriate.


47-52: OpenAI-compatible fallback is a sensible default

Using OpenAI-compatible profile as a fallback for unknown platform types is a good approach for compatibility.


84-89: Hot-reload capability is valuable

The ability to reload profiles without restarting is excellent for operational flexibility.

internal/adapter/stats/model_collector.go (4)

72-75: Early return for empty model name is good defensive programming

Protecting against empty model names prevents polluting statistics with invalid data.


291-308: Cleanup synchronisation looks correct

The double-check locking pattern for cleanup is properly implemented to avoid race conditions.


336-375: Client IP cleanup logic is thorough

The two-phase cleanup (by age and by count) ensures memory bounds while preserving recent data. The sorting approach for removing oldest IPs is appropriate.


377-402: Model pruning maintains configured limits

The pruning logic correctly maintains the maximum number of tracked models by removing the least recently used ones.

internal/app/handlers/handler_proxy_capability_test.go (2)

24-48: LGTM! Well-structured test endpoints

The endpoint setup is clear and follows a consistent pattern. Each endpoint has distinct characteristics (Type field) that will be useful for filtering tests.


195-201: Good defensive programming for nil capabilities

Excellent test case for nil capabilities. This ensures the filtering logic doesn't panic when capabilities are not specified.

internal/core/domain/inference_profile.go (2)

5-8: Excellent interface design with clear migration path

Great approach to extend PlatformProfile while maintaining backward compatibility. The comment clearly explains the rationale for embedding rather than replacing.


14-48: Well-documented methods with practical context

The inline documentation is exceptional - each method includes real-world scenarios that explain why it exists. This will greatly help future implementers understand the design decisions.

internal/adapter/registry/profile/configurable_profile.go (2)

10-24: Good abstraction for YAML-based profiles

The struct design cleanly separates configuration from parsing logic. The constructor properly initializes the model parser based on the response format.


45-50: URL normalisation prevents common configuration errors

Good defensive programming to handle trailing slashes. This prevents the common "double slash" issue in URLs.
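
The normalisation being praised amounts to something like the following; the function name is ours, for illustration only:

```go
package main

import (
	"fmt"
	"strings"
)

// joinEndpointURL trims the trailing slash from the base URL before joining
// paths, so "http://host:11434/" and "http://host:11434" behave identically.
func joinEndpointURL(baseURL, path string) string {
	base := strings.TrimSuffix(baseURL, "/")
	if !strings.HasPrefix(path, "/") {
		path = "/" + path
	}
	return base + path
}

func main() {
	fmt.Println(joinEndpointURL("http://localhost:11434/", "/api/tags")) // http://localhost:11434/api/tags
	fmt.Println(joinEndpointURL("http://localhost:11434", "api/tags"))   // http://localhost:11434/api/tags
}
```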

internal/adapter/stats/collector.go (3)

49-66: Excellent concurrency optimisation with xsync.Counter

Great choice to migrate from atomic int64 to xsync.Counter for better performance under high contention. The decision to keep certain fields as atomic int64 for CAS operations (activeConnections, minLatency, maxLatency) is correct.


86-109: Clean constructor pattern with config support

Good design to have a config-aware constructor while maintaining backward compatibility through delegation.


385-401: Well-structured model statistics integration

The model-specific methods cleanly delegate to the modelCollector while maintaining the existing request recording. This provides good separation of concerns.

internal/adapter/registry/profile/lmstudio.go (4)

3-9: Excellent documentation of LM Studio's unique features

Clear explanation of why LM Studio's beta API is valuable for routing decisions. The enhanced metadata about quantization and memory state is indeed useful.


195-203: Critical concurrency limit properly documented

Excellent that the single-threaded nature of LM Studio is clearly documented and enforced. This is crucial for preventing request failures.


315-334: Model size detection is comprehensive

Good coverage of common model size patterns. The logic handles various naming conventions well.


336-345: Resource estimation uses reasonable multipliers

The memory estimation factors (0.6x for min, 0.75x for recommended) are sensible defaults accounting for quantization. Setting RequiresGPU to false is appropriate since LM Studio can run CPU-only.

internal/adapter/registry/profile/openai_compatible.go (2)

5-5: Import looks good.

The addition of the strings package is appropriate for the string manipulation operations in the new methods.


39-39: Helpful comment addition.

The comment clarifies the scope of supported endpoints, which aids maintainability.

internal/adapter/registry/profile/ollama.go (2)

5-5: Import addition approved.

The strings package is necessary for the string operations in the new methods.


49-82: Good architectural decision documented.

The comments clearly explain why model management endpoints are excluded, which helps future maintainers understand the design choice.

internal/adapter/registry/profile/parsers.go (3)

12-12: Good choice of JSON library configuration.

Using jsoniter with standard library compatibility provides performance benefits while maintaining compatibility.


18-30: Parser factory implementation looks good.

The factory pattern with a sensible default (OpenAI format) ensures compatibility with unknown formats.


213-219: Potential nil pointer dereference.

The code checks if model.Created != nil && *model.Created > 0 but then only checks model.Created != nil again inside the if block. The OwnedBy field is checked but never used.

-		// openai is stingy with metadata
-		if (model.Created != nil && *model.Created > 0) || model.OwnedBy != nil {
-			details := &domain.ModelDetails{}
-
-			if model.Created != nil && *model.Created > 0 {
-				createdTime := time.Unix(*model.Created, 0)
-				details.ModifiedAt = &createdTime
-			}
-
-			modelInfo.Details = details
-		}
+		// openai is stingy with metadata
+		if model.Created != nil && *model.Created > 0 {
+			createdTime := time.Unix(*model.Created, 0)
+			modelInfo.Details = &domain.ModelDetails{
+				ModifiedAt: &createdTime,
+			}
+		}

Likely an incorrect or invalid review comment.

internal/adapter/inspector/body_inspector.go (1)

83-83: Good body restoration implementation.

The use of io.MultiReader to combine read and unread content ensures the request body remains available for downstream handlers.
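
A minimal sketch of that restoration technique: read a bounded prefix of the body to inspect it, then stitch the consumed bytes back in front of the unread remainder so downstream handlers still see the full payload. The helper name is ours, not the PR's.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

func peekBody(r *http.Request, limit int64) ([]byte, error) {
	peeked, err := io.ReadAll(io.LimitReader(r.Body, limit))
	if err != nil {
		return nil, err
	}
	// Recombine what we read with whatever is still unread.
	r.Body = struct {
		io.Reader
		io.Closer
	}{io.MultiReader(bytes.NewReader(peeked), r.Body), r.Body}
	return peeked, nil
}

func main() {
	const payload = `{"model":"llama3","messages":[]}`
	req, _ := http.NewRequest(http.MethodPost, "http://example.invalid/olla/v1/chat/completions",
		strings.NewReader(payload))

	head, _ := peekBody(req, 16)
	fmt.Println(string(head)) // {"model":"llama3

	// A downstream handler reading the body still gets the whole payload.
	full, _ := io.ReadAll(req.Body)
	fmt.Println(string(full) == payload) // true
}
```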

internal/app/handlers/handler_proxy.go (4)

16-28: Well-structured request state encapsulation.

The proxyRequest struct effectively encapsulates all request-related state, improving code organisation and reducing parameter passing.


30-52: Excellent refactoring of proxy handler.

The decomposition into focused methods with clear responsibilities greatly improves readability and testability.


196-198: Comprehensive three-stage filtering documentation.

The comment clearly explains the filtering pipeline, making the complex logic easier to understand.


384-412: Excellent capability filtering implementation.

The method handles edge cases well with appropriate logging and fallback behaviour. The distinction between "no capability support" and "no matches" is handled correctly.

internal/app/handlers/handler_stats_models.go (8)

1-15: LGTM! Clean package structure and appropriate imports.

The imports are well-organised and all appear necessary for the handler's functionality.


17-23: LGTM! Safe conversion utility function.

The function correctly handles the edge case of negative values when converting to uint64 for byte formatting.
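
The guard in question amounts to something like this sketch (the helper name is ours): a negative int64 cast straight to uint64 would wrap to an enormous value, so it is clamped to zero first.

```go
package main

import "fmt"

func safeUint64(v int64) uint64 {
	if v < 0 {
		return 0
	}
	return uint64(v)
}

func main() {
	var n int64 = -5
	fmt.Println(safeUint64(n)) // 0
	fmt.Println(uint64(n))     // 18446744073709551611 without the guard
}
```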


25-78: LGTM! Well-designed data structures.

The structs are properly organised with clear JSON tags and a good mix of formatted display strings and raw numerical data. The accumulator pattern is implemented correctly.


109-143: LGTM! Solid implementation of statistics aggregation.

The function correctly implements the accumulator pattern with proper initialisation and handles edge cases well, such as excluding zero-request models from least popular tracking.


145-176: LGTM! Proper data conversion with safety checks.

The function correctly handles division by zero for success rate calculation and uses formatting utilities appropriately for consistent display values.


178-198: LGTM! Clean endpoint breakdown implementation.

The function properly handles the case where a model doesn't exist in the endpoint stats and correctly formats the breakdown data.


200-226: LGTM! Comprehensive summary calculation.

The function correctly calculates active models using a one-hour window and properly handles division by zero for the overall success rate calculation.


228-279: LGTM! Complex aggregation logic implemented correctly.

The function properly aggregates endpoint statistics across models using weighted averages for latency and success rates. The floating-point calculations are appropriate for this statistical use case.
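
A sketch of that weighted aggregation with invented variable names: per-model latency and success rate are weighted by each model's request count so busy models dominate the endpoint-level figures.

```go
package main

import "fmt"

type modelSlice struct {
	requests    int64
	avgLatency  float64 // milliseconds
	successRate float64 // 0..1
}

func aggregate(slices []modelSlice) (avgLatency, successRate float64) {
	var total int64
	var latencySum, successSum float64
	for _, s := range slices {
		total += s.requests
		latencySum += s.avgLatency * float64(s.requests)
		successSum += s.successRate * float64(s.requests)
	}
	if total == 0 {
		return 0, 0
	}
	return latencySum / float64(total), successSum / float64(total)
}

func main() {
	fmt.Println(aggregate([]modelSlice{
		{requests: 900, avgLatency: 120, successRate: 0.99},
		{requests: 100, avgLatency: 800, successRate: 0.90},
	})) // 188 0.981
}
```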

Comment on lines +164 to +169
caps := &domain.ModelCapabilities{
// Default capabilities - most models support these
ChatCompletion: true,
TextGeneration: true,
StreamingSupport: true,
}

🛠️ Refactor suggestion

Default capabilities assumption might be incorrect.

Setting ChatCompletion and TextGeneration to true by default might not be appropriate for all requests (e.g., pure embedding requests).

Consider inferring the default capabilities based on the endpoint path or request structure rather than assuming them.

🤖 Prompt for AI Agents
In internal/adapter/inspector/body_inspector.go around lines 164 to 169, the
code sets default model capabilities ChatCompletion and TextGeneration to true,
which may be incorrect for all request types such as embedding-only requests.
Modify the logic to infer these default capabilities dynamically based on the
endpoint path or the structure of the incoming request instead of hardcoding
them as true. This will ensure capabilities accurately reflect the actual
request context.

Comment on lines +98 to +106
largeMessages := make([]string, 10)
for i := range largeMessages {
largeMessages[i] = `{"role": "user", "content": "This is message ` + string(rune(i+'0')) + ` in a multi-turn conversation"}`
}
largeBody := `{"model": "claude-3-opus", "messages": [` + bytes.NewBufferString(largeMessages[0]).String()
for i := 1; i < len(largeMessages); i++ {
largeBody += "," + largeMessages[i]
}
largeBody += `]}`

⚠️ Potential issue

Fix string conversion and inefficient concatenation.

Same issues as in the integration test file.

Apply this fix:

-largeMessages := make([]string, 10)
-for i := range largeMessages {
-    largeMessages[i] = `{"role": "user", "content": "This is message ` + string(rune(i+'0')) + ` in a multi-turn conversation"}`
-}
-largeBody := `{"model": "claude-3-opus", "messages": [` + bytes.NewBufferString(largeMessages[0]).String()
-for i := 1; i < len(largeMessages); i++ {
-    largeBody += "," + largeMessages[i]
-}
+largeMessages := make([]string, 10)
+for i := range largeMessages {
+    largeMessages[i] = fmt.Sprintf(`{"role": "user", "content": "This is message %d in a multi-turn conversation"}`, i)
+}
+var largeBodyBuilder strings.Builder
+largeBodyBuilder.WriteString(`{"model": "claude-3-opus", "messages": [`)
+largeBodyBuilder.WriteString(strings.Join(largeMessages, ","))
+largeBodyBuilder.WriteString(`]}`)
+largeBody := largeBodyBuilder.String()

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In internal/adapter/inspector/chain_benchmark_test.go around lines 98 to 106,
the code incorrectly converts integers to strings using string(rune(i+'0')) and
inefficiently concatenates strings with += in a loop. Replace
string(rune(i+'0')) with strconv.Itoa(i) for correct integer to string
conversion. Use a strings.Builder to efficiently build the largeBody string by
writing the initial part, then appending each message with commas, and finally
closing the JSON array and object.

Comment on lines +226 to +265
bufPool, err := pool.NewLitePool(func() *[]byte {
// we use the configured buffer size to balance memory usage vs syscall frequency
buf := make([]byte, configuration.StreamBufferSize)
return &buf
})
if err != nil {
logger.Error("Failed to create buffer pool", "error", err)
return nil
}
service.bufferPool = bufPool

reqPool, err := pool.NewLitePool(func() *requestContext {
// pre-allocate request context objects to avoid heap allocations in hot path
return &requestContext{}
})
if err != nil {
logger.Error("Failed to create request pool", "error", err)
return nil
}
service.requestPool = reqPool

respPool, err := pool.NewLitePool(func() []byte {
// 4kb is a good starting size for most response headers
return make([]byte, 4096)
})
if err != nil {
logger.Error("Failed to create response pool", "error", err)
return nil
}
service.responsePool = respPool

errPool, err := pool.NewLitePool(func() *errorContext {
// pooling error contexts helps maintain performance even during error cases
return &errorContext{}
})
if err != nil {
logger.Error("Failed to create error pool", "error", err)
return nil
}
service.errorPool = errPool

🛠️ Refactor suggestion

Consider returning an error instead of nil.

While the error logging is good, returning nil from a constructor can lead to nil pointer panics if callers don't check the return value.

Consider modifying the function signature to return an error:

-func NewOllaService(
+func NewOllaService(
     discoveryService ports.DiscoveryService,
     selector domain.EndpointSelector,
     configuration *OllaConfiguration,
     statsCollector ports.StatsCollector,
     logger logger.StyledLogger,
-) *OllaProxyService {
+) (*OllaProxyService, error) {

Then return the error instead of nil:

 if err != nil {
     logger.Error("Failed to create buffer pool", "error", err)
-    return nil
+    return nil, fmt.Errorf("failed to create buffer pool: %w", err)
 }

This makes error handling more explicit and follows Go best practices.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bufPool, err := pool.NewLitePool(func() *[]byte {
// we use the configured buffer size to balance memory usage vs syscall frequency
buf := make([]byte, configuration.StreamBufferSize)
return &buf
})
if err != nil {
logger.Error("Failed to create buffer pool", "error", err)
return nil
}
service.bufferPool = bufPool
reqPool, err := pool.NewLitePool(func() *requestContext {
// pre-allocate request context objects to avoid heap allocations in hot path
return &requestContext{}
})
if err != nil {
logger.Error("Failed to create request pool", "error", err)
return nil
}
service.requestPool = reqPool
respPool, err := pool.NewLitePool(func() []byte {
// 4kb is a good starting size for most response headers
return make([]byte, 4096)
})
if err != nil {
logger.Error("Failed to create response pool", "error", err)
return nil
}
service.responsePool = respPool
errPool, err := pool.NewLitePool(func() *errorContext {
// pooling error contexts helps maintain performance even during error cases
return &errorContext{}
})
if err != nil {
logger.Error("Failed to create error pool", "error", err)
return nil
}
service.errorPool = errPool
// add fmt to your imports if it’s not already there:
// import "fmt"
func NewOllaService(
discoveryService ports.DiscoveryService,
selector domain.EndpointSelector,
configuration *OllaConfiguration,
statsCollector ports.StatsCollector,
logger logger.StyledLogger,
) (*OllaProxyService, error) {
bufPool, err := pool.NewLitePool(func() *[]byte {
// we use the configured buffer size to balance memory usage vs syscall frequency
buf := make([]byte, configuration.StreamBufferSize)
return &buf
})
if err != nil {
logger.Error("Failed to create buffer pool", "error", err)
return nil, fmt.Errorf("failed to create buffer pool: %w", err)
}
service.bufferPool = bufPool
// (repeat the same pattern for reqPool, respPool, errPool…)
// …
}
🤖 Prompt for AI Agents
In internal/adapter/proxy/proxy_olla.go around lines 226 to 265, the constructor
function currently returns nil on error after logging it, which can cause nil
pointer dereferences if callers do not check the return value. Modify the
function signature to return both the constructed service and an error. Replace
all return nil statements with returning nil and the corresponding error, making
error handling explicit and following Go best practices.

Comment on lines +155 to +162
avg := st.sum / st.count

// Simple approximation: use average for p50, interpolate for p95/p99
p50 = avg
p95 = avg + (st.max-avg)*50/100
p99 = avg + (st.max-avg)*90/100

return p50, p95, p99

🛠️ Refactor suggestion

Percentile approximation may be misleading

The simple approximation for p95 and p99 using linear interpolation between average and max could be very inaccurate, especially with skewed distributions common in latency measurements.

Consider documenting this limitation clearly or using a more sophisticated approximation:

 // GetPercentiles returns approximated percentiles based on min/max/average
 // This is less accurate but uses minimal memory
+// WARNING: These approximations can be significantly inaccurate for skewed distributions
 func (st *SimpleStatsTracker) GetPercentiles() (p50, p95, p99 int64) {
🤖 Prompt for AI Agents
In internal/adapter/stats/percentile_tracker.go around lines 155 to 162, the
current percentile calculation uses a simple linear interpolation between
average and max for p95 and p99, which can be inaccurate for skewed data. To fix
this, add clear comments documenting this limitation of the approximation, and
consider implementing or integrating a more accurate percentile calculation
method that better handles skewed distributions, such as using a histogram or a
more advanced statistical approach.

Comment on lines +114 to +117
if pr.model != "" {
ctx = context.WithValue(ctx, "model", pr.model)
r = r.WithContext(ctx)
}

🛠️ Refactor suggestion

Consider using a typed context key.

Using a string literal as a context key could lead to collisions. Consider using a typed key.

+type contextKey string
+const modelContextKey contextKey = "model"
+
 func (a *Application) executeProxyRequest(ctx context.Context, w http.ResponseWriter, r *http.Request, endpoints []*domain.Endpoint, pr *proxyRequest) error {
 	if pr.model != "" {
-		ctx = context.WithValue(ctx, "model", pr.model)
+		ctx = context.WithValue(ctx, modelContextKey, pr.model)
 		r = r.WithContext(ctx)
 	}
🤖 Prompt for AI Agents
In internal/app/handlers/handler_proxy.go around lines 114 to 117, the context
key "model" is currently a string literal, which risks collisions with other
context keys. Define a new unexported type for the context key and use a
constant of that type as the key when calling context.WithValue. Replace the
string key with this typed key to ensure type safety and avoid collisions.

Comment on lines +80 to +107
func (a *Application) modelStatsHandler(w http.ResponseWriter, r *http.Request) {
statsCollector := a.statsCollector
if statsCollector == nil {
http.Error(w, "Stats collector not initialized", http.StatusServiceUnavailable)
return
}

modelStats := statsCollector.GetModelStats()
modelEndpointStats := statsCollector.GetModelEndpointStats()

includeEndpoints := r.URL.Query().Get("include_endpoints") == queryValueTrue
includeSummary := r.URL.Query().Get("include_summary") == queryValueTrue

models, accumulator := a.buildModelStats(modelStats, modelEndpointStats, includeEndpoints)
summary := a.buildSummary(models, modelStats, accumulator)
endpoints := a.buildEndpointSummary(modelEndpointStats, includeSummary)

response := ModelStatsResponse{
Timestamp: time.Now(),
Models: models,
Endpoints: endpoints,
Summary: summary,
}

w.Header().Set(ContentTypeHeader, ContentTypeJSON)
w.WriteHeader(http.StatusOK)
json.NewEncoder(w).Encode(response)
}

🛠️ Refactor suggestion

Add error handling for JSON encoding.

The handler is well-structured, but it's missing error handling for the JSON encoding operation. If json.NewEncoder(w).Encode(response) fails, it could result in a partial response being sent to the client.

-	w.Header().Set(ContentTypeHeader, ContentTypeJSON)
-	w.WriteHeader(http.StatusOK)
-	json.NewEncoder(w).Encode(response)
+	w.Header().Set(ContentTypeHeader, ContentTypeJSON)
+	if err := json.NewEncoder(w).Encode(response); err != nil {
+		http.Error(w, "Failed to encode response", http.StatusInternalServerError)
+		return
+	}
🤖 Prompt for AI Agents
In internal/app/handlers/handler_stats_models.go around lines 80 to 107, the
JSON encoding of the response lacks error handling. Modify the code to capture
the error returned by json.NewEncoder(w).Encode(response) and handle it
appropriately, such as logging the error and sending an HTTP 500 Internal Server
Error response if encoding fails, to prevent partial or malformed responses
being sent to the client.
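
Worth noting: the fix only reports a clean error because the explicit w.WriteHeader(http.StatusOK) is dropped — once the status line has been written, http.Error can no longer change it. A pattern that makes the ordering impossible to get wrong is to encode into a buffer first; a sketch, with encodeJSON as a hypothetical helper rather than anything that exists in this package.

package handlers

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// encodeJSON marshals the payload before touching the ResponseWriter, so an
// encoding failure can still be reported as a 500 instead of a truncated 200.
func encodeJSON(w http.ResponseWriter, status int, payload interface{}) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(payload); err != nil {
		http.Error(w, "Failed to encode response", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	buf.WriteTo(w) //nolint:errcheck // a failed write means the client went away
}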

Comment on lines +233 to +279
handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Read and parse request body
var requestData map[string]interface{}
if r.Body != nil {
decoder := json.NewDecoder(r.Body)
decoder.Decode(&requestData)
}

// Extract model name
modelName, _ := requestData["model"].(string)

// Get healthy endpoints for model
var filteredEndpoints []*domain.Endpoint
if modelName != "" {
healthyForModel, _ := unifiedRegistry.GetHealthyEndpointsForModel(ctx, modelName, &mockEndpointRepository{endpoints: endpoints})
if len(healthyForModel) > 0 {
// Use model-specific endpoints
filteredEndpoints = healthyForModel
} else {
// Check if model exists at all
allEndpointsForModel, _ := unifiedRegistry.GetEndpointsForModel(ctx, modelName)
if len(allEndpointsForModel) > 0 {
// Model exists but not on healthy endpoints
filteredEndpoints = []*domain.Endpoint{}
} else {
// Model doesn't exist, fallback to all healthy endpoints
filteredEndpoints, _ = discovery.GetHealthyEndpoints(ctx)
}
}
} else {
// No model specified, use all healthy endpoints
filteredEndpoints, _ = discovery.GetHealthyEndpoints(ctx)
}

// Select endpoint
selected, _ := selector.Select(ctx, filteredEndpoints)

if selected == nil {
w.WriteHeader(http.StatusServiceUnavailable)
w.Write([]byte("No available endpoints"))
return
}

// Success - would normally proxy to selected endpoint
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"endpoint": "` + selected.Name + `"}`))
})

🛠️ Refactor suggestion

Consider extracting the routing logic to avoid duplication.

The handler contains what appears to be a reimplementation of the model routing logic. This could lead to tests passing even if the actual production implementation differs.

Consider using the actual proxy handler or extracting this logic into a testable function that both the test and production code can use. This ensures you're testing the real implementation rather than a test-specific duplicate.

🤖 Prompt for AI Agents
In internal/app/model_routing_integration_test.go around lines 233 to 279, the
HTTP handler duplicates the model routing logic instead of reusing the
production implementation, risking divergence between test and production
behavior. Refactor by extracting the routing logic into a shared function that
both the test and production code call, or directly use the actual proxy handler
in the test. This ensures the test validates the real routing logic and avoids
duplication.
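
One shape that extraction could take — a sketch only, with the dependencies passed in as function values because the real registry and discovery interfaces (and the module import path) are not shown here. Errors are propagated rather than ignored, but the fallback rules mirror the test handler above.

package app

import (
	"context"

	"github.com/thushan/olla/internal/core/domain" // import path assumed from the repo layout
)

// modelRoutingDeps groups the lookups the routing decision needs, so the same
// function can be driven by the real registry in production and by fakes in tests.
type modelRoutingDeps struct {
	healthyForModel func(ctx context.Context, model string) ([]*domain.Endpoint, error)
	allForModel     func(ctx context.Context, model string) ([]*domain.Endpoint, error)
	allHealthy      func(ctx context.Context) ([]*domain.Endpoint, error)
}

// endpointsForModel prefers healthy endpoints serving the model, returns none
// when the model exists only on unhealthy endpoints, and otherwise falls back
// to all healthy endpoints.
func endpointsForModel(ctx context.Context, model string, deps modelRoutingDeps) ([]*domain.Endpoint, error) {
	if model == "" {
		return deps.allHealthy(ctx)
	}
	healthy, err := deps.healthyForModel(ctx, model)
	if err != nil {
		return nil, err
	}
	if len(healthy) > 0 {
		return healthy, nil
	}
	known, err := deps.allForModel(ctx, model)
	if err != nil {
		return nil, err
	}
	if len(known) > 0 {
		return []*domain.Endpoint{}, nil
	}
	return deps.allHealthy(ctx)
}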

Comment on lines +46 to +53
// PathIndices allows configuring which paths serve specific purposes
PathIndices struct {
Health int `yaml:"health"`
Models int `yaml:"models"`
Completions int `yaml:"completions"`
ChatCompletions int `yaml:"chat_completions"`
Embeddings int `yaml:"embeddings"`
} `yaml:"path_indices"`

🛠️ Refactor suggestion

Verify PathIndices field types and ranges.

All path indices use int type, which should be validated to ensure they reference valid indices in the API.Paths slice.

Consider adding validation to ensure path indices are within valid ranges:

PathIndices struct {
-	Health          int `yaml:"health"`
-	Models          int `yaml:"models"`
-	Completions     int `yaml:"completions"`
-	ChatCompletions int `yaml:"chat_completions"`
-	Embeddings      int `yaml:"embeddings"`
+	Health          int `yaml:"health" validate:"min=0"`
+	Models          int `yaml:"models" validate:"min=0"`
+	Completions     int `yaml:"completions" validate:"min=0"`
+	ChatCompletions int `yaml:"chat_completions" validate:"min=0"`
+	Embeddings      int `yaml:"embeddings" validate:"min=0"`
} `yaml:"path_indices"`
🤖 Prompt for AI Agents
In internal/core/domain/profile_config.go around lines 46 to 53, the PathIndices
struct uses int fields for indices without validation. Add a validation method
to check that each index is within the valid range of the API.Paths slice
length, ensuring no out-of-bounds access occurs. This method should return an
error if any index is negative or exceeds the slice length, and be called after
loading or modifying the configuration.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
.coderabbit.yaml (1)

1-16: Fix YAML formatting issues to comply with yamllint standards.

The static analysis tool has identified formatting issues with line endings. Please ensure the file uses Unix-style line endings (\n).

internal/core/domain/profile_config.go (1)

57-64: Add validation for PathIndices to prevent out-of-bounds access.

The PathIndices fields reference positions in the API.Paths slice but lack validation to ensure they're within valid bounds. This could lead to runtime panics.

internal/adapter/inspector/body_inspector.go (1)

164-169: Consider inferring default capabilities based on request context.

Setting ChatCompletion and TextGeneration to true by default may not be appropriate for all request types (e.g., embedding-only requests).
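
A minimal way to do that, assuming the inspector can see the request path; the function name and capability strings below are illustrative rather than the ones the codebase uses.

package inspector

import "strings"

// inferDefaultCapabilities derives a baseline capability set from the request
// path instead of always assuming chat plus text generation.
func inferDefaultCapabilities(path string) []string {
	switch {
	case strings.Contains(path, "/embeddings"):
		return []string{"embeddings"}
	case strings.Contains(path, "/chat/completions"):
		return []string{"chat_completion", "text_generation"}
	case strings.Contains(path, "/completions"):
		return []string{"text_generation"}
	default:
		// Unknown route: keep the current permissive default.
		return []string{"chat_completion", "text_generation"}
	}
}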

🧹 Nitpick comments (17)
config/config.yaml (2)

76-76: Add newline at end of file to comply with YAML standards.

The file is missing a newline character at the end, which violates YAML formatting standards as identified by yamllint.

engineering:
- show_nerdstats: false
+ show_nerdstats: false
+

51-58: Consider removing or updating the specific IP address in default config.

The endpoint configuration at 192.168.0.1:11434 appears to be environment-specific and may not be universally applicable for users deploying Olla. Consider either removing this endpoint from the default config or using a more generic placeholder like 192.168.1.100 with appropriate documentation.

-      - url: "http://192.168.0.1:11434"
-        name: "neo-llama"
-        type: "ollama"
-        priority: 100
-        model_url: "/api/tags"
-        health_check_url: "/"
-        check_interval: 5s
-        check_timeout: 2s
+      # - url: "http://192.168.1.100:11434"  # Example remote endpoint
+      #   name: "remote-ollama"
+      #   type: "ollama"
+      #   priority: 100
+      #   model_url: "/api/tags"
+      #   health_check_url: "/"
+      #   check_interval: 5s
+      #   check_timeout: 2s
internal/adapter/registry/profile/configurable_profile_test.go (1)

206-244: Enhance test to demonstrate "first match" behavior

The comment mentions testing that "only the first matching quantization is applied", but the test doesn't demonstrate this behavior. Consider adding a test case with a model name containing multiple quantization patterns.

 	// Test that only the first matching quantization is applied
 	result := profile.GetResourceRequirements("llama-13b-q5_K_M", nil)
 	assert.Equal(t, 6.25, result.MinMemoryGB) // 10 * 0.625
+
+	// Test model with multiple quantization patterns - should use first match
+	result2 := profile.GetResourceRequirements("llama-13b-q4_q5_K_M", nil)
+	assert.Equal(t, 5.0, result2.MinMemoryGB) // 10 * 0.5 (q4 matches first)
docker-compose.yaml (1)

1-42: Fix YAML formatting issues

The Docker Compose configuration is well-structured, but there are formatting issues that should be addressed for consistency.

Add a newline character at the end of the file after line 42 to comply with YAML standards.

config/profiles/lmstudio.yaml (1)

1-129: Fix YAML formatting issues

The LM Studio profile configuration is comprehensive and well-structured, but has formatting issues that should be addressed.

Remove trailing spaces from lines 70, 112, 120, and 125, and add a newline at the end of the file to comply with YAML standards.

config/profiles/openai.yaml (1)

1-97: Fix YAML formatting issues

The static analysis has identified several formatting issues that should be addressed:

  • Wrong newline character on line 1 (should use \n)
  • Trailing spaces on lines 88 and 93
  • Missing newline character at the end of file
-  
+
   # Cloud services handle their own concurrency
   concurrency_limits:
     - min_memory_gb: 0
       max_concurrent: 20  # Reasonable default for cloud APIs
-  
+
   # No load time buffer needed for cloud services
   timeout_scaling:
     base_timeout_seconds: 120  # 2 minutes
-    load_time_buffer: false
+    load_time_buffer: false
+
config/profiles/ollama.yaml (1)

1-169: Fix YAML formatting issues

Multiple formatting issues need to be addressed:

  • Wrong newline character on line 1
  • Trailing spaces on lines 13, 30, 78, 138, 146, 154, and 165
  • Missing newline at end of file

Clean up all trailing spaces and ensure the file ends with a newline character for better compatibility with YAML parsers and version control systems.

internal/adapter/registry/profile/loader.go (2)

56-61: Consider collecting profile loading errors

While continuing after individual profile failures is reasonable, consider collecting these errors for better observability:

+var loadErrors []error

 profile, err := l.loadProfile(path)
 if err != nil {
-    // don't fail everything because of one bad yaml file
-    fmt.Printf("failed to load profile %s: %v\n", path, err)
+    loadErrors = append(loadErrors, fmt.Errorf("failed to load profile %s: %w", path, err))
     return nil
 }

Then return or log the collected errors at the end of the method.
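
For instance, a small helper keeps the decision with the caller; this assumes Go 1.20+ for errors.Join, and the function name is illustrative.

package profile

import (
	"errors"
	"fmt"
)

// joinLoadErrors folds the collected per-profile failures into a single error
// (or nil), so the caller can decide whether to log and continue or abort.
func joinLoadErrors(loadErrors []error) error {
	if len(loadErrors) == 0 {
		return nil
	}
	return fmt.Errorf("%d profile(s) failed to load: %w", len(loadErrors), errors.Join(loadErrors...))
}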


107-289: Consider reducing duplication between built-in profiles and YAML files

The built-in profiles duplicate the configuration from the YAML files. This could lead to maintenance issues if they diverge.

Consider either:

  1. Embedding the YAML files as resources and parsing them at runtime (see the sketch after this list)
  2. Generating the built-in profiles from the YAML files at build time
  3. Adding tests to ensure built-in profiles match their YAML counterparts
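
The first option is a few lines with go:embed. A sketch, assuming the YAML files are copied into a directory under the loader package at build time (go:embed cannot reach outside the package directory) and with import paths guessed from the repository layout rather than taken from it.

package profile

import (
	"embed"
	"fmt"

	"gopkg.in/yaml.v3"

	"github.com/thushan/olla/internal/core/domain" // assumed module path
)

// Ship the same YAML files inside the binary so the built-in defaults can
// never drift from the files on disk.
//
//go:embed builtin/*.yaml
var builtinProfiles embed.FS

func loadBuiltinProfiles() (map[string]*domain.ProfileConfig, error) {
	entries, err := builtinProfiles.ReadDir("builtin")
	if err != nil {
		return nil, err
	}
	profiles := make(map[string]*domain.ProfileConfig, len(entries))
	for _, entry := range entries {
		data, err := builtinProfiles.ReadFile("builtin/" + entry.Name())
		if err != nil {
			return nil, err
		}
		var cfg domain.ProfileConfig
		if err := yaml.Unmarshal(data, &cfg); err != nil {
			return nil, fmt.Errorf("builtin profile %s: %w", entry.Name(), err)
		}
		profiles[cfg.Name] = &cfg
	}
	return profiles, nil
}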
test/scripts/logic/test-model-routing.sh (2)

24-24: Remove unused BLUE variable

The BLUE colour variable is defined but never used in the script.

 GREEN='\033[0;32m'
 YELLOW='\033[1;33m'
-BLUE='\033[0;34m'
 PURPLE='\033[0;35m'
 CYAN='\033[0;36m'

115-115: Separate variable declaration from assignment

To avoid masking return values, declare and assign variables separately:

-local model_data=$(echo "$response" | sed '$d')
+local model_data
+model_data=$(echo "$response" | sed '$d')
-local curl_output=$(curl -s -w "\n%{http_code}\n%{time_total}" \
+local curl_output
+curl_output=$(curl -s -w "\n%{http_code}\n%{time_total}" \
-local SCRIPT_END_TIME=$(date +%s)
+local SCRIPT_END_TIME
+SCRIPT_END_TIME=$(date +%s)

Also applies to: 160-160, 385-385

internal/core/domain/profile_config.go (1)

89-93: Use time.Duration for BaseTimeoutSeconds field.

The BaseTimeoutSeconds int field is converted to time.Duration during use, which could overflow with large values. Using time.Duration directly would be safer and clearer.

type TimeoutScaling struct {
-    BaseTimeoutSeconds int  `yaml:"base_timeout_seconds"`
+    BaseTimeout        time.Duration `yaml:"base_timeout"`
     LoadTimeBuffer     bool `yaml:"load_time_buffer"`
}

This would also make the YAML more readable:

timeout_scaling:
  base_timeout: 30s  # Instead of base_timeout_seconds: 30
readme.md (1)

503-513: Add language specifiers to fenced code blocks.

The directory structure code blocks lack language specifiers, which triggers markdown linting warnings.

-```
+```text
config/
├── config.yaml              # Main configuration (shipped)
...

This applies to both directory structure blocks at lines 503 and 627.

Also applies to: 627-662

internal/adapter/inspector/body_inspector.go (2)

309-318: Code keyword detection may produce false positives.

The current keyword list includes common words that appear in non-programming contexts (e.g., "function" in mathematics, "class" in education). This could incorrectly flag requests as requiring code generation.

Consider making the keywords configurable or using more specific patterns:

// More specific code-related patterns
codePatterns := []string{
    "write code", "generate code", "implement function",
    "create class", "debug the", "syntax error",
    "```", // Code blocks
}

Alternatively, expose this as a configuration option in the profile system.


52-108: Consider tracking inspection failures for monitoring.

While the defensive error handling is appropriate, systematic inspection failures could go unnoticed with only debug logging.

Consider adding metrics to track inspection failures:

if _, err := io.Copy(buffer, limitedReader); err != nil {
    bi.logger.Debug("Failed to read request body", "error", err)
    // Increment a metric counter for monitoring
    // bi.metrics.IncrementInspectionFailures("body_read_error")
    return nil
}

This would help identify issues like consistently malformed requests or I/O problems.

install.sh (1)

263-266: Consider adding curl security options

When downloading from external sources, it's good practice to add additional security options to curl to prevent potential attacks.

Add timeout and max-redirs options:

    if [[ "$show_progress" == true ]]; then
-       curl_opts="-L"
+       curl_opts="-L --max-time 300 --max-redirs 5"
    else
-       curl_opts="-sL"
+       curl_opts="-sL --max-time 300 --max-redirs 5"
    fi
internal/adapter/registry/profile/parsers.go (1)

65-67: Simplify model details creation condition

The condition for creating model details is redundant since createOllamaModelDetails already handles nil checks internally.

Simplify by always calling the function:

-       if ollamaModel.Details != nil || ollamaModel.Digest != nil || ollamaModel.ModifiedAt != nil {
-           modelInfo.Details = createOllamaModelDetails(ollamaModel)
-       }
+       modelInfo.Details = createOllamaModelDetails(ollamaModel)

Then update createOllamaModelDetails to return nil if no details are present:

 func createOllamaModelDetails(ollamaModel OllamaModel) *domain.ModelDetails {
+    // Check if any details are present
+    if ollamaModel.Details == nil && ollamaModel.Digest == nil && ollamaModel.ModifiedAt == nil {
+        return nil
+    }
     details := &domain.ModelDetails{}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5b2425 and 50c9983.

📒 Files selected for processing (28)
  • .coderabbit.yaml (1 hunks)
  • .gitignore (1 hunks)
  • .goreleaser.yml (3 hunks)
  • config/config.yaml (1 hunks)
  • config/profiles/README.md (1 hunks)
  • config/profiles/lmstudio.yaml (1 hunks)
  • config/profiles/ollama.yaml (1 hunks)
  • config/profiles/openai.yaml (1 hunks)
  • docker-compose.yaml (1 hunks)
  • install.sh (2 hunks)
  • internal/adapter/inspector/body_inspector.go (1 hunks)
  • internal/adapter/registry/profile/configurable_profile.go (1 hunks)
  • internal/adapter/registry/profile/configurable_profile_extended_test.go (1 hunks)
  • internal/adapter/registry/profile/configurable_profile_test.go (1 hunks)
  • internal/adapter/registry/profile/inference_profile_test.go (1 hunks)
  • internal/adapter/registry/profile/lmstudio.go (1 hunks)
  • internal/adapter/registry/profile/loader.go (1 hunks)
  • internal/adapter/registry/profile/ollama.go (2 hunks)
  • internal/adapter/registry/profile/openai_compatible.go (1 hunks)
  • internal/adapter/registry/profile/parsers.go (1 hunks)
  • internal/adapter/registry/profile/profile_test.go (18 hunks)
  • internal/config/config.go (2 hunks)
  • internal/config/types.go (1 hunks)
  • internal/core/domain/profile_config.go (1 hunks)
  • main.go (1 hunks)
  • makefile (4 hunks)
  • readme.md (8 hunks)
  • test/scripts/logic/test-model-routing.sh (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • main.go
  • .gitignore
  • config/profiles/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • .goreleaser.yml
  • internal/adapter/registry/profile/inference_profile_test.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/adapter/registry/profile/parsers.go (5)
internal/core/domain/model.go (2)
  • ModelInfo (26-33)
  • ModelDetails (11-24)
internal/adapter/registry/profile/ollama.go (2)
  • OllamaResponse (4-6)
  • OllamaModel (9-16)
internal/util/parsing.go (1)
  • ParseTime (8-18)
internal/adapter/registry/profile/lmstudio.go (1)
  • LMStudioResponse (7-10)
internal/adapter/registry/profile/openai_compatible.go (1)
  • OpenAICompatibleResponse (4-7)
🪛 markdownlint-cli2 (0.17.2)
readme.md

503-503: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


627-627: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 YAMLlint (1.37.1)
.coderabbit.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)

config/config.yaml

[error] 76-76: no new line character at the end of file

(new-line-at-end-of-file)

config/profiles/lmstudio.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 70-70: trailing spaces

(trailing-spaces)


[error] 112-112: trailing spaces

(trailing-spaces)


[error] 120-120: trailing spaces

(trailing-spaces)


[error] 125-125: trailing spaces

(trailing-spaces)


[error] 129-129: no new line character at the end of file

(new-line-at-end-of-file)

config/profiles/ollama.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 30-30: trailing spaces

(trailing-spaces)


[error] 78-78: trailing spaces

(trailing-spaces)


[error] 138-138: trailing spaces

(trailing-spaces)


[error] 146-146: trailing spaces

(trailing-spaces)


[error] 154-154: trailing spaces

(trailing-spaces)


[error] 165-165: trailing spaces

(trailing-spaces)


[error] 169-169: no new line character at the end of file

(new-line-at-end-of-file)

config/profiles/openai.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 88-88: trailing spaces

(trailing-spaces)


[error] 93-93: trailing spaces

(trailing-spaces)


[error] 97-97: no new line character at the end of file

(new-line-at-end-of-file)

docker-compose.yaml

[error] 1-1: wrong new line character: expected \n

(new-lines)


[error] 42-42: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Shellcheck (0.10.0)
test/scripts/logic/test-model-routing.sh

[warning] 24-24: BLUE appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 115-115: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 160-160: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 385-385: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (20)
internal/config/types.go (1)

11-11: LGTM! Clean addition of runtime metadata field.

The Filename field is properly tagged with yaml:"-" to exclude it from serialization, which is correct since this is runtime metadata rather than configuration data. This enhances transparency by tracking which config file was actually loaded.

internal/config/config.go (2)

116-116: Excellent addition of local config override support.

Prepending "config/config.local.yaml" to the config paths enables local configuration overrides, which is a common and useful pattern for development environments.


122-122: Clean implementation of config filename tracking.

The addition of configFilename variable and its assignment to config.Filename provides valuable transparency about which configuration file was actually loaded at runtime. The implementation preserves existing control flow whilst adding this useful metadata.

Also applies to: 130-130, 143-143

internal/adapter/registry/profile/profile_test.go (2)

12-21: Excellent addition of centralised test helper function.

The getTestProfile helper function properly uses t.Helper() and provides consistent error handling across all tests. This centralised approach improves maintainability and aligns well with the new factory-based profile loading architecture.


24-28: Clean refactoring to use factory-based profile creation.

The migration from direct profile instantiation to factory-based creation with proper error handling maintains all existing test logic whilst supporting the new YAML-driven profile system. The consistent pattern across all test functions improves code maintainability.

Also applies to: 67-71, 103-107

config/config.yaml (1)

1-76: Well-structured default configuration with comprehensive coverage.

The configuration file provides sensible defaults across all system components including server settings, proxy configuration, service discovery, and logging. The structure is clear and well-organised, supporting the new YAML-driven configuration approach effectively.

internal/adapter/registry/profile/configurable_profile_test.go (1)

10-204: Well-structured comprehensive test coverage!

The table-driven tests thoroughly cover all key scenarios including quantization multipliers, model patterns, defaults, and edge cases. The test structure is clear and maintainable.

internal/adapter/registry/profile/openai_compatible.go (1)

3-15: Clean refactoring to data structures

Good separation of concerns by extracting the data structures and moving the implementation logic to the configurable profile system. The updated comments are clearer.

internal/adapter/registry/profile/lmstudio.go (1)

3-23: Excellent documentation of LM Studio's enhanced metadata

The comments clearly explain the advantages of LM Studio's beta API for intelligent routing decisions. The data structures are well-defined with appropriate use of optional fields.

config/profiles/lmstudio.yaml (1)

78-129: Well-designed resource management configuration

The resource requirements are thoughtfully configured with consistent multipliers (0.6x for minimum, 0.75x for recommended) and the single concurrency limit correctly reflects LM Studio's architecture. The 1-second load time for preloaded models is appropriate.

config/profiles/openai.yaml (1)

1-97: Well-structured OpenAI profile configuration

The profile configuration is comprehensive and appropriately designed for cloud-based OpenAI-compatible services:

  • Correct resource allocation (0 GB) for cloud services
  • Reasonable timeout settings (2 minutes)
  • Comprehensive model capability patterns
  • Well-defined context window patterns for various GPT models
config/profiles/ollama.yaml (1)

99-169: Excellent resource management configuration

The resource requirements and scaling parameters are well-thought-out:

  • Accurate memory requirements for different model sizes
  • Correct quantization multipliers (e.g., q4 = 0.5 for 4-bit quantization)
  • Sensible concurrency limits that decrease with model size
  • Appropriate timeout scaling with load time buffer for local models
internal/adapter/registry/profile/ollama.go (1)

3-27: Clean data structure definitions

The remaining data structures are well-designed for parsing Ollama API responses:

  • Appropriate use of pointers for optional fields
  • Required Name field is correctly non-pointer
  • Clear JSON tags for serialisation
internal/adapter/registry/profile/loader.go (1)

292-311: Excellent concurrent access implementation

The methods properly handle concurrent access:

  • Appropriate use of RLock for read operations
  • GetAllProfiles correctly returns a defensive copy
test/scripts/logic/test-model-routing.sh (2)

140-304: Excellent error handling and response header processing

The test_model_routing function demonstrates:

  • Comprehensive error handling for various HTTP status codes
  • Detailed extraction and display of X-Olla-* headers
  • Proper cleanup of temporary files
  • Informative error messages for debugging

367-369: Review llama3 function calling exclusion

In test/scripts/logic/test-model-routing.sh (lines 367–369), we only exclude the llama3:latest alias:

if [[ "$model" =~ (gpt-4|claude|mistral|llama3\.|llama3\.2|llama3\.3|magistral) ]] \
   && [[ ! "$model" =~ ^llama3:latest$ ]]; then
  test_model_routing …
fi

However:

  • The include pattern llama3\. will match any sub-version (llama3.1, .2, .3), but the base alias (e.g. llama3:8b) isn’t matched at all.
  • We only rule out llama3:latest, yet other llama3 variants may also lack function-calling support.
  • Future llama3 releases without function-calling could slip through or be overlooked.

Please confirm which llama3 variants actually support function calling and adjust the regex accordingly. For example:

  • Explicitly list supported sub-versions in the include pattern (e.g. llama3\.[123]).
  • Broaden the exclusion to catch any unsupported aliases, e.g. ^llama3:(latest|8b)$, or exclude every llama3 variant with ^llama3(:|\.).* if none of them reliably support function calling.

[test/scripts/logic/test-model-routing.sh:367-369]

makefile (1)

100-100: Well-structured build and Docker workflow improvements!

The modularisation of build targets and addition of Docker workflows provides clear separation between local development and release builds. The deprecation of ready-local with clear alternatives is a good practice.

Also applies to: 113-127, 129-151, 185-187

internal/adapter/registry/profile/configurable_profile_extended_test.go (1)

11-284: Excellent comprehensive test coverage!

The test suite thoroughly covers all aspects of the configurable profile system:

  • Pattern matching for context windows
  • Memory-based concurrency limits
  • Timeout scaling with load buffers
  • Integration of multiple features
  • Quantization effects

The table-driven approach and edge case handling demonstrate good testing practices.

readme.md (1)

70-86: Excellent documentation improvements!

The README updates provide much clearer guidance for:

  • Docker usage with custom configurations
  • Local development setup with proper config management
  • Comprehensive build and release workflows
  • Clear separation between shipped and local configurations

The emphasis on local config files being git-ignored is particularly important for security.

Also applies to: 98-107, 497-600

internal/adapter/inspector/body_inspector.go (1)

81-84: Clever body restoration approach!

The use of io.MultiReader to combine already-read bytes with remaining content elegantly preserves the complete request body for downstream handlers.

Comment on lines +71 to +78
-v|--version)
version="$2"
shift 2
;;
-d|--dir)
install_dir="$2"
shift 2
;;

⚠️ Potential issue

Add validation for required arguments

The script doesn't validate that required arguments are provided for options like -v, -d, and --local. This could cause the script to fail with an unclear error if a user runs something like ./install.sh -v without providing a version.

Add validation after each option that requires an argument:

        -v|--version)
+           if [[ -z "$2" ]] || [[ "$2" == -* ]]; then
+               echo "Error: --version requires an argument"
+               exit 1
+           fi
            version="$2"
            shift 2
            ;;
        -d|--dir)
+           if [[ -z "$2" ]] || [[ "$2" == -* ]]; then
+               echo "Error: --dir requires an argument"
+               exit 1
+           fi
            install_dir="$2"
            shift 2
            ;;
🤖 Prompt for AI Agents
In install.sh around lines 71 to 78, the script assigns values to variables from
options like -v and -d without checking if the required argument is provided,
which can cause unclear errors. Add validation after each option that requires
an argument to check if the next parameter exists and is not another option; if
missing, print a clear error message and exit the script to prevent failure
later.

Comment on lines +284 to +299
// Use defaults if no pattern matched
if baseReqs == nil {
baseReqs = &p.config.Resources.Defaults
}

// Apply quantization multipliers if configured
if p.config.Resources.Quantization.Multipliers != nil {
for quantType, multiplier := range p.config.Resources.Quantization.Multipliers {
if strings.Contains(lowerName, quantType) {
baseReqs.MinMemoryGB *= multiplier
baseReqs.RecommendedMemoryGB *= multiplier
baseReqs.MinGPUMemoryGB *= multiplier
break
}
}
}

⚠️ Potential issue

Critical: Avoid modifying config defaults

When no pattern matches, baseReqs points directly to p.config.Resources.Defaults. The quantization multiplier logic then modifies these defaults in-place, affecting all future calls.

Create a copy of the defaults:

     // Use defaults if no pattern matched
     if baseReqs == nil {
-        baseReqs = &p.config.Resources.Defaults
+        baseReqs = &domain.ResourceRequirements{
+            MinMemoryGB:         p.config.Resources.Defaults.MinMemoryGB,
+            RecommendedMemoryGB: p.config.Resources.Defaults.RecommendedMemoryGB,
+            MinGPUMemoryGB:      p.config.Resources.Defaults.MinGPUMemoryGB,
+            RequiresGPU:         p.config.Resources.Defaults.RequiresGPU,
+            EstimatedLoadTimeMS: p.config.Resources.Defaults.EstimatedLoadTimeMS,
+        }
     }
🤖 Prompt for AI Agents
In internal/adapter/registry/profile/configurable_profile.go around lines 284 to
299, the code assigns baseReqs directly to p.config.Resources.Defaults when no
pattern matches, then modifies baseReqs in-place with quantization multipliers,
unintentionally altering the default config. To fix this, create a deep copy of
p.config.Resources.Defaults before applying any multipliers so that the original
defaults remain unchanged for future calls.

Comment on lines +8 to +72
type ProfileConfig struct {
Models struct {
CapabilityPatterns map[string][]string `yaml:"capability_patterns"`
NameFormat string `yaml:"name_format"`
ContextPatterns []ContextPattern `yaml:"context_patterns"`
} `yaml:"models"`

Name string `yaml:"name"`
Version string `yaml:"version"`
DisplayName string `yaml:"display_name"`
Description string `yaml:"description"`

Detection struct {
Headers []string `yaml:"headers"`
UserAgentPatterns []string `yaml:"user_agent_patterns"`
ResponsePatterns []string `yaml:"response_patterns"`
PathIndicators []string `yaml:"path_indicators"`
DefaultPorts []int `yaml:"default_ports"`
} `yaml:"detection"`

Request struct {
ModelFieldPaths []string `yaml:"model_field_paths"`
ResponseFormat string `yaml:"response_format"`
ParsingRules struct {
ChatCompletionsPath string `yaml:"chat_completions_path"`
CompletionsPath string `yaml:"completions_path"`
GeneratePath string `yaml:"generate_path"`
ModelFieldName string `yaml:"model_field_name"`
SupportsStreaming bool `yaml:"supports_streaming"`
} `yaml:"parsing_rules"`
} `yaml:"request"`

API struct {
ModelDiscoveryPath string `yaml:"model_discovery_path"`
HealthCheckPath string `yaml:"health_check_path"`
Paths []string `yaml:"paths"`
OpenAICompatible bool `yaml:"openai_compatible"`
} `yaml:"api"`

Resources struct {
Quantization struct {
Multipliers map[string]float64 `yaml:"multipliers"`
} `yaml:"quantization"`
ModelSizes []ModelSizePattern `yaml:"model_sizes"`
ConcurrencyLimits []ConcurrencyLimitPattern `yaml:"concurrency_limits"`
Defaults ResourceRequirements `yaml:"defaults"`
TimeoutScaling TimeoutScaling `yaml:"timeout_scaling"`
} `yaml:"resources"`

// PathIndices allows configuring which paths serve specific purposes
PathIndices struct {
Health int `yaml:"health"`
Models int `yaml:"models"`
Completions int `yaml:"completions"`
ChatCompletions int `yaml:"chat_completions"`
Embeddings int `yaml:"embeddings"`
} `yaml:"path_indices"`

Characteristics struct {
Timeout time.Duration `yaml:"timeout"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests"`
DefaultPriority int `yaml:"default_priority"`
StreamingSupport bool `yaml:"streaming_support"`
} `yaml:"characteristics"`
}

🛠️ Refactor suggestion

Consider adding comprehensive validation for ProfileConfig fields.

Several fields could benefit from validation to prevent runtime issues:

  1. Port validation: DefaultPorts should only contain valid port numbers (1-65535)
  2. Time fields: Consider using time.Duration directly instead of int seconds to avoid overflow
  3. Resource values: MinMemoryGB, RecommendedMemoryGB should be non-negative
  4. Pattern validation: Empty strings in pattern slices could cause issues

Consider implementing a Validate() method on ProfileConfig:

func (pc *ProfileConfig) Validate() error {
    // Validate ports
    for _, port := range pc.Detection.DefaultPorts {
        if port < 1 || port > 65535 {
            return fmt.Errorf("invalid port number: %d", port)
        }
    }
    
    // Validate path indices
    for name, idx := range map[string]int{
        "health": pc.PathIndices.Health,
        "models": pc.PathIndices.Models,
        // ... other indices
    } {
        if idx < 0 || idx >= len(pc.API.Paths) {
            return fmt.Errorf("path index %s out of bounds: %d", name, idx)
        }
    }
    
    // Validate resource requirements
    for _, ms := range pc.Resources.ModelSizes {
        if ms.MinMemoryGB < 0 || ms.RecommendedMemoryGB < 0 {
            return fmt.Errorf("negative memory requirement in model size pattern")
        }
    }
    
    return nil
}
🤖 Prompt for AI Agents
In internal/core/domain/profile_config.go from lines 8 to 72, add a Validate()
method to the ProfileConfig struct that checks key fields for correctness.
Implement validation to ensure DefaultPorts contain only valid port numbers
between 1 and 65535, verify that PathIndices values are within bounds of the
API.Paths slice, and confirm that memory-related fields like MinMemoryGB and
RecommendedMemoryGB in ModelSizes are non-negative. Also, check that pattern
slices do not contain empty strings. Return descriptive errors for any invalid
values found to prevent runtime issues.

…lers from using it - but we hold onto it in case it comes back online

Make unification settings based including intervals

Update tests

Fix linting

Simplify complex method

Fix tests

add new unification tests

update docs

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/adapter/registry/unified_memory_registry.go (1)

193-197: Explicitly handle errors in the unified model lookup.

Ignoring the error returned by GetUnifiedModel treats context cancellation and any upstream failures as "model not found", which can mask legitimate issues.

Apply this diff to properly handle errors:

-	unified, _ := r.GetUnifiedModel(ctx, modelName)
-	if unified == nil {
-		// Model not found in unified registry, return empty array
-		return []string{}, nil
+	unified, err := r.GetUnifiedModel(ctx, modelName)
+	if err != nil {
+		return nil, fmt.Errorf("failed to retrieve unified model %q: %w", modelName, err)
+	}
+	if unified == nil {
+		// Model not found in unified registry, return empty array
+		return []string{}, nil
🧹 Nitpick comments (3)
internal/app/handlers/handler_unified_models_test.go (1)

108-110: Address the TODO: Replace time-based wait with proper synchronization.

Using time.After for synchronization in tests is fragile and can lead to flaky tests. Consider implementing a synchronization mechanism such as:

  • A callback or channel that signals when unification is complete
  • A polling mechanism with a timeout that checks for completion (sketched below)
  • Exposing a method to wait for pending operations

Would you like me to open an issue to track this technical debt or help implement a proper synchronization mechanism?
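
A minimal polling helper along the lines of the second option — a hypothetical test utility, not something the handlers package currently provides:

package handlers

import (
	"testing"
	"time"
)

// waitFor polls cond until it returns true or the timeout elapses, replacing
// the fixed time.After sleep with a bounded, non-flaky wait.
func waitFor(t *testing.T, timeout time.Duration, cond func() bool) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
	t.Fatalf("condition not met within %s", timeout)
}

The test would then call waitFor(t, 2*time.Second, func() bool { /* check that unification has produced the expected models */ }) in place of the timed wait.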

internal/app/handlers/handler_unified_models.go (2)

150-165: Use consistent constants for boolean string values.

The code uses queryValueTrue constant for "true" but hardcodes "false". Consider defining a queryValueFalse constant for consistency.

 	// Parse include_unavailable parameter
 	includeUnavailable := false
 	if includeStr := qry.Get("include_unavailable"); includeStr == queryValueTrue {
 		includeUnavailable = true
 	}

 	// Parse available filter
 	if availStr := qry.Get("available"); availStr != "" {
 		switch availStr {
 		case queryValueTrue:
 			avail := true
 			filters.Available = &avail
-		case "false":
+		case queryValueFalse:
 			avail := false
 			filters.Available = &avail
 		default:
 			return filters, false, errors.New("invalid value for 'available' parameter. Use 'true' or 'false'")
 		}
 	}

Also add the constant definition:

const (
    queryValueTrue  = "true"
    queryValueFalse = "false"
)

184-214: Efficient implementation with good algorithmic complexity.

The method correctly filters models based on endpoint health status using an efficient map lookup approach.

Consider pre-allocating the slice capacity for a minor performance improvement:

-	filteredModels := make([]*domain.UnifiedModel, 0)
+	filteredModels := make([]*domain.UnifiedModel, 0, len(models))
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50c9983 and 3bc40e0.

📒 Files selected for processing (13)
  • config/config.yaml (3 hunks)
  • default.yaml (1 hunks)
  • docs/api/query-formats.md (1 hunks)
  • internal/adapter/registry/factory.go (1 hunks)
  • internal/adapter/registry/unified_memory_registry.go (4 hunks)
  • internal/adapter/registry/unified_memory_registry_test.go (1 hunks)
  • internal/adapter/unifier/integration_test.go (3 hunks)
  • internal/app/handlers/handler_status_models.go (1 hunks)
  • internal/app/handlers/handler_unified_models.go (5 hunks)
  • internal/app/handlers/handler_unified_models_test.go (1 hunks)
  • internal/app/model_routing_integration_test.go (1 hunks)
  • internal/app/services/discovery.go (4 hunks)
  • internal/config/types.go (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • internal/adapter/unifier/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/config/types.go
  • config/config.yaml
  • internal/app/model_routing_integration_test.go
  • internal/app/services/discovery.go
  • internal/adapter/registry/unified_memory_registry_test.go
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/adapter/registry/factory.go (5)
internal/config/types.go (1)
  • UnificationConfig (123-129)
internal/logger/styled.go (1)
  • StyledLogger (12-35)
internal/core/domain/model.go (1)
  • ModelRegistry (41-54)
internal/adapter/registry/unified_memory_registry.go (1)
  • NewUnifiedMemoryModelRegistry (28-53)
internal/adapter/registry/memory_registry.go (1)
  • NewMemoryModelRegistry (25-38)
internal/app/handlers/handler_unified_models.go (5)
internal/app/handlers/server.go (2)
  • ContentTypeHeader (15-15)
  • ContentTypeJSON (13-13)
internal/app/handlers/application.go (1)
  • Application (61-76)
internal/core/ports/model_converter.go (1)
  • ModelFilters (18-23)
internal/core/domain/endpoint.go (1)
  • Endpoint (21-40)
internal/core/domain/unified_model.go (1)
  • UnifiedModel (15-31)
🪛 YAMLlint (1.37.1)
default.yaml

[error] 80-80: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (13)
internal/app/handlers/handler_status_models.go (1)

67-67: LGTM: Improved consistency with constant usage.

Replacing the literal string "true" with the queryValueTrue constant enhances maintainability and ensures consistency across the codebase.

docs/api/query-formats.md (1)

115-136: Excellent documentation for the new availability filter.

The documentation clearly explains the include_unavailable parameter functionality with helpful JSON examples showing the availability status structure. This provides users with clear guidance on endpoint health visibility.

internal/adapter/registry/factory.go (2)

13-15: Well-structured configuration extension.

The addition of UnificationConf field with proper YAML tagging integrates unification configuration cleanly into the registry configuration structure. The field ordering is logical with configuration before behavior flags.


18-22: Proper configuration passing and parameter clarity.

The parameter rename from config to regConfig avoids naming conflicts with the imported config package, and the UnificationConf is correctly passed to the unified registry constructor when the unifier is enabled.

default.yaml (1)

1-79: Comprehensive and well-structured default configuration.

The configuration file provides sensible defaults across all system components including server settings, proxy configuration, discovery mechanisms, model registry with unification support, and logging. The structure is clear and the values are appropriate for a typical deployment.

internal/adapter/registry/unified_memory_registry.go (3)

28-44: Excellent configuration-driven registry construction.

The constructor properly integrates UnificationConfig to create unifiers with custom TTL and cleanup intervals when provided, falling back to sensible defaults. The configuration-based approach enhances flexibility without compromising functionality.


259-297: Well-implemented healthy endpoint filtering.

The GetHealthyEndpointsForModel method properly handles context cancellation, error propagation, and efficiently filters healthy endpoints using set intersection. The implementation is robust and follows good error handling practices.
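
The intersection itself is a small map-based filter; a sketch only, keyed on endpoint name here although the real implementation may well key on URL (import path assumed).

package registry

import "github.com/thushan/olla/internal/core/domain" // assumed module path

// healthyEndpointsForModel keeps only those endpoints serving the model that
// the repository also reports as healthy.
func healthyEndpointsForModel(forModel, healthy []*domain.Endpoint) []*domain.Endpoint {
	healthySet := make(map[string]struct{}, len(healthy))
	for _, ep := range healthy {
		healthySet[ep.Name] = struct{}{}
	}
	out := make([]*domain.Endpoint, 0, len(forModel))
	for _, ep := range forModel {
		if _, ok := healthySet[ep.Name]; ok {
			out = append(out, ep)
		}
	}
	return out
}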


299-353: Comprehensive capability matching system.

The capability matching logic with alias support and the GetModelsByCapability method provide flexible model discovery based on capabilities. The implementation handles context cancellation properly and includes sensible capability aliases for different use cases.
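
The alias idea in compact form — the capability names and groupings below are illustrative, not the table the registry actually ships:

package registry

import "strings"

// capabilityAliases maps a requested capability onto the concrete
// capabilities a model may advertise.
var capabilityAliases = map[string][]string{
	"chat":      {"chat", "chat_completion", "text_generation"},
	"embedding": {"embedding", "embeddings"},
	"vision":    {"vision", "image_understanding"},
	"code":      {"code", "code_generation"},
}

// matchesCapability reports whether any of the model's capabilities satisfy
// the requested one, either directly or via an alias.
func matchesCapability(modelCaps []string, requested string) bool {
	wanted, ok := capabilityAliases[strings.ToLower(requested)]
	if !ok {
		wanted = []string{requested}
	}
	for _, have := range modelCaps {
		for _, want := range wanted {
			if strings.EqualFold(have, want) {
				return true
			}
		}
	}
	return false
}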

internal/app/handlers/handler_unified_models_test.go (2)

22-59: LGTM!

The mock repository implementation is well-structured and correctly implements the required interface methods for testing.


61-188: Well-structured test coverage for the include_unavailable parameter.

The test effectively validates the filtering behavior with comprehensive scenarios and proper assertions, including verification of availability information when requested.

internal/app/handlers/handler_unified_models.go (3)

18-25: LGTM!

The ModelAvailability struct is well-designed with clear field names and documentation.


76-97: LGTM!

The handler correctly implements the health-based filtering logic with appropriate error handling.


131-133: LGTM!

Clear and helpful comment explaining the interaction between filtering and availability information.

output: "stdout" # stdout, file

engineering:
show_nerdstats: false No newline at end of file

⚠️ Potential issue

Add missing newline at end of file.

Static analysis correctly identifies the missing newline character at the end of the file.

-  show_nerdstats: false
+  show_nerdstats: false
+
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 80-80: no new line character at the end of file

(new-line-at-end-of-file)

🤖 Prompt for AI Agents
In default.yaml at line 80, the file is missing a newline character at the end.
Add a newline after the last line to comply with standard file formatting and
satisfy static analysis checks.

@thushan thushan merged commit a24c7a9 into main Jul 26, 2025
3 checks passed
@thushan thushan deleted the feature/olla-profiles branch July 26, 2025 12:37