feat: API Redesign #33

thushan · 2025-07-27T12:30:39Z

This PR redoes the API with explicit provider-specific routing, profile-driven route registration, and cleaner separation of logic for different LLM backends. It fixes backend ambiguity, simplifies debugging, and still plays nice with legacy clients.

TL;DR

Explicit provider routing = less confusion, easier debugging
Profile-based config = easier to extend, less hardcoding
Separation of logic = cleaner code, fewer bugs pretending to be features

Provider-Specific Routing

Swaps the catch-all /olla/ proxy for clearly scoped routes:
- /olla/ollama/
- /olla/lmstudio/
- /olla/openai/
- /olla/vllm/
Requests now only hit their matching backend type - no more mystery detours.
Still supports legacy endpoints, for those who enjoy living in the past.
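
As a rough illustration of the scoped routes above, the provider segment can be peeled off the request path before proxying. This is a minimal sketch, not the code in this PR: splitProviderPath and proxyPrefix are hypothetical names, and the real handlers may strip prefixes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyPrefix is the shared root of the provider-scoped routes listed above.
const proxyPrefix = "/olla/"

// splitProviderPath is a hypothetical helper: it splits "/olla/{provider}/rest"
// into the provider segment and the path to forward to the backend.
// ok is false when the path is outside /olla/ or names no provider.
func splitProviderPath(path string) (provider, rest string, ok bool) {
	trimmed, found := strings.CutPrefix(path, proxyPrefix)
	if !found || trimmed == "" {
		return "", "", false
	}
	provider, rest, _ = strings.Cut(trimmed, "/")
	if provider == "" {
		return "", "", false
	}
	return provider, "/" + rest, true
}

func main() {
	paths := []string{"/olla/ollama/api/tags", "/olla/vllm/v1/models", "/internal/health"}
	for _, p := range paths {
		provider, rest, ok := splitProviderPath(p)
		fmt.Printf("%-24s provider=%q rest=%q ok=%v\n", p, provider, rest, ok)
	}
}
```

Returning ok=false for anything outside /olla/ leaves room for the legacy endpoints to be handled by a separate route.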

Dynamic Route Registration

Routes are now built from YAML config profiles (config/profiles/*.yaml).
Profile factory handles:
- Prefix normalisation
- Provider mapping
Falls back to hardcoded routes in test mode - because mocks deserve love too.
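
To make the prefix normalisation and provider mapping concrete, here is a small sketch of how a prefix lookup could be built from profile data. The normalisation rule (lower-case, drop "-" and "_") and both function names are assumptions for illustration; the map literal stands in for whatever the YAML loader returns.

```go
package main

import (
	"fmt"
	"strings"
)

// normalisePrefix lower-cases a prefix and drops "-" and "_" so that
// "lmstudio", "lm-studio" and "lm_studio" all collapse to one key.
// This is an assumed rule for illustration, not the project's actual one.
func normalisePrefix(prefix string) string {
	p := strings.ToLower(strings.Trim(prefix, "/ "))
	return strings.NewReplacer("-", "", "_", "").Replace(p)
}

// buildPrefixLookup maps every normalised routing prefix to its profile name.
// The input stands in for the routing prefixes read from each profile YAML.
func buildPrefixLookup(profiles map[string][]string) map[string]string {
	lookup := make(map[string]string)
	for name, prefixes := range profiles {
		lookup[normalisePrefix(name)] = name // the profile name is itself a valid prefix
		for _, prefix := range prefixes {
			lookup[normalisePrefix(prefix)] = name
		}
	}
	return lookup
}

func main() {
	lookup := buildPrefixLookup(map[string][]string{
		"ollama":    {"ollama"},
		"lm-studio": {"lmstudio", "lm_studio"},
	})
	fmt.Println(lookup[normalisePrefix("/LM_Studio/")]) // prints "lm-studio"
}
```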

Enhanced Model Discovery

Each backend gets its own model handler, tuned to its API quirks:
- Ollama: /api/tags
- LM Studio: /api/v0/models
- OpenAI-style JSON still works too
LM Studio responses include richer metadata - shiny!
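
The per-backend discovery paths can be pictured as a simple table lookup. A sketch only: discoveryPaths and discoveryURL are hypothetical names, the provider keys are assumed, and /v1/models is the OpenAI-compatible listing path referenced elsewhere in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// discoveryPaths records where each backend lists its models. The Ollama and
// LM Studio paths come from the list above; the provider keys are assumed.
var discoveryPaths = map[string]string{
	"ollama":   "/api/tags",
	"lmstudio": "/api/v0/models",
	"openai":   "/v1/models",
}

// discoveryURL joins a backend base URL with the provider's discovery path.
func discoveryURL(baseURL, provider string) (string, error) {
	path, ok := discoveryPaths[provider]
	if !ok {
		return "", fmt.Errorf("no model discovery path for provider %q", provider)
	}
	return strings.TrimRight(baseURL, "/") + path, nil
}

func main() {
	for _, provider := range []string{"ollama", "lmstudio", "openai", "unknown"} {
		url, err := discoveryURL("http://localhost:11434/", provider)
		fmt.Println(provider, url, err)
	}
}
```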

Improved Code Organisation

Route setup pulled into server_routes.go
Common handler patterns centralised in:
- handler_common.go
- handler_provider_common.go
Backend-specific logic isolated to:
- handler_provider_ollama.go
- handler_provider_lmstudio.go
- handler_provider_openai.go

Testing

Bash test script: test-provider-routing.sh
Python test suite: test-model-routing-provider.py
Integration tests check profile-based and legacy routing
Compatibility tests confirm all backends behave (mostly)

Documentation

New provider routing guide: docs/api/provider-routing.md
User guide updated with the latest endpoints
Dev guide added for bringing in new providers: docs/adding-providers.md

Summary by CodeRabbit

New Features

Introduced provider-specific routing with dedicated URL namespaces for Ollama, LM Studio, OpenAI-compatible, and vLLM backends.
Added dynamic provider profile loading with flexible routing prefixes and alias normalization.
Added new HTTP endpoints for provider-specific model listings in native and OpenAI-compatible formats.
Introduced a new vLLM provider profile supporting OpenAI API compatibility and advanced resource management.
Enhanced request handling to include provider-aware routing and model filtering.

Bug Fixes

Improved provider alias normalization and compatibility checks for consistent routing and model discovery.

Documentation

Added detailed guides on adding providers, API references, provider routing architecture, and updated user guides with provider-specific endpoint examples.
Standardised load balancer naming conventions and usage instructions.

Tests

Added extensive unit, integration, and logic tests covering provider routing, model listing endpoints, compatibility, and proxy behaviour.
Introduced new shell and Python test scripts for automated provider-specific routing and model validation.

Chores

Refactored routing registration to support dynamic and static provider routes.
Updated configuration files and load balancer keys for consistency.
Enhanced logging detail with model context and modularised path handling utilities.
Updated load test and security scripts to support provider-specific proxy endpoints.

…es invalid)
ah, normalise lmstudio, lm-studio, lm_studio in configuration
add tests for model_builder
update tests for model builder
fix invalid routes for ollama and broken for vllm
adds a test to check the unified models too

adds two new scripts to check formats and compatibility (Claude created)

consolidates naming for load balancers

update scripts
fix tests
support lm_studio variations in API (not finalised yet).
update tests
update more tests

coderabbitai · 2025-07-27T12:30:48Z

Walkthrough

This update introduces a provider-specific routing system, enabling explicit namespacing of proxy endpoints by backend provider type (e.g., /olla/ollama/, /olla/lmstudio/, /olla/openai/, /olla/vllm/). It adds dynamic route registration based on YAML profile configurations, implements flexible provider alias handling, and refactors HTTP handler logic for provider-aware model discovery and proxying. Documentation and test scripts are updated to reflect the new routing architecture.

Changes

Files/Groups	Change Summary
config/config.yaml, default.yaml, readme.md	Standardised load balancer naming (hyphenated), updated documentation and comments for load balancer options.
config/profiles/*.yaml	Added/updated routing prefixes in provider profiles (Ollama, LM Studio, OpenAI, vLLM), new vllm profile added.
docs/adding-providers.md, docs/api/README.md, docs/api/provider-routing.md, docs/user-guide.md	New/updated documentation for provider routing, API endpoints, adding providers, and usage examples.
internal/adapter/proxy/proxy_olla.go, internal/adapter/proxy/proxy_sherpa.go	Improved logging: added model name to info logs, removed debug log for target URL.
internal/adapter/registry/profile/factory.go, factory_test.go, loader.go	Introduced prefix-based provider alias resolution, updated profile validation, added/routed tests.
internal/adapter/unifier/model_builder.go, model_builder_test.go, default_unifier.go	Platform detection now considers endpoint type; new tests for platform detection logic.
internal/app/handlers/application.go	Application struct extended with profileFactory field.
internal/app/handlers/handler_common.go, handler_common_test.go	New utilities for provider normalisation, path extraction, provider support checks, with unit tests.
internal/app/handlers/handler_provider_common.go, handler_provider_compatibility_test.go	New provider-specific routing/filtering logic, compatibility tests for provider/endpoint types.
internal/app/handlers/handler_provider_generic.go	Generic handlers for provider models and model show endpoints.
internal/app/handlers/handler_provider_lmstudio.go	Handlers for LM Studio models in OpenAI and enhanced LM Studio formats.
internal/app/handlers/handler_provider_models_test.go	Tests for provider-specific model listing endpoints and format filtering.
internal/app/handlers/handler_provider_ollama.go	Ollama provider: handlers for model listing, unsupported management, OpenAI-compatible endpoints.
internal/app/handlers/handler_provider_openai.go	Handler for OpenAI-compatible model listing endpoint.
internal/app/handlers/handler_provider_test.go	Tests for provider routing and path stripping logic.
internal/app/handlers/handler_proxy.go	Sets model name in request stats; normalises endpoint type before compatibility checks.
internal/app/handlers/handler_unified_models_test.go	Minor spelling correction in comment.
internal/app/handlers/server.go	Removed old route registration method and related import.
internal/app/handlers/server_routes.go	New dynamic/static route registration system for internal, unified, and provider-specific endpoints.
internal/core/constants/context.go, endpoint.go, providers.go	Added new constants for original path, default prefixes, and provider types/display names/prefixes.
internal/core/domain/profile_config.go	Added Routing field to ProfileConfig for prefix mapping; reordered Models field.
internal/core/ports/proxy.go	Added Model field to RequestStats struct.
internal/util/request.go	Refactored prefix stripping logic into helper function.
test/integration/profile_routing_test.go	Integration tests for profile routing and validation.
test/scripts/load/test-load-chaos.sh, test-load-limits.sh	Proxy path now includes provider variable; usage documentation updated.
test/scripts/logic/.gitignore, requirements.txt	Added .gitignore and Python requirements for logic test scripts.
test/scripts/logic/README.md	Major expansion: covers all logic test scripts, usage, endpoints, error handling, and CI integration.
test/scripts/logic/test-model-routing-provider.py	New Python script: comprehensive provider-specific model routing tests, endpoint validation, summary reporting.
test/scripts/logic/test-model-routing.sh	Proxy endpoint now provider-specific; usage docs updated.
test/scripts/logic/test-provider-models.sh	New Bash script: validates provider-specific model listing endpoints and formats.
test/scripts/logic/test-provider-routing.sh	New Bash script: validates provider-specific routing and proxy endpoints, response headers, and model counts.
test/scripts/security/test-request-rate-limits.sh, test-request-size-limits.sh	Proxy endpoint now configurable by provider variable.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Olla
    participant ProfileFactory
    participant BackendProvider

    Client->>Olla: HTTP request to /olla/{provider}/...
    Olla->>ProfileFactory: Resolve provider profile by prefix/alias
    ProfileFactory-->>Olla: Profile config (with routing prefixes)
    Olla->>BackendProvider: Proxy or model discovery request (provider-specific)
    BackendProvider-->>Olla: Response (models, inference, etc.)
    Olla-->>Client: Response (format depends on endpoint/provider)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40–60 minutes

Possibly related PRs

feat: olla profile #32: Implements a new routing system with model capability awareness, provider alias normalisation, and provider-specific routing prefixes—directly related to the provider-specific routing, alias handling, and dynamic route registration introduced here.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2191eb and 0103cf8.

📒 Files selected for processing (1)

internal/app/handlers/handler_provider_test.go (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

internal/app/handlers/handler_provider_test.go

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (2)

default.yaml (1)

34-35: Same key rename as above – keep parsers in sync

Duplicated concern: the default config now uses "least-connections". Make sure both JSON/YAML unmarshal logic and any CLI validation recognise the updated spelling, or startup will fail in deployments that copy the default.
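
One way to keep parsers in sync is to funnel every entry point through a single canonicalising function. The sketch below assumes that older configs may spell names with underscores, which this thread does not confirm; canonicalBalancer is a hypothetical helper, and the accepted set is taken from the strategies named in the docs (round-robin, least connections, priority).

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalBalancer maps accepted spellings onto the hyphenated names.
// Accepting "_" as a separator is an assumption about legacy configs.
func canonicalBalancer(name string) (string, error) {
	key := strings.ReplaceAll(strings.ToLower(strings.TrimSpace(name)), "_", "-")
	switch key {
	case "least-connections", "round-robin", "priority":
		return key, nil
	}
	return "", fmt.Errorf("unknown load balancer %q", name)
}

func main() {
	for _, name := range []string{"least-connections", "Least_Connections", "fastest"} {
		canonical, err := canonicalBalancer(name)
		fmt.Printf("%q -> %q (err: %v)\n", name, canonical, err)
	}
}
```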

internal/adapter/proxy/proxy_olla.go (1)

518-522: Mirror the conditional logging here as well

Same recommendation as for the Sherpa proxy to suppress empty "model" entries in production logs (see diff in previous comment).

🧹 Nitpick comments (25)

internal/core/constants/context.go (1)
4-8: Prefer typed context keys to plain strings

Using raw strings as context.Context keys risks clashes across packages. A lightweight, unexported type avoids this:
-package constants
+package constants
+
+// ctxKey is an unexported type used to prevent context-key collisions.
+type ctxKey string
@@
-	OriginalPathKey = "original_path" // original path before any modifications, useful for logging/debugging
+	OriginalPathKey ctxKey = "original_path" // original path before any modifications, useful for logging/debugging
Migrating now keeps the public surface stable before more call-sites appear.
test/scripts/logic/requirements.txt (1)
1-1: Pin the dependency version for repeatable test runs

Unpinned dependencies can break the test harness when requests releases a new major version.
-requests
+requests>=2.32,<3
Locking to a tested range makes CI failures easier to diagnose.
internal/adapter/proxy/proxy_sherpa.go (1)
236-239: New structured-log field is valuable – avoid zero-value noise

Including "model" in the dispatch log improves traceability. Consider guarding the field when stats.Model == "":
- rlog.Info("Request dispatching to endpoint", "endpoint", endpoint.Name, "target", stats.TargetUrl, "model", stats.Model)
+ if stats.Model != "" {
+     rlog.Info("Request dispatching to endpoint", "endpoint", endpoint.Name, "target", stats.TargetUrl, "model", stats.Model)
+ } else {
+     rlog.Info("Request dispatching to endpoint", "endpoint", endpoint.Name, "target", stats.TargetUrl)
+ }
This keeps logs tidy when model info is unavailable.
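An alternative that avoids duplicating the log call is to build the key/value list once and append "model" only when it is set. The sketch uses the standard log/slog package as a stand-in; whether rlog accepts a variadic key/value slice in the same way is an assumption, and dispatchAttrs is a hypothetical helper.

```go
package main

import (
	"log/slog"
	"os"
)

// dispatchAttrs builds the key/value pairs for the dispatch log line and only
// appends "model" when one is known, so there is a single logging call.
func dispatchAttrs(endpoint, target, model string) []any {
	attrs := []any{"endpoint", endpoint, "target", target}
	if model != "" {
		attrs = append(attrs, "model", model)
	}
	return attrs
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))
	const msg = "Request dispatching to endpoint"

	// With a model name the line carries a "model" field; without one it is omitted.
	logger.Info(msg, dispatchAttrs("local-ollama", "http://localhost:11434", "llama3")...)
	logger.Info(msg, dispatchAttrs("local-ollama", "http://localhost:11434", "")...)
}
```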
internal/core/constants/endpoint.go (1)

5-6: Well-defined constants for standardised routing.

The new constants provide centralised configuration for proxy path prefixes, supporting the provider-specific routing architecture. The naming is clear and follows Go conventions.

Note on struct alignment: Whilst these are constants rather than struct fields, ensure any related structs in this package follow proper field alignment for optimal memory layout as per the coding guidelines.
internal/core/domain/profile_config.go (1)
34-43: Routing configuration properly added for provider-specific routing.

The addition of the Routing field with Prefixes slice correctly enables the new provider-specific routing architecture. The YAML tags are properly configured for external configuration loading.

However, consider optimising struct field alignment for better memory layout:
type ProfileConfig struct {
	Name        string `yaml:"name"`
	Version     string `yaml:"version"`
	DisplayName string `yaml:"display_name"`
	Description string `yaml:"description"`

	Detection struct {
		Headers           []string `yaml:"headers"`
		UserAgentPatterns []string `yaml:"user_agent_patterns"`
		ResponsePatterns  []string `yaml:"response_patterns"`
		PathIndicators    []string `yaml:"path_indicators"`
		DefaultPorts      []int    `yaml:"default_ports"`
	} `yaml:"detection"`

	Request struct {
		ModelFieldPaths []string `yaml:"model_field_paths"`
		ResponseFormat  string   `yaml:"response_format"`
		ParsingRules    struct {
			ChatCompletionsPath string `yaml:"chat_completions_path"`
			CompletionsPath     string `yaml:"completions_path"`
			GeneratePath        string `yaml:"generate_path"`
			ModelFieldName      string `yaml:"model_field_name"`
			SupportsStreaming   bool   `yaml:"supports_streaming"`
		} `yaml:"parsing_rules"`
	} `yaml:"request"`

+	// Group smaller fields together for better memory alignment
+	PathIndices struct {
+		Health          int `yaml:"health"`
+		Models          int `yaml:"models"`
+		Completions     int `yaml:"completions"`
+		ChatCompletions int `yaml:"chat_completions"`
+		Embeddings      int `yaml:"embeddings"`
+	} `yaml:"path_indices"`
+
+	Characteristics struct {
+		Timeout               time.Duration `yaml:"timeout"`
+		MaxConcurrentRequests int           `yaml:"max_concurrent_requests"`
+		DefaultPriority       int           `yaml:"default_priority"`
+		StreamingSupport      bool          `yaml:"streaming_support"`
+	} `yaml:"characteristics"`

	Models struct {
		CapabilityPatterns map[string][]string `yaml:"capability_patterns"`
		NameFormat         string              `yaml:"name_format"`
		ContextPatterns    []ContextPattern    `yaml:"context_patterns"`
	} `yaml:"models"`

	Routing struct {
		Prefixes []string `yaml:"prefixes"`
	} `yaml:"routing"`

	API struct {
		ModelDiscoveryPath string   `yaml:"model_discovery_path"`
		HealthCheckPath    string   `yaml:"health_check_path"`
		Paths              []string `yaml:"paths"`
		OpenAICompatible   bool     `yaml:"openai_compatible"`
	} `yaml:"api"`

	Resources struct {
		Quantization struct {
			Multipliers map[string]float64 `yaml:"multipliers"`
		} `yaml:"quantization"`
		ModelSizes        []ModelSizePattern        `yaml:"model_sizes"`
		ConcurrencyLimits []ConcurrencyLimitPattern `yaml:"concurrency_limits"`
		Defaults          ResourceRequirements      `yaml:"defaults"`
		TimeoutScaling    TimeoutScaling            `yaml:"timeout_scaling"`
	} `yaml:"resources"`

-	PathIndices struct {
-		Health          int `yaml:"health"`
-		Models          int `yaml:"models"`
-		Completions     int `yaml:"completions"`
-		ChatCompletions int `yaml:"chat_completions"`
-		Embeddings      int `yaml:"embeddings"`
-	} `yaml:"path_indices"`
-
-	Characteristics struct {
-		Timeout               time.Duration `yaml:"timeout"`
-		MaxConcurrentRequests int           `yaml:"max_concurrent_requests"`
-		DefaultPriority       int           `yaml:"default_priority"`
-		StreamingSupport      bool          `yaml:"streaming_support"`
-	} `yaml:"characteristics"`
}
internal/app/handlers/handler_provider_lmstudio.go (1)
56-57: Consider removing or expanding this comment.

The comment about LM Studio's focus on local model serving seems incomplete and doesn't add significant value to the code. Consider either expanding it with more context or removing it.
-// lm studio focuses on local model serving without centralised management.
-// this simplifies deployment but limits remote administration capabilities
internal/app/handlers/handler_provider_generic.go (1)
44-46: Consider using a more specific struct type.

While the anonymous struct works, consider defining a named struct type for better code maintainability and reusability across handlers.
type ModelRequest struct {
    Name string `json:"name"`
}

var req ModelRequest
test/integration/profile_routing_test.go (1)

11-54: Consider enhancing to test full request flow through the proxy

Whilst this test validates the profile factory's provider validation logic, per the retrieved learnings, integration tests should test the full request flow through the proxy. This appears to be more of a unit test for the ProfileFactory.ValidateProfileType method rather than an end-to-end integration test.

Consider adding tests that:

Send actual HTTP requests to provider-specific endpoints

Validate the complete routing chain from request to backend

Test the interaction between profile loading, route registration, and request handling
readme.md (1)
366-375: Address minor grammar issues

The static analysis tool identified missing determiners in the load balancer section.
-### 📊 Least Connections (`least-connections`) - **Recommended**
-Routes to the endpoint with least active requests. Ideal for:
+### 📊 Least Connections (`least-connections`) - **Recommended**
+Routes to the endpoint with the least active requests. Ideal for:
docs/adding-providers.md (1)
112-112: Consider reducing exclamation marks for professional tone

The static analysis tool flagged excessive exclamation marks. Consider a more measured tone for technical documentation.
-3. Run Olla - done!
+3. Run Olla - done.
Or alternatively:
-3. Run Olla - done!
+3. Run Olla - that's it!
internal/app/handlers/handler_provider_models_test.go (1)
64-96: Consider replacing hardcoded sleep with deterministic synchronisation

The 200ms sleep on line 85 could make tests flaky and slower than necessary. Consider using a more deterministic approach to wait for async unification.
-	// Wait for async unification to complete
-	time.Sleep(200 * time.Millisecond)
+	// Wait for async unification to complete with timeout
+	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
+	defer cancel()
+	for {
+		select {
+		case <-ctx.Done():
+			require.Fail(t, "unification did not complete within timeout")
+		default:
+			// Check if unification is complete by verifying model count
+			models, err := unifiedRegistry.GetAllUnifiedModels(ctx)
+			if err == nil && len(models) >= 4 { // Expected total models
+				goto unified
+			}
+			time.Sleep(10 * time.Millisecond)
+		}
+	}
+	unified:
docs/api/provider-routing.md (2)
163-163: Minor grammar improvement

Consider adding the missing determiner for better readability:
-Within each provider type, the configured load balancing strategy (round-robin, least connections, priority) is applied. This means:
+Within each provider type, the configured load balancing strategy (round-robin, the least connections, priority) is applied. This means:
24-27: Add language specifications to fenced code blocks

The fenced code blocks should specify a language for proper syntax highlighting and consistency:
-```
+```http
 GET /olla/ollama/api/tags       # Ollama native format
 GET /olla/ollama/v1/models      # OpenAI-compatible format
Apply similar changes to the other endpoint listings:
- Lines 69-72: Add http language
- Lines 127-129: Add http language
- Lines 138-140: Add http language
- Lines 148-150: Add http language
- Lines 155-157: Add http language

Also applies to: 69-72, 127-129, 138-140, 148-150, 155-157
test/scripts/logic/test-provider-models.sh (1)
89-89: Address shellcheck warnings for better script reliability

Several shellcheck warnings should be addressed:
-    local response_body=$(echo "$response" | sed '$d')
+    local response_body
+    response_body=$(echo "$response" | sed '$d')
-                if [ $(echo "$response_body" | wc -l) -gt 10 ]; then
+                if [ "$(echo "$response_body" | wc -l)" -gt 10 ]; then
-    local response_body=$(echo "$response" | sed '$d')
+    local response_body
+    response_body=$(echo "$response" | sed '$d')
These changes prevent masking return values and avoid word splitting issues.

Also applies to: 139-139, 191-191
test/scripts/logic/test-provider-routing.sh (1)
12-12: Remove unused colour variable

The BLUE variable is defined but never used in the script.
-BLUE='\033[0;34m'
test/scripts/logic/README.md (1)
125-125: Minor style and formatting improvements

A few minor improvements for consistency and readability:
-- **Flexible Testing** - Test specific providers or all providers
+- **Flexible Testing** - Test-specific providers or all providers
-- Endpoint usage statistics with success/failure breakdown
+- Endpoint usage statistics with success and failure breakdown
-```
+```text
 Available endpoints:
Also applies to: 168-168, 172-172
internal/adapter/registry/profile/factory.go (1)
92-107: Consider atomic update pattern for prefix lookup rebuild.

While the current implementation works, consider building the new prefix lookup in a temporary variable first, then atomically replacing it. This ensures consistency if buildPrefixLookup encounters any issues.
 func (f *Factory) ReloadProfiles() error {
 	f.mu.Lock()
 	defer f.mu.Unlock()
 
 	if err := f.loader.LoadProfiles(); err != nil {
 		return err
 	}
 
-	// Invalidate and rebuild the prefix cache
-	f.prefixLookup = make(map[string]string)
-	f.buildPrefixLookup()
+	// Build new prefix lookup atomically
+	newPrefixLookup := make(map[string]string)
+	f.buildPrefixLookupInto(newPrefixLookup)
+	f.prefixLookup = newPrefixLookup
 
 	return nil
 }
You'd need to refactor buildPrefixLookup to accept the map as a parameter:
func (f *Factory) buildPrefixLookupInto(lookup map[string]string) {
	profiles := f.loader.GetAllProfiles()
	
	for profileName, profile := range profiles {
		config := profile.GetConfig()
		if config == nil {
			continue
		}
		
		// Each prefix in the YAML becomes a valid route
		for _, prefix := range config.Routing.Prefixes {
			lookup[prefix] = profileName
		}
		
		// Profile names are implicit prefixes for convenience
		lookup[profileName] = profileName
	}
}

func (f *Factory) buildPrefixLookup() {
	f.buildPrefixLookupInto(f.prefixLookup)
}
internal/app/handlers/handler_provider_ollama.go (2)
8-28: Consider more descriptive error messages.

While the implementation is correct, the error responses could be more informative for debugging.
 func (a *Application) ollamaModelsHandler(w http.ResponseWriter, r *http.Request) {
 	ctx := r.Context()
 
 	models, err := a.getProviderModels(ctx, "ollama")
 	if err != nil {
-		http.Error(w, err.Error(), http.StatusInternalServerError)
+		http.Error(w, fmt.Sprintf("Failed to fetch Ollama models: %v", err), http.StatusInternalServerError)
 		return
 	}
 
 	response, err := a.convertModelsToProviderFormat(models, "ollama")
 	if err != nil {
-		http.Error(w, err.Error(), http.StatusInternalServerError)
+		http.Error(w, fmt.Sprintf("Failed to convert models to Ollama format: %v", err), http.StatusInternalServerError)
 		return
 	}
 
 	w.Header().Set(ContentTypeHeader, ContentTypeJSON)
 	w.WriteHeader(http.StatusOK)
 	json.NewEncoder(w).Encode(response)
 }
52-73: Consider consistent error message improvements.

This handler follows the same pattern as ollamaModelsHandler. Consider applying similar error message improvements here for consistency.
test/scripts/logic/test-model-routing-provider.py (1)
1-14: Remove unused import and fix loop variable.

The static analysis correctly identifies an unused import and loop variable that should be addressed.
 import sys
-import json
 import time
 import argparse
 import requests
-from typing import Dict, List, Tuple
+from typing import List, Tuple
 from collections import defaultdict
Also update line 150:
-for i, model in enumerate(self.provider_models[provider][:5]):
+for model in self.provider_models[provider][:5]:
internal/app/handlers/handler_provider_common.go (1)
132-178: Comprehensive model filtering with provider compatibility checks.

The implementation correctly checks both source endpoints and aliases for provider compatibility. Consider performance optimisation for large model sets.

For better performance with large model sets, consider pre-computing a provider compatibility map:
// Pre-compute which endpoints are compatible with the provider
compatibleEndpoints := make(map[string]bool)
for _, ep := range endpoints {
    normalisedType := NormaliseProviderType(ep.Type)
    if providerProfile.IsCompatibleWith(normalisedType) {
        compatibleEndpoints[ep.URLString] = true
    }
}

// Then use simple map lookups in the model loop
for _, source := range model.SourceEndpoints {
    if compatibleEndpoints[source.EndpointURL] {
        hasProvider = true
        break
    }
}
internal/app/handlers/server_routes.go (4)
12-19: Align struct fields for better memory layout.

According to the coding guidelines for **/*.go files, struct fields should be aligned for better memory layout. The current field ordering in staticRoute is not optimal.
 type staticRoute struct {
-	path        string
-	handler     http.HandlerFunc
-	description string
-	method      string
-	isProxy     bool
+	handler     http.HandlerFunc // 8 bytes
+	path        string           // 16 bytes
+	description string           // 16 bytes
+	method      string           // 16 bytes
+	isProxy     bool             // 1 byte
 }
63-64: Extract hardcoded string to improve maintainability.

The "openai-compatible" string is hardcoded here and referenced again at line 72. Consider extracting it to a constant for better maintainability.

Add a constant in the constants package:
// In internal/core/constants/providers.go
ProfileNameOpenAICompatible = "openai-compatible"
Then update the code:
-	profiles = append(profiles, "openai-compatible")
+	profiles = append(profiles, constants.ProfileNameOpenAICompatible)
And at line 72:
-			if profileName != "openai-compatible" {
+			if profileName != constants.ProfileNameOpenAICompatible {
92-175: Consider refactoring to reduce complexity.

This function is quite long (83 lines) with multiple nested conditionals. Consider extracting the OpenAI compatibility registration (lines 133-171) into a separate method for better readability and testability.

Extract the OpenAI compatibility section:
func (a *Application) registerOpenAICompatibilityRoutes(basePath, prefix, profileName string) {
    openAIPath := basePath + "v1/models"
    
    switch profileName {
    case constants.ProviderTypeOllama:
        // ... existing logic
    case constants.ProviderTypeLMStudio:
        // ... existing logic
    case constants.ProviderTypeOpenAICompat:
        // ... existing logic
    default:
        // ... existing logic
    }
}
Then call it from the main function:
 	// OpenAI compatibility enables cross-provider client support
 	if config.API.OpenAICompatible {
-		openAIPath := basePath + "v1/models"
-		// ... all the switch logic ...
+		a.registerOpenAICompatibilityRoutes(basePath, prefix, profileName)
 	}
177-221: Static provider configuration looks comprehensive.

The static provider definitions appropriately mirror the YAML configurations for test isolation. The comment clearly explains why this duplication exists. Note that any changes to the YAML profiles will need to be manually synchronised here.

Would you like me to generate a test that validates the static providers match their corresponding YAML configurations to catch drift?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a24c7a9 and c2191eb.

📒 Files selected for processing (52)

config/config.yaml (1 hunks)
config/profiles/lmstudio.yaml (1 hunks)
config/profiles/ollama.yaml (1 hunks)
config/profiles/openai.yaml (1 hunks)
config/profiles/vllm.yaml (1 hunks)
default.yaml (1 hunks)
docs/adding-providers.md (1 hunks)
docs/api/README.md (1 hunks)
docs/api/provider-routing.md (1 hunks)
docs/user-guide.md (4 hunks)
internal/adapter/proxy/proxy_olla.go (1 hunks)
internal/adapter/proxy/proxy_sherpa.go (1 hunks)
internal/adapter/registry/profile/factory.go (6 hunks)
internal/adapter/registry/profile/factory_test.go (1 hunks)
internal/adapter/registry/profile/loader.go (3 hunks)
internal/adapter/unifier/default_unifier.go (2 hunks)
internal/adapter/unifier/model_builder.go (2 hunks)
internal/adapter/unifier/model_builder_test.go (1 hunks)
internal/app/handlers/application.go (2 hunks)
internal/app/handlers/handler_common.go (1 hunks)
internal/app/handlers/handler_common_test.go (1 hunks)
internal/app/handlers/handler_provider_common.go (1 hunks)
internal/app/handlers/handler_provider_compatibility_test.go (1 hunks)
internal/app/handlers/handler_provider_generic.go (1 hunks)
internal/app/handlers/handler_provider_lmstudio.go (1 hunks)
internal/app/handlers/handler_provider_models_test.go (1 hunks)
internal/app/handlers/handler_provider_ollama.go (1 hunks)
internal/app/handlers/handler_provider_openai.go (1 hunks)
internal/app/handlers/handler_provider_test.go (1 hunks)
internal/app/handlers/handler_proxy.go (2 hunks)
internal/app/handlers/handler_unified_models_test.go (1 hunks)
internal/app/handlers/server.go (0 hunks)
internal/app/handlers/server_routes.go (1 hunks)
internal/core/constants/context.go (1 hunks)
internal/core/constants/endpoint.go (1 hunks)
internal/core/constants/providers.go (1 hunks)
internal/core/domain/profile_config.go (1 hunks)
internal/core/ports/proxy.go (1 hunks)
internal/util/request.go (1 hunks)
readme.md (3 hunks)
test/integration/profile_routing_test.go (1 hunks)
test/scripts/load/test-load-chaos.sh (1 hunks)
test/scripts/load/test-load-limits.sh (2 hunks)
test/scripts/logic/.gitignore (1 hunks)
test/scripts/logic/README.md (2 hunks)
test/scripts/logic/requirements.txt (1 hunks)
test/scripts/logic/test-model-routing-provider.py (1 hunks)
test/scripts/logic/test-model-routing.sh (3 hunks)
test/scripts/logic/test-provider-models.sh (1 hunks)
test/scripts/logic/test-provider-routing.sh (1 hunks)
test/scripts/security/test-request-rate-limits.sh (1 hunks)
test/scripts/security/test-request-size-limits.sh (1 hunks)

💤 Files with no reviewable changes (1)

internal/app/handlers/server.go

🧰 Additional context used

📓 Path-based instructions (5)

**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

Align struct fields for better memory layout

Files:

internal/app/handlers/handler_proxy.go
internal/app/handlers/handler_unified_models_test.go
internal/core/ports/proxy.go
internal/core/constants/context.go
internal/adapter/proxy/proxy_sherpa.go
internal/adapter/proxy/proxy_olla.go
internal/app/handlers/application.go
internal/adapter/unifier/default_unifier.go
internal/core/constants/endpoint.go
internal/core/domain/profile_config.go
internal/app/handlers/handler_provider_compatibility_test.go
internal/adapter/unifier/model_builder.go
internal/adapter/registry/profile/factory_test.go
internal/util/request.go
internal/core/constants/providers.go
internal/adapter/unifier/model_builder_test.go
internal/app/handlers/handler_provider_openai.go
internal/app/handlers/handler_provider_lmstudio.go
internal/app/handlers/handler_common_test.go
test/integration/profile_routing_test.go
internal/app/handlers/handler_provider_test.go
internal/app/handlers/handler_common.go
internal/adapter/registry/profile/loader.go
internal/app/handlers/handler_provider_generic.go
internal/app/handlers/handler_provider_ollama.go
internal/app/handlers/handler_provider_models_test.go
internal/adapter/registry/profile/factory.go
internal/app/handlers/handler_provider_common.go
internal/app/handlers/server_routes.go

internal/**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

All internal packages must be placed under /internal/ and are not importable by external projects

Files:

internal/app/handlers/handler_proxy.go
internal/app/handlers/handler_unified_models_test.go
internal/core/ports/proxy.go
internal/core/constants/context.go
internal/adapter/proxy/proxy_sherpa.go
internal/adapter/proxy/proxy_olla.go
internal/app/handlers/application.go
internal/adapter/unifier/default_unifier.go
internal/core/constants/endpoint.go
internal/core/domain/profile_config.go
internal/app/handlers/handler_provider_compatibility_test.go
internal/adapter/unifier/model_builder.go
internal/adapter/registry/profile/factory_test.go
internal/util/request.go
internal/core/constants/providers.go
internal/adapter/unifier/model_builder_test.go
internal/app/handlers/handler_provider_openai.go
internal/app/handlers/handler_provider_lmstudio.go
internal/app/handlers/handler_common_test.go
internal/app/handlers/handler_provider_test.go
internal/app/handlers/handler_common.go
internal/adapter/registry/profile/loader.go
internal/app/handlers/handler_provider_generic.go
internal/app/handlers/handler_provider_ollama.go
internal/app/handlers/handler_provider_models_test.go
internal/adapter/registry/profile/factory.go
internal/app/handlers/handler_provider_common.go
internal/app/handlers/server_routes.go

**/*_test.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*_test.go: Unit tests should test individual components in isolation
Integration tests should test full request flow through the proxy
Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

Files:

internal/app/handlers/handler_unified_models_test.go
internal/app/handlers/handler_provider_compatibility_test.go
internal/adapter/registry/profile/factory_test.go
internal/adapter/unifier/model_builder_test.go
internal/app/handlers/handler_common_test.go
test/integration/profile_routing_test.go
internal/app/handlers/handler_provider_test.go
internal/app/handlers/handler_provider_models_test.go

internal/adapter/proxy/**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

internal/adapter/proxy/**/*.go: Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling
Sherpa engine should be a simple, maintainable proxy for moderate traffic
Olla engine should use 64KB default buffer size, while Sherpa uses 8KB
Olla engine should use object pooling to reduce GC pressure and larger buffers for streaming
Olla engine should prevent cascade failures using circuit breakers

Files:

internal/adapter/proxy/proxy_sherpa.go
internal/adapter/proxy/proxy_olla.go

test/scripts/security/**

📄 CodeRabbit Inference Engine (CLAUDE.md)

Security tests should validate rate limiting and size restrictions (see /test/scripts/security/)

Files:

test/scripts/security/test-request-rate-limits.sh
test/scripts/security/test-request-size-limits.sh

🧠 Learnings (29)

📓 Common learnings

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Proxy engine and load balancer strategy are configured in the `proxy` section of `config.yaml`

internal/app/handlers/handler_proxy.go (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/stats/**/*.go : Automatic cleanup of stale endpoint data in statistics collection

internal/app/handlers/handler_unified_models_test.go (1)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

default.yaml (3)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Proxy engine and load balancer strategy are configured in the proxy section of config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Endpoint definitions and priorities are configured in the discovery section of config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Use the priority balancer as the recommended load balancing strategy

config/profiles/openai.yaml (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Primary configuration is in config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Endpoint definitions and priorities are configured in the discovery section of config.yaml

config/config.yaml (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Proxy engine and load balancer strategy are configured in the proxy section of config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Primary configuration is in config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Endpoint definitions and priorities are configured in the discovery section of config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Use the priority balancer as the recommended load balancing strategy

internal/core/constants/context.go (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

internal/adapter/proxy/proxy_sherpa.go (5)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use 64KB default buffer size, while Sherpa uses 8KB

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/stats/**/*.go : Automatic cleanup of stale endpoint data in statistics collection

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

internal/adapter/proxy/proxy_olla.go (8)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use object pooling to reduce GC pressure and larger buffers for streaming

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/stats/**/*.go : Automatic cleanup of stale endpoint data in statistics collection

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use 64KB default buffer size, while Sherpa uses 8KB

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should prevent cascade failures using circuit breakers

internal/app/handlers/application.go (1)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

test/scripts/load/test-load-chaos.sh (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

internal/core/constants/endpoint.go (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/health/**/*.go : Use circuit breaker pattern for failing endpoints in health checking

internal/app/handlers/handler_provider_compatibility_test.go (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

internal/adapter/registry/profile/factory_test.go (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

test/scripts/logic/README.md (1)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to test/scripts/security/** : Security tests should validate rate limiting and size restrictions (see /test/scripts/security/)

test/scripts/security/test-request-rate-limits.sh (3)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to test/scripts/security/** : Security tests should validate rate limiting and size restrictions (see /test/scripts/security/)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

internal/adapter/unifier/model_builder_test.go (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

test/scripts/security/test-request-size-limits.sh (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to test/scripts/security/** : Security tests should validate rate limiting and size restrictions (see /test/scripts/security/)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

internal/app/handlers/handler_common_test.go (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

test/scripts/load/test-load-limits.sh (1)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Proxy engine and load balancer strategy are configured in the proxy section of config.yaml

docs/api/provider-routing.md (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Proxy engine and load balancer strategy are configured in the proxy section of config.yaml

test/integration/profile_routing_test.go (5)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

internal/app/handlers/handler_provider_test.go (5)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

internal/adapter/registry/profile/loader.go (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use object pooling to reduce GC pressure and larger buffers for streaming

internal/app/handlers/handler_provider_ollama.go (1)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

readme.md (6)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to config.yaml : Proxy engine and load balancer strategy are configured in the proxy section of config.yaml

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use 64KB default buffer size, while Sherpa uses 8KB

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Use the priority balancer as the recommended load balancing strategy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use object pooling to reduce GC pressure and larger buffers for streaming

internal/app/handlers/handler_provider_models_test.go (4)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Unit tests should test individual components in isolation

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Integration tests should test full request flow through the proxy

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/*_test.go : Shared proxy tests should ensure compatibility between both proxy engines

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to **/*_test.go : Benchmark tests should measure performance of critical paths, proxy engine comparisons, connection pooling efficiency, and circuit breaker behavior

internal/app/handlers/handler_provider_common.go (2)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Olla engine should use per-endpoint connection pooling, circuit breakers, and object pooling

internal/app/handlers/server_routes.go (1)

Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-26T12:37:55.605Z
Learning: Applies to internal/adapter/proxy/**/*.go : Sherpa engine should be a simple, maintainable proxy for moderate traffic

🧬 Code Graph Analysis (15)

internal/app/handlers/handler_proxy.go (1)

internal/app/handlers/handler_common.go (1)

NormaliseProviderType (14-25)

internal/core/ports/proxy.go (1)

internal/adapter/unifier/default_unifier.go (1)

Model (12-21)

internal/adapter/proxy/proxy_sherpa.go (1)

internal/adapter/unifier/default_unifier.go (1)

Model (12-21)

internal/app/handlers/application.go (1)

internal/adapter/registry/profile/factory.go (1)

ProfileFactory (11-17)

internal/app/handlers/handler_provider_compatibility_test.go (2)

internal/app/handlers/application.go (1)

Application (61-77)

internal/app/handlers/handler_common.go (1)

NormaliseProviderType (14-25)

internal/adapter/unifier/model_builder_test.go (1)

internal/adapter/unifier/model_builder.go (1)

ModelExtractor (144-144)

internal/app/handlers/handler_provider_openai.go (2)

internal/app/handlers/application.go (1)

Application (61-77)

internal/app/handlers/server.go (2)

ContentTypeHeader (14-14)

ContentTypeJSON (12-12)

internal/app/handlers/handler_provider_lmstudio.go (2)

internal/app/handlers/application.go (1)

Application (61-77)

internal/app/handlers/server.go (2)

ContentTypeHeader (14-14)

ContentTypeJSON (12-12)

test/integration/profile_routing_test.go (1)

internal/adapter/registry/profile/factory.go (1)

NewFactoryWithDefaults (45-47)

internal/app/handlers/handler_provider_test.go (1)

internal/core/constants/endpoint.go (1)

DefaultOllaProxyPathPrefix (5-5)

internal/app/handlers/handler_common.go (4)

internal/core/constants/providers.go (6)

ProviderPrefixLMStudio1 (17-17)

ProviderPrefixLMStudio2 (18-18)

ProviderTypeLMStudio (5-5)

ProviderTypeOllama (4-4)

ProviderTypeOpenAI (6-6)

ProviderTypeVLLM (8-8)

internal/core/constants/endpoint.go (2)

DefaultOllaProxyPathPrefix (5-5)

DefaultPathPrefix (6-6)

internal/app/handlers/application.go (1)

Application (61-77)

internal/core/constants/context.go (1)

OriginalPathKey (7-7)

internal/app/handlers/handler_provider_ollama.go (2)

internal/app/handlers/application.go (1)

Application (61-77)

internal/app/handlers/server.go (2)

ContentTypeHeader (14-14)

ContentTypeJSON (12-12)

test/scripts/logic/test-provider-routing.sh (2)

test/scripts/logic/test-model-routing.sh (3)

banner (65-70)

show_summary (388-451)

main (455-492)

test/scripts/logic/test-provider-models.sh (4)

banner (32-52)

test_endpoint (67-168)

show_summary (289-310)

main (313-343)

internal/app/handlers/handler_provider_common.go (10)

internal/app/handlers/application.go (1)

Application (61-77)

internal/core/domain/routing.go (2)

RequestProfile (20-34)

NewRequestProfile (36-42)

internal/app/handlers/handler_common.go (1)

NormaliseProviderType (14-25)

internal/core/constants/providers.go (3)

ProviderTypeOpenAI (6-6)

ProviderTypeOpenAICompat (7-7)

ProviderTypeVLLM (8-8)

internal/core/domain/profile.go (1)

ProfileOpenAICompatible (6-6)

internal/core/constants/context.go (1)

ProxyPathPrefix (4-4)

internal/core/domain/endpoint.go (1)

Endpoint (21-40)

internal/core/domain/unified_model.go (1)

UnifiedModel (15-31)

internal/adapter/registry/unified_memory_registry.go (1)

UnifiedMemoryModelRegistry (18-25)

internal/core/ports/model_converter.go (1)

ModelFilters (18-23)

internal/app/handlers/server_routes.go (4)

internal/app/handlers/application.go (1)

Application (61-77)

internal/core/constants/endpoint.go (3)

DefaultHealthCheckEndpoint (4-4)

DefaultOllaProxyPathPrefix (5-5)

DefaultPathPrefix (6-6)

internal/core/domain/profile_config.go (1)

ProfileConfig (8-77)

internal/core/constants/providers.go (10)

ProviderTypeOllama (4-4)

ProviderTypeLMStudio (5-5)

ProviderTypeOpenAICompat (7-7)

ProviderTypeOpenAI (6-6)

ProviderPrefixLMStudio2 (18-18)

ProviderPrefixLMStudio1 (17-17)

ProviderPrefixLMStudio3 (19-19)

ProviderTypeVLLM (8-8)

ProviderDisplayLMStudio (12-12)

ProviderDisplayOllama (11-11)

🪛 LanguageTool

test/scripts/logic/README.md

[uncategorized] ~125-~125: When ‘Test-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ...coded results

Flexible Testing - Test specific providers or all providers
**Configu...

(SPECIFIC_HYPHEN)

[style] ~168-~168: ‘with success’ might be wordy. Consider a shorter alternative.
Context: ...onse times

Endpoint usage statistics with success/failure breakdown
Overall success ra...

(EN_WORDINESS_PREMIUM_WITH_SUCCESS)

docs/api/README.md

[grammar] ~87-~87: A verb may be missing.
Context: ... - 404 - Not found (model or endpoint not available)

429 - Too many requests...

(NN_NOT_JJ)

docs/api/provider-routing.md

[grammar] ~163-~163: A determiner may be missing.
Context: ...d load balancing strategy (round-robin, least connections, priority) is applied. This...

(THE_SUPERLATIVE)

docs/adding-providers.md

[uncategorized] ~59-~59: The preposition ‘to’ seems more likely in this position.
Context: ... will automatically:

Register routes for /olla/vllm/
Create model discovery...

(AI_HYDRA_LEO_REPLACE_FOR_TO)

[uncategorized] ~60-~60: The preposition ‘to’ seems more likely in this position.
Context: ...m/`

Create model discovery endpoints at /olla/vllm/v1/models
Set up the p...

(AI_HYDRA_LEO_REPLACE_AT_TO)

[style] ~112-~112: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 2019 characters long)
Context: ... priority: 90


3. Run Olla - done!

## Benefits

- **No Code Changes**...

(EN_EXCESSIVE_EXCLAMATION)

</details>
<details>
<summary>readme.md</summary>

[grammar] ~366-~366: A determiner may be missing.
Context: ...ight strategy for your setup:

### 📊 Least Connections (`least-connections`) - **R...

(THE_SUPERLATIVE)

---

[grammar] ~367-~367: A determiner may be missing.
Context: ...ommended**
Routes to the endpoint with least active requests. Ideal for:
- **Mixed ...

(THE_SUPERLATIVE)

</details>

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

<details>
<summary>test/scripts/logic/README.md</summary>

172-172: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>
<details>
<summary>docs/api/provider-routing.md</summary>

24-24: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

69-69: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

127-127: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

138-138: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

148-148: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

155-155: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>
<details>
<summary>🪛 YAMLlint (1.37.1)</summary>

<details>
<summary>config/profiles/vllm.yaml</summary>

[error] 1-1: wrong new line character: expected \n

(new-lines)

---

[error] 63-63: no new line character at the end of file

(new-line-at-end-of-file)

</details>

</details>
<details>
<summary>🪛 Shellcheck (0.10.0)</summary>

<details>
<summary>test/scripts/logic/test-provider-models.sh</summary>

[warning] 89-89: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 103-103: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 113-113: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 123-123: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 139-139: Quote this to prevent word splitting.

(SC2046)

---

[warning] 191-191: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 204-204: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 211-211: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 218-218: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 225-225: Declare and assign separately to avoid masking return values.

(SC2155)

</details>
<details>
<summary>test/scripts/logic/test-provider-routing.sh</summary>

[warning] 12-12: BLUE appears unused. Verify use (or export if used externally).

(SC2034)

---

[warning] 58-58: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 188-188: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 189-189: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 193-193: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 194-194: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 198-198: Declare and assign separately to avoid masking return values.

(SC2155)

---

[warning] 199-199: Declare and assign separately to avoid masking return values.

(SC2155)

</details>

</details>
<details>
<summary>🪛 Ruff (0.12.2)</summary>

<details>
<summary>test/scripts/logic/test-model-routing-provider.py</summary>

8-8: `json` imported but unused

Remove unused import: `json`

(F401)

---

12-12: `typing.Dict` imported but unused

Remove unused import: `typing.Dict`

(F401)

---

150-150: Loop control variable `i` not used within loop body

Rename unused `i` to `_i`

(B007)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (85)</summary><blockquote>

<details>
<summary>test/scripts/logic/.gitignore (1)</summary>

`1-2`: **Good inclusion of local artefact ignores**

Ignoring the virtual-env and temp directories will keep the repo clean and CI runs deterministic.

</details>
<details>
<summary>config/profiles/ollama.yaml (1)</summary>

`7-10`: **Routing prefix entry looks correct**

The `routing.prefixes` section aligns with the new provider-specific router and uses a lower-case slug, matching the normalisation logic in the profile factory.

</details>
<details>
<summary>internal/core/ports/proxy.go (1)</summary>

`46-53`: **Addition of `Model` field looks good – verify all producers/consumers set it**

The extra `Model string` field is placed sensibly after `RequestID`, keeping the two most-frequent string values adjacent and avoiding any regressive padding issues.  

Just ensure every path that instantiates `RequestStats` (handlers, tests, etc.) populates the new field – otherwise log output will show an empty `model` attribute, reducing the value of the change.

</details>
<details>
<summary>config/config.yaml (1)</summary>

`35-35`: **Config key rename requires code-side support**

`load_balancer` value has moved from `"least_conn"` → `"least-connections"`. Please confirm the configuration loader accepts the new hyphenated string; otherwise the proxy will silently fall back to its default strategy.
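
One way to avoid the silent fallback described above is to validate the strategy string when the configuration is loaded. The following is a minimal sketch, not Olla's actual loader: `parseStrategy` and `knownStrategies` are hypothetical names, and the accepted values are the hyphenated strategy names mentioned in this PR's documentation.

```go
package main

import (
	"fmt"
)

// knownStrategies lists the hyphenated strategy names referenced in this PR.
// The map itself is an illustrative assumption, not the project's code.
var knownStrategies = map[string]bool{
	"round-robin":       true,
	"least-connections": true,
	"priority":          true,
}

// parseStrategy is a hypothetical validator. It returns an error for
// unrecognised values instead of quietly falling back to a default.
func parseStrategy(value string) (string, error) {
	if knownStrategies[value] {
		return value, nil
	}
	return "", fmt.Errorf("unknown load_balancer strategy %q", value)
}

func main() {
	for _, v := range []string{"least-connections", "least_conn"} {
		strategy, err := parseStrategy(v)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", strategy)
	}
}
```

With a check like this, an old `"least_conn"` value would surface as a startup error rather than a change in balancing behaviour.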

</details>
<details>
<summary>internal/adapter/registry/profile/loader.go (3)</summary>

`116-116`: **LGTM! Clean routing prefix assignment for Ollama profile.**

The explicit assignment of routing prefixes aligns perfectly with the new provider-specific routing architecture. This enables requests to be routed to `/olla/ollama/*` endpoints.

---

`214-214`: **Excellent support for multiple alias variations.**

The LM Studio profile correctly supports multiple routing prefix variations (`lmstudio`, `lm-studio`, `lm_studio`), which provides flexibility for users and maintains compatibility with different naming conventions.

---

`264-264`: **Appropriate dual prefix support for OpenAI compatibility.**

The OpenAI-compatible profile includes both `openai` and `openai-compatible` prefixes, which correctly reflects the dual nature of this provider type.

</details>
<details>
<summary>config/profiles/lmstudio.yaml (1)</summary>

`7-12`: **Well-structured routing configuration with comprehensive prefix coverage.**

The routing section properly defines all LM Studio naming variations, ensuring users can access the provider through multiple intuitive URL patterns (`/olla/lmstudio/*`, `/olla/lm-studio/*`, `/olla/lm_studio/*`).

</details>
<details>
<summary>config/profiles/openai.yaml (1)</summary>

`7-11`: **Clear and logical routing prefix configuration.**

The dual prefix approach (`openai` and `openai-compatible`) provides both concise and descriptive routing options, enhancing user experience whilst maintaining clarity about the provider's compatibility layer.

</details>
<details>
<summary>internal/adapter/unifier/default_unifier.go (2)</summary>

`148-148`: **Enhanced platform detection with endpoint type context.**

The addition of `endpoint.Type` as a third parameter to `DetectPlatform` provides better context for platform detection logic, improving accuracy in provider identification.

---

`221-221`: **Consistent parameter addition for platform detection.**

The method call correctly includes the endpoint type parameter, maintaining consistency with the enhanced `DetectPlatform` signature throughout the codebase.

</details>
<details>
<summary>internal/app/handlers/handler_proxy.go (2)</summary>

`96-96`: **Approve model tracking enhancement in request stats.**

This change properly sets the `Model` field in request stats when a profile contains a model name, which will improve observability and logging throughout the proxy request lifecycle. The field assignment is appropriately guarded by the existing nil checks.

---

`210-212`: **Provider type normalisation ensures consistent compatibility checks.**

The addition of provider type normalisation before profile compatibility checking is a solid improvement. This handles provider name variations (e.g., "lmstudio" → "lm-studio") and ensures consistent matching with profile configurations. The implementation correctly uses the `NormaliseProviderType` function from `handler_common.go` as shown in the relevant code snippets.

</details>
<details>
<summary>internal/app/handlers/application.go (2)</summary>

`73-73`: **Proper integration of profile factory into application struct.**

The addition of the `profileFactory` field correctly follows the existing struct field alignment and uses the appropriate interface type as defined in the relevant code snippets.

---

`141-141`: **Profile factory properly initialised in constructor.**

The profile factory is correctly assigned during application initialisation, maintaining consistency with the existing dependency injection pattern used throughout the constructor.

</details>
<details>
<summary>test/scripts/security/test-request-rate-limits.sh (1)</summary>

`17-18`: **Dynamic provider routing properly implemented for security testing.**

The addition of the `PROVIDER` environment variable with sensible default and the dynamic `PROXY_ENDPOINT` correctly adapts the security test to the new provider-specific routing architecture. This maintains backward compatibility whilst enabling testing across different provider endpoints (ollama, lmstudio, openai, vllm).

</details>
<details>
<summary>test/scripts/security/test-request-size-limits.sh (1)</summary>

`20-21`: **Consistent provider routing implementation in size limit testing.**

The implementation correctly follows the same pattern as the rate limit test script, using the `PROVIDER` environment variable with appropriate default and updating the endpoint to match the new provider-specific routing scheme. This ensures consistent testing across all security validation scripts.

</details>
<details>
<summary>test/scripts/load/test-load-chaos.sh (1)</summary>

`50-52`: **LGTM! Provider-specific routing implementation looks good.**

The introduction of the `PROVIDER` environment variable and dynamic proxy path construction aligns perfectly with the PR's provider-specific routing architecture. This change enables testing different backend providers whilst maintaining backward compatibility.

</details>
<details>
<summary>internal/adapter/unifier/model_builder.go (2)</summary>

`219-220`: **Enhanced platform detection with explicit endpoint type support.**

The method signature change adds valuable functionality by allowing explicit platform specification via `endpointType` parameter, which supports the provider-specific routing enhancements in this PR.

---

`234-241`: **Well-implemented endpoint type normalisation.**

The normalisation logic properly handles common provider naming variations (e.g., "lm-studio" → "lmstudio") by removing hyphens and underscores whilst converting to lowercase. This ensures consistent platform identification across different naming conventions.
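
The normalisation described above (lowercase, then drop hyphens and underscores) can be sketched as follows. `normaliseEndpointType` is a hypothetical name used for illustration; the real logic lives inside `DetectPlatform` in `model_builder.go` and may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// normaliseEndpointType is an illustrative helper, not the project's code.
// It lowercases the value and removes hyphens and underscores so that
// "lm-studio", "LM_Studio" and "lmstudio" all compare equal.
func normaliseEndpointType(endpointType string) string {
	replacer := strings.NewReplacer("-", "", "_", "")
	return replacer.Replace(strings.ToLower(endpointType))
}

func main() {
	for _, in := range []string{"lm-studio", "LM_Studio", "lmstudio", "Ollama"} {
		fmt.Printf("%q -> %q\n", in, normaliseEndpointType(in))
	}
}
```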

</details>
<details>
<summary>internal/util/request.go (2)</summary>

`67-72`: **Good refactoring to improve code modularity.**

The delegation to the new `StripPrefix` helper function maintains the original behaviour whilst improving code organisation and reusability.

---

`74-83`: **Well-implemented prefix stripping logic.**

The extracted helper function correctly handles edge cases:
- Checks if path starts with the prefix
- Ensures the resulting path starts with "/" if necessary
- Returns the original path if no prefix match

This modular approach supports the provider-specific routing changes throughout the PR.
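
The three cases listed above can be sketched like this. The lowercase `stripPrefix` here is an illustrative stand-in written from the review's description, not a copy of the `StripPrefix` helper in `internal/util/request.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPrefix is an illustrative stand-in for the helper described above.
func stripPrefix(path, prefix string) string {
	if !strings.HasPrefix(path, prefix) {
		return path // no match: return the original path untouched
	}
	stripped := strings.TrimPrefix(path, prefix)
	if !strings.HasPrefix(stripped, "/") {
		stripped = "/" + stripped // keep the result rooted at "/"
	}
	return stripped
}

func main() {
	fmt.Println(stripPrefix("/olla/ollama/api/tags", "/olla/ollama"))
	fmt.Println(stripPrefix("/olla/ollama/", "/olla/ollama/"))
	fmt.Println(stripPrefix("/internal/health", "/olla/ollama"))
}
```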

</details>
<details>
<summary>test/scripts/logic/test-model-routing.sh (2)</summary>

`33-35`: **Provider-specific routing correctly implemented.**

The introduction of the `PROVIDER` variable with a sensible default and dynamic proxy endpoint construction aligns perfectly with the PR's routing architecture changes.

---

`77-98`: **Excellent documentation updates.**

The usage instructions are comprehensive and clearly document:
- The new PROVIDER variable and its possible values
- Practical examples for different providers
- Clear explanations of the functionality

This will help users understand and test the new provider-specific routing features.

</details>
<details>
<summary>test/scripts/load/test-load-limits.sh (2)</summary>

`36-37`: **Consistent provider-specific routing implementation.**

The `PROVIDER` variable introduction and dynamic proxy endpoint construction matches the pattern established in other test scripts and supports the PR's provider-specific routing architecture.

---

`88-90`: **Clear documentation for new functionality.**

The environment variables section properly documents the new `PROVIDER` variable with its default value and possible options, helping users understand how to test different providers.

</details>
<details>
<summary>internal/adapter/registry/profile/factory_test.go (1)</summary>

`77-117`: **Comprehensive test coverage for routing prefix validation.**

The test thoroughly covers the new routing prefix functionality with good test case organisation. The table-driven approach tests direct profile names, routing prefixes (including LM Studio variations), the auto profile, and edge cases like unknown providers.

</details>
<details>
<summary>internal/core/constants/providers.go (1)</summary>

`1-21`: **Well-organised provider constants with clear naming conventions.**

The constants are properly grouped and follow Go naming conventions. The multiple LM Studio prefix variations support the alias resolution system effectively.

</details>
<details>
<summary>internal/app/handlers/handler_provider_compatibility_test.go (1)</summary>

`14-121`: **Comprehensive compatibility test coverage.**

The test cases thoroughly cover the provider compatibility matrix, including edge cases for unknown providers and various provider/endpoint type combinations.

</details>
<details>
<summary>internal/adapter/unifier/model_builder_test.go (1)</summary>

`7-90`: **Thorough test coverage for platform detection logic.**

The test comprehensively covers platform detection scenarios including metadata hints, version keys, endpoint type normalisation, and fallback behaviour. The table-driven approach ensures good test organisation and coverage.

</details>
<details>
<summary>internal/app/handlers/handler_provider_openai.go (2)</summary>

`8-30`: **Clean implementation following established handler patterns.**

The handler properly implements the OpenAI-compatible model listing endpoint with appropriate error handling, content type headers, and response encoding. The documentation clearly explains the purpose for local inference servers.

---

`32-34`: **Good explanatory comment about OpenAI API standardisation.**

This comment provides valuable context about why OpenAI API compatibility is important in the local inference ecosystem.

</details>
<details>
<summary>docs/user-guide.md (4)</summary>

`157-167`: **LGTM! Provider-specific routing examples are clear and consistent.**

The updated Python examples correctly demonstrate the new provider-specific base URLs, providing clear examples for both Ollama and LM Studio backends.

---

`189-199`: **LGTM! JavaScript examples align with the new routing architecture.**

The JavaScript examples properly demonstrate the provider-specific routing for both Ollama and OpenAI-compatible backends.

---

`214-235`: **LGTM! cURL examples demonstrate provider-specific endpoints effectively.**

The cURL examples showcase the new provider-specific routing for both Ollama and LM Studio, with appropriate model filtering examples.

---

`240-253`: **LGTM! Ollama Native API section correctly reflects the new routing.**

The section rename from "Ollama Compatibility" to "Ollama Native API" is appropriate and the examples correctly use the provider-specific Ollama endpoints.

</details>
<details>
<summary>internal/app/handlers/handler_provider_lmstudio.go (2)</summary>

`8-29`: **LGTM! Clean and consistent handler implementation.**

The `lmstudioOpenAIModelsHandler` follows good patterns with proper error handling, content type setting, and JSON encoding. The use of shared helper methods promotes code reuse.

---

`31-54`: **LGTM! Enhanced models handler is well-implemented.**

The `lmstudioEnhancedModelsHandler` correctly uses the "lmstudio" format converter for enhanced metadata and follows the same consistent error handling pattern.

</details>
<details>
<summary>config/profiles/vllm.yaml (2)</summary>

`8-11`: **LGTM! Routing configuration is correctly defined.**

The routing configuration with the "vllm" prefix aligns well with the provider-specific routing architecture.

---

`13-22`: **LGTM! API compatibility settings are comprehensive.**

The API configuration correctly specifies OpenAI compatibility and includes all necessary endpoints for vLLM functionality.

</details>
<details>
<summary>internal/app/handlers/handler_provider_generic.go (2)</summary>

`8-34`: **LGTM! Generic models handler is well-implemented.**

The `genericProviderModelsHandler` follows good patterns with proper provider normalisation, error handling, and consistent response formatting. The use of a closure to return an `http.HandlerFunc` is appropriate for this use case.

---

`36-71`: **LGTM! Model show handler has proper validation and error handling.**

The `genericModelShowHandler` correctly validates the HTTP method, parses JSON input safely, validates required fields, and provides appropriate error responses with correct status codes.
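
The validation order described above (method check, JSON parse, required-field check) might look roughly like the sketch below. The `showRequest` type, the `model` field name, the choice of POST as the allowed method, and the request path in `statusFor` are all assumptions made for illustration; the review does not state them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// showRequest is a hypothetical request body; the field name is an assumption.
type showRequest struct {
	Model string `json:"model"`
}

// modelShowHandler sketches the method, JSON and required-field checks.
func modelShowHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req showRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	if req.Model == "" {
		http.Error(w, "model is required", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(map[string]string{"model": req.Model})
}

// statusFor runs the handler against an in-memory request and returns the status.
func statusFor(method, body string) int {
	req := httptest.NewRequest(method, "/olla/openai/api/show", strings.NewReader(body))
	rec := httptest.NewRecorder()
	modelShowHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet, ""))
	fmt.Println(statusFor(http.MethodPost, "{"))
	fmt.Println(statusFor(http.MethodPost, `{}`))
	fmt.Println(statusFor(http.MethodPost, `{"model":"example-model"}`))
}
```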

</details>
<details>
<summary>docs/api/README.md (3)</summary>

`1-96`: **LGTM! Comprehensive and well-structured API documentation.**

The API reference provides excellent coverage of all endpoints, features, and functionality. The organisation is logical and the content accurately reflects the new provider-specific routing architecture.

---

`11-28`: **LGTM! Provider-specific routing documentation is clear and complete.**

The provider-specific routing section clearly explains the new namespace-based approach and lists all relevant endpoints for each provider type.

---

`84-91`: **Grammar is correct - static analysis false positive.**

The sentence structure "Not found (model or endpoint not available)" is grammatically correct in the context of HTTP status code descriptions. The static analysis tool has flagged a false positive.

</details>
<details>
<summary>test/integration/profile_routing_test.go (1)</summary>

`56-90`: **Test structure follows good patterns**

The test comprehensively covers valid and invalid provider scenarios, including routing prefix variations. The test data structure and assertions are clear and maintainable.

</details>
<details>
<summary>internal/app/handlers/handler_provider_test.go (1)</summary>

`96-149`: **Path stripping tests are well-structured**

The path stripping logic tests cover important edge cases including root paths and trailing slashes. The test implementation correctly mirrors the expected path manipulation logic.

</details>
<details>
<summary>readme.md (2)</summary>

`186-186`: **Load balancer naming standardisation looks good**

The change from underscore-separated to hyphen-separated load balancer strategy names (e.g., "round_robin" → "round-robin") improves naming consistency across the configuration.

---

`398-456`: **Provider-specific routing documentation is comprehensive**

The new provider-specific routing examples clearly demonstrate the namespace-based approach (`/olla/ollama/*`, `/olla/lmstudio/*`, etc.) and provide practical code examples for both Python OpenAI clients and curl commands.

</details>
<details>
<summary>docs/adding-providers.md (1)</summary>

`1-63`: **Excellent documentation for dynamic provider system**

The guide clearly explains how the new dynamic route registration system works and provides complete, practical examples. The step-by-step process makes it easy for users to add new providers without code changes.

</details>
<details>
<summary>internal/app/handlers/handler_common.go (5)</summary>

`14-25`: **Provider normalisation logic is well-designed**

The normalisation function correctly handles LM Studio's multiple naming variants and uses constants for consistency. The special case handling ensures all variations map to the canonical form.

---

`29-51`: **Path extraction logic handles edge cases well**

The function correctly handles various URL formats including paths without trailing segments. The use of constants and normalisation ensures consistency with the rest of the system.

---

`55-66`: **Good practice preserving original path in context**

Storing the original path in the request context before modification is excellent for debugging and logging purposes. The handling of both `Path` and `RawPath` is thorough.
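
As a rough illustration of the pattern praised above, the sketch below stores the incoming path in the request context before rewriting `Path` and `RawPath`. The key name `originalPathKey` and the helper names are hypothetical; the actual function in `handler_common.go` may be structured differently.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type contextKey string

// originalPathKey is a hypothetical context key used only in this sketch.
const originalPathKey contextKey = "original-path"

// trim removes the prefix and keeps the remainder rooted at "/".
func trim(path, prefix string) string {
	rest := strings.TrimPrefix(path, prefix)
	if !strings.HasPrefix(rest, "/") {
		rest = "/" + rest
	}
	return rest
}

// rewritePath records the original path in the context, then strips the
// provider prefix from both Path and RawPath on a cloned request.
func rewritePath(r *http.Request, prefix string) *http.Request {
	ctx := context.WithValue(r.Context(), originalPathKey, r.URL.Path)
	out := r.Clone(ctx) // Clone deep-copies the URL, so the caller's request is untouched
	out.URL.Path = trim(out.URL.Path, prefix)
	if out.URL.RawPath != "" {
		out.URL.RawPath = trim(out.URL.RawPath, prefix)
	}
	return out
}

// demoPaths returns the preserved original path and the rewritten path.
func demoPaths() (original, rewritten string) {
	req := httptest.NewRequest(http.MethodGet, "/olla/ollama/api/tags", nil)
	out := rewritePath(req, "/olla/ollama")
	original, _ = out.Context().Value(originalPathKey).(string)
	return original, out.URL.Path
}

func main() {
	original, rewritten := demoPaths()
	fmt.Println("original:", original)
	fmt.Println("rewritten:", rewritten)
}
```

Downstream logging can then read the original path from the context even though the backend only sees the stripped one.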

---

`69-88`: **Provider validation with appropriate fallback**

The validation logic correctly uses the profile factory when available and falls back to static validation for tests. The fallback list is consistent with the constants, ensuring reliability.

---

`91-95`: **Simple and effective prefix generation**

The function correctly constructs provider prefixes whilst preserving the original provider name format for compatibility.

</details>
<details>
<summary>internal/app/handlers/handler_provider_models_test.go (3)</summary>

`1-18`: **LGTM: Clean import structure**

The imports are well-organised and follow proper Go conventions for testing. All dependencies are necessary and correctly scoped to internal packages.

---

`98-216`: **Excellent comprehensive test coverage**

The table-driven test approach effectively covers all provider-specific model endpoints with proper format validation. The test cases clearly verify:
- Ollama native and OpenAI-compatible formats
- LM Studio OpenAI and enhanced formats  
- OpenAI provider aggregation across all endpoints
- vLLM OpenAI-compatible format

Each test properly validates JSON structure and model counts.

---

`218-436`: **Thorough provider filtering and format validation tests**

Both `TestProviderModelFiltering` and `TestUnifiedModelsFormatFiltering` provide essential coverage for:
- Provider isolation ensuring endpoints only return models from their specific provider type
- Format parameter filtering on unified endpoints
- Correct JSON structure validation for each format type

The tests properly validate that provider-specific endpoints maintain isolation whilst the unified endpoint supports multiple output formats with appropriate filtering.

</details>
<details>
<summary>docs/api/provider-routing.md (1)</summary>

`1-274`: **Comprehensive and well-structured documentation**

This documentation effectively explains the new provider-specific routing architecture with:
- Clear architectural overview with explicit URL patterns
- Detailed endpoint specifications with response format examples
- Practical configuration examples
- Comprehensive use cases demonstrating real-world applications
- Proper distinction between intercepted and proxied endpoints

The content aligns well with the implementation and provides excellent guidance for users migrating to the new routing system.

</details>
<details>
<summary>test/scripts/logic/test-provider-models.sh (2)</summary>

`20-96`: **Well-structured and comprehensive test script**

The script demonstrates excellent design with:
- Clear separation of concerns with dedicated functions for different test types
- Comprehensive provider coverage (Ollama, LM Studio, OpenAI, vLLM)
- Proper JSON format validation for each provider's expected response structure
- Good error handling with descriptive messages and appropriate HTTP status code interpretation
- Professional colour-coded output with progress tracking

The main execution flow properly validates prerequisites and provides helpful usage information.



Also applies to: 247-344

---

`67-168`: **Robust test validation logic**

Both `test_endpoint` and `test_unified_endpoint` functions implement solid validation:
- Appropriate format-specific JSON structure validation
- Correct model counting using grep pattern matching
- Proper handling of different response formats (Ollama, OpenAI, LM Studio)
- Good error reporting with format-specific error messages
- Support for verbose output showing response samples

The unified endpoint testing properly validates format parameter filtering behaviour.



Also applies to: 170-245

</details>
<details>
<summary>test/scripts/logic/test-provider-routing.sh (2)</summary>

`108-180`: **Excellent proxy routing validation logic**

The `test_proxy_routing` function implements comprehensive routing validation:
- Proper POST request testing with realistic JSON payload
- Header validation for routing confirmation (`X-Olla-Endpoint`, `X-Olla-Backend-Type`)
- Smart provider name normalisation handling (e.g., `lm-studio` vs `lmstudio`)
- Appropriate HTTP status code interpretation (404 for no providers, 2xx/4xx/5xx for processed requests)
- Clean temporary file handling for response headers

This thoroughly validates that the provider-specific routing works correctly.

---

`235-293`: **Well-structured script with proper execution flow**

The main function demonstrates good bash scripting practices:
- Comprehensive help text with usage examples
- Proper prerequisite validation (curl availability, Olla health check)
- Logical test execution sequence (model endpoints, then proxy routing, then comparison)
- Clean summary reporting with coloured output

The script provides a thorough validation of the provider-specific routing functionality.

</details>
<details>
<summary>test/scripts/logic/README.md (1)</summary>

`1-259`: **Comprehensive and well-organised test documentation**

This README provides excellent coverage of the logic test scripts with:
- Clear overview table summarising each script's purpose and key features
- Detailed sections for each script with practical usage examples
- Comprehensive endpoint coverage documentation showing what each script tests
- Proper documentation of environment variables and requirements
- Helpful guidance for running all tests sequentially including Python dependencies
- Standard exit code conventions for CI/CD integration

The documentation accurately reflects the implemented test scripts and provides valuable guidance for users and developers.

</details>
<details>
<summary>internal/adapter/registry/profile/factory.go (5)</summary>

`11-23`: **LGTM! Well-structured interface extension and thread-safe implementation.**

The addition of `NormalizeProviderName` to the interface and the `prefixLookup` map with mutex protection demonstrates good design for concurrent access patterns.

---

`25-43`: **Good optimisation with pre-computed prefix mappings.**

Pre-computing the prefix lookup table at factory creation enables O(1) provider resolution, which is an excellent performance optimisation.

---

`109-126`: **Well-designed validation with prefix lookup fallback.**

The two-stage validation approach (prefix lookup followed by exact match) provides good flexibility for handling provider name variations while maintaining backward compatibility.

---

`128-141`: **Clean implementation of provider name normalisation.**

The method correctly uses the prefix lookup for alias resolution and sensibly returns unknown names unchanged, allowing for future extensibility.

---

`143-162`: **Efficient routing table construction from profiles.**

Good implementation that extracts routing prefixes from YAML configs and includes profile names as implicit prefixes for convenience.
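
The comments above on `factory.go` describe a prefix table built once at construction, guarded by a mutex, and used for alias resolution. A minimal sketch of that idea follows. The `profile` and `factory` types, the lowercasing of keys, and the choice of `lm-studio` as the canonical name are assumptions for illustration; only `prefixLookup` and `NormalizeProviderName` are names taken from the review.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// profile is a hypothetical, trimmed-down stand-in for a loaded YAML profile.
type profile struct {
	Name     string
	Prefixes []string
}

// factory holds a prefix-to-profile table that is computed once.
type factory struct {
	mu           sync.RWMutex
	prefixLookup map[string]string
}

// newFactory pre-computes the lookup so later resolution is a single map read.
func newFactory(profiles []profile) *factory {
	lookup := make(map[string]string)
	for _, p := range profiles {
		lookup[strings.ToLower(p.Name)] = p.Name // profile name is an implicit prefix
		for _, prefix := range p.Prefixes {
			lookup[strings.ToLower(prefix)] = p.Name
		}
	}
	return &factory{prefixLookup: lookup}
}

// NormalizeProviderName resolves an alias to its profile name and
// returns unknown names unchanged.
func (f *factory) NormalizeProviderName(name string) string {
	f.mu.RLock()
	defer f.mu.RUnlock()
	if canonical, ok := f.prefixLookup[strings.ToLower(name)]; ok {
		return canonical
	}
	return name
}

// demoFactory builds a factory from two illustrative profiles.
func demoFactory() *factory {
	return newFactory([]profile{
		{Name: "ollama", Prefixes: []string{"ollama"}},
		{Name: "lm-studio", Prefixes: []string{"lmstudio", "lm-studio", "lm_studio"}},
	})
}

func main() {
	f := demoFactory()
	for _, name := range []string{"lmstudio", "lm_studio", "ollama", "unknown"} {
		fmt.Printf("%s -> %s\n", name, f.NormalizeProviderName(name))
	}
}
```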

</details>
<details>
<summary>internal/app/handlers/handler_common_test.go (4)</summary>

`11-49`: **Well-structured mock implementation for testing.**

The mock ProfileFactory provides appropriate test doubles for the interface methods, with simplified but representative normalisation logic.

---

`51-79`: **Comprehensive test coverage for provider normalisation.**

Excellent table-driven test with thorough coverage of edge cases including various lmstudio variants, case sensitivity, and other provider types.

---

`81-108`: **Thorough testing of provider path extraction.**

Well-designed test cases covering successful extraction, normalisation, edge cases, and error conditions.

---

`110-164`: **Excellent dual-scenario testing for provider support.**

The test effectively covers both fallback behaviour (without factory) and factory-based validation, ensuring the method works correctly in different deployment scenarios.

</details>
<details>
<summary>internal/app/handlers/handler_provider_ollama.go (3)</summary>

`30-39`: **Appropriate handling of unsupported operation with clear documentation.**

Good decision to return 501 with clear comments explaining the challenges of aggregating model details across multiple instances.

---

`41-50`: **Consistent handling of unsupported running models endpoint.**

The implementation correctly returns 501 with clear explanation of the state synchronisation challenges.

---

`75-80`: **Good reusable handler for unsupported operations.**

Clean implementation that clearly communicates the limitation of model management operations in a distributed proxy environment.
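
A reusable "not supported" handler of the kind described above could be sketched as follows. The function name, the error message, and the request path in `demoStatus` are illustrative assumptions; only the 501 status comes from the review.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// unsupportedHandler returns a handler that always answers 501 Not Implemented
// with a short JSON explanation. The name and message are hypothetical.
func unsupportedHandler(operation string) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusNotImplemented)
		_ = json.NewEncoder(w).Encode(map[string]string{
			"error": operation + " is not supported through the proxy",
		})
	}
}

// demoStatus exercises the handler in memory and returns the status code.
func demoStatus() int {
	req := httptest.NewRequest(http.MethodGet, "/olla/ollama/api/ps", nil)
	rec := httptest.NewRecorder()
	unsupportedHandler("listing running models")(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(demoStatus())
}
```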

</details>
<details>
<summary>test/scripts/logic/test-model-routing-provider.py (2)</summary>

`30-371`: **Excellent test implementation with comprehensive coverage.**

The `ProviderTester` class is well-designed with thorough error handling, detailed statistics tracking, and user-friendly coloured output. The test coverage for provider-specific endpoints is comprehensive.

---

`372-421`: **Well-structured test orchestration.**

The main function provides clean argument parsing, sensible defaults, and logical test workflow orchestration.

</details>
<details>
<summary>internal/app/handlers/handler_provider_common.go (5)</summary>

`14-52`: **Well-designed provider profile creation with OpenAI compatibility.**

The implementation correctly handles OpenAI's inclusive routing model and provides appropriate fallbacks for test scenarios. The dynamic inclusion of OpenAI-compatible providers is a thoughtful design choice.

---

`54-105`: **Solid implementation of provider-scoped proxy handler.**

Good use of context for passing provider information through the request lifecycle. Appropriate error handling for invalid paths and unsupported providers.

---

`107-130`: **Efficient endpoint filtering with smart reuse of existing logic.**

Good design that leverages the existing RequestProfile filtering mechanism for provider-specific endpoint selection, with appropriate two-stage filtering.

---

`180-205`: **Well-structured model fetching pipeline.**

Clean implementation with logical flow: unified models → health filtering → provider filtering. Good error handling with descriptive messages.
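
The three-stage flow described above (unified models, then health filtering, then provider filtering) can be sketched with plain slices. The `unifiedModel` type and its fields are simplified assumptions; in the real code health is likely tracked per endpoint rather than per model, and the model names below are placeholders.

```go
package main

import "fmt"

// unifiedModel is a simplified, hypothetical view of a registry entry.
type unifiedModel struct {
	ID       string
	Provider string
	Healthy  bool
}

// keep returns the models for which pred is true, preserving order.
func keep(models []unifiedModel, pred func(unifiedModel) bool) []unifiedModel {
	out := make([]unifiedModel, 0, len(models))
	for _, m := range models {
		if pred(m) {
			out = append(out, m)
		}
	}
	return out
}

// modelsForProvider applies health filtering first, then provider filtering.
func modelsForProvider(all []unifiedModel, provider string) []unifiedModel {
	healthy := keep(all, func(m unifiedModel) bool { return m.Healthy })
	return keep(healthy, func(m unifiedModel) bool { return m.Provider == provider })
}

// sampleModels returns placeholder data for the demonstration.
func sampleModels() []unifiedModel {
	return []unifiedModel{
		{ID: "model-a", Provider: "ollama", Healthy: true},
		{ID: "model-b", Provider: "ollama", Healthy: false},
		{ID: "model-c", Provider: "lmstudio", Healthy: true},
	}
}

func main() {
	for _, m := range modelsForProvider(sampleModels(), "ollama") {
		fmt.Println(m.ID)
	}
}
```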

---

`207-222`: **Clean format conversion using factory pattern.**

Good use of the converter factory for format-specific transformations. The implementation is clean and follows single responsibility principle.

</details>
<details>
<summary>internal/app/handlers/server_routes.go (2)</summary>

`27-49`: **Well-structured route registration with clear separation of concerns.**

The route registration follows a logical hierarchy: internal endpoints → unified models → legacy compatibility → provider-specific routes. The comments effectively document the purpose of each route group.

---

`233-242`: **Clean handling of provider display names.**

The display name formatting provides clear, human-readable route descriptions that will be helpful in logs and debugging. The special handling for LM Studio's multiple prefixes is well implemented.

</details>

</blockquote></details>

</details>


config/profiles/vllm.yaml

internal/app/handlers/handler_provider_compatibility_test.go

internal/app/handlers/handler_provider_test.go

internal/app/handlers/handler_unified_models_test.go

test/scripts/logic/test-provider-routing.sh

thushan added 10 commits July 27, 2025 11:54

first attempt at API redesign

5d79680

update docs

c4f2719

adds two new scripts to check formats and compatibility (Claude created)

docs and scripts updates.

bae025b

adds back sherpa compatibility layer

2ce45e4

consolidates naming for load balancers

refactor and use constants

e5ea155

update scripts; fix tests; support lm_studio variations in API (not finalised yet); update tests; update more tests

trying an alternative method of reducing tracking profile compatibility

724bed1

testing across all endpoints

b7fcfe6

simplify the registration of routes, part one of two.

87cc3ee

simplification part 2

c2191eb

coderabbitai bot reviewed Jul 27, 2025


coderabbit comment feedback.

0103cf8

thushan merged commit d03fe40 into main Jul 27, 2025
3 checks passed

thushan deleted the feature/api-redesign branch July 27, 2025 12:59

coderabbitai bot mentioned this pull request Aug 7, 2025

feat: backend/vllm #44

Merged

This was referenced Aug 23, 2025

feat: filtering adapter #57

Merged

chore: Consolidate Converters #58

Merged

This was referenced Sep 23, 2025

refactor: Proxy Configurations #59

Merged

feat: backend/sglang #69

Merged

coderabbitai bot mentioned this pull request Oct 8, 2025

feat: backend/lemonade #70

Merged

coderabbitai bot mentioned this pull request Oct 20, 2025

feat: Anthropic Message format Support #76

Merged
