feat: backend/litellm #56
Conversation
Walkthrough

Adds native LiteLLM backend support: introduces a new LiteLLM profile, updates discovery endpoints and priorities, and expands documentation and site navigation to include LiteLLM across the API reference, integrations, comparisons, usage, index and README. Minor link, metadata and "Since" table updates across other backend docs.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Olla as Olla Router
    participant Prof as Profile Resolver
    participant HC as Health Monitor
    participant LLM as LiteLLM Gateway
    participant Cloud as Cloud Providers
    Client->>Olla: OpenAI-compatible request (e.g. /v1/chat/completions)
    Olla->>Prof: Resolve backend/profile & capabilities
    Prof-->>Olla: Select litellm endpoint
    HC->>Olla: Provide health/priority status
    Olla->>LLM: Forward OpenAI-style request
    LLM->>Cloud: Translate & route to provider API
    Cloud-->>LLM: Provider response / stream
    LLM-->>Olla: Normalised response / stream
    Olla-->>Client: Return response with X-Olla-* headers
    note over Olla,LLM: If LiteLLM unhealthy, Olla applies fallback per priorities
```
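To make the flow concrete, here is a minimal request through the LiteLLM route as a sketch; the port (40114) and paths follow the examples elsewhere in this PR, and the model name is illustrative:

```bash
# Olla proxies the OpenAI-compatible call to the healthiest LiteLLM endpoint;
# LiteLLM then translates it for the upstream provider.
curl http://localhost:40114/olla/litellm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
# The response carries X-Olla-Endpoint, X-Olla-Backend-Type, X-Olla-Model,
# X-Olla-Request-ID and (as a trailer) X-Olla-Response-Time.
```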
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
docs/content/concepts/profile-system.md (1)
`78-105`: **Verify header ID support and update residual link**

Confirmed that `docs/mkdocs.yml` enables header IDs via the `attr_list` and `toc` extensions (lines 61–68), so the explicit `{#routing-prefixes}` anchor will render correctly. While `pymdownx.headeranchor` isn't configured, the `attr_list` extension suffices for adding custom IDs.

There is one remaining relative link that should be updated to maintain deep-link consistency:

- docs/content/integrations/overview.md (line 33):
  `* [Remove prefixes](../concepts/profile-system.md#routing-prefixes) you don't use`

Please review and update this link (and any others) as needed to point to the new anchor.
docs/content/compare/integration-patterns.md (3)
`64-79`: **Update the Hybrid Cloud example to use the native LiteLLM backend and corrected port.**

The example still shows `type: openai` and port 8000 for LiteLLM, which conflicts with the new native backend guidance and the PR's move to port 4000.

```diff
   - name: cloud-litellm
-    url: http://litellm:8000
-    priority: 10  # Only when local unavailable
-    type: openai
+    url: http://litellm:4000
+    priority: 10  # Only when local unavailable
+    type: litellm
```

Why: using the dedicated `litellm` type enables correct health checks and discovery behaviour; the port aligns with the updated defaults in this PR.
`297-299`: **Fix load balancer enum to match config: `least-connections`, not `least_connections`.**

Docs currently show an invalid value which will cause confusion/copy-paste errors.

```diff
-  load_balancer: least_connections
+  load_balancer: least-connections
```

Why: the configuration schema uses hyphenated enums (see `config/config.yaml`).
`391-393`: **Same issue: use `round-robin`, not `round_robin`.**

Keep enum values consistent with the config schema to avoid misconfiguration.

```diff
-  load_balancer: round_robin  # For even distribution
+  load_balancer: round-robin  # For even distribution
```

🧹 Nitpick comments (35)
docs/content/concepts/profile-system.md (2)
`229-245`: **Anchor for "Capability Detection" added—nice; consider future-proofing with aliases.**

This will help cross-page references. If you anticipate linking using alternate phrasings (e.g., "capabilities" or "model-capabilities"), you might optionally add secondary anchors to avoid future churn.

Example (optional):

```diff
-### Capability Detection {#capability-detection}
+### Capability Detection {#capability-detection #capabilities #model-capabilities}
```
`399-429`: **Anchor for "Creating Custom Profiles" is helpful; consider adding a brief "why" note.**

A one-liner explaining why custom profiles are useful (e.g., override vendor defaults safely) would align with our style of explaining "why" in docs and aid first-time readers landing directly on this anchor.

Suggested tweak:

```diff
-## Creating Custom Profiles {#creating-custom-profiles}
+## Creating Custom Profiles {#creating-custom-profiles}
+Create custom profiles to tailor routing, safety allowlists, and discovery without forking the codebase—useful for environment-specific policies.
```

CLAUDE.md (2)
`25-27`: **Project tree now shows litellm.yaml—good; consider linking to it.**

Minor improvement: link to the file in GitHub as done for other profiles in integration pages to aid discoverability.

Example tweak:

```diff
-│   │   ├── litellm.yaml        # LiteLLM gateway configuration
+│   │   ├── litellm.yaml        # LiteLLM gateway configuration (see repo for latest)
```
`88-94`: **Header list updated; clarify Response-Time is sent as a trailer.**

Docs elsewhere note `X-Olla-Response-Time` is a trailer. Add the trailer note here for consistency and to set user expectations for streaming responses.

Proposed edit:

```diff
 - `X-Olla-Backend-Type`: ollama/openai/lmstudio/vllm/litellm
 - `X-Olla-Request-ID`: Request ID
-- `X-Olla-Response-Time`: Total processing time
+- `X-Olla-Response-Time`: Total processing time (sent as a trailer for streaming responses)
```

config/config.yaml (2)
`81-88`: **LM Studio default port fix looks right; consider aligning health probe semantics.**

Changing LM Studio to http://localhost:1234 matches typical defaults and removes confusion with 11234. One ask: LM Studio's health behaviour can vary; if `/` occasionally 200s even when the API isn't ready, a more explicit probe (e.g., a lightweight GET that exercises `/v1/models`) provides a stronger signal and avoids flapping during startup.

Proposed tweak:

```diff
-  - url: "http://localhost:1234"
+  - url: "http://localhost:1234"
     name: "local-lm-studio"
     type: "lm-studio"
     priority: 100
     model_url: "/v1/models"
-    health_check_url: "/"
+    health_check_url: "/v1/models"
     check_interval: 2s
     check_timeout: 1s
```

If you prefer to keep `/`, consider increasing `check_interval` to reduce noisy transitions on cold starts. Why: probing the actual models path prevents false-positive "healthy" states while the model list is not yet served.
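As a quick manual check of the proposed probe (host and path as above), something like the following only succeeds once the models endpoint is actually serving, which is the behaviour the health check wants to mirror:

```bash
# -f makes curl exit non-zero on HTTP errors, mimicking a failed health probe.
curl -fsS http://localhost:1234/v1/models
```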
`89-96`: **vLLM host fix to localhost is sensible; confirm health path and intervals under load.**

The shift to http://localhost:8000 is safer for local-first setups. Confirm `/health` is the correct vLLM probe in your deployment mode (it is for the OpenAI-compatible server). Given GPU initialisation time, the 5s/2s cadence may still mark it down briefly on startup; consider a slightly longer `check_timeout` (3–5s) if you see transient flaps.

docs/content/compare/overview.md (1)
`46-46`: **Nice: clear guidance to use native LiteLLM backend. Verify doc link path.**

The relative link `[LiteLLM](./litellm.md)` assumes `compare/litellm.md` exists. If the dedicated LiteLLM compare page lives elsewhere, adjust the path to avoid a 404 in MkDocs.

docs/content/api-reference/overview.md (1)
`91-95`: **Include 'litellm' in backend-type header docs to match the new section.**

The response-headers table still lists ollama/lmstudio/openai/vllm. Add `litellm` for consistency with the new API section. Also consider adding "litellm api" to page keywords to improve searchability.

```diff
-| `X-Olla-Backend-Type` | Provider type (ollama/lmstudio/openai/vllm) |
+| `X-Olla-Backend-Type` | Provider type (ollama/lmstudio/openai/vllm/litellm) |
```

Optional keywords tweak (Line 4):

```diff
-keywords: ["olla api", "llm proxy api", "rest api", "ollama api", "lm studio api", "vllm api", "openai api", "system endpoints"]
+keywords: ["olla api", "llm proxy api", "rest api", "ollama api", "lm studio api", "vllm api", "litellm api", "openai api", "system endpoints"]
```

docs/content/compare/integration-patterns.md (3)

`128-145`: **Use proper heading levels instead of bold for options (MD036).**

Bold text as a pseudo-heading trips linters and hurts skimmability. Convert to `####` headings.

```diff
-**Option 1: Native Integration with LiteLLM (Recommended)**
+#### Option 1: Native integration with LiteLLM (recommended)
```

Why: proper headings improve navigation, anchors, and ToC generation in MkDocs.
`147-160`: **Do the same for Option 2 heading (MD036).**

Consistent heading levels keep the section structure clear and linter-clean.

```diff
-**Option 2: Redundant LiteLLM Instances with Native Support**
+#### Option 2: Redundant LiteLLM instances with native support
```

`345-351`: **Docker Compose port map should reflect 4000 if that's the new LiteLLM default.**

Elsewhere in this PR, LiteLLM endpoints use port 4000. Align the Compose example to avoid surprises.

```diff
   litellm:
     image: ghcr.io/berriai/litellm:latest
     environment:
       - OPENAI_API_KEY=${OPENAI_API_KEY}
     ports:
-      - "8000:8000"
+      - "4000:4000"
```

Why: consistent defaults reduce setup friction and mismatched discovery.
docs/content/api-reference/litellm.md (2)
`97-103`: **Add a language to the streaming SSE code fence (MD040).**

Specifying a language keeps linters quiet and improves rendering.

````diff
-```
+```text
 data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1705320600,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

 data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1705320600,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

 data: [DONE]
````

`228-247`: **Tidy provider model lists to avoid punctuation glitches and improve scanability.**

LanguageTool reports double-comma artefacts; adding spacing and keeping one model per bullet (or splitting the embeddings into a separate list) avoids these. Also ensure consistent capitalisation and Australian English where applicable.

```diff
-### OpenAI Models
-- `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`
-- `text-embedding-ada-002`, `text-embedding-3-small`
+### OpenAI models
+- `gpt-4`
+- `gpt-4-turbo`
+- `gpt-3.5-turbo`
+- `text-embedding-ada-002`
+- `text-embedding-3-small`

-### Anthropic Models
-- `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`
-- `claude-2.1`, `claude-2`, `claude-instant`
+### Anthropic models
+- `claude-3-opus`
+- `claude-3-sonnet`
+- `claude-3-haiku`
+- `claude-2.1`
+- `claude-2`
+- `claude-instant`

-### Google Models
-- `gemini-pro`, `gemini-pro-vision`
-- `palm-2`, `chat-bison`
+### Google (Gemini) models
+- `gemini-pro`
+- `gemini-pro-vision`
+- `palm-2`
+- `chat-bison`

-### AWS Bedrock Models
-- `bedrock/claude-3-opus`, `bedrock/claude-3-sonnet`
-- `bedrock/llama2-70b`, `bedrock/mistral-7b`
+### AWS Bedrock models
+- `bedrock/claude-3-opus`
+- `bedrock/claude-3-sonnet`
+- `bedrock/llama2-70b`
+- `bedrock/mistral-7b`

-### Together AI Models
-- `together_ai/llama-3-70b`, `together_ai/mixtral-8x7b`
-- `together_ai/qwen-72b`, `together_ai/deepseek-coder`
+### Together AI models
+- `together_ai/llama-3-70b`
+- `together_ai/mixtral-8x7b`
+- `together_ai/qwen-72b`
+- `together_ai/deepseek-coder`
```

Why: it removes the linter noise and reads more clearly in MkDocs with ToC and anchor links.
docs/content/compare/litellm.md (4)
`34-44`: **Label fenced diagram as text to satisfy MD040 and improve rendering**

The architecture ASCII block lacks a language hint. Mark it as text to stop markdownlint complaining and to keep formatting stable.

````diff
-```
+```text
 Application → Olla → Multiple Backends
              ├── Ollama instance 1
              ├── Ollama instance 2
              ├── LM Studio instance
              └── LiteLLM gateway → Cloud Providers
                                    ├── OpenAI API
                                    ├── Anthropic API
                                    └── 100+ other providers
````

`46-52`: **Also label the LiteLLM standalone diagram as text**

Same MD040 issue here. Add a language to the fence.

````diff
-```
+```text
 Application → LiteLLM → Provider APIs
              ├── OpenAI API
              ├── Anthropic API
              └── Cohere API
````

`129-133`: **Label the side-by-side topology block**

Add a language (text) to keep lint happy and preserve spacing.

````diff
-```
+```text
 Applications
 ├── Olla → Local Models (Ollama, LM Studio)
 └── LiteLLM → Cloud Providers (OpenAI, Anthropic)
````

`169-180`: **Qualify latency/resource figures or link to benchmarks**

The stated overheads (<2 ms for Olla, 10–50 ms for LiteLLM, ~250 MB RAM combined) are valuable but will be questioned by readers. Please add a short footnote describing test setup (hardware, versions, workload) or link to an internal/benchmark page.

config/profiles/litellm.yaml (7)

`4-4`: **Trim trailing whitespace to satisfy yamllint**

There's trailing whitespace on this and several other lines flagged by CI. Please strip trailing spaces across the file.

Apply a blanket whitespace clean-up (no functional changes).

`291-291`: **Prevent negative new_tokens when cache_read_tokens > total_tokens**

In edge cases (provider returns cache_read_tokens greater than total_tokens), this can go negative. Clamp at zero.

```diff
-  new_tokens: 'cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens'
+  new_tokens: 'max(0, (cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens))'
```
`44-47`: **Duration format: consider seconds for parser portability**

If the profile loader expects a numeric duration or seconds string, unquoted 5m can be misread. Recommend a seconds string to avoid ambiguity.

```diff
-  timeout: 5m  # Remote providers can be slow
+  timeout: "300s"  # Remote providers can be slow; explicit seconds avoids YAML/loader ambiguity
```
`45-47`: **Default priority may diverge from docs examples**

Docs show LiteLLM examples with priority 50–90, while profile default is 95. If the default is consumed by auto-discovery, this could unexpectedly outrank local endpoints. Align or document the rationale.
Would you prefer setting default_priority to 75 to match the “gateway” examples?
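If it helps visualise the concern, this is roughly how an explicit endpoint priority would keep local backends ahead of the gateway, assuming per-endpoint values take precedence over the profile's default_priority (an assumption worth confirming against the loader):

```yaml
discovery:
  type: "static"
  static:
    endpoints:
      - url: "http://localhost:11434"
        type: "ollama"
        priority: 100   # local stays first
      - url: "http://localhost:4000"
        type: "litellm"
        priority: 75    # explicit value, rather than the profile default of 95
```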
`97-128`: **Capability patterns are broad; risk of false positives**

Patterns like `gpt-5*`, `/function`, `/vision` may over-match custom model names, misclassifying capabilities. Narrow by provider prefixes where possible, or document precedence/order.
Happy to propose a tightened pattern set if desired.
`254-276`: **Metrics extraction: variable availability and streaming behaviour**

The calculations reference `finish_reason` and `cache_*` tokens; ensure these keys exist for all provider responses via LiteLLM, especially under streaming where usage may be absent. If not, default null handling should be explicit.
Consider documenting fallback behaviour for streaming (e.g., accumulate tokens from final usage delta).
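For reference, OpenAI-style streams typically only report usage in a final chunk when the client opts in via `stream_options: {"include_usage": true}`; whether LiteLLM forwards this for every provider is an assumption to verify. The final chunk looks roughly like:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "model": "gpt-4",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 34,
    "total_tokens": 46
  }
}
```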
`291-291`: **Add newline at EOF**

yamllint flags the lack of a newline at the end of the file.

```diff
-  new_tokens: 'max(0, (cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens))'
+  new_tokens: 'max(0, (cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens))'
+
```

docs/content/integrations/backend/litellm.md (3)
`411-417`: **Add language to header sample block**

Label the header snippet to satisfy MD040 and improve readability.

````diff
-```
+```http
 X-Olla-Endpoint: litellm-gateway
 X-Olla-Backend-Type: litellm
 X-Olla-Model: gpt-4
 X-Olla-Request-ID: req_abc123
 X-Olla-Response-Time: 2.341s
````

`457-464`: **Heading style: lower-case common nouns**

Minor wording polish for consistency with other headings.

```diff
-### Slow Response Times
+### Slow response times
```
`533-541`: **Backend type key: confirm lmstudio vs lm-studio**

Elsewhere we use lmstudio (file name, links). This example uses "lm-studio". Please align with the actual type expected by the config parser to avoid copy-paste errors.

```diff
-  - url: "http://localhost:1234"
-    type: "lm-studio"
+  - url: "http://localhost:1234"
+    type: "lmstudio"
```

docs/content/index.md (3)
`15-21`: **Fix HTML casing and alt text for accuracy and rendering consistency**

- Close the paragraph tag in lowercase to avoid mixed-case HTML (some renderers are strict).
- The LM Deploy badge alt text says "Lemonade AI: OpenAI Compatible", which is misleading and likely a copy/paste remnant.

Why: Keeps the badges accessible and prevents confusing screen-reader output.

```diff
-  </P>
+  </p>
```

Additionally, please correct the LM Deploy badge alt text on Line 20 to reference LM Deploy rather than Lemonade. If you want, I can supply a precise diff once you confirm preferred wording.
`26-29`: **Clarify unification scope and LiteLLM positioning**

The intro says "unified model catalogues across all providers," which can read as cross-provider unification. Elsewhere (Key Features) it's "per-provider unification" with cross-provider access via OpenAI-compatible APIs. Suggest tightening the wording here to avoid confusion and to position LiteLLM as the cloud gateway.

Why: Reduces ambiguity for readers configuring routing expectations.

```diff
-It intelligently routes LLM requests across local and remote inference nodes - including [Ollama](https://github.com/ollama/ollama), [LM Studio](https://lmstudio.ai/), [LiteLLM](https://github.com/BerriAI/litellm) (100+ cloud providers), and OpenAI-compatible endpoints like [vLLM](https://github.com/vllm-project/vllm). Olla provides model discovery and unified model catalogues across all providers, enabling seamless routing to available models on compatible endpoints.
+It intelligently routes LLM requests across local and remote inference nodes — including [Ollama](https://github.com/ollama/ollama), [LM Studio](https://lmstudio.ai/), OpenAI‑compatible endpoints like [vLLM](https://github.com/vllm-project/vllm), and [LiteLLM](https://github.com/BerriAI/litellm) (gateway to 100+ cloud providers). Olla provides model discovery and per‑provider unified catalogues, with cross‑provider access via OpenAI‑compatible APIs for seamless routing to available models.

-With native [LiteLLM support](integrations/backend/litellm.md), Olla bridges local and cloud infrastructure - use local models when available, automatically failover to cloud APIs when needed. Unlike orchestration platforms like [GPUStack](compare/gpustack.md), Olla focuses on making your existing LLM infrastructure reliable through intelligent routing and failover.
+With native [LiteLLM support](integrations/backend/litellm.md), Olla bridges local and cloud infrastructure — prefer local models when available and automatically fail over to cloud APIs when needed. Unlike orchestration platforms like [GPUStack](compare/gpustack.md), Olla focuses on making your existing LLM infrastructure reliable through intelligent routing and failover.
```
`91-100`: **Document that X-Olla-Response-Time is sent as a trailer**

Per prior guidance, X-Olla-Response-Time is exposed as a trailer rather than a standard header. The table currently doesn't mention this.

Why: Prevents users from missing the metric when inspecting only regular response headers.

```diff
-| `X-Olla-Response-Time` | Total processing time |
+| `X-Olla-Response-Time` | Total processing time (sent as a trailer) |
```

If you'd like, I can also add a short note beneath the table with a curl example showing `--raw` and `--http2` usage to view trailers.
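For anyone who wants that check now, a rough sketch (port and path follow the examples in this PR; `--raw` leaves the chunked body undecoded so any trailer fields are printed at the end):

```bash
curl --raw -s http://localhost:40114/olla/litellm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'
# Look for X-Olla-Response-Time after the final (zero-length) chunk.
```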
readme.md (2)

`72-84`: **Make unordered list style consistent with markdownlint (MD004)**

The linter expects dashes for unordered lists in this repository; these lines use asterisks.

Why: Consistent list markers keep the docs lint-clean in CI.

```diff
-* [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
+ - [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
   Use: `/olla/ollama/`
-* [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
+ - [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
   Use: `/olla/lmstudio/` || `/olla/lm-studio/` || `/olla/lm_studio/`
-* [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
+ - [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
   Use: `/olla/vllm/` \
   Models from vLLM will be available under `/olla/models` and `/olla/vllm/v1/models`
-* [LiteLLM](https://github.com/BerriAI/litellm) - native support for LiteLLM, providing unified gateway to 100+ LLM providers. \
+ - [LiteLLM](https://github.com/BerriAI/litellm) - native support for LiteLLM, providing unified gateway to 100+ LLM providers. \
   Use: `/olla/litellm/` \
   Access models from OpenAI, Anthropic, Bedrock, Azure, Google Vertex AI, Cohere, and many more through a single interface
-* [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
+ - [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
   Use: `/olla/openai/`
```

If you prefer to fix this globally, I can prep a follow-up patch converting bullets across the README.
`339-349`: **Note trailers and enumerate backend types for X-Olla-Backend-Type**

The sample is helpful, but:
- Consider adding a comment that X‑Olla‑Response‑Time is a trailer.
- Consider indicating possible values for X‑Olla‑Backend‑Type now that LiteLLM is supported.
Why: Avoids confusion when users don’t see the timing metric in regular headers and clarifies expected values.
```diff
 X-Olla-Endpoint: local-ollama        # Which backend handled it
 X-Olla-Model: llama4                 # Model used
-X-Olla-Backend-Type: ollama          # Platform type
+X-Olla-Backend-Type: ollama          # Platform type (ollama|lmstudio|vllm|litellm|openai)
 X-Olla-Request-ID: req_abc123        # For debugging
-X-Olla-Response-Time: 1.234s         # Total processing time
+X-Olla-Response-Time: 1.234s         # Total processing time (sent as a trailer)
```

docs/content/usage.md (3)
`32-34`: **Match markdownlint bullet style (MD004) to expected asterisks**

This file's lint rules expect asterisks, not dashes, for unordered lists.
Why: Keeps docs CI green and style consistent.
```diff
-- **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
-- **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed
+* **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
+* **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed
```
`77-85`: **Tidy copy and reinforce unified-interface claim**

Minor style/consistency tweak (use en dash) and re-emphasise that the unified interface is OpenAI-compatible, which helps set expectations.
Why: Readers often skim; explicitly naming the interface reduces misconfiguration.
```diff
-Seamlessly combine local and cloud models with native LiteLLM support:
+Seamlessly combine local and cloud models with native LiteLLM support:
```

And optionally amend Line 85 to:

```diff
-- **Unified Interface**: One API endpoint for all models (local and cloud)
+- **Unified Interface**: One OpenAI‑compatible API endpoint for all models (local and cloud)
```
`112-116`: **Bullet style: switch dashes to asterisks per file lint rule**

Maintains consistency within this document (lint expects asterisks here).

```diff
-- **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on-premise infrastructure
+* **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on‑premise infrastructure
```

📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (18)
- CLAUDE.md (3 hunks)
- config/config.yaml (1 hunks)
- config/profiles/litellm.yaml (1 hunks)
- docs/content/api-reference/litellm.md (1 hunks)
- docs/content/api-reference/overview.md (1 hunks)
- docs/content/compare/integration-patterns.md (1 hunks)
- docs/content/compare/litellm.md (3 hunks)
- docs/content/compare/overview.md (1 hunks)
- docs/content/concepts/profile-system.md (3 hunks)
- docs/content/index.md (2 hunks)
- docs/content/integrations/backend/litellm.md (1 hunks)
- docs/content/integrations/backend/lmstudio.md (2 hunks)
- docs/content/integrations/backend/ollama.md (2 hunks)
- docs/content/integrations/backend/vllm.md (2 hunks)
- docs/content/integrations/overview.md (1 hunks)
- docs/content/usage.md (2 hunks)
- docs/mkdocs.yml (2 hunks)
- readme.md (6 hunks)

🧰 Additional context used
📓 Path-based instructions (1)
**/*.{go,md}
📄 CodeRabbit inference engine (CLAUDE.md)
Use Australian English for comments and documentation, and write comments explaining why rather than what
Files:
- docs/content/integrations/backend/vllm.md
- docs/content/integrations/backend/lmstudio.md
- docs/content/api-reference/overview.md
- docs/content/compare/integration-patterns.md
- docs/content/integrations/overview.md
- docs/content/api-reference/litellm.md
- docs/content/usage.md
- docs/content/compare/overview.md
- docs/content/compare/litellm.md
- docs/content/integrations/backend/litellm.md
- docs/content/concepts/profile-system.md
- CLAUDE.md
- docs/content/index.md
- docs/content/integrations/backend/ollama.md
- readme.md

🧠 Learnings (1)
📚 Learning: 2025-08-11T11:48:06.628Z
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-11T11:48:06.628Z
Learning: Applies to {internal/app/handlers/handler_proxy.go, internal/adapter/proxy/sherpa/service.go, internal/adapter/proxy/olla/service.go}: Ensure responses set the headers: X-Olla-Endpoint, X-Olla-Model, X-Olla-Backend-Type, X-Olla-Request-ID, and X-Olla-Response-Time (as a trailer)

Applied to files:

- CLAUDE.md
- docs/content/index.md

🪛 markdownlint-cli2 (0.17.2)
docs/content/compare/integration-patterns.md
130-130: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
147-147: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
docs/content/api-reference/litellm.md
13-13: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/content/usage.md
32-32: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
33-33: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
81-81: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
82-82: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
83-83: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
84-84: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
112-112: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
113-113: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
114-114: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
115-115: Unordered list style
Expected: asterisk; Actual: dash(MD004, ul-style)
docs/content/compare/litellm.md
35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/content/integrations/backend/litellm.md
411-411: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
readme.md
79-79: Unordered list style
Expected: dash; Actual: asterisk(MD004, ul-style)
🪛 LanguageTool
docs/content/api-reference/litellm.md
[typographical] ~229-~229: Two consecutive commas
Context: ...OpenAI Models -gpt-4,gpt-4-turbo,gpt-3.5-turbo-text-embedding-ada-002,text-embedding-3-small### Anthropic...(DOUBLE_PUNCTUATION)
[typographical] ~233-~233: Two consecutive commas
Context: ...s -claude-3-opus,claude-3-sonnet,claude-3-haiku-claude-2.1,claude-2,claude-instant### Googl...(DOUBLE_PUNCTUATION)
[typographical] ~237-~237: Two consecutive commas
Context: ...ant### Google Models -gemini-pro,gemini-pro-vision-palm-2,chat-bison` ### AWS Bedrock Models - ...(DOUBLE_PUNCTUATION)
[typographical] ~241-~241: Two consecutive commas
Context: ...drock Models -bedrock/claude-3-opus,bedrock/claude-3-sonnet-bedrock/llama2-70b,bedrock/mistral-7b### Together AI M...(DOUBLE_PUNCTUATION)
[typographical] ~244-~244: Consider inserting a comma for improved readability.
Context: ...70b,bedrock/mistral-7b### Together AI Models -together_ai/llama-3-70b,to...(MISSING_COMMAS)
[typographical] ~245-~245: Two consecutive commas
Context: ... AI Models -together_ai/llama-3-70b,together_ai/mixtral-8x7b-together_ai/qwen-72b,together_ai/deepseek-coder--- ## R...(DOUBLE_PUNCTUATION)
docs/content/integrations/backend/litellm.md
[uncategorized] ~459-~459: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...sponse Times 1. Check LiteLLM logs for rate limiting 2. Monitor provider API status 3. Enabl...(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 YAMLlint (1.37.1)
config/profiles/litellm.yaml
[error] 4-4: trailing spaces
(trailing-spaces)
[error] 36-36: trailing spaces
(trailing-spaces)
[error] 38-38: trailing spaces
(trailing-spaces)
[error] 48-48: trailing spaces
(trailing-spaces)
[error] 96-96: trailing spaces
(trailing-spaces)
[error] 128-128: trailing spaces
(trailing-spaces)
[error] 138-138: trailing spaces
(trailing-spaces)
[error] 153-153: trailing spaces
(trailing-spaces)
[error] 164-164: trailing spaces
(trailing-spaces)
[error] 175-175: trailing spaces
(trailing-spaces)
[error] 227-227: trailing spaces
(trailing-spaces)
[error] 243-243: trailing spaces
(trailing-spaces)
[error] 248-248: trailing spaces
(trailing-spaces)
[error] 261-261: trailing spaces
(trailing-spaces)
[error] 269-269: trailing spaces
(trailing-spaces)
[error] 272-272: trailing spaces
(trailing-spaces)
[error] 275-275: trailing spaces
(trailing-spaces)
[error] 277-277: trailing spaces
(trailing-spaces)
[error] 281-281: trailing spaces
(trailing-spaces)
[error] 286-286: trailing spaces
(trailing-spaces)
[error] 289-289: trailing spaces
(trailing-spaces)
[error] 291-291: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (14)
docs/content/integrations/backend/lmstudio.md (1)
`66-66`: **Confirmed base path '/olla'; no changes required**

- docs/mkdocs.yml (line 4) sets `site_url: https://thushan.github.io/olla/`, so the site is indeed served under `/olla`.
- All references to the Routing Prefixes link use the absolute path `/olla/concepts/profile-system/#routing-prefixes` across the integration docs (vllm.md, ollama.md, lmstudio.md, litellm.md).

The absolute links are correct and will not 404 in the current deployment.
docs/content/integrations/backend/vllm.md (2)
`14-17`: **Add "Since" row—looks good; verify version v0.0.16.**

Nice for provenance. Please confirm v0.0.16 matches the first vLLM doc/profile introduction to avoid confusing readers.

You can reuse the approach from the LM Studio comment, swapping paths for `config/profiles/vllm.yaml`.
`66-66`: **No changes needed—base path is correctly configured**

The MkDocs configuration (`docs/mkdocs.yml`) sets `site_url: https://thushan.github.io/olla/`, which means all absolute links beginning with `/olla/` will resolve correctly on your GitHub Pages deployment.

docs/content/integrations/backend/ollama.md (1)
`64-65`: **Absolute link for Routing Prefixes is consistent—nice.**

This aligns with other backend docs and the new explicit anchor in the concepts page.
CLAUDE.md (1)
`4-4`: **Naming consistency verified—no changes required**

I've reviewed occurrences of "LiteLLM" and `litellm` across CLAUDE.md and the wider docs. Narrative text consistently uses "LiteLLM" and configuration identifiers/types use `litellm`, so there's no further action needed.

docs/content/api-reference/overview.md (1)
`54-58`: **Good addition: LiteLLM API section is placed logically after vLLM.**

Positioning and description read well and match the new `/olla/litellm/*` prefix.

docs/content/compare/integration-patterns.md (1)
`382-386`: Script generated to verify schema consistency of the health settings. Once the results are in, we can confirm whether the docs need to use `discovery.health_check` or the top-level `health:` key and then update the review accordingly.

docs/content/integrations/overview.md (1)
`18-18`: **LGTM — adds LiteLLM to backends list**

Consistent with the new backend page and profile. No issues.
docs/mkdocs.yml (1)
`155-156`: **LGTM — navigation entries for LiteLLM**

Nav paths match the added pages. Indentation and grouping are consistent with existing sections.
readme.md (5)
`11-14`: **Badge addition looks good**

LiteLLM badge matches the others in style and casing.

`34-35`: **Good positioning: "alongside LiteLLM"**

This reframe reduces perceived overlap and sets correct expectations.

`175-181`: **Architecture bullets read well**

Clear guidance on where LiteLLM fits in the routing path.

`441-449`: **YAML example looks correct and practical**

Good inclusion of model and health endpoints for LiteLLM; sensible priorities and timeouts.

`497-497`: **FAQ update is accurate and helpful**

Clear delineation between infrastructure proxy (Olla) and API gateway (LiteLLM), and how they complement each other.
```json
    "status": "healthy",
    "models": 25,
    "providers": [
      "openai",
      "anthropic",
      "bedrock",
      "gemini",
      "together_ai"
    ]
```
💡 Verification agent
🧩 Analysis chain
Provider naming inconsistency: ‘google’ vs ‘gemini’. Please confirm canonical label.
The models list uses owned_by: "google" while the health example lists "gemini" as a provider. Pick one naming scheme to avoid confusing users and align with LiteLLM’s provider IDs.
Would you like me to update both examples to the canonical provider key used by LiteLLM (e.g., “gemini” if that’s what LiteLLM reports)? I can follow up with a PR diff once confirmed.
You can also ask me to check the latest LiteLLM docs:
🏁 Script executed:
```bash
#!/bin/bash
# Search for owned_by occurrences in the LiteLLM API reference
grep -R "owned_by" -n docs/content/api-reference/litellm.md || true
# Search for occurrences of "gemini" in the same file
grep -R "\"gemini\"" -n docs/content/api-reference/litellm.md || true
# Search for occurrences of "google" in the same file
grep -R "\"google\"" -n docs/content/api-reference/litellm.md || true
```

Length of output: 364
Align LiteLLM provider IDs: replace “google” with “gemini”
The API reference currently lists a Gemini-owned model under "owned_by": "google", but LiteLLM’s canonical provider key for Google AI Studio (Gemini) is gemini, not google (docs.litellm.ai).
Please update in docs/content/api-reference/litellm.md:
- Line 139 (both occurrences): change

```diff
-    "owned_by": "google"
+    "owned_by": "gemini"
```
With this change, the models listing and health example will consistently use the same provider ID.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In docs/content/api-reference/litellm.md around lines 209 to 217, the example
output and model entries use the provider ID "google" instead of LiteLLM's
canonical "gemini"; update any occurrences of the provider key "google"
(including the "owned_by" fields and the providers list) to "gemini" so the
models listing and health example consistently use the same provider ID.
```html
<tr>
    <th>Since</th>
    <td>Olla <code>v0.0.1</code></td>
</tr>
```
💡 Verification agent
🧩 Analysis chain
“Since” row added; please validate v0.0.1.
Great to document history. Just confirm that v0.0.1 is the intended baseline for Ollama integration in this repo’s release history.
If tags exist locally:
🏁 Script executed:
```bash
#!/bin/bash
git fetch --tags --quiet || true
git tag --sort=taggerdate | head -n 10
```

Length of output: 143
Let’s verify whether the v0.0.1 tag actually exists and identify the earliest release that included the Ollama integration documentation. Please run:
🏁 Script executed:
```bash
#!/bin/bash
# Check if v0.0.1 exists
echo "Checking for v0.0.1:"
git tag -l v0.0.1

# Find the commit that added the Ollama docs file
file="docs/content/integrations/backend/ollama.md"
add_commit=$(git log --diff-filter=A --follow --format=%H -- "$file" | tail -n1)
echo "Commit that added $file: $add_commit"

# List tags containing that commit (earliest first) and show the earliest
echo "Tags containing that commit:"
git tag --contains $add_commit --sort=taggerdate
echo "Earliest tag including the Ollama docs:"
git tag --contains $add_commit --sort=taggerdate | head -n1
```

Length of output: 597
Let’s pinpoint the first released tag that actually contains the Ollama integration doc. Please run:
🏁 Script executed:
```bash
#!/bin/bash
# Find the earliest tag (by semver) that includes docs/content/integrations/backend/ollama.md
file="docs/content/integrations/backend/ollama.md"
git tag | sort -V | while read tag; do
  if git ls-tree -r "$tag" --name-only | grep -qx "$file"; then
    echo "Earliest tag containing $file: $tag"
    break
  fi
done
```

Length of output: 1267
**Update baseline version for Ollama integration**

The earliest release containing docs/content/integrations/backend/ollama.md is v0.0.15, not v0.0.1. Please update the "Since" row accordingly:

- File: docs/content/integrations/backend/ollama.md
- Lines 14–17: change

```diff
-    <td>Olla <code>v0.0.1</code></td>
+    <td>Olla <code>v0.0.15</code></td>
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/ollama.md around lines 14 to 17, the
"Since" table row incorrectly states the integration was introduced in Olla
v0.0.1; update the version to v0.0.15 so the row becomes:
<td>Olla <code>v0.0.15</code></td>.
Actionable comments posted: 6
♻️ Duplicate comments (4)
docs/content/integrations/backend/lmstudio.md (1)
`14-17`: **"Since" version corrected to v0.0.12 — looks right.**

Thanks for aligning this with the tag that first introduced the LM Studio profile. This addresses the earlier review note cleanly.
docs/content/usage.md (2)
`38-39`: **Good catch adding discovery.type: "static"**

This resolves prior copy-paste issues flagged on this snippet. Keeps examples consistent across the docs and avoids schema surprises.
`90-91`: **Thanks for adding discovery.type: "static" here as well**

This addresses the earlier review feedback and makes the Hybrid example ready to run.
docs/content/integrations/backend/litellm.md (1)
`148-151`: **Prefix non-OpenAI models in the CLI example so LiteLLM resolves them**

Without provider prefixes, Anthropic models may not resolve correctly. This mirrors patterns used elsewhere in the doc.

```diff
-litellm --model gpt-3.5-turbo \
-        --model claude-3-haiku-20240307 \
+litellm --model openai/gpt-3.5-turbo \
+        --model anthropic/claude-3-haiku-20240307 \
         --port 4000
```
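For anyone wiring this up, the same pairing can also be expressed via a LiteLLM proxy config file rather than CLI flags; a rough sketch (the file name and env-var wiring are assumptions, check the LiteLLM docs for the exact schema):

```yaml
# litellm-config.yaml (hypothetical file name)
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3-haiku
    litellm_params:
      model: anthropic/claude-3-haiku-20240307
      api_key: os.environ/ANTHROPIC_API_KEY
```

Started with something like `litellm --config litellm-config.yaml --port 4000`, the gateway exposes both models on the same port Olla already points at.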
🧹 Nitpick comments (6)
docs/content/usage.md (3)
`32-34`: **Standardise list markers to satisfy markdownlint (MD004) and keep style consistent**

Use asterisks here to match the rest of the page and quiet the linter.

```diff
-- **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
-- **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed
+- **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
+- **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed
```
`82-85`: **Unify bullet style with asterisks for consistency and linting**

Keeps the document style consistent and resolves MD004.

```diff
-- **Smart Routing**: Use local models for sensitive data, cloud for complex tasks
-- **Cost Control**: Prioritise free local models, failover to paid APIs
-- **Best-of-Both**: GPT-4 for coding, local Llama for chat, Claude for analysis
-- **Unified Interface**: One API endpoint for all models (local and cloud)
+- **Smart Routing**: Use local models for sensitive data, cloud for complex tasks
+- **Cost Control**: Prioritise free local models, failover to paid APIs
+- **Best-of-Both**: GPT-4 for coding, local Llama for chat, Claude for analysis
+- **Unified Interface**: One API endpoint for all models (local and cloud)
```
`114-117`: **Make list marker style consistent with earlier sections**

Switch to asterisks for a consistent style and to pass MD004.

```diff
-- **Multi-Region Deployment**: Geographic load balancing and failover
-- **Enterprise Security**: Rate limiting, request validation, audit trails
-- **Performance Monitoring**: Circuit breakers, health checks, metrics
-- **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on-premise infrastructure
+- **Multi-Region Deployment**: Geographic load balancing and failover
+- **Enterprise Security**: Rate limiting, request validation, audit trails
+- **Performance Monitoring**: Circuit breakers, health checks, metrics
+- **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on-premise infrastructure
```

docs/content/integrations/backend/litellm.md (3)
`67-71`: **Tiny formatting nit: add a space before "(default)"**

Improves readability in the table.

```diff
-  <li><code>/litellm</code>(default)</li>
+  <li><code>/litellm</code> (default)</li>
```
`248-255`: **Avoid future-dating model names (e.g., "GPT-5") unless confirmed**

To prevent readers misconfiguring unavailable models, keep this high-level (e.g., "latest GPT-4.x family") or verify availability before naming.
`277-284`: **Be explicit with provider prefixes for first curl example**

You later emphasise provider-prefixed routing; using `openai/gpt-4` here prevents ambiguity if users haven't defined aliases in LiteLLM.

```diff
-    "model": "gpt-4",
+    "model": "openai/gpt-4",
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- docs/content/integrations/backend/litellm.md (1 hunks)
- docs/content/integrations/backend/lmstudio.md (2 hunks)
- docs/content/usage.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{go,md}
📄 CodeRabbit inference engine (CLAUDE.md)
Use Australian English for comments and documentation, and write comments explaining why rather than what
Files:
- docs/content/integrations/backend/litellm.md
- docs/content/integrations/backend/lmstudio.md
- docs/content/usage.md
🪛 LanguageTool
docs/content/integrations/backend/litellm.md
[uncategorized] ~459-~459: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...sponse Times 1. Check LiteLLM logs for rate limiting 2. Monitor provider API status 3. Enabl...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.17.2)
docs/content/usage.md
32-32: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
33-33: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
82-82: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
83-83: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
84-84: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
85-85: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
114-114: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
115-115: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
116-116: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
117-117: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (1.24)
🔇 Additional comments (3)
docs/content/integrations/backend/lmstudio.md (1)
`66-66`: **Absolute `/olla/` links are intentional and consistent with your MkDocs configuration**

I've checked your `docs/mkdocs.yml`, and it specifies `site_url: https://thushan.github.io/olla/`, which means all pages are served under the `/olla/` path. All four backend integration pages also use the same absolute `/olla/concepts/profile-system/#routing-prefixes` link, and the `#routing-prefixes` anchor exists in `concepts/profile-system.md`. Since this matches your current hosting setup and is applied consistently across the docs, there's no need to switch to relative paths at this time.

- docs/mkdocs.yml defines the base URL as https://thushan.github.io/olla/
- Four integration pages reference `/olla/concepts/profile-system/#routing-prefixes`
- The `Routing Prefixes` heading with `{#routing-prefixes}` is present in `concepts/profile-system.md`

No changes required here.
docs/content/usage.md (1)
`36-57`: **YAML looks correct; minor TLS gotcha worth calling out**

The addition of `type: "static"` and LiteLLM fields (`model_url`, `health_check_url`) reads well. One operational nuance: the second endpoint uses HTTPS with a private IP. If users don't have a valid certificate chain for that host, requests may fail TLS verification. Consider a brief note on acceptable TLS options (e.g., trusted certs or any supported "skip verify" toggle) to reduce setup friction.

docs/content/integrations/backend/litellm.md (1)
`291-312`: **Python example is solid and correctly uses the LiteLLM base path and prefixes**

Good alignment with the routing model and earlier configuration guidance.
```yaml
discovery:
  static:
    endpoints:
      - url: "http://localhost:4000"
        name: "litellm-gateway"
        type: "litellm"
        priority: 75
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s
```
🛠️ Refactor suggestion
Add discovery.type: "static" for copy‑paste‑ready configuration
All other docs use type: "static"; adding it here reduces user confusion and aligns with the profile system.
```diff
 discovery:
+  type: "static"
   static:
     endpoints:
       - url: "http://localhost:4000"
         name: "litellm-gateway"
         type: "litellm"
         priority: 75
         model_url: "/v1/models"
         health_check_url: "/health"
         check_interval: 5s
         check_timeout: 2s
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      - url: "http://localhost:4000"
        name: "litellm-gateway"
        type: "litellm"
        priority: 75
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 88 to 100, the
example YAML is missing discovery.type: "static", so update the snippet to
include discovery.type: "static" (i.e., add a type: "static" field under the
discovery key) so the configuration is copy‑paste‑ready and consistent with
other docs.
```yaml
discovery:
  static:
    endpoints:
      # Primary LiteLLM instance
      - url: "http://litellm-primary:4000"
        name: "litellm-primary"
        type: "litellm"
        priority: 90
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s

      # Secondary LiteLLM instance
      - url: "http://litellm-secondary:4000"
        name: "litellm-secondary"
        type: "litellm"
        priority: 70
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s
```
🛠️ Refactor suggestion
Do the same for the multi‑instance example
Ensures both examples follow the same schema and are runnable as‑is.
```diff
 discovery:
+  type: "static"
   static:
     endpoints:
       # Primary LiteLLM instance
       - url: "http://litellm-primary:4000"
         name: "litellm-primary"
         type: "litellm"
         priority: 90
         model_url: "/v1/models"
         health_check_url: "/health"
         check_interval: 5s
         check_timeout: 2s

       # Secondary LiteLLM instance
       - url: "http://litellm-secondary:4000"
         name: "litellm-secondary"
         type: "litellm"
         priority: 70
         model_url: "/v1/models"
         health_check_url: "/health"
         check_interval: 5s
         check_timeout: 2s
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # Primary LiteLLM instance
      - url: "http://litellm-primary:4000"
        name: "litellm-primary"
        type: "litellm"
        priority: 90
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s

      # Secondary LiteLLM instance
      - url: "http://litellm-secondary:4000"
        name: "litellm-secondary"
        type: "litellm"
        priority: 70
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 106 to 129, the
multi-instance example must match the schema used in the single-instance example
so both are runnable; update the secondary (and any other) endpoint entries to
include the exact same set of keys and value formats as the primary (e.g., url,
name, type, priority, model_url, health_check_url, check_interval,
check_timeout), ensure consistent quoting and duration/string formats, and make
the structure identical so the example can be used as-is for multi-instance
deployments.
```yaml
endpoints:
  # Local models (free)
  - url: "http://localhost:11434"
    name: "local-ollama"
    type: "ollama"
    priority: 100

  # LiteLLM with budget models
  - url: "http://litellm-budget:4000"
    name: "litellm-budget"
    type: "litellm"
    priority: 75

  # LiteLLM with premium models
  - url: "http://litellm-premium:4000"
    name: "litellm-premium"
    type: "litellm"
    priority: 50
```
🛠️ Refactor suggestion
Align advanced “Cost‑Optimised Routing” with the discovery schema
Other pages use discovery: { type: "static", static: { endpoints: ... } }. Adopting the same here keeps things uniform and reduces setup errors.
-endpoints:
- # Local models (free)
- - url: "http://localhost:11434"
- name: "local-ollama"
- type: "ollama"
- priority: 100
-
- # LiteLLM with budget models
- - url: "http://litellm-budget:4000"
- name: "litellm-budget"
- type: "litellm"
- priority: 75
-
- # LiteLLM with premium models
- - url: "http://litellm-premium:4000"
- name: "litellm-premium"
- type: "litellm"
- priority: 50
+discovery:
+ type: "static"
+ static:
+ endpoints:
+ # Local models (free)
+ - url: "http://localhost:11434"
+ name: "local-ollama"
+ type: "ollama"
+ priority: 100
+
+ # LiteLLM with budget models
+ - url: "http://litellm-budget:4000"
+ name: "litellm-budget"
+ type: "litellm"
+ priority: 75
+
+ # LiteLLM with premium models
+ - url: "http://litellm-premium:4000"
+ name: "litellm-premium"
+ type: "litellm"
+ priority: 50

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # Local models (free)
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        priority: 100

      # LiteLLM with budget models
      - url: "http://litellm-budget:4000"
        name: "litellm-budget"
        type: "litellm"
        priority: 75

      # LiteLLM with premium models
      - url: "http://litellm-premium:4000"
        name: "litellm-premium"
        type: "litellm"
        priority: 50
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 337 to 356, the
endpoints block is defined at top level but must be wrapped to match the
discovery schema used elsewhere; replace the top-level endpoints YAML with a
discovery object: discovery: { type: "static", static: { endpoints: [
...existing endpoint entries... ] } } ensuring correct indentation and
preserving each endpoint entry exactly as-is inside static.endpoints so the
“Cost‑Optimised Routing” format matches other pages.
```yaml
endpoints:
  # US East region
  - url: "http://litellm-us-east:4000"
    name: "litellm-us-east"
    type: "litellm"
    priority: 100

  # EU West region
  - url: "http://litellm-eu-west:4000"
    name: "litellm-eu-west"
    type: "litellm"
    priority: 100

# Load balance across regions
proxy:
  load_balancer: "least-connections"
```
🛠️ Refactor suggestion
Same schema alignment for “Multi‑Region Setup”
Keeps the advanced examples consistent with the rest of the docs.
-endpoints:
- # US East region
- - url: "http://litellm-us-east:4000"
- name: "litellm-us-east"
- type: "litellm"
- priority: 100
-
- # EU West region
- - url: "http://litellm-eu-west:4000"
- name: "litellm-eu-west"
- type: "litellm"
- priority: 100
-
- # Load balance across regions
- # (rest unchanged)
+discovery:
+ type: "static"
+ static:
+ endpoints:
+ # US East region
+ - url: "http://litellm-us-east:4000"
+ name: "litellm-us-east"
+ type: "litellm"
+ priority: 100
+
+ # EU West region
+ - url: "http://litellm-eu-west:4000"
+ name: "litellm-eu-west"
+ type: "litellm"
+ priority: 100
+
+ # Load balance across regions
+ # (rest unchanged)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # US East region
      - url: "http://litellm-us-east:4000"
        name: "litellm-us-east"
        type: "litellm"
        priority: 100

      # EU West region
      - url: "http://litellm-eu-west:4000"
        name: "litellm-eu-west"
        type: "litellm"
        priority: 100

# Load balance across regions
proxy:
  load_balancer: "least-connections"
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 360 to 377, the
"Multi-Region Setup" YAML block should follow the same schema used elsewhere in
the docs: wrap the top-level endpoints list in a discovery block (discovery:
type: "static", static: endpoints: [ ...existing endpoint entries... ]) with
correct indentation, and keep the proxy section's load_balancer:
"least-connections" value unchanged so the example aligns with the rest of the
documentation.
```yaml
endpoints:
  # Local Ollama (highest priority)
  - url: "http://localhost:11434"
    type: "ollama"
    priority: 100

  # LM Studio (medium priority)
  - url: "http://localhost:1234"
    type: "lm-studio"
    priority: 75

  # LiteLLM for cloud (lower priority)
  - url: "http://localhost:4000"
    type: "litellm"
    priority: 50
```
🛠️ Refactor suggestion
Use consistent provider type tokens and discovery schema
Elsewhere the type is lmstudio (no hyphen). Also, align with the discovery schema for consistency.
-endpoints:
- # Local Ollama (highest priority)
- - url: "http://localhost:11434"
- type: "ollama"
- priority: 100
-
- # LM Studio (medium priority)
- - url: "http://localhost:1234"
- type: "lm-studio"
- priority: 75
-
- # LiteLLM for cloud (lower priority)
- - url: "http://localhost:4000"
- type: "litellm"
- priority: 50
+discovery:
+ type: "static"
+ static:
+ endpoints:
+ # Local Ollama (highest priority)
+ - url: "http://localhost:11434"
+ type: "ollama"
+ priority: 100
+
+ # LM Studio (medium priority)
+ - url: "http://localhost:1234"
+ type: "lmstudio"
+ priority: 75
+
+ # LiteLLM for cloud (lower priority)
+ - url: "http://localhost:4000"
+ type: "litellm"
+ priority: 50

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # Local Ollama (highest priority)
      - url: "http://localhost:11434"
        type: "ollama"
        priority: 100

      # LM Studio (medium priority)
      - url: "http://localhost:1234"
        type: "lmstudio"
        priority: 75

      # LiteLLM for cloud (lower priority)
      - url: "http://localhost:4000"
        type: "litellm"
        priority: 50
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 526 to 541, the
provider type tokens are inconsistent (using "lm-studio" with a hyphen) and must
match the discovery schema and other docs that use "lmstudio"; update the
endpoint entries so the type tokens are consistent (e.g., use "lmstudio" instead
of "lm-studio") and verify all provider tokens ("ollama", "lmstudio", "litellm",
etc.) conform to the discovery schema naming and casing used elsewhere in the
repo.
```yaml
# Single endpoint accesses all models
# http://localhost:40114/olla/openai/v1/chat/completions
# Automatically routes to the right backend based on model name
```
Fix endpoint path: route via LiteLLM, not the OpenAI shim
Using /olla/openai/... here bypasses the LiteLLM gateway introduced above. Point to /olla/litellm/v1/... so users actually test the Hybrid setup with provider‑prefixed routing.
-# http://localhost:40114/olla/openai/v1/chat/completions
+# http://localhost:40114/olla/litellm/v1/chat/completions

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
# Single endpoint accesses all models
# http://localhost:40114/olla/litellm/v1/chat/completions
# Automatically routes to the right backend based on model name
```
🤖 Prompt for AI Agents
In docs/content/usage.md around lines 105 to 108, the endpoint path uses
/olla/openai/... which bypasses the LiteLLM gateway; update the example URL to
use the LiteLLM gateway path /olla/litellm/v1/... so the documentation
demonstrates routing via LiteLLM and provider‑prefixed hybrid routing (replace
the commented URL and any related examples accordingly).
This PR adds a lightweight litellm profile.
Additional things snuck in:
Summary by CodeRabbit
New Features
Documentation
Chores