Conversation

@thushan (Owner)

@thushan commented Aug 21, 2025

This PR adds a lightweight LiteLLM profile.

endpoints:
    - url: "http://localhost:4000"
      name: "local-litellm"
      type: "litellm"
      priority: 75

Additional things snuck in:

  • FIX: default settings included the wrong port for LM Studio and a non-localhost URL for vLLM
  • FIX: Minor documentation example fixes
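
For a quick local smoke test of the new profile, something like the following should work (a sketch only; the Olla base URL is a placeholder and the paths assume the OpenAI-compatible /olla/litellm/* surface):

```bash
# List the models Olla discovers through the LiteLLM endpoint (OLLA_URL is a placeholder)
curl -s "$OLLA_URL/olla/litellm/v1/models"

# Send an OpenAI-style chat completion through the /olla/litellm/ prefix
curl -s "$OLLA_URL/olla/litellm/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'
```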

Summary by CodeRabbit

  • New Features

    • Native LiteLLM backend integration enabling routing to 100+ cloud providers via OpenAI-compatible endpoints (/olla/litellm/*).
    • Response headers now include litellm in backend type.
  • Documentation

    • Added comprehensive LiteLLM integration, API reference and integration guides; updated comparisons, usage, overview and homepage to reflect native LiteLLM support.
    • Added “Since” notes, anchor/link fixes and examples for hybrid cloud setups.
  • Chores

    • Added LiteLLM profile/config and navigation entries; updated discovery endpoint URLs and priorities.

@thushan added the enhancement (New feature or request) and llm-backend (Issue is about an LLM Backend, provider or type, e.g. Ollama, vLLM) labels on Aug 21, 2025
@coderabbitai

coderabbitai bot commented Aug 21, 2025

Walkthrough

Adds native LiteLLM backend support: introduces a new LiteLLM profile, updates discovery endpoints and priorities, and expands documentation and site navigation to include LiteLLM across API reference, integrations, comparisons, usage, index and README. Minor link, metadata and “Since” table updates across other backend docs.

Changes

Cohort / File(s) Summary of changes
Config: Endpoint URLs
config/config.yaml
Updated discovery.static endpoints: LM Studio URL changed to http://localhost:1234; vLLM URL changed to http://localhost:8000. No other field changes.
Profile: LiteLLM (new)
config/profiles/litellm.yaml
Added a comprehensive LiteLLM gateway profile: OpenAI-compatible API surface, core/alias endpoints, detection hints, capability patterns, model naming/mapping, context window mappings, request/response parsing (including streaming), resource/timeouts, concurrency, metric extraction and health/management metadata.
API Reference: LiteLLM
docs/content/api-reference/litellm.md, docs/content/api-reference/overview.md, docs/mkdocs.yml
Added LiteLLM API reference page; linked it from API overview and navigation. Documents endpoints, streaming, headers, error envelopes and examples.
Integrations: Backends
docs/content/integrations/backend/litellm.md, docs/content/integrations/backend/lmstudio.md, docs/content/integrations/backend/ollama.md, docs/content/integrations/backend/vllm.md, docs/content/integrations/overview.md
New LiteLLM backend integration doc; added LiteLLM to Backends overview. Minor additions: “Since” rows and updated routing-prefix links for LM Studio, Ollama and vLLM.
Compare Docs
docs/content/compare/litellm.md, docs/content/compare/integration-patterns.md, docs/content/compare/overview.md
Reframed LiteLLM as native integration: updated examples, endpoint types/URLs/priorities, diagrams and guidance; added native-integration scenarios and examples.
Concepts: Anchors
docs/content/concepts/profile-system.md
Added anchor IDs to headings: Routing Prefixes, Capability Detection, Creating Custom Profiles.
Usage, Index, Readme
docs/content/usage.md, docs/content/index.md, readme.md
Emphasised native LiteLLM/cloud-fallback integration; added LiteLLM badges/entries; updated examples and discovery YAML snippets to include litellm endpoints, model_url and health_check_url.
Meta Doc
CLAUDE.md
Updated documentation notes: added LiteLLM to backend list, referenced new profile, and reflected response header updates (X-Olla-Backend-Type includes litellm).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Olla as Olla Router
  participant Prof as Profile Resolver
  participant HC as Health Monitor
  participant LLM as LiteLLM Gateway
  participant Cloud as Cloud Providers

  Client->>Olla: OpenAI-compatible request (e.g. /v1/chat/completions)
  Olla->>Prof: Resolve backend/profile & capabilities
  Prof-->>Olla: Select `litellm` endpoint
  HC->>Olla: Provide health/priority status
  Olla->>LLM: Forward OpenAI-style request
  LLM->>Cloud: Translate & route to provider API
  Cloud-->>LLM: Provider response / stream
  LLM-->>Olla: Normalised response / stream
  Olla-->>Client: Return response with X-Olla-* headers
  note over Olla,LLM: If LiteLLM unhealthy, Olla applies fallback per priorities

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

documentation



@coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
docs/content/concepts/profile-system.md (1)

78-105: Verify header ID support and update residual link

Confirmed that docs/mkdocs.yml enables header IDs via the attr_list and toc extensions (lines 61–68), so the explicit {#routing-prefixes} anchor will render correctly. While pymdownx.headeranchor isn’t configured, the attr_list extension suffices for adding custom IDs.

There is one remaining relative link that should be updated to maintain deep‐link consistency:

  • docs/content/integrations/overview.md (line 33):
    * [Remove prefixes](../concepts/profile-system.md#routing-prefixes) you don't use

Please review and update this link (and any others) as needed to point to the new anchor.
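
A quick way to catch any other links still pointing at these headings (a sketch, run from the repository root):

```bash
# List remaining references to the profile-system anchors across the docs
grep -Rn "profile-system.md#" docs/content
```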

docs/content/compare/integration-patterns.md (3)

64-79: Update the Hybrid Cloud example to use the native LiteLLM backend and corrected port.

The example still shows type: openai and port 8000 for LiteLLM, which conflicts with the new native backend guidance and the PR’s move to port 4000.

   - name: cloud-litellm
-    url: http://litellm:8000
-    priority: 10  # Only when local unavailable
-    type: openai
+    url: http://litellm:4000
+    priority: 10  # Only when local unavailable
+    type: litellm

Why: using the dedicated litellm type enables correct health checks and discovery behaviour; the port aligns with the updated defaults in this PR.


297-299: Fix load balancer enum to match config: least-connections not least_connections.

Docs currently show an invalid value which will cause confusion/copy-paste errors.

-  load_balancer: least_connections
+  load_balancer: least-connections

Why: the configuration schema uses hyphenated enums (see config/config.yaml).


391-393: Same issue: use round-robin not round_robin.

Keep enum values consistent with the config schema to avoid misconfiguration.

-  load_balancer: round_robin  # For even distribution
+  load_balancer: round-robin  # For even distribution

🧹 Nitpick comments (35)
docs/content/concepts/profile-system.md (2)

229-245: Anchor for “Capability Detection” added—nice; consider future-proofing with aliases.

This will help cross-page references. If you anticipate linking using alternate phrasings (e.g., “capabilities” or “model-capabilities”), you might optionally add secondary anchors to avoid future churn.

Example (optional):

-### Capability Detection {#capability-detection}
+### Capability Detection {#capability-detection #capabilities #model-capabilities}

399-429: Anchor for “Creating Custom Profiles” is helpful; consider adding a brief “why” note.

A one-liner explaining why custom profiles are useful (e.g., override vendor defaults safely) would align with our style of explaining “why” in docs and aid first‑time readers landing directly on this anchor.

Suggested tweak:

-## Creating Custom Profiles {#creating-custom-profiles}
+## Creating Custom Profiles {#creating-custom-profiles}
+Create custom profiles to tailor routing, safety allowlists, and discovery without forking the codebase—useful for environment‑specific policies.
CLAUDE.md (2)

25-27: Project tree now shows litellm.yaml—good; consider linking to it.

Minor improvement: link to the file in GitHub as done for other profiles in integration pages to aid discoverability.

Example tweak:

-│   │   ├── litellm.yaml    # LiteLLM gateway configuration
+│   │   ├── litellm.yaml    # LiteLLM gateway configuration (see repo for latest)

88-94: Header list updated; clarify Response-Time is sent as a trailer.

Docs elsewhere note X-Olla-Response-Time is a trailer. Add the trailer note here for consistency and to set user expectations for streaming responses.

Proposed edit:

 - `X-Olla-Backend-Type`: ollama/openai/lmstudio/vllm/litellm
 - `X-Olla-Request-ID`: Request ID
-- `X-Olla-Response-Time`: Total processing time
+- `X-Olla-Response-Time`: Total processing time (sent as a trailer for streaming responses)
config/config.yaml (2)

81-88: LM Studio default port fix looks right; consider aligning health probe semantics.

Changing LM Studio to http://localhost:1234 matches typical defaults and removes confusion with 11234. One ask: LM Studio’s health behaviour can vary; if "/" occasionally 200s even when the API isn’t ready, a more explicit probe (e.g., a lightweight GET that exercises /v1/models) provides stronger signal and avoids flapping during startup.

Proposed tweak:

-      - url: "http://localhost:1234"
+      - url: "http://localhost:1234"
         name: "local-lm-studio"
         type: "lm-studio"
         priority: 100
         model_url: "/v1/models"
-        health_check_url: "/"
+        health_check_url: "/v1/models"
         check_interval: 2s
         check_timeout: 1s

If you prefer to keep /, consider increasing check_interval to reduce noisy transitions on cold starts. Why: probing the actual models path prevents false-positive “healthy” states while the model list is not yet served.
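
A quick manual check of that probe (a sketch, assuming LM Studio is serving on its default port):

```bash
# Returns a JSON model list once LM Studio's API server is actually ready
curl -s http://localhost:1234/v1/models
```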


89-96: vLLM host fix to localhost is sensible; confirm health path and intervals under load.

The shift to http://localhost:8000 is safer for local-first setups. Confirm /health is the correct vLLM probe in your deployment mode (it is for the OpenAI-compatible server). Given GPU initialisation time, the 5s/2s cadence may still mark it down briefly on startup; consider a slightly longer check_timeout (3–5s) if you see transient flaps.
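
Similarly, a sketch for confirming the vLLM probe against a locally running server:

```bash
# The OpenAI-compatible vLLM server answers on /health once initialisation completes
curl -si http://localhost:8000/health
```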

docs/content/compare/overview.md (1)

46-46: Nice: clear guidance to use native LiteLLM backend. Verify doc link path.

The relative link [LiteLLM](./litellm.md) assumes compare/litellm.md exists. If the dedicated LiteLLM compare page lives elsewhere, adjust the path to avoid a 404 in MkDocs.

docs/content/api-reference/overview.md (1)

91-95: Include ‘litellm’ in backend-type header docs to match the new section.

The response-headers table still lists ollama/lmstudio/openai/vllm. Add litellm for consistency with the new API section. Also consider adding “litellm api” to page keywords to improve searchability.

- | `X-Olla-Backend-Type` | Provider type (ollama/lmstudio/openai/vllm) |
+ | `X-Olla-Backend-Type` | Provider type (ollama/lmstudio/openai/vllm/litellm) |

Optional keywords tweak (Line 4):

- keywords: ["olla api", "llm proxy api", "rest api", "ollama api", "lm studio api", "vllm api", "openai api", "system endpoints"]
+ keywords: ["olla api", "llm proxy api", "rest api", "ollama api", "lm studio api", "vllm api", "litellm api", "openai api", "system endpoints"]

docs/content/compare/integration-patterns.md (3)

128-145: Use proper heading levels instead of bold for options (MD036).

Bold text as a pseudo-heading trips linters and hurts skimmability. Convert to `####` headings.


-**Option 1: Native Integration with LiteLLM (Recommended)**
+#### Option 1: Native integration with LiteLLM (recommended)

Why: proper headings improve navigation, anchors, and ToC generation in MkDocs.


147-160: Do the same for Option 2 heading (MD036).

Consistent heading levels keep the section structure clear and linter-clean.

-**Option 2: Redundant LiteLLM Instances with Native Support**
+#### Option 2: Redundant LiteLLM instances with native support

---

345-351: Docker Compose port map should reflect 4000 if that’s the new LiteLLM default.

Elsewhere in this PR, LiteLLM endpoints use port 4000. Align the Compose example to avoid surprises.


   litellm:
     image: ghcr.io/berriai/litellm:latest
     environment:
       - OPENAI_API_KEY=${OPENAI_API_KEY}
     ports:
-      - "8000:8000"
+      - "4000:4000"

Why: consistent defaults reduce setup friction and mismatched discovery.

docs/content/api-reference/litellm.md (2)

97-103: Add a language to the streaming SSE code fence (MD040).

Specifying a language keeps linters quiet and improves rendering.

-```
+```text
 data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1705320600,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

 data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1705320600,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

 data: [DONE]

---

228-247: Tidy provider model lists to avoid punctuation glitches and improve scannability.

LanguageTool reports double-comma artefacts; adding spacing and keeping one model per bullet (or splitting the embeddings into a separate list) avoids these. Also ensure consistent capitalisation and Australian English where applicable.


-### OpenAI Models
-- `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`
-- `text-embedding-ada-002`, `text-embedding-3-small`
+### OpenAI models
+- `gpt-4`
+- `gpt-4-turbo`
+- `gpt-3.5-turbo`
+- `text-embedding-ada-002`
+- `text-embedding-3-small`

-### Anthropic Models
-- `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`
-- `claude-2.1`, `claude-2`, `claude-instant`
+### Anthropic models
+- `claude-3-opus`
+- `claude-3-sonnet`
+- `claude-3-haiku`
+- `claude-2.1`
+- `claude-2`
+- `claude-instant`

-### Google Models
-- `gemini-pro`, `gemini-pro-vision`
-- `palm-2`, `chat-bison`
+### Google (Gemini) models
+- `gemini-pro`
+- `gemini-pro-vision`
+- `palm-2`
+- `chat-bison`

-### AWS Bedrock Models
-- `bedrock/claude-3-opus`, `bedrock/claude-3-sonnet`
-- `bedrock/llama2-70b`, `bedrock/mistral-7b`
+### AWS Bedrock models
+- `bedrock/claude-3-opus`
+- `bedrock/claude-3-sonnet`
+- `bedrock/llama2-70b`
+- `bedrock/mistral-7b`

-### Together AI Models
-- `together_ai/llama-3-70b`, `together_ai/mixtral-8x7b`
-- `together_ai/qwen-72b`, `together_ai/deepseek-coder`
+### Together AI models
+- `together_ai/llama-3-70b`
+- `together_ai/mixtral-8x7b`
+- `together_ai/qwen-72b`
+- `together_ai/deepseek-coder`

Why: it removes the linter noise and reads more clearly in MkDocs with ToC and anchor links.

docs/content/compare/litellm.md (4)

34-44: Label fenced diagram as text to satisfy MD040 and improve rendering

The architecture ASCII block lacks a language hint. Mark it as text to stop markdownlint complaining and to keep formatting stable.

-```
+```text
 Application → Olla → Multiple Backends
                 ├── Ollama instance 1
                 ├── Ollama instance 2
                 ├── LM Studio instance
                 └── LiteLLM gateway → Cloud Providers
                                   ├── OpenAI API
                                   ├── Anthropic API
                                   └── 100+ other providers

---

46-52: Also label the LiteLLM standalone diagram as text

Same MD040 issue here. Add a language to the fence.



-```
+```text
 Application → LiteLLM → Provider APIs
                     ├── OpenAI API
                     ├── Anthropic API
                     └── Cohere API

---

129-133: Label the side‑by‑side topology block

Add a language (text) to keep lint happy and preserve spacing.



-```
+```text
 Applications
      ├── Olla → Local Models (Ollama, LM Studio)
      └── LiteLLM → Cloud Providers (OpenAI, Anthropic)

---

169-180: Qualify latency/resource figures or link to benchmarks

The stated overheads (<2 ms for Olla, 10–50 ms for LiteLLM, ~250 MB RAM combined) are valuable but will be questioned by readers. Please add a short footnote describing test setup (hardware, versions, workload) or link to an internal/benchmark page.

config/profiles/litellm.yaml (7)

4-4: Trim trailing whitespace to satisfy yamllint

There’s trailing whitespace on this and several other lines flagged by CI. Please strip trailing spaces across the file.


Apply a blanket whitespace clean-up (no functional changes).
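
One possible pass, assuming GNU sed is available (BSD sed needs -i ''):

```bash
# Strip trailing whitespace in place, then re-check with yamllint,
# which also reports the missing newline at end of file
sed -i 's/[[:space:]]*$//' config/profiles/litellm.yaml
yamllint config/profiles/litellm.yaml
```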

---

291-291: Prevent negative new_tokens when cache_read_tokens > total_tokens

In edge cases (provider returns cache_read_tokens greater than total_tokens), this can go negative. Clamp at zero.



-      new_tokens: 'cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens'
+      new_tokens: 'max(0, (cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens))'

44-47: Duration format: consider seconds for parser portability

If the profile loader expects a numeric duration or seconds string, unquoted 5m can be misread. Recommend a seconds string to avoid ambiguity.

-  timeout: 5m  # Remote providers can be slow
+  timeout: "300s"  # Remote providers can be slow; explicit seconds avoids YAML/loader ambiguity

45-47: Default priority may diverge from docs examples

Docs show LiteLLM examples with priority 50–90, while profile default is 95. If the default is consumed by auto‑discovery, this could unexpectedly outrank local endpoints. Align or document the rationale.

Would you prefer setting default_priority to 75 to match the “gateway” examples?


97-128: Capability patterns are broad; risk of false positives

Patterns like gpt-5*, /function, /vision may over‑match custom model names, misclassifying capabilities. Narrow by provider prefixes where possible, or document precedence/order.

Happy to propose a tightened pattern set if desired.


254-276: Metrics extraction: variable availability and streaming behaviour

The calculations reference finish_reason, cache_* tokens; ensure these keys exist for all provider responses via LiteLLM, especially under streaming where usage may be absent. If not, default null handling should be explicit.

Consider documenting fallback behaviour for streaming (e.g., accumulate tokens from final usage delta).


291-291: Add newline at EOF

yamllint flags lack of newline at end of file.

-      new_tokens: 'max(0, (cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens))'
+      new_tokens: 'max(0, (cache_read_tokens != null ? total_tokens - cache_read_tokens : total_tokens))'
+
docs/content/integrations/backend/litellm.md (3)

411-417: Add language to header sample block

Label the header snippet to satisfy MD040 and improve readability.

-```
+```http
 X-Olla-Endpoint: litellm-gateway
 X-Olla-Backend-Type: litellm
 X-Olla-Model: gpt-4
 X-Olla-Request-ID: req_abc123
 X-Olla-Response-Time: 2.341s

---

457-464: Heading style: lower‑case common nouns

Minor wording polish for consistency with other headings.



-### Slow Response Times
+### Slow response times

533-541: Backend type key: confirm lmstudio vs lm-studio

Elsewhere we use lmstudio (file name, links). This example uses "lm-studio". Please align with the actual type expected by the config parser to avoid copy‑paste errors.

-  - url: "http://localhost:1234"
-    type: "lm-studio"
+  - url: "http://localhost:1234"
+    type: "lmstudio"
docs/content/index.md (3)

15-21: Fix HTML casing and alt text for accuracy and rendering consistency

  • Close the paragraph tag in lowercase to avoid mixed-case HTML (some renderers are strict).
  • The LM Deploy badge alt text says “Lemonade AI: OpenAI Compatible”, which is misleading and likely a copy/paste remnant.

Why: Keeps the badges accessible and prevents confusing screen-reader output.

-  </P>
+  </p>

Additionally, please correct the LM Deploy badge alt text on Line 20 to reference LM Deploy rather than Lemonade. If you want, I can supply a precise diff once you confirm preferred wording.


26-29: Clarify unification scope and LiteLLM positioning

The intro says “unified model catalogues across all providers,” which can read as cross-provider unification. Elsewhere (Key Features) it’s “per-provider unification” with cross-provider access via OpenAI-compatible APIs. Suggest tightening the wording here to avoid confusion and to position LiteLLM as the cloud gateway.

Why: Reduces ambiguity for readers configuring routing expectations.

-It intelligently routes LLM requests across local and remote inference nodes - including [Ollama](https://github.com/ollama/ollama), [LM Studio](https://lmstudio.ai/), [LiteLLM](https://github.com/BerriAI/litellm) (100+ cloud providers), and OpenAI-compatible endpoints like [vLLM](https://github.com/vllm-project/vllm). Olla provides model discovery and unified model catalogues across all providers, enabling seamless routing to available models on compatible endpoints.
+It intelligently routes LLM requests across local and remote inference nodes — including [Ollama](https://github.com/ollama/ollama), [LM Studio](https://lmstudio.ai/), OpenAI‑compatible endpoints like [vLLM](https://github.com/vllm-project/vllm), and [LiteLLM](https://github.com/BerriAI/litellm) (gateway to 100+ cloud providers). Olla provides model discovery and per‑provider unified catalogues, with cross‑provider access via OpenAI‑compatible APIs for seamless routing to available models.
-With native [LiteLLM support](integrations/backend/litellm.md), Olla bridges local and cloud infrastructure - use local models when available, automatically failover to cloud APIs when needed. Unlike orchestration platforms like [GPUStack](compare/gpustack.md), Olla focuses on making your existing LLM infrastructure reliable through intelligent routing and failover.
+With native [LiteLLM support](integrations/backend/litellm.md), Olla bridges local and cloud infrastructure — prefer local models when available and automatically fail over to cloud APIs when needed. Unlike orchestration platforms like [GPUStack](compare/gpustack.md), Olla focuses on making your existing LLM infrastructure reliable through intelligent routing and failover.

91-100: Document that X-Olla-Response-Time is sent as a trailer

Per prior guidance, X‑Olla‑Response‑Time is exposed as a trailer rather than a standard header. The table currently doesn’t mention this.

Why: Prevents users from missing the metric when inspecting only regular response headers.

-| `X-Olla-Response-Time` | Total processing time |
+| `X-Olla-Response-Time` | Total processing time (sent as a trailer) |

If you’d like, I can also add a short note beneath the table with a curl example showing --raw and --http2 usage to view trailers.
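
For illustration, a hedged sketch of such a check over HTTP/1.1 (the URL and request body are placeholders; --raw leaves the chunked stream undecoded so the trailer is visible after the final chunk):

```bash
# With --raw, curl prints the chunked body as-is, so the X-Olla-Response-Time
# trailer shows up after the terminating zero-length chunk (placeholder URL/body)
curl -si --raw "$OLLA_URL/olla/litellm/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hi"}], "stream": true}'
```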

readme.md (2)

72-84: Make unordered list style consistent with markdownlint (MD004)

The linter expects dashes for unordered lists in this repository; these lines use asterisks.

Why: Consistent list markers keep the docs lint-clean in CI.

-* [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
+ - [Ollama](https://github.com/ollama/ollama) - native support for Ollama, including model unification. \
   Use: `/olla/ollama/`
-* [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
+ - [LM Studio](https://lmstudio.ai/) - native support for LMStudio, including model unification. \
   Use: `/olla/lmstudio/` || `/olla/lm-studio/` || `/olla/lm_studio/`
-* [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
+ - [vLLM](https://github.com/vllm-project/vllm) - native support for vllm, including model unification. \
   Use: `/olla/vllm/` \
   Models from vLLM will be available under `/olla/models` and `/olla/vllm/v1/models`
-* [LiteLLM](https://github.com/BerriAI/litellm) - native support for LiteLLM, providing unified gateway to 100+ LLM providers. \
+ - [LiteLLM](https://github.com/BerriAI/litellm) - native support for LiteLLM, providing unified gateway to 100+ LLM providers. \
   Use: `/olla/litellm/` \
   Access models from OpenAI, Anthropic, Bedrock, Azure, Google Vertex AI, Cohere, and many more through a single interface
-* [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
+ - [OpenAI](https://platform.openai.com/docs/overview) - You can use OpenAI API that provides a unified query API across all providers. \
   Use: `/olla/openai/`

If you prefer to fix this globally, I can prep a follow-up patch converting bullets across the README.


339-349: Note trailers and enumerate backend types for X‑Olla‑Backend‑Type

The sample is helpful, but:

  • Consider adding a comment that X‑Olla‑Response‑Time is a trailer.
  • Consider indicating possible values for X‑Olla‑Backend‑Type now that LiteLLM is supported.

Why: Avoids confusion when users don’t see the timing metric in regular headers and clarifies expected values.

 X-Olla-Endpoint: local-ollama     # Which backend handled it
 X-Olla-Model: llama4              # Model used
-X-Olla-Backend-Type: ollama       # Platform type
+X-Olla-Backend-Type: ollama       # Platform type (ollama|lmstudio|vllm|litellm|openai)
 X-Olla-Request-ID: req_abc123     # For debugging
-X-Olla-Response-Time: 1.234s      # Total processing time
+X-Olla-Response-Time: 1.234s      # Total processing time (sent as a trailer)
docs/content/usage.md (3)

32-34: Match markdownlint bullet style (MD004) to expected asterisks

This file’s lint rules expect asterisks, not dashes, for unordered lists.

Why: Keeps docs CI green and style consistent.

-- **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
-- **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed
+* **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
+* **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed

77-85: Tidy copy and reinforce unified-interface claim

Minor style/consistency tweak (use en dash) and re-emphasise that the unified interface is OpenAI‑compatible, which helps set expectations.

Why: Readers often skim; explicitly naming the interface reduces misconfiguration.

-Seamlessly combine local and cloud models with native LiteLLM support:
+Seamlessly combine local and cloud models with native LiteLLM support:

And optionally amend Line 85 to:

-- **Unified Interface**: One API endpoint for all models (local and cloud)
+- **Unified Interface**: One OpenAI‑compatible API endpoint for all models (local and cloud)

112-116: Bullet style: switch dashes to asterisks per file lint rule

Maintains consistency within this document (lint expects asterisks here).

-- **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on-premise infrastructure
+* **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on‑premise infrastructure
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5d2915b and ee3b07f.

📒 Files selected for processing (18)
  • CLAUDE.md (3 hunks)
  • config/config.yaml (1 hunks)
  • config/profiles/litellm.yaml (1 hunks)
  • docs/content/api-reference/litellm.md (1 hunks)
  • docs/content/api-reference/overview.md (1 hunks)
  • docs/content/compare/integration-patterns.md (1 hunks)
  • docs/content/compare/litellm.md (3 hunks)
  • docs/content/compare/overview.md (1 hunks)
  • docs/content/concepts/profile-system.md (3 hunks)
  • docs/content/index.md (2 hunks)
  • docs/content/integrations/backend/litellm.md (1 hunks)
  • docs/content/integrations/backend/lmstudio.md (2 hunks)
  • docs/content/integrations/backend/ollama.md (2 hunks)
  • docs/content/integrations/backend/vllm.md (2 hunks)
  • docs/content/integrations/overview.md (1 hunks)
  • docs/content/usage.md (2 hunks)
  • docs/mkdocs.yml (2 hunks)
  • readme.md (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{go,md}

📄 CodeRabbit inference engine (CLAUDE.md)

Use Australian English for comments and documentation, and write comments explaining why rather than what

Files:

  • docs/content/integrations/backend/vllm.md
  • docs/content/integrations/backend/lmstudio.md
  • docs/content/api-reference/overview.md
  • docs/content/compare/integration-patterns.md
  • docs/content/integrations/overview.md
  • docs/content/api-reference/litellm.md
  • docs/content/usage.md
  • docs/content/compare/overview.md
  • docs/content/compare/litellm.md
  • docs/content/integrations/backend/litellm.md
  • docs/content/concepts/profile-system.md
  • CLAUDE.md
  • docs/content/index.md
  • docs/content/integrations/backend/ollama.md
  • readme.md
🧠 Learnings (1)
📚 Learning: 2025-08-11T11:48:06.628Z
Learnt from: CR
PR: thushan/olla#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-11T11:48:06.628Z
Learning: Applies to {internal/app/handlers/handler_proxy.go,internal/adapter/proxy/sherpa/service.go,internal/adapter/proxy/olla/service.go} : Ensure responses set the headers: X-Olla-Endpoint, X-Olla-Model, X-Olla-Backend-Type, X-Olla-Request-ID, and X-Olla-Response-Time (as a trailer)

Applied to files:

  • CLAUDE.md
  • docs/content/index.md
🪛 markdownlint-cli2 (0.17.2)
docs/content/compare/integration-patterns.md

130-130: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


147-147: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/content/api-reference/litellm.md

13-13: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/content/usage.md

32-32: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


33-33: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


81-81: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


82-82: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


83-83: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


84-84: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


112-112: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


113-113: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


114-114: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


115-115: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

docs/content/compare/litellm.md

35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/content/integrations/backend/litellm.md

411-411: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

readme.md

79-79: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

🪛 LanguageTool
docs/content/api-reference/litellm.md

[typographical] ~229-~229: Two consecutive commas
Context: ...OpenAI Models - gpt-4, gpt-4-turbo, gpt-3.5-turbo - text-embedding-ada-002, text-embedding-3-small ### Anthropic...

(DOUBLE_PUNCTUATION)


[typographical] ~233-~233: Two consecutive commas
Context: ...s - claude-3-opus, claude-3-sonnet, claude-3-haiku - claude-2.1, claude-2, claude-instant ### Googl...

(DOUBLE_PUNCTUATION)


[typographical] ~237-~237: Two consecutive commas
Context: ...ant ### Google Models -gemini-pro, gemini-pro-vision-palm-2, chat-bison` ### AWS Bedrock Models - ...

(DOUBLE_PUNCTUATION)


[typographical] ~241-~241: Two consecutive commas
Context: ...drock Models - bedrock/claude-3-opus, bedrock/claude-3-sonnet - bedrock/llama2-70b, bedrock/mistral-7b ### Together AI M...

(DOUBLE_PUNCTUATION)


[typographical] ~244-~244: Consider inserting a comma for improved readability.
Context: ...70b, bedrock/mistral-7b ### Together AI Models -together_ai/llama-3-70b, to...

(MISSING_COMMAS)


[typographical] ~245-~245: Two consecutive commas
Context: ... AI Models - together_ai/llama-3-70b, together_ai/mixtral-8x7b - together_ai/qwen-72b, together_ai/deepseek-coder --- ## R...

(DOUBLE_PUNCTUATION)

docs/content/integrations/backend/litellm.md

[uncategorized] ~459-~459: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...sponse Times 1. Check LiteLLM logs for rate limiting 2. Monitor provider API status 3. Enabl...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 YAMLlint (1.37.1)
config/profiles/litellm.yaml

[error] 4-4: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 38-38: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 96-96: trailing spaces

(trailing-spaces)


[error] 128-128: trailing spaces

(trailing-spaces)


[error] 138-138: trailing spaces

(trailing-spaces)


[error] 153-153: trailing spaces

(trailing-spaces)


[error] 164-164: trailing spaces

(trailing-spaces)


[error] 175-175: trailing spaces

(trailing-spaces)


[error] 227-227: trailing spaces

(trailing-spaces)


[error] 243-243: trailing spaces

(trailing-spaces)


[error] 248-248: trailing spaces

(trailing-spaces)


[error] 261-261: trailing spaces

(trailing-spaces)


[error] 269-269: trailing spaces

(trailing-spaces)


[error] 272-272: trailing spaces

(trailing-spaces)


[error] 275-275: trailing spaces

(trailing-spaces)


[error] 277-277: trailing spaces

(trailing-spaces)


[error] 281-281: trailing spaces

(trailing-spaces)


[error] 286-286: trailing spaces

(trailing-spaces)


[error] 289-289: trailing spaces

(trailing-spaces)


[error] 291-291: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (14)
docs/content/integrations/backend/lmstudio.md (1)

66-66: Confirmed base path ‘/olla’; no changes required

  • docs/mkdocs.yml (line 4) sets site_url: https://thushan.github.io/olla/, so the site is indeed served under /olla.
  • All references to the Routing Prefixes link use the absolute path /olla/concepts/profile-system/#routing-prefixes across the integration docs (vllm.md, ollama.md, lmstudio.md, litellm.md).

The absolute links are correct and will not 404 in the current deployment.

docs/content/integrations/backend/vllm.md (2)

14-17: Add “Since” row—looks good; verify version v0.0.16.

Nice for provenance. Please confirm v0.0.16 matches the first vLLM doc/profile introduction to avoid confusing readers.

You can reuse the approach from the LM Studio comment, swapping paths for config/profiles/vllm.yaml.


66-66: No changes needed—base path is correctly configured

The MkDocs configuration (docs/mkdocs.yml) sets site_url: https://thushan.github.io/olla/, which means all absolute links beginning with /olla/ will resolve correctly on your GitHub Pages deployment.

docs/content/integrations/backend/ollama.md (1)

64-65: Absolute link for Routing Prefixes is consistent—nice.

This aligns with other backend docs and the new explicit anchor in the concepts page.

CLAUDE.md (1)

4-4: Naming consistency verified—no changes required

I’ve reviewed occurrences of “LiteLLM” and litellm across CLAUDE.md and the wider docs. Narrative text consistently uses “LiteLLM” and configuration identifiers/types use litellm, so there’s no further action needed.

docs/content/api-reference/overview.md (1)

54-58: Good addition: LiteLLM API section is placed logically after vLLM.

Positioning and description read well and match the new /olla/litellm/* prefix.

docs/content/compare/integration-patterns.md (1)

382-386: Script generated to verify schema consistency of the health settings. Once the results are in, we can confirm whether the docs need to use discovery.health_check or the top‐level health: key and then update the review accordingly.

docs/content/integrations/overview.md (1)

18-18: LGTM — adds LiteLLM to backends list

Consistent with the new backend page and profile. No issues.

docs/mkdocs.yml (1)

155-156: LGTM — navigation entries for LiteLLM

Nav paths match the added pages. Indentation and grouping are consistent with existing sections.

readme.md (5)

11-14: Badge addition looks good

LiteLLM badge matches the others in style and casing.


34-35: Good positioning: “alongside LiteLLM”

This reframe reduces perceived overlap and sets correct expectations.


175-181: Architecture bullets read well

Clear guidance on where LiteLLM fits in the routing path.


441-449: YAML example looks correct and practical

Good inclusion of model and health endpoints for LiteLLM; sensible priorities and timeouts.


497-497: FAQ update is accurate and helpful

Clear delineation between infrastructure proxy (Olla) and API gateway (LiteLLM), and how they complement each other.

Comment on lines +209 to +217
"status": "healthy",
"models": 25,
"providers": [
"openai",
"anthropic",
"bedrock",
"gemini",
"together_ai"
]

💡 Verification agent

🧩 Analysis chain

Provider naming inconsistency: ‘google’ vs ‘gemini’. Please confirm canonical label.

The models list uses owned_by: "google" while the health example lists "gemini" as a provider. Pick one naming scheme to avoid confusing users and align with LiteLLM’s provider IDs.

Would you like me to update both examples to the canonical provider key used by LiteLLM (e.g., “gemini” if that’s what LiteLLM reports)? I can follow up with a PR diff once confirmed.

You can also ask me to check the latest LiteLLM docs:


🏁 Script executed:

#!/bin/bash
# Search for owned_by occurrences in the LiteLLM API reference
grep -R "owned_by" -n docs/content/api-reference/litellm.md || true

# Search for occurrences of "gemini" in the same file
grep -R "\"gemini\"" -n docs/content/api-reference/litellm.md || true

# Search for occurrences of "google" in the same file
grep -R "\"google\"" -n docs/content/api-reference/litellm.md || true

Length of output: 364


Align LiteLLM provider IDs: replace “google” with “gemini”

The API reference currently lists a Gemini-owned model under "owned_by": "google", but LiteLLM’s canonical provider key for Google AI Studio (Gemini) is gemini, not google (docs.litellm.ai).

Please update in docs/content/api-reference/litellm.md:

  • Lines 139 (both occurrences): change
    -      "owned_by": "google"
    +      "owned_by": "gemini"

With this change, the models listing and health example will consistently use the same provider ID.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/content/api-reference/litellm.md around lines 209 to 217, the example
output and model entries use the provider ID "google" instead of LiteLLM's
canonical "gemini"; update any occurrences of the provider key "google"
(including the "owned_by" fields and the providers list) to "gemini" so the
models listing and health example consistently use the same provider ID.

Comment on lines +14 to +17
<tr>
<th>Since</th>
<td>Olla <code>v0.0.1</code></td>
</tr>

💡 Verification agent

🧩 Analysis chain

“Since” row added; please validate v0.0.1.

Great to document history. Just confirm that v0.0.1 is the intended baseline for Ollama integration in this repo’s release history.

If tags exist locally:


🏁 Script executed:

#!/bin/bash
git fetch --tags --quiet || true
git tag --sort=taggerdate | head -n 10

Length of output: 143


Let’s verify whether the v0.0.1 tag actually exists and identify the earliest release that included the Ollama integration documentation. Please run:


🏁 Script executed:

#!/bin/bash
# Check if v0.0.1 exists
echo "Checking for v0.0.1:"
git tag -l v0.0.1

# Find the commit that added the Ollama docs file
file="docs/content/integrations/backend/ollama.md"
add_commit=$(git log --diff-filter=A --follow --format=%H -- "$file" | tail -n1)
echo "Commit that added $file: $add_commit"

# List tags containing that commit (earliest first) and show the earliest
echo "Tags containing that commit:"
git tag --contains $add_commit --sort=taggerdate
echo "Earliest tag including the Ollama docs:"
git tag --contains $add_commit --sort=taggerdate | head -n1

Length of output: 597


Let’s pinpoint the first released tag that actually contains the Ollama integration doc. Please run:


🏁 Script executed:

#!/bin/bash
# Find the earliest tag (by semver) that includes docs/content/integrations/backend/ollama.md
file="docs/content/integrations/backend/ollama.md"
git tag | sort -V | while read tag; do
  if git ls-tree -r "$tag" --name-only | grep -qx "$file"; then
    echo "Earliest tag containing $file: $tag"
    break
  fi
done

Length of output: 1267


Update baseline version for Ollama integration
The earliest release containing docs/content/integrations/backend/ollama.md is v0.0.15, not v0.0.1. Please update the “Since” row accordingly:

• File: docs/content/integrations/backend/ollama.md
– Lines 14–17: change
<td>Olla <code>v0.0.1</code></td>
to
<td>Ollama <code>v0.0.15</code></td>

🤖 Prompt for AI Agents
In docs/content/integrations/backend/ollama.md around lines 14 to 17, the
"Since" table row incorrectly states the integration was introduced in "Olla
v0.0.1"; change the cell to read "Ollama v0.0.15" (fix typo "Olla"→"Ollama" and
update version to v0.0.15) so the row becomes: <td>Ollama
<code>v0.0.15</code></td>.


@coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (4)
docs/content/integrations/backend/lmstudio.md (1)

14-17: “Since” version corrected to v0.0.12 — looks right.

Thanks for aligning this with the tag that first introduced the LM Studio profile. This addresses the earlier review note cleanly.

docs/content/usage.md (2)

38-39: Good catch adding discovery.type: "static"

This resolves prior copy‑paste issues flagged on this snippet. Keeps examples consistent across the docs and avoids schema surprises.


90-91: Thanks for adding discovery.type: "static" here as well

This addresses the earlier review feedback and makes the Hybrid example ready to run.

docs/content/integrations/backend/litellm.md (1)

148-151: Prefix non‑OpenAI models in the CLI example so LiteLLM resolves them

Without provider prefixes, Anthropic models may not resolve correctly. This mirrors patterns used elsewhere in the doc.

-litellm --model gpt-3.5-turbo \
-        --model claude-3-haiku-20240307 \
+litellm --model openai/gpt-3.5-turbo \
+        --model anthropic/claude-3-haiku-20240307 \
         --port 4000
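
Once it’s up, a quick check that both models registered (a sketch against the gateway port used above):

```bash
# LiteLLM's OpenAI-compatible proxy lists the configured models here
curl -s http://localhost:4000/v1/models
```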
🧹 Nitpick comments (6)
docs/content/usage.md (3)

32-34: Standardise list markers to satisfy markdownlint (MD004) and keep style consistent

Use asterisks here to match the rest of the page and quiet the linter.

-- **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
-- **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed
+- **Cost Optimisation**: Priority routing (local first, cloud fallback via native [LiteLLM](integrations/backend/litellm.md) support)
+- **Hybrid Cloud**: Access GPT-4, Claude, and 100+ cloud models when needed

82-85: Unify bullet style with asterisks for consistency and linting

Keeps the document style consistent and resolves MD004.

-- **Smart Routing**: Use local models for sensitive data, cloud for complex tasks
-- **Cost Control**: Prioritise free local models, failover to paid APIs
-- **Best-of-Both**: GPT-4 for coding, local Llama for chat, Claude for analysis
-- **Unified Interface**: One API endpoint for all models (local and cloud)
+- **Smart Routing**: Use local models for sensitive data, cloud for complex tasks
+- **Cost Control**: Prioritise free local models, failover to paid APIs
+- **Best-of-Both**: GPT-4 for coding, local Llama for chat, Claude for analysis
+- **Unified Interface**: One API endpoint for all models (local and cloud)

114-117: Make list marker style consistent with earlier sections

Switch to asterisks for a consistent style and to pass MD004.

-- **Multi-Region Deployment**: Geographic load balancing and failover
-- **Enterprise Security**: Rate limiting, request validation, audit trails  
-- **Performance Monitoring**: Circuit breakers, health checks, metrics
-- **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on-premise infrastructure
+- **Multi-Region Deployment**: Geographic load balancing and failover
+- **Enterprise Security**: Rate limiting, request validation, audit trails  
+- **Performance Monitoring**: Circuit breakers, health checks, metrics
+- **Vendor Diversity**: Mix of cloud providers (via LiteLLM) and on-premise infrastructure
docs/content/integrations/backend/litellm.md (3)

67-71: Tiny formatting nit: add a space before “(default)”

Improves readability in the table.

-                <li><code>/litellm</code>(default)</li>
+                <li><code>/litellm</code> (default)</li>

248-255: Avoid future‑dating model names (e.g., “GPT‑5”) unless confirmed

To prevent readers misconfiguring unavailable models, keep this high‑level (e.g., “latest GPT‑4.x family”) or verify availability before naming.


277-284: Be explicit with provider prefixes for first curl example

You later emphasise provider‑prefixed routing; using openai/gpt-4 here prevents ambiguity if users haven’t defined aliases in LiteLLM.

-    "model": "gpt-4",
+    "model": "openai/gpt-4",
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ee3b07f and b9b7a5d.

📒 Files selected for processing (3)
  • docs/content/integrations/backend/litellm.md (1 hunks)
  • docs/content/integrations/backend/lmstudio.md (2 hunks)
  • docs/content/usage.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{go,md}

📄 CodeRabbit inference engine (CLAUDE.md)

Use Australian English for comments and documentation, and write comments explaining why rather than what

Files:

  • docs/content/integrations/backend/litellm.md
  • docs/content/integrations/backend/lmstudio.md
  • docs/content/usage.md
🪛 LanguageTool
docs/content/integrations/backend/litellm.md

[uncategorized] ~459-~459: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...sponse Times 1. Check LiteLLM logs for rate limiting 2. Monitor provider API status 3. Enabl...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.17.2)
docs/content/usage.md

32-32: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


33-33: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


82-82: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


83-83: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


84-84: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


85-85: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


114-114: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


115-115: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


116-116: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


117-117: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (1.24)
🔇 Additional comments (3)
docs/content/integrations/backend/lmstudio.md (1)

66-66: Absolute /olla/ links are intentional and consistent with your MkDocs configuration

I’ve checked your docs/mkdocs.yml, and it specifies
site_url: https://thushan.github.io/olla/
which means all pages are served under the /olla/ path. All four backend integration pages also use the same absolute /olla/concepts/profile-system/#routing-prefixes link, and the #routing-prefixes anchor exists in concepts/profile-system.md. Since this matches your current hosting setup and is applied consistently across the docs, there’s no need to switch to relative paths at this time.

– docs/mkdocs.yml defines the base URL as https://thushan.github.io/olla/
– Four integration pages reference /olla/concepts/profile-system/#routing-prefixes
– The Routing Prefixes heading with {#routing-prefixes} is present in concepts/profile-system.md

No changes required here.

docs/content/usage.md (1)

36-57: YAML looks correct; minor TLS gotcha worth calling out

The addition of type: "static" and LiteLLM fields (model_url, health_check_url) reads well. One operational nuance: the second endpoint uses HTTPS with a private IP. If users don’t have a valid certificate chain for that host, requests may fail TLS verification. Consider a brief note on acceptable TLS options (e.g., trusted certs or any supported “skip verify” toggle) to reduce setup friction.

docs/content/integrations/backend/litellm.md (1)

291-312: Python example is solid and correctly uses the LiteLLM base path and prefixes

Good alignment with the routing model and earlier configuration guidance.

Comment on lines +88 to +100
```yaml
discovery:
  static:
    endpoints:
      - url: "http://localhost:4000"
        name: "litellm-gateway"
        type: "litellm"
        priority: 75
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s
```

🛠️ Refactor suggestion

Add discovery.type: "static" for copy‑paste‑ready configuration

All other docs use type: "static"; adding it here reduces user confusion and aligns with the profile system.

 discovery:
+  type: "static"
   static:
     endpoints:
       - url: "http://localhost:4000"
         name: "litellm-gateway"
         type: "litellm"
         priority: 75
         model_url: "/v1/models"
         health_check_url: "/health"
         check_interval: 5s
         check_timeout: 2s

🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 88 to 100, the
example YAML is missing discovery.type: "static", so update the snippet to
include discovery.type: "static" (i.e., add a type: "static" field under the
discovery key) so the configuration is copy‑paste‑ready and consistent with
other docs.

Comment on lines +106 to +129
```yaml
discovery:
  static:
    endpoints:
      # Primary LiteLLM instance
      - url: "http://litellm-primary:4000"
        name: "litellm-primary"
        type: "litellm"
        priority: 90
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s

      # Secondary LiteLLM instance
      - url: "http://litellm-secondary:4000"
        name: "litellm-secondary"
        type: "litellm"
        priority: 70
        model_url: "/v1/models"
        health_check_url: "/health"
        check_interval: 5s
        check_timeout: 2s
```

🛠️ Refactor suggestion

Do the same for the multi‑instance example

Ensures both examples follow the same schema and are runnable as‑is.

 discovery:
+  type: "static"
   static:
     endpoints:
       # Primary LiteLLM instance
       - url: "http://litellm-primary:4000"
         name: "litellm-primary"
         type: "litellm"
         priority: 90
         model_url: "/v1/models"
         health_check_url: "/health"
         check_interval: 5s
         check_timeout: 2s
         
       # Secondary LiteLLM instance  
       - url: "http://litellm-secondary:4000"
         name: "litellm-secondary"
         type: "litellm"
         priority: 70
         model_url: "/v1/models"
         health_check_url: "/health"
         check_interval: 5s
         check_timeout: 2s

🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 106 to 129, the
multi-instance example is missing discovery.type: "static"; add a type: "static"
field under the discovery key (exactly as in the single-instance example) so
both examples follow the same schema and are runnable as-is for multi-instance
deployments.
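
Before wiring both gateways into discovery, it can help to confirm each instance answers on its health endpoint. A rough sketch, assuming both proxies expose `/health` on port 4000 as configured above (the 2-second timeout mirrors `check_timeout: 2s`):

```bash
# Poll both LiteLLM instances the same way Olla's health checks would
for host in litellm-primary litellm-secondary; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "http://${host}:4000/health")
  echo "${host}: HTTP ${code}"
done
```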

Comment on lines +337 to +356
```yaml
endpoints:
  # Local models (free)
  - url: "http://localhost:11434"
    name: "local-ollama"
    type: "ollama"
    priority: 100

  # LiteLLM with budget models
  - url: "http://litellm-budget:4000"
    name: "litellm-budget"
    type: "litellm"
    priority: 75

  # LiteLLM with premium models
  - url: "http://litellm-premium:4000"
    name: "litellm-premium"
    type: "litellm"
    priority: 50
```

🛠️ Refactor suggestion

Align advanced “Cost‑Optimised Routing” with the discovery schema

Other pages use discovery: { type: "static", static: { endpoints: ... } }. Adopting the same here keeps things uniform and reduces setup errors.

-endpoints:
-  # Local models (free)
-  - url: "http://localhost:11434"
-    name: "local-ollama"
-    type: "ollama"
-    priority: 100
-  
-  # LiteLLM with budget models
-  - url: "http://litellm-budget:4000"
-    name: "litellm-budget"
-    type: "litellm"
-    priority: 75
-  
-  # LiteLLM with premium models
-  - url: "http://litellm-premium:4000"
-    name: "litellm-premium"
-    type: "litellm"
-    priority: 50
+discovery:
+  type: "static"
+  static:
+    endpoints:
+      # Local models (free)
+      - url: "http://localhost:11434"
+        name: "local-ollama"
+        type: "ollama"
+        priority: 100
+      
+      # LiteLLM with budget models
+      - url: "http://litellm-budget:4000"
+        name: "litellm-budget"
+        type: "litellm"
+        priority: 75
+      
+      # LiteLLM with premium models
+      - url: "http://litellm-premium:4000"
+        name: "litellm-premium"
+        type: "litellm"
+        priority: 50
📝 Committable suggestion

Suggested change
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # Local models (free)
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        priority: 100

      # LiteLLM with budget models
      - url: "http://litellm-budget:4000"
        name: "litellm-budget"
        type: "litellm"
        priority: 75

      # LiteLLM with premium models
      - url: "http://litellm-premium:4000"
        name: "litellm-premium"
        type: "litellm"
        priority: 50
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 337 to 356, the
endpoints block is defined at top level but must be wrapped to match the
discovery schema used elsewhere; replace the top-level endpoints YAML with a
discovery object: discovery: { type: "static", static: { endpoints: [
...existing endpoint entries... ] } } ensuring correct indentation and
preserving each endpoint entry exactly as-is inside static.endpoints so the
“Cost‑Optimised Routing” format matches other pages.
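
Whichever layout the docs settle on, the priority ordering is easiest to see from the client side. A hedged example — the model names are placeholders, and actual routing depends on which backend advertises each model:

```bash
# A locally hosted model: with priority 100, the local Ollama endpoint
# should be preferred whenever it advertises the model.
curl -s http://localhost:40114/olla/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "hello"}]}'

# A cloud-only model: only the LiteLLM gateways advertise it, so the
# budget instance (priority 75) is tried before the premium one (priority 50).
curl -s http://localhost:40114/olla/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hello"}]}'
```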

Comment on lines +360 to +377
```yaml
endpoints:
  # US East region
  - url: "http://litellm-us-east:4000"
    name: "litellm-us-east"
    type: "litellm"
    priority: 100

  # EU West region
  - url: "http://litellm-eu-west:4000"
    name: "litellm-eu-west"
    type: "litellm"
    priority: 100

# Load balance across regions
proxy:
  load_balancer: "least-connections"
```

🛠️ Refactor suggestion

Same schema alignment for “Multi‑Region Setup”

Keeps the advanced examples consistent with the rest of the docs.

-endpoints:
-  # US East region
-  - url: "http://litellm-us-east:4000"
-    name: "litellm-us-east"
-    type: "litellm"
-    priority: 100
-  
-  # EU West region
-  - url: "http://litellm-eu-west:4000"
-    name: "litellm-eu-west"
-    type: "litellm"
-    priority: 100
-  
-  # Load balance across regions
-  # (rest unchanged)
+discovery:
+  type: "static"
+  static:
+    endpoints:
+      # US East region
+      - url: "http://litellm-us-east:4000"
+        name: "litellm-us-east"
+        type: "litellm"
+        priority: 100
+      
+      # EU West region
+      - url: "http://litellm-eu-west:4000"
+        name: "litellm-eu-west"
+        type: "litellm"
+        priority: 100
+  
+  # Load balance across regions
+  # (rest unchanged)
📝 Committable suggestion

Suggested change
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # US East region
      - url: "http://litellm-us-east:4000"
        name: "litellm-us-east"
        type: "litellm"
        priority: 100

      # EU West region
      - url: "http://litellm-eu-west:4000"
        name: "litellm-eu-west"
        type: "litellm"
        priority: 100

# Load balance across regions
proxy:
  load_balancer: "least-connections"
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 360 to 377, the
"Multi-Region Setup" YAML block should follow the same discovery schema used
elsewhere in the docs: wrap the top-level endpoints list in discovery: { type:
"static", static: { endpoints: [ ...existing endpoint entries... ] } }, keep
each endpoint entry and the proxy.load_balancer: "least-connections" setting
unchanged, and ensure correct indentation so the example aligns with the rest
of the documentation.
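
However the example is expressed, the least-connections behaviour can be verified from the response headers, since Olla reports the serving backend there (the exact header names are documented in the API reference). A rough sketch:

```bash
# Send a few requests and check which endpoint handled each one
for i in 1 2 3 4; do
  curl -s -D - -o /dev/null http://localhost:40114/olla/litellm/v1/models \
    | grep -i 'olla'   # backend/endpoint headers, names per the API reference
done
```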

Comment on lines +526 to +541
```yaml
endpoints:
  # Local Ollama (highest priority)
  - url: "http://localhost:11434"
    type: "ollama"
    priority: 100

  # LM Studio (medium priority)
  - url: "http://localhost:1234"
    type: "lm-studio"
    priority: 75

  # LiteLLM for cloud (lower priority)
  - url: "http://localhost:4000"
    type: "litellm"
    priority: 50
```

🛠️ Refactor suggestion

Use consistent provider type tokens and discovery schema

Elsewhere the type is lmstudio (no hyphen). Also, align with the discovery schema for consistency.

-endpoints:
-  # Local Ollama (highest priority)
-  - url: "http://localhost:11434"
-    type: "ollama"
-    priority: 100
-  
-  # LM Studio (medium priority)
-  - url: "http://localhost:1234"
-    type: "lm-studio"
-    priority: 75
-  
-  # LiteLLM for cloud (lower priority)
-  - url: "http://localhost:4000"
-    type: "litellm"
-    priority: 50
+discovery:
+  type: "static"
+  static:
+    endpoints:
+      # Local Ollama (highest priority)
+      - url: "http://localhost:11434"
+        type: "ollama"
+        priority: 100
+      
+      # LM Studio (medium priority)
+      - url: "http://localhost:1234"
+        type: "lmstudio"
+        priority: 75
+      
+      # LiteLLM for cloud (lower priority)
+      - url: "http://localhost:4000"
+        type: "litellm"
+        priority: 50
📝 Committable suggestion

Suggested change
```yaml
discovery:
  type: "static"
  static:
    endpoints:
      # Local Ollama (highest priority)
      - url: "http://localhost:11434"
        type: "ollama"
        priority: 100

      # LM Studio (medium priority)
      - url: "http://localhost:1234"
        type: "lmstudio"
        priority: 75

      # LiteLLM for cloud (lower priority)
      - url: "http://localhost:4000"
        type: "litellm"
        priority: 50
```
🤖 Prompt for AI Agents
In docs/content/integrations/backend/litellm.md around lines 526 to 541, the
provider type tokens are inconsistent (using "lm-studio" with a hyphen) and must
match the discovery schema and other docs that use "lmstudio"; update the
endpoint entries so the type tokens are consistent (e.g., use "lmstudio" instead
of "lm-studio") and verify all provider tokens ("ollama", "lmstudio", "litellm",
etc.) conform to the discovery schema naming and casing used elsewhere in the
repo.
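
A quick way to confirm the canonical type tokens is to list the bundled profiles — this PR adds `config/profiles/litellm.yaml`, and the other profile filenames generally indicate the accepted endpoint `type` values (the exact set may differ by release):

```bash
# Profile filenames hint at the endpoint `type` tokens Olla accepts,
# e.g. litellm.yaml added in this PR
ls config/profiles/
```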

Comment on lines +105 to +108
```
# Single endpoint accesses all models
# http://localhost:40114/olla/openai/v1/chat/completions
# Automatically routes to the right backend based on model name
```

⚠️ Potential issue

Fix endpoint path: route via LiteLLM, not the OpenAI shim

Using /olla/openai/... here bypasses the LiteLLM gateway introduced above. Point to /olla/litellm/v1/... so users actually test the Hybrid setup with provider‑prefixed routing.

-# http://localhost:40114/olla/openai/v1/chat/completions
+# http://localhost:40114/olla/litellm/v1/chat/completions
📝 Committable suggestion

Suggested change
```
# Single endpoint accesses all models
# http://localhost:40114/olla/litellm/v1/chat/completions
# Automatically routes to the right backend based on model name
```
🤖 Prompt for AI Agents
In docs/content/usage.md around lines 105 to 108, the endpoint path uses
/olla/openai/... which bypasses the LiteLLM gateway; update the example URL to
use the LiteLLM gateway path /olla/litellm/v1/... so the documentation
demonstrates routing via LiteLLM and provider‑prefixed hybrid routing (replace
the commented URL and any related examples accordingly).
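
With the corrected prefix, a complete request through the LiteLLM route would look roughly like this — the model name is a placeholder and must match a provider configured in the LiteLLM gateway:

```bash
curl -s http://localhost:40114/olla/litellm/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```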

@thushan thushan merged commit 7f91ce3 into main Aug 21, 2025
8 checks passed
@thushan thushan deleted the feature/backend/litellm branch August 21, 2025 11:41