A local-first browser extension for chatting with local LLM providers (Ollama, LM Studio, llama.cpp) from the browser sidepanel.
Quick links: Install · Docs · Setup Guide · Privacy · Issues
Ollama Client is a sidepanel chat extension for Chromium browsers, with Firefox support.
It is a local LLM browser extension in which you choose the provider endpoints and models yourself.
Version v0.6.0 introduced multi-provider chat routing and local RAG workflows.
Good fit:
- Developers who want an Ollama client directly in the browser UI.
- Users who want an offline AI assistant workflow with local storage by default.
- Users running local provider servers (Ollama, LM Studio, llama.cpp).
- Contributors interested in browser extension + local model architecture.
Not a good fit:
- Users expecting cloud-SaaS reliability without running local infrastructure.
- Teams requiring centralized cloud sync, SSO, and org admin controls.
- Users who do not want to manage provider endpoints.
Most AI browser tools assume hosted providers and account-based workflows.
This project focuses on local-first usage:
- You configure the model endpoint.
- Chat/session data is stored locally.
- There is no built-in telemetry pipeline.
- Sidepanel-native browser UX instead of a separate desktop app.
- Multi-provider routing in one UI.
- Built-in local retrieval flow (RAG with local LLMs).
- Source-visible behavior for auditing and contribution.
| Major feature | What works now | Current limitation |
|---|---|---|
| Multi-provider chat | Route chat to Ollama, LM Studio, llama.cpp | Routing defaults to Ollama if model mapping is missing |
| Model management | Pull/delete/unload/version support for Ollama | Equivalent management actions are not yet implemented for LM Studio/llama.cpp |
| Streaming | Token streaming via runtime port with cancel support | Message keys and some hook names still use legacy ollama-* naming |
| RAG with local LLMs | Local chunking, embedding, hybrid retrieval, context injection | Embeddings use provider-native/shared routes with Ollama fallback for reliability |
| File ingestion | TXT/MD/PDF/DOCX/CSV/TSV/PSV/HTML processing | Quality depends on file quality and chunking config |
| Persistence | Chat/session/files and vectors stored in Dexie/IndexedDB | SQLite exists as migration/auxiliary path, not primary runtime store |
| Browser support | Chromium workflow and Firefox workflow are supported | Firefox may need explicit origin/CORS setup |
High-level flow:
- Sidepanel/options UI collects prompt and settings.
- UI opens runtime port to background.
- Background resolves provider by selected model mapping.
- Provider stream is relayed back to UI in chunks.
- UI updates message state and persists chat data.
- Optional RAG pipeline retrieves local context and appends it to prompt input.
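The chunk-relay step in the flow above can be sketched as a pure reducer on the UI side. This is a minimal illustration, not the extension's real message shapes or hook names:

```typescript
// Illustrative stream-chunk shape; the real runtime-port messages differ.
type StreamChunk = { delta: string; done: boolean };

// UI-side reducer: fold incoming chunks into the assistant message text.
function applyChunk(current: string, chunk: StreamChunk): string {
  return chunk.done ? current : current + chunk.delta;
}

const chunks: StreamChunk[] = [
  { delta: "Hel", done: false },
  { delta: "lo", done: false },
  { delta: "", done: true },
];
const message = chunks.reduce(applyChunk, "");
// message === "Hello"
```

Keeping the chunk application pure makes it easy to persist the final message state once the `done` chunk arrives.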
Key directories:
- `src/sidepanel/*`
- `src/options/*`
- `src/background/*`
- `src/contents/*`
- `src/lib/providers/*`
- `src/lib/embeddings/*`
Build/runtime notes:
- Extension framework: WXT (`wxt` CLI); moved from Plasmo to WXT for more deterministic MV3 builds and explicit entrypoint/manifest control.
- Settings hooks/storage wrapper: `@plasmohq/storage` (`plasmoGlobalStorage`)
Default provider profiles:
- Ollama (`http://localhost:11434`)
- LM Studio (`http://localhost:1234/v1`)
- llama.cpp server (`http://localhost:8000/v1`)
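A hypothetical representation of these default profiles; only the URLs come from this README, the type and field names are illustrative:

```typescript
// Illustrative provider-profile shape (not the extension's actual config type).
type ProviderProfile = { baseUrl: string; openAICompatible: boolean };

const defaultProfiles: Record<string, ProviderProfile> = {
  // Ollama exposes its own native HTTP API.
  ollama: { baseUrl: "http://localhost:11434", openAICompatible: false },
  // LM Studio and llama.cpp server expose OpenAI-compatible /v1 routes.
  lmstudio: { baseUrl: "http://localhost:1234/v1", openAICompatible: true },
  llamacpp: { baseUrl: "http://localhost:8000/v1", openAICompatible: true },
};
```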
Clarifying examples:
- If both Ollama and LM Studio expose `llama3`, the model mapping decides which backend handles chat.
- If the mapping is absent for a model ID, the fallback provider is Ollama.
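An illustrative resolver for the behavior described above (not the real module API): the mapping picks the backend, and Ollama is the fallback when no mapping exists for a model ID.

```typescript
type ProviderId = "ollama" | "lmstudio" | "llamacpp";

// Resolve which backend handles a chat request for a given model ID.
function resolveProvider(
  modelId: string,
  mapping: Record<string, ProviderId>
): ProviderId {
  // Missing mapping -> default to Ollama, matching the documented fallback.
  return mapping[modelId] ?? "ollama";
}

resolveProvider("llama3", { llama3: "lmstudio" }); // -> "lmstudio"
resolveProvider("mistral", {});                    // -> "ollama" (fallback)
```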
In this project, RAG means local retrieval before generation:
- Uploaded or chat text is chunked.
- Chunks are embedded and stored in local vector storage.
- Query-time retrieval selects relevant chunks.
- Retrieved snippets are appended to generation context.
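The query-time retrieval step above can be sketched as cosine-similarity ranking over stored chunk embeddings. This is a minimal illustration, not the project's actual retrieval module:

```typescript
type Chunk = { text: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank chunks by similarity to the query embedding and keep the top k.
function retrieve(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

The real pipeline is hybrid (not pure vector similarity), but the shape is the same: score, rank, and inject the top chunks into the generation context.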
Clarifying example:
- Upload a local API spec PDF, then ask: "What headers are required for `createUser`?"
- Retrieved chunks from that PDF are included in the prompt context before the model responds.
Current RAG runtime is intentionally browser-first:
- extension context only (UI + background worker)
- IndexedDB + in-memory index/cache
- HTTP-based model/embedding access
- graceful fallback over hard failure
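"Graceful fallback over hard failure" can be captured in a small helper. This is an illustrative sketch, not the project's actual API:

```typescript
// Try the primary path; return the fallback result instead of
// surfacing an exception to the UI.
function withFallback<T>(primary: () => T, fallback: () => T): T {
  try {
    return primary();
  } catch {
    return fallback();
  }
}

// Hypothetical use: prefer an in-memory index, fall back to a slower path.
const result = withFallback(
  () => { throw new Error("index cache miss"); },
  () => "rebuilt-from-indexeddb"
);
// result === "rebuilt-from-indexeddb"
```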
Embedding strategy defaults:
- provider-native embeddings when available
- shared canonical target: `all-MiniLM-L6-v2`
- silent background warmup
- Ollama fallback for reliability
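A hypothetical selector for the strategy above (names are illustrative): use the provider's native embedder when it has one, otherwise fall back to Ollama, with both targeting the shared canonical model.

```typescript
type Embedder = { provider: string; model: string };

// The shared canonical embedding target named in this README.
const CANONICAL_MODEL = "all-MiniLM-L6-v2";

// Prefer provider-native embeddings; fall back to Ollama for reliability.
function pickEmbedder(
  provider: string,
  nativeSupport: Record<string, boolean>
): Embedder {
  const backend = nativeSupport[provider] ? provider : "ollama";
  return { provider: backend, model: CANONICAL_MODEL };
}
```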
RAG implementation details and module boundaries:
- Current behavior guide: docs/rag.md
- Full audit and redesign: docs/rag-browser-core.md
- Browser-first RAG contracts (TypeScript interfaces):
src/lib/rag/core/interfaces.ts
- Install extension from the Chrome Web Store.
- Start at least one provider endpoint.
- Configure provider URL in settings.
- Select a model and start chatting in the sidepanel.
Common endpoint examples:
- `http://localhost:11434` (Ollama)
- `http://localhost:1234/v1` (LM Studio)
- `http://localhost:8000/v1` (llama.cpp server)
```sh
git clone https://github.com/Shishir435/ollama-client.git
cd ollama-client
pnpm install
pnpm dev
```

Common commands:

```sh
pnpm lint:check
pnpm test:run
pnpm build
pnpm package
```

Firefox commands:

```sh
pnpm dev:firefox
pnpm build:firefox
pnpm package:firefox
```

- Start provider service.
- Open extension settings and verify connection.
- Select model.
- Send prompt and monitor stream.
- Optionally upload files for retrieval context.
- Fork conversation by editing earlier user messages.
- Enable only providers you need.
- Keep model names unique when possible.
- Re-check mappings after model list changes.
- Per-model parameters are stored locally (`temperature`, `top_p`, `top_k`, etc.).
- Start with defaults, then tune one variable at a time.
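A hypothetical shape for locally stored per-model parameters; the default values below are placeholders for illustration, not the extension's actual defaults.

```typescript
type ModelParams = { temperature: number; top_p: number; top_k: number };

// Placeholder defaults (illustrative values only).
const defaults: ModelParams = { temperature: 0.8, top_p: 0.9, top_k: 40 };

// Tune one variable at a time, starting from the defaults.
const tuned: ModelParams = { ...defaults, temperature: 0.4 };
```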
- Adjust chunk size/overlap and retrieval thresholds.
- Narrow retrieval scope when debugging noisy answers.
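Chunk size and overlap interact as in this minimal fixed-size chunker, an illustration of the tunables rather than the project's actual chunking code (assumes `size > overlap`):

```typescript
// Split text into fixed-size windows that overlap by `overlap` characters.
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
    if (i + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

chunkText("abcdefghij", 4, 2);
// -> ["abcd", "cdef", "efgh", "ghij"]
```

Larger overlap improves recall across chunk boundaries at the cost of more stored (and retrieved) near-duplicate text.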
- Legacy key/message naming (`ollama-*`) remains in parts of the multi-provider code.
- Embedding support varies by provider; the fallback keeps Ollama as the reliability anchor.
- A reranker exists but is disabled by default due to extension CSP constraints.
- Provider parity is incomplete for model management actions.
- Runtime persistence is Dexie-first while SQLite migration path still exists.
- Privacy depends on endpoint choice.
- If you configure a remote endpoint, prompts/responses are sent to that endpoint.
- Do not expose provider APIs publicly without access controls.
- Provider-agnostic naming cleanup.
- Clear single-source persistence strategy.
- Better provider parity for management actions.
- Better retrieval diagnostics.
Potential future architecture may include a desktop helper/local companion for heavier retrieval workloads.
Important constraints:
- The desktop helper is not implemented.
- Browser-only mode remains first-class.
- The core runtime does not depend on helper availability.
- Read CONTRIBUTING.md.
- Keep PRs scoped and testable.
- Include reproduction details for bug fixes.
- Update docs when behavior changes.
Philosophy:
- Local-first operation.
- Explicit behavior over hidden automation.
- User control of endpoint and model settings.
Non-goals:
- Managed cloud LLM platform behavior.
- Hidden telemetry for growth metrics.
- Abstracting away all local infrastructure responsibility.
MIT License: see LICENCE