Below is a step-by-step walkthrough of the app:
*(Screenshots: Steps 1-6 of the app walkthrough.)*
Follow these simple steps to get started with Offline Doc Chat:
1. **Set Your Ollama Endpoint and Model**: Navigate to the Settings panel and configure the connection to your locally hosted Ollama instance (a quick connectivity check is sketched just after this list).
2. **Upload Your Documents**: Drag and drop your files into the upload area. The system will automatically process and embed them in the background.
3. **Start Asking Questions**: Once processing is complete, you can ask natural language questions about your documents — all without internet access!
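If the app can't connect, you can sanity-check the endpoint outside the app. The short sketch below (assuming Python with `requests` installed) calls Ollama's `/api/tags` route, which lists the models available on the local instance:

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default endpoint (see the settings table below)

# Ollama's /api/tags route returns the models pulled onto the local instance.
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
resp.raise_for_status()
print("Available models:", [m["name"] for m in resp.json().get("models", [])])
```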
You can fine-tune the entire RAG (Retrieval-Augmented Generation) pipeline via Settings > Show Advanced Options. This gives you full control over model behavior and retrieval strategy.
| Setting | Description | Default |
|---|---|---|
| Ollama Endpoint | URL to your locally hosted Ollama API instance | http://localhost:11434 |
| Model | LLM to use for generating chat completions | (User-defined) |
| System Prompt | Initial prompt used to condition the LLM | (See source code) |
| Top K | Number of most similar documents to retrieve per query | 3 |
| Chat Mode | LlamaIndex chat mode used for retrieval | Best |
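As a reference for what these options control, here is a minimal llama-index sketch that wires up the same settings. The model name, upload path, system prompt, and test question are placeholder assumptions, and the explicit context chat engine stands in for the app's default "best" chat mode, which picks an engine automatically:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.chat_engine import ContextChatEngine
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# "Ollama Endpoint" and "Model" settings; the model name is a placeholder.
Settings.llm = Ollama(model="llama3", base_url="http://localhost:11434")
# Local embedder so nothing goes over the network (see the embedding table below).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

# "./uploads" is a hypothetical stand-in for the app's upload directory.
documents = SimpleDirectoryReader("./uploads").load_data()
index = VectorStoreIndex.from_documents(documents)

# "Top K" maps to similarity_top_k; "System Prompt" conditions the engine.
chat_engine = ContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=3),
    system_prompt="Answer using only the uploaded documents.",
)
print(chat_engine.chat("Summarize the key points of my documents."))
```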
Embedding settings:

| Setting | Description | Default |
|---|---|---|
| Embedding Model | Model used to vectorize uploaded files | bge-large-en-v1.5 |
| Chunk Size | Text length per chunk to improve embedding granularity | 1024 |
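A matching sketch for the embedding side, assuming the model is loaded through llama-index's `HuggingFaceEmbedding` wrapper (the app itself may resolve the model differently):

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# "Embedding Model" setting -- loaded here via HuggingFace, which is an
# assumption; the app may fetch the model through another route.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")

# "Chunk Size" setting; chunk_overlap is llama-index's companion knob,
# discussed in the section below.
Settings.chunk_size = 1024
Settings.chunk_overlap = 20
```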
Local RAG is powered by llama-index and uses its `SimpleDirectoryReader()` to process a wide variety of document types (e.g. `.pdf`, `.md`, `.ipynb`). Here's what happens under the hood:
- Each file is split into smaller logical units. For example, a multi-page PDF is separated into one document per page.
- These documents are then chunked based on the configured `chunk_size`, with optional `chunk_overlap` to preserve context.
- The resulting chunks are embedded using the selected model and stored for fast retrieval.
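Taken together, the steps above correspond roughly to the following llama-index sketch; the `./uploads` path is a hypothetical stand-in for wherever the app keeps uploaded files:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# 1. Load files: SimpleDirectoryReader yields one Document per logical unit,
#    e.g. one Document per PDF page.
documents = SimpleDirectoryReader("./uploads").load_data()

# 2. Chunk the documents with the configured size and overlap.
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
nodes = splitter.get_nodes_from_documents(documents)

# 3. Embed the chunks (using Settings.embed_model) and store them for retrieval.
index = VectorStoreIndex(nodes)
```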
| Parameter | Description |
|---|---|
| `chunk_size` | Determines the size of each embedded chunk. Smaller chunks can improve retrieval precision for detailed queries. |
| `chunk_overlap` | Defines the text overlap between adjacent chunks. Helps maintain contextual flow. |
You can tweak these settings to balance embedding precision, retrieval speed, and overall system performance.
💡 Tip: Experimenting with different `chunk_size` and `chunk_overlap` values can significantly affect answer quality. A smaller chunk size may be better for detailed queries, while larger chunks can improve speed.
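To run that experiment yourself, a rough sketch like the one below rebuilds the index at a few chunk sizes and prints the answers side by side (the sizes, overlap rule, and test question are arbitrary examples):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./uploads").load_data()
question = "What does the contract say about termination?"  # placeholder query

# Rebuild the index at a few chunk sizes and compare the answers.
for chunk_size in (256, 512, 1024):
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_size // 8)
    index = VectorStoreIndex(splitter.get_nodes_from_documents(documents))
    answer = index.as_query_engine(similarity_top_k=3).query(question)
    print(f"--- chunk_size={chunk_size} ---\n{answer}\n")
```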