A Python project for simple, fully customizable local retrieval-augmented generation (RAG).
First, create a local clone of the repository:
```bash
git clone https://github.com/harrisonfloam/local-rag
```

To recreate the development environment, follow the instructions in DEV.md.
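Later commands run from the project root; with a default clone, that is the repository directory:

```bash
cd local-rag
```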
The only hard requirements for normal execution are Docker Desktop, Ollama, and an internet connection for image building.
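To confirm the prerequisites are in place, quick version checks work from any terminal:

```bash
docker --version          # Docker Desktop (or Docker Engine)
docker compose version    # Compose v2 ships with Docker Desktop
ollama --version
```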
After installing Ollama, pull an embedding model and an LLM; nomic-embed-text is a solid default embedding model.
```bash
ollama pull gpt-oss:latest
ollama pull nomic-embed-text:latest
```
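Once the pulls finish, it's worth verifying that both models are available and that Ollama responds. The model names below match the pulls above, and the curl call assumes Ollama is serving on its default port (11434):

```bash
# List locally available models; both pulls should appear
ollama list

# Smoke-test the LLM with a one-off prompt
ollama run gpt-oss:latest "Say hello in five words."

# Request an embedding through Ollama's REST API
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
```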
The Makefile provides shortcuts for starting the application, building the Docker image, and running tests. Run the following from the project root to start the application:

```bash
make up
```

If you don't want to install Make, or you're more familiar with the Docker CLI, the same functionality (and more) is exposed via docker-compose.yml. To run the application:
```bash
docker compose up
```
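A few standard Docker Compose commands cover the rest of the day-to-day workflow; these are stock CLI flags, not project-specific targets:

```bash
# Rebuild the image and run in the background
docker compose up --build -d

# Follow application logs
docker compose logs -f

# Stop and remove the containers
docker compose down
```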
Planned improvements:

- Solve markdown formatting issues in Streamlit chat UI
- Improve document sync UX
- Improve document portal UI
- Add simple conversation memory
- Expose more configurable parameters
- Improve configuration panel UI
- Add summarization functionality
- Add smart chunking parameters
- Improve vector store error handling
- Prevent duplicate documents during ingest/sync
- Add streaming response logging/telemetry
- Fix document portal state persistence
- Remove unnecessary Streamlit reruns after document deletion