Arona is Elysia's AI-powered documentation search, built with Retrieval-Augmented Generation (RAG)
Standard RAG
When a user asks a question, Arona queries the vector database for the most relevant content, then uses that content to answer the question
It clones the Elysia documentation and indexes the content into both an embedding index and a BM25 index
The content is chunked and vectorized, then stored in Postgres. Indexing is diff-aware: only new or changed content is re-indexed
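A minimal sketch of the diff-aware indexing step. The `chunks` table layout, the heading-based chunker, and the helper names are illustrative assumptions, not Arona's actual code; the embeddings endpoint and `text-embedding-3-small` model are OpenAI's documented API:

```ts
import postgres from 'postgres'
import { createHash } from 'node:crypto'

const sql = postgres(process.env.DATABASE_URL!)

const hash = (text: string) => createHash('sha256').update(text).digest('hex')

// Naive chunker: split markdown at headings (Arona's real chunking may differ)
const chunk = (markdown: string) => markdown.split(/\n(?=#{1,3} )/)

// OpenAI embeddings API with the "small" model mentioned in the stack below
const embed = async (input: string) => {
    const res = await fetch('https://api.openai.com/v1/embeddings', {
        method: 'POST',
        headers: {
            'content-type': 'application/json',
            authorization: `Bearer ${process.env.OPENAI_API_KEY}`
        },
        body: JSON.stringify({ model: 'text-embedding-3-small', input })
    })
    const { data } = (await res.json()) as { data: { embedding: number[] }[] }
    return data[0].embedding
}

// Assumes a unique constraint on (path, chunk_index) and a pgvector column
export const indexPage = async (path: string, markdown: string) => {
    for (const [i, content] of chunk(markdown).entries()) {
        const contentHash = hash(content)

        // Diff awareness: skip chunks whose content hash hasn't changed
        const [existing] = await sql`
            SELECT content_hash FROM chunks
            WHERE path = ${path} AND chunk_index = ${i}`
        if (existing?.content_hash === contentHash) continue

        const embedding = JSON.stringify(await embed(content))
        await sql`
            INSERT INTO chunks (path, chunk_index, content, content_hash, embedding)
            VALUES (${path}, ${i}, ${content}, ${contentHash}, ${embedding}::vector)
            ON CONFLICT (path, chunk_index) DO UPDATE
            SET content = EXCLUDED.content,
                content_hash = EXCLUDED.content_hash,
                embedding = EXCLUDED.embedding`
    }
}
```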
The indexing process is automated by:
- A webhook that fires when the Elysia documentation updates (sketched below)
- A cron job every 6 hours as a fallback
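A minimal sketch of both triggers with Elysia, assuming a shared-secret header for webhook authentication; the route, header name, and `reindex` stub are hypothetical:

```ts
import { Elysia } from 'elysia'

// Hypothetical helper: pulls the docs repo and re-runs indexPage() per file
const reindex = async () => {
    /* git pull + indexPage(path, markdown) for each changed file */
}

new Elysia()
    .post('/webhook/docs-updated', async ({ headers, set }) => {
        // Reject calls that don't carry the shared secret (assumed scheme)
        if (headers['x-webhook-secret'] !== process.env.WEBHOOK_SECRET) {
            set.status = 401
            return 'unauthorized'
        }
        await reindex()
        return 'ok'
    })
    .listen(3000)

// Fallback: re-index every 6 hours in case a webhook is missed
// (the official @elysiajs/cron plugin would also work here)
setInterval(() => void reindex(), 6 * 60 * 60 * 1000)
```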
- Off-the-shelf RAG is expensive
We looked for an API-based RAG provider, but couldn't find one that fit our needs and budget
Kapa, Mendable, and other RAG providers don't list their prices upfront, and we heard it can cost up to $1,000 per month, which is not affordable for us, especially at an early stage when we need to be frugal with our expenses
We don't want to move to a full API documentation provider because we have already invested in the VitePress and Vue ecosystem for building documentation, e.g. the Interactive Tutorial, the Playground, etc.
- We have already invested in VitePress
We don't want to lose the benefits of having our own documentation site by moving to a third-party provider that may not offer the same level of customization and control as we have now
Because of all this, we decided to build our own RAG system, which is more cost-effective and gives us more control over the data and the features we want to implement
There are several ways to build RAG; we chose the most cost-effective one
This also gives us the freedom to customize the system to our specific needs and use cases, rather than being limited by the features and capabilities of a third-party provider
How a question is answered:
- The user requests a Proof of Work challenge to prove they are not a bot, which prevents abuse and spam
- The user sends the question along with the PoW solution, a Turnstile token, and a checksum to prevent abuse (verification is sketched after this list)
- The question is then normalized by a small model for semantic search and caching (see the normalization sketch below)
- The question is sent to the main model for answering; the model has tools for searching and reading pages
- The normalized question is used to query with both BM25 and vector search (see the hybrid search sketch below)
- Relevant artifacts are cached: the normalized query, its embedding, and the question-answer pair
- The answer is returned to the user and also stored in the cache for future reference
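A minimal sketch of the abuse checks. The sha256 leading-zeros PoW scheme, the field names, and the difficulty are illustrative assumptions; the Turnstile `siteverify` call is Cloudflare's documented API:

```ts
import { Elysia, t } from 'elysia'
import { createHash } from 'node:crypto'

// Hypothetical PoW scheme: the client must find a nonce such that
// sha256(challenge + nonce) starts with `difficulty` zero hex digits
const verifyPoW = (challenge: string, nonce: string, difficulty = 4) =>
    createHash('sha256')
        .update(challenge + nonce)
        .digest('hex')
        .startsWith('0'.repeat(difficulty))

// Cloudflare Turnstile server-side check (documented siteverify endpoint)
const verifyTurnstile = async (token: string) => {
    const res = await fetch(
        'https://challenges.cloudflare.com/turnstile/v0/siteverify',
        {
            method: 'POST',
            body: new URLSearchParams({
                secret: process.env.TURNSTILE_SECRET!,
                response: token
            })
        }
    )
    const { success } = (await res.json()) as { success: boolean }
    return success
}

new Elysia().post(
    '/ask',
    async ({ body, set }) => {
        // The checksum check from the flow above is omitted for brevity
        if (
            !verifyPoW(body.challenge, body.nonce) ||
            !(await verifyTurnstile(body.token))
        ) {
            set.status = 403
            return { error: 'verification failed' }
        }
        return { ok: true } // continue to normalization and retrieval
    },
    {
        body: t.Object({
            question: t.String(),
            challenge: t.String(),
            nonce: t.String(),
            token: t.String()
        })
    }
)
```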
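A sketch of normalization and caching, assuming GPT OSS 20B is served behind an OpenAI-compatible `/chat/completions` endpoint and Dragonfly is reached through a Redis client. Only an exact-match cache on the normalized question is shown here; Arona's semantic cache also matches by embedding similarity:

```ts
import Redis from 'ioredis'

const cache = new Redis(process.env.DRAGONFLY_URL!) // Dragonfly speaks the Redis protocol

// Normalize the question with the small model (OpenAI-compatible API assumed)
const normalize = async (question: string) => {
    const res = await fetch(`${process.env.LLM_BASE_URL}/chat/completions`, {
        method: 'POST',
        headers: {
            'content-type': 'application/json',
            authorization: `Bearer ${process.env.LLM_API_KEY}`
        },
        body: JSON.stringify({
            model: 'gpt-oss-20b',
            messages: [
                {
                    role: 'system',
                    content: 'Rewrite the question as a short, canonical search query.'
                },
                { role: 'user', content: question }
            ]
        })
    })
    const data = (await res.json()) as {
        choices: { message: { content: string } }[]
    }
    return data.choices[0].message.content.trim().toLowerCase()
}

export const answerWithCache = async (
    question: string,
    answer: (normalized: string) => Promise<string>
) => {
    const normalized = await normalize(question)
    const key = `answer:${normalized}`

    // Exact-match cache hit on the normalized form
    const cached = await cache.get(key)
    if (cached) return cached

    // Miss: run retrieval + the main model, then cache for a day
    const result = await answer(normalized)
    await cache.set(key, result, 'EX', 60 * 60 * 24)
    return result
}
```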
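A sketch of the hybrid retrieval, reusing the `chunks` table and `embed()` helper assumed in the indexing sketch. ParadeDB's `@@@` BM25 operator and pgvector's `<=>` distance are real; the table layout and fusion constant are assumptions:

```ts
import postgres from 'postgres'

const sql = postgres(process.env.DATABASE_URL!)

declare const embed: (input: string) => Promise<number[]> // from the indexing sketch

export const hybridSearch = async (normalized: string, limit = 5) => {
    // BM25 full-text search via ParadeDB's pg_search extension
    const bm25 = await sql`
        SELECT id, content, paradedb.score(id) AS score
        FROM chunks
        WHERE content @@@ ${normalized}
        ORDER BY score DESC
        LIMIT 10`

    // Vector search via pgvector (cosine distance, ascending)
    const vector = JSON.stringify(await embed(normalized))
    const knn = await sql`
        SELECT id, content
        FROM chunks
        ORDER BY embedding <=> ${vector}::vector
        LIMIT 10`

    // Reciprocal Rank Fusion: merge the two ranked lists without a reranker
    const scores = new Map<number, { content: string; score: number }>()
    for (const list of [bm25, knn])
        list.forEach((row, i) => {
            const entry = scores.get(row.id) ?? { content: row.content, score: 0 }
            entry.score += 1 / (60 + i)
            scores.set(row.id, entry)
        })

    return [...scores.values()]
        .sort((a, b) => b.score - a.score)
        .slice(0, limit)
}
```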
We don't use a reranking API because it's expensive
Most of the stack is self-hosted where possible:
- Elysia (obviously) - For building the API and backend
- ParadeDB - BM25 + Vector Search
- Dragonfly - Caching, Semantic Cache Search
- GPT OSS 120B - Main model for answering questions
- GPT OSS 20B - Query normalization for semantic search/caching
- OpenAI Embedding Small - The most cost-effective embedding model for vector search
- Axiom - Logging and monitoring
- Turnstile - Bot verification to prevent abuse and spam
To self-host Arona:
- Set the `.env` variables; refer to `.env.example` for the required variables
- Run `docker compose up -d` to start the services
- Set up the index using `scripts/setup.ts`, modifying the script to fit your documentation repo