- Daniel Egbo – @Danselem
A demonstration project for building LLM applications with Google's Gemini models, LangChain, ChromaDB, and Retrieval-Augmented Generation (RAG). The app shows how to load documents, embed them, store them in a vector database, and answer queries grounded in the retrieved context.
Update: Tracing has been integrated into the project with Arize Open Inference Telemetry to log traces and spans.
- ✅ Integration with Google Gemini (e.g., `gemini-1.5-flash`)
- 📄 PDF ingestion and preprocessing
- ✂️ Intelligent document chunking with LangChain
- 🧠 Vector embeddings using Gemini
- 🗃️ ChromaDB as the vector store
- 🔎 Semantic search for relevant document chunks
- 💬 Question-answering using a RAG pipeline
- 📈 Tracing and observability with Arize Open Inference for end-to-end, span-level logging. Phoenix is persisted with PostgreSQL and containerized with Docker; see the docker-compose file.
- 🧪 Includes example usage in the examples directory
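As a minimal taste of the Gemini integration, here is a hypothetical snippet assuming the `langchain-google-genai` package; the project's own wrappers live in `src/llm/`:

```python
# Hypothetical minimal Gemini call via LangChain; the project's wrappers
# live in src/llm/gemini_client.py and src/llm/lang_gemini.py.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # reads GOOGLE_API_KEY from the environment
print(llm.invoke("Summarise RAG in one sentence.").content)
```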
Note: Sentence-transformer embeddings have been integrated into the project; see the embedding module.
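For illustration, a minimal sketch of what such a wrapper can look like (class and method names are hypothetical; the actual implementation is `src/embeddings/sentence_embedding.py`):

```python
# Hypothetical sketch of a sentence-transformers embedding wrapper;
# see src/embeddings/sentence_embedding.py for the real implementation.
from sentence_transformers import SentenceTransformer

class SentenceEmbedding:
    """Embeds texts locally with all-MiniLM-L6-v2 (384-dimensional vectors)."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # encode() returns a numpy array; convert to plain lists for ChromaDB.
        return self.model.encode(texts).tolist()

    def embed_query(self, text: str) -> list[float]:
        return self.model.encode([text])[0].tolist()
```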
```
gemini_llm_app/
├── docker-compose.yaml
├── Dockerfile
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.md
├── requirements.txt
├── src
│   ├── agent
│   │   └── tools.py
│   ├── embeddings
│   │   ├── gemini_embedding.py
│   │   └── sentence_embedding.py
│   ├── handlers
│   │   ├── __init__.py
│   │   └── error_handler.py
│   ├── ingest.py
│   ├── llm
│   │   ├── gemini_client.py
│   │   └── lang_gemini.py
│   ├── observability
│   │   └── arize_observability.py
│   ├── prompt_engineering
│   │   ├── __init__.py
│   │   ├── prompt.py
│   │   └── templates.py
│   ├── rag
│   │   ├── app.py
│   │   ├── apphybrid.py
│   │   ├── hybrid.py
│   │   └── lcel.py
│   ├── retrievers
│   │   └── retriever.py
│   ├── utils
│   │   ├── doc_loader.py
│   │   ├── doc_split.py
│   │   ├── download_file.py
│   │   ├── logger.py
│   │   ├── rate_limiter.py
│   │   └── setvars.py
│   └── vectors
│       └── chroma_vector.py
├── tests
│   ├── test_chroma_vector_pdf.py
│   ├── test_ingest_pdf.py
│   └── test_ingest.py
└── uv.lock
```
```
git clone https://github.com/Danselem/gemini_llm_app.git
cd gemini_llm_app
```

This project uses the uv project manager and make tools.
```
uv venv --python 3.11
source .venv/bin/activate
```

```
uv pip install --all-extras --requirement pyproject.toml
```

or

```
make install
```

Create a `.env` file at the project root and fill in the environment variables with `make env`.
```
GOOGLE_API_KEY=your_gemini_api_key
MULTIMODAL_MODEL=gemini-1.5-flash
```
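For reference, a minimal sketch of how these variables might be loaded at runtime, assuming python-dotenv (the project's own helper is `src/utils/setvars.py`):

```python
# Hypothetical sketch of loading the .env values with python-dotenv;
# the project's own helper lives in src/utils/setvars.py.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into the process environment

GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]                   # required
MODEL_NAME = os.getenv("MULTIMODAL_MODEL", "gemini-1.5-flash")  # optional, defaulted
```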
This project uses Arize Open Inference for telemetry and logging of traces and spans. To start the telemetry server, run the command below.

```
make start-phoenix
```

Note: ensure Docker is running before executing the above command.
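For illustration, a sketch of how LangChain traces can be exported to the local Phoenix server, assuming the `arize-phoenix-otel` and `openinference-instrumentation-langchain` packages (the project's actual setup is `src/observability/arize_observability.py`):

```python
# Hypothetical sketch of exporting LangChain traces to Phoenix; the project's
# actual setup lives in src/observability/arize_observability.py.
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

# Point the OTLP exporter at the Phoenix server started by `make start-phoenix`.
tracer_provider = register(
    project_name="gemini_llm_app",  # illustrative project name
    endpoint="http://localhost:6006/v1/traces",
)

# Auto-instrument LangChain so chain/retriever calls are logged as OpenInference spans.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```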
There are multiple examples in the examples directory for you to get started with, e.g.:
```
python -m examples.psumm
```

or

```
make psumm
```

To run the RAG app, use the command below:

```
make app
```

The example code summarises a PDF file. You can also check out the Makefile to see the other examples.
1. The PDF is downloaded from a URL if it's not already in `data/pdfs`.
2. The PDF is split into overlapping chunks using `RecursiveCharacterTextSplitter` from LangChain.
3. Each chunk is converted into a vector using Google Gemini Embedding or Sentence Transformer Embedding (via a LangChain wrapper).
4. The chunks and their embeddings are stored in a local ChromaDB collection.
5. When a user asks a question, the app retrieves relevant chunks using vector similarity search.
6. The relevant chunks are passed to Gemini to answer the question contextually (see the sketch below).
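The sketch below ties these six steps together in one place. It is a condensed, hypothetical composition (the file name, chunk sizes, and `k` are illustrative); the real implementation is spread across `src/ingest.py`, `src/vectors/chroma_vector.py`, and `src/rag/`:

```python
# A condensed, hypothetical sketch of the six steps above using LangChain.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# 1-2. Load the PDF and split it into overlapping chunks.
docs = PyPDFLoader("data/pdfs/sample.pdf").load()  # illustrative file name
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# 3-4. Embed each chunk with Gemini and persist the vectors in a local ChromaDB collection.
store = Chroma.from_documents(
    chunks,
    GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
    persist_directory="chroma_db",
)

# 5. Retrieve the chunks most similar to the user's question.
question = "What is the main contribution of this document?"
context = store.similarity_search(question, k=4)

# 6. Ask Gemini to answer using only the retrieved context.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
prompt = (
    "Answer the question using only this context:\n\n"
    + "\n\n".join(doc.page_content for doc in context)
    + f"\n\nQuestion: {question}"
)
print(llm.invoke(prompt).content)
```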
| Component | Role | Notes |
|---|---|---|
| LangChain | Orchestrates prompts, chains, and retrievers | Backbone for LLM workflows |
| ChromaDB | Vector store for embeddings | Local persistence for RAG |
| Google Gemini API | Primary LLM / embedding provider | Configurable model (e.g., gemini-1.5-flash) |
| sentence-transformers | Fallback / local embeddings | all-MiniLM-L6-v2 used via SentenceTransformer |
| Open Inference (Arize / Phoenix) | Telemetry & tracing | Observability for traces/spans |
| Redis (RedisSaver) | Short-term memory checkpointing | Used by langgraph checkpointing |
| Docker / docker-compose | Containerization & local telemetry stack | Phoenix / Postgres services |
| Python 3.11+ | Runtime | Project tested on Python 3.11 |
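As an illustration of the Redis row above, a hypothetical sketch of langgraph checkpointing with `RedisSaver` (assumes the `langgraph-checkpoint-redis` package and a local Redis instance; the node logic is a placeholder, not the project's agent):

```python
# Hypothetical sketch of short-term memory checkpointing with RedisSaver.
from langgraph.checkpoint.redis import RedisSaver
from langgraph.graph import StateGraph, MessagesState, START

def respond(state: MessagesState):
    # Placeholder node; the real app would call Gemini here.
    return {"messages": [("ai", "hello")]}

builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")

with RedisSaver.from_conn_string("redis://localhost:6379") as checkpointer:
    checkpointer.setup()  # create the Redis indices on first use
    graph = builder.compile(checkpointer=checkpointer)
    # thread_id keys the conversation state saved between invocations.
    graph.invoke(
        {"messages": [("user", "hi")]},
        config={"configurable": {"thread_id": "demo"}},
    )
```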
MIT License. See the LICENSE file.
- Google for releasing the Gemini family of models
- LangChain community for open-source tools
- ChromaDB team for fast and easy vector storage
- 🔧 Add a simple web UI with Gradio or Streamlit for the RAG application.
- 📝 Ingest multiple documents and support multi-source QA.
- 🧠 Add caching to avoid re-embedding on re-runs.
- 📊 Integrate telemetry/tracing and observability with Open Inference and Phoenix.