Project Structure

rag-starter/
├── data/                 # Place your documents here
├── notebooks/            # For experimentation
├── src/
│   ├── loader.py        # Document loading & chunking
│   ├── embedder.py      # Text embedding
│   ├── retriever.py     # Similarity search
│   ├── generator.py     # LLM generation
│   └── pipeline.py      # Main orchestration
├── main.py              # CLI interface
├── requirements.txt
└── README.md

Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

If you are using the included virtual environment on Windows, run:

venv\Scripts\python -m pip install -r requirements.txt

2. Set Up Local LLM (Recommended)

Install Ollama for local LLM inference:

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com/download/windows

Pull a model:

ollama pull phi3
# or
ollama pull llama3

3. Add Your Documents

Place your documents in the data/ directory:

Text files: .txt
PDFs: .pdf (requires the pypdf package; install it with pip install pypdf if it is not already present)
Other formats: extend src/loader.py as needed
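
As a rough illustration of what a loading and chunking step can look like, the sketch below reads .txt files from a directory and splits each one into overlapping character chunks. The function names (chunk_text, load_text_files), the overlap parameter, and the choice of character-based splitting are assumptions made for this example; the actual loader.py may work differently.

```python
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most chunk_size with some overlap.

    Hypothetical helper: character-based splitting is an assumption here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip whitespace-only chunks
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


def load_text_files(data_dir: str) -> dict[str, str]:
    """Read every .txt file directly under data_dir into {filename: contents}."""
    docs = {}
    for path in sorted(Path(data_dir).glob("*.txt")):
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```

A small overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of some duplicated text in the index.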

4. Run the Pipeline

# Interactive mode
python main.py --mode interactive

# Interactive mode with official web fallback
python main.py --mode interactive --enable-web-fallback --web-url https://www.unima.ac.mw/

# Demo mode
python main.py --mode demo

In interactive mode, you can also use:

teach to add a new Q/A pair to the local knowledge base and refresh the index
refresh to rebuild the vector index from the current files in data/
help to show the available commands
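
The routing of these commands can be sketched as a small dispatcher: known command words go to their handlers, and anything else is treated as a question for the pipeline. The function handle_input, its callback parameters, and the help text are hypothetical names chosen for this example, not the project's actual code.

```python
from typing import Callable

# Illustrative help text; the real CLI may print something different.
HELP_TEXT = "Commands: teach, refresh, help. Anything else is treated as a question."


def handle_input(
    line: str,
    answer: Callable[[str], str],
    teach: Callable[[], str],
    refresh: Callable[[], str],
) -> str:
    """Route one line of user input to a command handler or to the RAG pipeline."""
    command = line.strip().lower()
    if command == "help":
        return HELP_TEXT
    if command == "teach":
        return teach()      # e.g. prompt for a Q/A pair, then re-index
    if command == "refresh":
        return refresh()    # e.g. rebuild the vector index from data/
    return answer(line.strip())
```

Passing the handlers in as callables keeps the dispatcher easy to test without a running LLM or vector index.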

Hybrid RAG

The assistant can optionally use official UNIMA web pages as a fallback source when the local data/ knowledge base does not have enough information.

Example:

python main.py --mode interactive --enable-web-fallback --web-url https://www.unima.ac.mw/

You can repeat --web-url to add more official pages:

python main.py --mode interactive --enable-web-fallback \
  --web-url https://www.unima.ac.mw/ \
  --web-url <another-official-unima-page>

Recommended usage:

Keep local FAQ files as the main approved source
Add only official UNIMA pages as web fallback URLs
Use web fallback mainly for current or recently updated information
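
One way to picture the "local first, web as fallback" behaviour is the sketch below: local hits are used whenever at least one clears a similarity threshold, and the web source is consulted only when none do and fallback is enabled. The function name retrieve_with_fallback, the min_score threshold, and the (text, score) hit format are assumptions for illustration; the README does not specify how the project decides that local data is insufficient.

```python
from typing import Callable

# A hit is (chunk text, similarity score); the score range is an assumption.
Hit = tuple[str, float]


def retrieve_with_fallback(
    query: str,
    search_local: Callable[[str], list[Hit]],
    search_web: Callable[[str], list[Hit]],
    min_score: float = 0.5,
    web_enabled: bool = False,
) -> tuple[str, list[Hit]]:
    """Return (source, hits), preferring local hits that reach min_score."""
    local_hits = [hit for hit in search_local(query) if hit[1] >= min_score]
    if local_hits or not web_enabled:
        # Either local data is good enough, or web fallback is switched off.
        return "local", local_hits
    return "web", search_web(query)
```

Returning the source label alongside the hits lets the caller tell the user whether an answer came from the approved local files or from a web page.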

Key Commands

# Run with custom settings
python main.py --llm-model llama3 --k 5 --chunk-size 300

# Run tests
python -m pytest
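
For readers extending the CLI, the flags shown in this README could be declared with argparse roughly as follows. This is a sketch, not the contents of main.py: the default values are assumptions, and only the flag names are taken from the commands above. The repeatable --web-url flag is modelled with action="append".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags mentioned in the README (defaults are guesses)."""
    parser = argparse.ArgumentParser(description="RAG starter CLI (illustrative sketch)")
    parser.add_argument("--mode", choices=["interactive", "demo"], default="interactive")
    parser.add_argument("--llm-model", default="phi3")          # assumed default
    parser.add_argument("--k", type=int, default=3)             # assumed default
    parser.add_argument("--chunk-size", type=int, default=300)  # assumed default
    parser.add_argument("--enable-web-fallback", action="store_true")
    # action="append" collects every occurrence of --web-url into one list.
    parser.add_argument("--web-url", action="append", default=[])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```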

Troubleshooting

ModuleNotFoundError: No module named 'langchain_community': Your environment is missing the project dependencies. Reinstall them with pip install -r requirements.txt (with the included virtual environment on Windows: venv\Scripts\python -m pip install -r requirements.txt)
No documents loaded: Check that the data/ directory contains supported files (.txt or .pdf)
LLM not responding: Ensure Ollama is running (ollama serve)
Embedding errors: Verify that sentence-transformers is installed correctly
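
The first and last two checks above can be automated with a short standard-library script. The sketch below assumes Ollama is listening on its default address, http://localhost:11434, and that the importable module names are langchain_community and sentence_transformers; the helper names missing_modules and ollama_is_running are made up for this example.

```python
import importlib.util
import urllib.error
import urllib.request


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Missing modules:", missing_modules(["langchain_community", "sentence_transformers"]))
    print("Ollama reachable:", ollama_is_running())
```

If the script reports a missing module, reinstall the dependencies; if Ollama is unreachable, start it with ollama serve and try again.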

Resources

LangChain RAG: https://github.com/langchain-ai/rag-from-scratch
LlamaIndex: https://github.com/run-llama/llama_index
Ollama: https://ollama.com
