rag-ingest is a simple pipeline for ingesting documents into a vector store and serving them for retrieval-augmented generation (RAG). I use this project to ingest the user manuals of all the devices in my house and then chat with Llama 3.1 to troubleshoot any issues I run into. I created this repo for educational purposes only: CPUs are far less efficient than GPUs at inference, so for a real use case it makes sense to run on CUDA-capable hardware for significant performance gains.
This project allows you to:
- Prepare and convert language models to the GGUF format.
- Split and embed text documents using CPU-agnostic libraries.
- Store embeddings in a vector store for fast similarity search.
Key features:

- CPU-Agnostic: Use libraries that run efficiently on any CPU without requiring GPU acceleration.
- Modular Pipeline: Separate steps for model preparation, document ingestion, and query serving.
- Reproducible: Clear instructions to download models, convert formats, and ingest data.
Requirements:

- Python 3.8+
- llama-cpp-python (for gguf model loading)
- sentence-transformers or transformers (CPU mode)
- faiss-cpu or chromadb
- Hugging Face CLI (`huggingface_hub`)
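A minimal install of the CPU-only stack might look like this (package names are from the list above; pick either `faiss-cpu` or `chromadb`):

```bash
# CPU-only dependencies; choose faiss-cpu or chromadb for the vector store
pip install llama-cpp-python sentence-transformers faiss-cpu huggingface_hub
```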
The instructions below are generic, so you can download the model of your choice. I went with Llama 3.1 8B: I downloaded the model files and then converted them to GGUF format. The steps for my own setup are documented in SETUP.md.
- **Get a pre-trained model**

  ```bash
  pip install huggingface_hub
  huggingface-cli login
  huggingface-cli repo clone <model-id> ./model
  ```
- **Convert to GGUF**

  GGUF is the llama.cpp “general-purpose” file format for quantized inference.

  ```bash
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  # build the converter
  make
  # convert
  ./convert-ggml-to-gguf models/<model>.bin models/<model>.gguf
  ```

  GGUF is a binary format optimized for the llama.cpp runtime: it supports quantized weights, loads quickly, and is CPU-friendly.
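  Once converted, the GGUF file can be loaded for CPU inference via `llama-cpp-python`. A minimal sketch; the model path is the placeholder from the conversion command above, and the prompt is only a smoke test:

  ```python
  from llama_cpp import Llama

  # Load the quantized model on CPU (n_ctx is the context window in tokens)
  llm = Llama(model_path="models/<model>.gguf", n_ctx=4096)

  # Quick smoke test of the loaded model
  out = llm("Q: What is a vector store? A:", max_tokens=64)
  print(out["choices"][0]["text"])
  ```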
With the model ready, document ingestion consists of three steps:

- **Text Splitting**

  Use a library like `langchain` or `nltk` to split large documents into smaller chunks, as sketched below.
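  A minimal sketch using LangChain's `RecursiveCharacterTextSplitter`; the chunk size, overlap, and input file are illustrative assumptions, and in newer LangChain versions the import lives in `langchain_text_splitters`:

  ```python
  from langchain.text_splitter import RecursiveCharacterTextSplitter

  # Split a manual into overlapping chunks sized for embedding
  splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
  with open("manuals/dishwasher.txt") as f:  # hypothetical input file
      chunks = splitter.split_text(f.read())
  ```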
- **Embedding**

  Compute embeddings with `sentence-transformers` or `transformers` in CPU mode:

  ```python
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
  model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
  ```
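  The `transformers` snippet above only loads the encoder; the `sentence-transformers` wrapper returns pooled embeddings directly. A sketch that reuses the `chunks` list from the splitting step (an assumption of this example):

  ```python
  from sentence_transformers import SentenceTransformer
  import numpy as np

  # CPU by default; all-MiniLM-L6-v2 produces one 384-dimensional vector per chunk
  encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
  embeddings = encoder.encode(chunks, convert_to_numpy=True).astype(np.float32)
  ```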
- **Vector Store**

  Store embeddings using FAISS or Chroma:

  ```python
  import faiss

  index = faiss.IndexFlatL2(embeddings.shape[1])
  index.add(embeddings)
  ```

  A vector store is a database of numeric embeddings that supports fast similarity search. It allows you to retrieve the most relevant document chunks given a query embedding.
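  Retrieval is then a nearest-neighbour lookup against this index. A sketch, assuming the `encoder`, `chunks`, and `index` objects from the previous steps:

  ```python
  # Embed the query with the same model used for the document chunks
  question = "Why is my dishwasher leaking?"  # hypothetical user question
  query_vec = encoder.encode([question], convert_to_numpy=True).astype(np.float32)

  # Return the 5 closest chunks (L2 distance, matching IndexFlatL2)
  distances, indices = index.search(query_vec, 5)
  relevant_chunks = [chunks[i] for i in indices[0]]
  ```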
The end-to-end flow:

```mermaid
flowchart TB
%% Model prep
subgraph ModelPrep["Model Preparation"]
A["Download HF Model"] --> B["Convert to GGUF"]
B --> C["Load GGUF Model"]
end
%% Document ingestion
subgraph Ingest["Ingestion"]
D["Raw Documents"] --> E["Text Splitting"]
E --> F["Compute Embeddings"]
F --> G["Store Embeddings in Vector Store"]
end
%% User query path
subgraph Query["Query"]
H["User Query"] --> I["Embed Query"]
I --> J["Vector Store Lookup"]
J --> K["Retrieve Chunks"]
K --> L["LLM Inference (gguf)"]
L --> M["Answer"]
end
```
Getting started:

- Clone this repo:

  ```bash
  git clone https://github.com/akram0zaki/rag-ingest.git
  cd rag-ingest
  ```

- Follow the Model Preparation and Document Ingestion steps above.
- Run your query script pointing at the vector store and GGUF model, as sketched below.
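A minimal sketch of such a query script, tying the pieces together. The index path, chunk storage, model names, and prompt format are illustrative assumptions rather than anything fixed by this repo, and the FAISS index is assumed to have been saved earlier with `faiss.write_index`:

```python
import faiss
import numpy as np
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

# Illustrative paths; point these at your own index and converted model
INDEX_PATH = "store/index.faiss"
MODEL_PATH = "models/<model>.gguf"

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
index = faiss.read_index(INDEX_PATH)
llm = Llama(model_path=MODEL_PATH, n_ctx=4096)

def answer(question, chunks, k=5):
    # Embed the question and fetch the k most similar chunks
    q_vec = encoder.encode([question], convert_to_numpy=True).astype(np.float32)
    _, idx = index.search(q_vec, k)
    context = "\n\n".join(chunks[i] for i in idx[0])

    # Simple RAG prompt: retrieved context first, then the user's question
    prompt = (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    out = llm(prompt, max_tokens=256)
    return out["choices"][0]["text"].strip()
```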