Understanding Retrieval-Augmented Generation (RAG)
What is RAG? A Real-Life Analogy
► Imagine a super-smart librarian in a vast library.
► Someone asks: “What are the benefits of a
Mediterranean diet?”
► Librarian searches the catalog, pulls relevant books,
and summarizes key points.
► RAG does this digitally: it searches data and
generates answers.
What is Retrieval-Augmented Generation (RAG)?
► A hybrid AI framework combining:
► Retrieval: Finds relevant info from a dataset.
► Generation: Crafts coherent answers using AI.
► Used in chatbots, search tools, and research
assistants.
► Helps keep answers accurate and up to date.
Why Do We Need RAG?
► Traditional AI models have limitations:
► Limited Knowledge: Fixed at training time, can be
outdated.
► Hallucination: May generate incorrect info.
► Lack of Specificity: Struggles with domain-specific
data.
► RAG mitigates these by grounding answers in specific
documents.
How Does RAG Work?
1. Data Preparation: Build a knowledge base
(documents, PDFs, etc.).
2. Query Processing: User asks a question (e.g., “How
many vacation days?”).
3. Retrieval: Find relevant document chunks.
4. Generation: AI crafts a coherent answer.
5. Delivery: Return answer with source references.
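The five steps above can be sketched end to end in a few lines. This is a toy illustration under loud assumptions, not a real RAG stack: `prepare`, `retrieve`, `generate`, and `answer` are hypothetical names, retrieval is plain word overlap standing in for vector search, and `generate` simply echoes the retrieved text where a language model would normally be called.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str


def prepare(documents):
    """Step 1: build a tiny knowledge base; each sentence becomes one chunk."""
    chunks = []
    for source, text in documents.items():
        for sentence in text.split("."):
            sentence = sentence.strip()
            if sentence:
                chunks.append(Chunk(sentence, source))
    return chunks


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, k=1):
    """Step 3: rank chunks by shared words (a stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c.text)), reverse=True)
    return ranked[:k]


def generate(question, context):
    """Step 4: placeholder for an LLM call; it only echoes the context."""
    return " ".join(c.text + "." for c in context)


def answer(question, chunks):
    """Steps 2 to 5: take a question, retrieve, generate, return with sources."""
    context = retrieve(question, chunks)
    return {
        "answer": generate(question, context),
        "sources": [c.source for c in context],
    }


# Made-up document text for illustration only.
docs = {"Employee Handbook": "Employees get 20 vacation days per year. Office hours are 9 to 5."}
result = answer("How many vacation days?", prepare(docs))
print(result)
```

Swapping `retrieve` for an embedding search and `generate` for a model call keeps the same overall shape.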
Step 1: Data Preparation
► Collect Documents: E.g., employee handbook,
research papers.
► Chunking: Split into small pieces (100–500 words).
► Embeddings: Convert chunks to numerical vectors
using an embedding model (e.g., a BERT-based encoder).
► Indexing: Store embeddings in a vector database
(e.g., FAISS, Pinecone).
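A minimal sketch of chunking, embedding, and indexing, using only the Python standard library. Everything here is an assumption made for illustration: `chunk_words` and `embed` are hypothetical helpers, the overlap between chunks is a common practice not mentioned in the slides, the "embedding" is a hashed bag of words rather than a trained model, and the "index" is a plain list rather than a vector database.

```python
import hashlib
import math


def chunk_words(text, size=200, overlap=20):
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
        start += size - overlap
    return chunks


def embed(text, dim=64):
    """Toy embedding: hash each word into one of `dim` buckets, then L2-normalize.

    A real system would call a trained embedding model here.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Synthetic 450-word "document" so the chunk boundaries are easy to inspect.
handbook = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(handbook, size=200, overlap=20)

# In-memory "index": a list of (chunk text, vector) pairs.
index = [(chunk, embed(chunk)) for chunk in chunks]
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some words twice.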
Steps 2–3: Query Processing and Retrieval
► Query Embedding: Convert user question to a vector.
► Similarity Search: Compare query to document
embeddings using cosine similarity.
► Retrieve Top-K Chunks: E.g., top 5 most
relevant document pieces.
► Example: Query “How many vacation days?” retrieves
“20 days per year” chunk.
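Cosine similarity scores two vectors by the angle between them: the dot product divided by the product of their lengths. A minimal sketch of similarity search with top-K selection follows. The three-dimensional vectors and the second and third chunk texts are invented for illustration; real embeddings have hundreds of dimensions and come from an embedding model, and `top_k` is a hypothetical helper, not a vector database API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Score every indexed chunk against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical index of (chunk text, embedding) pairs.
index = [
    ("Employees get 20 vacation days per year.", [0.9, 0.1, 0.0]),
    ("Office hours are 9 to 5.", [0.1, 0.9, 0.0]),
    ("Expense reports are due monthly.", [0.0, 0.2, 0.9]),
]

# Stands in for the embedding of "How many vacation days?".
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

This brute-force scan compares the query with every chunk; vector databases use approximate nearest-neighbour indexes to avoid that cost on large collections.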
Steps 4–5: Generation and Delivery
► Augmentation: Combine query and retrieved chunks
into a prompt.
► Generation: Use a language model (e.g., GPT,
LLaMA) to create an answer.
► Example Answer: “Employees get 20 vacation days per
year.”
► Delivery: Return answer with source (e.g.,
“Employee Handbook, Section 4.2”).
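Augmentation and delivery can be sketched as plain string handling around a model call. The prompt wording, the `build_prompt` and `answer_with_sources` helpers, and `fake_llm` are all assumptions for illustration; `fake_llm` returns a fixed string where a real system would call a language model such as GPT or LLaMA.

```python
def build_prompt(question, chunks):
    """Augmentation: combine the retrieved chunks and the question into one prompt."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_sources(question, chunks, llm):
    """Generation and delivery: call the model, then attach the chunk sources."""
    prompt = build_prompt(question, chunks)
    return {"answer": llm(prompt), "sources": [c["source"] for c in chunks]}


def fake_llm(prompt):
    """Stand-in for a real model call; always returns the same canned answer."""
    return "Employees get 20 vacation days per year."


retrieved = [{"text": "20 days per year", "source": "Employee Handbook, Section 4.2"}]
prompt = build_prompt("How many vacation days?", retrieved)
response = answer_with_sources("How many vacation days?", retrieved, fake_llm)
print(prompt)
print(response)
```

Passing the model in as a function keeps the prompt logic testable without network access and makes it easy to swap models.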
Benefits and Limitations
Benefits:
► Accurate, fact-based answers.
► Flexible for any domain.
► Scalable with large datasets.
► Transparent with source references.
Limitations:
► Depends on quality of knowledge base.
► Retrieval errors can affect answers.
► Computational cost for large datasets.
Real-World Applications
► Customer Support: Chatbots answering from
manuals or FAQs.
► Enterprise Search: Querying internal policies or
specs.
► Research: Summarizing papers with citations.
► Legal/Healthcare: Retrieving case law or medical
literature.
Questions?
Thank you! Any questions?