Lecture # 14-1 Introduction To RAG

The document discusses Retrieval-Augmented Generation (RAG) and Prompt Engineering in the context of Generative AI. RAG enhances language model generation by incorporating external knowledge, while prompt engineering involves creating effective instructions for AI tools to achieve desired outcomes. The document also covers various techniques and considerations for implementing RAG and prompt engineering effectively.

National University of Computer and Emerging Sciences

RAG and Prompt Engineering

AI-4009 Generative AI
Dr. Akhtar Jamil
Department of Computer Science

04/23/2025 Presented by Dr. AKHTAR JAMIL 1


Goals
• Recap of Previous Lecture
• Today’s Lecture
– Retrieval Augmented Generation (RAG)
– Prompt Engineering

Retrieval-Augmented Generation (RAG)
• Retrieval-Augmented Generation (RAG) is a technique
that enhances language model generation by
incorporating external knowledge.
• Retrieving relevant information from an external
database and using that information to inform the
generation process.
• Useful for Generative AI applications over private or
custom datasets
• Use-cases like chatbots, question-answering, as well as
research and analysis.
Retrieval-Augmented Generation (RAG)
• LLMs are trained on a vast amount of textual data
• Their capabilities are based on the knowledge they
acquire from this data.
– Knowledge Cut-off
• Can’t answer a question about data that is not part of
their training set
– May not be able to respond accurately
• May give a refusal (where the LLM responds with “I don’t
know”)
• Or worse, a hallucination.
Limitations of current LLMs: Hallucination
• What’s a Hallucination?
– The model can generate text that is factually incorrect or nonsensical with respect to the provided source content

• Possible causes of hallucination:
– Data: flawed data sources and inferior utilization of the factual knowledge captured in the data
– Training: architecture flaws and misalignment
– Inference: the inherent randomness of decoding strategies and imperfect decoding representations

RAG Pipeline
• Steps of RAG:

QA with Retrieval Augmented Generation
[Figure: RAG pipeline for question answering. Documents and web pages are split into chunks and embedded; the embeddings are stored in a vector DB. The query “How many parameters are there in Grok-0?” is embedded and searched against the DB; the top-K chunks are placed into a prompt template, and the LLM produces the answer.]

Retrieved context: “[...] After announcing xAI, we trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. [...]”

Answer: “Grok-0, the prototype LLM mentioned in the provided context, is stated to have been trained with 33 billion parameters.”


Retrieval Augmented Generation: Ingestion
• Chunking
• Chunking is the process of dividing the prompts and/or the
documents to be retrieved, into smaller, manageable segments
or chunks.
• Chunks can be defined by a fixed size, such as a specific
number of characters, sentences, or paragraphs.
• Each chunk is encoded into an embedding vector for retrieval.
• Smaller, more precise chunks lead to a finer match between
the user’s query and the content
• Enhancing the accuracy and relevance of the information retrieved.

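A fixed-size chunker with overlap can be sketched in a few lines of Python (the function name and sizes below are illustrative, not from the lecture):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks; a small overlap
    keeps sentences that straddle a boundary searchable in both chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "Retrieval-Augmented Generation retrieves relevant chunks. " * 20
chunks = chunk_text(doc, chunk_size=100, overlap=10)
# Every chunk is small enough to embed, and consecutive chunks
# share their boundary text.
assert all(len(c) <= 100 for c in chunks)
assert chunks[0][-10:] == chunks[1][:10]
```

In practice, chunking on sentence or paragraph boundaries (rather than raw characters) usually gives cleaner retrieval units, at the cost of variable chunk sizes.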


Retrieval Augmented Generation: Ingestion
• Embeddings
• Transforming both the user’s query (prompt) and the
documents in the knowledge base into a format that can be
effectively compared for relevance.
• This is called Embeddings
• Important to retrieve the most relevant information from its
knowledge base in response to a user query.
• Many embedding models exist; select the one best suited to
your task
– HuggingFace’s Massive Text Embedding Benchmark (MTEB) leaderboard.
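The core idea — map both the query and the documents into one vector space and compare them there — can be illustrated with a toy bag-of-words embedding. A real RAG system would use a learned model such as SentenceBERT; the helpers below are made up for illustration:

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy embedding: normalized word counts over a shared vocabulary.
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

docs = ["grok-0 was trained with 33 billion parameters",
        "polar bears feed on fish"]
query = "how many parameters does grok-0 have"
vocab = sorted({w for t in docs + [query] for w in t.lower().split()})

q = embed(query, vocab)
scores = [cosine(q, embed(d, vocab)) for d in docs]
# The document that shares vocabulary with the query scores higher.
assert scores[0] > scores[1]
```

A learned embedding model replaces `embed` but keeps the same interface: text in, fixed-size vector out, compared with cosine similarity.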
Retrieval Augmented Generation: Ingestion
• Sparse embedding:
• It refers to a method of representing words or phrases in a
high-dimensional space where most of the vector
elements are zeros.
• E.g., TF-IDF vectors are good for lexical (word-level)
matching of the prompt against the documents.
• It’s computationally less intensive but may not capture
the deeper semantic meanings in the text.

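TF-IDF, mentioned above, can be computed from scratch to show why the representation is sparse — most terms get no entry for most documents. This is a minimal sketch of the standard definition, not a production implementation:

```python
import math
from collections import Counter

def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse vector (term -> weight) per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["polar bears feed on fish",
        "polar bears endangered as glaciers melt",
        "grok-0 was trained with 33 billion parameters"]
vecs = tfidf(docs)
# "parameters" is unique to one document, so it gets a higher weight
# than "polar", which appears in two documents.
assert vecs[2]["parameters"] > vecs[0]["polar"]
```

Storing each vector as a `dict` of nonzero entries is exactly the sparsity the slide describes: in a high-dimensional vocabulary space, most coordinates are zero.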


Retrieval Augmented Generation: Ingestion
• Semantic embedding: Semantic embeddings,
such as BERT or SentenceBERT are more suitable
for RAG use cases.
• BERT: Suitable for capturing contextual
information in both the documents and queries.
• Requires more computational resources
• Offers more semantically rich embeddings.



Sentence BERT
• SentenceBERT: Ideal for scenarios where the context
and meaning at the sentence level are important.
• This is usually the preferred choice for RAG applications
• For example: BERT may not easily distinguish between
sentences like "Polar bears endangered as glaciers melt"
and "Polar bears feed on fish"
• SentenceBERT better captures sentence-level meanings



Sentence BERT
• BERT is a powerful pre-trained network for Masked Language
Modeling and Next Sentence Prediction.
• BERT faces a challenge in creating detailed sentence
embeddings.
• Sentence-BERT makes sentence embedding generation more
straightforward.
• It uses a Siamese network architecture and triplet loss,
which effectively produce deeply meaningful sentence
embeddings, boosting its performance.



SentenceBERT



Sentence BERT
• Bi-Encoder: This encoder processes sentences independently,
resulting in two distinct embeddings
• Embeddings are then compared to determine sentence similarity
– Cosine similarity.
• Cross-Encoder: This encoder processes sentences together
through the Transformer network.
– It outputs a similarity score ranging from 0 to 1.

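The practical difference between the two encoders can be sketched with a toy “encoder” that just counts how often it runs: a bi-encoder embeds the corpus once and reuses those embeddings for every query, while a cross-encoder must process every (query, sentence) pair jointly. The word-overlap scoring below merely stands in for a real Transformer:

```python
calls = {"encode": 0, "joint": 0}

def encode(sentence: str) -> set[str]:
    # Stand-in for one bi-encoder forward pass.
    calls["encode"] += 1
    return set(sentence.lower().split())

def cross_score(a: str, b: str) -> float:
    # Stand-in for a cross-encoder: the pair is processed together
    # and a similarity score in [0, 1] comes out.
    calls["joint"] += 1
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

corpus = ["polar bears feed on fish",
          "polar bears endangered as glaciers melt",
          "the sky is blue"]
queries = ["what do polar bears eat", "why is the sky blue"]

# Bi-encoder retrieval: corpus embedded once, each query embedded once.
corpus_embs = [encode(s) for s in corpus]
for q in queries:
    qe = encode(q)
    bi_scores = [len(qe & e) / len(qe | e) for e in corpus_embs]

# Cross-encoder retrieval: one joint pass per (query, sentence) pair.
for q in queries:
    cross_scores = [cross_score(q, s) for s in corpus]

assert calls["encode"] == len(corpus) + len(queries)   # 3 + 2 = 5
assert calls["joint"] == len(corpus) * len(queries)    # 3 * 2 = 6
```

This is why bi-encoders dominate the retrieval stage of RAG (corpus embeddings can be precomputed and indexed), while cross-encoders are typically reserved for re-ranking a small top-K set.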


Sentence BERT



Sentence BERT
• Sentence-BERT improves upon the original BERT model by
integrating a pooling step with its outputs.
• This addition transforms the outputs into consistent-sized
sentence embeddings.



Which Embedding Model to Use?
• You can check which model is best suited, e.g., via HuggingFace’s
MTEB:
• Massive Text Embedding Benchmark (MTEB) Leaderboard



Retrieval Augmented Generation: Ingestion
• Response:
• The model synthesizes the retrieved information with its pre-
trained knowledge to generate coherent and contextually
relevant responses.
• The goal is a generated response that is not only
informative but also aligned with the user’s original query,
maintaining a natural and conversational tone.
• Creating an expanded prompt (with the retrieved top-k chunks)
for the LLM can enhance the RAG system’s effectiveness
• and make the system more performant.

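Building the expanded prompt from the retrieved top-k chunks might look like this (the template wording is an assumption for illustration, not from the slides):

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, reply "I don't know".

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, top_k_chunks: list[str]) -> str:
    # Concatenate the retrieved chunks into the context slot.
    context = "\n\n".join(top_k_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

chunks = ["After announcing xAI, we trained a prototype LLM (Grok-0) "
          "with 33 billion parameters."]
prompt = build_prompt("How many parameters are there in Grok-0?", chunks)
assert "33 billion parameters" in prompt
assert prompt.rstrip().endswith("Answer:")
```

The resulting string is what gets sent to the LLM; the “I don’t know” instruction is one common way to discourage hallucination when retrieval comes back empty.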


Vector DB: introduction
• A vector database stores vectors of fixed dimensions
(called embeddings)
• We can then query the database to search the closest
embeddings to a given query vector
– Employ a distance metric (such as cosine similarity, Euclidean
distance)
• Can use KNN (K Nearest Neighbor) algorithm or another
similarity search algorithm.

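The core query operation of a vector DB — KNN by cosine similarity — fits in a few lines. Real databases use approximate indexes for scale, but the interface is the same; the names below are illustrative:

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], db: list[tuple[str, list[float]]], k: int):
    # Exact (brute-force) KNN: score every stored embedding,
    # keep the k closest by cosine similarity.
    return heapq.nlargest(k, db, key=lambda item: cosine(query, item[1]))

db = [("doc_a", [1.0, 0.0]),
      ("doc_b", [0.0, 1.0]),
      ("doc_c", [0.7, 0.7])]
hits = top_k([1.0, 0.0], db, k=2)
assert [doc_id for doc_id, _ in hits] == ["doc_a", "doc_c"]
```

Swapping cosine for Euclidean distance only changes the `key` function; production systems replace the linear scan with ANN structures such as HNSW or IVF.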


Vector DB: introduction



Vector DB: introduction
[Figure: ingestion and retrieval with a vector DB. Documents and web pages are split into chunks and embedded; the embeddings are stored in the vector DB. The query “How many parameters are there in Grok-0?” is embedded and searched against the DB, returning the top-K chunks as context.]

Retrieved context: “[...] After announcing xAI, we trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. [...]”


Prompt Engineering



What is prompt engineering?
Find an effective way to
communicate with AI tools



What are prompts?
• Prompts involve instructions and context passed to a
language model to achieve a desired task
• Prompt engineering is the practice of developing and
optimizing prompts to efficiently use language models
(LMs) for a variety of applications
• Prompt engineering is a useful skill for AI engineers and
researchers to improve and efficiently use language models

What is prompt engineering?

Prompt engineering is a process of creating a set of prompts,
or questions, that are used to guide the user toward a desired
outcome. It is an effective tool for designers to create user
experiences that are easy to use and intuitive. This method is
often used in interactive design and software development, as
it allows users to easily understand how to interact with a
system or product.



Why Prompt Engineering?
• Why learn prompt engineering?
• Important for research, discoveries, and advancement
• Helps to test and evaluate the limitations of LLMs
• Enables all kinds of innovative applications on top of LLMs

Source: https://jobs.lever.co/Anthropic/e3cde481-d446-460f-b576-93cab67bd1ed



First Basic Prompt

Instruction:
The sky is

Response:
blue

The sky is a beautiful blue color during the day. The
blue of the sky is created by the Earth’s atmosphere
scattering the sun’s light. The blue is a result of the
air molecules in the atmosphere reflecting the shorter
wavelength of blue light back to our eyes.

Model: “text-davinci-003”
temperature: 0.7
top-p: 1


Elements of a Prompt
• A prompt is composed of the following
components:
• Instructions
• Context
• Input data
• Output indicator

Classify the text into neutral, negative or positive

Text: I think the food was okay.

Sentiment:



Settings to keep in mind
• When prompting a language model you should keep in mind a few
settings
• You can get very different results with prompts when using
different settings
• One important setting is controlling how deterministic the model is
when generating completions for prompts
• Temperature and top_p are two important parameters to keep in
mind
• Generally, keep these low if you are looking for exact answers



Settings to keep in mind
• The term "temperature" refers to a parameter that controls the
randomness of the model's responses.
• A scaling factor applied to the logits
• Keep them high if you are looking for more diverse responses
• Zero Temperature: Effectively turns the model into a greedy algorithm that
always chooses the most likely next word based on the model's training.
• Top-p (Nucleus Sampling): Focuses on randomness in a subset of the
most likely next tokens.
– Instead of sampling from the entire distribution, it selects the smallest set of words
whose cumulative probability exceeds the threshold p, and then samples only from
this set.

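Both settings can be made concrete in a few lines: temperature divides the logits before the softmax, and top-p keeps only the smallest set of tokens whose cumulative probability reaches p. This is a sketch of the standard definitions, not any particular API:

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    # Temperature scales the logits: low T sharpens the distribution
    # (greedy-like), high T flattens it (more diverse outputs).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_set(probs: list[float], p: float = 0.9) -> dict[int, float]:
    # Nucleus sampling: smallest set of tokens whose cumulative
    # probability exceeds p, renormalized for sampling.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.1, -1.0]
sharp = softmax(logits, temperature=0.2)
flat = softmax(logits, temperature=2.0)
assert sharp[0] > flat[0]   # low T concentrates mass on the top token
nucleus = top_p_set(softmax(logits), p=0.9)
assert 0 in nucleus and len(nucleus) < len(logits)
```

As temperature approaches zero, almost all probability mass lands on the argmax token, which is the greedy behavior the slide describes.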


Designing Prompts for Different Tasks
• In the next few slides, we will cover a few
examples of common tasks using different
prompts
• Tasks covered:
• Text Summarization
• Question Answering
• Text Classification
• Role Playing
• Code Generation
• Reasoning



Text Summarization

Context:
Antibiotics are a type of medication used to treat
bacterial infections. They work by either killing the
bacteria or preventing them from reproducing, allowing
the body’s immune system to fight off the infection.
Antibiotics are usually taken orally in the form of
pills, capsules, or liquid solutions, or sometimes
administered intravenously. They are not effective
against viral infections, and using them inappropriately
can lead to antibiotic resistance.

Instruction:
Explain the above in one sentence:

Response:
Antibiotics are medications used to treat bacterial
infections by either killing the bacteria or stopping
them from reproducing, but they are not effective against
viruses and overuse can lead to antibiotic resistance.


Question Answering
Answer the question based on the context below. Keep the
answer short and concise. Respond "Unsure about answer"
if not sure about the answer.

Context: Teplizumab traces its roots to a New Jersey drug


company called Ortho Pharmaceutical. There, scientists
generated an early version of the antibody, dubbed OKT3.
Originally sourced from mice, the molecule was able to
bind to the surface of T cells and limit their cell-
killing potential. In 1986, it was approved to help
prevent organ rejection after kidney transplants, making
it the first therapeutic antibody allowed for human use.

Question: What was OKT3 originally sourced from?

Answer: Mice.

Context obtained from: https://www.nature.com/articles/d41586-023-00400-x



Text Classification

Classify the text into neutral, negative or positive.

Text: I think the food was okay.

Sentiment: Neutral



Role Playing

The following is a conversation with an AI research


assistant. The assistant tone is technical and
scientific.

Human: Hello, who are you?


AI: Greeting! I am an AI research assistant. How can I
help you today?
Human: Can you tell me about the creation of blackholes?
AI: Sure! Black holes are regions of spacetime where the
gravitational force is so strong that nothing, not even
light, can escape from it. They are created when a very
massive star dies and its core collapses in on itself,
forming a singularity of infinite density. The intense
gravity of the singularity pulls in all the matter and
radiation around it, creating the black hole.



Code Generation

“””
Table departments, columns = [DepartmentId,
DepartmentName]
Table students, columns = [DepartmentId, StudentId,
StudentName]
Create a MySQL query for all students in the Computer
Science Department
“””

SELECT StudentId, StudentName
FROM students
WHERE DepartmentId IN (SELECT DepartmentId
                       FROM departments
                       WHERE DepartmentName = 'Computer Science');


Reasoning

The odd numbers in this group add up to an even number: 15,
32, 5, 13, 82, 7, 1.
Solve by breaking the problem into steps. First, identify
the odd numbers, add them, and indicate whether the result
is odd or even.

Odd numbers: 15, 5, 13, 7, 1
Sum: 41
41 is an odd number.


Prompt Engineering Techniques
• Many advanced prompting techniques have
been designed to improve performance on
complex tasks
• Few-shot prompts
• Chain-of-thought (CoT) prompting
• Self-Consistency



Few-shot prompts
• Few-shot prompting is a method used with large language models
(LLMs) like GPT-3
• The model is provided with a few examples to demonstrate a task
before being asked to perform a similar task.
• It leverages the model’s pre-trained knowledge and its ability to
generalize from a few examples to understand the task at hand
and produce the desired output.

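A few-shot prompt is just the task demonstrated inline before the new input. For the sentiment task from the earlier slides it might be assembled like this (the example reviews are made up for illustration):

```python
examples = [
    ("I think the food was okay.", "Neutral"),
    ("The service was terrible and slow.", "Negative"),
    ("Absolutely loved the desserts!", "Positive"),
]

def few_shot_prompt(new_text: str) -> str:
    # Demonstrations first, then the unanswered case for the model
    # to complete in the same pattern.
    demos = "\n\n".join(f"Text: {t}\nSentiment: {s}" for t, s in examples)
    return (f"Classify the text into neutral, negative or positive.\n\n"
            f"{demos}\n\nText: {new_text}\nSentiment:")

prompt = few_shot_prompt("The room was clean but noisy.")
assert prompt.count("Sentiment:") == 4   # 3 demos + 1 to complete
assert prompt.rstrip().endswith("Sentiment:")
```

The trailing `Sentiment:` is deliberate: it leaves the model exactly one slot to fill, which keeps the completion short and on-format.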


Few-shot prompts



Chain-of-Thought Prompting
• Commonly used method
• Introduces chain-of-thought prompting as a novel method.
• Significantly improves LLMs' complex reasoning tasks
performance.
– Arithmetic reasoning
– Commonsense reasoning
– Symbolic reasoning



Traditional Prompting vs Chain-of-Thought
• Traditional Prompting
– In-context few-shot learners
– Works poorly on tasks that require reasoning abilities
– Not improving with increasing language model scale
• Chain-of-thought
– Series of intermediate reasoning steps
– Mimic human step-by-step thought



Chain-of-Thought Prompting Principles
• Step-by-Step Guidance - Breaking complex problems into smaller steps
– “Let’s do it step by step.”

• Explicit Reasoning Chains - Creating prompts that detail the reasoning process step
by step
– “Consider the dangers of lightning. Next, evaluate the safety of being
outdoors vs. indoors. Conclude with the safest option.”

• Use of Examples - Providing solved examples with detailed reasoning steps
– “If A is bigger than B and B is bigger than C, then A is the biggest. For
example, if A = 5, B = 3, and C = 2, then 5 is the biggest.”

• Iterative Refinement - Refining prompts based on model output for clarity and
effectiveness
– “Consider the relationship between A, B, and C step by step, starting with
comparing A and B.”

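The “use of examples” principle amounts to prepending a worked exemplar whose reasoning chain the model can imitate. A minimal helper, using the classic tennis-ball exemplar from the chain-of-thought literature (the function name is illustrative):

```python
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

def build_cot_prompt(question: str) -> str:
    # The exemplar's step-by-step chain primes the model to reason
    # step by step on the new question as well.
    return f"{COT_EXEMPLAR}\n\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?")
assert "The answer is 11." in prompt
assert prompt.endswith("A:")
```

A zero-shot variant of the same idea simply appends “Let’s think step by step.” to the question instead of supplying a worked exemplar.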


Attractive Properties of CoT
• Enables decomposition of multi-step problems.

• Provides interpretable insights into LLM's reasoning.


– Shows how the answer was derived
– Easy to debug when the reasoning path goes wrong

• Broad applicability across diverse reasoning tasks.


– (Any) problems solvable via language.

• Simple and off-the-shelf
– No additional training or fine-tuning needed



Evaluation - Arithmetic Reasoning
• Benchmark:
– Five math word problem datasets: GSM8K, SVAMP, ASDiv, AQuA, MAWPS
– “Josh decides to try flipping a house. He buys a house for $80,000 and then puts in $50,000
in repairs. This increased the value of the house by 150%. How much profit did he make?”
(GSM8K)
• Baseline: Standard prompting with in-context exemplars
• CoT: 8 few-shot exemplars with chains of thought for prompting
• Language models:
– GPT-3
– LaMDA
– PaLM
– UL2 20B
– Codex



Evaluation - Arithmetic Reasoning
• CoT is an emergent ability of model scale
– No effect on small models

• Harder problems have a greater improvement


– Performance doubled on GSM8K

• Error analysis on LaMDA 137B for GSM8K


– 46% - calculator errors, symbol mapping error
– 54% - semantic misunderstanding

• Scaling PaLM 62B -> PaLM 540B fixed a large fraction of these errors



Limitation of CoT
• Emulate the thought process, but not necessarily
reasoning
• Cost of manually augmenting exemplars with CoTs
• Emergence of CoT reasoning only at large model scale
• No guarantee of correct reasoning paths



Self-Consistency Improves Chain of Thought Reasoning in LLMs

• Analogous to the human way of thinking:


– If multiple different ways of thinking lead to the same answer, one has
greater confidence that the final answer is correct.
– Complex reasoning tasks typically admit multiple reasoning paths that
reach a correct answer.
• Correct reasoning processes tend to have greater agreement in
their final answer than incorrect processes.



Traditional CoT vs CoT w/ Self Consistency



3 steps of Self Consistency
1. Prompting LLM with CoT
2. Sampling from the LLM decoder to get diverse reasoning paths
3. Selecting the most consistent answers (e.g. majority voting)

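Step 3 above — selecting the most consistent answer by majority vote over the sampled reasoning paths — can be sketched as follows (the answer-extraction convention `"The answer is X."` is an assumption about the completion format):

```python
from collections import Counter

def self_consistent_answer(sampled_completions: list[str]) -> str:
    # Extract the final answer from each sampled reasoning path,
    # discard the paths themselves, and take the majority vote.
    answers = [c.rsplit("The answer is ", 1)[-1].rstrip(".")
               for c in sampled_completions]
    return Counter(answers).most_common(1)[0][0]

samples = [
    "Roger started with 5, bought 6 more. 5 + 6 = 11. The answer is 11.",
    "2 cans of 3 is 6; 6 + 5 = 11. The answer is 11.",
    "5 + 2 + 3 = 10. The answer is 10.",   # a faulty reasoning path
]
assert self_consistent_answer(samples) == "11"
```

The vote marginalizes out the reasoning paths: two independent chains agreeing on 11 outweigh the single faulty chain, which is exactly the intuition on the previous slide.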


Another example



Understanding the Consistency
• Decomposition of decoder output into
– Reasoning path
– Final answer
• Example
– Common Sense reasoning
• Q: Yes or no: Would a pear sink in water?
• Reasoning path: Considering that pears have a density less than water, we can infer that a pear would float in
water.
• Final answer: The answer is yes, a pear would float in water.
– Symbolic reasoning
• Q: Take the last letters of the words in “Lady Gaga” and concatenate them.
• Reasoning path: The last letter of “Lady” is “y”. The last letter of “Gaga” is “a”.
• Final answer: The answer is “ya”.



Understanding the Consistency



Self-Refine: Iterative Refinement with
Self-Feedback
• Alternates between two generative steps–FEEDBACK and REFINE.
• 3 prompts are needed during the entire process
o initial generation, feedback, refinement



Self-Refine - Big picture



Thank You