
RAG Application Using Open Source Tools

The document outlines the creation of a RAG (Retrieval-Augmented Generation) application using LangChain and open-source models, detailing the installation of necessary libraries and the process of chunking and storing PDF documents. It introduces the EduBotCreator class, which constructs a chatbot capable of answering questions based on the content of the PDFs using a language model and a vector database. Additionally, it provides examples of how to interact with the chatbot to retrieve information from the loaded documents.

Uploaded by pvsdteja


June 15, 2024

0.1 RAG Application using LangChain and Open Source Models:

[35]: !pip install -q langchain langchain_community pypdf sentence-transformers faiss-gpu ctransformers

[36]: from langchain_community.embeddings import HuggingFaceEmbeddings
# Integrations are imported from langchain_community, which the pip cell installs
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

[37]: DATA_DIR_PATH = "/content/"
CHUNK_SIZE = 500
CHUNK_OVERLAP = 200
VECTOR_DB_PATH = "/content/"
EMBEDDER = "thenlper/gte-large" # Huggingface Embedding model

def chunk_and_store():
    # Load every PDF found directly under DATA_DIR_PATH
    dir_loader = DirectoryLoader(
        DATA_DIR_PATH,
        glob='*.pdf',
        loader_cls=PyPDFLoader
    )

    docs = dir_loader.load()
    print("PDFs Loaded & Chunking starts...")

    # Split the pages into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP
    )

    inp_txt = text_splitter.split_documents(docs)
    print("Data Chunks Created & Vector storing starts...")

    # Embed the chunks on the GPU
    hfembeddings = HuggingFaceEmbeddings(
        model_name=EMBEDDER,
        model_kwargs={'device': 'cuda'}
    )

    # Build the FAISS index and persist it to VECTOR_DB_PATH
    db = FAISS.from_documents(inp_txt, hfembeddings)
    db.save_local(VECTOR_DB_PATH)

    print("Vector Store Creation Completed")
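To make the effect of CHUNK_SIZE and CHUNK_OVERLAP more concrete, here is a minimal sketch of overlapping chunking on plain strings. It is a simplification, not the splitter's actual algorithm: RecursiveCharacterTextSplitter also tries to break on separators such as paragraph and line breaks, so its boundaries will differ. The helper name `sliding_chunks` and the dummy text are assumptions made for illustration.

```python
# Simplified illustration only: fixed-size character windows with overlap.
# `sliding_chunks` is a hypothetical helper, not part of LangChain.

def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
chunks = sliding_chunks(sample, chunk_size=500, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])
```

With a size of 500 and an overlap of 200, each window advances by 300 characters, so consecutive chunks repeat 200 characters of context. A larger overlap lowers the chance that a sentence is cut off from its context, at the cost of storing and embedding more chunks.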

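[38]: from langchain_core.prompts import PromptTemplate
# Integrations are imported from langchain_community, which the pip cell installs
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import CTransformers
from langchain.chains import RetrievalQA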

[39]: PROMPT_TEMPLATE = '''
With the information provided try to answer the question.
You are an expert in the field. Use the following context to answer the question as accurately as possible.

If the context does not contain enough information to answer the question, please state that explicitly.

Context: {context}

Question: {question}

Answer:
'''

INP_VARS = ['context', 'question']

CHAIN_TYPE = "stuff"
SEARCH_KWARGS = {'k': 1}
MODEL_CKPT = "TheBloke/Llama-2-7B-Chat-GGML"
MODEL_TYPE = "llama"
MAX_NEW_TOKENS = 512
TEMPERATURE = 0.9
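With CHAIN_TYPE set to "stuff", the retrieved chunks are concatenated and placed into the {context} slot of the prompt in a single pass. The sketch below imitates that step with plain string formatting. `DEMO_TEMPLATE`, `build_stuff_prompt`, and the sample chunks are hypothetical names and data introduced for illustration, not LangChain objects.

```python
# Hypothetical stand-in for what the "stuff" chain type does with the prompt:
# join the retrieved chunks and substitute them into the template.

DEMO_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def build_stuff_prompt(template: str, chunks: list[str], question: str) -> str:
    """Fill the template with all chunks, separated by blank lines."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


prompt = build_stuff_prompt(
    DEMO_TEMPLATE,
    chunks=["chunk A", "chunk B"],
    question="What is Part-Of-Speech Tagging?",
)
print(prompt)
```

Because everything is stuffed into one prompt, the number of retrieved chunks (k) multiplied by the chunk size has to fit in the model's context window, which is one reason to keep k small for a 7B model.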

[40]: class EduBotCreator:

    def __init__(self):
        self.prompt_temp = PROMPT_TEMPLATE
        self.input_variables = INP_VARS
        self.chain_type = CHAIN_TYPE
        self.search_kwargs = SEARCH_KWARGS
        self.embedder = EMBEDDER
        self.vector_db_path = VECTOR_DB_PATH
        self.model_ckpt = MODEL_CKPT
        self.model_type = MODEL_TYPE
        self.max_new_tokens = MAX_NEW_TOKENS
        self.temperature = TEMPERATURE

    def create_prompt(self):
        custom_prompt_temp = PromptTemplate(template=self.prompt_temp,
                                            input_variables=self.input_variables)
        return custom_prompt_temp

    def load_llm(self):
        # The CTransformers wrapper takes generation settings through `config`
        llm = CTransformers(model=self.model_ckpt,
                            model_type=self.model_type,
                            config={'max_new_tokens': self.max_new_tokens,
                                    'temperature': self.temperature})
        return llm

    def load_vectordb(self):
        # Must use the same embedding model that built the index
        hfembeddings = HuggingFaceEmbeddings(model_name=self.embedder,
                                             model_kwargs={'device': 'cuda'})
        # The index is stored with pickle, so only load files you created yourself
        vector_db = FAISS.load_local(self.vector_db_path, hfembeddings,
                                     allow_dangerous_deserialization=True)
        return vector_db

    def create_bot(self, custom_prompt, vectordb, llm):
        retrieval_qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type=self.chain_type,
            retriever=vectordb.as_retriever(search_kwargs=self.search_kwargs),
            return_source_documents=True,
            chain_type_kwargs={"prompt": custom_prompt}
        )
        return retrieval_qa_chain

    def create_chatbot(self):
        self.custom_prompt = self.create_prompt()
        self.vector_db = self.load_vectordb()
        self.llm = self.load_llm()
        self.bot = self.create_bot(self.custom_prompt, self.vector_db, self.llm)
        return self.bot
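The retriever created with search_kwargs={'k': 1} returns only the single chunk whose embedding lies closest to the question's embedding. The following sketch shows that idea as a brute-force nearest-neighbour search over toy 2-D vectors. It assumes Euclidean distance, which to the best of my knowledge is the default of LangChain's FAISS wrapper but is worth checking for your version. `top_k`, the vectors, and the chunk labels are invented for illustration, since real embeddings from gte-large have far more dimensions.

```python
import math

# Toy illustration of top-k retrieval by Euclidean distance.
# `top_k` is a hypothetical helper; FAISS does this far more efficiently.


def top_k(query_vec, doc_vecs, k=1):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # made-up "embeddings"
toy_chunks = ["chunk about embeddings", "chunk about POS tagging", "chunk about both"]
query = [0.1, 0.9]  # made-up embedding of the question

hits = top_k(query, toy_vectors, k=1)
print([toy_chunks[i] for i, _ in hits])
```

Raising k gives the model more context to draw on, but every extra chunk is added to the prompt by the "stuff" chain, so the prompt grows accordingly.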

[41]: chunk_and_store()

PDFs Loaded & Chunking starts…

Data Chunks Created & Vector storing starts…
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132:
FutureWarning: `resume_download` is deprecated and will be removed in version
1.0.0. Downloads always resume when possible. If you want to force a new
download, use `force_download=True`.
warnings.warn(
Vector Store Creation Completed

[42]: edubot_creator = EduBotCreator()

edubot = edubot_creator.create_chatbot()

Fetching 1 files: 0%| | 0/1 [00:00<?, ?it/s]

[43]: # Function to ask the bot a question
def ask_question(bot, question):
    query = {"query": question}
    # invoke() runs the chain; calling the chain object directly is deprecated
    result = bot.invoke(query)
    return result["result"]

# Example usage
question = "What is the main topic of the first PDF?"
answer = ask_question(edubot, question)
print(f"Answer: {answer}")

Answer: The main topic of the first PDF is the improvement of word embeddings
using large unlabeled data sets.
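Since the chain is built with return_source_documents=True, the result dictionary also carries the retrieved chunks under "source_documents", although ask_question only returns the answer text. Below is a small sketch of how the sources could be shown next to the answer. `FakeDocument`, `format_answer_with_sources`, and the sample values are hypothetical stand-ins so the snippet runs without the model; with the real chain you would pass in the dictionary returned by the bot.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_answer_with_sources(result: dict) -> str:
    """Render the answer followed by one line per retrieved source chunk."""
    lines = [f"Answer: {result['result'].strip()}"]
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")  # PyPDFLoader page numbers start at 0
        lines.append(f"Source: {source} (page index {page})")
    return "\n".join(lines)


# Made-up result with the same keys a RetrievalQA chain returns
fake_result = {
    "query": "What is Part-Of-Speech Tagging?",
    "result": " POS tagging labels each word. ",
    "source_documents": [
        FakeDocument("example chunk text", {"source": "/content/example.pdf", "page": 3})
    ],
}
report = format_answer_with_sources(fake_result)
print(report)
```

Showing the source file and page makes it easier to verify an answer against the PDF, which is useful at a sampling temperature as high as 0.9.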

[44]: # I have uploaded a file called 'Natural_Language_Processing_Almost_from_Scratch.pdf'
question = "What is Part-Of-Speech Tagging?"
answer = ask_question(edubot, question)
print(f"Answer: {answer}")

Answer: Part-Of-Speech (POS) tagging is the task of assigning a word or phrase a
label that indicates its part of speech. This label can be a single word, such
as "noun", "verb", "adjective", etc., or it can be a more detailed label that
indicates the specific class of words that the word belongs to, such as "noun",
"countable noun", "uncountable noun", "verb", "past tense", "present tense",
etc. The goal of POS tagging is to accurately identify the part of speech for
each word in a sentence or text, which can be useful in various applications
such as language modeling, natural language processing, and information
retrieval.


