This project is a PDF Assistant Bot built using Streamlit, SpaCy, LangChain, and other libraries. The bot allows users to upload PDF files and ask questions based on the content of these files. It uses a retrieval-augmented generation (RAG) approach to provide answers.
- Upload multiple PDF files.
- Ask questions based on the content of the uploaded PDF files.
- Uses SpaCy for text embeddings and LangChain for managing the conversational chain.
- Integrates with OpenAI's GPT-3.5 to generate responses.
- Easy-to-use Streamlit interface.
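The retrieval step of a RAG pipeline can be illustrated without any of the project's dependencies. The sketch below is not the code in app.py: it swaps SpaCy embeddings and a vector index for plain word counts and a linear scan, and the names `embed`, `cosine`, and `retrieve` are hypothetical. It only shows the idea of ranking text chunks by similarity to the question before the best ones are handed to the language model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks standing in for text extracted from uploaded PDFs.
chunks = [
    "The invoice total is 420 dollars and is due in March.",
    "Our office is closed on public holidays.",
    "Payment of the invoice can be made by bank transfer.",
]
top_chunks = retrieve("When is the invoice due?", chunks, k=1)
print(top_chunks)
```

In a real pipeline the retrieved chunks would be placed in the prompt alongside the user's question; word counts are used here only so the example runs with the standard library.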
- Clone the repository:

      git clone https://github.com/d43ash1sh/PDF_chatbot.git
      cd PDF_chatbot

- Create and activate a virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Create a .env file and add your OpenAI API key:

      OPENAI_API_KEY=your_openai_api_key

- Run the Streamlit app:

      streamlit run app.py

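Because the app depends on `OPENAI_API_KEY`, it can help to fail fast when the key is missing. The following is a minimal sketch, not code taken from app.py: `require_api_key` is a hypothetical helper that takes the environment as a plain mapping. In the real app, python-dotenv's `load_dotenv()` would typically be called first so that the values in `.env` are available in `os.environ`.

```python
from typing import Mapping


def require_api_key(env: Mapping[str, str], name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored under `name`, or raise if it is missing or blank."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demo with an explicit mapping; the real app would pass os.environ instead.
demo_env = {"OPENAI_API_KEY": "your_openai_api_key"}
print(require_api_key(demo_env))
```

Passing the environment in as an argument keeps the check easy to exercise without touching the real process environment.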
Once the app is running, you can upload your PDF files and start asking questions based on the content of these files.
pdf-assistant-bot/
│
├── .env
├── .gitignore
├── app.py
├── requirements.txt
└── README.md

app.py is the main file of the project and contains all the code for the PDF Assistant Bot.
- streamlit: For building the web interface.
- PyPDF2: For reading PDF files.
- langchain: For managing the conversational chain and embeddings.
- dotenv: For loading environment variables.
- spacy: For text embeddings.
- os: For handling file paths and environment variables.
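As a quick sanity check after installation, the libraries listed above can be probed with the standard library alone. This is only a sketch: `missing_modules` is a hypothetical helper, and the list assumes the usual import names (for example, `dotenv` is the import name provided by the python-dotenv package).

```python
from importlib.util import find_spec

# Import names for the third-party libraries listed above.
# "dotenv" is the import name provided by the python-dotenv package.
REQUIRED_MODULES = ["streamlit", "PyPDF2", "langchain", "dotenv", "spacy"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```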
- pdf_read(pdf_doc): Reads and extracts text from uploaded PDF files.
- get_chunks(text): Splits the extracted text into chunks.
- vector_store(text_chunks): Stores the text chunks in a vector store using FAISS.
- get_conversational_chain(tools, ques): Manages the conversational chain using LangChain and OpenAI's GPT-3.5.
- user_input(user_question): Processes the user's input question.
- main(): The main function that sets up the Streamlit app interface.
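To make the chunking step concrete, here is a small dependency-free sketch of splitting text into overlapping chunks. It is an illustration only: the project's `get_chunks` may well rely on a LangChain text splitter instead, and the name `split_into_chunks` and the `chunk_size` and `overlap` values are assumptions chosen for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_into_chunks(sample, chunk_size=10, overlap=3):
    print(chunk)
```

The overlap means that text falling on a chunk boundary still appears intact in at least one chunk, which tends to help retrieval when an answer straddles two chunks.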
Contributions are welcome! If you find any issues or have suggestions for improvement, feel free to open an issue or create a pull request.
- Streamlit
- SpaCy
- OpenAI
- LangChain
- PyPDF2