emarco177/documentation-helper

🦜 LangChain Documentation Helper

An intelligent documentation assistant powered by LangChain and vector search

🎯 Overview

The LangChain Documentation Helper is an AI-powered web application that serves as a slim version of chat.langchain.com. It answers questions about the LangChain documentation using Retrieval-Augmented Generation (RAG), with Tavily-based web crawling to collect the documentation and conversational memory to handle follow-up questions.

✨ Key Features

RAG Pipeline Flow:

  1. 🌐 Web Crawling: Real-time web scraping and content extraction using Tavily's advanced crawling capabilities
  2. 📚 Document Processing: Intelligent chunking and preprocessing of LangChain documentation
  3. 🔍 Vector Storage: Advanced embedding and indexing using Pinecone for fast similarity search
  4. 🎯 Intelligent Retrieval: Context-aware document retrieval based on user queries
  5. 🧩 Memory System: Conversational memory for coreference resolution and context continuity
  6. 🧠 Context-Aware Generation: Provides accurate, contextual answers with source citations
  7. 💬 Interactive Interface: User-friendly chat interface powered by Streamlit
  8. 🚀 Real-time Processing: Fast end-to-end pipeline from query to response
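
As a rough illustration of steps 3, 4, and 6 above, the sketch below ranks a few in-memory snippets against a query and assembles a prompt with numbered sources. It is a standard-library toy, not the project's code: the bag-of-words `embed` function stands in for a real embedding model, the `DOCS` list stands in for the Pinecone index, and the `docs/...` source paths are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[Dict[str, str]], k: int = 2) -> List[Dict[str, str]]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d["text"])), reverse=True)[:k]


def build_prompt(query: str, hits: List[Dict[str, str]]) -> str:
    """Number each retrieved snippet and attach its source so the answer can cite it."""
    context = "\n".join(
        f"[{i}] {d['text']} (source: {d['source']})" for i, d in enumerate(hits, 1)
    )
    return (
        "Answer using only the context below and cite sources.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical snippets; a real index would hold chunks of the crawled documentation.
DOCS = [
    {"text": "A retriever returns documents relevant to a query.", "source": "docs/retrievers"},
    {"text": "Text splitters break long documents into chunks.", "source": "docs/splitters"},
    {"text": "Streamlit renders the chat interface.", "source": "docs/ui"},
]

question = "How does a retriever find documents for a query?"
hits = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real deployment the similarity search would run inside the vector database and the prompt would be sent to the LLM; only the overall shape of the flow carries over from this sketch.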

🎬 Demo

Interactive demo showing the LangChain Documentation Helper in action

🛠️ Tech Stack

| Component          | Technology            | Description                                      |
|--------------------|-----------------------|--------------------------------------------------|
| 🖥️ Frontend        | Streamlit             | Interactive web interface                        |
| 🧠 AI Framework    | LangChain 🦜🔗        | Orchestrates the AI pipeline                     |
| 🔍 Vector Database | Pinecone 🌲           | Stores and retrieves document embeddings         |
| 🌐 Web Crawling    | Tavily                | Intelligent web scraping and content extraction  |
| 🧩 Memory          | Conversational Memory | Coreference resolution and context continuity    |
| 🤖 LLM             | OpenAI GPT            | Powers the conversational AI                     |
| 🐍 Backend         | Python                | Core application logic                           |
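
The Memory component refers to using earlier turns to resolve references such as "it" or "one" in follow-up questions. A common way to do this, sketched here under the assumption that the chat history is a list of (role, message) pairs, is to ask the LLM to rewrite the follow-up as a standalone question before retrieval. The function names and prompt wording are illustrative and are not taken from this repository; the LLM call itself is left out.

```python
from typing import List, Tuple

ChatHistory = List[Tuple[str, str]]  # (role, message) pairs, oldest first


def format_history(history: ChatHistory, max_turns: int = 6) -> str:
    """Render the most recent turns as plain text for inclusion in a prompt."""
    recent = history[-max_turns:]
    return "\n".join(f"{role}: {message}" for role, message in recent)


def build_rephrase_prompt(history: ChatHistory, question: str) -> str:
    """Build a prompt asking the model to rewrite a follow-up as a standalone question."""
    return (
        "Given the conversation below, rewrite the follow-up question so it "
        "can be understood without the conversation.\n\n"
        f"{format_history(history)}\n\n"
        f"Follow-up question: {question}\n"
        "Standalone question:"
    )


history: ChatHistory = [
    ("human", "What is a LangChain retriever?"),
    ("ai", "A retriever returns documents relevant to a query."),
]
prompt = build_rephrase_prompt(history, "How do I create one?")
print(prompt)
```

The rewritten question, rather than the raw follow-up, would then be used as the retrieval query, so that "one" is matched against documents about retrievers.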

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key
  • Pinecone API key
  • Tavily API key (required - for documentation crawling and web search)

Installation

  1. Clone the repository

    git clone https://github.com/emarco177/documentation-helper.git
    cd documentation-helper
  2. Set up environment variables

    Create a .env file in the root directory:

    PINECONE_API_KEY=your_pinecone_api_key_here
    OPENAI_API_KEY=your_openai_api_key_here
    TAVILY_API_KEY=your_tavily_api_key_here  # Required - for documentation crawling
  3. Install dependencies

    pipenv install
  4. Ingest the LangChain documentation by running the ingestion pipeline

    pipenv run python ingestion.py  # Uses Tavily to crawl and index documentation
  5. Run the application

    pipenv run streamlit run main.py
  6. Open your browser and navigate to http://localhost:8501
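
The ingestion step splits crawled pages into chunks before they are embedded and indexed. The internals of ingestion.py are not shown here, so the following is only a generic sketch of overlapping character-based chunking and batching; the function names, chunk size, overlap, and batch size are all assumptions chosen for illustration.

```python
from typing import Iterator, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items, e.g. for bulk upserts."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
batches = list(batched(chunks, batch_size=2))
print(chunks)   # ['abcd', 'defg', 'ghij']
print(batches)  # [['abcd', 'defg'], ['ghij']]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, and batching keeps each indexing request small.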

🧪 Testing

Run the test suite to ensure everything is working correctly:

    pipenv run pytest .

📁 Project Structure

documentation-helper/
├── backend/                          # Core backend logic
│   ├── __init__.py
│   └── core.py
├── static/                           # Static assets (images, logos)
│   ├── banner.gif
│   ├── LangChain Logo.png
│   ├── Tavily Logo.png
│   ├── Tavily Logo Trimmed Padded.png
│   └── Trimmed Padded Langchain.png
├── chroma_db/                        # Local vector database
├── main.py                           # Streamlit application entry point
├── ingestion.py                      # Document ingestion pipeline
├── consts.py                         # Configuration constants
├── logger.py                         # Logging utilities
├── Tavily Demo Tutorial.ipynb        # 📚 Tutorial: Introduction to Tavily API
├── Tavily Crawl Demo Tutorial.ipynb  # 📚 Tutorial: Advanced Tavily crawling techniques
└── requirements files                # Pipfile, Pipfile.lock

📚 Tutorial Notebooks

The project includes comprehensive Jupyter notebooks that serve as hands-on tutorials:

  • Tavily Demo Tutorial.ipynb: Introduction to Tavily API basics and core functionality
  • Tavily Crawl Demo Tutorial.ipynb: Advanced tutorial covering Tavily's crawling capabilities, including TavilyMap and TavilyExtract features

These tutorials provide step-by-step guidance on integrating Tavily's powerful web search and crawling capabilities into your AI applications.

🔧 Configuration

Environment Variables

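| Variable         | Description                                                    | Required |
|------------------|----------------------------------------------------------------|----------|
| PINECONE_API_KEY | Your Pinecone API key for vector storage                       | ✅       |
| OPENAI_API_KEY   | Your OpenAI API key for LLM access                             | ✅       |
| TAVILY_API_KEY   | Your Tavily API key for documentation crawling and web search  | ✅       |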
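
Since all three variables are required, it can help to fail early when one is missing. The check below is a minimal sketch and not part of the repository; `missing_vars` is a hypothetical helper, and it assumes the variables are already present in the process environment (Pipenv loads a `.env` file automatically for `pipenv run` and `pipenv shell`).

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("PINECONE_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {"PINECONE_API_KEY": "pc-123", "OPENAI_API_KEY": ""}
print(missing_vars(example_env))  # ['OPENAI_API_KEY', 'TAVILY_API_KEY']
```

Calling `missing_vars()` with no argument checks the real environment; an application could call it at startup and stop with a clear message if the returned list is not empty.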

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📚 Learning Resources

This project is designed as a learning tool for understanding:

  • 🦜 LangChain framework implementation
  • 🔍 Vector search and embeddings
  • 💬 Conversational AI development
  • 🏗️ RAG (Retrieval-Augmented Generation) architecture

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🌟 Support

If you find this project helpful, please consider:

  • ⭐ Starring the repository
  • 🐛 Reporting issues
  • 💡 Contributing improvements
  • 📢 Sharing with others

Built with ❤️ by Eden Marco
