The LangChain Documentation Helper is an AI-powered web application that serves as a slimmed-down version of chat.langchain.com. It answers questions about the LangChain documentation using Retrieval-Augmented Generation (RAG), combined with Tavily-based web crawling to ingest the docs and conversational memory to handle follow-up questions.
RAG pipeline flow:
- Web Crawling: crawls LangChain documentation pages and extracts their content using Tavily's crawling capabilities
- Document Processing: splits the documentation into chunks and preprocesses them for embedding
- Vector Storage: embeds the chunks and indexes them in Pinecone for fast similarity search
- Retrieval: fetches the documentation chunks most relevant to each user query
- Memory System: keeps the conversation history so that follow-up questions (e.g. "How do I configure it?") are resolved against earlier turns
- Context-Aware Generation: produces answers grounded in the retrieved documents, with source citations
- Interactive Interface: a chat interface built with Streamlit
- Real-time Processing: a fast end-to-end pipeline from query to response
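The retrieval and generation stages above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: a bag-of-words `Counter` stands in for OpenAI embeddings, an in-memory list stands in for the Pinecone index, and `embed`, `build_index`, `retrieve`, and `build_prompt` are hypothetical helper names chosen for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector.

    Assumption: stands in for a real embedding model such as OpenAI embeddings.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[tuple[str, str]]) -> list[tuple[str, str, Counter]]:
    """Hypothetical in-memory index standing in for Pinecone: (source, text, vector)."""
    return [(source, text, embed(text)) for source, text in chunks]


def retrieve(index: list[tuple[str, str, Counter]], query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the query, highest cosine score first."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(source, text) for source, text, _ in ranked[:k]]


def build_prompt(query: str, docs: list[tuple[str, str]]) -> str:
    """Number each retrieved chunk so the LLM can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] ({src}) {text}" for i, (src, text) in enumerate(docs, start=1))
    return (
        "Answer the question using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

In a full pipeline, the string returned by `build_prompt` would be sent to the chat model. Replacing `embed` with a real embedding model and the list with a vector database changes the scale, but not the shape, of this query → retrieve → prompt flow.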
| Component | Technology | Description |
|---|---|---|
| Frontend | Streamlit | Interactive web interface |
| AI Framework | LangChain | Orchestrates the AI pipeline |
| Vector Database | Pinecone | Stores and retrieves document embeddings |
| Web Crawling | Tavily | Intelligent web scraping and content extraction |
| Memory | Conversational Memory | Coreference resolution and context continuity |
| LLM | OpenAI GPT | Powers the conversational AI |
| Backend | Python | Core application logic |
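
The Memory row above refers to coreference resolution: a follow-up such as "How do I install it?" only makes sense alongside earlier turns. One common way to handle this, assumed here rather than taken from the project's code, is to keep a short chat history and ask the LLM to rewrite the follow-up as a standalone question before retrieval. The `ConversationMemory` class below is hypothetical, and the actual LLM call is omitted; the sketch only builds the rewrite prompt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConversationMemory:
    """Hypothetical chat-history buffer (not the project's actual class)."""

    max_turns: int = 5  # assumed to be a positive number of (question, answer) turns
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Keep only the most recent turns so the prompt stays short.
        self.turns = self.turns[-self.max_turns:]

    def history_text(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)

    def condense_prompt(self, follow_up: str) -> str:
        # An LLM (call omitted) would answer this prompt with a standalone question
        # in which pronouns such as "it" are replaced by what they refer to.
        return (
            "Given the conversation below, rephrase the follow-up question "
            "as a standalone question.\n\n"
            f"{self.history_text()}\n\nFollow-up: {follow_up}\nStandalone question:"
        )
```

The rewritten question, rather than the raw follow-up, would then be used as the retrieval query, so the vector search sees the resolved entity instead of a bare pronoun.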
Prerequisites:
- Python 3.8 or higher
- OpenAI API key
- Pinecone API key
- Tavily API key (required for documentation crawling and web search)
Installation:

1. Clone the repository:
   ```bash
   git clone https://github.com/emarco177/documentation-helper.git
   cd documentation-helper
   ```
2. Set up environment variables. Create a `.env` file in the root directory:
   ```
   PINECONE_API_KEY=your_pinecone_api_key_here
   OPENAI_API_KEY=your_openai_api_key_here
   TAVILY_API_KEY=your_tavily_api_key_here  # Required for documentation crawling
   ```
3. Install dependencies:
   ```bash
   pipenv install
   ```
4. Ingest the LangChain documentation by running the ingestion pipeline, which uses Tavily to crawl and index the documentation:
   ```bash
   pipenv run python ingestion.py
   ```
5. Run the application:
   ```bash
   pipenv run streamlit run main.py
   ```
6. Open your browser and navigate to http://localhost:8501.
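
The ingestion step splits crawled pages into chunks before embedding them. The splitter and chunk sizes that `ingestion.py` actually uses are not stated in this README; the sketch below assumes a simple fixed-size character splitter with overlap (the helpers `chunk_text` and `chunk_pages` are hypothetical). Overlap repeats the end of each chunk at the start of the next, so text that straddles a boundary is not lost entirely, and keeping the source URL with each chunk is what makes citations possible later.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def chunk_pages(pages: dict[str, str], **kwargs) -> list[tuple[str, str]]:
    """Chunk every crawled page, keeping its source URL next to each chunk."""
    return [(url, chunk) for url, text in pages.items() for chunk in chunk_text(text, **kwargs)]
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`: every chunk starts two characters before the previous one ends.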
Run the test suite to ensure everything is working correctly:
```bash
pipenv run pytest .
```

Project structure:
```
documentation-helper/
├── backend/                          # Core backend logic
│   ├── __init__.py
│   └── core.py
├── static/                           # Static assets (images, logos)
│   ├── banner.gif
│   ├── LangChain Logo.png
│   ├── Tavily Logo.png
│   ├── Tavily Logo Trimmed Padded.png
│   └── Trimmed Padded Langchain.png
├── chroma_db/                        # Local vector database
├── main.py                           # Streamlit application entry point
├── ingestion.py                      # Document ingestion pipeline
├── consts.py                         # Configuration constants
├── logger.py                         # Logging utilities
├── Tavily Demo Tutorial.ipynb        # Tutorial: Introduction to Tavily API
├── Tavily Crawl Demo Tutorial.ipynb  # Tutorial: Advanced Tavily crawling techniques
└── requirements files                # Pipfile, Pipfile.lock
```
The project includes Jupyter notebooks that serve as hands-on tutorials:

- `Tavily Demo Tutorial.ipynb`: an introduction to the basics and core functionality of the Tavily API
- `Tavily Crawl Demo Tutorial.ipynb`: an advanced tutorial on Tavily's crawling capabilities, including the TavilyMap and TavilyExtract features

These tutorials walk step by step through integrating Tavily's web search and crawling capabilities into your AI applications.
| Variable | Description | Required |
|---|---|---|
| `PINECONE_API_KEY` | Your Pinecone API key for vector storage | Yes |
| `OPENAI_API_KEY` | Your OpenAI API key for LLM access | Yes |
| `TAVILY_API_KEY` | Your Tavily API key for documentation crawling and web search | Yes |
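
Because all three keys in the table above are required, a startup check that fails fast with a clear message can save debugging time. The helpers below (`missing_env_vars`, `require_env`) are hypothetical and not part of the project. They read only the process environment, so they assume the `.env` file has already been loaded into it, for example with python-dotenv's `load_dotenv()`; whether the app loads its configuration exactly this way is an assumption.

```python
import os
from typing import List, Mapping

# The three variables listed as required in the table above.
REQUIRED_VARS = ("PINECONE_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or contain only whitespace."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise a single error naming every missing variable, instead of failing later."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
```

Calling `require_env()` once at startup, before any API client is created, surfaces all missing keys in one message rather than one failed request at a time.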
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is designed as a learning tool for understanding:
- LangChain framework implementation
- Vector search and embeddings
- Conversational AI development
- RAG (Retrieval-Augmented Generation) architecture
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project helpful, please consider:
- Starring the repository
- Reporting issues
- Contributing improvements
- Sharing with others