Top 23 Python retrieval-augmented-generation Projects
-
ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Project mention: The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog | dev.to | 2025-10-26
-
storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
Project mention: Code Explanation: "STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking" | dev.to | 2025-03-08
Note: this explanation only covers the knowledge_storm in the storm repo because it aligns with my interests.
-
LightRAG
LightRAG examples: https://github.com/HKUDS/LightRAG/tree/main/examples
-
llmware
Project mention: How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts & Models | dev.to | 2025-05-14
Notebook for example 3: prompts and models
-
txtai
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
Project mention: The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog | dev.to | 2025-10-26
GitHub: https://github.com/neuml/txtai
-
FlagEmbedding
Project mention: BGE-Reasoner: An open-source framework for reasoning-intensive retrieval | news.ycombinator.com | 2025-08-27
-
memvid
Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.
Project mention: Friday Links #30 — JavaScript Updates, Tools, and Inspiration | dev.to | 2025-10-17
memvid - Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.
-
RAG-Anything
Project mention: The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog | dev.to | 2025-10-26
GitHub: https://github.com/HKUDS/RAG-Anything
-
Agent-S
Project mention: Show HN: Agent S: an open agentic framework that uses computers | news.ycombinator.com | 2025-05-01
-
R2R
SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.
Project mention: The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog | dev.to | 2025-10-26
Citations: Community references, https://github.com/SciPhi-AI/R2R
-
AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
-
cognita
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Project mention: Lists of open-source frameworks for building RAG applications | dev.to | 2025-01-02
Ideal For: Enterprises seeking a robust framework for large-scale AI applications.
-
langroid
Project mention: Using Claude Code to modernize a forgotten Linux kernel driver | news.ycombinator.com | 2025-09-07
> using these tools as a massive force multiplier…
Even before tools like CC it was the case that LLMs enabled venturing into projects/areas that would be intimidating otherwise. But Claude-Code (and codex-cli as of late) has made this massively more true.
For example I recently used CC to do a significant upgrade of the Langroid LLM-Agent framework from Pydantic V1 to V2, something I would not have dared to attempt before CC:
https://github.com/langroid/langroid/releases/tag/0.59.0
I also created nice collapsible html logs [2] for agent interactions and tool-calls, inspired by @badlogic/Zechner’s Claude-trace [3] (which incidentally is a fantastic tool!).
[2] https://github.com/langroid/langroid/releases/tag/0.57.0
[3] https://github.com/badlogic/lemmy/tree/main/apps/claude-trac...
And added a DSL to specify agentic task termination conditions based on event-sequence patterns:
https://langroid.github.io/langroid/notes/task-termination/
Needless to say, the docs are also made with significant CC assistance.
-
LEANN
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
Project mention: First lightweight local semantic search MCP for Claude Code | news.ycombinator.com | 2025-08-15
@Berkeley SkyLab, we’re the first to bring semantic search to Claude Code with a fully local index in a novel, lightweight structure. Check it out at LEANN (https://github.com/yichuan-w/LEANN).
-
swirl-search
Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously and return AI-ranked results. It also uses LLMs to summarize answers from your searches. It's a one-click, easy-to-use Retrieval Augmented Generation (RAG) solution.
Project mention: How These Free Open Source Projects Can Jumpstart Your Career (No Experience? No Problem!) | dev.to | 2024-12-13
Give SWIRL a try: https://github.com/swirlai/swirl-search
-
MemOS
Build memory-native AI agents with Memory OS — an open-source framework for long-term memory, retrieval, and adaptive learning in large language models. Agent Memory | Memory System | Memory Management | Memory MCP | MCP System | LLM Memory | Agents Memory System | (by MemTensor)
Project mention: MemOS: Treating "memory" as a first-class resource for LLMs | news.ycombinator.com | 2025-08-18
-
colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Project mention: Integrating Vision-Language Models into Agentic RAG Systems with ColPali | dev.to | 2025-03-31
If you want to learn more about ColPali, you can refer to the official documentation. I would also recommend reading the 9-part blog series on RAG on DailyDoseofDS by Avi Chawla and Akshay Pachaar.
-
raptor
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
3.2. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval (Stanford Univ, 2024)
-
raglite
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
Project mention: Show HN: RAGLite – A Python package for the unhobbling of RAG | news.ycombinator.com | 2024-12-19
-
AnglE
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard (by SeanLee97)
Python retrieval-augmented-generation related posts
-
🍥 Hands-on Experience with LightRAG
-
Wikipedia as a Graph
-
6 Weeks of Claude Code
-
How I Learned Generative AI in Two Weeks (and You Can Too): Part 3 - Prompts & Models
-
Show HN: Toller – A Python library for robust async calls
-
Everything About Graph RAG
-
Integrating Vision-Language Models into Agentic RAG Systems with ColPali
-
A note from our sponsor - InfluxDB
www.influxdata.com | 15 Nov 2025
Index
What are some of the best open-source retrieval-augmented-generation projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | ragflow | 67,441 |
| 2 | storm | 27,602 |
| 3 | LightRAG | 22,597 |
| 4 | llmware | 14,448 |
| 5 | txtai | 11,800 |
| 6 | FlagEmbedding | 10,831 |
| 7 | memvid | 10,372 |
| 8 | RAG-Anything | 10,061 |
| 9 | Agent-S | 8,126 |
| 10 | R2R | 7,430 |
| 11 | TaskingAI | 5,346 |
| 12 | AutoRAG | 4,399 |
| 13 | cognita | 4,277 |
| 14 | langroid | 3,759 |
| 15 | LEANN | 4,367 |
| 16 | swirl-search | 2,922 |
| 17 | MemOS | 2,986 |
| 18 | fastembed | 2,488 |
| 19 | colpali | 2,307 |
| 20 | raptor | 1,374 |
| 21 | raglite | 1,102 |
| 22 | rag-demystified | 854 |
| 23 | AnglE | 559 |