Top 23 Python Machine Learning Projects
-
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
python3.12 -m venv new_venv_312
source new_venv_312/bin/activate
pip install --upgrade pip
pip install https://github.com/huggingface/transformers/archive/main.zip torchaudio peft soundfile torchcodec
# and also
pip install librosa
-
Project mention: The bug that taught me more about PyTorch than years of using it | news.ycombinator.com | 2025-10-26
He's not a core maintainer and hasn't been for years - pytorch's contributors are completely public
https://github.com/pytorch/pytorch/graphs/contributors
-
nn
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
-
Start Simple, Build Confidence. Project: Scikit-learn. After the intense first experience with BEHAVIOR-1K, I needed something more approachable. I went straight to Scikit-learn's "good first issue" label and found a task that seemed manageable: changing relative imports to absolute imports in Cython files. From this
-
Keras 3 multi-backend
-
Project mention: Labellerr YOLOv8: Cars and Number Plate Detection — Practical, Step-by-Step | dev.to | 2025-11-05
YOLOv8 (by Ultralytics) is one of the most widely used state-of-the-art object detection models. It is known for delivering high accuracy while still being fast enough for real-time detection.
-
Project mention: Show HN: Real-time privacy protection for smart glasses | news.ycombinator.com | 2025-08-11
Did you look at EgoBlur? It's a lot more effective at face detection than https://github.com/ageitgey/face_recognition. Granted, you'd have to do your own face matching to handle exceptions.
-
-
Project mention: OpenBB – Investment Research for Everyone, Everywhere | news.ycombinator.com | 2025-03-22
-
Project mention: Show HN: Using YOLO to Detect Office Chairs in 40M Hotel Photos | news.ycombinator.com | 2025-01-25
They did it on their own computer. https://github.com/ultralytics/ultralytics
-
Apache Airflow - Apache's Airflow project is a popular workflow system that supports DAG-based tasks and precise scheduling. It's an extensible Python project that supports several different providers and job executors, including Kubernetes.
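To make "DAG-based tasks" concrete, here is a minimal sketch of how a dependency graph determines execution order. It uses only Python's standard-library `graphlib` and is not Airflow's API; the task names and the `run_order` helper are hypothetical, chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
    "report": {"transform", "train"},
}


def run_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A scheduler built on this idea can start a task as soon as everything it depends on has finished; a cycle in the graph would raise `graphlib.CycleError`, which is why workflows of this kind must stay acyclic.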
-
Project mention: How to Build a RAG Solution with Llama Index, ChromaDB, and Ollama | dev.to | 2025-11-04
With a few lines of Python, you can build a basic retrieval-augmented generation (RAG) solution, but it doesn’t stop here. You can extend this project to search for multiple web pages, load large documents, add a simple web UI using either Streamlit or Anvil, or even experiment with different models in Ollama.
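The retrieval step of RAG can be illustrated with a toy sketch. This is not the LlamaIndex, ChromaDB, or Ollama API: it swaps in plain word-count vectors and cosine similarity for real embeddings, and the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question (the 'augmented' part)."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Illustrative sample documents, not taken from the article.
docs = [
    "Ollama runs large language models locally.",
    "ChromaDB stores embeddings for similarity search.",
    "Streamlit turns Python scripts into web apps.",
]
top = retrieve("Which tool stores embeddings?", docs)
prompt = build_prompt("Which tool stores embeddings?", docs)
```

In a real pipeline the word counts would be replaced by model-generated embeddings kept in a vector store, and the prompt would be sent to a language model for the generation step.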
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
-
Project mention: The Ultimate Guide to Building Stunning AI Apps For Beginners - Gradio | dev.to | 2025-11-14
Why Gradio is the New Superpower for Every AI Learner in 2025
-
Ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Not currently, but it is being worked on https://github.com/ray-project/ray/issues/53976.
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
-
Project mention: “One Journey Ends, Another Begins — My Hacktoberfest 2025 Story” | dev.to | 2025-10-31
Just wrapped up my Hacktoberfest project using MindsDB and Streamlit — built a CRM Semantic Search AI app! 😄 If anyone’s into open source + AI, would love feedback on my PR: Hacktoberfest 2025 PR – Add CRM Semantic Search use case (MindsDB)
-
-
Project mention: Show HN: Plug-and-play Python utils for any computer-vision pipeline | news.ycombinator.com | 2025-07-21
-
paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
Borg Backup - I use it to automatically back up my main hosted Docker services. I have publicly hosted instances of Immich and Paperless-NGX running in Docker containers. I periodically make a backup of their data folders using Borg and store it in a Borg repo. The advantage of storing the backups in a Borg repo is that Borg is a deduplicating archival program. So no matter how many backups you make, they will take hardly any more space than the first backup, provided nothing has changed. If there is a change, only the changed chunk is backed up, just like git. Also, you can easily encrypt and/or compress while backing up. Restoring a backup is as easy as running a single Borg command.
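The deduplication idea described above can be sketched in a few lines. This is a simplified illustration, not Borg's implementation: it uses fixed-size chunks and an in-memory dictionary, whereas Borg uses content-defined chunking along with optional encryption and compression. The `ChunkStore` class and its methods are hypothetical names.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def backup(self, data: bytes):
        """Store data and return the list of chunk digests describing it."""
        manifest = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only chunks that have never been seen take up new space.
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def restore(self, manifest):
        """Rebuild the original bytes from a manifest of digests."""
        return b"".join(self.chunks[d] for d in manifest)


store = ChunkStore(chunk_size=4)
first = store.backup(b"aaaabbbbcccc")
size_after_first = len(store.chunks)
second = store.backup(b"aaaabbbbcccc")   # nothing changed
size_after_second = len(store.chunks)
third = store.backup(b"aaaaXXXXcccc")    # one chunk changed
size_after_third = len(store.chunks)
```

Here the second backup adds no new chunks and the third adds exactly one, while every backup can still be restored in full from its manifest.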
-
qlib
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.
After researching different AI models in Qlib (a quantitative finance platform), here's what I learned:
-
Project mention: Strengthening Open-Source Integrity: My First Contribution to spaCy | dev.to | 2025-10-28
🔗 Pull Request: #13877 — Remove spaCy Quickstart from Universe/Courses due to spam redirect
-
pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Python Machine Learning related posts
-
The Ultimate Guide to Building Stunning AI Apps For Beginners - Gradio
-
Deep universal probabilistic programming with Python and PyTorch
-
What is Argo Workflows?
-
TabPFN-2.5 – SOTA foundation model for tabular data
-
Python library for quantum computing, quantum ML, and quantum chemistry
-
Why stop at 1M tokens when you can have 10M?
-
We're open-sourcing the successor of Jupyter notebook
Index
What are some of the best open-source Machine Learning projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | transformers | 152,508 |
| 2 | Pytorch | 94,956 |
| 3 | nn | 64,273 |
| 4 | scikit-learn | 64,038 |
| 5 | Keras | 63,551 |
| 6 | yolov5 | 56,018 |
| 7 | Face Recognition | 55,756 |
| 8 | faceswap | 54,691 |
| 9 | OpenBB | 54,534 |
| 10 | ultralytics | 48,563 |
| 11 | Airflow | 43,200 |
| 12 | streamlit | 42,140 |
| 13 | DeepSpeed | 40,641 |
| 14 | gradio | 40,497 |
| 15 | Ray | 39,825 |
| 16 | Open-Assistant | 37,492 |
| 17 | MindsDB | 37,211 |
| 18 | gym | 36,649 |
| 19 | supervision | 35,881 |
| 20 | paperless-ngx | 34,208 |
| 21 | qlib | 33,724 |
| 22 | spaCy | 32,785 |
| 23 | pytorch-lightning | 30,432 |