Credit goes to www.libhunt.com

Python Data Science

Open-source Python projects categorized as Data Science

Top 23 Python Data Science Projects

Data Science
  1. scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Open Source Journey | dev.to | 2025-11-01

    Start Simple, Build Confidence. Project: Scikit-learn. After the intense first experience with BEHAVIOR-1K, I needed something more approachable. I went straight to Scikit-learn's good first issue label and found a task that seemed manageable: changing relative imports to absolute imports in Cython files. From this

  3. Keras

    Deep Learning for humans

    Project mention: PyTorch vs TensorFlow 2025: Which one wins after 72 hours? | dev.to | 2025-08-29

    Keras 3 multi-backend

  4. Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    Project mention: Node.js vs Python: Real Benchmarks, Performance Insights, and Scalability Analysis | dev.to | 2025-10-04

    data analytics stacks (Pandas)

  5. Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: What is Argo Workflows? | dev.to | 2025-11-10

    Apache Airflow - Apache's Airflow project is a popular workflow system that supports DAG-based tasks and precise scheduling. It's an extensible Python project that supports several different providers and job executors, including Kubernetes.

  6. streamlit

    Streamlit — A faster way to build and share data apps.

    Project mention: How to Build a RAG Solution with Llama Index, ChromaDB, and Ollama | dev.to | 2025-11-04

    With a few lines of Python, you can build a basic retrieval-augmented generation (RAG) solution, but it doesn’t stop here. You can extend this project to search for multiple web pages, load large documents, add a simple web UI using either Streamlit or Anvil, or even experiment with different models in Ollama.

  7. gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Project mention: The Ultimate Guide to Building Stunning AI Apps For Beginners - Gradio | dev.to | 2025-11-14

    Why Gradio is the New Superpower for Every AI Learner in 2025

  8. Ray

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    Project mention: PyTorch Monarch | news.ycombinator.com | 2025-10-23

    Not currently, but it is being worked on https://github.com/ray-project/ray/issues/53976.

  10. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Strengthening Open-Source Integrity: My First Contribution to spaCy | dev.to | 2025-10-28

    🔗 Pull Request: #13877 — Remove spaCy Quickstart from Universe/Courses due to spam redirect

  11. pytorch-lightning

    Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

  12. ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

    Project mention: Open Source? Open Mind! | dev.to | 2025-09-05

    So, which open source project did you choose? https://github.com/eriklindernoren/ML-From-Scratch Okay, I know I said a lot about moving out of my comfort zone, but I am still a bit scared and worried. So I've decided to start from an area that I am familiar with. First of all, this open source project is called "ML-From-Scratch". It's a learning resource that demystifies machine learning by showing the fundamental code for a wide range of models. For the past few months, I have been studying machine learning and data science. This project reveals what is actually going on behind the libraries and algorithms so people can understand the core functionality of machine learning (ML). This choice feels right for my "coder to developer" journey. As I contribute to the project, I will be exposed to deeper knowledge of machine learning.

  13. data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  14. d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  15. dash

    Data Apps & Dashboards for Python. No JavaScript Required.

    Project mention: Other Visualization Tools: Dashboards & Reports | dev.to | 2025-09-14

    Cloud Deployment: Dash apps can be deployed to Dash Enterprise or Heroku.

  16. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31
  17. pandas-ai

    Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

    Project mention: Pandas AI | news.ycombinator.com | 2025-07-18
  18. matplotlib

    matplotlib: plotting with Python

    Project mention: How to Get Started with Scikit-Learn: A Beginner-Friendly Guide to Machine Learning in Python | dev.to | 2025-04-24

    As is the case with most Python libraries, it is open-source and free-to-use, making it easily accessible by anyone willing to learn machine learning, and it is built upon other open-source libraries within Python, like SciPy for advanced scientific operations, NumPy for efficient numerical computations, Matplotlib for data visualization, and Cython for increased efficiency and speed, similar to that of C/C++.

  19. recommenders

    Best Practices on Recommendation Systems

  20. Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

    Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02

    - https://github.com/PrefectHQ/prefect

  21. marimo

    A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.

    Project mention: We're open-sourcing the successor of Jupyter notebook | news.ycombinator.com | 2025-11-04

    The successor to Jupyter notebook is Marimo, https://marimo.io/, because its notebooks are pure code, not code in JSON. First class everywhere.

  22. ipython

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

    Project mention: Reloading Classes in Python | news.ycombinator.com | 2025-08-29

    Pickling + unpickling the object is a neat trick to update objects to point to the new methods, but it's even more straightforward to just patch `obj.__class__ = reloaded_module.NewClass`. This is what ipython's autoreload extension used to do, though nowadays it's had some improvements over this approach: https://github.com/ipython/ipython/pull/14500

  23. gensim

    Topic Modelling for Humans

  24. dvc

    🦉 Data Versioning and ML Experiments

    Project mention: Ask HN: What is the simplest data orchestration tool you've worked with? | news.ycombinator.com | 2025-03-21
  25. dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: Fixing Type Hints for Callable Objects with Custom Signatures in Dagster | dev.to | 2025-10-28

    Finding the Issue: While browsing through their GitHub issues, I found Issue #32574: "Callable object custom signatures are resolved incorrectly." At first glance, I thought "Oh cool, this looks easy." But then I read the details and realized this was actually pretty interesting.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
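
The ipython entry above quotes a comment about updating live objects after a reload by patching `obj.__class__`. A minimal sketch of that idea follows. The two classes are hypothetical stand-ins for "the class before reload" and "the class after reload"; this is not IPython's autoreload implementation, only an illustration of the assignment the comment describes.

```python
class Greeter:
    """Hypothetical original class, as first loaded."""

    def greet(self):
        return "hello"


class ReloadedGreeter:
    """Hypothetical stand-in for the same class after its module was reloaded."""

    def greet(self):
        return "hello again"


# Create an instance of the old class and give it some state.
obj = Greeter()
obj.name = "demo"
before = obj.greet()

# Rebind the existing instance to the new class. Instance attributes
# stay in obj.__dict__; only method lookup changes.
obj.__class__ = ReloadedGreeter
after = obj.greet()

print(before, "->", after, "| state kept:", obj.name)
```

The assignment only works when the two classes have compatible object layouts, which is normally the case for plain Python classes without `__slots__`.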

Python Data Science related posts

  • TabPFN-2.5 – SOTA foundation model for tabular data

    2 projects | news.ycombinator.com | 6 Nov 2025
  • We're open-sourcing the successor of Jupyter notebook

    4 projects | news.ycombinator.com | 4 Nov 2025
  • Fixing Type Hints for Callable Objects with Custom Signatures in Dagster

    1 project | dev.to | 28 Oct 2025
  • Strengthening Open-Source Integrity: My First Contribution to spaCy

    1 project | dev.to | 28 Oct 2025
  • LLMZ25-2 Review: Building LLM Interfaces with Streamlit

    2 projects | dev.to | 25 Oct 2025
  • LLMZ25-1 Review: Streamlit, the Perfect Tool for LLM Project Interfaces

    1 project | dev.to | 25 Oct 2025
  • Installing FFCV and Fastxtend on Windows with Micromamba and MSVC

    1 project | dev.to | 24 Oct 2025

Index

What are some of the best open-source Data Science projects in Python? This list will help you:

#   Project                          Stars
1   scikit-learn                     63,971
2   Keras                            63,551
3   Pandas                           47,068
4   Airflow                          43,200
5   streamlit                        42,140
6   gradio                           40,497
7   Ray                              39,825
8   spaCy                            32,785
9   pytorch-lightning                30,432
10  ML-From-Scratch                  28,738
11  data-science-ipython-notebooks   28,539
12  d2l-en                           26,601
13  dash                             24,233
14  best-of-ml-python                22,786
15  pandas-ai                        22,534
16  matplotlib                       21,982
17  recommenders                     21,100
18  Prefect                          20,783
19  marimo                           17,232
20  ipython                          16,601
21  gensim                           16,267
22  dvc                              15,089
23  dagster                          14,423

