Top 23 Python GPU Projects
-
Pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Project mention: The bug that taught me more about PyTorch than years of using it | news.ycombinator.com | 2025-10-26
He's not a core maintainer and hasn't been for years - pytorch's contributors are completely public
https://github.com/pytorch/pytorch/graphs/contributors
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
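In practice that optimization is driven by a JSON config passed to the engine; a minimal sketch of a ZeRO stage-2 setup follows (the specific values are illustrative choices, not recommendations):

```python
# Minimal DeepSpeed-style config as a Python dict (illustrative values).
# ZeRO stage 2 partitions optimizer state and gradients across ranks,
# cutting per-GPU memory without changing the training loop.
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,            # partition optimizer state + gradients
        "overlap_comm": True,  # overlap gradient reduction with backward pass
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},
}

if __name__ == "__main__":
    import json
    print(json.dumps(ds_config, indent=2))
```

This dict is what `deepspeed.initialize(model=..., config=ds_config)` consumes; it can also be saved as a JSON file and passed via `--deepspeed_config`.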
-
scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Project mention: ATC/OSDI '25 Joint Keynote: Accelerating Software Dev: The LLM (R)Evolution [video] | news.ycombinator.com | 2025-09-08
https://github.com/plasma-umass/scalene
Coz: A causal profiler that tells you where to optimize your code (C/C++/Rust/Swift/Java)
-
cupy
CuPy: NumPy & SciPy for GPU
The plethora of packages, including DSLs for compute and MLIR.
https://developer.nvidia.com/how-to-cuda-python
https://cupy.dev/
-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
Project mention: Gluon: a GPU programming language based on the same compiler stack as Triton | news.ycombinator.com | 2025-09-17
Also it REALLY jams me up that this is a thing, complicating discussions: https://github.com/triton-inference-server/server
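The server implements the KServe v2 inference protocol over HTTP and gRPC; the sketch below assembles a v2-style request body (the helper name, the model URL, and the input name `INPUT0` are placeholders of mine):

```python
def build_infer_request(input_name, data, datatype="FP32"):
    """Build a KServe-v2-style inference request body for a 1-D input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(data)],   # v2 protocol carries explicit shapes
                "datatype": datatype,   # e.g. FP32, INT64, BYTES
                "data": data,           # row-major flattened values
            }
        ]
    }

payload = build_infer_request("INPUT0", [1.0, 2.0, 3.0])
# A real call would POST this to a running server, e.g.:
#   requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
```

The response mirrors the same structure with an `outputs` list, which is what makes the protocol portable across backends.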
-
skypilot
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).
Project mention: Cloud Run GPUs, now GA, makes running AI workloads easier for everyone | news.ycombinator.com | 2025-06-04
To massively increase the reliability of getting GPUs, you can use something like SkyPilot (https://github.com/skypilot-org/skypilot) to fall back across regions, clouds, or GPU choices. E.g.,
$ sky launch --gpus H100
will fall back across GCP regions, AWS, your clusters, etc. There are also options to accept any one of several GPU types, e.g. H100, H200, or A100.
Essentially the way you deal with it is to increase the infra search space.
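The same fallback idea can be expressed as a task YAML instead of CLI flags; a minimal sketch, assuming SkyPilot's set syntax for candidate accelerators (the file name and field values are illustrative):

```yaml
# task.yaml -- launch with: sky launch task.yaml
resources:
  accelerators: {H100:1, H200:1, A100:1}  # any one of these is acceptable
run: |
  python train.py
```

Listing several candidates widens the search space the scheduler can satisfy, which is exactly the reliability trick described above.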
-
ImageAI
A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities
-
BigDL
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
Project mention: FlashMoE: DeepSeek-R1 671B and Qwen3MoE 235B with 1~2 Intel B580 GPU in IPEX-LLM | news.ycombinator.com | 2025-05-12
-
nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Project mention: Show HN: Sping – A HTTP/TCP Latency Tool That's Easy on the Eye | news.ycombinator.com | 2025-08-24
I've frequently found myself using [nvitop](https://github.com/XuehaiPan/nvitop) to diagnose GPU/CPU contention issues.
The two best things about it are:
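nvitop reads its numbers from NVML; a rough, dependency-free approximation of the same per-device data is to parse `nvidia-smi`'s CSV query output. The helper names below are mine, and the canned sample line stands in for what a live two-GPU box might print:

```python
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"

def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    stats = []
    for line in csv_text.strip().splitlines():
        idx, util, used, total = (field.strip() for field in line.split(","))
        stats.append({
            "index": int(idx),
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return stats

def live_gpu_stats():
    # Requires an NVIDIA driver; raises if nvidia-smi is absent.
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_stats(out)

# Parsing a canned sample instead of a live query:
sample = "0, 87, 30512, 40960\n1, 3, 1024, 40960"
print(parse_gpu_stats(sample)[0]["util_pct"])  # → 87
```

Tools like nvitop go further by joining this with per-process information, which is what makes contention between jobs visible.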
-
executorch
Project mention: Google AI Edge – on-device cross-platform AI deployment | news.ycombinator.com | 2025-06-01
Genuine question, why should I use this to deploy models on the edge instead of executorch? https://github.com/pytorch/executorch
For context, I get to choose the tech stack for a greenfield project. I think that executorch, which belongs to the PyTorch ecosystem, will have a far more predictable future than anything Google does, so I currently lean toward executorch.
-
jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
-
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
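The FP8 trade-off is visible from the formats alone; here is a back-of-the-envelope check of the largest finite values of the two OCP FP8 variants (pure arithmetic, no GPU required):

```python
# OCP FP8 formats: E4M3 (4 exponent / 3 mantissa bits, bias 7) and
# E5M2 (5 exponent / 2 mantissa bits, bias 15).

# E4M3 reserves only mantissa 0b111 at the top exponent for NaN, so the
# largest finite value uses exponent field 15 with mantissa 0b110.
e4m3_max = (1 + 6 / 8) * 2 ** (15 - 7)    # 1.75 * 256

# E5M2 follows IEEE conventions: the top exponent field (31) encodes
# inf/NaN, so the largest finite value uses exponent 30, mantissa 0b11.
e5m2_max = (1 + 3 / 4) * 2 ** (30 - 15)   # 1.75 * 32768

print(e4m3_max, e5m2_max)  # 448.0 57344.0
```

E4M3's narrow range (about ±448) is why FP8 training depends on per-tensor scaling factors, which libraries in this space track automatically.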
-
jetson_stats
📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
-
torchrec
Pytorch domain library for recommendation systems
Project mention: Advancements in Embedding-Based Retrieval at Pinterest Homefeed | news.ycombinator.com | 2025-02-14
Nice, there are a ton of threads here to check out. For example I had not heard of
https://pytorch.org/torchrec/
Which seems to nicely package a lot of primitives I have worked with previously.
-
pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
-
Python GPU discussion
Python GPU-related posts
-
The bug that taught me more about PyTorch than years of using it
-
PyTorch 2.9 released with C ABI and better multi-GPU support
-
The 64 KB Challenge: Teaching a Tiny Net to Play Pong
-
Show HN: I Built Claude Code for CUDA in 18 Hours (Open Source)
-
Docker Was Too Slow, So We Replaced It: Nix in Production [video]
-
Wasted Open Source efforts 😮
-
Speeding up PyTorch inference by 87% on Apple with AI-generated Metal kernels
-
Index
What are some of the best open-source GPU projects in Python? This list, ranked by GitHub stars, will help you find them:
| # | Project | Stars |
|---|---|---|
| 1 | Pytorch | 94,956 |
| 2 | DeepSpeed | 40,641 |
| 3 | scalene | 13,086 |
| 4 | tvm | 12,809 |
| 5 | cupy | 10,608 |
| 6 | server | 10,005 |
| 7 | skypilot | 8,955 |
| 8 | ImageAI | 8,841 |
| 9 | BigDL | 8,445 |
| 10 | AlphaPose | 8,444 |
| 11 | nvitop | 6,271 |
| 12 | chainer | 5,908 |
| 13 | tf-quant-finance | 5,048 |
| 14 | pytorch-forecasting | 4,653 |
| 15 | gpustat | 4,286 |
| 16 | asitop | 4,245 |
| 17 | executorch | 3,490 |
| 18 | jittor | 3,212 |
| 19 | TransformerEngine | 2,912 |
| 20 | leptonai | 2,797 |
| 21 | jetson_stats | 2,416 |
| 22 | torchrec | 2,390 |
| 23 | pygraphistry | 2,361 |