Top 23 Python GPU Projects
-
Pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Project mention: The bug that taught me more about PyTorch than years of using it | news.ycombinator.com | 2025-10-26
He's not a core maintainer and hasn't been for years - pytorch's contributors are completely public
https://github.com/pytorch/pytorch/graphs/contributors
-
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
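The core memory trick behind DeepSpeed's ZeRO optimizer is partitioning optimizer state across data-parallel ranks instead of replicating it on every GPU. A toy, framework-free sketch of that partitioning idea (plain stdlib Python, not DeepSpeed's actual API):

```python
# Toy sketch of ZeRO-style optimizer-state partitioning: each data-parallel
# rank keeps optimizer state (e.g. Adam moments) for only a ~1/world_size
# slice of the parameters, instead of a full replica.

def partition(num_params: int, world_size: int) -> list[range]:
    """Split parameter indices into contiguous, near-equal shards."""
    base, extra = divmod(num_params, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

num_params, world_size = 10, 4
shards = partition(num_params, world_size)

# Every parameter is owned by exactly one rank, so optimizer state is
# stored once across the cluster instead of world_size times.
assert [i for shard in shards for i in shard] == list(range(num_params))

print([len(s) for s in shards])  # → [3, 3, 2, 2]
```

Per-rank optimizer memory shrinks roughly by a factor of `world_size`; the real system adds gradient and parameter partitioning (ZeRO stages 2 and 3) plus the collectives to gather shards when needed.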
-
scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Project mention: ATC/OSDI '25 Joint Keynote: Accelerating Software Dev: The LLM (R)Evolution [video] | news.ycombinator.com | 2025-09-08
https://github.com/plasma-umass/scalene
Coz: A causal profiler that tells you where to optimize your code (C/C++/Rust/Swift/Java)
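Scalene needs no code changes: you run an unmodified script under it and it attributes time and memory line by line. A tiny, hypothetical workload to profile (the script and its inefficiency are made up for illustration; the invocation in the comment is Scalene's standard command line):

```python
# busy.py - a deliberately inefficient workload to profile.
# Run under Scalene with:   python -m scalene busy.py
# (or simply: scalene busy.py)

def slow_sum(n: int) -> int:
    # Building a throwaway list on every iteration is wasteful on purpose;
    # a line-level profiler should flag the list comprehension below.
    total = 0
    for _ in range(100):
        total += sum([i * i for i in range(n)])
    return total

if __name__ == "__main__":
    print(slow_sum(10_000))
```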
-
-
cupy
NumPy & SciPy for GPU
The plethora of packages, including DSLs for compute and MLIR.
https://developer.nvidia.com/how-to-cuda-python
https://cupy.dev/
-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
Project mention: Gluon: a GPU programming language based on the same compiler stack as Triton | news.ycombinator.com | 2025-09-17
Also it REALLY jams me up that this is a thing, complicating discussions: https://github.com/triton-inference-server/server
-
skypilot
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).
Project mention: Cloud Run GPUs, now GA, makes running AI workloads easier for everyone | news.ycombinator.com | 2025-06-04
To massively increase the reliability of getting GPUs, you can use something like SkyPilot (https://github.com/skypilot-org/skypilot) to fall back across regions, clouds, or GPU choices. E.g.,
$ sky launch --gpus H100
will fall back across GCP regions, AWS, your clusters, etc. There are options to say try either H100 or H200 or A100, and so on.
Essentially the way you deal with it is to increase the infra search space.
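"Increase the infra search space" is just ordered failover: try candidate (cloud, region, accelerator) combinations until one provisions. A stdlib-only sketch of that loop (the candidate list and the `try_provision` callback are illustrative, not SkyPilot's API):

```python
# Sketch of failover provisioning across an ordered search space of
# (cloud, region, gpu) candidates; the first one that provisions wins.

def launch_with_fallback(candidates, try_provision):
    """Return the first candidate that provisions, else None."""
    for cand in candidates:
        if try_provision(cand):
            return cand
    return None

candidates = [
    ("gcp", "us-central1", "H100"),
    ("gcp", "europe-west4", "H100"),
    ("aws", "us-east-1", "H100"),
    ("aws", "us-east-1", "A100"),   # widen the GPU choice last
]

# Pretend only the AWS A100 pool has capacity right now.
available = {("aws", "us-east-1", "A100")}
chosen = launch_with_fallback(candidates, lambda c: c in available)
print(chosen)  # → ('aws', 'us-east-1', 'A100')
```

The wider the candidate list (more regions, more clouds, more acceptable GPU types), the higher the chance some leg of the search succeeds, which is exactly the reliability argument made above.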
-
ImageAI
A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities
-
BigDL
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
Project mention: FlashMoE: DeepSeek-R1 671B and Qwen3MoE 235B with 1~2 Intel B580 GPU in IPEX-LLM | news.ycombinator.com | 2025-05-12
-
nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Project mention: Show HN: Sping – A HTTP/TCP Latency Tool That's Easy on the Eye | news.ycombinator.com | 2025-08-24
I've frequently found myself using [nvitop](https://github.com/XuehaiPan/nvitop) to diagnose GPU/CPU contention issues.
The two best things about it are:
-
executorch
Project mention: Google AI Edge – on-device cross-platform AI deployment | news.ycombinator.com | 2025-06-01
Genuine question, why should I use this to deploy models on the edge instead of executorch? https://github.com/pytorch/executorch
For context, I get to choose the tech stack for a greenfield project. I think that executorch, which belongs to the PyTorch ecosystem, will have a far more predictable future than anything Google does, so I'm currently leaning toward executorch.
-
jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
-
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
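FP8 works only because of careful scaling: e4m3 tops out at 448, far below typical activation and gradient magnitudes, so FP8 training keeps a per-tensor scale that maps values into the representable range before the cast and divides it back out afterwards. A stdlib sketch of that range-scaling step (a simplification of the idea, not Transformer Engine's API; real e4m3 casting also rounds the mantissa, which this stand-in skips):

```python
# Sketch of per-tensor scaling for FP8 (e4m3): pick a scale so the
# tensor's expected max (amax) maps near the format's largest finite
# value, clamp on cast, then dequantize by dividing the scale back out.

E4M3_MAX = 448.0  # largest finite e4m3 value

def quantize_dequantize(values, amax=None):
    amax = amax if amax is not None else max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    out = []
    for v in values:
        # Cast stand-in: clamp to the FP8 range (mantissa rounding omitted).
        scaled = max(-E4M3_MAX, min(E4M3_MAX, v * scale))
        out.append(scaled / scale)  # dequantize
    return out

vals = [0.001, -0.5, 2.0, 1000.0]
# With amax chosen well, in-range values survive the round trip;
# anything beyond amax is clamped (here 1000.0 collapses to 2.0).
print(quantize_dequantize(vals, amax=2.0))
```

This is why production FP8 recipes track a running amax history per tensor: a stale or too-small scale silently clamps outliers, while a too-large one wastes the format's already tiny precision.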
-
jetson_stats
📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
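Monitors like jetson_stats, nvitop, and gpustat mostly boil down to polling a vendor utility and parsing its output. A stdlib sketch of that pattern, parsing the CSV shape that `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` emits (the sample string is canned, since no GPU is assumed to be present):

```python
import csv
import io

# Sketch of the polling pattern behind GPU monitors: run a query tool,
# parse its CSV, keep the fields you care about. SAMPLE is a canned
# stand-in for the output of:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
SAMPLE = "0, 87, 30210, 40960\n1, 12, 1024, 40960\n"

def parse_gpu_stats(text):
    stats = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        index, util, mem_used, mem_total = (int(x) for x in row)
        stats.append({"index": index, "util_pct": util,
                      "mem_used_mib": mem_used, "mem_total_mib": mem_total})
    return stats

for gpu in parse_gpu_stats(SAMPLE):
    print(f"GPU{gpu['index']}: {gpu['util_pct']}% util, "
          f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB")
```

A real monitor would call the tool via `subprocess` on a timer (or use NVML/jtop bindings directly on Jetson) and render the parsed rows in a TUI.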
-
torchrec
Pytorch domain library for Recommendation Systems
Project mention: Advancements in Embedding-Based Retrieval at Pinterest Homefeed | news.ycombinator.com | 2025-02-14
Nice, there are a ton of threads here to check out. For example I had not heard of
https://pytorch.org/torchrec/
Which seems to nicely package a lot of primitives I have worked with previously.
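The central primitive in recommendation stacks like TorchRec is the pooled embedding lookup: a sparse feature is a bag of IDs, and its dense representation is a reduction (typically a sum) over the looked-up rows, which TorchRec then shards across GPUs. A stdlib-only toy of sum pooling (the table values are made up):

```python
# Toy sum-pooled embedding lookup (EmbeddingBag-style): each sparse
# feature is a bag of IDs; its representation is the element-wise sum
# of the looked-up embedding rows.

EMBEDDING_DIM = 3
TABLE = {  # id -> embedding row (toy values, exact binary fractions)
    0: [0.25, 0.0, 0.5],
    1: [0.0, 1.0, 0.0],
    2: [0.5, 0.75, 0.25],
}

def pooled_lookup(ids):
    pooled = [0.0] * EMBEDDING_DIM
    for i in ids:
        row = TABLE[i]
        for d in range(EMBEDDING_DIM):
            pooled[d] += row[d]
    return pooled

print(pooled_lookup([0, 2]))  # → [0.75, 0.75, 0.75]
```

At production scale the table has billions of rows and cannot fit on one device, which is the problem TorchRec's sharded `EmbeddingBagCollection` and its collective communication primitives address.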
-
pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
-
Python GPU discussion
Python GPU related posts
-
The bug that taught me more about PyTorch than years of using it
-
PyTorch 2.9 released with C ABI and better multi-GPU support
-
The 64 KB Challenge: Teaching a Tiny Net to Play Pong
-
Show HN: I Built Claude Code for CUDA in 18 Hours (Open Source)
-
Docker Was Too Slow, So We Replaced It: Nix in Production [video]
-
Wasted Open Source efforts 😮
-
Speeding up PyTorch inference by 87% on Apple with AI-generated Metal kernels
-
Index
What are some of the best open-source GPU projects in Python? This list will help you find out:
| # | Project | Stars |
|---|---|---|
| 1 | Pytorch | 94,956 |
| 2 | DeepSpeed | 40,641 |
| 3 | scalene | 13,086 |
| 4 | tvm | 12,809 |
| 5 | cupy | 10,608 |
| 6 | server | 10,005 |
| 7 | skypilot | 8,955 |
| 8 | ImageAI | 8,841 |
| 9 | BigDL | 8,445 |
| 10 | AlphaPose | 8,444 |
| 11 | nvitop | 6,271 |
| 12 | chainer | 5,908 |
| 13 | tf-quant-finance | 5,048 |
| 14 | pytorch-forecasting | 4,653 |
| 15 | gpustat | 4,286 |
| 16 | asitop | 4,245 |
| 17 | executorch | 3,490 |
| 18 | jittor | 3,212 |
| 19 | TransformerEngine | 2,912 |
| 20 | leptonai | 2,797 |
| 21 | jetson_stats | 2,416 |
| 22 | torchrec | 2,390 |
| 23 | pygraphistry | 2,361 |