Cross-platform installer for Triton and SageAttention on ComfyUI. Simplifies GPU-accelerated inference setup for Windows users with automated dependency management and RTX 5090 support.
Updated Jun 17, 2026 - Python
RTX 5090 & RTX 5060 Docker container with PyTorch + TensorFlow. First fully-tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series.
From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning and concurrent sub-agents on top of the fastest single-stream decode on the 5090 (beats llama.cpp, at-or-ahead of vLLM on NVFP4). 100% written by Claude Code.
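One-click ray tracing deployment panel for Neverness To Everness (异环 / Ananta), based on the OptiScaler winmm approach. Recommends the RTX 5090 by default, and supports local-machine/RTX 4090/RTX 5080M profiles, backup, restore, and a local WebUI.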
Pixal3D ComfyUI integration for Windows (RTX 30/40/50) — single image to textured PBR mesh in 3-5 min
NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Qwen3.6-35B MoE.
Research: vGPU unlock on consumer NVIDIA RTX 5090 (Blackwell/GB202). 19 binary patches, full CPU-side pipeline working, GSP firmware blocked by fused-off VF PRIV registers.
Windows prebuilt of llama.cpp combining Multi-Token Prediction (MTP) + TurboQuant KV cache compression + native sm_120 (Blackwell consumer GPU, FP4 tensor cores). For RTX 5060 Ti / 5070 / 5080 / 5090.
Fastest MoE/LLM inference runtime for consumer and edge Blackwell GPUs. SN74 on Gittensor.
Local AI coding assistant using Qwen3.6-27B, Ollama, and a FastAPI proxy. Built for NVIDIA DGX Spark (GB10) with RTX 5090/4090/3090 GPU support. Powers VS Code Copilot or GitHub Copilot CLI with zero API costs.
A high-performance local AI pipeline for restoring VHS audio, transcribing with Whisper, and translating subtitles using NLLB-200.
CastelOS public artifacts — principles, architecture insights, and build-in-public content
Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.
Optimized CSM-1B TTS pipeline for RTX 5090 (Blackwell sm_120). CUDA graph replay via patched HF Transformers. ~0.46x RTF. Topics (tags): csm text-to-speech rtx-5090 blackwell cuda-graphs torch-compile sesame streaming pytorch
Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.
⚡ Compare AI models by Accuracy × Cost × Carbon — RTX 5090 benchmarks reveal 4-bit quantization wastes energy on small models
Local-first LLM stack on a single RTX 5090: QLoRA fine-tuning, exact speculative decoding, paged KV-cache, and continuous batching — served via FastAPI with a live React dashboard.
Technical insights from r/LocalLLaMA — vLLM, FP8, NVFP4, Blackwell GPU benchmarks, and more. Unverified community knowledge, generated by Nemotron 9B. Issues welcome.