DazzleML / comfyui-triton-and-sageattention-installer

Cross-platform installer for Triton and SageAttention on ComfyUI. Simplifies GPU-accelerated inference setup for Windows users with automated dependency management and RTX 5090 support.

windows automation installer cuda pytorch triton gpu-acceleration windows10 build-tools cli-tool windows11 stable-diffusion comfyui rtx-5090 sageattention

Updated Jun 17, 2026
Python

dconsorte / pytorch-tensorflow-gpu

RTX 5090 & RTX 5060 Docker container with PyTorch + TensorFlow. First fully-tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series.

docker machine-learning deep-learning tensorflow cuda pytorch gpu-computing blackwell rtx-5090 rtx-5060 blackwell-gpu nvidia-blackwell cuda-12-8 rtx-50-series rtx-5080

Updated Jul 8, 2025
Shell

kekzl / imp

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning and concurrent sub-agents on top of the fastest single-stream decode on the 5090 (beats llama.cpp, at-or-ahead of vLLM on NVFP4). 100% written by Claude Code.

Updated Jul 4, 2026
Cuda

llg1634 / nte-ray-tracing-panel

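One-click ray tracing deployment panel for 异环 (Neverness To Everness / Ananta), based on the OptiScaler winmm approach. Recommends the RTX 5090 by default and supports local-machine, RTX 4090, and RTX 5080M configurations, plus backup, restore, and a local WebUI.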

Updated May 18, 2026
Python

dreamrec / ComfyUI-Pixal3D

Pixal3D ComfyUI integration for Windows (RTX 30/40/50) — single image to textured PBR mesh in 3-5 min

Updated May 14, 2026
Python

lna-lab / blackwell-geforce-nvfp4-gemm

NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Qwen3.6-35B MoE.

gpu-computing quantization cutlass gemm geforce blackwell vllm llm-inference flashinfer rtx-5090 sm120 nvfp4

Updated Apr 27, 2026
Python

OnlyTerp / windows-is-fine-for-llms

The old advice to avoid Windows for local LLMs used to be right. It isn't anymore. The fixes for display-GPU desktop crashes and WSL memory limits, from people who run a 5090 daily.

windows cuda nvidia tdr wsl2 llm llamacpp local-llm ollama rtx-5090

Updated Jun 2, 2026
PowerShell

bird / vgpu-unlock-blackwell

Research: vGPU unlock on consumer NVIDIA RTX 5090 (Blackwell/GB202). 19 binary patches, full CPU-side pipeline working, GSP firmware blocked by fused-off VF PRIV registers.

reverse-engineering nvidia kvm vfio sr-iov mdev vgpu blackwell gpu-virtualization rtx-5090 vgpu-unlock gsp-firmware

Updated Apr 1, 2026
C

Andgihat / llama-cpp-mtp-turboquant-sm120-blackwell-windows

Windows prebuilt of llama.cpp combining Multi-Token Prediction (MTP) + TurboQuant KV cache compression + native sm_120 (Blackwell consumer GPU, FP4 tensor cores). For RTX 5060 Ti / 5070 / 5080 / 5090.

windows prebuilt mtp blackwell llama-cpp rtx-5090 cuda-12-8 sm-120 turboquant rtx-50 rtx-5060ti

Updated Jun 5, 2026

gittensor-ai-lab / sparkinfer

Fastest MoE/LLM inference runtime for consumer and edge Blackwell GPUs. SN74 on Gittensor.

gpu cuda moe gemma jetson inference-engine gpu-programming mixture-of-experts edge-ai on-device-ai blackwell llm bittensor qwen agentic-ai rtx-5090 kernel-optimization gittensor

Updated Jul 4, 2026
Cuda

darkmatter2222 / GithubCopilotExit

Local AI coding assistant using Qwen3.6-27B, Ollama, and FastAPI proxy. Built for NVIDIA DGX Spark (GB10) with RTX 5090/4090/3090 GPU support. Powers VS Code Copilot or GitHub Copilot CLI with zero API costs.

vscode vscode-extension coding-assistant github-copilot local-llm local-ai ollama qwen large-context private-ai tool-calling rtx-5090 openai-compatible ai-coding-assistant qwen3 agentic-coding qwen3-coder copilot-alternative

Updated Jul 4, 2026
HTML

ventura8 / Auto-Subtitle-Generator

A high-performance local AI pipeline for restoring VHS audio, transcribing with Whisper, and translating subtitles using NLLB-200.

python automation translation ffmpeg cuda vhs subtitle-generator audio-restoration local-ai rtx-5090 nllb-200 ryzen-9950x3d pytorch-nightly vwhisper-ai

Updated Jan 26, 2026
Python

CastelDazur / castelos-public

CastelOS public artifacts — principles, architecture insights, and build-in-public content

gpu workstation build-in-public llm local-ai rtx-5090

Updated Apr 23, 2026
Mermaid

informatico-madrid / Sovereign-Blackwell-vLLM-Stack

Enterprise-grade Sovereign AI Stack optimized for NVIDIA Blackwell (sm_120) & vLLM. Features 256K context window, 5.8k tok/s prefill, and integrated observability via Langfuse.

cuda blackwell vllm langfuse litellm rtx-5090 qwen3 sovereign-ai self-hosted-llm llm-infrastructure

Updated Jan 21, 2026
Python

senorcris / gpu-power-monitor

python linux i2c textual tui nvidia asus gpu-monitoring asus-rog power-monitoring rtx-5090 rtx-5000 it8915fn 12v-2x6

Updated Feb 21, 2026
Python

D3velop-llc / csm-rtx5090

Optimized CSM-1B TTS pipeline for RTX 5090 (Blackwell sm_120). CUDA graph replay via patched HF Transformers. ~0.46x RTF.

text-to-speech streaming pytorch tts sesame csm huggingface blackwell torch-compile rtx-5090 sm-120 cuda-graphs

Updated Apr 5, 2026
Python

thc1006 / taiwan-asr-toolkit

Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.

Updated May 7, 2026
Python

hongping-zh / ecocompute-dynamic-eval

⚡ Compare AI models by Accuracy × Cost × Carbon — RTX 5090 benchmarks reveal 4-bit quantization wastes energy on small models

open-source quantization energy-efficiency carbon-footprint mlops carbon-calculator green-ai sustainable-ai climate-tech llm-evaluation deepseek rtx-5090 ai-sustainability gpu-benchmarks

Updated May 4, 2026
TypeScript

likhith-v1 / inferd

Local-first LLM stack on a single RTX 5090: QLoRA fine-tuning, exact speculative decoding, paged KV-cache, and continuous batching — served via FastAPI with a live React dashboard.

react cuda pytorch triton lora inference-engine fine-tuning fastapi kv-cache llm local-llm llm-inference qlora qwen speculative-decoding paged-attention continuous-batching rtx-5090

Updated Jul 3, 2026
Python

soy-tuber / localllama-insights

Technical insights from r/LocalLLaMA — vLLM, FP8, NVFP4, Blackwell GPU benchmarks, and more. Unverified community knowledge, generated by Nemotron 9B. Issues welcome.

gpu inference benchmarks blackwell llm fp8 vllm localllama rtx-5090 nvfp4

Updated Mar 16, 2026

Topic: rtx-5090

40 public repositories match this topic; the 20 captured on this page are listed above.