- Irvine
in/austin362667
Stars
A Streaming-Native Serving Engine for TTS/STS Models
AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
NVIDIA FastGen: Fast Generation from Diffusion Models
JAX bindings for the FlashAttention 3 kernels
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
DFlash: Block Diffusion for Flash Speculative Decoding
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
LLaDA2.0 is a diffusion language model series developed by the InclusionAI team at Ant Group.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Enjoy the magic of Diffusion models!
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
A TTS model capable of streaming conversational audio in real time.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A modular, primitive-first, Python-first PyTorch library for Reinforcement Learning.
🚀 Efficient implementations of state-of-the-art linear attention models
Distributed execution of DuckDB queries.