austin362667

A Streaming-Native Serving Engine for TTS/STS Models

Python 51 5 Updated Feb 13, 2026

AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.

C++ 52 17 Updated Jan 8, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,211 119 Updated Feb 13, 2026

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 9,968 859 Updated Feb 12, 2026

Open ABI and FFI for Machine Learning Systems

C++ 337 59 Updated Feb 12, 2026

NVIDIA FastGen: Fast Generation from Diffusion Models

Python 553 31 Updated Jan 28, 2026

If tinygrad wasn't small enough for you...

Python 772 99 Updated Mar 9, 2024

JAX bindings for the FlashAttention 3 kernels

C++ 17 1 Updated Dec 27, 2025

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,795 271 Updated Feb 10, 2026

Common in-memory tensor structure

C++ 1,168 158 Updated Jan 26, 2026

DFlash: Block Diffusion for Flash Speculative Decoding

Python 545 33 Updated Feb 6, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,340 181 Updated Dec 17, 2025

A JAX quantization library

Python 91 11 Updated Feb 13, 2026

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 833 101 Updated Jan 28, 2026

LLaDA2.0 is the diffusion language model series developed by the InclusionAI team, Ant Group.

330 18 Updated Feb 12, 2026

d3LLM: Ultra-Fast Diffusion LLM 🚀

Python 91 2 Updated Feb 4, 2026

Dream 7B, a large diffusion language model

Python 1,167 72 Updated Nov 21, 2025

Open-source unified multimodal model

Python 5,665 500 Updated Oct 27, 2025

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,945 382 Updated Feb 13, 2026

Enjoy the magic of Diffusion models!

Python 11,783 1,138 Updated Feb 11, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 874 108 Updated Feb 13, 2026

An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).

C++ 695 234 Updated Feb 13, 2026

An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more

Scala 2,137 813 Updated Feb 13, 2026

TTS model capable of streaming conversational audio in realtime.

Python 1,063 87 Updated Nov 29, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,740 3,337 Updated Feb 13, 2026

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.

Python 3,299 439 Updated Feb 13, 2026

A very fast linker for Linux

Rust 3,341 99 Updated Feb 13, 2026

A Lightweight LLM Post-Training Library

Python 2,152 237 Updated Feb 13, 2026

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,380 386 Updated Feb 13, 2026

Distributed execution for DuckDB queries.

C++ 47 2 Updated Feb 9, 2026