- Irvine
in/austin362667
Stars
A Streaming-Native Serving Engine for TTS/STS Models
AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
NVIDIA FastGen: Fast Generation from Diffusion Models
JAX bindings for the FlashAttention 3 kernels
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
DFlash: Block Diffusion for Flash Speculative Decoding
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
LLaDA2.0 is a diffusion language model series developed by the InclusionAI team at Ant Group.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Enjoy the magic of Diffusion models!
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
A TTS model capable of streaming conversational audio in real time.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A modular, primitive-first, Python-first PyTorch library for Reinforcement Learning.
🚀 Efficient implementations of state-of-the-art linear attention models
Distributed execution of DuckDB queries.