- Beijing, China
- https://yangwenbo.com
Starred repositories
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on the web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…
A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
A modern replacement for Redis and Memcached
Trainable fast and memory-efficient sparse attention
A Datacenter Scale Distributed Inference Serving Framework
A tool to configure, launch and manage your machine learning experiments.
Scalable toolkit for efficient model reinforcement
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
HuggingFace conversion and training library for Megatron-based models
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Delivers efficient, stable, and secure data distribution and acceleration powered by P2P technology, with an optional content‑addressable filesystem that accelerates OCI container launch.
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.
Search-R1: An efficient, scalable RL training framework for reasoning and search engine calling interleaved LLMs, based on veRL
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
A web-based 3D CAD application for online model design and editing
All-in-one project management tool for efficient teams
slime is an LLM post-training framework for RL Scaling.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A Python program with a tkinter UI that organizes photos into folders based on the time they were taken.
aacostadiaz / cutlass-fork
Forked from intel/sycl-tla. CUDA Templates for Linear Algebra Subroutines.
SGLang is a fast serving framework for large language models and vision language models.