- Beijing, China
- https://yangwenbo.com
Starred repositories
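A project for training a large language model from scratch, covering pretraining, fine-tuning, and direct preference optimization. The model has 1B parameters and supports Chinese and English.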
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…
A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
A modern replacement for Redis and Memcached
Trainable fast and memory-efficient sparse attention
A Datacenter Scale Distributed Inference Serving Framework
A tool to configure, launch and manage your machine learning experiments.
Scalable toolkit for efficient model reinforcement
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
Hugging Face conversion and training library for Megatron-based models
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Delivers efficient, stable, and secure data distribution and acceleration powered by P2P technology, with an optional content‑addressable filesystem that accelerates OCI container launch.
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
A web-based 3D CAD application for online model design and editing
All in one project management tool for efficient teams
slime is an LLM post-training framework for RL Scaling.
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A Python program that uses tkinter as a UI. It helps organize photos by putting them in folders based on the time they were taken.
SGLang is a fast serving framework for large language models and multi-modality models.
Scripts for building Chromium from source tarball.