Stars
OmniGen2: Exploration to Advanced Multimodal Generation.
A Datacenter Scale Distributed Inference Serving Framework
antgroup / ant-ray
Forked from ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay is forked from ray, offering incremental new features on top …
Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
A Flexible Framework for Comprehensive Multimodal Model Evaluation
FlagScale is a large model toolkit based on open-sourced projects.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
An Easy-to-Use and High-Performance Edge AI Deployment Framework
Awesome LLM compression research papers and tools.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
A high-throughput and memory-efficient inference and serving engine for LLMs
JiabinYang / Paddle
Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning