Stars
Havenask is a large-scale distributed information search system widely used within Alibaba Group
A distributed, fast open-source graph database featuring horizontal scalability and high availability
PerfKit Benchmarker (PKB) contains a set of benchmarks to measure and compare cloud offerings. The benchmarks use default settings to reflect what most users will see. PerfKit Benchmarker is licens…
Enjoy the magic of Diffusion models!
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
FlashInfer: Kernel Library for LLM Serving
SGLang is a high-performance serving framework for large language models and multimodal models.
A guidance language for controlling large language models.
Sourcetrail - free and open-source interactive source explorer
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A family of compressed models obtained via pruning and knowledge distillation
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, reducing inference latency by up to 10x for pre-filli…
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
llama3 implementation, one matrix multiplication at a time
Transformer-related optimizations, including BERT and GPT
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
A high-throughput and memory-efficient inference and serving engine for LLMs
Continuous Profiling Platform. Debug performance issues down to a single line of code
DeepSeek Coder: Let the Code Write Itself
Modeling, training, eval, and inference code for OLMo
Universal LLM Deployment Engine with ML Compilation
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Tutel MoE: an optimized Mixture-of-Experts library supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4
APM: Application Performance Monitoring system
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming