Stars
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Triton-based Symmetric Memory operators and examples
A high-throughput and memory-efficient inference and serving engine for LLMs
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Fast and memory-efficient exact attention
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Tensors and Dynamic neural networks in Python with strong GPU acceleration
PyTorch domain library for recommendation systems