Starred repositories
Courses on building, compressing, evaluating, and deploying efficient AI models.
Questions and answers to the German Citizenship Test (Einbürgerungstest) in Anki format.
Implementation of our unlearning method "Partial Model Collapse" introduced in the paper: "Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs" (Preprint).
Download and compile any diffusion model in your endpoint
A curated list of the best software pricing pages and useful resources for pricing research
TabBench is a benchmark built to evaluate machine learning models on tabular data, focusing on real-world industry use cases.
📚 Collection of token-level model compression resources.
Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead.
A curated list of materials on AI efficiency
This is a ComfyUI node that integrates pruna
This repository describes how to use pruna with tritonserver
A collection of diffusion model papers categorized by their subareas
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
A list of awesome compiler projects and papers for tensor computation and deep learning.
Official Repository for the ICLR 2022 paper "Generalization of Neural Combinatorial Solvers through the Lens of Adversarial Robustness"
Official Implementation of the Paper "MAGNet: Motif-Agnostic Generation of Molecules from Shapes"
Code for Winning the Lottery Ahead of Time: Efficient Early Network Pruning (ICML 2022)
A massively parallel, high-level programming language
Examples and guides for using the OpenAI API
Open source implementation and models of One-step Diffusion with Distribution Matching Distillation
Awesome LLM compression research papers and tools.
High-speed Large Language Model Serving for Local Deployment
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Mac app for crushing tech interviews with AI
A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including language and vision. We are continuously improving the project. Wel…
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…