Stars
H-Net: Hierarchical Network with Dynamic Chunking
Flash Attention in ~100 lines of CUDA (forward pass only)
Training LLMs with QLoRA + FSDP
Codebase for Merging Language Models (ICML 2024)
Robust recipes to align language models with human and AI preferences
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…