Starred repositories
Machine Learning Engineering Open Book
Data manipulation and transformation for audio signal processing, powered by PyTorch
GPU programming related news and material links
Fast and memory-efficient exact attention
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Portfolio analytics for quants, written in Python
FlagGems is an operator library for large language models implemented in the Triton Language.
⚡️ Lightning-fast backtesting engine to find your trading edge
Python Backtesting library for trading strategies
[CVPR 2026] Official implementation of BiCo: Composing Concepts from Images and Videos via Concept-prompt Binding
A feature-rich command-line audio/video downloader
shihaobai / ms-swift
Forked from modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, …
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Complete solutions to Programming Massively Parallel Processors, 4th Edition
LightTTS is a lightweight TTS inference framework optimized for CosyVoice2 and CosyVoice3, enabling fast and scalable speech synthesis in Python with support for stream and bistream modes.
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
slime is an LLM post-training framework for RL Scaling.
A unified inference and post-training framework for accelerated video generation.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism