Stars
Universal Monocular Metric Depth Estimation
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
DeepEP: an efficient expert-parallel communication library
The official repository for building SAT-DS, a medical data collection of over 72 public segmentation datasets, containing over 22K 3D images, 302K segmentation masks, and 497 classes from 3 different modalities.
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
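As a point of reference for the interpolant idea in the SiT title: a minimal velocity-matching training step on the linear interpolant x_t = (1 − t)·x_data + t·noise. The model call signature here is a placeholder, not SiT's actual interface.

```python
import torch

def sit_style_step(model, x_data):
    """One velocity-matching step on the linear interpolant
    x_t = (1 - t) * x_data + t * noise. The call model(x_t, t)
    is a placeholder, not SiT's actual interface."""
    noise = torch.randn_like(x_data)
    t = torch.rand(x_data.size(0), *([1] * (x_data.dim() - 1)))  # per-sample t in [0, 1)
    x_t = (1 - t) * x_data + t * noise
    v_target = noise - x_data                  # d/dt x_t for this interpolant
    v_pred = model(x_t, t.flatten())
    return torch.mean((v_pred - v_target) ** 2)
```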
[NeurIPS 2024 Best Paper Award] [GPT beats diffusion 🔥] [scaling laws in visual generation 📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction".
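A heavily simplified sketch of the next-scale decoding loop the VAR paper describes: each step predicts a full token map at the next resolution, conditioned on all coarser maps. `model`, its call signature, and `target_hw` are illustrative placeholders, not the official API.

```python
import torch

@torch.no_grad()
def next_scale_decode(model, scales=(1, 2, 4, 8, 16)):
    """Coarse-to-fine decoding: each step predicts the token map at the next
    resolution conditioned on all coarser maps. `model` and its signature
    are placeholders, not the official VAR interface."""
    token_maps = []
    for s in scales:
        logits = model(token_maps, target_hw=(s, s))   # assumed shape (s*s, vocab)
        token_maps.append(logits.argmax(dim=-1).view(s, s))
    return token_maps  # a multi-scale VQ decoder then maps these to pixels
```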
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
💖🧸 Self-hosted and owned by you: a Grok Companion, a container for the souls of waifus and cyber beings, bringing them into our world, aspiring to reach Neuro-sama's heights. Capable of real-time voice chat and Minecraft play.
Trainable fast and memory-efficient sparse attention
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
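A primitive shared by the sparse-attention entries above is a per-block keep/drop mask over the score matrix. The dense emulation below (an assumed helper, not any of these repos' APIs) illustrates the semantics only; real kernels skip masked blocks instead of materializing full scores.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, keep, block=64):
    """Dense emulation of block-sparse attention for illustration.
    keep[i, j] == True means query block i may attend to key block j;
    every query row needs at least one kept block (e.g. the diagonal)
    or softmax produces NaNs."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5           # (..., Tq, Tk)
    mask = keep.repeat_interleave(block, 0).repeat_interleave(block, 1)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Local (block-diagonal) pattern over a 512-token sequence:
q = k = v = torch.randn(512, 64)
keep = torch.eye(512 // 64, dtype=torch.bool)
out = block_sparse_attention(q, k, v, keep)
```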
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
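For orientation only: GPTAQ is a GPTQ-style calibration-based quantizer, and the baseline such methods improve on is plain round-to-nearest (RTN) per-channel weight quantization, sketched below. This is generic RTN, not the GPTAQ algorithm itself.

```python
import torch

def rtn_per_channel(w, bits=4):
    """Baseline round-to-nearest symmetric per-output-channel weight
    quantization; the reference point GPTQ-style methods improve on.
    w: (out_features, in_features)."""
    qmax = 2 ** (bits - 1) - 1
    scale = (w.abs().amax(dim=1, keepdim=True) / qmax).clamp(min=1e-8)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights, for measuring quantization error
```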
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
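One simple instance of the pruning idea in this title: rank experts by top-1 routing frequency over a calibration set and drop the rarest. A generic frequency heuristic for illustration; the paper's actual criterion may differ.

```python
import torch

def least_used_experts(router_logits, keep_ratio=0.75):
    """Return indices of the least-used experts as pruning candidates.
    router_logits: (num_tokens, num_experts) collected over calibration data."""
    num_experts = router_logits.size(-1)
    counts = torch.bincount(router_logits.argmax(dim=-1), minlength=num_experts)
    num_prune = int(num_experts * (1 - keep_ratio))
    return torch.argsort(counts)[:num_prune]   # the rarest experts first
```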
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
Unveiling Super Experts in Mixture-of-Experts Large Language Models
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
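The attention-sink trick in StreamingLLM keeps the first few tokens plus a recent window in the KV cache. Below is a sketch of the index bookkeeping with a hypothetical helper `evict_kv`; a real cache also gathers the corresponding K/V tensors.

```python
def evict_kv(cache_len, max_cache=1024, num_sinks=4):
    """Positions a StreamingLLM-style cache keeps: the first `num_sinks`
    'attention sink' tokens plus the most recent window."""
    if cache_len <= max_cache:
        return list(range(cache_len))
    window_start = cache_len - (max_cache - num_sinks)
    return list(range(num_sinks)) + list(range(window_start, cache_len))
```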
Helpful tools and examples for working with flex-attention
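FlexAttention itself is a public PyTorch API (`torch.nn.attention.flex_attention`, available since PyTorch 2.5). A small sliding-window-causal example; it assumes a CUDA device, and in practice you would wrap `flex_attention` in `torch.compile` for speed.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def sliding_window_causal(b, h, q_idx, kv_idx):
    # Causal attention restricted to a 256-token local window.
    return (q_idx >= kv_idx) & (q_idx - kv_idx <= 256)

# Requires PyTorch >= 2.5 and a CUDA device.
q = k = v = torch.randn(1, 8, 1024, 64, device="cuda")
block_mask = create_block_mask(sliding_window_causal, B=None, H=None,
                               Q_LEN=1024, KV_LEN=1024)
out = flex_attention(q, k, v, block_mask=block_mask)
```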
Unified KV Cache Compression Methods for Auto-Regressive Models
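One widely used family of KV-cache compression scores past positions by accumulated attention mass and evicts the lowest. The sketch below is an H2O-style heavy-hitter heuristic for illustration; the repo unifies several such methods.

```python
import torch

def heavy_hitter_keep(attn_weights, budget):
    """Keep the `budget` KV positions with the largest accumulated attention
    mass. attn_weights: (num_heads, q_len, kv_len) from recent decode steps."""
    scores = attn_weights.sum(dim=(0, 1))        # total mass per KV position
    keep = torch.topk(scores, k=budget).indices
    return torch.sort(keep).values               # preserve positional order
```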
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
This repository contains the complete research and analysis materials from reverse-engineering Claude Code v1.0.33, including in-depth technical analysis of the obfuscated source code, system architecture documentation, and an implementation blueprint for rebuilding the Claude Code agent system. Key findings include the real-time steering mechanism, the multi-agent architecture, intelligent context management, and the tool-execution pipeline. The project serves as a technical reference for understanding the design and implementation of modern AI agent systems.
Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
MiniCPM-V 4.5: A GPT-4o-Level MLLM for Single-Image, Multi-Image, and High-FPS Video Understanding on Your Phone
Code for CVPR'24 best paper: Rich Human Feedback for Text-to-Image Generation (https://arxiv.org/pdf/2312.10240)
Famous Vision Language Models and Their Architectures