Stars
The official implementation of flow Q-learning (FQL)
AnyLoc: Universal Visual Place Recognition (RA-L 2023)
DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing
NetVLAD: CNN architecture for weakly supervised place recognition
slime is an LLM post-training framework for RL Scaling.
A Foundation Model for Generalist Gaming Agents
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Efficient vision foundation models for high-resolution generation and perception.
Towards Scalable Pre-training of Visual Tokenizers for Generation
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
Official Code of "Distribution Matching Distillation Meets Reinforcement Learning"
[NeurIPS 2024] GenRL: Multimodal-foundation world models enable grounding language and video prompts into embodied domains, by turning them into sequences of latent world model states. Latent state…
[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
StreamDiffusion, Live Stream APP
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Native Multimodal Models are World Learners
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives