Starred repositories
Official code of “MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning”
[CVPR 2025] Official Implementation for Probing the Mid-level Vision Capabilities of Self-Supervised Learning
[3DV 2026] Official Code Release for Learning 3D Representations from Procedural 3D Programs
[3DV 2026] Open Vocabulary Monocular 3D Object Detection
[ICCV 2025] VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
[TMLR 2025] Efficient Diffusion Models: A Survey
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
[NeurIPS 2025] LabelAny3D: Label Any Object 3D in the Wild
Assistant tools for attention visualization in deep learning
Official implementation for "Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter"
Generalised Contrastive Learning. Repository for the Google Shopping dataset and benchmarks, together with our novel fine-grained contrastive learning framework.
Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective (AAAI 2026)
Official implementation of the "Multimodal Parameter-Efficient Few-Shot Class Incremental Learning" paper
Awesome VLM-CL. Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
[ICCV'25] Unified Open-World Segmentation with Multi-Modal Prompts
Central repository for biomolecular foundation models with shared trainers and pipeline components
Code for β-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment
Official implementation of "Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation".
Official repository of "Multi-view Pyramid Transformer: Look Coarser to See Broader"
[NeurIPS 2025] AutoSeg3D, online real-time 3D segmentation as instance tracking with long-short term query memory for embodied perception
[AAAI 2025] Official code of "ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models".
A training-free, mask-free framework for 3D shape editing.
Official repository of LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer”
[AAAI 2026] Diffusion-Based Contextual Reconstruction for Point Cloud Segmentation with Limited Annotations