Stars
Official repository for "Unveiling and Mitigating Bias in Audio Visual Segmentation" in ACM MM 2024
AVSBench-Robust Dataset Generation Scripts
The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024
[ICLR 2024] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
Audio-visual segmentation with bidirectional generation
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
[ICCV 2025] SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
[CVPR 2025] "A Distractor-Aware Memory for Visual Object Tracking with SAM2"
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
Integrate the DeepSeek API into popular software
[NeurIPS 2024] Mixture of Experts for Audio-Visual Learning
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
Deep Audio-Visual Embedding network (DAVEnet) implementation in PyTorch
Frontier Multimodal Foundation Models for Image and Video Understanding
Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities (NeurIPS 2024)
[CVPR 2024] Official code for the paper "Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation"
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts
[CVPR 2025 Highlight] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"