Official repository for "Unveiling and Mitigating Bias in Audio Visual Segmentation" in ACM MM 2024

Python 6 Updated Oct 10, 2024

AVSBench-Robust Dataset Generation Scripts

Python 2 Updated Apr 29, 2025

The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024

Python 17 1 Updated Oct 11, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 844 55 Updated Mar 25, 2024

[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning

Python 1,381 76 Updated Jun 26, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 15,952 1,254 Updated Oct 27, 2025

The code for DDESeg [CVPR25].

Python 8 1 Updated Jul 5, 2025

The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'

Jupyter Notebook 151 2 Updated Oct 9, 2025

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

708 28 Updated Sep 13, 2025

audio-visual segmentation with bidirectional generation

Python 6 Updated Sep 10, 2024

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,476 183 Updated Feb 16, 2025

Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"

Python 484 21 Updated Mar 17, 2025

[ICCV 2025] SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Jupyter Notebook 529 18 Updated Jul 29, 2025

Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"

Python 1,378 96 Updated Nov 4, 2025

The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024

Python 47 2 Updated Oct 12, 2025

[CVPR 2025] "A Distractor-Aware Memory for Visual Object Tracking with SAM2"

Python 432 36 Updated Oct 23, 2025

[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding

Python 127 9 Updated Aug 26, 2025

Integrate the DeepSeek API into popular software

34,334 3,844 Updated Sep 25, 2025

[NeurIPS 2024] Mixture of Experts for Audio-Visual Learning

Python 23 1 Updated Jan 19, 2025

[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer

Python 70 7 Updated Mar 6, 2025
Python 31 Updated Mar 1, 2024

Deep Audio-Visual Embedding network (DAVEnet) implementation in PyTorch

Python 65 19 Updated Aug 31, 2018

Frontier Multimodal Foundation Models for Image and Video Understanding

Jupyter Notebook 1,030 74 Updated Aug 14, 2025

Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024

Python 24 5 Updated Mar 26, 2024

Deep Correlated Prompting for Visual Recognition with Missing Modalities (NeurIPS 2024)

Python 29 1 Updated Mar 6, 2025

[CVPR 2024] - Official code for the paper "Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation"

Python 43 3 Updated Aug 22, 2024

[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Python 39 3 Updated Apr 20, 2025

Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts

Python 1,270 107 Updated Dec 28, 2024
Python 1 Updated Sep 15, 2024

[CVPR 2025 Highlight] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"

Python 335 23 Updated Sep 25, 2025