Stars
[BMVC'21] Official PyTorch Implementation of "Grounded Situation Recognition with Transformers"
[AAAI 2024] EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
A collection of awesome LaTeX Thesis/Dissertation templates and beyond! (A collection of templates for theses, presentations, reports, project proposals, résumés, books, and more, in LaTeX / Word / Typst / Markdown formats)
Data set for the IEEE TGRS paper "Mutual Attention Inception Network for Remote Sensing Visual Question Answering"
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition, by Debaditya Roy, Dhruv Verma, and Basura Fernando, IEEE/CVF Winter Conference on Applications of Computer Vision
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Code for the paper "PointAttN: You Only Need Attention for Point Cloud Completion"
[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
Papers and Datasets about Point Cloud.
[MICCAI 2024] TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs
Video Object Segmentation using Space-Time Memory Networks
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Code for the paper "Learning to Predict Task Progress by Self-Supervised Video Alignment" by Gerard Donahue and Ehsan Elhamifar, published at CVPR 2024.
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
"Interaction-centric Spatio-Temporal Context Reasoning for Muti-Person Video HOI Recognition" ECCV 2024
Official Implementation of STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering, AAAI 2024
Official repository of ECCV 2024 paper - "HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization"
[CVPR 2021] Actor-Context-Actor Relation Network for Spatio-temporal Action Localization
Video Event Extraction via Tracking Visual States of Arguments (AAAI 2023)
[ECCV 2024 Oral] C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Code release for Hu et al., Language-Conditioned Graph Networks for Relational Reasoning. in ICCV, 2019
[IEEE TMM 2025 & ACL 2024 Findings] LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
[CVPR 2023] Code for "Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations"
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)