Nanyang Technological University
- Ho Chi Minh City
Stars
[ECCV’24 Main] MAMA: A Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
[AAAI 2024] MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation
[NAACL 2024] ToXCL: A Unified Framework for Toxic Speech Detection and Explanation
[AAAI’24 Main] READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
✨✨Latest Advances on Multimodal Large Language Models
Large Language Models Are Reasoning Teachers (ACL 2023)
A Topic Modeling System Toolkit (ACL 2024 Demo)
Official codebase for the ICLR oral paper "Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling"
Collection of AWESOME vision-language models for vision tasks
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
Official implementation of POODLE: Improving Few-shot Learning via Penalizing Out-of-Distribution Samples (NeurIPS 2021)
Network Pruning That Matters: A Case Study on Retraining Variants (ICLR 2021)
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Code for ICML 2020 "Graph Optimal Transport for Cross-Domain Alignment"
A Python toolkit for parsing captions (in natural language) into scene graphs (as symbolic representations).
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
Vision-Language Pre-training for Image Captioning and Question Answering
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".