Thanks to visit codestin.com
Credit goes to Github.com

UCSB ERIC Lab

All

28 repositories

GRIT
Public
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
reinforcement-learning visual-reasoning visual-grounding multimodal-reasoning grounded-reasoning thinking-with-image
Python
•
MIT License
•10•172•6•0•Updated Jan 8, 2026Jan 8, 2026
DMLR
Public
Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
Python
•0•40•5•0•Updated Dec 17, 2025Dec 17, 2025
Soft-Thinking
Public
Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"
soft-reasoning chain-of-thought-reasoning reasoning-models soft-thinking continous-space-reasoning soft-token concept-token
Python
•
MIT License
•37•296•4•1•Updated Dec 12, 2025Dec 12, 2025
EvoPresent
Public
Official codebase for the paper "Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations"
Python
•22•329•2•0•Updated Oct 14, 2025Oct 14, 2025
MMWorld
Public
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
evaluation video-understanding video-dataset multi-disciplinary multimodal-large-language-models world-model
Python
•
MIT License
•1•29•0•0•Updated Jul 15, 2025Jul 15, 2025
SafeKey
Public
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
ai-safety llm-safety large-reasoning-models safety-reasoning
Python
•1•14•0•0•Updated Jun 30, 2025Jun 30, 2025
MSSBench
Public
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
safety ai-agents situational-awareness ai-assistant large-language-models multimodal-large-language-models
Python
•
MIT License
•2•30•3•0•Updated Jun 23, 2025Jun 23, 2025
Mojito
Public
Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation""
motion-control video-generation diffusion-models controllable-generation text-to-video-generation
Python
•1•5•0•0•Updated Jun 11, 2025Jun 11, 2025
iReason
Public
Official code for paper "Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models"
Python
•1•4•0•0•Updated Jun 4, 2025Jun 4, 2025
MLRM-Halu
Public
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Python
•3•0•0•0•Updated May 31, 2025May 31, 2025
VLMbench
Public
NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation"
language-grounding vision-and-language robotic-manipulation compositionality embodied-ai
Python
•
MIT License
•8•98•5•0•Updated May 8, 2025May 8, 2025
MiniGPT-5
Public
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
transformers diffusion-models multimodal-generation multimodal-llm
Python
•
Apache License 2.0
•53•863•7•0•Updated May 8, 2025May 8, 2025
EditRoom
Public
[ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
Python
•
MIT License
•4•20•0•1•Updated Apr 1, 2025Apr 1, 2025
MMIR
Public
[ACL 2025 Findings] "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models"
Python
•0•13•2•0•Updated Feb 25, 2025Feb 25, 2025
ProbMed
Public
[ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
evaluation vision-and-language medical-vqa medical-diagnosis llms large-multimodal-models
Python
•1•25•1•0•Updated Feb 21, 2025Feb 21, 2025
Aerial-Vision-and-Dialog-Navigation
Public
Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"
navigation aerial-imagery drone-navigation vision-and-language vln
Python
•7•60•4•0•Updated Nov 4, 2024Nov 4, 2024
llm_coordination
Public
Code repository for the NAACL 2025 paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
multiagent llms coordination-game agent-coordination
Python
•
MIT License
•7•44•1•0•Updated Oct 13, 2024Oct 13, 2024
swap-anything
Public
Official implementation of the ECCV paper "SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"
image-editing personalization diffusion-models subject-driven-generation photoswapping swap-anything
Python
•
MIT License
•15•262•5•0•Updated Oct 10, 2024Oct 10, 2024
ComCLIP
Public
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
causality clip svo slip vision-and-language compositionality flickr8k-dataset image-text-matching flickr30k image-text-retrieval
Python
•
MIT License
•5•37•0•1•Updated Aug 18, 2024Aug 18, 2024
Screen-Point-and-Read
Public
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
screen-reader ai-agents grounding gui-agents tree-of-lens layout-understanding
Python
•3•28•0•0•Updated Jul 31, 2024Jul 31, 2024
R2H
Public
Official implementation of the EMNLP 2023 paper "R2H: Building Multimodal Navigation Helpers that Respond to Help Requests"
helper navigation dialogue multimodal embodied-agent response-generation ai-agent
Python
•1•5•0•0•Updated Jun 19, 2024Jun 19, 2024
Discffusion
Public
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
vision-and-language few-shot-learning discriminative-learning diffusion-models
Python
•
MIT License
•4•30•1•0•Updated Apr 27, 2024Apr 27, 2024
Naivgation-as-wish
Public
Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
robustness attack-defense federated-learning embodied-agent vision-and-language-navigation
Python
•
MIT License
•0•6•0•0•Updated Apr 10, 2024Apr 10, 2024
T2IAT
Public
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation
Python
•
MIT License
•0•7•0•0•Updated Aug 15, 2023Aug 15, 2023
PEViT
Public
Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
pytorch image-classification fine-tuning vision-transformer parameter-efficient-tuning
Python
•
MIT License
•8•106•10•0•Updated Aug 7, 2023Aug 7, 2023
Mitigate-Gender-Bias-in-Image-Search
Public
Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arxiv.org/abs/2109.05433
image-search multimodality gender-bias fairness-ml vision-language
Python
•
MIT License
•1•12•2•0•Updated Feb 6, 2023Feb 6, 2023
CPL
Public
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
vqa image-classification causal-inference vision-and-language image-text-retrieval counterfactual-reasoning prompt-tuning
Python
•
MIT License
•6•35•6•0•Updated Dec 5, 2022Dec 5, 2022
ACLToolBox
Public
Python
•
MIT License
•1•8•0•0•Updated Nov 15, 2022Nov 15, 2022