Stars
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Codebase for the AAAI 2024 conference paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Official code for VisProg (CVPR 2023 Best Paper!)
🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition toolkit that supports models such as BertSoftmax and BertSpan and works out of the box.
[NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
bert-base-chinese example