ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
This is the code repository for the paper: ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models. ACM MM 2025.
Overview: (a) Vanilla Text CoT Reasoning; (b) Video-Text Interleaved CoT Reasoning; (c) Video-Text Interleaved Data Construction; (d) Performance Comparison: Vanilla Reasoning Paradigm (Vanilla CoT, Vanilla Desp-CoT, and Vanilla Plan-and-Solve) vs. Video-Text Interleaved Reasoning Paradigm (ViT CoT, ViT Desp-CoT, and ViT Plan-and-Solve) on Qwen2.5-VL-7B.
(1) Install the environment:

```shell
pip install -r requirements.txt
```

(2) Fill in your API information in the files `src/ViTCoT_stage1` and `src/ViTCoT_stage2`:
```python
API_KEYS = []
```

(3) Download the datasets 🤗 all_video.zip and 🤗 key_video.zip and unzip them into the `src` folder.
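In step (2), `API_KEYS` is a list, which suggests that several keys can be supplied and rotated across requests. The sketch below is a hypothetical illustration of such round-robin rotation, not the repository's actual implementation; the key strings and helper name are placeholders.

```python
import itertools

# Hypothetical example: API_KEYS holds one or more API key strings.
# These are placeholder values, not real keys.
API_KEYS = ["sk-key-1", "sk-key-2"]

# Cycle through the keys so repeated requests spread load across them.
_key_cycle = itertools.cycle(API_KEYS)

def next_key() -> str:
    """Return the next key in round-robin order."""
    return next(_key_cycle)
```

Each call to `next_key()` advances the cycle, wrapping back to the first key after the last.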
(4) Run:

```shell
cd src
bash run.sh
```

Please create GitHub issues here or email Yongheng Zhang or Libo Qin if you have any questions or suggestions.