Central South University
- Changsha
Stars
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning (KDD '25)
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
🔥 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization
Graph learning framework for long-term video understanding
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
VideoX: a collection of video cross-modal models
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding".
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
[NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
[IJCV 2025] Project Page for "Deep Learning-Based Object Pose Estimation: A Comprehensive Survey".
[NeurIPS 2025] CausalVTG: Towards Robust Video Temporal Grounding via Causal Inference
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
[RA-L 2025] This is the official repository for using and downloading the DynOPETs dataset.
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
A modern GUI client based on Tauri, designed to run in Windows, macOS and Linux for tailored proxy experience
A paper list of some recent Mamba-based CV works.
This is the official implementation for Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1.
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
📌 Code sourced from [zezhishao/DailyArXiv](https://github.com/zezhishao/DailyArXiv)
Testing adaptation of the DINOv2/3 encoders for vision tasks with Low-Rank Adaptation (LoRA)
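《代码随想录》 LeetCode practice guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that studying algorithms is no longer confusing! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀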
Paper list on Video Moment Retrieval (VMR), also known as Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Videos (TSGV)
Latest Papers, Codes and Datasets on VTG-LLMs.
When One Moment Isn’t Enough: Multi-Moment Retrieval with Cross-Moment Interactions (NeurIPS 2025)