Stars
Algorithm powering the For You feed on X
Code for "Kuramoto Orientation Diffusion"
Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation (AAAI 2026)
[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding
Official PyTorch implementation for "Large Language Diffusion Models"
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
A foundation model for knowledge graph reasoning
An Open-Source Large-Scale Reinforcement Learning Project for Search Agents
Official code for [BMVC 2021] Decentralised Person Re-Identification with Selective Knowledge Aggregation
MentraOS is the leading smart glasses platform + SDK. Stream your view, transcribe audio, talk to AI and capture photos hands-free on compatible glasses.
[EMNLP 2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
GraphRAG / From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
[SIGGRAPH 2025] One Model to Rig Them All: Diverse Skeleton Rigging with UniRig
[ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[NeurIPS 2025] RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video.
[ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
[NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
PyTorch implementation of the ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary
[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"