- University of North Carolina, Charlotte
- NC, United States
- in/manishgovind
- https://orcid.org/0009-0003-6381-6293
Stars
Code to load DreamZero model checkpoints and run evaluation on DROID-sim and Genie Sim 3.0
Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads"
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
A curated list of large VLM-based VLA models for robotic manipulation.
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset
Frontier Multimodal Foundation Models for Image and Video Understanding
Official Repository of 'Multi-Scale Temporal Mamba for Efficient Temporal Action Detection'
[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
This repository contains demos I made with the Transformers library by HuggingFace.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
[CVPR 2024] Code and models for pi-ViT, a video transformer for understanding activities of daily living
A one stop repository for generative AI research updates, interview resources, notebooks and much more!