Stars
Official implementation of "Urban Socio-Semantic Segmentation with Vision-Language Reasoning"
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on
[EMNLP 2025] Official code for "HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation"
Official code for "POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation"
Tree Search for LLM Agent Reinforcement Learning
Code of "DrVideo: Document Retrieval Based Long Video Understanding"
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Training-free Stylized Text-to-Image Generation with Fast Inference
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
Official Code of "GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering"
🍳 [CVPR'25] PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
🍳 [CVPR'24 Highlight] Pytorch implementation of "Taming Stable Diffusion for Text to 360° Panorama Image Generation"
🕸️ [ICCV'21 Oral] Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
🕸️ [CVPR'21] Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation. Also includes a PyTorch implementation of the decoder of LDIF (from 3D Shape …
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
Famous Vision Language Models and Their Architectures
Code for the paper "AMEGO: Active Memory from long EGOcentric videos" published at ECCV 2024
A collection of awesome autoregressive visual generation models
[CVPR 2025] Consistent and Controllable Image Animation with Motion Diffusion Models
Awesome-Remote-Sensing-Vision-Language-Models
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Superagent protects your AI applications against prompt injections, data leaks, and harmful outputs. Embed safety directly into your app and prove compliance to your customers.
Build ChatGPT over your data, all with natural language
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)
A lightweight, scalable, and general framework for visual question answering research
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)