Official implementation of "Urban Socio-Semantic Segmentation with Vision-Language Reasoning"

Python 150 2 Updated Jan 20, 2026

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

135 1 Updated Jan 17, 2026

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

Python 67 Updated Jan 6, 2026

[EMNLP’2025] Official code for "HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation"

Python 25 Updated Nov 3, 2025

Official code for "POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation"

Python 25 Updated Nov 11, 2025

Tree Search for LLM Agent Reinforcement Learning

Python 271 24 Updated Sep 29, 2025

Code of "DrVideo: Document Retrieval Based Long Video Understanding"

Python 96 3 Updated Aug 11, 2025

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Python 155 7 Updated Jun 2, 2025

Training-free Stylized Text-to-Image Generation with Fast Inference

Python 26 Updated May 30, 2025

GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

Python 173 6 Updated Oct 11, 2025

Official Code of "GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering"

Python 112 31 Updated Nov 21, 2025

🍳 [CVPR'25] PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting

Python 201 13 Updated Dec 6, 2025

🍳 [CVPR'24 Highlight] Pytorch implementation of "Taming Stable Diffusion for Text to 360° Panorama Image Generation"

Python 254 24 Updated Dec 6, 2025

🕸️ [ICCV'21 Oral] Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

Python 101 12 Updated Aug 17, 2022

🕸️ [CVPR'21] Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation. Also includes a PyTorch implementation of the decoder of LDIF (from 3D Shape …

Python 218 37 Updated Sep 11, 2021

[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs

Shell 61 1 Updated Feb 27, 2025

Famous Vision Language Models and Their Architectures

Markdown 1,159 54 Updated Jan 11, 2026

Code for the paper "AMEGO: Active Memory from long EGOcentric videos" published at ECCV 2024

Python 43 4 Updated Dec 7, 2024

A collection of awesome autoregressive visual generation models

79 Updated Apr 17, 2025

[CVPR 2025] Consistent and Controllable Image Animation with Motion Diffusion Models

Python 293 22 Updated May 17, 2025

Awesome-Remote-Sensing-Vision-Language-Models

190 10 Updated Apr 27, 2024

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

718 26 Updated Dec 8, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,352 2,708 Updated Aug 12, 2024

Superagent protects your AI applications against prompt injections, data leaks, and harmful outputs. Embed safety directly into your app and prove compliance to your customers.

TypeScript 6,366 957 Updated Jan 21, 2026

Build ChatGPT over your data, all with natural language

Python 6,529 664 Updated Apr 5, 2024

[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.

Python 1,905 189 Updated Oct 30, 2025

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)

Python 87 6 Updated Apr 10, 2022

Audio manipulation/analysis tools.

Python 3 2 Updated Apr 13, 2016

A lightweight, scalable, and general framework for visual question answering research

Python 330 64 Updated Sep 3, 2021

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python 5,616 942 Updated Jan 12, 2026