Stars
Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
The simplest, fastest repository for training/finetuning small-sized VLMs.
Millions-Level Face/Human-Scene Image-Text Datasets
[IEEE T-BIOM] FaceXBench: Evaluating Multimodal LLMs on Face Understanding
Famous Vision Language Models and Their Architectures
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
PyTorch implementation of Real-ESRGAN model
A Multimodal Large Language Model for Face Understanding
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1
A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Flux.
neosr is an open-source framework for training super-resolution models.
🚀 An awesome list of curated Nano Banana Pro prompts and examples. Your go-to resource for mastering prompt engineering and exploring the creative potential of the Nano Banana Pro (Nano Banana 2) AI…
SpotEdit: Selective Region Editing in Diffusion Transformers
This project is the official implementation of 'DreamOmni3: Scribble-based Editing and Generation'
[AAAI 2026] Official implementation of the paper "IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation".
ThinkGen: Generalized Thinking for Visual Generation
A Large-scale Dataset for training and evaluating a model's ability at Dense Text Image Generation
A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration techniques.
[CVPR 2022] We unify pixel-to-pixel transformation and color-to-color transformation in a coherent framework for high-resolution image harmonization. We also release 100 high-resolution real compos…
RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing
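🧑🚀 Summary of the world's best LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).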
All my self-trained and released AI upscaling models. After gathering and applying over 600 different upscaling models, I learned how to train my own models, and these are the results.
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards.
Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022
HQ-50K: A Large-scale, High-quality Dataset for Image Restoration