Stars
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
PyTorch 0.4.1 code for InsightFace
Official PyTorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.
Code for Group Critical-token Policy Optimization for Autoregressive Image Generation
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
State-of-the-art 2D and 3D Face Analysis Project
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
[ICCV2025] PyTorch implementation of "Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models"
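3,000,000+ semantic understanding and matching dataset. It can be used for unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pretrained models for the Chinese domain.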
A high-throughput and memory-efficient inference and serving engine for LLMs
Official Implementation of GENIUS: A Generative Framework for Universal Multimodal Search, CVPR 2025
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Official inference repo for FLUX.1 models
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Mobile-Agent: The Powerful GUI Agent Family
Reference PyTorch implementation and models for DINOv3
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
Build resilient language agents as graphs.
The absolute trainer to light up AI agents.