- Zhejiang University
- Hang Zhou
Stars
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
verl: Volcano Engine Reinforcement Learning for LLMs
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Provides pre-built flash-attention package wheels for Linux and Windows platforms, built using GitHub Actions
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…
Implementing DeepSeek R1's GRPO algorithm from scratch
The official repository of SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Janus-Series: Unified Multimodal Understanding and Generation Models
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Reference PyTorch implementation and models for DINOv3
An open-source AI agent that lives in your terminal.
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
Enjoy the magic of Diffusion models!
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
[NeurIPS 2022] Official PyTorch implementation of Optimizing Relevance Maps of Vision Transformers Improves Robustness. This code allows fine-tuning the explainability maps of Vision Transformers t…
A SOTA open-source image editing model that aims to provide performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.