Southeast University
Shanghai
https://yongliang-wu.github.io/
https://scholar.google.com/citations?user=NdE8DZ8AAAAJ
Stars
Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
A simple yet powerful agent framework that delivers with open-source models
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
Visual Planning: Let's Think Only with Images
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
VideoNSA: Native Sparse Attention Scales Video Understanding
Understanding R1-Zero-Like Training: A Critical Perspective
Fully Open Framework for Democratized Multimodal Training
Official Repository of "Learning to Reason under Off-Policy Guidance"
Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasoning"
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
A Survey of Reinforcement Learning for Large Reasoning Models
Free ChatGPT and DeepSeek API key. Free access to the DeepSeek API and GPT-4 API, with support for top-ranked, commonly used large models such as gpt | deepseek | claude | gemini | grok.
Towards a Unified View of Large Language Model Post-Training
The world's first open-source multimodal creative assistant. A privacy-focused alternative to Canva and Manus that can be used locally.
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.
Official repo for 'Large Multimodal Models Evaluation: A Survey'
ForJadeForest / verl
Forked from volcengine/verl
verl: Volcano Engine Reinforcement Learning for LLMs
A curated list of Computer Vision related conferences with dates and paper registration deadlines.
Official implementation for the paper "QVAE-Mole: The Quantum VAE with Spherical Latent Variable Learning for 3-D Molecule Generation" (NeurIPS 2024).
Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
A reproduction of the DFT method that does not use Verl. https://arxiv.org/abs/2508.05629
VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild
LiveBench: A Challenging, Contamination-Free LLM Benchmark
TempFlow-GRPO (Temporal Flow GRPO), a principled GRPO framework that captures and exploits the temporal structure inherent in flow-based generation.