
View yongliang-wu's full-sized avatar
🏠
Working from home


Discriminative Constrained Optimization for Reinforcing Large Reasoning Models

Python 43 2 Updated Oct 28, 2025

A simple yet powerful agent framework that delivers with open-source models

Python 3,687 359 Updated Oct 29, 2025

MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models

Python 74 2 Updated Oct 21, 2025

Contexts Optical Compression

Python 18,392 1,213 Updated Oct 25, 2025

Visual Planning: Let's Think Only with Images

Python 280 9 Updated May 20, 2025

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python 545 45 Updated Oct 21, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Python 51 1 Updated Oct 8, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,133 54 Updated Aug 27, 2025

Fully Open Framework for Democratized Multimodal Training

Python 589 40 Updated Oct 21, 2025

Geometric-Mean Policy Optimization

Python 88 8 Updated Oct 18, 2025

Official Repository of "Learning to Reason under Off-Policy Guidance"

Python 356 42 Updated Oct 4, 2025

Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasoning"

Python 44 4 Updated Sep 8, 2025

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

163 11 Updated Sep 22, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

1,925 108 Updated Oct 29, 2025

Free ChatGPT & DeepSeek API key. Free access to the DeepSeek API and GPT-4 API, with support for gpt | deepseek | claude | gemini | grok and other top-ranked, widely used large models.

Python 33,862 2,416 Updated Oct 10, 2025

Towards a Unified View of Large Language Model Post-Training

Python 170 8 Updated Sep 8, 2025

The world's first open-source multimodal creative assistant. This is a substitute for Canva and Manus that prioritizes privacy and is usable locally.

TypeScript 5,051 437 Updated Sep 24, 2025

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Jupyter Notebook 342 50 Updated Sep 29, 2025

PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.

Python 27 1 Updated Sep 9, 2025

Official repo for 'Large Multimodal Models Evaluation: A Survey'

86 4 Updated Oct 20, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 1 Updated Sep 5, 2025

A curated list of Computer Vision related conferences with dates and paper registration deadlines.

42 5 Updated Sep 22, 2025

Python 562 15 Updated Oct 20, 2025

Official implementation for the paper "QVAE-Mole: The Quantum VAE with Spherical Latent Variable Learning for 3-D Molecule Generation" (NeurIPS 2024).

Python 8 2 Updated Jun 4, 2025

Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)

Python 657 64 Updated Oct 3, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 3,905 294 Updated Oct 24, 2025

Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629

Python 16 1 Updated Oct 14, 2025

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

Python 3 Updated Jan 28, 2021

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Python 905 83 Updated Oct 16, 2025

TempFlow-GRPO (Temporal Flow GRPO), a principled GRPO framework that captures and exploits the temporal structure inherent in flow-based generation.

Python 802 45 Updated Oct 28, 2025