NYU Shanghai
- Shanghai/Suzhou/New York
- https://zephyr271828.github.io/
- in/yufeng-felix-xu
RL
[ICML 2022] The official implementation of DWBC in "Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations"
Train transformer language models with reinforcement learning.
verl: Volcano Engine Reinforcement Learning for LLMs
Implementations and examples of common offline policy evaluation methods in Python.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
[NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
slime is an LLM post-training framework for RL Scaling.