Zephyr271828

Yufeng Xu Zephyr271828

56 followers · 131 following

NYU Shanghai
Shanghai/Suzhou/New York
14:43 (UTC -05:00)
https://zephyr271828.github.io/
in/yufeng-felix-xu

Stars

RL

10 repositories

ryanxhr / DWBC

[ICML 2022] The official implementation of DWBC in "Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations"

Python · 35 stars · 2 forks · Updated Jan 5, 2023

huggingface / trl

Train transformer language models with reinforcement learning.

Python · 16,145 stars · 2,271 forks · Updated Nov 4, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 15,092 stars · 2,418 forks · Updated Nov 4, 2025

banditml / offline-policy-evaluation

Implementations and examples of common offline policy evaluation methods in Python.

Python · 224 stars · 25 forks · Updated Feb 11, 2023

OpenHands / OpenHands

🙌 OpenHands: Code Less, Make More

Python · 64,680 stars · 7,863 forks · Updated Nov 4, 2025

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python · 8,311 stars · 807 forks · Updated Oct 31, 2025

thinkwee / AgentsMeetRL

Awesome List for Agentic RL

HTML · 531 stars · 16 forks · Updated Oct 13, 2025

MiniMax-AI / SynLogic

[NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Python · 177 stars · 19 forks · Updated Jul 7, 2025

BytedTsinghua-SIA / Enigmata

Resources for the Enigmata Project.

Python · 73 stars · 4 forks · Updated Aug 13, 2025

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python · 2,361 stars · 241 forks · Updated Nov 3, 2025
