Hubert Strauss (hubstrauss)

Pinned

princeton-pli/what-makes-good-rm (Public)

[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective

Python · 43 stars · 4 forks

princeton-pli/imperfect-rewards (Public)

Code for reproducing experiments from the paper "When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient"

Python · 2 stars