
I’m a final-year PhD student at Stanford CS, advised by Chelsea Finn. My research focuses on building systems that learn from text directly and continually.

Scalar rewards collapse feedback into a single number. A score gives a verdict, not a diagnosis. What went wrong, why it went wrong, and what to change are often better expressed in text.

In many settings, we already have access to this kind of feedback: natural-language corrections, pairwise comparisons with explanations, stack traces, and reflections on what worked. Now that models can read text well enough to use it for decision-making, I believe we can do better than learning from scalar rewards alone. My research develops methods that leverage such structured feedback to enable models to continually improve.
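
To make the contrast concrete, here is a toy sketch (my illustration, not a method from any particular paper); llm stands in for a hypothetical prompt-to-completion call, and critique_fn for any source of textual feedback such as a stack trace or a human correction:

    # Toy sketch: a revision loop driven by textual critiques rather than a
    # scalar reward. `llm` is a hypothetical prompt -> completion function.
    def llm(prompt: str) -> str:
        raise NotImplementedError  # stand-in for any text-completion API

    def revise(task: str, critique_fn, n_rounds: int = 5) -> str:
        attempt = llm(f"Solve this task:\n{task}")
        critiques = []  # text carries the diagnosis a single score cannot
        for _ in range(n_rounds):
            critiques.append(critique_fn(attempt))  # e.g. "step 3 divides by zero"
            attempt = llm(
                f"Task:\n{task}\nPrevious attempt:\n{attempt}\n"
                "Feedback so far:\n" + "\n".join(critiques) + "\nRevise:"
            )
        return attempt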

The best way to learn more about the technical side of these ideas is to read my blog post or the selected papers below.

2025

preprint

Operationalizes the core text optimization loop, accumulating "why better" signals from pairwise comparisons across up to a thousand iterations.
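
A rough sketch of the flavor of such a loop (my illustration; llm and judge are hypothetical helpers, not the paper's interface):

    # Sketch: pairwise comparisons yield "why better" explanations that
    # accumulate and condition the next proposal. Purely illustrative.
    def optimize(task: str, llm, judge, n_iters: int = 1000) -> str:
        best = llm(f"Attempt the task:\n{task}")
        lessons = []  # accumulated explanations of why one attempt won
        for _ in range(n_iters):
            candidate = llm(
                f"Task:\n{task}\nLessons from earlier comparisons:\n"
                + "\n".join(lessons) + "\nPropose an improved attempt:"
            )
            winner, why = judge(best, candidate)  # (better attempt, explanation)
            lessons.append(why)
            best = winner
        return best
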
2025

ICML 2025 workshops: AI for Math, PRAL, ES-FoMo

A hierarchical RL framework for training LLMs to discover and use textual abstractions for solving complex reasoning problems. Demonstrates that useful information for solving reasoning problems can be represented in pure text form.
2025

ICML 2025 workshop: PUT

Test-time alignment by reweighting ensemble members using a small set of labeled examples from the target distribution. Adaptation without retraining weights.
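
A minimal sketch of the reweighting idea (my illustration; the softmax-over-negative-loss rule is an assumption, not necessarily the paper's exact scheme):

    # Sketch: weight each ensemble member by how well it fits a few
    # labeled examples from the target distribution; no weights retrained.
    import numpy as np

    def reweight(members, X_few, y_few, temp=1.0):
        """members: callables mapping inputs to (n, n_classes) probabilities."""
        losses = []
        for f in members:
            p = f(X_few)
            nll = -np.log(p[np.arange(len(y_few)), y_few] + 1e-12).mean()
            losses.append(nll)
        w = np.exp(-np.array(losses) / temp)  # lower loss -> larger weight
        return w / w.sum()

    def predict(members, weights, X):
        return sum(w * f(X) for w, f in zip(weights, members))
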
2024

UIST 2024, NeurIPS 2023 workshops XAIA and ICBINB

Built an interface that lets humans teach vision models through natural-language corrections instead of manual labels. Demonstrates how language can provide higher-bandwidth feedback that communicates what went wrong.
2023

ICLR 2023

Learns from structured disagreement signals between diverse models, working at a higher level of abstraction than individual datapoints by "choosing the best model" among the different functions that fit the training data.
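
A minimal sketch of that underlying idea (my illustration, not the paper's method): measure where diverse models disagree on unlabeled target inputs, then use a few labels to pick among them.

    # Sketch: disagreement between models that all fit the training data,
    # disambiguated with a handful of labeled target examples.
    import numpy as np

    def disagreement_matrix(models, X_unlabeled):
        """Fraction of unlabeled inputs on which each pair of models disagrees."""
        preds = [m(X_unlabeled).argmax(axis=1) for m in models]
        K = len(models)
        D = np.zeros((K, K))
        for i in range(K):
            for j in range(K):
                D[i, j] = np.mean(preds[i] != preds[j])
        return D

    def pick_model(models, X_few, y_few):
        """Choose the member with lowest error on a few labeled examples."""
        errs = [np.mean(m(X_few).argmax(axis=1) != y_few) for m in models]
        return models[int(np.argmin(errs))]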

My name (윤호) is pronounced approximately like ‘you-know’ said quickly, with stress on ‘you’.

Feel free to reach out via email if you’d like to chat. I’m on the 2026-27 academic and industry job markets, and open to collaborations that align with my research interests.