
cangozpi · 2026-05-11T09:27:59Z

Logging Support for RL Training

This pull request adds optional metric logging for Reinforcement Learning (RL) training, so that quantities such as loss, KL, and reward can be inspected over the course of a run.

Features

Implements a general, easily extensible logger that can be subclassed to support different backends (e.g., Weights & Biases).
Currently ships one backend: a TensorBoard logger (TB_Logger).
Fully backward compatible: existing code continues to work unchanged, and logging is opt-in.
Comes with an example script, examples/14_hello_world_logging.py, which adds logging on top of what examples/01_run_hello_world.py already provides.
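
To illustrate the subclassing idea, here is a minimal sketch of how an extensible logger might be laid out. It is not the code in this PR: BaseLogger, InMemoryLogger, log_scalar, and log_metrics are hypothetical names chosen for the example, and only TB_Logger is a name taken from the PR itself.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class BaseLogger(ABC):
    """Hypothetical base class: a backend only has to implement log_scalar."""

    @abstractmethod
    def log_scalar(self, name: str, value: float, step: int) -> None:
        ...

    def log_metrics(self, metrics: dict, step: int) -> None:
        # Fan a dict of metrics out to the backend-specific scalar writer.
        for name, value in metrics.items():
            self.log_scalar(name, float(value), step)

    def close(self) -> None:
        # Backends holding file handles or network sessions would override this.
        pass


class InMemoryLogger(BaseLogger):
    """Toy backend (an assumption for this sketch) that stores values in a dict."""

    def __init__(self) -> None:
        self.history = defaultdict(list)

    def log_scalar(self, name: str, value: float, step: int) -> None:
        self.history[name].append((step, value))


logger = InMemoryLogger()
logger.log_metrics({"loss": 0.42, "entropy": 1.3}, step=0)
logger.log_metrics({"loss": 0.40, "entropy": 1.2}, step=1)
print(dict(logger.history))
```

Under this layout, a TensorBoard backend would presumably forward log_scalar to torch.utils.tensorboard's SummaryWriter.add_scalar(tag, scalar_value, global_step), and a Weights & Biases backend would be another subclass. Keeping the backend surface down to one abstract method is a design choice of the sketch, not necessarily of the PR.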

Logged Metrics

The logger tracks important RL metrics, including:

avg_sliding_window
advantage
clip_frac
entropy
KL
grad_norm
loss
policy_log_prob
policy_ratio
Maximum input length, in tokens, within a training batch
Maximum number of unmasked completion tokens within a training batch
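
For readers less familiar with these names, the sketch below shows one common way that policy_ratio, clip_frac, a KL estimate, and a sliding-window average can be computed from per-token log-probabilities. It is an illustration under assumptions, not this PR's implementation: the function and class names, the clip_eps default of 0.2, and the choice of KL estimator (mean of old minus new log-probability) are all assumed here, and the PR may compute KL differently, for example against a reference policy.

```python
import math
from collections import deque


def ppo_diagnostics(new_logp, old_logp, clip_eps=0.2):
    """Hypothetical helper: summarize how far the new policy moved from the old one."""
    # Probability ratio pi_new / pi_old for each sampled token.
    ratios = [math.exp(n - o) for n, o in zip(new_logp, old_logp)]
    # A token counts as clipped when its ratio leaves [1 - eps, 1 + eps].
    clipped = [abs(r - 1.0) > clip_eps for r in ratios]
    # Simple sample-based KL estimate: mean of (old_logp - new_logp).
    approx_kl = sum(o - n for n, o in zip(new_logp, old_logp)) / len(ratios)
    return {
        "policy_ratio": sum(ratios) / len(ratios),
        "clip_frac": sum(clipped) / len(clipped),
        "KL": approx_kl,
    }


class SlidingWindowAverage:
    """Hypothetical helper: mean of the most recent `window` values."""

    def __init__(self, window):
        self.values = deque(maxlen=window)

    def update(self, value):
        # deque(maxlen=...) drops the oldest value automatically.
        self.values.append(value)
        return sum(self.values) / len(self.values)


stats = ppo_diagnostics(new_logp=[0.0, math.log(1.5)], old_logp=[0.0, 0.0])
print(stats)

reward_avg = SlidingWindowAverage(window=2)
for reward in [0.0, 1.0, 1.0]:
    print(reward_avg.update(reward))
```

In the example call, one of the two tokens has a ratio of about 1.5, which lies outside the clipping range, so half of the tokens count as clipped. The sliding-window average reaches 1.0 once the window contains only rewards of 1.0.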

Visual Demonstration

The images below show the logged metrics. Training still converges with logging enabled: when running examples/14_hello_world_logging.py, the reward/avg_sliding_window metric converges to 1.0.