Logging Support for RL Training #22

Open
cangozpi wants to merge 2 commits into ToolBrain:main from cangozpi:fun/logging

@cangozpi

Logging Support for RL Training

This pull request introduces logging functionality to provide deeper insights into Reinforcement Learning (RL) training.

Features

  • Implements a general, easily extensible logger that can be subclassed to support different backends (e.g., Weights & Biases).
  • Currently supports a TensorBoard logger (TB_Logger).
  • Fully backward compatible: existing code continues to work unchanged, and logging can be optionally enabled.
  • Comes with an example script, examples/14_hello_world_logging.py, which adds logging on top of what examples/01_run_hello_world.py already provides.
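To illustrate the subclassing idea in the feature list, here is a minimal sketch of what a backend-agnostic logger interface could look like. The names `BaseLogger`, `InMemoryLogger`, `log_scalar`, `log_dict`, and `train` are hypothetical and chosen only for illustration; they are not taken from this PR, and the actual TB_Logger API may differ.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Optional


class BaseLogger(ABC):
    """Hypothetical backend-agnostic interface (not the PR's actual class)."""

    @abstractmethod
    def log_scalar(self, tag: str, value: float, step: int) -> None:
        """Record one scalar value for `tag` at training step `step`."""

    def log_dict(self, metrics: Dict[str, float], step: int) -> None:
        # Convenience helper shared by every backend.
        for tag, value in metrics.items():
            self.log_scalar(tag, float(value), step)

    def close(self) -> None:
        # Backends holding file handles or sessions would override this.
        pass


class InMemoryLogger(BaseLogger):
    """Toy backend that stores (step, value) pairs per tag."""

    def __init__(self) -> None:
        self.history = defaultdict(list)

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        self.history[tag].append((step, value))


def train(num_steps: int, logger: Optional[BaseLogger] = None) -> None:
    # logger=None preserves the pre-logging behaviour (backward compatible).
    for step in range(num_steps):
        loss = 1.0 / (step + 1)  # placeholder value, not a real training loss
        if logger is not None:
            logger.log_dict({"train/loss": loss}, step)
    if logger is not None:
        logger.close()
```

Under this design, a TensorBoard backend would only need to implement `log_scalar`, for example by forwarding to `SummaryWriter.add_scalar(tag, value, global_step)` from `torch.utils.tensorboard`; a Weights & Biases backend would be another subclass of the same interface.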

Logged Metrics

The logger tracks important RL metrics, including:

  • avg_sliding_window
  • advantage
  • clip_frac
  • entropy
  • KL
  • grad_norm
  • loss
  • policy_log_prob
  • policy_ratio
  • Maximum input length, in tokens, within a batch
  • Maximum number of unmasked completion tokens within a training batch
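The PR description does not give formulas for these metrics. As a rough sketch, assuming a PPO-style clipped objective, several of them could be derived from per-token log-probabilities as follows. The function name `ppo_batch_metrics`, the dictionary keys, the default `clip_eps=0.2`, and the choice of KL estimator (mean of old minus new log-probabilities) are all assumptions, not the PR's implementation.

```python
import math
from typing import Dict, List


def ppo_batch_metrics(
    new_logp: List[List[float]],
    old_logp: List[List[float]],
    mask: List[List[int]],
    clip_eps: float = 0.2,
) -> Dict[str, float]:
    """Compute illustrative per-batch metrics over unmasked completion tokens.

    new_logp / old_logp hold per-token log-probabilities under the current
    and the sampling policy; mask is 1 for completion tokens that count.
    """
    ratios, kl_terms, logps = [], [], []
    for new_row, old_row, mask_row in zip(new_logp, old_logp, mask):
        for n, o, m in zip(new_row, old_row, mask_row):
            if m:
                ratios.append(math.exp(n - o))  # pi_new / pi_old
                kl_terms.append(o - n)          # simple sample-based KL estimate
                logps.append(n)
    count = len(ratios)
    if count == 0:
        raise ValueError("mask selects no tokens")
    # A token counts as clipped when its ratio leaves [1 - eps, 1 + eps].
    clipped = sum(1 for r in ratios if abs(r - 1.0) > clip_eps)
    return {
        "policy_ratio": sum(ratios) / count,
        "clip_frac": clipped / count,
        "KL": sum(kl_terms) / count,
        "policy_log_prob": sum(logps) / count,
        "max_unmasked_completion_tokens": float(max(sum(row) for row in mask)),
    }
```

A high `clip_frac` in this formulation means many token-level updates are being truncated by the clipping range, which is one reason the metric is useful to watch alongside `KL`.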

Visual Demonstration

The images below show the logged metrics. Running examples/14_hello_world_logging.py does not break training, as evidenced by the reward/avg_sliding_window metric converging to 1.0.

[Images: six screenshots of the logged metrics]
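For readers unfamiliar with a sliding-window average such as reward/avg_sliding_window, the following is a minimal sketch of one common way to compute it. The class name `SlidingWindowAverage` and the default window size of 10 are assumptions; the PR description does not state how the window is defined.

```python
from collections import deque


class SlidingWindowAverage:
    """Mean of the most recent `window_size` values (hypothetical helper)."""

    def __init__(self, window_size: int = 10) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque with maxlen drops the oldest value automatically.
        self._values = deque(maxlen=window_size)

    def update(self, value: float) -> float:
        """Add a new value and return the current windowed mean."""
        self._values.append(float(value))
        return sum(self._values) / len(self._values)
```

With such a helper, the averaged reward reaches 1.0 only once every reward in the window is 1.0, which makes it a smoother convergence signal than the raw per-step reward.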
