Conversation

@wizeng23
Contributor

@wizeng23 wizeng23 commented Jul 8, 2025

Description

  • Add configs/examples/misc/slurm_ray_init.sh, which sets up a Ray cluster on Slurm nodes. Reference: https://docs.ray.io/en/latest/cluster/vms/user-guides/community/slurm.html.
  • Add an example config, configs/examples/misc/grpo_verl_gsm8k_slurm_job.yaml, for running verl on Slurm via an Oumi job config.
  • Pin the verl version to 0.4.0. We can't use 0.4.1 yet because it added config fields that we don't support yet.
  • Move metadata printing to a new line so tables stay aligned.
  • Misc fixes
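
The Ray guide linked above uses a head-then-workers start-up: pick one allocated node as the head, start Ray there, then point every other node at the head's address. Below is a minimal sketch of that pattern, not the contents of slurm_ray_init.sh. The function names and the default port 6379 are assumptions for illustration, and the script only prints the srun commands rather than running them.

```shell
#!/bin/sh
# Illustrative sketch only; the helper names here are hypothetical.

# Print the command that would start the Ray head process on one node.
ray_head_cmd() {
  head_node="$1"; head_ip="$2"; port="${3:-6379}"
  echo "srun --nodes=1 --ntasks=1 -w ${head_node} ray start --head --node-ip-address=${head_ip} --port=${port} --block"
}

# Print the command that would attach a worker node to the head.
ray_worker_cmd() {
  worker_node="$1"; head_ip="$2"; port="${3:-6379}"
  echo "srun --nodes=1 --ntasks=1 -w ${worker_node} ray start --address=${head_ip}:${port} --block"
}

# Only meaningful inside a Slurm allocation; skipped everywhere else.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
  nodes="$(scontrol show hostnames "$SLURM_JOB_NODELIST")"
  head_node="$(printf '%s\n' "$nodes" | head -n 1)"
  # Take the first address if the node reports several.
  head_ip="$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address | awk '{print $1}')"
  ray_head_cmd "$head_node" "$head_ip"
  printf '%s\n' "$nodes" | tail -n +2 | while read -r worker; do
    ray_worker_cmd "$worker" "$head_ip"
  done
fi
```

Printing the commands first makes it easy to check node selection before launching anything on the cluster.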

Related issues

Towards OPE-1338

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

@wizeng23 wizeng23 requested review from oelachqar and taenin July 8, 2025 23:27
@@ -0,0 +1,5 @@
working_dir: ./
Contributor

Is this file needed? How would a user change the conda environment, etc.?

Contributor Author

It was recommended in the verl guide. Users could update the value in this file, but given the pattern we use in our job configs (copying our working directory to the Slurm cluster and installing it in the oumi conda env), hardcoding it makes sense IMO.

Contributor

  • How about the "oumi" environment?
  • I think this would not work with a pip-installed oumi, right?

Collaborator

+1, how would this work for users who are pip installing oumi from pypi?
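
On the question of changing the value: Ray's job CLI accepts a runtime environment file through `ray job submit --runtime-env`, so one option is to generate a per-user copy instead of editing the checked-in file. This is a minimal sketch, assuming the file only needs `working_dir`. The helper `write_runtime_env`, the output path, and `train.py` are hypothetical names, not part of this PR, and the sketch does not by itself settle the pip-install case raised above.

```shell
#!/bin/sh
# Hypothetical helper: emit a Ray runtime_env YAML with a chosen working_dir.
# Defaults to "./", the value hardcoded in this PR.
write_runtime_env() {
  printf 'working_dir: %s\n' "${1:-./}"
}

# Example (not run here): write a per-user file and pass it to Ray.
#   write_runtime_env "$HOME/my_project" > /tmp/runtime_env.yaml
#   ray job submit --runtime-env /tmp/runtime_env.yaml -- python train.py
```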

# - Run step 1 of verl quickstart: https://verl.readthedocs.io/en/latest/start/quickstart.html
#
# Usage:
# oumi launch up -c configs/examples/misc/grpo_verl_gsm8k_slurm_job.yaml --cluster $OUMI_SLURM_CONNECTIONS --user wizeng
Collaborator

Suggested change
- # oumi launch up -c configs/examples/misc/grpo_verl_gsm8k_slurm_job.yaml --cluster $OUMI_SLURM_CONNECTIONS --user wizeng
+ # oumi launch up -c configs/examples/misc/grpo_verl_gsm8k_slurm_job.yaml --cluster $OUMI_SLURM_CONNECTIONS --user $USER
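
The point of the suggestion is that the usage line should work for whoever runs it. A small sketch of a wrapper that assembles the same command from the environment follows. The function name `launch_cmd` and the fallback to `id -un` when `USER` is unset are assumptions, and the function prints the command rather than executing it.

```shell
#!/bin/sh
# Hypothetical wrapper: print the launch command for the current user.
launch_cmd() {
  config="${1:-configs/examples/misc/grpo_verl_gsm8k_slurm_job.yaml}"
  # Fall back to `id -un` if USER is not set in this environment.
  user="${USER:-$(id -un)}"
  printf 'oumi launch up -c %s --cluster %s --user %s\n' \
    "$config" "$OUMI_SLURM_CONNECTIONS" "$user"
}
```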

@wizeng23 wizeng23 merged commit 963a0ab into main Jul 10, 2025
5 checks passed
@wizeng23 wizeng23 deleted the wizeng/o1338-multinode-verl branch July 10, 2025 18:05
penfever pushed a commit that referenced this pull request Aug 27, 2025