News • Links • Getting Started • Evaluation • Citation • Acknowledgement
- [2025/05/06] 🎉 We released SkyRL-v0: the first open-source online RL training framework for multi-turn tool-use LLMs, optimized for long-horizon, real-environment tasks like SWE-Bench!
This repository contains training code for the SkyRL-v0 release. Our implementation is a fork of VeRL.
The only prerequisite is having uv installed on your system. We use the uv + ray integration to manage dependencies in multi-node training.
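A quick way to confirm the prerequisite is met before going further is sketched below. It assumes a POSIX-compatible shell; `have_cmd` is a hypothetical helper written for this illustration and is not part of SkyRL.

```shell
# Hypothetical helper (not part of SkyRL): succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd uv; then
    # Print the installed uv version.
    uv --version
else
    echo "uv not found on PATH; install it before continuing" >&2
fi
```

If the check reports that uv is missing, install it first; the remaining steps all go through `uv run`.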
We use SkyRL-OpenHands to connect to our remote runtime server. Clone the repository and place it in the git root:
git clone https://github.com/NovaSky-AI/SkyRL-OpenHands

You can dry run your installation with the following command:
uv run --isolated --frozen pip show torch

NOTE: With a CPU head node, you might encounter installation issues with torch-memory-saver. To fix this, you need to install CUDA and make sure your CUDA libraries are linked in /usr/lib. For example:
sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so /usr/lib/libcuda.so
sudo ln -s /usr/local/cuda-12.4/compat/libcuda.so.1 /usr/lib/libcuda.so.1

To reproduce our results for SkyRL-Agent-14B-v0, SkyRL-Agent-8B-v0, and SkyRL-Agent-7B-v0, refer to examples/sky.
| Model | Base | Base Performance | Performance | Training Time |
|---|---|---|---|---|
| SkyRL-Agent-7B-v0 | OpenHands-7B-Agent | 11% | 14.6% | 16hrs 8xH100 |
| SkyRL-Agent-8B-v0 | Qwen3-8B no thinking | 3.6% | 9.4% | 27hrs 8xH200 |
| SkyRL-Agent-14B-v0 | Qwen3-14B thinking | 18% | 21.6% | 20hrs 8xH200 |
This work was done at the Berkeley Sky Computing Lab, with the amazing compute support from Lambda Labs, Anyscale, and Databricks.