- **Separation of Programming and Execution**
  - Users can perform RL training and inference without local GPU resources.
  - Built-in distributed training and job scheduling manage resources transparently.
- **Separation of Environment and Training Code**
  - Simplifies the design of diverse agentic task environments.
  - Supports arbitrary single-turn and multi-turn agentic tasks.
- **Seamless Transition from Training to Inference**
  - Environments and agentic workflows connect directly to inference, so trained models can be applied immediately.
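The environment/training split can be pictured as a standalone environment server that the trainer talks to over HTTP. Below is a toy illustration only: the `/step` route, the JSON fields, and the scoring logic are invented for this sketch and are not OpenTinker's actual protocol.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class ToyMathEnv(BaseHTTPRequestHandler):
    """Toy environment server: scores a submitted answer against a fixed
    target. The /step route and JSON fields are invented for illustration."""
    TARGET = "4"

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reward = 1.0 if body.get("answer") == self.TARGET else 0.0
        payload = json.dumps({"reward": reward, "done": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo output quiet
        pass

def serve():
    """Start the toy environment on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), ToyMathEnv)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]

def step(port, answer):
    """Training-side client: POST an action, receive a reward."""
    req = Request("http://127.0.0.1:%d/step" % port,
                  data=json.dumps({"answer": answer}).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())

server, port = serve()
print(step(port, "4"))  # {'reward': 1.0, 'done': True}
server.shutdown()
```

Because the environment is just a server, the same process can back both training and inference, which is what makes the train-to-inference transition seamless.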
- Python 3.8+
- CUDA-compatible GPU(s)
- Docker (recommended)
```bash
# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

# Create and run the container
docker run -dit \
  --gpus all \
  --restart=no \
  --entrypoint /bin/bash \
  --net=host \
  --shm-size=10g \
  --cap-add=SYS_ADMIN \
  -v .:/workspace/dev \
  --name tinker \
  verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d
```
```bash
# Clone with submodules
git clone --recurse-submodules [email protected]:open-tinker/OpenTinker.git
cd OpenTinker

# Install OpenTinker
pip install -e .

# Install the verl dependency
cd verl
pip install -e .
cd ..
```

Find the machine's IP address (used as `<server_endpoint>` / `<client_endpoint>` below):

```bash
hostname -I
```

Launch the scheduler:

```bash
bash opentinker/scripts/launch_scheduler.sh --scheduler-port <scheduler_port>
```

For the Math environment:
```bash
# single turn
python opentinker/environment/math/math_server.py --port <env_port>

# multi-turn tool call
python opentinker/environment/math/code_interpreter_math_server.py --port <env_port>
```

For the Gomoku environment:

```bash
python opentinker/environment/gomoku/gomoku_server.py --port <env_port>
```

Math RL:
Generate the data:

```bash
python opentinker/data_preprocess/math_multiturn_w_interaction.py \
  --local_save_dir=<local_save_dir>
```

Launch training:

```bash
# single turn
python opentinker/client/math_client_unified.py \
  tokenizer_path=Qwen/Qwen2.5-1.5B \
  batch_size=16 \
  val_batch_size=64 \
  num_epochs=5 \
  save_freq=1000 \
  test_freq=5 \
  data_path=data/math_agentloop/train.parquet \
  val_data_path=data/math_agentloop/test.parquet \
  scheduler_url=http://<server_endpoint>:<scheduler_port> \
  interaction.config.env_port=<env_port> \
  interaction.config.env_host=<client_endpoint>

# multi-turn tool call
python opentinker/client/math_code_interpreter_client.py \
  tokenizer_path=Qwen/Qwen2.5-1.5B \
  batch_size=16 \
  val_batch_size=64 \
  num_epochs=5 \
  save_freq=1000 \
  test_freq=5 \
  scheduler_url=http://<server_endpoint>:<scheduler_port> \
  interaction.config.env_port=<env_port> \
  interaction.config.env_host=<client_endpoint>
```

Gomoku RL (multi-turn):
```bash
python opentinker/client/gomoku_client.py \
  tokenizer_path=Qwen/Qwen2.5-3B-Instruct \
  batch_size=16 \
  val_batch_size=32 \
  num_epochs=5 \
  save_freq=1000 \
  test_freq=5 \
  scheduler_url=http://<server_endpoint>:<scheduler_port> \
  interaction.config.env_port=<env_port> \
  interaction.config.env_host=<client_endpoint>
```

Math inference:
```bash
# single turn
python opentinker/client/math_inference_with_scheduler.py \
  model_path=<model_name> \
  data_path=data/math/test.parquet \
  output_path=./tmp/results.jsonl \
  max_samples=5 \
  env_endpoint=http://<client_endpoint>:<env_port> \
  scheduler_url=http://<server_endpoint>:<scheduler_port>

# multi-turn tool call
python opentinker/client/math_code_interpreter_inference.py \
  model_path=<model_name> \
  data_path=data/math/test.parquet \
  output_path=./tmp/results.jsonl \
  max_samples=5 \
  env_endpoint=http://<client_endpoint>:<env_port> \
  scheduler_url=http://<server_endpoint>:<scheduler_port>
```

Gomoku inference:
```bash
python opentinker/client/gomoku_inference_with_scheduler.py \
  model_path=<model_name> \
  output_path=./tmp/results.jsonl \
  max_samples=5 \
  env_endpoint=http://<client_endpoint>:<env_port> \
  scheduler_url=http://<server_endpoint>:<scheduler_port>
```

OpenTinker includes a built-in authentication system to secure access to the scheduler API.
Edit `opentinker/scheduler/config/scheduler.yaml`:

```yaml
enable_auth: true                     # set to false to disable authentication
user_db_path: "scheduler_users.db"
```

Run the interactive script to register a user and obtain an API key:

```bash
python opentinker/scheduler/register_user_example.py
```

For advanced usage (REST API registration, using the key) and detailed configuration, see the Scheduler & Dashboard Guide.
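As a rough mental model of what a user database like `scheduler_users.db` does, here is a toy API-key store in SQLite. The table schema, hashing scheme, and function names are assumptions for illustration; OpenTinker's actual database layout may differ.

```python
import hashlib
import secrets
import sqlite3

# Toy API-key store. Schema and hashing are illustrative assumptions,
# not OpenTinker's actual scheduler_users.db layout.
def create_store(path):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, key_hash TEXT)"
    )
    return db

def register(db, name):
    """Generate an API key, persist only its hash, return the raw key once."""
    key = secrets.token_hex(16)
    key_hash = hashlib.sha256(key.encode()).hexdigest()
    db.execute("INSERT INTO users VALUES (?, ?)", (name, key_hash))
    db.commit()
    return key

def authenticate(db, name, key):
    """Check a presented key against the stored hash."""
    row = db.execute(
        "SELECT key_hash FROM users WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(key.encode()).hexdigest()

db = create_store(":memory:")
key = register(db, "alice")
print(authenticate(db, "alice", key))          # True
print(authenticate(db, "alice", "wrong-key"))  # False
```

Storing only a hash means a leaked database does not leak usable keys, which is the usual reason registration scripts print the key exactly once.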
Single-turn math reasoning environment where the model solves mathematical problems. It serves as a key example of a data-driven environment, loading data from parquet files.
We also support multi-turn tool call mode (Code Interpreter), where the model can iteratively generate and execute Python code to solve math problems. This enables more complex reasoning through code execution feedback.
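The multi-turn tool-call loop described above can be sketched as: the model emits code, the environment executes it and appends the output to the transcript, and the loop repeats until the model gives a final answer. The scripted "model" and the `<code>` markers below are stand-ins for illustration; OpenTinker drives a real LLM through the scheduler.

```python
import contextlib
import io

def run_code(code):
    """Execute a Python snippet and capture its stdout. No sandboxing here;
    a real code-interpreter environment must isolate execution."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def scripted_model(transcript):
    """Stand-in for the LLM: emits code on the first turn, then reads the
    execution result back out of the transcript and answers."""
    if "observation:" not in transcript:
        return "<code>print(sum(range(1, 101)))</code>"
    return "FINAL ANSWER: " + transcript.rsplit("observation: ", 1)[1]

def interact(model, question, max_turns=4):
    """Multi-turn loop: code from the model is executed and its output is
    appended to the transcript until the model gives a final answer."""
    transcript = "question: " + question
    for _ in range(max_turns):
        reply = model(transcript)
        if reply.startswith("FINAL ANSWER:"):
            return reply.split(":", 1)[1].strip()
        code = reply.split("<code>", 1)[1].split("</code>", 1)[0]
        transcript += "\nobservation: " + run_code(code)
    return None

print(interact(scripted_model, "What is the sum of 1..100?"))  # 5050
```

The execution feedback in the transcript is what lets the model correct itself across turns, which is the point of the code-interpreter mode.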
| Component | Description |
|---|---|
| Server (Single-turn) | opentinker/environment/math/math_server.py |
| Server (Multi-turn Tool Call) | opentinker/environment/math/code_interpreter_math_server.py |
| Client (Single-turn) | opentinker/client/math_client_unified.py |
| Client (Multi-turn Tool Call) | opentinker/client/math_code_interpreter_client.py |
| Data | Parquet files with math problems |
| Reward | Correctness of mathematical solutions |
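Since this environment is data-driven, a quick sanity check of the records before training can catch schema mistakes early. The field names below (`prompt`, `ground_truth`) are illustrative assumptions; the preprocessing script defines the real parquet schema, which you would read with a parquet library.

```python
# Minimal sanity check for math training records. Field names are
# illustrative assumptions, not OpenTinker's actual schema.
REQUIRED_FIELDS = {"prompt": str, "ground_truth": str}

def validate_record(record):
    """Return a list of problems with one record (empty list = valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append("missing field: " + field)
        elif not isinstance(record[field], expected):
            problems.append(field + " should be " + expected.__name__)
    return problems

print(validate_record({"prompt": "What is 2 + 2?", "ground_truth": "4"}))  # []
print(validate_record({"prompt": 42}))
# ['prompt should be str', 'missing field: ground_truth']
```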
Multi-turn game environment where the model plays Gomoku against an opponent. It serves as a key example of a data-free environment, where the model gets prompts directly from the simulated environment.
| Component | Description |
|---|---|
| Server | opentinker/environment/gomoku/gomoku_server.py |
| Data | Generated from simulated games |
| Reward | Win/loss/draw outcomes |
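A win/loss/draw reward for a Gomoku-style game reduces to detecting five in a row. The sketch below uses a simple board encoding (0 empty, 1/2 for the players) and a {+1, -1, 0} reward; both are assumptions for illustration, since the actual server defines its own representation and reward scale.

```python
# Toy Gomoku reward: scan the board for five in a row.
# Board encoding (0 empty, 1/2 players) is an illustrative assumption.
def has_five(board, player):
    n = len(board)
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, diagonals
    for r in range(n):
        for c in range(n):
            for dr, dc in directions:
                if all(
                    0 <= r + i * dr < n and 0 <= c + i * dc < n
                    and board[r + i * dr][c + i * dc] == player
                    for i in range(5)
                ):
                    return True
    return False

def reward(board, player):
    """+1 win, -1 loss, 0 otherwise (draw or game still in progress)."""
    opponent = 3 - player
    if has_five(board, player):
        return 1
    if has_five(board, opponent):
        return -1
    return 0

board = [[0] * 9 for _ in range(9)]
for i in range(5):            # player 1 places a horizontal row of five
    board[4][2 + i] = 1
print(reward(board, 1))  # 1
print(reward(board, 2))  # -1
```

Because the outcome is computed from the simulated game itself, no training dataset is needed, which is what "data-free environment" means above.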
- Scheduler & Dashboard Guide - Configuration, Usage, and Web Dashboard
```bibtex
@misc{opentinker2025,
  title        = {OpenTinker: Democratizing Agentic Reinforcement Learning as a Service},
  author       = {Siqi Zhu and Jiaxuan You},
  year         = {2025},
  howpublished = {\url{https://github.com/open-tinker/OpenTinker}},
  note         = {GitHub repository}
}
```