open-tinker/OpenTinker


Democratizing Agentic Reinforcement Learning as a Service

Project Page · W&B · DeepWiki · Slack · WeChat

🌟 Key Features

  1. Separation of Programming and Execution

    • Users can perform RL training and inference without local GPU resources.
    • Built-in Distributed Training and Job Scheduling manage resources transparently.
  2. Separation of Environment and Training Code

    • Simplifies the design of various agentic task environments.
    • Supports both single-turn and multi-turn agentic tasks.
  3. Seamless Transition from Training to Inference

    • Environments and agentic workflows connect seamlessly to inference, so trained models can be applied directly.

📦 Installation

🔹 Common Setup (Client and Server)

Clone the Repository

git clone --recurse-submodules https://github.com/open-tinker/OpenTinker.git
cd OpenTinker

Install OpenTinker

pip install -e .

Install verl (core package)

cd verl
pip install -e .
cd ..
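
To confirm that both editable installs are importable, a quick check like the following can help. It assumes the top-level package names match the repository layout (opentinker and verl); adjust if your installation differs.

# Sanity check: confirm the editable installs are importable.
# Assumes the top-level packages are named `opentinker` and `verl`.
import importlib

for pkg in ("opentinker", "verl"):
    module = importlib.import_module(pkg)
    print(pkg, "->", module.__file__)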

💻 Client Setup

After completing the Common Setup, no additional steps are needed.

Note
The client currently relies on a small subset of functions from verl. This dependency is transitional. In future releases, the client will be fully decoupled from verl, allowing it to remain completely lightweight and independent of training-related code.

🧠 Server Setup

In addition to the Common Setup, the server must install the verl dependencies.

You can choose one of the following two approaches.

Option 1: Docker Installation (Recommended)

# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

# Create and run container
docker run -dit \
  --gpus all \
  --restart=no \
  --entrypoint /bin/bash \
  --net=host \
  --shm-size=10g \
  --cap-add=SYS_ADMIN \
  -v .:/workspace/dev \
  --name tinker \
  verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

Option 2: Manual Installation

Alternatively, you can install the verl dependencies manually. After completing the Common Setup, run:

cd verl
pip install -r requirements.txt
cd ..

This installs all GPU and training-related dependencies required by the server.

⚠️ Warning
Manual installation may introduce version conflicts. For better stability and reproducibility, we recommend using the Docker-based setup whenever possible.

🚀 Quick Start

1. Get Your IP Address

hostname -I

Use the reported address for <server_endpoint> (the machine running the scheduler) and <client_endpoint> (the machine running the environment server) in the commands below.

2. Start the Scheduler (Server Side)

bash opentinker/scripts/launch_scheduler.sh --scheduler-port <scheduler_port>

3. Start the Environment Server (Client Side)

For Math Environment:

# single turn
python opentinker/environment/math/math_server.py --port <env_port>

# multi turn tool call
python opentinker/environment/math/math_tool_server.py --port <env_port>

For Gomoku Environment:

python opentinker/environment/gomoku/gomoku_server.py --port <env_port>
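
Before training, it can help to confirm that the scheduler and environment server ports are reachable from the client machine. The sketch below only checks raw TCP connectivity; replace the placeholder hosts and ports with the same <server_endpoint>, <client_endpoint>, <scheduler_port>, and <env_port> values used in the commands above.

# Minimal TCP reachability check for the scheduler and environment server.
# The hosts and ports below are placeholders; substitute your own values.
import socket

endpoints = {
    "scheduler": ("<server_endpoint>", 9000),    # <scheduler_port>
    "environment": ("<client_endpoint>", 8000),  # <env_port>
}

for name, (host, port) in endpoints.items():
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"{name}: {host}:{port} reachable")
    except OSError as err:
        print(f"{name}: {host}:{port} not reachable ({err})")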

4. Run Training/Inference (Client Side)

Math RL:

Generate data:

python opentinker/data_preprocess/math_multiturn_w_interaction.py \
    --local_save_dir=<local_save_dir>

# single turn
python opentinker/client/math_rl.py \
    tokenizer_path=Qwen/Qwen2.5-1.5B \
    batch_size=16 \
    val_batch_size=64 \
    num_epochs=5 \
    save_freq=1000 \
    test_freq=5 \
    data_path=data/math_agentloop/train.parquet \
    val_data_path=data/math_agentloop/test.parquet \
    scheduler_url=http://<server_endpoint>:<scheduler_port> \
    interaction.config.env_port=<env_port> \
    interaction.config.env_host=<client_endpoint>

# multi turn tool call
python opentinker/client/math_tool_rl.py \
    tokenizer_path=Qwen/Qwen2.5-1.5B \
    batch_size=16 \
    val_batch_size=64 \
    num_epochs=5 \
    save_freq=1000 \
    test_freq=5 \
    scheduler_url=http://<server_endpoint>:<scheduler_port> \
    interaction.config.env_port=<env_port> \
    interaction.config.env_host=<client_endpoint>
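
After the preprocessing step, it can be worth sanity-checking the generated Parquet files before launching a run. The snippet below assumes <local_save_dir> was set to data/math_agentloop, matching the data_path and val_data_path used in the single-turn command above; the exact column names depend on the preprocessing script, so it simply prints whatever schema is present.

# Inspect the preprocessed math data (requires pandas with a Parquet
# engine such as pyarrow); column names depend on the preprocessing script.
import pandas as pd

train = pd.read_parquet("data/math_agentloop/train.parquet")
test = pd.read_parquet("data/math_agentloop/test.parquet")

print("train rows:", len(train), "| test rows:", len(test))
print("columns:", list(train.columns))
print(train.head(2))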

Gomoku RL (Multi-turn):

python opentinker/client/gomoku_rl.py \
    tokenizer_path=Qwen/Qwen2.5-3B-Instruct \
    batch_size=16 \
    val_batch_size=32 \
    num_epochs=5 \
    save_freq=1000 \
    test_freq=5 \
    scheduler_url=http://<server_endpoint>:<scheduler_port> \
    interaction.config.env_port=<env_port> \
    interaction.config.env_host=<client_endpoint>

Math Inference:

# single turn
python opentinker/client/math_inference.py \
    model_path=<model_name> \
    data_path=data/math/test.parquet \
    output_path=./tmp/results.jsonl \
    max_samples=5 \
    env_endpoint=http://<client_endpoint>:<env_port> \
    scheduler_url=http://<server_endpoint>:<scheduler_port>

# multi turn tool call
python opentinker/client/math_tool_inference.py \
    model_path=<model_name> \
    data_path=data/math/test.parquet \
    output_path=./tmp/results.jsonl \
    max_samples=5 \
    env_endpoint=http://<client_endpoint>:<env_port> \
    scheduler_url=http://<server_endpoint>:<scheduler_port>

Gomoku Inference:

python opentinker/client/gomoku_inference.py \
    model_path=<model_name> \
    output_path=./tmp/results.jsonl \
    max_samples=5 \
    env_endpoint=http://<client_endpoint>:<env_port> \
    scheduler_url=http://<server_endpoint>:<scheduler_port>
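
The inference scripts write their results to the JSON Lines file given by output_path. A small loader like the one below can be used to inspect them; the per-record fields depend on the inference script, so it only reports whatever keys are present.

# Load inference results written as JSON Lines and show their structure.
import json

records = []
with open("./tmp/results.jsonl") as fh:
    for line in fh:
        line = line.strip()
        if line:
            records.append(json.loads(line))

print("num records:", len(records))
if records:
    print("fields:", sorted(records[0].keys()))
    print("first record:", records[0])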

🔐 Authentication

OpenTinker includes a built-in authentication system to secure access to the scheduler API.

Configuration

Edit opentinker/scheduler/config/scheduler.yaml:

enable_auth: true   # Set to true to enable authentication, false to disable authentication.
user_db_path: "scheduler_users.db"

Quick Registration

Run the interactive script to register a user and get an API key:

python opentinker/scheduler/register_user_example.py

For advanced usage (REST API registration, using the key) and detailed configuration, see the Scheduler & Dashboard Guide.
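
If you prefer to script registration instead of using the interactive helper, the Scheduler & Dashboard Guide documents the REST API. The sketch below is purely illustrative: the endpoint paths, payload fields, and header format are assumptions, not the actual OpenTinker routes, so consult the guide for the real interface.

# Illustrative only: the /register and /jobs endpoints, payload fields,
# and Authorization header below are assumptions, not OpenTinker's
# documented API; see the Scheduler & Dashboard Guide for the real routes.
import requests

scheduler_url = "http://<server_endpoint>:<scheduler_port>"

# Hypothetical registration call that returns an API key.
resp = requests.post(f"{scheduler_url}/register", json={"username": "alice"})
api_key = resp.json().get("api_key")

# Hypothetical authenticated request using the returned key.
status = requests.get(
    f"{scheduler_url}/jobs",
    headers={"Authorization": f"Bearer {api_key}"},
).status_code
print(status)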

🎮 Environments

Math Environment (Data-driven)

Single-turn math reasoning environment where the model solves mathematical problems. It serves as a key example of a data-driven environment, loading data from parquet files.

We also support multi-turn tool call mode (Code Interpreter), where the model can iteratively generate and execute Python code to solve math problems. This enables more complex reasoning through code execution feedback.

| Component                     | Description                                                  |
| ----------------------------- | ------------------------------------------------------------ |
| Server (Single-turn)          | opentinker/environment/math/math_server.py                   |
| Server (Multi-turn Tool Call) | opentinker/environment/math/code_interpreter_math_server.py  |
| Client (Single-turn)          | opentinker/client/math_client_unified.py                     |
| Client (Multi-turn Tool Call) | opentinker/client/math_code_interpreter_client.py            |
| Data                          | Parquet files with math problems                             |
| Reward                        | Correctness of mathematical solutions                        |
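
The reward is based on whether the model's final answer matches the ground truth. As a rough illustration only (not the repository's actual reward code), a correctness reward for math problems typically looks like:

# Illustrative correctness reward for a math environment; a simplified
# sketch, not the reward implementation used in this repository.
import re

def extract_answer(text):
    # Take the last \boxed{...} expression as the final answer, if any.
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

def math_reward(response, ground_truth):
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == ground_truth.strip() else 0.0

print(math_reward(r"The result is \boxed{42}.", "42"))  # 1.0
print(math_reward("I am not sure.", "42"))              # 0.0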

Gomoku Environment (Data-free)

Multi-turn game environment where the model plays Gomoku against an opponent. It serves as a key example of a data-free environment, where the model gets prompts directly from the simulated environment.

| Component | Description                                    |
| --------- | ---------------------------------------------- |
| Server    | opentinker/environment/gomoku/gomoku_server.py |
| Data      | Generated from simulated games                 |
| Reward    | Win/loss/draw outcomes                         |
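
Because the environment is data-free, prompts come from the game state rather than a dataset, and the reward is derived from the game outcome. Purely as an illustration (not the repository's implementation), the outcome-to-reward mapping could be as simple as:

# Illustrative outcome-to-reward mapping for a data-free game environment;
# a simplified sketch, not the repository's implementation.
def gomoku_reward(outcome):
    return {"win": 1.0, "draw": 0.0, "loss": -1.0}[outcome]

print(gomoku_reward("win"), gomoku_reward("draw"), gomoku_reward("loss"))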

📚 Documentation

📖 Citation

@misc{opentinker2025,
  title        = {OpenTinker: Democratizing Agentic Reinforcement Learning as a Service},
  author       = {Siqi Zhu and Jiaxuan You},
  year         = {2025},
  howpublished = {\url{https://github.com/open-tinker/OpenTinker}},
  note         = {GitHub repository}
}

About

OpenTinker is an RL-as-a-Service infrastructure for foundation models
