Democratizing Agentic Reinforcement Learning as a Service

Project Page · DeepWiki · Slack · Wechat

🚀 Quick Start

Choose an example below to get started. Each example includes step-by-step instructions for setup, training, and inference.

Task	Description	Performance
LLM Single-Turn Math	Mathematical problem solving	wandb
LLM Multi-Turn Math	Multi-turn mathematical problem solving with tool calling	wandb
LLM Single-LoRA Single-Turn Math	Single-turn mathematical problem solving, trained with LoRA	wandb
VLM Single-Turn Math	Geometry3K math problem solving	wandb
VLM Multi-Turn Math	Geometry3K math problem solving with tool calling	wandb
LLM Gomoku Agent	A multi-turn Gomoku agent	wandb
LLM AlfWorld Agent	A multi-turn AlfWorld agent	TDA

📦 Installation

🔹 Common Setup (Client and Server)

Clone the Repository

git clone --recurse-submodules https://github.com/open-tinker/OpenTinker.git
cd OpenTinker

Install OpenTinker

pip install -e .

Install verl (core package)

cd verl
pip install -e .
cd ..
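To confirm that both editable installs succeeded, a quick import check can help. The snippet below is a minimal sketch, not part of OpenTinker. It assumes the import names are `opentinker` and `verl`, inferred from the repository's directory names, and the helper `check_installed` is hypothetical.

```python
import importlib.util


def check_installed(modules):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Assumed import names, taken from the repository directory names.
    for name, found in check_installed(["opentinker", "verl"]).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

If either package is reported as missing, re-run the corresponding `pip install -e .` step from the directory shown above.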

💻 Client Setup

After completing the Common Setup, no additional steps are needed.

Note: The client currently relies on a small subset of functions from verl. This dependency is transitional. In future releases, the client will be fully decoupled from verl, allowing it to remain completely lightweight and independent of training-related code.

🧠 Server Setup

In addition to the Common Setup, the server requires the verl dependencies.

Choose one of the following two approaches.

Option 1: Docker Installation (Recommended)

# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

# Create and run container
docker run -dit \
  --gpus all \
  --restart=no \
  --entrypoint /bin/bash \
  --net=host \
  --shm-size=10g \
  --cap-add=SYS_ADMIN \
  -v .:/workspace/dev \
  --name tinker \
  verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

Option 2: Manual Installation

Alternatively, install the verl dependencies manually. After completing the Common Setup, run:

cd verl
pip install -r requirements.txt
cd ..

This installs all GPU and training-related dependencies required by the server.

⚠️ Warning: Manual installation may introduce version conflicts. For better stability and reproducibility, we recommend using the Docker-based setup whenever possible.

🔐 Authentication

OpenTinker includes a built-in authentication system to secure access to the scheduler API.

Configuration

Edit opentinker/scheduler/config/scheduler.yaml:

enable_auth: true # Set to true to enable authentication, false to disable authentication.
user_db_path: "scheduler_users.db"
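As a sanity check before starting the scheduler, the two settings above can be read and validated with a few lines of Python. This is a minimal sketch that uses only the standard library. `parse_flat_yaml` and `validate_auth_config` are hypothetical helpers, not OpenTinker's own loader, and the rule that `user_db_path` must be set when authentication is enabled is an assumption made for illustration.

```python
def parse_flat_yaml(text):
    """Parse flat `key: value` lines (booleans and strings only).

    Limitation: a '#' inside a quoted value would be treated as a comment.
    """
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"
        else:
            config[key] = value.strip("\"'")
    return config


def validate_auth_config(config):
    """Check the two authentication settings (assumed rules, for illustration)."""
    if not isinstance(config.get("enable_auth"), bool):
        raise ValueError("enable_auth must be true or false")
    if config["enable_auth"] and not config.get("user_db_path"):
        raise ValueError("user_db_path is required when enable_auth is true")
    return config


SAMPLE = """enable_auth: true # Set to true to enable authentication
user_db_path: "scheduler_users.db"
"""

config = validate_auth_config(parse_flat_yaml(SAMPLE))
print(config)
```

For a real configuration file with nested keys, a full YAML parser would be the better choice. The hand-written parser here only keeps the sketch dependency-free.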

Quick Registration

Run the interactive script to register a user and get an API key:

python opentinker/scheduler/register_user_example.py

For advanced usage (REST API registration, using the key) and detailed configuration, see the Scheduler & Dashboard Guide.

🎮 Environments

OpenTinker provides a flexible environment design framework that supports diverse training scenarios. Our architecture accommodates two orthogonal dimensions:

Data Source: Data-Dependent environments load structured datasets (e.g., parquet files) to provide prompts, while Data-Free environments generate prompts dynamically from simulators or game engines.
Interaction Mode: Single-Turn environments involve one-shot model responses, while Multi-Turn environments enable iterative interactions with tool calls and feedback loops.

This 2×2 design space enables four distinct paradigms, each suited to different learning objectives:

Paradigm	Data Source	Interaction	Example Use Case
Data-Dependent × Single-Turn	Dataset	One-shot	Math reasoning, QA tasks
Data-Dependent × Multi-Turn	Dataset	Iterative	Tool-assisted problem solving
Data-Free × Single-Turn	Simulator	One-shot	Bandit
Data-Free × Multi-Turn	Simulator	Iterative	Complex game playing, dialogue agents
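The 2×2 design space in the table can be written down as two enums and a lookup keyed by their combination. This is an illustrative sketch only. The names `DataSource`, `InteractionMode`, `PARADIGMS`, and `example_use_case` are hypothetical and do not correspond to OpenTinker classes.

```python
from enum import Enum
from itertools import product


class DataSource(Enum):
    DATA_DEPENDENT = "Dataset"    # prompts loaded from structured datasets
    DATA_FREE = "Simulator"       # prompts generated by a simulator or game engine


class InteractionMode(Enum):
    SINGLE_TURN = "One-shot"      # one model response per episode
    MULTI_TURN = "Iterative"      # repeated interaction with tools and feedback


# Example use cases copied from the table above, keyed by (data source, interaction).
PARADIGMS = {
    (DataSource.DATA_DEPENDENT, InteractionMode.SINGLE_TURN): "Math reasoning, QA tasks",
    (DataSource.DATA_DEPENDENT, InteractionMode.MULTI_TURN): "Tool-assisted problem solving",
    (DataSource.DATA_FREE, InteractionMode.SINGLE_TURN): "Bandit",
    (DataSource.DATA_FREE, InteractionMode.MULTI_TURN): "Complex game playing, dialogue agents",
}


def example_use_case(data_source, interaction):
    """Return the example use case for one cell of the 2x2 design space."""
    return PARADIGMS[(data_source, interaction)]


if __name__ == "__main__":
    # Enumerate all four combinations of the two orthogonal dimensions.
    for source, mode in product(DataSource, InteractionMode):
        print(f"{source.name} x {mode.name}: {example_use_case(source, mode)}")
```

Because the two dimensions are independent, every combination of data source and interaction mode is a valid cell, which is why the lookup has exactly four entries.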

📚 Documentation

Scheduler & Dashboard Guide - Configuration, Usage, and Web Dashboard

📖 Citation

@misc{opentinker2025,
  title        = {OpenTinker: Democratizing Agentic Reinforcement Learning as a Service},
  author       = {Siqi Zhu and Jiaxuan You},
  year         = {2025},
  howpublished = {\url{https://github.com/open-tinker/OpenTinker}},
  note         = {GitHub repository}
}
