This repository presents a reinforcement learning (RL) recipe for improving end-to-end agentic capabilities, inspired by the perspective in *The Second Half* that realistic progress now hinges on defining problems and evaluations.
We provide a full τ-bench retail (τ-retail) environment to train and evaluate agents on policy-following, strategic clarification, and multi-hop tool use in realistic, rule-bound workflows.
Despite the controlled setup, initial results highlight that it remains challenging for an RL-based agent to:
- Strategically query the user for missing or clarifying information
- Perform effective multi-hop tool calls to achieve the intended outcome (a simplified rollout of this kind is sketched below)
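To make the target behavior concrete, below is a minimal sketch of the kind of multi-turn rollout τ-retail exercises: the agent first asks a clarifying question, then chains tool calls before acting. Everything here is illustrative only; the tool functions, message schema, and toy database are hypothetical stand-ins, not the actual τ-bench or verl APIs.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for τ-retail tools; names and the toy database are illustrative only.
FAKE_DB = {"orders": {"#W123": {"status": "delivered", "items": ["USB-C cable"]}}}

def find_user_id_by_email(email: str) -> str:
    """Toy tool: resolve a user id from an email address."""
    return "user_001" if "@" in email else ""

def get_order_details(order_id: str) -> dict:
    """Toy tool: look up an order in the fake database."""
    return FAKE_DB["orders"].get(order_id, {})

@dataclass
class Turn:
    role: str      # "agent", "user", or "tool"
    content: str

@dataclass
class Rollout:
    turns: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append(Turn(role, content))

def run_episode() -> Rollout:
    """One episode: clarify with the user first, then chain two tool calls before acting."""
    rollout = Rollout()

    # The task is under-specified, so the agent spends a turn on a clarifying question.
    rollout.add("agent", "Could you share the email on the account and the order id?")
    rollout.add("user", "alex@example.com, order #W123")

    # Hop 1: resolve the user id from the email.
    user_id = find_user_id_by_email("alex@example.com")
    rollout.add("tool", f"find_user_id_by_email -> {user_id}")

    # Hop 2: fetch the order before acting on it.
    order = get_order_details("#W123")
    rollout.add("tool", f"get_order_details -> {order}")

    rollout.add("agent", f"Order #W123 is {order['status']}; I'll confirm the exchange next.")
    return rollout

if __name__ == "__main__":
    for turn in run_episode().turns:
        print(f"[{turn.role}] {turn.content}")
```

Since the reward in this setup reflects end-of-episode task success (pass^1 below), the policy has to learn on its own when a clarifying question is worth a turn and which tool to call next.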
Key takeaway: the RL-tuned model improves the τ-retail score from 0.478 to 0.496 (+0.018 absolute, +3.8% relative) in the non-thinking setting.
Note: Reported rewards correspond to pass^1.
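For context, pass^k (as defined in the τ-bench paper) is the probability that all k i.i.d. trials on a task succeed, averaged over tasks, so pass^1 reduces to the plain per-task success rate. A minimal sketch of the computation, assuming per-task lists of boolean trial outcomes (this input format is an assumption, not the repo's actual logging format):

```python
from math import comb

def pass_at_k(trial_outcomes: list[list[bool]], k: int) -> float:
    """pass^k over tasks: P(all k sampled trials succeed), averaged across tasks.

    `trial_outcomes[i]` holds the boolean results of n i.i.d. trials for task i.
    """
    per_task = []
    for outcomes in trial_outcomes:
        n, c = len(outcomes), sum(outcomes)
        per_task.append(comb(c, k) / comb(n, k) if c >= k else 0.0)
    return sum(per_task) / len(per_task)

# With a single trial per task, pass^1 is just the mean success rate.
outcomes = [[True], [False], [True], [True]]
print(pass_at_k(outcomes, k=1))  # 0.75
```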
The training dataset contains 500 samples, and the test dataset contains 115 samples.
| Strategy | Pass^1 |
|---|---|
| TC (claude-3-5-sonnet-20241022) | TBD |
| TC (gpt-4o) | TBD |
| Baselines | TBD |
*TC = tool-calling strategy (as described in the τ-bench paper)
Refer to the VERL installation guide for detailed setup instructions.
Preprocess the τ-retail dataset:

```bash
python -m examples.data_preprocess.tau_retail.preprocess_tau_retail_dataset
```

You should see output similar to:

```
train dataset len : 500
test dataset len : 115
```

Hardware requirement: At least H100 × 8 GPUs are recommended to reproduce the results.
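Before launching training, you can sanity-check the preprocessed splits with pandas. The parquet filenames and directory below are assumptions based on verl's typical data layout; point them at whatever the preprocessing script actually writes:

```python
import pandas as pd

# Assumed output location; adjust to where preprocess_tau_retail_dataset wrote its files.
train = pd.read_parquet("data/tau_retail/train.parquet")
test = pd.read_parquet("data/tau_retail/test.parquet")

print(len(train), len(test))   # expect 500 and 115
print(train.columns.tolist())  # fields the trainer will consume (prompt, tools, reward spec, ...)
print(train.iloc[0])           # peek at a single sample
```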
Set your OpenAI API key and launch training in the background:

```bash
export OPENAI_API_KEY=<YOUR-API-KEY>
nohup bash examples/sglang_multiturn/run_tau_retail_multiturn.sh > train.log 2>&1 &
```

If you use this repository in your research or work, please cite it as follows:
```bibtex
@misc{tau_retail_rl,
  title  = {Tau-Retail End-to-End RL Experiment},
  author = {Shin, Seungyoun},
  year   = {2025},
  url    = {https://github.com/SeungyounShin/tau-retail-rl}
}
```