
joykirat18/TRAAC


Paper | License: MIT

Joykirat Singh | Justin Chih-Yao Chen | Archiki Prasad | Elias Stengel-Eskin | Akshay Nambi | Mohit Bansal


Overview

This repository contains the implementation of TRAAC (Think Right with Adaptive, Attentive Compression), an online post-training RL method that leverages the model’s self-attention over a long reasoning trajectory to identify important steps and prune redundant ones.

Overview of TRAAC: given a problem, the model first generates $N$ rollouts, and the pass rate of these rollouts is used to estimate the problem's difficulty (easy, medium, or hard). Next, the generated reasoning is fed back into the model, which is asked to compute the attention score of each reasoning token from </think>. During this attention-based compression step, we remove steps with lower scores. The degree of removal is determined by the estimated difficulty: easier problems undergo more aggressive compression. Finally, we compute the correctness and length rewards using the compressed reasoning trajectory, and these rewards are used to update the policy.
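The pipeline above can be sketched in simplified form. The difficulty thresholds and keep-fractions below are illustrative assumptions, not the paper's tuned values, and attention scores are taken as given rather than computed from the model:

```python
def estimate_difficulty(pass_rate):
    """Bucket a problem by the pass rate of its N rollouts (thresholds are assumptions)."""
    if pass_rate >= 0.7:
        return "easy"
    if pass_rate >= 0.3:
        return "medium"
    return "hard"

# Fraction of reasoning steps to KEEP per difficulty bucket (illustrative values):
# easier problems are compressed more aggressively, as described above.
KEEP_FRACTION = {"easy": 0.4, "medium": 0.6, "hard": 0.8}

def compress_reasoning(steps, attention_scores, pass_rate):
    """Keep the steps that receive the highest attention from </think>,
    dropping more of them the easier the problem is estimated to be."""
    difficulty = estimate_difficulty(pass_rate)
    n_keep = max(1, round(KEEP_FRACTION[difficulty] * len(steps)))
    # Rank step indices by attention score, keep the top-n, restore original order.
    ranked = sorted(range(len(steps)), key=lambda i: attention_scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [steps[i] for i in kept]
```

For example, with five steps and a high pass rate the problem is bucketed as easy and only the two most-attended steps survive; the same trajectory on a hard problem keeps four of the five steps.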

Installation

Please make sure that you have torch compiled with CUDA enabled. The repository was developed with Python 3.12.3; please ensure that python invokes Python 3.12.3. The codebase is built on top of verl.

Create a virtual environment and install verl and the other packages from requirements.txt:

python -m venv traac_venv
source traac_venv/bin/activate
pip install -e .[vllm]
pip install -r requirements.txt

Download Models

Download our trained adaptive-reasoning models directly from Hugging Face:

| Model | Download Link |
| --- | --- |
| (TRAAC) DeepSeek-R1-Distill-Qwen-7B | Hugging Face |
| (TRAAC) Qwen3-4B | Hugging Face |

Training Data & Evaluation

All training and evaluation scripts are located in the scripts/data folder.
Make sure you cd into this folder before running the commands:

cd scripts/data
| Purpose | Script |
| --- | --- |
| Training Data Generation | dapo-17k.py |
| Evaluation – AIME, AMC, GPQA | full_test_dataset.py |
| Evaluation – Overthinking Benchmark | overthink_test_dataset.py |
| Evaluation – Underthinking Benchmark | underthink_test_dataset.py |
| Evaluation – BBEH Benchmark | bbeh_test_dataset.py |

These will create appropriate parquet files in the scripts/data folder.

Run Evaluations

To run your custom evaluation on any of the above datasets, use the bash scripts provided in the scripts/eval folder. Change the data.val_files field inside the verl configuration to point at the required dataset:

./eval_deepseek-qwen-with-summary-linear-reward-attention.sh
./eval_qwen3-4b-with-summary-linear-reward-attention.sh

Train Models

Training was conducted on 3 GPUs:

  • 1 GPU was dedicated to hosting the policy model for calculating attention scores (attention-based compression).
  • 2 GPUs were used to train the main model.

Step 1: Host Attention-Based Compression (Qwen3-4B)

The host model script is located at scripts/train:

CUDA_VISIBLE_DEVICES=0 python model_host_qwen.py

This will host the Qwen3-4B model at http://localhost:8008.
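A minimal client sketch for querying such a host follows. The endpoint path and JSON field names ("/attention", "text", "scores") are assumptions for illustration; the actual interface is defined in model_host_qwen.py:

```python
import json
from urllib import request

def build_request(reasoning_text, host="http://localhost:8008",
                  endpoint="/attention"):  # endpoint path is an assumption
    """Build a POST request carrying a reasoning trajectory as JSON."""
    payload = json.dumps({"text": reasoning_text}).encode("utf-8")
    return request.Request(host + endpoint, data=payload,
                           headers={"Content-Type": "application/json"})

def get_attention_scores(reasoning_text):
    """Send the trajectory to the hosted Qwen3-4B model and return the
    per-token attention scores (response field name is illustrative)."""
    with request.urlopen(build_request(reasoning_text)) as resp:
        return json.loads(resp.read())["scores"]
```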

Step 2: Training TRAAC (Qwen3-4B)

source run_qwen3-4b-with-summary-linear-reward-attention.sh

The file vllm_rollout_spmd.py contains the implementation for adaptive, attentive summarization, which is used during training.

Citation

If you find this work useful, please consider citing us:

@misc{singh2025thinkrightlearningmitigate,
      title={Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression}, 
      author={Joykirat Singh and Justin Chih-Yao Chen and Archiki Prasad and Elias Stengel-Eskin and Akshay Nambi and Mohit Bansal},
      year={2025},
      eprint={2510.01581},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.01581}, 
}
