Joykirat Singh | Justin Chih-Yao Chen | Archiki Prasad | Elias Stengel-Eskin | Akshay Nambi | Mohit Bansal
- Overview
- Installation
- Download Models
- Training Data & Evaluation
- Run Evaluations
- Train Models
- Citation
This repository contains the implementation of TRAAC (Think Right with Adaptive, Attentive Compression), an online post-training RL method that leverages the model’s self-attention over a long reasoning trajectory to identify important steps and prune redundant ones.
Given a problem, the model first generates a full reasoning trajectory terminated by `</think>`. During the attention-based compression step, we remove steps with lower attention scores. The degree of removal is determined by the estimated difficulty: easier problems undergo more aggressive compression. Finally, we compute correctness and length rewards on the compressed reasoning trajectory, and these rewards are used to update the policy.
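As a rough illustration, the difficulty-adaptive pruning described above can be sketched as follows. This is a simplified mock-up, not the repository's implementation; the keep-ratio schedule and the way scores are pooled are invented here for illustration only:

```python
def compress_trajectory(steps, attn_scores, difficulty):
    """Keep the highest-scoring reasoning steps, pruning more aggressively
    when the problem is estimated to be easy.

    steps: list of reasoning-step strings.
    attn_scores: one importance score per step (e.g. pooled self-attention).
    difficulty: estimated difficulty in [0, 1]; lower -> prune more.
    """
    assert len(steps) == len(attn_scores)
    # Easier problems (low difficulty) keep fewer steps. The 0.3 floor and
    # the linear schedule are illustrative choices, not the paper's.
    keep_ratio = 0.3 + 0.7 * difficulty
    k = max(1, round(keep_ratio * len(steps)))
    # Indices of the k highest-scoring steps, restored to original order.
    top = sorted(sorted(range(len(steps)),
                        key=lambda i: attn_scores[i], reverse=True)[:k])
    return [steps[i] for i in top]
```

With `difficulty=0` (easiest), only the single most-attended step survives; with `difficulty=1`, the trajectory is kept intact.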
Please make sure that you have torch compiled with CUDA enabled. The repository was developed with Python 3.12.3; please ensure `python` invokes Python 3.12.3. The codebase is built on top of verl.
Create a virtual environment and install verl and the other packages from requirements.txt:

    python -m venv traac_venv
    source traac_venv/bin/activate
    pip install -e .[vllm]
    pip install -r requirements.txt

Download our trained adaptive reasoning models directly from Hugging Face:
| Model | Download Link |
|---|---|
| (TRAAC) DeepSeek-R1-Distill-Qwen-7B | |
| (TRAAC) Qwen3-4B | |
All training and evaluation scripts are located in the scripts/data folder.
Make sure you `cd` into this folder before running the commands:

    cd scripts/data

| Purpose | Script |
|---|---|
| Training Data Generation | dapo-17k.py |
| Evaluation – AIME, AMC, GPQA | full_test_dataset.py |
| Evaluation – Overthinking Benchmark | overthink_test_dataset.py |
| Evaluation – Underthinking Benchmark | underthink_test_dataset.py |
| Evaluation – BBEH Benchmark | bbeh_test_dataset.py |
These will create appropriate parquet files in the scripts/data folder.
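For orientation, a row in these parquet files typically follows verl's dataset conventions. The field names below are an assumption based on verl's common RLHFDataset schema, not confirmed against this repository; check the generation scripts for the exact layout:

```python
# Illustrative example of a row in a verl-style training/evaluation parquet.
# Field names follow verl's common convention and are an assumption here.
row = {
    "data_source": "dapo-17k",  # dataset identifier (illustrative)
    "prompt": [  # chat-format prompt consumed by the rollout engine
        {"role": "user", "content": "Solve: 2 + 2 = ?"}
    ],
    "reward_model": {  # used by rule-based correctness scoring
        "style": "rule",
        "ground_truth": "4",
    },
    "extra_info": {"split": "train", "index": 0},
}
```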
To run your own evaluation on any of the above datasets, use the bash scripts provided in the scripts/eval folder.
Change the data.val_files field inside the verl configuration to point to the required dataset:
    ./eval_deepseek-qwen-with-summary-linear-reward-attention.sh
    ./eval_qwen3-4b-with-summary-linear-reward-attention.sh

Training was conducted on 3 GPUs:
- 1 GPU was dedicated to hosting the policy model for calculating attention scores (attention-based compression).
- 2 GPUs were used to train the main model.
The host model script is located at scripts/train:
    CUDA_VISIBLE_DEVICES=0 python model_host_qwen.py

This will host the Qwen3-4B model at http://localhost:8008.
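During training, the rollout side queries this host for attention scores. A minimal client sketch is shown below; the endpoint path and payload keys are hypothetical illustrations, not the repository's actual API (see vllm_rollout_spmd.py for the real protocol):

```python
import json
import urllib.request

def build_score_request(reasoning_steps,
                        host="http://localhost:8008",
                        endpoint="/score"):
    """Build a request asking the hosted policy model for per-step
    attention scores. The endpoint name and payload keys here are
    hypothetical, used only to illustrate the host/trainer split."""
    payload = {"steps": reasoning_steps}
    return urllib.request.Request(
        host + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_score_request(["Step 1: expand the product.",
                               "Step 2: simplify."])
    print(req.full_url)  # http://localhost:8008/score
```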
    source run_qwen3-4b-with-summary-linear-reward-attention.sh

The file vllm_rollout_spmd.py contains the implementation of adaptive, attentive summarization used during training.
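The script names above reference a linear length reward. As a hedged illustration of what combining correctness with a linearly decaying length reward can look like (the exact form and weights used by TRAAC are not reproduced here; this is an assumed sketch):

```python
def linear_length_reward(correct, num_tokens, max_tokens, length_weight=0.5):
    """Illustrative reward shaping: correctness plus a reward that decays
    linearly with trajectory length. The weight and functional form are
    assumptions for illustration, not the paper's exact formulation."""
    r_correct = 1.0 if correct else 0.0
    # 1.0 for an empty trajectory, 0.0 at (or beyond) the length budget.
    r_length = 1.0 - min(num_tokens, max_tokens) / max_tokens
    return r_correct + length_weight * r_length
```

Under this sketch, a correct short answer scores highest, while a correct answer that exhausts the length budget earns only the correctness term.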
If you find this work useful, please consider citing us:
    @misc{singh2025thinkrightlearningmitigate,
          title={Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression},
          author={Joykirat Singh and Justin Chih-Yao Chen and Archiki Prasad and Elias Stengel-Eskin and Akshay Nambi and Mohit Bansal},
          year={2025},
          eprint={2510.01581},
          archivePrefix={arXiv},
          primaryClass={cs.LG},
          url={https://arxiv.org/abs/2510.01581},
    }