🌴 ARES is an open-source framework for adaptive multimodal reasoning, featuring a two-stage pipeline—Adaptive Cold-Start and Entropy-Shaped Policy Optimization—to balance reasoning depth and efficiency.

shawn0728/ARES

ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping

License: MIT

📖 Introduction

This paper proposes ARES, an open-source framework for adaptive multimodal reasoning that dynamically allocates the model’s reasoning effort based on the difficulty of the input problem. The authors observe a key imbalance in existing multimodal reasoning models: on easy tasks they tend to overthink (producing redundantly long reasoning traces), whereas on hard tasks they under-explore (missing solutions due to insufficient search). To correct this, ARES introduces a mechanism based on High Window-Entropy (HWE) tokens (i.e., token-level entropies averaged over a sliding window) that detects moments of reasoning uncertainty and flexibly adapts the exploration intensity.
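
As a rough illustration of the HWE idea described above, the sketch below computes per-token Shannon entropy, averages it over a sliding window, and flags windows whose mean exceeds a threshold. The function names, the window size, and the threshold are assumptions chosen for illustration; they are not taken from the ARES code or paper.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def window_entropy(entropies: Sequence[float], window: int) -> List[float]:
    """Mean entropy over each full sliding window of length `window`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(entropies[i:i + window]) / window
        for i in range(len(entropies) - window + 1)
    ]


def high_entropy_windows(entropies: Sequence[float],
                         window: int = 4,          # assumed value
                         threshold: float = 1.0    # assumed value
                         ) -> List[int]:
    """Start indices of windows whose mean entropy exceeds `threshold`."""
    means = window_entropy(entropies, window)
    return [i for i, h in enumerate(means) if h > threshold]


# Per-token entropies for a toy response: uncertainty is sustained in the middle.
entropies = [0.1, 0.2, 1.5, 1.6, 1.4, 1.7, 0.2, 0.1]
print(high_entropy_windows(entropies))  # [1, 2, 3]
```

Averaging over a window means a single high-entropy token does not trigger anything by itself; only a run of uncertain tokens does.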

ARES is trained with a two-stage pipeline:

  1. Adaptive Cold-Start Stage: construct multimodal and textual reasoning examples with trace lengths scaled to task difficulty, so the model learns a notion of difficulty awareness.

  2. Adaptive Entropy Policy Optimization (AEPO): use HWE tokens as triggers to decide when to explore further, combined with a hierarchical entropy reward and dynamic KL control to decide how much to explore.

Empirical results show that ARES achieves better trade-offs between reasoning efficiency and accuracy, outperforming baselines across multimodal, mathematical, and logical benchmarks while incurring lower inference costs and narrowing the gap to commercial systems.

This work highlights that adaptively modulating exploration behavior at the token level (rather than following a fixed strategy) is essential for balancing reasoning depth and computational cost under varying task difficulties.

🧮 Results

Extensive experiments demonstrate that ARES achieves superior performance and reasoning efficiency across diverse mathematical, logical, and multimodal benchmarks, while closing the gap to leading commercial systems under significantly lower inference costs.


🍭 Pipeline

Overall training pipeline of our method ARES. Stage 1 (Adaptive Coldstart Fine-Tuning): difficulty-aware selective data curation and adaptive KL-guided fine-tuning establish a strong initialization across text and multimodal inputs. Stage 2 (Adaptive Entropy Policy Optimization, AEPO): online difficulty bucketing and entropy-aware rollout allocate reasoning depth dynamically, with high-entropy windows serving as branching points for exploration. Together, the two stages enable uncertainty-aware, difficulty-adaptive reasoning for large language models.
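
To make "online difficulty bucketing" concrete, here is a minimal sketch that assigns a prompt to a bucket from the fraction of its rollouts that were correct. This is an assumption about how difficulty could be estimated, not the ARES implementation; the function name and both thresholds are hypothetical.

```python
from typing import Sequence


def difficulty_bucket(rollout_correct: Sequence[bool],
                      easy_min: float = 0.75,   # hypothetical threshold
                      hard_max: float = 0.25    # hypothetical threshold
                      ) -> str:
    """Map a prompt's rollout pass rate to 'easy', 'medium', or 'hard'."""
    if not rollout_correct:
        raise ValueError("need at least one rollout")
    pass_rate = sum(rollout_correct) / len(rollout_correct)
    if pass_rate >= easy_min:
        return "easy"
    if pass_rate <= hard_max:
        return "hard"
    return "medium"


print(difficulty_bucket([True, True, True, True]))     # easy
print(difficulty_bucket([True, True, False, False]))   # medium
print(difficulty_bucket([False, False, False, True]))  # hard
```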


✨ Models

| Model | Hugging Face | Base Model |
| --- | --- | --- |
| ARES-Coldstart | https://huggingface.co/datasets/ares0728/ARES-Adaptive-Coldstart | Qwen2.5-VL-7B-Instruct |
| ARES-RL | https://huggingface.co/ares0728/ARES-RL-7B | Qwen2.5-VL-7B-Instruct |

🔮 Datasets

The dataset construction of ARES revolves around a core concept:

"The reasoning length of the model should match the difficulty of the task."

To this end, ARES does not directly use a common hybrid multimodal corpus. Instead, it constructs a difficulty-aware reasoning corpus designed to teach the model, during the cold-start stage, to distinguish "easy questions" from "difficult questions" and to apply different reasoning lengths and exploration intensities to each.

| Dataset | Hugging Face | Size |
| --- | --- | --- |
| ARES-hard-validation | https://huggingface.co/datasets/ares0728/ARES-hard-validation | 2.46K |
| ARES-Adaptive-SFT | https://huggingface.co/datasets/ares0728/ARES-Adaptive-Coldstart | 223K |

The training corpus of ARES-Adaptive-223K comprises two components:

  • Textual reasoning data — drawn from high-quality, reasoning-intensive datasets used to develop symbolic reasoning and reflection capabilities.
  • Multimodal reasoning data — collected from visual mathematics, logical reasoning, and chart-understanding datasets to enhance cross-modal reasoning consistency.

To ensure coherence across sources, all reasoning traces undergo chain-of-thought (CoT) normalization, standardizing them into a unified “think → derive → conclude” format.

We further use Gemini 2.5-Pro with a pass@3 evaluation to filter out samples that the model fails on in all three attempts across various visual benchmarks, resulting in a curated hard-validation set containing 2.46K challenging examples.
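
The pass@3 check can be sketched as a simple partition: a sample counts as solved if at least one of k attempts is correct, and as unsolved if all k fail. The `attempt` callable below is a hypothetical stand-in for querying a model and grading its answer; the sketch only shows the partitioning logic and makes no claim about how ARES uses each partition.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_by_pass_at_k(samples: List[T],
                       attempt: Callable[[T, int], bool],
                       k: int = 3) -> Tuple[List[T], List[T]]:
    """Partition samples into (solved at least once, failed all k attempts)."""
    solved: List[T] = []
    unsolved: List[T] = []
    for sample in samples:
        # any() stops at the first successful attempt.
        if any(attempt(sample, i) for i in range(k)):
            solved.append(sample)
        else:
            unsolved.append(sample)
    return solved, unsolved


# Toy grader (hypothetical): "q2" is never answered correctly,
# "q1" succeeds only on its third attempt.
def toy_attempt(sample: str, attempt_index: int) -> bool:
    return sample == "q1" and attempt_index == 2


print(split_by_pass_at_k(["q1", "q2"], toy_attempt))  # (['q1'], ['q2'])
```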


🛠️ Installation

conda create -n aepo python=3.11 -y
conda activate aepo
pip install -r requirements.txt

🚀 Training

Staged RL: AEPO

# example script to prepare rewards / launch AEPO
bash ./experiments/AEPO/train.sh

Key ideas.

  • HWE trigger: branch only in sustained-uncertainty regions.
  • Difficulty-aware shaping: suppress over-exploration on easy, encourage deeper exploration on hard, stabilize around a batch target on medium.
  • Dynamic KL: token-wise KL budget that relaxes inside validated HWE windows.
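
The last two ideas can be sketched as follows. This is an illustrative reading of the bullets above under stated assumptions, not the AEPO implementation: the reward form, the `scale` factor, and the KL coefficients `base` and `relaxed` are all hypothetical.

```python
from typing import List, Set


def exploration_reward(n_hwe: int, bucket: str, batch_target: float,
                       scale: float = 0.1) -> float:
    """Hypothetical shaping term based on the number of HWE windows in a response.

    easy   -> penalize every HWE window (suppress over-exploration)
    hard   -> reward HWE windows (encourage deeper exploration)
    medium -> penalize deviation from the batch target (stabilize)
    """
    if bucket == "easy":
        return -scale * n_hwe
    if bucket == "hard":
        return scale * n_hwe
    if bucket == "medium":
        return -scale * abs(n_hwe - batch_target)
    raise ValueError(f"unknown bucket: {bucket!r}")


def tokenwise_kl_coeff(num_tokens: int, hwe_positions: Set[int],
                       base: float = 0.04, relaxed: float = 0.01) -> List[float]:
    """Per-token KL coefficient: smaller (relaxed) inside HWE windows."""
    return [relaxed if t in hwe_positions else base for t in range(num_tokens)]


print(tokenwise_kl_coeff(5, {1, 2}))  # [0.04, 0.01, 0.01, 0.04, 0.04]
```

A smaller KL coefficient inside an HWE window lets the policy move further from the reference model exactly where exploration is wanted, while the rest of the sequence stays tightly regularized.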

🧩 Checkpoint Merge (HF format)

python scripts/model_merger.py \
  --local_dir ./checkpoints/${ProjectName}/exp_name/global_step_1/actor

🖥️ Inference

Run the commands below.

MODEL_PATH="ARES"
IMAGE_PATH="xxx"  # placeholder: path to the input image
MAX_TOKENS=16384
DO_SAMPLE=True
TEMPERATURE=1.0
TOP_P=0.95
TOP_K=50
NUM_RETURN_SEQUENCES=1

# No spaces around "=" in shell assignments.
prompt="You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
question="xxx"

# Quote the variables so multi-word values are passed as single arguments.
python infer.py \
  --model_path "${MODEL_PATH}" \
  --image_path "${IMAGE_PATH}" \
  --question "${question}" \
  --prompt "${prompt}" \
  --max_tokens "${MAX_TOKENS}" \
  --do_sample "${DO_SAMPLE}" \
  --temperature "${TEMPERATURE}" \
  --top_p "${TOP_P}" \
  --top_k "${TOP_K}" \
  --num_return_sequences "${NUM_RETURN_SEQUENCES}"

You can also modify the arguments in inference/inference.sh and run it:

bash inference/inference.sh
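
Because the prompt asks for reasoning inside <think> </think> tags and the final answer inside \boxed{}, a response can be post-processed along the following lines. This is a minimal sketch, not part of the ARES repository; the helper names `last_boxed` and `split_response` are hypothetical.

```python
import re
from typing import Optional, Tuple


def last_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...}, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # braces never balanced


def split_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a model response into (reasoning, final answer)."""
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else None
    return reasoning, last_boxed(text)


response = "<think>Half of one is 1/2.</think> The answer is \\boxed{\\frac{1}{2}}."
print(split_response(response))
```

A brace counter is used instead of a regular expression for the answer because boxed answers often contain nested braces, as in `\frac{1}{2}`.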

📊 Highlights

  • ARES-3B: +8.4 average over prior open 3B models across core multimodal benchmarks.
  • ARES-7B: +9.7 average over strong 7B open baselines; large gains on MathVision and DynaMath-W.
  • Efficiency: Shorter responses on easy/medium tasks; deeper but targeted exploration on hard tasks.


🙌 Acknowledgements

We thank the open-source community for tools, datasets, and prior work on reasoning-oriented pretraining and RL that inspired this project.

🚧 TODO

We plan to complete these tasks over the next few weeks. Please stay tuned!

  • 🚧 We are training the 3B ARES models (Coldstart and RL) and will release them in a few days.
  • 🚧 We are also developing and open-sourcing a multimodal model with performance comparable to leading commercial systems.

📮 Contact

For questions, feedback, or collaboration opportunities, feel free to reach out: [email protected]

📄 Citation

If you find our work useful for your research, please consider citing:

@article{chen2025ares,
  title={ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping},
  author={Chen, Shuang and Guo, Yue and Ye, Yimeng and Huang, Shijue and Hu, Wenbo and Li, Haoxi and Zhang, Manyuan and Chen, Jiayu and Guo, Song and Peng, Nanyun},
  journal={arXiv preprint arXiv:2510.08457},
  year={2025}
}
