🤗 MathBook-Standard ｜ Webpage ｜ 🤗 MathBook-Pro
✨ Stay tuned! We will continue to share updates as our work on multimodal reasoning progresses.
- We-Math 2.0 (Preprint 2025) - We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
- We-Math (ACL 2025) - We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
[2025.08.15] The We-Math 2.0 homepage is live at we-math2.github.io.
[2025.08.15] The We-Math 2.0 paper is now available on arXiv.
[2025.08.15] The We-Math 2.0 dataset is now available on Hugging Face Datasets.
[2025.05.16] We-Math is accepted by ACL 2025.
[2025.02.20] We-Math is officially supported by VLMEvalKit for fast evaluation.
[2024.07.02] We-Math is accessible at https://arxiv.org/abs/2407.01284.
[2024.07.02] The We-Math dataset is accessible at Hugging Face Datasets.
[2024.07.02] The We-Math homepage can be accessed at https://we-math.github.io/.
- Overview (We-Math 2.0)
- Quick Start
- Cold-Start SFT Stage
- Progressive Alignment RL
We-Math 2.0 is a unified system designed to comprehensively enhance the mathematical reasoning capabilities of Multimodal Large Language Models (MLLMs).
It integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to achieve both broad conceptual coverage and robust reasoning performance across varying difficulty levels.
The key contributions of We-Math 2.0 are fourfold:
- MathBook Knowledge System: A five-level hierarchical structure encompassing 491 knowledge points and 1,819 fundamental principles.
- MathBook-Standard & MathBook-Pro: MathBook-Standard ensures wide conceptual coverage and flexibility via dual expansion, while MathBook-Pro defines a three-dimensional difficulty space and generates 7 progressive variants per problem for robust training.
- MathBook-RL: A two-stage RL framework comprising Cold-Start Fine-tuning for knowledge-oriented chain-of-thought alignment, and Progressive Alignment RL with average-reward learning and dynamic data scheduling for gradual alignment across difficulty levels.
- MathBookEval: A comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions.
Extensive experiments show that MathBook-RL consistently outperforms existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, demonstrating superior generalization in mathematical reasoning.
The MathBook Knowledge System is organized as a five-level hierarchy covering 491 knowledge points and 1,819 fundamental principles.
It is systematically derived from trusted sources such as Wikipedia and open-source textbooks, refined through hierarchical clustering, and further revised by human experts to ensure accuracy and completeness.
You can visit our project website to explore the complete knowledge system.
Building on the MathBook Knowledge System, MathBook-Standard is a dataset featuring comprehensive principle-level knowledge annotations and carefully curated problems to ensure broad, balanced coverage across mathematical domains, with particular focus on underrepresented areas.
To foster deeper conceptual understanding, MathBook-Standard employs a dual-expansion strategy:
- multi-images per question
- multi-questions per image
This enables the creation of diverse problem sets that promote conceptual flexibility and adaptability.
Below, we present an example of the multi-images-per-question component of the dataset, which can be retrieved via its underlying knowledge principles.
You can view the full collection on our project website.
Building on the MathBook Knowledge System, MathBook-Pro introduces a pivotal three-dimensional difficulty modeling framework that systematically characterizes mathematical problem complexity from a model-centric perspective.
Each seed problem is positioned within a structured difficulty space defined by three orthogonal axes:
- Step Complexity: Reasoning depth is quantified by the number of knowledge points involved. More complex variants incorporate additional intermediate conclusions, with the most advanced cases involving at least six knowledge points drawn from the MathBook Knowledge System.
- Visual Complexity: Additional elements such as auxiliary lines or altered geometric configurations are introduced via GeoGebra, while preserving the original core structure.
- Contextual Complexity: Concise mathematical statements are rephrased into richer real-world contexts or linguistically abstract scenarios, increasing the semantic and interpretive demands of the problem statement.
By varying a single dimension at a time and progressively composing transformations across multiple dimensions, each seed problem is expanded into seven progressive difficulty levels.
This enables structured, gradual learning for MLLMs and creates a robust foundation for enhancing reasoning performance across varying levels of complexity.
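To make the composition concrete, here is a minimal sketch of how a seed problem could be expanded into seven progressively harder variants along the three axes. The function names, problem fields, and the exact enumeration of levels are illustrative assumptions, not the released MathBook-Pro pipeline:

```python
from typing import Callable, Dict, List

# Hypothetical transformation stubs for the three difficulty axes; the real
# MathBook-Pro transformations operate on full problems (text + GeoGebra figures).
def phi_s(p: Dict) -> Dict:  # step complexity: add a knowledge point
    return {**p, "knowledge_points": p["knowledge_points"] + ["extra_point"]}

def phi_v(p: Dict) -> Dict:  # visual complexity: alter the figure
    return {**p, "figure": p["figure"] + " + auxiliary line"}

def phi_c(p: Dict) -> Dict:  # contextual complexity: rephrase the statement
    return {**p, "statement": "Real-world rephrasing of: " + p["statement"]}

def expand_seed(seed: Dict) -> List[Dict]:
    """Expand one seed problem into seven harder variants by first varying a
    single dimension and then composing transformations across dimensions.
    This enumeration is illustrative; the exact seven-level progression used
    in MathBook-Pro is defined in the paper."""
    paths: List[List[Callable[[Dict], Dict]]] = [
        [phi_s], [phi_v], [phi_c],                        # single-dimension variants
        [phi_s, phi_v], [phi_s, phi_c], [phi_v, phi_c],   # two-dimension compositions
        [phi_s, phi_v, phi_c],                            # all three dimensions
    ]
    variants = []
    for path in paths:
        p = dict(seed)
        for transform in path:
            p = transform(p)
        variants.append(p)
    return variants

seed = {"statement": "Find AC.", "figure": "triangle ABC", "knowledge_points": ["pythagorean"]}
print(len(expand_seed(seed)))  # -> 7
```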
Below, we showcase the multi-level difficulty component of MathBook-Pro, illustrating its progressive design across the three complexity dimensions.
You can visit our project website to see the use of MathBook-Pro in the Dynamic Scheduling RL strategy.
Cold-Start Fine-tuning.
Supervised fine-tuning on MathBook-Standard (covering all 491 knowledge points), instilling awareness of the knowledge system and guiding knowledge-driven chain-of-thought reasoning.
Progressive Alignment RL.
A curriculum-based RL procedure with two phases:
Pre-aligned RL.
Using MathBook-Standard, where each group contains multiple variants of the same knowledge principle.
A mean-based reward is computed over variants sharing the same knowledge principle, encouraging reasoning consistency and robustness based on knowledge mastery rather than individual instances.
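As a rough illustration of this average-reward idea (not the repository's actual reward implementation; the "principle" field and the grouping keys are assumptions), a group-mean reward over variants of the same knowledge principle can be computed like this:

```python
from collections import defaultdict
from typing import Dict, List

def group_mean_rewards(samples: List[dict], rewards: List[float]) -> List[float]:
    """Replace each per-sample reward with the mean reward of all samples
    sharing the same knowledge principle.

    `samples` are rollout records with a (hypothetical) "principle" field;
    `rewards` are the raw per-sample correctness rewards. Every variant of a
    principle receives the same averaged reward, so the policy is credited for
    mastering the principle rather than a single instance.
    """
    buckets: Dict[str, List[int]] = defaultdict(list)
    for i, sample in enumerate(samples):
        buckets[sample["principle"]].append(i)

    averaged = list(rewards)
    for idxs in buckets.values():
        mean_reward = sum(rewards[i] for i in idxs) / len(idxs)
        for i in idxs:
            averaged[i] = mean_reward
    return averaged

# Example: two variants of the same principle share an averaged reward.
samples = [{"principle": "pythagorean"}, {"principle": "pythagorean"}, {"principle": "circle_area"}]
print(group_mean_rewards(samples, [1.0, 0.0, 1.0]))  # -> [0.5, 0.5, 1.0]
```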
Dynamic Scheduling RL.
Using MathBook-Pro, each base problem $x_0$ is advanced along a predefined difficulty path built from the three transformations $\phi_s$ (step), $\phi_v$ (visual), and $\phi_c$ (contextual), where each transformation increases exactly one difficulty dimension. If the model fails at an intermediate difficulty level, a remedial problem is scheduled according to the dimension that caused the failure:
- Knowledge Increment Scheduling: when failure is due to added knowledge in $\phi_s$, sample auxiliary problems $x'_0$ from MathBook-Standard targeting the new knowledge point(s).
- Modality Increment Scheduling: when failure stems from added modality complexity ($\phi_v$ or $\phi_c$), guide the model through single-modality incremental problems that isolate the visual or contextual component.
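A minimal sketch of this scheduling rule is shown below; the field names, the `mathbook_standard.sample` helper, and the failure-diagnosis signal are hypothetical and intended only to illustrate the two branches:

```python
from typing import Dict, List

def schedule_remedial(problem: Dict, failed_stage: str, mathbook_standard) -> List[Dict]:
    """Choose remedial problems after the model fails a MathBook-Pro variant.

    `failed_stage` names the transformation that introduced the failure
    ("phi_s", "phi_v", or "phi_c"); `mathbook_standard` is assumed to expose a
    `sample(knowledge_points=...)` lookup over MathBook-Standard problems.
    """
    if failed_stage == "phi_s":
        # Knowledge Increment Scheduling: the step transformation added new
        # knowledge points, so sample auxiliary problems x'_0 targeting them.
        return mathbook_standard.sample(knowledge_points=problem["added_knowledge_points"])
    if failed_stage in ("phi_v", "phi_c"):
        # Modality Increment Scheduling: isolate the visual or contextual
        # increment by scheduling the variant that applies only that single
        # transformation to the original seed problem.
        return [problem["single_modality_variants"][failed_stage]]
    return []
```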
In this step, we will describe how to perform a cold start for the SFT stage using the ms-swift repository. Please first set up the environment for ms-swift.
```bash
pip install ms-swift -U
```

Our SFT dataset consists of two parts: 200 pure-text samples and 800 samples with associated images. Download the SFT dataset from 🤗 MathBook-SFT and refer to the script below for fine-tuning. (We found that in some versions, you may need to change the mathbook_sft.jsonl paths to absolute paths.)
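For reference, one record in mathbook_sft.jsonl might look like the sketch below, assuming ms-swift's common messages/images JSONL layout; the exact schema of the released file may differ:

```python
import json

# Hypothetical image-grounded SFT record; pure-text samples would omit "images".
record = {
    "messages": [
        {"role": "user", "content": "<image>In right triangle ABC, AB = 3 and BC = 4. Find AC."},
        {"role": "assistant", "content": "Knowledge point: Pythagorean theorem. AC = sqrt(3^2 + 4^2) = 5."},
    ],
    "images": ["images/example_0001.png"],
}
print(json.dumps(record, ensure_ascii=False))
```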
```bash
nproc_per_node=8

NPROC_PER_NODE=$nproc_per_node \
MASTER_PORT=29500 \
MAX_PIXELS=4194304 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
    --model_type qwen2_5_vl \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --num_train_epochs 1 \
    --train_type full \
    --deepspeed zero2 \
    --tuner_backend peft \
    --torch_dtype bfloat16 \
    --weight_decay 0.1 \
    --warmup_ratio 0.03 \
    --eval_steps 1000 \
    --attn_impl flash_attn \
    --output_dir checkpoint \
    --dataset mathbook_sft.jsonl \
    --per_device_train_batch_size 1
```

You can install our additional environment as follows:
```bash
pip install -r requirements.txt
```

Both RL stages are built on the EasyR1 codebase to fit our workflow.
For data preparation, you can directly download the Parquet-format datasets from 🤗 MathBook-Standard and 🤗 MathBook-Pro for training.
For the Pre-aligned RL stage:

```bash
cd pre_align
python3 -m verl.trainer.main \
    config=pre_align_r1v.yaml
```

For the Dynamic Scheduling RL stage:

```bash
cd dynamic_scheduling
python3 -m verl.trainer.main \
    config=dynamic_scheduling_r1v.yaml
```

To merge the trained actor checkpoint:

```bash
python scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
```

Our datasets are distributed under the CC BY-NC 4.0 license.
If you find We-Math 2.0 useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@article{qiao2025we,
  title={We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning},
  author={Qiao, Runqi and Tan, Qiuna and Yang, Peiqing and Wang, Yanzi and Wang, Xiaowan and Wan, Enhui and Zhou, Sitong and Dong, Guanting and Zeng, Yuchen and Xu, Yida and others},
  journal={arXiv preprint arXiv:2508.10433},
  year={2025}
}
```

For any questions or feedback, please reach out to us at [email protected] or [email protected].