Transformer Math Evaluation

Jonathan Drechsel, Anja Reusch, Steffen Herbold


Framework to evaluate mathematically aware Transformer models, first introduced in MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training.

Installation

1. Clone the repository:

git clone https://github.com/aieng-lab/transformer-math-evaluation
cd transformer-math-evaluation

2. Create a conda environment:

conda create --name math-eval python=3.10
conda activate math-eval
conda install pip
pip install -r requirements.txt

3. Generate the data:

python src/util/generate_data.py

This script generates the data splits of the large pre-training datasets NMF and MFR. For NMF, however, a special split is also available on Hugging Face along with additional metadata (e.g., which MAMUT strategies were used to generate a false example). This enriched data is used by default in the config files.
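
If you want to inspect the enriched NMF data directly, it can be loaded with the Hugging Face datasets library. The following is a minimal sketch; the dataset identifier below is only a placeholder, the actual name is the one referenced in the config files:

from datasets import load_dataset

# Placeholder dataset ID -- replace with the Hugging Face dataset name
# referenced in the config files.
nmf = load_dataset("aieng-lab/named-math-formulas")

print(nmf)              # available splits
print(nmf["train"][0])  # one example, including the MAMUT metadata fields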

Usage

Everything can be controlled by executor.py.

python src/executor.py -model bert-base-cased -config config/all.json -data_dir data

  • -model: The model to be evaluated.
  • -config: The configuration file to be used.
  • -data_dir: The directory containing the data to be evaluated.
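
To evaluate several models with the same configuration, executor.py can simply be called in a loop. The following is a minimal sketch using Python's subprocess module; the model names are only examples:

import subprocess

# Example models -- any model name accepted by -model can be used here.
models = ["bert-base-cased", "bert-base-uncased"]

for model in models:
    subprocess.run(
        ["python", "src/executor.py",
         "-model", model,
         "-config", "config/all.json",
         "-data_dir", "data"],
        check=True,  # abort if one evaluation fails
    )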

The following base configurations are available:

  • config/nmf.json: Named Math Formula (NMF) retrieval, i.e., an IR task with the name of a mathematical formula as query (e.g., "Binomial Formula") and the formula itself as document (e.g., $(\alpha + z)^2 = z^2 + \alpha^2 + 2\cdot \alpha \cdot z$).
  • config/nmf-split.json: NMF with a special train/val/test split such that an identity occurs in only one of the splits.
  • config/nmf-fp1.json: NMF using the same false examples for each epoch (the default NMF task changes false examples every epoch).
  • config/nmf-no-challenging: NMF using non-challenging false examples, i.e., random formulas of different mathematical identities.
  • config/mfr.json: Math Formula Retrieval (MFR), i.e., an IR task with formulas as both query and document (e.g., query $n!=1\cdot \dots \cdot n$ and document $n!\coloneqq \prod_{k=1}^n k$).
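
To illustrate the task format (not the evaluation procedure implemented by this framework), the following sketch scores an NMF-style query against two candidate formulas with a generic bi-encoder from the sentence-transformers library; the encoder model is arbitrary:

from sentence_transformers import SentenceTransformer, util

# Arbitrary general-purpose encoder -- NOT the model or scoring used by this framework.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# NMF: the query is the name of a formula, the document is a formula.
query = "Binomial Formula"
documents = [
    r"$(\alpha + z)^2 = z^2 + \alpha^2 + 2\cdot \alpha \cdot z$",  # matching formula
    r"$n! \coloneqq \prod_{k=1}^n k$",                             # unrelated formula
]

scores = util.cos_sim(encoder.encode(query), encoder.encode(documents))
print(scores)  # the first document should receive the higher similarity score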

Note that further evaluations are already available (though not well documented), and more evaluation methods are planned for the future. For example, a model can be evaluated on the Mathematical Structure Attention Score, which aims to identify mathematically aware attention heads. Note that these evaluations are not published as part of the MAMUT paper.

Citation

If you use this evaluation framework, please cite the following paper:

@article{drechsel2025mamut,
  title={{MAMUT}: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training},
  author={Jonathan Drechsel and Anja Reusch and Steffen Herbold},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=khODmRpQEx}
}
