Giuseppe Bruno¹, Federico Pasqualotto², Andrea Agazzi¹
¹ Department of Mathematics and Statistics, University of Bern
² Department of Mathematics, University of California, San Diego
NeurIPS 2025 Oral
In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number N of tokens is large and the inverse temperature parameter β of the model scales together with N. In this regime, the dynamics of the system displays multiscale behavior: a fast phase, where the token empirical measure collapses onto a low-dimensional space; an intermediate phase, where the measure further collapses into clusters; and a slow phase, where these clusters sequentially merge into a single one. We provide a rigorous characterization of the limiting dynamics in each of these phases and prove convergence in the above-mentioned limit, illustrating our results with simulations.
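To make the particle picture concrete, the following is a minimal sketch of a token system of this kind. It is not the paper's exact model: it assumes a simplified form of self-attention dynamics often used in the mean-field transformer literature, with tokens constrained to the unit sphere and identity query, key, and value matrices. All function names, the explicit Euler discretization, and the parameter values (`beta`, `dt`, `n_steps`) are hypothetical choices for illustration. In particular, `beta` is held fixed here rather than scaled with the number of tokens as in the moderate interaction regime.

```python
import math
import random


def normalize(v):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(x, y):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def attention_step(tokens, beta, dt):
    """One explicit Euler step of the assumed dynamics, then renormalization.

    Each token moves toward the softmax-weighted average of all tokens,
    with the radial component removed so it stays close to the sphere.
    """
    new_tokens = []
    for x in tokens:
        # Softmax attention weights of token x over all tokens
        # (shifted by the maximum score for numerical stability).
        scores = [beta * dot(x, y) for y in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Attention-weighted average of the tokens.
        avg = [
            sum(w * y[k] for w, y in zip(weights, tokens)) / total
            for k in range(len(x))
        ]
        # Project the average onto the tangent space of the sphere at x.
        radial = dot(avg, x)
        drift = [a - radial * c for a, c in zip(avg, x)]
        new_tokens.append(normalize([c + dt * g for c, g in zip(x, drift)]))
    return new_tokens


def simulate(n_tokens=16, dim=3, beta=4.0, dt=0.1, n_steps=200, seed=0):
    """Evolve randomly initialized tokens on the sphere for n_steps steps."""
    rng = random.Random(seed)
    tokens = [
        normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(n_tokens)
    ]
    for _ in range(n_steps):
        tokens = attention_step(tokens, beta, dt)
    return tokens


def min_pairwise_inner_product(tokens):
    """Smallest inner product between any two tokens (1.0 means one cluster)."""
    return min(dot(x, y) for x in tokens for y in tokens)
```

Calling `min_pairwise_inner_product(simulate())` for different values of `n_steps` gives a rough way to watch the tokens cluster in this toy setting: the closer the value is to 1, the closer the tokens are to a single cluster.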
@article{bruno2025multiscale,
title={A multiscale analysis of mean-field transformers in the moderate interaction regime},
author={Bruno, Giuseppe and Pasqualotto, Federico and Agazzi, Andrea},
journal={arXiv preprint arXiv:2509.25040},
year={2025}
}

