gbruno16/transformers-metastability

Emergence of meta-stable clustering in mean-field transformer models

Giuseppe Bruno¹, Federico Pasqualotto², Andrea Agazzi¹  
¹Department of Mathematics and Statistics, University of Bern  
²Department of Mathematics, University of California, San Diego  

ICLR 2025 Oral

[Paper]      [Code]

Abstract

We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere, governed by a mean-field interacting particle system, building on the framework introduced in Geshkovski et al. (2023). Studying the corresponding mean-field Partial Differential Equation (PDE), which can be interpreted as a Wasserstein gradient flow, we provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction. More specifically, we perform a perturbative analysis of the mean-field PDE around the i.i.d. uniform initialization and prove that, in the limit of a large number of tokens, the model remains close to a meta-stable manifold of solutions with a given structure (e.g., periodicity). Further, the structure characterizing the meta-stable manifold is explicitly identified, as a function of the inverse temperature parameter of the model, by the index maximizing a certain rescaling of Gegenbauer polynomials.
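
To make the particle picture in the abstract concrete, here is a minimal NumPy sketch of tokens evolving on the unit sphere under an attention-like interaction. It is an illustration only and is not taken from this repository: it assumes an unnormalized variant of the dynamics, with drift (1/n) Σ_j exp(β⟨x_i, x_j⟩) x_j projected onto the tangent space at x_i, integrated with explicit Euler steps. The function names (`project_to_tangent`, `step`, `simulate`) and all default parameters are hypothetical.

```python
import numpy as np


def project_to_tangent(x, v):
    """Remove from each row of v its component along the matching row of x."""
    return v - np.sum(x * v, axis=1, keepdims=True) * x


def step(x, beta, dt):
    """One explicit Euler step of the (assumed) unnormalized attention flow."""
    n = x.shape[0]
    weights = np.exp(beta * (x @ x.T))   # exp(beta * <x_i, x_j>) for all pairs
    drift = weights @ x / n              # (1/n) * sum_j weights[i, j] * x_j
    x_new = x + dt * project_to_tangent(x, drift)
    # Euler steps leave the sphere slightly, so renormalize each token.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def simulate(n=64, d=2, beta=4.0, dt=0.05, steps=200, seed=0):
    """Evolve n tokens on the unit sphere of R^d from an i.i.d. uniform start."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    for _ in range(steps):
        x = step(x, beta, dt)
    return x


if __name__ == "__main__":
    tokens = simulate()
    # For d = 2, token angles give a quick view of how the points are grouped.
    angles = np.sort(np.arctan2(tokens[:, 1], tokens[:, 0]))
    print(np.round(angles, 2))
```

Varying `beta` and `steps` in this toy setup is one way to explore how the number of apparent groups changes with the inverse temperature before the tokens eventually merge; the paper's analysis concerns the mean-field PDE rather than this finite-particle discretization.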

Citing

@article{bruno2024emergence,
  title={Emergence of meta-stable clustering in mean-field transformer models},
  author={Bruno, Giuseppe and Pasqualotto, Federico and Agazzi, Andrea},
  journal={arXiv preprint arXiv:2410.23228},
  year={2024}
}

About

Code for the paper "Emergence of meta-stable clustering in mean-field transformer models"
