Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis

Alcalde Zafra A, Fantuzzi G, Zuazua Iriondo E (2025)


Publication Language: English

Publication Status: Submitted

Publication Type: Journal article, Original article

Future Publication Type: Journal article

Publication year: 2025

Journal: SIAM Journal on Mathematics of Data Science

Journal Volume: 7

Journal Issue: 3

URI: https://arxiv.org/abs/2407.01602

DOI: 10.1137/24M167086X

Abstract

Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. Here, we rigorously characterize the behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of points in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special points called leaders. We then leverage this theoretical understanding to solve sentiment analysis problems from language processing using a fully interpretable transformer model, which effectively captures “context” by clustering meaningless words around leader words carrying the most meaning. Finally, we outline remaining challenges to bridge the gap between the mathematical analysis of transformers and their real-life implementation.
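The dynamical-system picture sketched in the abstract can be illustrated numerically. The snippet below is a minimal sketch, not the authors' model: it assumes an identity attention matrix (plain inner-product similarity), averages uniformly over ties in the hardmax argmax set, uses a simple convex-combination update in place of the residual and normalization sublayers, and runs on toy two-dimensional data. Under these assumptions, points that attend only to themselves are fixed by the dynamics and play the role of the "leaders" mentioned in the abstract, while the remaining points drift toward them as the number of layers grows.

```python
import numpy as np

def hardmax_layer(X, alpha=1.0):
    """One pure-attention layer with hardmax attention (illustrative sketch).

    Each point attends only to the points maximizing its inner-product
    similarity and moves toward their mean via a convex combination
    (an assumption standing in for the residual + normalization sublayer).
    """
    sims = X @ X.T
    out = np.empty_like(X)
    for i in range(len(X)):
        argmax = np.flatnonzero(np.isclose(sims[i], sims[i].max()))
        target = X[argmax].mean(axis=0)          # hardmax: average over argmax set
        out[i] = (X[i] + alpha * target) / (1.0 + alpha)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                     # 30 tokens embedded in the plane

# In this simplified setting, a point is fixed by the dynamics whenever the
# hyperplane through it with normal X[i] leaves all other points strictly on
# one side; such self-attending points act as the "leaders".
leaders = [i for i in range(len(X))
           if all(X[i] @ X[i] > X[i] @ X[j] for j in range(len(X)) if j != i)]

for _ in range(500):                             # deep-transformer (many-layer) limit
    X = hardmax_layer(X)

print("leader indices:", leaders)
print("distinct cluster locations:\n", np.unique(np.round(X, 3), axis=0))
```

In this toy setting the thirty points typically collapse onto only a handful of distinct locations determined by the leaders, mirroring the clustered equilibrium described in the abstract.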

How to cite

APA:

Alcalde Zafra, A., Fantuzzi, G., & Zuazua Iriondo, E. (2025). Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis. SIAM Journal on Mathematics of Data Science, 7(3). https://doi.org/10.1137/24M167086X

MLA:

Alcalde Zafra, Albert, Giovanni Fantuzzi, and Enrique Zuazua Iriondo. "Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis." SIAM Journal on Mathematics of Data Science 7.3 (2025).
