Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis

Alcalde Zafra A, Fantuzzi G, Zuazua Iriondo E (2024)


Publication Language: English

Publication Status: Submitted

Publication Type: Unpublished / Preprint

Future Publication Type: Journal article

Publication year: 2024

URI: https://arxiv.org/abs/2407.01602

DOI: 10.48550/arXiv.2407.01602

Abstract

Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. Here, we rigorously characterize the behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of points in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special points called leaders. We then leverage this theoretical understanding to solve sentiment analysis problems from language processing using a fully interpretable transformer model, which effectively captures "context" by clustering meaningless words around leader words carrying the most meaning. Finally, we outline remaining challenges to bridge the gap between the mathematical analysis of transformers and their real-life implementation.
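For intuition, the following minimal Python sketch mimics the kind of discrete-time dynamics the abstract describes: each token attends only to the token(s) maximizing a bilinear attention score (a hardmax), takes a step toward their mean, and is kept bounded by a normalization step. The attention matrix A, the step size alpha, and the unit-ball projection are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical toy illustration of hardmax self-attention dynamics; the exact
# model, normalization, and parameters in the paper may differ.
import numpy as np

def hardmax_attention_step(z, A, alpha=0.5):
    """One layer: each point moves toward the mean of the points maximizing
    its attention score <A z_i, z_j>, then is projected into the unit ball."""
    n = z.shape[0]
    scores = (z @ A.T) @ z.T            # scores[i, j] = <A z_i, z_j>
    z_next = np.empty_like(z)
    for i in range(n):
        # Hardmax: attend only to the maximizing tokens (ties share weight).
        winners = np.isclose(scores[i], scores[i].max())
        target = z[winners].mean(axis=0)
        step = z[i] + alpha * (target - z[i])
        # Stand-in for the normalization sublayer: stay in the unit ball.
        z_next[i] = step / max(np.linalg.norm(step), 1.0)
    return z_next

rng = np.random.default_rng(0)
d, n = 2, 10
z = rng.uniform(-1.0, 1.0, size=(n, d))  # random initial tokens in the plane
A = np.eye(d)                            # identity attention matrix for simplicity

for _ in range(200):                     # deep-layer limit: iterate many layers
    z = hardmax_attention_step(z, A)

# After many layers the points typically collapse into a few clusters
# around "leader" points, mirroring the clustering phenomenon in the abstract.
print(np.round(z, 3))
```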

How to cite

APA:

Alcalde Zafra, A., Fantuzzi, G., & Zuazua Iriondo, E. (2024). Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis. (Unpublished, Submitted).

MLA:

Alcalde Zafra, Albert, Giovanni Fantuzzi, and Enrique Zuazua Iriondo. Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis. Unpublished, Submitted. 2024.
