Exact Sequence Classification with Hardmax Transformers

Alcalde Zafra A, Fantuzzi G, Zuazua Iriondo E (2025)


Publication Language: English

Publication Status: Submitted

Publication Type: Unpublished / Preprint

Future Publication Type: Conference contribution

Publication year: 2025

URI: https://arxiv.org/abs/2502.02270

Abstract

We prove that hardmax attention transformers perfectly classify datasets of $N$ labeled sequences in $\mathbb{R}^d$, $d\geq 2$. Specifically, given $N$ sequences of arbitrary but finite length in $\mathbb{R}^d$, we construct a transformer with $\mathcal{O}(N)$ blocks and $\mathcal{O}(Nd)$ parameters that perfectly classifies this dataset. Our construction achieves the best complexity estimate to date, independent of the length of the sequences, by innovatively alternating feed-forward and self-attention layers and by capitalizing on the clustering effect inherent to the latter. Our novel constructive method also uses low-rank parameter matrices within the attention mechanism, a common practice in real-life transformer implementations. Consequently, our analysis is of twofold significance: it substantially advances the mathematical theory of transformers, and it rigorously justifies their exceptional real-world performance in sequence classification tasks.
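For intuition, a minimal sketch of a hardmax self-attention layer of the kind discussed in the abstract is given below. It assumes the common convention that hardmax places uniform weight on the tokens attaining the maximal attention score; the matrix names, the residual update, and the rank-1 value matrix in the toy usage are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def hardmax(scores):
    # Hardmax: uniform weights over the entries attaining the row-wise maximum
    # (an assumed convention; it replaces the usual softmax normalization).
    m = scores.max(axis=-1, keepdims=True)
    mask = (scores == m).astype(float)
    return mask / mask.sum(axis=-1, keepdims=True)

def hardmax_attention(X, Q, K, V):
    # X: (n, d) sequence of n tokens in R^d.
    # Q, K, V: (d, d) parameter matrices (possibly low rank, as mentioned above).
    scores = (X @ Q) @ (X @ K).T   # (n, n) attention scores
    A = hardmax(scores)            # hardmax attention weights
    return X + A @ (X @ V)         # residual self-attention update (illustrative choice)

# Toy usage: 4 tokens in R^2 with a rank-1 value matrix, mirroring the
# low-rank parameter matrices mentioned in the abstract.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))
Q = np.eye(2)
K = np.eye(2)
u = rng.standard_normal((2, 1))
V = u @ u.T
print(hardmax_attention(X, Q, K, V))
```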


How to cite

APA:

Alcalde Zafra, A., Fantuzzi, G., & Zuazua Iriondo, E. (2025). Exact Sequence Classification with Hardmax Transformers. (Unpublished, Submitted).

MLA:

Alcalde Zafra, Albert, Giovanni Fantuzzi, and Enrique Zuazua Iriondo. Exact Sequence Classification with Hardmax Transformers. Unpublished, Submitted. 2025.
