On the Rate of Convergence of a Classifier Based on a Transformer Encoder

Gurevych I, Kohler M, Sahin GG (2022)


Publication Type: Journal article

Publication year: 2022

Journal: IEEE Transactions on Information Theory

Book Volume: 68

Pages Range: 8139-8155

Journal Issue: 12

DOI: 10.1109/TIT.2022.3191747

Abstract

Pattern recognition based on a high-dimensional predictor is considered. A classifier is defined which is based on a Transformer encoder. The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed. It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model. Furthermore, the difference between the Transformer classifiers theoretically analyzed in this paper and the ones used in practice today is illustrated by means of classification problems in natural language processing.

How to cite

APA:

Gurevych, I., Kohler, M., & Sahin, G.G. (2022). On the Rate of Convergence of a Classifier Based on a Transformer Encoder. IEEE Transactions on Information Theory, 68(12), 8139-8155. https://doi.org/10.1109/TIT.2022.3191747

MLA:

Gurevych, Iryna, Michael Kohler, and Gozde Gul Sahin. "On the Rate of Convergence of a Classifier Based on a Transformer Encoder." IEEE Transactions on Information Theory 68.12 (2022): 8139-8155.
