Pattern recognition based on a high-dimensional predictor is considered. A classifier based on a Transformer encoder is defined, and the rate of convergence of its misclassification probability towards the optimal misclassification probability is analyzed. It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model. Furthermore, the difference between the Transformer classifiers analyzed theoretically in this paper and the Transformer classifiers used in practice today is illustrated by considering classification problems in natural language processing.