Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al., 2021). However, hard attention is a strong assumption, which may limit the relevance of these results to transformers used in practice. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We first show that saturated transformers transcend the known limitations of hard-attention transformers. We then prove that saturated transformers with floating-point values can be simulated by constant-depth threshold circuits, giving the class $\mathsf{TC}^0$ as an upper bound on the formal languages they recognize.
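
For intuition, saturated attention can be characterized as the limit of softmax attention as the attention scores are scaled toward infinity, so that the attention mass spreads uniformly over the score-maximizing positions. The sketch below uses illustrative notation ($x_j$ for input representations, $Q, K, V$ for the query, key, and value transformations, $s_{ij}$ for scores, $\mathcal{M}(i)$ for the maximizer set); these symbols are expository assumptions, not a fixed convention.

% A minimal formalization: saturated attention as the sharp limit of softmax attention,
% which becomes a uniform average over the argmax set of positions.
\[
  s_{ij} = (Q x_i) \cdot (K x_j), \qquad
  \mathcal{M}(i) = \operatorname*{arg\,max}_{j} \, s_{ij},
\]
\[
  \mathrm{satt}(x)_i
    \;=\; \lim_{c \to \infty} \sum_{j} \mathrm{softmax}\bigl(c\, s_{i1}, \ldots, c\, s_{in}\bigr)_j \, V x_j
    \;=\; \frac{1}{\lvert \mathcal{M}(i) \rvert} \sum_{j \in \mathcal{M}(i)} V x_j .
\]

Hard attention instead selects a single position from $\mathcal{M}(i)$ via a tie-breaking rule, so the two coincide whenever the maximizer is unique; averaging over ties is what lets saturated attention aggregate uniformly over many positions, the capability underlying its separation from hard attention.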