Optimal inference of a generalised Potts model by a single-layer transformer with factored attention

Transformers are the type of neural network that has revolutionised natural language processing and protein science. Their key building block is a mechanism called self-attention, which is trained to predict missing words in sentences. Despite the practical success of transformers in applications, it remains unclear what self-attention learns from data, and how. Here, we give a precise analytical and numerical characterisation of transformers trained on data drawn from a generalised Potts model with interactions between sites and Potts colours. While an off-the-shelf transformer requires several layers to learn this distribution, we show analytically that a single layer of self-attention with a small modification can learn the Potts model exactly in the limit of infinite sampling. We show that this modified self-attention, which we call ``factored'', has the same functional form as the conditional probability of a Potts spin given the other spins. We compute its generalisation error using the replica method from statistical physics, and derive an exact mapping to pseudo-likelihood methods for solving the inverse Ising and Potts problem.
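To make the setting above concrete, here is a minimal sketch of a generalised Potts model with pairwise interactions between sites and colours, together with the conditional probability of a single spin given all the others; the parameterisation below ($N$ sites, $q$ colours, symmetric couplings $J_{ij}(a,b)$, no external fields) is assumed for illustration and need not match the conventions used in the paper.

% A sketch under the assumptions stated above: N sites, q colours,
% pairwise site--colour couplings J_{ij}(a,b), no external fields.
\begin{align}
  p(x_1, \dots, x_N) &= \frac{1}{Z} \exp\Big( \sum_{i<j} J_{ij}(x_i, x_j) \Big),
  \qquad x_i \in \{1, \dots, q\}, \\
  % The conditional of one spin is a softmax over the q colours,
  % i.e. the same functional form as a masked-token prediction.
  p\big(x_i = a \mid x_{\setminus i}\big) &=
  \frac{\exp\big( \sum_{j \neq i} J_{ij}(a, x_j) \big)}
       {\sum_{b=1}^{q} \exp\big( \sum_{j \neq i} J_{ij}(b, x_j) \big)} .
\end{align}

Maximising the sum of the log-conditionals over the training data is the standard pseudo-likelihood objective for the inverse Potts problem, while a masked-language-modelling loss trains the network to reproduce the same conditionals; this is the sense in which a single attention layer with the right functional form can match the Potts conditional exactly.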