Multi-Head Self-Attention with Role-Guided Masks

The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input, dispensing with recurrence and convolutions. While some of the learned attention heads have been found to play linguistically interpretable roles, they can be redundant or prone to errors. We propose a method to guide the attention heads towards roles identified in prior work as important. We do this by defining role-specific masks that constrain the heads to attend to specific parts of the input, so that different heads are designed to play different roles. Experiments on text classification and machine translation using 7 different datasets show that our method outperforms competitive attention-based, CNN, and RNN baselines.
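
To make the idea of role-specific masks concrete, here is a minimal NumPy sketch of masked self-attention in which each head receives its own boolean mask. The two roles shown (a local window and a separator role), the helper names window_mask, separator_mask and masked_attention, the toy sentence, and the omission of learned query/key/value projections are all assumptions made for illustration; the abstract does not specify the roles or how the masks are built, so this should not be read as the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_mask(n, width):
    # Hypothetical "relative position" role: token i may attend to token j
    # only if |i - j| <= width.
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= width

def separator_mask(tokens, separators=(",", ".", "!", "?")):
    # Hypothetical "separator" role: every token may attend to separator
    # tokens, plus itself so that no row of the mask is empty.
    is_sep = np.array([t in separators for t in tokens])
    n = len(tokens)
    return np.tile(is_sep, (n, 1)) | np.eye(n, dtype=bool)

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention; disallowed positions get a large
    # negative score so their softmax weight becomes zero.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

tokens = ["the", "movie", ",", "frankly", ",", "was", "great", "."]
n, d = len(tokens), 4
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))  # stand-in for token embeddings

role_masks = {
    "window": window_mask(n, width=1),
    "separator": separator_mask(tokens),
}

head_weights = {}
head_outputs = []
for name, mask in role_masks.items():
    # A real model would apply per-head learned projections to x first.
    out, w = masked_attention(x, x, x, mask)
    head_weights[name] = w
    head_outputs.append(out)

# Concatenate the per-head outputs, as in standard multi-head attention.
multi_head = np.concatenate(head_outputs, axis=-1)
print(multi_head.shape)
```

Because the mask is applied before the softmax, each head can still learn how to distribute its attention, but only within the positions its role allows.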

Related content

Attention mechanism

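The attention mechanism was first proposed in the field of computer vision, but it arguably took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used an attention-like mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.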
