The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input, dispensing with recurrence and convolutions. While some of the learned attention heads have been found to play linguistically interpretable roles, they can be redundant or prone to errors. We propose a method to guide the attention heads towards roles identified in prior work as important. We do this by defining role-specific masks that constrain the heads to attend to specific parts of the input, so that different heads are designed to play different roles. Experiments on text classification and machine translation using 7 different datasets show that our method outperforms competitive attention-based, CNN, and RNN baselines.
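As a rough illustration of the idea of role-specific masks (a minimal sketch, not the paper's implementation), the following Python/PyTorch snippet constrains one attention head to a local window around each token while leaving another head unconstrained. The window size, mask construction, and function names are assumptions made for illustration only.

```python
# Minimal sketch of role-specific attention masks (illustrative, not the authors' code).
import torch
import torch.nn.functional as F

def local_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask letting each position attend only to a +/- `window` neighborhood."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window  # shape: (seq_len, seq_len)

def masked_head_attention(q, k, v, role_mask):
    """Scaled dot-product attention for one head, restricted by a role-specific mask.

    q, k, v: (seq_len, d_head) tensors; role_mask: (seq_len, seq_len) boolean tensor.
    """
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5        # attention logits
    scores = scores.masked_fill(~role_mask, float("-inf"))  # block disallowed positions
    return F.softmax(scores, dim=-1) @ v                    # (seq_len, d_head)

# Toy example: two heads playing different roles on the same sequence.
seq_len, d_head = 6, 8
q, k, v = (torch.randn(seq_len, d_head) for _ in range(3))
local_out = masked_head_attention(q, k, v, local_window_mask(seq_len, window=1))
global_out = masked_head_attention(q, k, v, torch.ones(seq_len, seq_len, dtype=torch.bool))
```

In this sketch, the "local" head is forced to model short-range context while the unmasked head remains free to attend globally; the paper's actual masks correspond to roles identified in prior work on attention-head analysis.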