Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multi-head attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
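The exact form of the loss is defined in the paper itself; as one illustration of the general idea, the sketch below penalizes attention distributions whose expected source position moves backwards between consecutive target steps. The function name `monotonicity_loss`, the hinge-style formulation, and the `margin` parameter are assumptions for this sketch, not the authors' definition.

```python
import torch

def monotonicity_loss(attn: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Sketch of a monotonicity penalty for standard soft attention.

    attn: (batch, tgt_len, src_len) attention weights; each row sums to 1.
    Returns a scalar that is zero when the expected attended source
    position never decreases from one target step to the next.
    """
    # Expected source position attended to at each target step.
    positions = torch.arange(attn.size(-1), dtype=attn.dtype, device=attn.device)
    expected_pos = (attn * positions).sum(dim=-1)  # (batch, tgt_len)

    # Hinge penalty on backward movement of the expected position.
    backward_steps = expected_pos[:, :-1] - expected_pos[:, 1:] + margin
    return torch.clamp(backward_steps, min=0.0).mean()
```

In a setup like this, the penalty would be added to the standard cross-entropy objective as an auxiliary term with its own weight; to bias only a subset of transformer heads towards monotonic behavior, the term would be computed over those heads' attention matrices only.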