Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multi-head attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
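The exact form of the loss is defined in the paper itself; as one illustration of the general idea, the sketch below penalizes attention distributions whose expected source position moves backwards between consecutive target steps. The function name `monotonicity_loss`, the hinge-style formulation, and the `margin` parameter are assumptions for this sketch, not the authors' definition.

```python
import torch

def monotonicity_loss(attn: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Sketch of a monotonicity penalty for standard soft attention.

    attn: (batch, tgt_len, src_len) attention weights; each row sums to 1.
    Returns a scalar that is zero when the expected attended source
    position never decreases from one target step to the next.
    """
    # Expected source position attended to at each target step.
    positions = torch.arange(attn.size(-1), dtype=attn.dtype, device=attn.device)
    expected_pos = (attn * positions).sum(dim=-1)  # (batch, tgt_len)

    # Hinge penalty on backward movement of the expected position.
    backward_steps = expected_pos[:, :-1] - expected_pos[:, 1:] + margin
    return torch.clamp(backward_steps, min=0.0).mean()
```

In a setup like this, the penalty would be added to the standard cross-entropy objective as an auxiliary term with its own weight; to bias only a subset of transformer heads towards monotonic behavior, the term would be computed over those heads' attention matrices only.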