The softmax layer in neural machine translation is designed to model the distribution over mutually exclusive tokens. Machine translation, however, is intrinsically uncertain: the same source sentence can have multiple semantically equivalent translations. Therefore, we propose to replace the softmax activation with a multi-label classification layer that can model ambiguity more effectively. We call our loss function Single-label Contrastive Objective for Non-Exclusive Sequences (SCONES). We show that the multi-label output layer can still be trained on single-reference training data using the SCONES loss function. SCONES yields consistent BLEU score gains across six translation directions, particularly for medium-resource language pairs and small beam sizes. By using smaller beam sizes we can speed up inference by a factor of 3.9 and still match or improve the BLEU score obtained with softmax. Furthermore, we demonstrate that SCONES can be used to train NMT models that assign the highest probability to adequate translations, thus mitigating the "beam search curse". Additional experiments on synthetic language pairs with varying levels of uncertainty suggest that the improvements from SCONES can be attributed to better handling of ambiguity.
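To make the idea concrete, the sketch below illustrates one plausible reading of a SCONES-style objective: each vocabulary item receives an independent sigmoid score, the single reference token at each position is treated as the positive label, and all other tokens are treated as negatives with a tunable weight. This is a minimal illustration only; the function name `scones_loss` and the parameters `neg_weight` and `pad_id` are assumptions for exposition and may not match the exact formulation in the paper.

```python
import torch
import torch.nn.functional as F

def scones_loss(logits, targets, neg_weight=1.0, pad_id=0):
    """Hypothetical sketch of a SCONES-style multi-label loss.

    logits:  [batch, seq_len, vocab] unnormalized scores (no softmax).
    targets: [batch, seq_len] reference token ids (single reference).

    Assumption: each vocabulary item gets an independent sigmoid; the
    reference token is the lone positive, every other token a negative.
    """
    vocab = logits.size(-1)
    # One-hot positives derived from the single reference translation.
    positives = F.one_hot(targets, num_classes=vocab).to(logits.dtype)
    # Binary cross-entropy for every (position, vocabulary item) pair.
    per_token = F.binary_cross_entropy_with_logits(
        logits, positives, reduction="none"
    )
    # Weight the negative terms relative to the single positive term.
    weights = positives + neg_weight * (1.0 - positives)
    per_token = per_token * weights
    # Mask padding positions before averaging over the batch.
    mask = (targets != pad_id).to(logits.dtype).unsqueeze(-1)
    return (per_token * mask).sum() / mask.sum()
```

Because the positives come from a single reference, no multi-reference data is required; the multi-label structure only relaxes the mutual-exclusivity assumption that softmax imposes on the output distribution.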