There has been much recent work on training neural attention models at the sequence level, either with reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence-to-sequence models. Our experiments show that these losses can perform surprisingly well, slightly outperforming beam search optimization in a like-for-like setup. We also report new state-of-the-art results on both IWSLT'14 German-English translation and Gigaword abstractive summarization. On the larger WMT'14 English-French translation task, sequence-level training achieves 41.5 BLEU, which is on par with the state of the art.
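As a concrete illustration of the kind of objective surveyed here (written in our own notation, not quoted from the paper), one classical sequence-level loss is expected risk minimization over a candidate set \(\mathcal{U}(x)\), e.g. an n-best list from beam search, where \(s(u \mid x)\) denotes the model score of hypothesis \(u\) and \(\mathrm{cost}(t, u)\) is a task cost such as \(1 - \mathrm{BLEU}(t, u)\) with respect to the reference \(t\):

\[
\mathcal{L}_{\mathrm{Risk}}
  = \sum_{u \in \mathcal{U}(x)} \mathrm{cost}(t, u)\,
    \frac{\exp\big(s(u \mid x)\big)}
         {\sum_{u' \in \mathcal{U}(x)} \exp\big(s(u' \mid x)\big)} .
\]

Minimizing this expected cost under the renormalized candidate distribution directly optimizes a sequence-level metric rather than token-level likelihood.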