Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
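As a rough illustration of the encoder-side integration the abstract refers to, the sketch below shows one way frozen pre-trained contextual representations could be combined with a seq2seq encoder's token embeddings. This is a minimal, hypothetical example, not the paper's exact architecture: the class name `AugmentedEncoder`, the linear projection, and the BiLSTM encoder are assumptions made for illustration only.

```python
# Minimal sketch: combine frozen pre-trained representations with the
# encoder's own token embeddings before the recurrent encoder runs.
# All module/parameter names here are hypothetical.
import torch
import torch.nn as nn


class AugmentedEncoder(nn.Module):
    """Seq2seq encoder whose inputs are augmented with representations
    produced by a frozen pre-trained model (computed offline)."""

    def __init__(self, vocab_size, embed_dim, pretrained_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project pre-trained vectors to the embedding size so they can be summed.
        self.proj = nn.Linear(pretrained_dim, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim,
                           batch_first=True, bidirectional=True)

    def forward(self, tokens, pretrained_reprs):
        # tokens: (batch, seq_len) int64 token ids
        # pretrained_reprs: (batch, seq_len, pretrained_dim), produced by a
        # frozen pre-trained model and detached from its computation graph
        x = self.embed(tokens) + self.proj(pretrained_reprs)
        outputs, _ = self.rnn(x)
        return outputs  # (batch, seq_len, 2 * hidden_dim)


# Toy usage: random tensors stand in for a frozen language model's outputs.
enc = AugmentedEncoder(vocab_size=1000, embed_dim=512,
                       pretrained_dim=768, hidden_dim=256)
tokens = torch.randint(0, 1000, (2, 7))
reprs = torch.randn(2, 7, 768)
print(enc(tokens, reprs).shape)  # torch.Size([2, 7, 512])
```

Because the pre-trained representations are computed once and only feed the encoder, the extra cost is confined to the encoder pass, which is consistent with the modest inference slowdown the abstract reports.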