We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropriate surrogate for the test error: by only maximizing the ground truth probability, it fails to exploit the wealth of information offered by structured losses. Further, it introduces discrepancies between training and predicting (such as exposure bias) that may hurt test performance. Instead, SEARNN leverages test-alike search space exploration to introduce global-local losses that are closer to the test error. We first demonstrate improved performance over MLE on two different tasks: OCR and spelling correction. Then, we propose a subsampling strategy to enable SEARNN to scale to large vocabulary sizes. This allows us to validate the benefits of our approach on a machine translation task.
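To make the "global-local loss" idea concrete, here is a minimal, illustrative sketch of a SEARNN-style step loss on a toy task with Hamming cost. It is not the authors' implementation: all names (`toy_model`, `roll_out`, `searnn_step_loss`) are hypothetical stand-ins, the decoder is replaced by a fixed random scorer, and the real method would use an RNN, task losses such as BLEU, and the subsampling strategy mentioned above.

```python
# Hypothetical sketch of a SEARNN-style local loss: at one decoding step,
# try every candidate token, roll out to the end of the sequence, collect
# the resulting cost vector, and score the model's next-token distribution
# against a cost-derived target distribution.

import numpy as np

VOCAB = 4      # toy vocabulary size
SEQ_LEN = 5    # toy sequence length

def toy_model(prefix):
    """Stand-in for an RNN decoder: a deterministic (per-prefix) random
    probability distribution over the next token."""
    rng = np.random.default_rng(hash(tuple(prefix)) % (2**32))
    logits = rng.normal(size=VOCAB)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def roll_out(prefix):
    """Complete the sequence greedily with the model (a 'learned' roll-out)."""
    seq = list(prefix)
    while len(seq) < SEQ_LEN:
        seq.append(int(np.argmax(toy_model(seq))))
    return seq

def hamming_cost(pred, target):
    """Structured cost of a full prediction against the reference."""
    return sum(p != t for p, t in zip(pred, target))

def searnn_step_loss(prefix, target):
    """Cost vector over all candidate next tokens, turned into a
    cost-sensitive log-loss (lower cost -> higher target probability)."""
    costs = np.array(
        [hamming_cost(roll_out(prefix + [a]), target) for a in range(VOCAB)],
        dtype=float,
    )
    t = np.exp(-costs)
    t /= t.sum()
    p = toy_model(prefix)
    return -np.sum(t * np.log(p + 1e-12))

target = [0, 1, 2, 3, 0]
prefix = target[:2]  # ground-truth ("teacher-forced") roll-in up to step 2
print("local loss at step 2:", searnn_step_loss(prefix, target))
```

Unlike MLE, which only rewards the single ground-truth token at each step, the target distribution here reflects how every candidate token affects the final structured cost, which is the sense in which the loss is both global (rollout-based) and local (per step).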