In machine learning, a well-trained model is expected to recover its training labels, i.e., the labels predicted by the model should be as close as possible to the ground-truth labels. Inspired by this, we propose a self-guided curriculum strategy that encourages neural machine translation (NMT) models to follow this recovery criterion, treating the recovery degree of each training example as its learning difficulty. Specifically, we adopt the sentence-level BLEU score as a proxy for the recovery degree. Unlike existing curricula, which rely on linguistic prior knowledge or third-party language models, our difficulty measure is derived from the NMT model itself and is therefore better suited to measuring how well the model has mastered each example. Experiments on translation benchmarks, including WMT14 English$\Rightarrow$German and WMT17 Chinese$\Rightarrow$English, demonstrate that our approach consistently improves translation performance over a strong Transformer baseline.
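As a rough illustration of the recovery criterion described in the abstract, the following sketch scores each training pair by the sentence-level BLEU between a model's prediction and the reference, then orders the pairs by that score. Several details are assumptions made for illustration and are not taken from the abstract: the add-one smoothed BLEU variant on whitespace tokens, the hypothetical `translate` callable standing in for a trained NMT model, and the easy-to-hard ordering (higher recovery treated as easier).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Add-one smoothed sentence-level BLEU on whitespace tokens.

    Illustrative variant only; the abstract does not specify the exact
    smoothing or tokenization used.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = sum(hyp_ngrams.values())
        # Add-one smoothing keeps the logarithm finite when overlap is zero.
        log_precision += math.log((overlap + 1) / (total + 1))
    # Brevity penalty discourages hypotheses shorter than the reference.
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity_penalty * math.exp(log_precision / max_n)


def order_by_recovery(pairs, translate):
    """Return (score, source, reference) triples, best-recovered first.

    `translate` is a hypothetical callable mapping a source sentence to the
    model's predicted translation. Easy-to-hard ordering is an assumption.
    """
    scored = []
    for source, reference in pairs:
        prediction = translate(source)
        scored.append((sentence_bleu(prediction, reference), source, reference))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

In use, `translate` would wrap the decoding routine of an already trained baseline model, and the returned ordering could then drive how training batches are scheduled. How the ordering is turned into a training schedule is not described in the abstract and is left open here.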