Scheduled Sampling Based on Decoding Steps for Neural Machine Translation

Source: arXiv. To appear in the EMNLP 2021 main conference. Code: https://github.com/Adaxry/ss_on_decoding_steps. arXiv admin note: text overlap with arXiv:2107.10427.

Scheduled sampling is widely used to mitigate the exposure bias problem in neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus bridging the gap between training and inference. However, vanilla scheduled sampling is based only on training steps and treats all decoding steps equally. That is, it simulates an inference scene with uniform error rates, which does not match the real inference scene, where later decoding steps usually have higher error rates due to error accumulation. To alleviate this discrepancy, we propose scheduled sampling methods based on decoding steps, which increase the chance of selecting predicted tokens as the decoding step grows. Consequently, we can simulate the inference scene more realistically during training and better bridge the gap between training and inference. Moreover, we investigate scheduled sampling based on both training steps and decoding steps for further improvements. Experimentally, our approaches significantly outperform the Transformer baseline and vanilla scheduled sampling on three large-scale WMT tasks. Our approaches also generalize well to the text summarization task on two popular benchmarks.
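The core idea in the abstract, choosing predicted tokens more often at later decoding steps, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the exponential schedule `decay ** t`, the names `ground_truth_prob` and `mix_decoder_inputs`, and the premise that the model's predicted tokens are already available (for example from an earlier decoding pass).

```python
import random


def ground_truth_prob(decoding_step, decay=0.99):
    """Probability of feeding the ground-truth token at this decoding step.

    Hypothetical exponential schedule: the probability shrinks as the
    decoding step grows, so later positions see predicted tokens more often.
    """
    return decay ** decoding_step


def mix_decoder_inputs(gold_tokens, predicted_tokens, decay=0.99, rng=None):
    """Build decoder inputs by sampling, per position, gold or predicted token."""
    if len(gold_tokens) != len(predicted_tokens):
        raise ValueError("gold and predicted sequences must have equal length")
    rng = rng or random.Random()
    mixed = []
    for t, (gold, pred) in enumerate(zip(gold_tokens, predicted_tokens)):
        # Keep the gold token with probability ground_truth_prob(t);
        # otherwise substitute the model's own prediction.
        if rng.random() < ground_truth_prob(t, decay):
            mixed.append(gold)
        else:
            mixed.append(pred)
    return mixed


if __name__ == "__main__":
    gold = ["the", "cat", "sat", "on", "the", "mat"]
    pred = ["the", "dog", "sat", "in", "a", "hat"]
    print(mix_decoder_inputs(gold, pred, decay=0.7, rng=random.Random(0)))
```

With `decay=1.0` the sketch reduces to ordinary teacher forcing (every input is the gold token), while smaller values shift more of the later positions to predicted tokens. A schedule that also depends on the training step, as the abstract mentions, would add a second argument to `ground_truth_prob`; the form of that combination is not specified here.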
