Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches such as "beam search and rerank" (BSR) incur significant computational overhead during inference, making real-world deployment infeasible. We study whether it is possible to build an amortized noisy channel NMT model such that greedy decoding at inference time matches BSR in reward (based on the source-to-target and target-to-source log probabilities) and in translation quality (measured by BLEU and BLEURT). We explore three approaches to training the new model: knowledge distillation, one-step-deviation imitation learning, and Q-learning. The first approach obtains the noisy channel signal from a pseudo-corpus, while the latter two aim to optimize a noisy-channel MT reward directly. For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but their translation quality, as approximated by BLEU and BLEURT, is similar to that of BSR-produced translations. Additionally, all three approaches speed up inference by one to two orders of magnitude.
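To make the reranking step concrete, the sketch below shows one plausible form of the BSR scoring rule under the stated reward: an interpolation of the source-to-target and target-to-source log probabilities. The function names (log_p_fwd, log_p_bwd) and the weight lam are hypothetical placeholders, not the paper's actual implementation.

```python
def bsr_rerank(candidates, log_p_fwd, log_p_bwd, lam=0.5):
    """Rerank beam-search candidates by a noisy-channel score.

    candidates: list of target-side hypotheses from beam search.
    log_p_fwd(y): source-to-target log probability, log p(y | x).
    log_p_bwd(y): target-to-source log probability, log p(x | y).
    lam: interpolation weight (a hypothetical hyperparameter).
    """
    def score(y):
        # Noisy-channel reward: weighted sum of the two directional scores.
        return (1 - lam) * log_p_fwd(y) + lam * log_p_bwd(y)

    # Return the hypothesis with the highest combined reward.
    return max(candidates, key=score)
```

Because this reranking requires scoring every beam candidate with a second (target-to-source) model, its cost grows with beam size, which is the inference overhead that the amortized model is meant to remove.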