Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches such as "beam search and rerank" (BSR) incur significant computational overhead during inference, making real-world deployment impractical. We aim to build an amortized noisy-channel NMT model such that greedy decoding from it yields translations that maximize the same reward as translations generated with BSR. We attempt three approaches: knowledge distillation, 1-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, while the latter two aim to optimize toward the noisy-channel MT reward directly. All three approaches speed up inference by one to two orders of magnitude. In all three cases, the generated translations fail to achieve rewards comparable to BSR's, but their translation quality, as approximated by BLEU, is comparable to that of BSR-produced translations.
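For concreteness, noisy-channel reranking typically scores each beam candidate with a weighted combination of a direct translation model, a channel model, and a language model; the notation below is a standard formulation offered as illustration (the specific weights and any length normalization are our assumptions, not details stated above):

R(y \mid x) \;=\; \lambda_1 \log p_{\text{dir}}(y \mid x) \;+\; \lambda_2 \log p_{\text{ch}}(x \mid y) \;+\; \lambda_3 \log p_{\text{LM}}(y)

Under this view, BSR draws candidates from beam search with p_dir and returns the candidate with the highest R, while the amortized model is trained so that a single greedy decoding pass approximately maximizes the same R.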