Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.
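To make the contrast between "mask-predict" and the thresholding strategy concrete, here is a minimal sketch of iterative CMLM decoding in PyTorch. It is illustrative only, not the paper's implementation: the `cmlm(src, tgt)` interface (assumed to return per-position log-probabilities over the vocabulary), the linear re-masking schedule, and the threshold value are all assumptions.

```python
import torch

def cmlm_decode(cmlm, src, tgt_len, mask_id, max_iters=10, threshold=None):
    """Iterative CMLM decoding: start fully masked, then repeatedly
    re-mask and re-predict low-confidence target positions.

    `cmlm(src, tgt)` is a hypothetical interface assumed to return
    log-probabilities of shape (tgt_len, vocab_size).
    """
    tgt = torch.full((tgt_len,), mask_id, dtype=torch.long)
    scores = torch.zeros(tgt_len)
    for it in range(max_iters):
        log_probs = cmlm(src, tgt)                    # (tgt_len, vocab_size), assumed
        probs, preds = log_probs.exp().max(dim=-1)    # per-position best token
        masked = tgt == mask_id
        tgt[masked] = preds[masked]                   # fill only masked slots
        scores[masked] = probs[masked]                # keep old scores elsewhere
        if threshold is not None:
            # Thresholding heuristic: re-mask every position whose
            # confidence falls below a fixed cutoff, so the number of
            # re-masked tokens adapts per iteration and per sentence.
            remask = scores < threshold
        else:
            # Mask-predict baseline: re-mask the k lowest-scoring
            # positions, with k decaying linearly over iterations.
            k = int(tgt_len * (1 - (it + 1) / max_iters))
            remask = torch.zeros(tgt_len, dtype=torch.bool)
            if k > 0:
                remask[scores.topk(k, largest=False).indices] = True
        if not remask.any():
            break
        tgt[remask] = mask_id
    return tgt
```

With `threshold=None` this reduces to mask-predict's fixed, linearly decaying mask count; passing, say, `threshold=0.9` instead lets the amount of re-masking vary with the model's confidence, which is the kind of adaptive behavior the thresholding strategy is meant to capture.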