Natural language generation technology has recently seen remarkable progress with large-scale training, and many natural language applications are now built upon a wide range of generation models. Combining diverse models may lead to further progress, but conventional ensembling (e.g., shallow fusion) requires that they share a vocabulary and tokenization scheme. We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models. Our method does not assume that the vocabulary, tokenization, or even generation order is shared. Our extensive evaluations on machine translation and scientific paper summarization demonstrate that Twist decoding substantially outperforms each model decoded in isolation over various scenarios, including cases where both domain-specific and general-purpose models are available. Twist decoding also consistently outperforms the popular reranking heuristic where output candidates from one model are rescored by another. We hope that our work will encourage researchers and practitioners to examine generation models collectively, not just independently, and to seek out models with complementary strengths to the currently available models.
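To make the reranking baseline mentioned above concrete, here is a minimal sketch (not the paper's Twist decoding algorithm) of rescoring: model A proposes n-best candidates and model B ranks them by log-likelihood of the plain text, so no shared vocabulary or tokenizer is required. The model names and hyperparameters below are placeholders, not ones used in the paper.

```python
# A hedged sketch of n-best reranking: model A generates candidates,
# model B rescores them, and the highest-scoring candidate is returned.
# "model-a" and "model-b" are hypothetical checkpoint names.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def rerank(source: str, name_a: str = "model-a", name_b: str = "model-b",
           n_best: int = 5) -> str:
    tok_a = AutoTokenizer.from_pretrained(name_a)
    gen_a = AutoModelForSeq2SeqLM.from_pretrained(name_a)
    tok_b = AutoTokenizer.from_pretrained(name_b)
    gen_b = AutoModelForSeq2SeqLM.from_pretrained(name_b)

    # Model A proposes n-best candidates via beam search.
    inputs_a = tok_a(source, return_tensors="pt")
    beams = gen_a.generate(**inputs_a, num_beams=n_best,
                           num_return_sequences=n_best, max_new_tokens=128)
    candidates = tok_a.batch_decode(beams, skip_special_tokens=True)

    # Model B rescores each candidate by its per-token log-likelihood.
    # Scoring operates on decoded text, so the two models may use
    # entirely different vocabularies and tokenizations.
    def score(text: str) -> float:
        enc = tok_b(source, return_tensors="pt")
        labels = tok_b(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = gen_b(**enc, labels=labels).loss  # mean NLL per target token
        return -loss.item()

    return max(candidates, key=score)
```

Note that this baseline can only select among model A's candidates; the abstract's point is that Twist decoding instead lets diverse models guide generation itself, and empirically outperforms this rescoring heuristic.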