Conditional neural text generation models generate high-quality outputs, but often concentrate around a mode when what we really want is a diverse set of options. We present a search algorithm to construct lattices encoding a massive number of generation options. First, we restructure decoding as a best-first search, which explores the space differently than beam search and improves efficiency by avoiding pruning paths. Second, we revisit the idea of hypothesis recombination: we can identify pairs of similar generation candidates during search and merge them as an approximation. On both summarization and machine translation, we show that our algorithm encodes, in a single lattice, thousands of diverse options that remain grammatical and high-quality. This algorithm provides a foundation for building downstream generation applications on top of massive-scale diverse outputs.
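The two ideas in the abstract, best-first decoding and hypothesis recombination, can be illustrated on a toy problem. The sketch below is a minimal, hypothetical illustration, not the paper's actual method: the `successors` table stands in for a real model's next-token distribution, and hypotheses are merged when they share the same last token, so the resulting edge set forms a lattice that encodes more output paths than the search ever expanded.

```python
import heapq
from collections import defaultdict

# Illustrative successor function (a stand-in for a real model): maps the
# last token of a sequence to (next_token, logprob) continuations.
# All tokens and scores here are made up for demonstration.
def successors(seq):
    options = {
        (): [("the", -0.1), ("a", -0.3)],
        ("the",): [("cat", -0.2), ("dog", -0.4)],
        ("a",): [("cat", -0.3), ("dog", -0.2)],
        ("cat",): [("sat", -0.1), ("ran", -0.5)],
        ("dog",): [("sat", -0.3), ("ran", -0.2)],
        ("sat",): [("<eos>", 0.0)],
        ("ran",): [("<eos>", 0.0)],
    }
    return options.get(seq[-1:], [])

def best_first_lattice(max_expansions=20, merge_suffix=1):
    """Best-first search with suffix-based hypothesis recombination.

    Lattice nodes are keyed by the last `merge_suffix` tokens, so
    hypotheses that end the same way are merged: only one
    representative per node is expanded, but every discovered edge
    is kept in the lattice.
    """
    edges = set()               # (src_node, token, dst_node)
    frontier = [(0.0, ())]      # min-heap on negated log-prob score
    finished, seen, expansions = [], set(), 0
    while frontier and expansions < max_expansions:
        neg_score, seq = heapq.heappop(frontier)
        expansions += 1
        for tok, lp in successors(seq):
            new_seq = seq + (tok,)
            edges.add((seq[-merge_suffix:], tok, new_seq[-merge_suffix:]))
            if tok == "<eos>":
                finished.append((neg_score - lp, new_seq))
            elif new_seq[-merge_suffix:] not in seen:
                # Recombination: merged hypotheses are not re-expanded.
                seen.add(new_seq[-merge_suffix:])
                heapq.heappush(frontier, (neg_score - lp, new_seq))
    return edges, finished

def count_paths(edges, root=(), goal="<eos>"):
    """Count distinct root-to-<eos> paths encoded by the lattice."""
    graph = defaultdict(list)
    for src, tok, dst in edges:
        graph[src].append((tok, dst))
    def rec(node):
        return sum(1 if tok == goal else rec(dst)
                   for tok, dst in graph[node])
    return rec(root)
```

On this toy graph, the search expands only a handful of hypotheses and explicitly finishes just two of them, yet `count_paths` reports eight complete outputs: the merged nodes let unexpanded continuations reuse edges discovered from their recombined counterparts, which is the sense in which a lattice can encode far more options than the search visits.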