Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search, since searching the full output space is often intractable or impractical. The default algorithm for this job is beam search -- a pruned version of breadth-first search. Quite surprisingly, beam search often returns better results than exact inference due to a beneficial search bias for NLP tasks. In this work, we show that the standard implementation of beam search can be made up to 10x faster in practice. Our method assumes that the scoring function is monotonic in the sequence length, which allows us to safely prune, early on, hypotheses that cannot be in the final set of hypotheses. We devise effective monotonic approximations to popular nonmonotonic scoring functions, including length normalization and mutual information decoding. Lastly, we propose a memory-reduced variant of best-first beam search, which has a similar beneficial search bias in terms of downstream performance, but runs in a fraction of the time.
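A minimal sketch of the monotonicity argument, under assumptions not spelled out in the abstract (a hypothetical three-token vocabulary with a fixed log-probability table standing in for a neural model, and a per-length pop budget as the beam-style pruning rule): because summed log-probabilities can only decrease as a sequence grows, the first k complete hypotheses popped from a best-first priority queue already score at least as high as anything still on the queue, so search can stop early.

```python
import heapq
import math

EOS = "</s>"
# Toy next-token log-probabilities (hypothetical; a real decoder
# would query a neural language model conditioned on the prefix).
LOGP = {"a": math.log(0.6), "b": math.log(0.3), EOS: math.log(0.1)}

def score(seq):
    """Sum of log-probs: monotonically non-increasing in length."""
    return sum(LOGP[tok] for tok in seq)

def best_first_beam_search(k=2, max_len=4):
    # Max-heap via negated scores; entries are (-score, sequence).
    heap = [(0.0, ())]
    pops_at_len = {}   # beam-style pruning: at most k pops per length
    completed = []
    while heap and len(completed) < k:
        neg, seq = heapq.heappop(heap)
        if seq and seq[-1] == EOS:
            # Monotonicity: no hypothesis left on the heap can score
            # higher, so this one is safely in the final top-k.
            completed.append((-neg, seq))
            continue
        n = pops_at_len.get(len(seq), 0)
        if n >= k or len(seq) >= max_len:
            continue  # pruned: budget at this length exhausted
        pops_at_len[len(seq)] = n + 1
        for tok in LOGP:
            child = seq + (tok,)
            heapq.heappush(heap, (-score(child), child))
    return completed
```

Note that the early-stopping guarantee breaks for nonmonotonic scores such as length-normalized log-probability, which is why the abstract's monotonic approximations are needed there.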