Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size.
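As a rough illustration of the hybrid described above, the sketch below shows the LUMEN-style split: a large frozen memory encoder pre-computes most of each passage's representation offline, and a smaller live encoder, conditioned on the question, finishes the encoding at query time before handing the result to a FiD-style decoder. This is a minimal toy sketch, not the paper's actual architecture or API; the dense layers and the names memory_encode / live_encode are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

d_model = 64

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        # Frozen memory encoder: applied once, offline, to every corpus passage.
        "memory": jax.random.normal(k1, (d_model, d_model)) / jnp.sqrt(d_model),
        # Live encoder: applied at query time, conditioned on the question.
        "live": jax.random.normal(k2, (2 * d_model, d_model)) / jnp.sqrt(2 * d_model),
    }

def memory_encode(params, passage_tokens):
    # Pre-compute the bulk of the passage representation (no question available yet).
    return jnp.tanh(passage_tokens @ params["memory"])

def live_encode(params, memory, question):
    # Finish the encoding on the fly: combine the pre-computed memory with
    # question features and run the much cheaper live encoder.
    q = jnp.broadcast_to(question.mean(axis=0), memory.shape)
    return jnp.tanh(jnp.concatenate([memory, q], axis=-1) @ params["live"])

key = jax.random.PRNGKey(0)
params = init_params(key)

# Offline: encode the corpus once and store the dense memory.
passages = jax.random.normal(key, (100, 32, d_model))  # 100 passages, 32 tokens each
memory = jax.vmap(lambda p: memory_encode(params, p))(passages)

# Online: for a new question, only the live encoder runs over retrieved memories.
question = jax.random.normal(key, (16, d_model))        # 16 question tokens
retrieved = memory[:10]                                 # stand-in for top-10 retrieval
fused = jax.vmap(lambda m: live_encode(params, m, question))(retrieved)
print(fused.shape)  # (10, 32, d_model) -> consumed by a FiD-style decoder
```

The point of the split is that the expensive pass over retrieved passages happens once offline, while the per-question cost is only the live encoder and decoder, which is what makes the approach cheaper than full FiD yet better-conditioned than a purely pre-computed memory.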