We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions. GAR augments a query by generating heuristically discovered relevant contexts, without using external resources as supervision. We demonstrate that the generated contexts substantially enrich the semantics of the queries, and that GAR with sparse representations (BM25) achieves performance comparable to or better than state-of-the-art dense retrieval methods such as DPR. We show that generating diverse contexts for a query is beneficial, as fusing their results consistently yields better retrieval accuracy. Moreover, because sparse and dense representations are often complementary, GAR can be easily combined with DPR to achieve even better performance. GAR achieves state-of-the-art performance on the Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader, and consistently outperforms other retrieval methods when the same generative reader is used.
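To make the two ideas above concrete (augmenting a query with generated text, then fusing the retrieval results of several augmented queries), here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the helper names `augment_query` and `fuse_round_robin` are hypothetical, the generated contexts and ranked passage ids are supplied by hand rather than produced by a generator or a BM25 index, and round-robin interleaving is only one simple way to fuse ranked lists.

```python
from typing import List


def augment_query(question: str, generated_contexts: List[str]) -> str:
    """Append generated text to the question to form an expanded query.

    Hypothetical helper: in practice the contexts would come from a
    text generation model; here they are passed in directly.
    """
    return " ".join([question, *generated_contexts])


def fuse_round_robin(ranked_lists: List[List[str]], k: int) -> List[str]:
    """Interleave several ranked lists of passage ids, skipping duplicates.

    Assumed fusion rule (not taken from the source): take the rank-1
    passage of each list, then the rank-2 passage of each list, and so
    on, until k distinct passage ids have been collected.
    """
    if k <= 0:
        return []
    fused: List[str] = []
    seen = set()
    longest = max((len(ranking) for ranking in ranked_lists), default=0)
    for rank in range(longest):
        for ranking in ranked_lists:
            if rank < len(ranking) and ranking[rank] not in seen:
                seen.add(ranking[rank])
                fused.append(ranking[rank])
                if len(fused) == k:
                    return fused
    return fused


# One expanded query per kind of generated context (contents are made up).
question = "who wrote hamlet"
expanded = augment_query(question, ["William Shakespeare"])

# Ranked passage ids that a sparse retriever might return for three
# differently augmented queries (hand-written for illustration).
fused = fuse_round_robin([["p1", "p2", "p3"], ["p2", "p4"], ["p5"]], k=4)
```

The same fusion helper could in principle take one list from a sparse retriever and one from a dense retriever, which is the sense in which the two kinds of representation can complement each other.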