Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open-domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
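For illustration only, here is a minimal sketch of querying a pre-trained RAG-Sequence model (the formulation that conditions on the same retrieved passages for the whole output) through the Hugging Face transformers integration. The checkpoint name facebook/rag-sequence-nq, the example question, and the dummy-index settings are assumptions made to keep the sketch self-contained; they are not part of the abstract above.

```python
# Minimal sketch, assuming the Hugging Face transformers RAG integration is available.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

# Parametric memory: a pre-trained seq2seq generator; non-parametric memory: a dense
# vector index of Wikipedia passages queried by a pre-trained neural retriever.
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq",
    index_name="exact",
    use_dummy_dataset=True,  # toy index so the sketch runs without the full Wikipedia dump
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

# Retrieve passages for the question and generate an answer conditioned on them.
inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
generated_ids = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```

In this integration, the per-token formulation described above (which can marginalize over different retrieved passages at each generated token) is exposed analogously as RagTokenForGeneration; swapping the class and checkpoint is the only change needed in the sketch.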