The emergence of large pretrained models has enabled language models to achieve superior performance on common NLP tasks, including language modeling and question answering, compared to earlier static word representation methods. Augmenting these models with a retriever that fetches related text and documents as supporting evidence has shown promise for solving NLP problems more interpretably, since the additional knowledge is injected explicitly rather than captured implicitly in the models' parameters. Despite this recent progress, our analysis of retriever-augmented language models shows that this class of language models still lacks the ability to reason over the retrieved documents. In this paper, we study the strengths and weaknesses of different retriever-augmented language models, such as REALM, kNN-LM, FiD, ATLAS, and Flan-T5, in reasoning over the retrieved documents across different tasks. In particular, we analyze the reasoning failures of each of these models and study how these failures are rooted in the retriever module as well as the language model.