Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. Users cannot easily determine whether LM outputs are trustworthy, because most LMs have no built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
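To make the two stages concrete, the sketch below shows how a research-and-revise loop of the kind described above could be orchestrated: generate verification queries for a passage, retrieve web evidence, and minimally post-edit any content the evidence contradicts, returning both the revised text and the collected evidence as an attribution report. This is a minimal illustration, not the authors' released implementation; every helper (`generate_queries`, `web_search`, `disagrees`, `edit_passage`) is a hypothetical placeholder that an implementer would back with a large language model and a standard web-search API.

```python
# Hypothetical sketch of a research-and-revise retrofit loop.
# All helpers are placeholders, not the paper's actual code.

from dataclasses import dataclass


@dataclass
class Evidence:
    query: str
    snippet: str
    url: str


def generate_queries(passage: str) -> list[str]:
    # Research stage: an LLM would be few-shot prompted to ask verification
    # questions covering each factual claim in the passage. Placeholder.
    return []


def web_search(query: str, top_k: int = 1) -> list[Evidence]:
    # Research stage: retrieve top-k evidence snippets from web search. Placeholder.
    return []


def disagrees(passage: str, evidence: Evidence) -> bool:
    # Revision stage: an LLM would judge whether the evidence contradicts
    # the passage. Placeholder assumes agreement.
    return False


def edit_passage(passage: str, evidence: Evidence) -> str:
    # Revision stage: an LLM would rewrite only the unsupported span so the
    # claim is supported, preserving the rest of the text. Placeholder is identity.
    return passage


def retrofit(passage: str) -> tuple[str, list[Evidence]]:
    """Return a revised passage plus the evidence collected for attribution."""
    report: list[Evidence] = []
    for query in generate_queries(passage):
        for ev in web_search(query):
            report.append(ev)
            if disagrees(passage, ev):
                passage = edit_passage(passage, ev)
    return passage, report


if __name__ == "__main__":
    revised, evidence = retrofit("Some model-generated text to verify.")
    print(revised, len(evidence))
```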