Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct errors so that the system improves over time. Our approach is three-fold: First, generated chains of reasoning show how answers are implied by the system's own internal beliefs. Second, users can interact with the explanations to identify erroneous model beliefs and provide corrections. Third, we augment the model with a dynamic memory of such corrections. Retrievals from memory are used as additional context for QA, to help avoid previous mistakes in similar new situations - a novel type of memory-based continuous learning. To our knowledge, this is the first system to generate chains that are both faithful (the answer follows from the reasoning) and truthful (the chain reflects the system's own beliefs, as ascertained by self-querying). In evaluation, users judge that a majority (65%+) of generated chains clearly show how an answer follows from a set of facts - substantially better than a high-performance baseline. We also find that using simulated feedback, our system (called EntailmentWriter) continually improves with time, requiring feedback on only 25% of training examples to reach within 1% of the upper-bound (feedback on all examples). We observe a similar trend with real users. This suggests new opportunities for using language models in an interactive setting where users can inspect, debug, correct, and improve a system's performance over time.