Knowledge underpins reasoning. Recent research demonstrates that providing relevant knowledge as additional context in commonsense question answering (QA) can substantially enhance performance, even on top of state-of-the-art models. The fundamental challenge is where and how to find knowledge that is high quality and on point with respect to the question: knowledge retrieved from knowledge bases is incomplete, and knowledge generated by language models is inconsistent. We present Rainier, a Reinforced Knowledge Introspector, which learns to generate contextually relevant knowledge in response to given questions. Our approach starts by imitating knowledge generated by GPT-3, then learns to generate its own knowledge via reinforcement learning, where rewards are shaped based on the performance gain on the resulting question answering. Rainier demonstrates substantial and consistent performance gains when tested on 9 different commonsense benchmarks: 5 in-domain benchmarks that are seen during reinforcement learning, and 4 out-of-domain benchmarks that are kept unseen. Our work is the first to report that knowledge generated by models orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of knowledge elicited from GPT-3 for commonsense QA.
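To make the reward shaping described above concrete, the sketch below illustrates one plausible form of it: the reward for a generated knowledge statement is the gain in a frozen QA model's confidence in the gold answer when the knowledge is prepended to the question. This is a minimal illustrative sketch, not the paper's implementation; the function names (`qa_scores`, `knowledge_reward`), the assumed `qa_model(prompt, choices)` interface returning one logit per answer choice, and the exact reward form (probability gain on the gold answer) are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def qa_scores(qa_model, question, choices, knowledge=""):
    """Probability over answer choices from a frozen QA model,
    optionally conditioned on a prepended knowledge statement.
    Assumes qa_model(prompt, choices) returns one logit per choice."""
    prompt = (knowledge + " " if knowledge else "") + question
    logits = qa_model(prompt, choices)
    return F.softmax(logits, dim=-1)

def knowledge_reward(qa_model, question, choices, gold_idx, knowledge):
    """Reward = confidence gain on the gold answer attributable to
    the knowledge (one hypothetical shaping; Rainier's may differ)."""
    with torch.no_grad():
        p_with = qa_scores(qa_model, question, choices, knowledge)[gold_idx]
        p_without = qa_scores(qa_model, question, choices)[gold_idx]
    return (p_with - p_without).item()

if __name__ == "__main__":
    # Toy stand-in for a frozen QA model: scores each choice by its
    # word overlap with the prompt (for demonstration only).
    def toy_qa_model(prompt, choices):
        prompt_words = set(prompt.lower().split())
        return torch.tensor(
            [float(len(prompt_words & set(c.lower().split()))) for c in choices]
        )

    r = knowledge_reward(
        toy_qa_model,
        question="What do people use an umbrella for?",
        choices=["staying dry in rain", "cooking dinner"],
        gold_idx=0,
        knowledge="An umbrella keeps a person dry in the rain.",
    )
    print(f"reward: {r:+.3f}")  # positive: knowledge raised gold-answer confidence
```

Under this kind of shaping, knowledge that flips the QA model toward the correct answer earns a positive reward and misleading knowledge earns a negative one, which is what lets the generator learn from QA outcomes without direct supervision on the knowledge itself.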