Explanations play a considerable role in human learning, especially in areas that remain major challenges for AI -- forming abstractions, and learning about the relational and causal structure of the world. Here, we explore whether reinforcement learning agents might likewise benefit from explanations. We outline a family of relational tasks that involve selecting an object that is the odd one out in a set (i.e., unique along one of many possible feature dimensions). Odd-one-out tasks require agents to reason over multi-dimensional relationships among a set of objects. We show that agents do not learn these tasks well from reward alone, but achieve >90% performance when they are also trained to generate language explaining object properties or why a choice is correct or incorrect. In further experiments, we show how predicting explanations enables agents to generalize appropriately from ambiguous, causally-confounded training, and even to meta-learn to perform experimental interventions to identify causal structure. We show that explanations help overcome the tendency of agents to fixate on simple features, and explore which aspects of explanations make them most beneficial. Our results suggest that learning from explanations is a powerful principle that could offer a promising path towards training more robust and general machine learning systems.
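To make the odd-one-out task structure concrete, the following is a minimal sketch, not the authors' environment or agent. It assumes objects are represented as dictionaries of discrete features, and that "unique along a dimension" means one object's value appears exactly once while all other objects share a single other value. The names `find_odd_one_out` and `explain_choice`, the feature names, and the explanation wording are all hypothetical illustrations.

```python
from collections import Counter


def find_odd_one_out(objects):
    """Return (index, dimension) for the single object that is unique along
    one feature dimension, or None if the scene is ambiguous or has no odd object.

    Assumption: each object is a dict with the same feature keys.
    """
    if len(objects) < 3:
        # With fewer than three objects, "odd one out" is not well defined.
        return None

    candidates = []
    for dim in objects[0].keys():
        counts = Counter(obj[dim] for obj in objects)
        # The dimension singles out one object when exactly two values occur
        # and one of them appears exactly once.
        if len(counts) == 2 and 1 in counts.values():
            unique_value = next(v for v, c in counts.items() if c == 1)
            index = next(i for i, obj in enumerate(objects) if obj[dim] == unique_value)
            candidates.append((index, dim))

    # Require exactly one odd object across all dimensions.
    return candidates[0] if len(candidates) == 1 else None


def explain_choice(objects, chosen_index):
    """Produce a hypothetical explanation string for a choice."""
    answer = find_odd_one_out(objects)
    if answer is None:
        return "No object is the odd one out."
    index, dim = answer
    if chosen_index == index:
        return f"Correct, it is uniquely {objects[index][dim]}."
    return f"Incorrect, other objects share its {dim}."


# Example scene: colors and shapes occur in pairs; only texture isolates one object.
scene = [
    {"color": "red", "shape": "square", "texture": "solid"},
    {"color": "red", "shape": "circle", "texture": "solid"},
    {"color": "blue", "shape": "square", "texture": "striped"},
    {"color": "blue", "shape": "circle", "texture": "solid"},
]

print(find_odd_one_out(scene))
print(explain_choice(scene, 2))
```

In this sketch every dimension must be checked before the odd object can be identified, which illustrates why the task requires reasoning over relations among all objects rather than attending to a single salient feature.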