Enabling reinforcement learning (RL) agents to leverage a knowledge base while learning from experience promises to advance RL in knowledge-intensive domains. However, it has proven difficult to leverage knowledge that is not manually tailored to the environment. We propose to use the subclass relationships present in open-source knowledge graphs to abstract away from specific objects. We develop a residual policy gradient method that is able to integrate knowledge across different abstraction levels in the class hierarchy. Our method results in improved sample efficiency and generalisation to unseen objects in commonsense games, but we also investigate failure modes, such as excessive noise in the extracted class knowledge or environments with little class structure.