Reward design for reinforcement learning agents can be difficult in situations where one not only wants the agent to achieve some effect in the world but also cares about how that effect is achieved. For example, we might wish for an agent to adhere to a tacit understanding of common sense, align itself with a preference for how to behave safely, or take on a particular role in an interactive game. Storytelling is a mode for communicating tacit procedural knowledge. We introduce a technique, Story Shaping, in which a reinforcement learning agent infers tacit knowledge from an exemplar story of how to accomplish a task and intrinsically rewards itself for performing actions that make its current environment adhere to that of the inferred story world. Specifically, Story Shaping infers a knowledge graph representation of the world state from observations, and also infers a knowledge graph from the exemplar story. An intrinsic reward is generated based on the similarity between the agent's inferred world state graph and the inferred story world graph. We conducted experiments in text-based games that require commonsense reasoning and on shaping the behaviors of agents as virtual game characters.
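As a rough illustration of the intrinsic reward described above (not the paper's exact formulation), one can sketch the graph-similarity signal as the overlap between triple sets extracted from the world state and from the exemplar story. The triple representation, the `intrinsic_reward` helper, and the overlap measure below are all assumptions for the sake of the sketch:

```python
from typing import Set, Tuple

# Hypothetical knowledge-graph triple: (subject, relation, object).
Triple = Tuple[str, str, str]

def intrinsic_reward(world_graph: Set[Triple],
                     story_graph: Set[Triple]) -> float:
    """Reward the agent for making the inferred world-state graph
    resemble the graph inferred from the exemplar story.

    Similarity here is the fraction of story-graph triples already
    present in the world-state graph; other graph-similarity
    measures could be substituted.
    """
    if not story_graph:
        return 0.0
    matched = len(story_graph & world_graph)
    return matched / len(story_graph)

# Toy example: the story implies the sword belongs on the pedestal
# and the door should be open; the agent has achieved one of the two.
story = {("sword", "on", "pedestal"), ("door", "is", "open")}
world = {("sword", "on", "pedestal"), ("door", "is", "closed")}
print(intrinsic_reward(world, story))  # 0.5
```

In an actual agent, this scalar would be added to the environment reward at each step, so progress toward story-like world states is rewarded even before the task's terminal goal is reached.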