We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
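The relational core described above is a multi-head dot-product self-attention block applied to a set of entity vectors (e.g., feature columns from a convolutional front end); stacking the block yields the iterated relational reasoning the abstract refers to. The following is a minimal single-head sketch in NumPy under those assumptions; the names (`relational_block`, `Wq`, `Wk`, `Wv`) are illustrative and not taken from the authors' code.

```python
import numpy as np

def relational_block(entities, Wq, Wk, Wv):
    """One pass of scaled dot-product self-attention over a set of
    entity vectors, i.e., a single-head relational reasoning step.

    entities: (n, d) array, one row per scene entity.
    Wq, Wk, Wv: (d, d_k) learned projection matrices (here random,
    for illustration only).
    Returns: (n, d_k) array of relation-aware entity summaries.
    """
    Q = entities @ Wq  # queries: what each entity is looking for
    K = entities @ Wk  # keys: what each entity offers for matching
    V = entities @ Wv  # values: the information to be aggregated
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) pairwise relation scores
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each entity aggregates over all others

# Toy usage: 5 entities with 8-dim features.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
out = relational_block(E, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

In the full architecture these relation-aware summaries would feed a model-free policy head, and the block would be applied repeatedly (with learned, multi-head projections) rather than once as in this sketch.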