We present an end-to-end, model-based deep reinforcement learning agent that dynamically attends to relevant parts of its state in order to plan and generalize better out-of-distribution. The agent's architecture uses a set representation and a bottleneck mechanism, forcing the number of entities the agent attends to at each planning step to be small. In experiments with customized MiniGrid environments with different dynamics, we observe that this design allows the agent to learn to plan effectively by attending to the relevant objects, leading to better out-of-distribution generalization.
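To make the bottleneck idea concrete, the following is a minimal sketch of one way such a mechanism could be implemented, assuming a set of entity embeddings and a hard top-k selection over learned attention scores. It is not the paper's actual architecture; all names (`EntityBottleneck`, `num_slots`) and the top-k design are illustrative assumptions.

```python
# A hypothetical attention bottleneck over a set-structured state:
# keep only the top-k entities so each planning step attends to a
# small number of objects. Not the authors' implementation.
import torch
import torch.nn as nn


class EntityBottleneck(nn.Module):
    def __init__(self, entity_dim: int, num_slots: int):
        super().__init__()
        self.num_slots = num_slots              # hard cap on attended entities
        self.score = nn.Linear(entity_dim, 1)   # learned relevance score per entity

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (batch, num_entities, entity_dim) set representation
        scores = self.score(entities).squeeze(-1)            # (B, N)
        weights = torch.softmax(scores, dim=-1)              # attention over the set
        top_w, top_idx = weights.topk(self.num_slots, dim=-1)  # the bottleneck
        idx = top_idx.unsqueeze(-1).expand(-1, -1, entities.size(-1))
        selected = entities.gather(1, idx)                   # (B, k, D)
        # reweight the kept entities by their renormalized attention
        return selected * (top_w / top_w.sum(-1, keepdim=True)).unsqueeze(-1)


if __name__ == "__main__":
    bottleneck = EntityBottleneck(entity_dim=32, num_slots=4)
    state = torch.randn(8, 20, 32)   # 8 states, 20 candidate entities each
    attended = bottleneck(state)
    print(attended.shape)            # torch.Size([8, 4, 32])
```

In such a sketch, the small, fixed `num_slots` is what forces the planner to commit to a few relevant objects per step, which is the property the abstract credits for improved out-of-distribution generalization.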