We present an end-to-end, model-based deep reinforcement learning agent that dynamically attends to relevant parts of its state in order to plan and generalize better out of distribution. The agent uses a bottleneck mechanism over a set-based representation to force the number of entities the agent attends to at each planning step to be small. In experiments, we investigate the bottleneck mechanism in several suites of customized environments featuring different challenges. We consistently observe that this design allows the planning agents to generalize their learned task-solving abilities to compatible unseen environments by attending to the relevant objects, leading to better out-of-distribution performance.
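The abstract does not specify the bottleneck's internals, but one common instantiation is a hard top-k attention bottleneck over a set of entity vectors. The following minimal sketch illustrates that idea; the function name, scoring rule, and all parameters are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def attention_bottleneck(entities, query, k=3):
    """Attend to at most k entities from a set-based state representation.

    entities: (n, d) array, one row per entity.
    query: (d,) planning-step query vector.
    Assumed design: dot-product scores, hard top-k selection, softmax
    over the surviving entities only (the bottleneck).
    """
    scores = entities @ query                       # relevance score per entity
    top = np.argsort(scores)[-k:]                   # hard top-k bottleneck
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                        # softmax over selected entities
    return weights @ entities[top]                  # attended summary vector

rng = np.random.default_rng(0)
state = rng.standard_normal((10, 4))                # 10 entities, 4-dim features
q = rng.standard_normal(4)
summary = attention_bottleneck(state, q)
print(summary.shape)                                # (4,)
```

Because only k entities survive selection, entities irrelevant to the current planning step cannot influence the summary, which is the property the abstract credits for out-of-distribution generalization.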