We present an end-to-end, model-based deep reinforcement learning agent that dynamically attends to relevant parts of its state during planning. The agent uses a bottleneck mechanism over a set-based representation to restrict its attention at each planning step to a small number of entities. In experiments, we investigate the bottleneck mechanism on several suites of customized environments that pose different challenges. We consistently observe that this design lets the planning agent transfer its learned task-solving abilities to compatible unseen environments by attending to the relevant objects, yielding better out-of-distribution generalization performance.
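To make the bottleneck idea concrete, here is a minimal sketch of one plausible realization: hard top-k attention over a set of entity embeddings, so that each planning step can draw on at most k entities. This is an illustrative assumption, not the paper's exact architecture; the function name, shapes, and the choice of top-k selection are hypothetical.

```python
import torch
import torch.nn.functional as F

def bottleneck_attention(query, entities, k=4):
    """Attend to at most k entities from a set-based state representation.

    query:    (B, D)    per-planning-step query vector (hypothetical interface)
    entities: (B, N, D) set of entity embeddings
    Returns a (B, D) context vector built only from the top-k scoring entities.
    """
    # Scaled dot-product scores between the query and every entity.
    scores = torch.einsum('bd,bnd->bn', query, entities) / entities.shape[-1] ** 0.5
    # Hard bottleneck: keep the k highest-scoring entities, mask out the rest.
    topk = scores.topk(k, dim=-1).indices                       # (B, k)
    mask = torch.full_like(scores, float('-inf')).scatter(1, topk, 0.0)
    weights = F.softmax(scores + mask, dim=-1)                  # zero outside top-k
    return torch.einsum('bn,bnd->bd', weights, entities)

# Example usage: batch of 2 states, 10 entities of dim 16, bottleneck of 4.
q = torch.randn(2, 16)
ents = torch.randn(2, 10, 16)
ctx = bottleneck_attention(q, ents, k=4)
print(ctx.shape)  # torch.Size([2, 16])
```

Because the context at each step depends on only k entities regardless of how many the environment contains, such a bottleneck is one way an agent could generalize to unseen environments with more or different objects, as the abstract claims.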