Very large state spaces with a sparse reward signal are difficult to explore. Without sophisticated guidance, many reinforcement learning algorithms perform poorly, and the commonly used random exploration is often of little help. The literature shows that such environments require enormous effort to systematically explore large portions of the state space. Learned state representations can improve the search by providing semantic context and building structure on top of the raw observations. In this work we introduce a novel time-myopic state representation that clusters temporally close states together while also predicting the time that separates them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty, replacing the hand-crafted representation heuristic. Our method offers an improved solution to the detachment problem, which remains an issue in the exploration phase of Go-Explore. We provide evidence that the proposed method covers the entire state space with respect to all possible time trajectories without causing disadvantageous conflict-overlaps in the cell archive. As with native Go-Explore, we evaluate our approach on the hard-exploration Atari environments Montezuma's Revenge, Gravitar and Frostbite to validate its capabilities on difficult tasks. Our experiments show that time-myopic Go-Explore is an effective and more general alternative to the domain-engineered heuristic. The source code of the method is available on GitHub.