Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy optimisation problem whose solution aims at visiting all states as uniformly as possible. This is in contrast to standard uncertainty-based approaches, where exploration is transient and eventually vanishes. However, existing approaches to MSVE are theoretically justified only for discrete state spaces, as they are oblivious to the geometry of continuous domains. We address this challenge by introducing Geometric Entropy Maximisation (GEM), a new algorithm that maximises the geometry-aware Shannon entropy of state visits in both discrete and continuous domains. Our key theoretical contribution is casting geometry-aware MSVE exploration as a tractable problem of optimising a simple and novel noise-contrastive objective function. In our experiments, we demonstrate the efficiency of GEM at solving several RL problems with sparse rewards, compared to other deep RL exploration approaches.
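For context, a minimal sketch of the standard, geometry-oblivious MSVE objective that GEM generalises (the notation $d_\pi$ for the state-visitation distribution is an assumption made here for illustration, not taken from the abstract):

\[ \max_{\pi} \; H(d_\pi) \;=\; \max_{\pi} \; -\sum_{s \in \mathcal{S}} d_\pi(s) \log d_\pi(s), \]

where $d_\pi(s)$ denotes the state-visitation distribution induced by policy $\pi$ over a discrete state space $\mathcal{S}$. GEM's geometry-aware objective and its noise-contrastive formulation are not reproduced here; this is only the standard discrete-state form the abstract contrasts against.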