Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces remains a challenge. This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward. In order to estimate state entropy in environments with high-dimensional observations, we utilize a k-nearest neighbor entropy estimator in the low-dimensional representation space of a convolutional encoder. In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder, which is fixed throughout training. Our experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from the DeepMind Control Suite and MiniGrid benchmarks. We also show that RE3 allows learning diverse behaviors without extrinsic rewards, effectively improving sample-efficiency in downstream tasks. Source code and videos are available at https://sites.google.com/view/re3-rl.
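To make the idea concrete, the following is a minimal sketch (not the authors' released code) of how an RE3-style intrinsic reward could be computed: observations are passed through a randomly initialized, frozen convolutional encoder, and the distance to the k-th nearest neighbor in that representation space yields the per-state entropy estimate. The encoder architecture, observation size (84x84), value of k, and weighting coefficient are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of an RE3-style intrinsic reward (illustrative, not the authors' code).
import torch
import torch.nn as nn


class RandomEncoder(nn.Module):
    """Convolutional encoder whose weights are randomly initialized and then
    kept fixed for the whole of training, as described in the abstract.
    Assumes 3x84x84 image observations (an illustrative choice)."""

    def __init__(self, in_channels=3, feature_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, feature_dim),  # 9x9 spatial size for 84x84 inputs
        )
        for p in self.parameters():  # fix the encoder: no gradient updates
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, obs):
        return self.net(obs)


@torch.no_grad()
def re3_intrinsic_reward(embeddings, source, k=3):
    """k-NN state-entropy estimate used as an intrinsic reward:
    r_int(s_i) = log(||y_i - y_i^{k-NN}||_2 + 1), where y_i^{k-NN} is the
    k-th nearest neighbor of y_i among the `source` embeddings
    (e.g. embeddings of states sampled from a replay buffer).
    Assumes `embeddings` are contained in `source`, so the zero
    self-distance is skipped by taking k + 1 neighbors."""
    dists = torch.cdist(embeddings, source)          # pairwise L2 distances
    knn_dists, _ = dists.topk(k + 1, largest=False)  # +1 to skip the self-distance
    return torch.log(knn_dists[:, -1] + 1.0)


# Usage sketch: combine with the extrinsic reward via a coefficient beta.
if __name__ == "__main__":
    encoder = RandomEncoder()
    obs_batch = torch.rand(64, 3, 84, 84)   # hypothetical replay-buffer sample
    y = encoder(obs_batch)
    r_int = re3_intrinsic_reward(y, y, k=3)
    beta = 0.05                              # hypothetical weighting coefficient
    # total_reward = extrinsic_reward + beta * r_int
    print(r_int.shape)                       # torch.Size([64])
```

Because the encoder is never updated, the embeddings of stored states do not need to be recomputed as training progresses, which is one way to read the "stable and compute-efficient" claim in the abstract.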