For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance. In practice, however, limited experience and high-dimensional inputs hinder effective representation learning. To address this, motivated by the success of masked modeling in other research fields, we introduce mask-based reconstruction to promote state representation learning in RL. Specifically, we propose a simple yet effective self-supervised method, Mask-based Latent Reconstruction (MLR), which predicts complete state representations in the latent space from observations whose pixels are masked both spatially and temporally. MLR encourages better use of contextual information when learning state representations, making them more informative and thereby facilitating the training of the RL agent. Extensive experiments show that MLR significantly improves sample efficiency in RL and outperforms state-of-the-art sample-efficient RL methods on multiple continuous control benchmark environments.
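To make the objective concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the masking-and-reconstruction idea described above: random space-time regions of a frame stack are zeroed out, an online encoder processes the masked frames, and a predictor is trained to match the latents of the unmasked frames produced by a separate target encoder. The patch size, masking ratio, cosine loss, and function names (`spatiotemporal_mask`, `mlr_loss`) are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatiotemporal_mask(obs: torch.Tensor, patch: int = 12, ratio: float = 0.5) -> torch.Tensor:
    """Zero out random space-time cells of pixels.

    obs: (B, T, C, H, W) stack of T consecutive frames.
    Assumes H and W are divisible by `patch`.
    """
    b, t, c, h, w = obs.shape
    gh, gw = h // patch, w // patch
    # One Bernoulli keep/drop decision per (time step, spatial patch) cell.
    keep = (torch.rand(b, t, 1, gh, gw, device=obs.device) > ratio).float()
    # Upsample the cell-level mask back to pixel resolution.
    mask = keep.repeat_interleave(patch, dim=-2).repeat_interleave(patch, dim=-1)
    return obs * mask


def mlr_loss(online_enc: nn.Module, target_enc: nn.Module,
             predictor: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Predict the complete latent states from masked observations."""
    masked = spatiotemporal_mask(obs)
    # Online branch sees masked pixels; fold time into the batch dimension.
    pred = predictor(online_enc(masked.flatten(0, 1)))
    with torch.no_grad():  # target branch encodes the original, unmasked frames
        tgt = target_enc(obs.flatten(0, 1))
    # Reconstruction loss measured in latent space rather than pixel space.
    return 1.0 - F.cosine_similarity(pred, tgt, dim=-1).mean()
```

In a typical setup of this kind, `mlr_loss` would be added as an auxiliary term alongside the usual RL objective, and the target encoder would be kept as a slowly updated (e.g. exponential-moving-average) copy of the online encoder rather than trained by gradient descent.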