Vision-based reinforcement learning (RL) is a promising approach for solving control tasks in which images are the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when learning from image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates, among other benefits, a substantial improvement in sample efficiency. However, to take full advantage of this paradigm, the quality of the samples used for training plays a crucial role. More importantly, the diversity of these samples affects not only the sample efficiency of vision-based RL but also its generalization capability. In this work, we present an approach to improve sample diversity for state representation learning. Our method enhances the exploration capability of RL algorithms by taking advantage of the SRL setup. Our experiments show that the proposed approach increases the visitation of problematic states, improves the learned state representation, and outperforms the baselines in all tested environments. These results are most apparent in environments where the baseline methods struggle. Even in simple environments, our method stabilizes training, reduces reward variance, and improves sample efficiency.