Vision-based reinforcement learning (RL) is a promising technique for solving control tasks in which images are the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when learning from image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency, among other benefits. However, to take full advantage of this paradigm, the quality of the samples used for training plays a crucial role. More importantly, the diversity of these samples affects not only the sample efficiency of vision-based RL but also its generalization capability. In this work, we present an approach to improve sample diversity. Our method enhances the exploration capability of RL algorithms by taking advantage of the SRL setup. Our experiments show that the presented approach outperforms the baseline in all tested environments. These results are most apparent in environments where the baseline method struggles. Even in simple environments, our method stabilizes training, reduces reward variance, and boosts sample efficiency.