Recent advances in reinforcement learning (RL) have demonstrated its ability to solve hard agent-environment interaction tasks at a super-human level. However, the application of RL methods to practical, real-world tasks is currently limited by the sample inefficiency of most state-of-the-art algorithms, i.e., their need for a vast number of training episodes. For example, the OpenAI Five agent that beat human players in Dota 2 was trained on the equivalent of thousands of years of game time. Several approaches exist that tackle the issue of sample inefficiency: they either offer a more efficient use of already gathered experience or aim to obtain more relevant and diverse experience through better exploration of the environment. However, to our knowledge, no such exploration approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques with model-based reinforcement learning. We design a novel exploration method that takes the features of the model-based approach into account, and we demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.