In this study, we address the problem of efficient exploration in reinforcement learning. Most common exploration approaches rely on random action selection; however, these approaches do not work well in environments with sparse or no rewards. We propose a Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of the observed states and assigns a high intrinsic reward to out-of-distribution states, in order to lead the agent toward unexplored states. We evaluate our approach on Super Mario Bros. in a no-reward setting and on Montezuma's Revenge in a sparse-reward setting, and show that our approach is indeed capable of exploring efficiently. We discuss a few weaknesses and conclude by discussing future work.
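To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a GAN discriminator's output could be turned into an intrinsic reward, assuming PyTorch and a hypothetical discriminator `D` trained to output a high probability for states drawn from the distribution of previously observed states.

```python
# Hypothetical sketch: intrinsic reward from a GAN discriminator.
# Assumes the discriminator is trained on previously observed states,
# so D(state) is high for familiar states and low for novel ones.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self, state_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        # Probability that `state` comes from the observed-state distribution.
        return self.net(state)


def intrinsic_reward(discriminator, state):
    """Return a reward that is high for out-of-distribution (novel) states."""
    with torch.no_grad():
        p_observed = discriminator(state)  # close to 1 for familiar states
    return (1.0 - p_observed).item()       # close to 1 for novel states
```

In this sketch, the intrinsic reward is simply one minus the discriminator's confidence that the state has been seen before; the paper's actual reward formulation and network architecture may differ.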