Reinforcement learning (RL) is one of the three basic paradigms of machine learning. It has demonstrated impressive performance in many complex tasks, such as Go and StarCraft, and is increasingly being applied to smart manufacturing and autonomous driving. However, RL consistently suffers from the exploration-exploitation dilemma. In this paper, we investigate the problem of improving exploration in RL and introduce intrinsically-motivated RL. In sharp contrast to classic exploration strategies, intrinsically-motivated RL leverages the agent's intrinsic learning motivation to provide sustainable exploration incentives. We carefully classify the existing intrinsic reward methods and analyze their practical drawbacks. Moreover, we propose a new intrinsic reward method via R\'enyi state entropy maximization, which overcomes the drawbacks of the preceding methods and provides powerful exploration incentives. Finally, extensive simulations demonstrate that the proposed module achieves superior performance with higher efficiency and robustness.
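For reference, the R\'enyi entropy of order $\alpha$ generalizes the Shannon entropy; a minimal statement of the standard definition, where $p_i$ here denotes an illustrative state-visitation probability rather than the paper's own notation:
\[
H_\alpha(X) = \frac{1}{1-\alpha} \log \left( \sum_{i} p_i^{\alpha} \right), \qquad \alpha \ge 0, \ \alpha \ne 1,
\]
which recovers the Shannon entropy $H(X) = -\sum_{i} p_i \log p_i$ in the limit $\alpha \to 1$. Maximizing such an entropy over visited states encourages the agent to spread its visitation distribution, which is the intuition behind entropy-based exploration incentives.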