Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration within a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We isolate the effect of post-exploration by turning it on and off within the same algorithm, in both tabular and deep RL settings, on discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and MuJoCo environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic, and easy to implement.
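To make the ablation concrete, the following minimal Python sketch illustrates a goal-conditioned episode in which post-exploration can be switched on or off; the `env`, `agent`, and `goal_buffer` interfaces are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
import random

def imgep_episode(env, agent, goal_buffer, post_exploration=True, post_steps=10):
    """One IMGEP-style episode: 'Go' toward a sampled goal, then optionally 'Explore'.

    Assumed (hypothetical) interfaces: env follows the classic Gym reset/step API,
    agent exposes act(obs, goal), act_exploratory(obs), and reached(obs, goal);
    goal_buffer is a list of previously visited states used to sample goals.
    """
    goal = random.choice(goal_buffer)        # sample a previously reached state as the goal
    obs, done = env.reset(), False

    # 'Go' phase: follow the goal-conditioned policy until the goal is reached.
    while not done and not agent.reached(obs, goal):
        obs, _, done, _ = env.step(agent.act(obs, goal))
        goal_buffer.append(obs)

    # 'Explore' phase (post-exploration): the component ablated in this study.
    if post_exploration and not done:
        for _ in range(post_steps):
            obs, _, done, _ = env.step(agent.act_exploratory(obs))
            goal_buffer.append(obs)
            if done:
                break
    return goal_buffer
```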