In policy search, reward-sparse environments often provide too little information about which behaviors to improve upon or avoid. In such environments, the search must look blindly for reward-yielding transitions, since no early reward can bias it in one direction or another. One way to overcome this is to use intrinsic motivation to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolutionary strategy adapted to use Curiosity as a fitness metric. We compare Curiosity with Novelty, a commonly used diversity metric, and find that Curiosity can generate higher diversity over full episodes without the need for an explicit diversity criterion, and can lead to multiple policies that find reward.
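To make the idea of using an intrinsic signal as the fitness of an evolutionary strategy more concrete, the following is a minimal sketch under several assumptions. The abstract does not give the exact definition of Curiosity, so the sketch substitutes a common stand-in: the prediction error of a learned forward model over the transitions of a full episode. The toy 2-D point environment, the linear-tanh policy, the simple elite-averaging update, and all names (`rollout`, `ForwardModel`, `curiosity_es`) are hypothetical and are not taken from the paper. No extrinsic reward is used anywhere in the loop.

```python
import numpy as np

STATE_DIM, ACTION_DIM, HORIZON = 2, 2, 20


def rollout(theta, horizon=HORIZON):
    """Run a linear-tanh policy in a toy 2-D point environment (hypothetical)."""
    W = theta.reshape(ACTION_DIM, STATE_DIM + 1)  # last column acts as a bias
    state = np.zeros(STATE_DIM)
    transitions = []
    for _ in range(horizon):
        action = np.tanh(W @ np.append(state, 1.0))
        # Walls at +/-1 make the dynamics nonlinear near the boundary.
        next_state = np.clip(state + 0.1 * action, -1.0, 1.0)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


class ForwardModel:
    """Linear model predicting next_state from (state, action)."""

    def __init__(self, lr=0.05):
        self.M = np.zeros((STATE_DIM, STATE_DIM + ACTION_DIM))
        self.lr = lr

    def _residuals(self, transitions):
        x = np.array([np.concatenate([s, a]) for s, a, _ in transitions])
        y = np.array([s2 for _, _, s2 in transitions])
        return x, y - x @ self.M.T  # residuals have shape (T, STATE_DIM)

    def curiosity(self, transitions):
        """Assumed stand-in for Curiosity: mean squared prediction error."""
        _, res = self._residuals(transitions)
        return float(np.mean(np.sum(res ** 2, axis=1)))

    def update(self, transitions):
        """One gradient-descent step on the squared prediction error."""
        x, res = self._residuals(transitions)
        self.M += self.lr * res.T @ x / len(x)


def curiosity_es(generations=30, pop_size=16, elite=4, sigma=0.3, seed=0):
    """Simple ES whose fitness is the intrinsic signal, not an extrinsic reward."""
    rng = np.random.default_rng(seed)
    dim = ACTION_DIM * (STATE_DIM + 1)
    mean = np.zeros(dim)
    model = ForwardModel()
    history = []
    for _ in range(generations):
        population = mean + sigma * rng.standard_normal((pop_size, dim))
        episodes = [rollout(theta) for theta in population]
        fitness = np.array([model.curiosity(ep) for ep in episodes])
        best = np.argsort(fitness)[-elite:]       # most "surprising" policies
        mean = population[best].mean(axis=0)      # move the search toward them
        for ep in episodes:                       # the model then learns what it saw
            model.update(ep)
        history.append(float(fitness.mean()))
    return mean, model, history


if __name__ == "__main__":
    mean, model, history = curiosity_es()
    print("final mean parameters:", np.round(mean, 3))
```

Because the forward model is trained on every episode it scores, transitions that were surprising in one generation become less rewarding in the next, which is one plausible way an intrinsic fitness could push the population toward new behavior without a separate diversity term. Whether this matches the behavior reported for Curiosity-ES depends on details the abstract does not state.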