In multi-goal reinforcement learning, agents learn policies to achieve multiple goals using experiences gained from interactions with the environment. With a sparse binary reward, training agents is particularly challenging due to the lack of successful experiences. To address this problem, hindsight experience replay (HER) generates successful experiences from unsuccessful ones. However, generating successful experiences without considering the properties of the achieved goals is inefficient. In this paper, a novel cluster-based sampling strategy that exploits the properties of achieved goals is proposed. The proposed strategy groups episodes by their achieved goals and samples experiences from the groups in the manner of HER. For the grouping, the K-means clustering algorithm is used, with the centroids of the clusters obtained from the distribution of failed goals, defined as the original goals that were not achieved. The proposed method is validated through experiments on three robotic control tasks from the OpenAI Gym. The results demonstrate that the proposed method significantly reduces the number of epochs required for convergence in two of the three tasks and marginally increases the success rate in the remaining one. It is also shown that the proposed method can be combined with other sampling strategies for HER.
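The grouping step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`kmeans`, `cluster_episodes`) and the choice of final achieved goal as each episode's representative are assumptions for the sketch; only the overall idea, fitting K-means centroids on the distribution of failed goals and assigning episodes to the nearest centroid, follows the text.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=-1)
        labels = np.argmin(dists, axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

def cluster_episodes(achieved_goals, failed_goals, k=3):
    """Group episode indices by the failed-goal cluster nearest to each
    episode's achieved goal (hypothetical helper illustrating the idea).

    achieved_goals: (n_episodes, d) representative achieved goal per episode
    failed_goals:   (m, d) original goals that were not achieved
    """
    centroids, _ = kmeans(failed_goals, k)
    assignment = np.argmin(
        np.linalg.norm(achieved_goals[:, None] - centroids[None], axis=-1),
        axis=1)
    return {j: np.where(assignment == j)[0] for j in range(k)}
```

HER-style relabeling would then sample episodes from these clusters and substitute achieved goals for the original goals when replaying transitions.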