Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
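The core idea of DPP-based diversity scoring can be sketched in a few lines: a set of goal states is scored by the determinant of a similarity (Gram) matrix over those goals, which grows as the goals spread apart and collapses toward zero when they cluster. The sketch below is illustrative only, not the paper's implementation; the RBF kernel, its bandwidth, and the proportional trajectory-sampling rule are assumptions for the example.

```python
import numpy as np

def diversity_score(goals, bandwidth=1.0):
    """DPP-style diversity score: determinant of an RBF similarity
    (Gram) matrix over a set of goal states. Spread-out goals give a
    larger determinant; near-duplicate goals give a score near zero.
    The RBF kernel and bandwidth are assumptions for illustration."""
    sq_dists = np.sum((goals[:, None, :] - goals[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return np.linalg.det(L)

rng = np.random.default_rng(0)
# One trajectory whose achieved goals are spread out, one whose goals
# are nearly identical (as happens when the gripper barely moves).
diverse = rng.uniform(-1.0, 1.0, size=(4, 3))
clustered = diverse[0] + 1e-3 * rng.normal(size=(4, 3))

scores = np.array([diversity_score(g) for g in (diverse, clustered)])
# Sample trajectories for replay with probability proportional to
# their diversity score (a simplified stand-in for the paper's scheme).
probs = scores / scores.sum()
print(probs)
```

Running this shows the diverse trajectory receiving nearly all of the sampling probability, which is the intuition behind preferring it over uniform sampling during replay.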