In Hindsight Experience Replay (HER), a reinforcement learning agent is trained by treating whatever it has achieved as virtual goals. However, in previous work, the experience was replayed at random, without considering which episode might be the most valuable for learning. In this paper, we develop an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Our approach is inspired by the work-energy principle in physics. We define a trajectory energy function as the sum of the transition energies of the target object over the trajectory. We hypothesize that replaying episodes with high trajectory energy is more effective for reinforcement learning in robotics. To verify our hypothesis, we designed a framework for hindsight experience prioritization based on the trajectory energy of achieved goal states. The trajectory energy function takes potential, kinetic, and rotational energy into consideration. We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation. Our empirical results show that our proposed method surpasses state-of-the-art approaches in terms of both performance and sample efficiency on all four tasks, without increasing computational time. A video showing experimental results is available at https://youtu.be/jtsF2tTeUGQ
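The trajectory energy described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the point-mass object model, the finite-difference velocity estimate, and the function names are ours, and the rotational term (½Iω², computed analogously from angular velocities) is omitted for brevity. Transition energy is taken as the positive increase in total energy between consecutive steps, and episodes are then sampled with probability proportional to their trajectory energy.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)


def trajectory_energy(positions, dt=0.04, mass=1.0):
    """Trajectory energy: sum of positive per-step energy increases.

    positions: (T, 3) array of target-object positions; the z axis is height.
    Returns a scalar priority score for the episode.
    """
    positions = np.asarray(positions, dtype=float)
    potential = mass * G * positions[:, 2]            # E_p = m * g * z
    vel = np.diff(positions, axis=0) / dt             # finite-difference velocity
    kinetic = 0.5 * mass * np.sum(vel ** 2, axis=1)   # E_k = 1/2 * m * v^2
    total = potential[1:] + kinetic                   # total energy per step
    # Transition energy: only increases in total energy count.
    transition = np.clip(np.diff(total), 0.0, None)
    return float(transition.sum())


def sample_episode(rng, energies):
    """Sample one episode index with probability proportional to its energy."""
    p = np.asarray(energies, dtype=float)
    p = p / p.sum()
    return int(rng.choice(len(energies), p=p))
```

A stationary object yields zero trajectory energy and would never be prioritized, while an episode in which the object is lifted or accelerated accumulates positive transition energy and is replayed more often.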