Model-based reinforcement learning is a promising learning strategy for practical robotic applications due to its improved data efficiency compared with model-free counterparts. However, current state-of-the-art model-based methods rely on shaped reward signals, which can be difficult to design and implement. To remedy this, we propose a simple model-based method tailored for sparse-reward, multi-goal tasks that forgoes the need for complicated reward engineering. This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates. To improve exploration in the sparse-reward setting, the policy is trained with standard Hindsight Experience Replay and endowed with curiosity-based intrinsic rewards. Upon evaluation, this approach provides, on average, an order-of-magnitude increase in data efficiency over the state-of-the-art model-free method on the benchmark OpenAI Gym Fetch Robotics tasks.