The use of human demonstrations in reinforcement learning has been shown to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages by using a single human example, collected through a simple-to-use virtual reality simulation, to assist with RL training. Our method augments a single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly reduce training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this training advantage using a single human example, requiring less than a minute of human input.
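To make the idea of expanding one demonstration into many concrete, the following is a minimal sketch and not the augmentation procedure used in this paper, which the abstract does not specify. It assumes a demonstration is an array of end-effector positions and that a new demonstration can be produced by linearly blending offsets so the path begins and ends at new start and goal positions. The names `warp_demo` and `augment`, and the `noise` parameter, are hypothetical.

```python
import numpy as np


def warp_demo(demo, new_start, new_goal):
    """Shift a (T, D) trajectory so it starts at new_start and ends at new_goal.

    Assumption: a linear blend of the start and goal offsets is a reasonable
    stand-in for 'human-like' variation; the paper's actual method may differ.
    """
    demo = np.asarray(demo, dtype=float)
    # Blend weight goes from 0 at the first step to 1 at the last step.
    s = np.linspace(0.0, 1.0, len(demo))[:, None]
    start_shift = np.asarray(new_start, dtype=float) - demo[0]
    goal_shift = np.asarray(new_goal, dtype=float) - demo[-1]
    return demo + (1.0 - s) * start_shift + s * goal_shift


def augment(demo, n, noise=0.05, seed=0):
    """Generate n warped copies of one demonstration (hypothetical helper).

    Start and goal positions are sampled uniformly within +/- noise of the
    original endpoints.
    """
    rng = np.random.default_rng(seed)
    demo = np.asarray(demo, dtype=float)
    dim = demo.shape[1]
    augmented = []
    for _ in range(n):
        start = demo[0] + rng.uniform(-noise, noise, size=dim)
        goal = demo[-1] + rng.uniform(-noise, noise, size=dim)
        augmented.append(warp_demo(demo, start, goal))
    return augmented
```

In a DDPG + HER setup, trajectories produced this way would typically be converted to transitions and placed in the replay buffer alongside the agent's own experience; how the paper integrates them is not stated in the abstract.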