The use of human demonstrations in reinforcement learning has been shown to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages by using a single human example, collected through a simple-to-use virtual reality simulation, to assist with RL training. Our method augments this single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly reduce training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this significant training advantage from a single human example, requiring less than a minute of human input. Moreover, despite learning from a human example, the agent is not constrained to human-level performance and often learns a policy significantly different from the human demonstration.