Although reinforcement learning has seen tremendous success recently, its trial-and-error learning can be impractical or inefficient in complex environments. Demonstrations, by contrast, let agents benefit from expert knowledge instead of having to discover the best actions through exploration. In this survey, we discuss the advantages of using demonstrations in sequential decision making, the various ways demonstrations can be applied in learning-based decision-making paradigms (for example, reinforcement learning and planning in learned models), and how to collect demonstrations in different scenarios. Additionally, we present an example of a practical pipeline for generating and using demonstrations in the recently proposed ManiSkill robot learning benchmark.