分而治之的模仿学习 (Divide & Conquer Imitation Learning)

When cast into the Deep Reinforcement Learning framework, many robotics tasks require solving a long horizon and sparse reward problem, where learning algorithms struggle. In such context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process. However, most IL methods require several expert demonstrations which can be prohibitively difficult to acquire. Only a handful of IL algorithms have shown efficiency in the context of an extreme low expert data regime where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned into a goal-conditioned policy that is able to solve each skill individually and chain skills to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.

翻译：当被应用于深度强化学习领域的时候，许多机器人任务需要解决长期和稀疏奖励问题，让学习算法难以应对。在这种情况下，模仿学习（IL）可以成为启动学习过程的强大方法。然而，大多数IL方法需要多个专家演示，这可能非常难以获得。只有少数IL算法在极少的专家数据情况下表现出效率，只有一次专家演示。在本文中，我们提出了一种新的算法，旨在从专家轨迹的状态中模仿复杂的机器人任务。基于顺序归纳偏差，我们的方法将复杂的任务分成更小的技能。这些技能被学习成为一个目标条件政策，能够单独解决每个技能并链接技能以解决整个任务。我们表明，我们的方法可以模仿非完整导航任务，并能在高样本效率的情况下扩展到复杂的虚拟机器人操作任务。

相关内容

模仿学习

关注 322

模仿学习是学习尝试模仿专家行为从而获取最佳性能的一系列任务。目前主流方法包括监督式模仿学习、随机混合迭代学习和数据聚合模拟学习等方法。模仿学习（Imitation Learning）背后的原理是是通过隐含地给学习器关于这个世界的先验信息，比如执行、学习人类行为。在模仿学习任务中，智能体（agent）为了学习到策略从而尽可能像人类专家那样执行一种行为，它会寻找一种最佳的方式来使用由该专家示范的训练集（输入-输出对）。当智能体学习人类行为时，虽然我们也需要使用模仿学习，但实时的行为模拟成本会非常高。与之相反，吴恩达提出的学徒学习（Apprenticeship learning）执行的是存粹的贪婪/利用（exploitative）策略，并使用强化学习方法遍历所有的（状态和行为）轨迹（trajectories）来学习近优化策略。它需要极难的计略（maneuvers），而且几乎不可能从未观察到的状态还原。模仿学习能够处理这些未探索到的状态，所以可为自动驾驶这样的许多任务提供更可靠的通用框架。

Meta最新WWW2022《联邦计算导论》教程，附77页ppt

专知会员服务

60+阅读 · 2022年5月5日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

44+阅读 · 2020年12月18日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日