Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between a subtask and its corresponding policy or fail to learn the policy in an end-to-end fashion, which leads to suboptimality. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm in order to directly recover a hierarchical policy from unannotated demonstrations. Further, we introduce a directed information term into the objective function to strengthen this causal relationship, and propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion. Theoretical justifications and evaluations on challenging robotic control tasks are provided to show the superiority of our algorithm. The code is available at https://github.com/LucasCJYSDL/HierAIRL.
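To make the option-framework hierarchy mentioned above concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation) of a hierarchical policy: a high-level policy selects a subtask (option) given the state and the previous option, and a low-level policy selects an action conditioned on the state and the active option. The linear score functions and softmax parameterization are illustrative assumptions standing in for neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class HierarchicalPolicy:
    """Illustrative one-step option model: pi_H(o_t | s_t, o_{t-1}) and pi_L(a_t | s_t, o_t)."""

    def __init__(self, state_dim, n_options, n_actions):
        # Linear score functions stand in for the networks used in practice.
        self.W_high = rng.normal(size=(n_options, state_dim + n_options))
        self.W_low = rng.normal(size=(n_actions, state_dim + n_options))
        self.n_options = n_options

    def act(self, state, prev_option):
        prev_onehot = np.eye(self.n_options)[prev_option]
        # High-level policy: sample the current option given state and previous option.
        option_probs = softmax(self.W_high @ np.concatenate([state, prev_onehot]))
        option = rng.choice(len(option_probs), p=option_probs)
        # Low-level policy: sample the action given state and the active option.
        opt_onehot = np.eye(self.n_options)[option]
        action_probs = softmax(self.W_low @ np.concatenate([state, opt_onehot]))
        action = rng.choice(len(action_probs), p=action_probs)
        return option, action

# Example rollout step with arbitrary dimensions.
policy = HierarchicalPolicy(state_dim=4, n_options=3, n_actions=2)
option, action = policy.act(state=rng.normal(size=4), prev_option=0)
```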