Human beings are able to understand objectives and learn simply by observing others perform a task. Imitation learning methods aim to replicate such capabilities; however, they generally require access to a full set of optimal states and actions, taken with the agent's own actuators and from the agent's point of view. In this paper, we introduce a new algorithm, called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL), designed to bypass these constraints. Our algorithm enables autonomous agents to learn directly from high-dimensional observations of an expert performing a task, by making use of adversarial learning with a latent representation inside the discriminator network. This latent representation is regularized through mutual information constraints so that it encodes only features capturing the completion levels of the demonstrated task. This yields a shared feature space in which imitation can be performed successfully while disregarding the differences between the expert's and the agent's domains. Empirically, our algorithm is able to efficiently imitate in a diverse range of control problems, including balancing, manipulation, and locomotion tasks, while being robust to various domain differences in terms of both environment appearance and agent embodiment.
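To make the idea above concrete, one plausible formalization of the adversarial objective is sketched below; the notation (encoder $\phi$, domain label $d$, and budget $\epsilon$) is ours for illustration, and the exact constraint used by DisentanGAIL may differ from this simplified form.

\[
\min_{\pi} \; \max_{D,\,\phi} \;\;
\mathbb{E}_{o \sim \text{expert}}\big[\log D(\phi(o))\big]
\;+\;
\mathbb{E}_{o \sim \pi}\big[\log\big(1 - D(\phi(o))\big)\big]
\quad \text{s.t.} \quad
I\big(\phi(o);\, d\big) \le \epsilon ,
\]

where $o$ denotes a high-dimensional observation, $\phi$ is the encoder producing the discriminator's latent representation, $d \in \{0,1\}$ indicates whether an observation comes from the expert's or the agent's domain, and $I(\cdot;\cdot)$ is mutual information. Keeping $I(\phi(o); d)$ below a small budget $\epsilon$ discourages the latent from encoding domain-specific appearance or embodiment cues, so that task-completion information remains the main signal available to the discriminator.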