Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner's and the expert's behaviors. Various imitation learning algorithms have been proposed, each quantifying this discrepancy with a different pre-determined divergence. This naturally raises the following question: Given a set of expert demonstrations, which divergence recovers the expert policy most accurately and with the best data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure from the $f$-divergence family, along with a policy capable of producing expert-like behaviors. Compared with IL baselines using various predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency on six physics-based control tasks.
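For context (not stated in the abstract itself), the $f$-divergence family from which $f$-GAIL selects its discrepancy measure is standardly defined as follows; the specific form below is the textbook definition, not a formula taken from this paper:

```latex
% Standard definition of an f-divergence between distributions P and Q
% with densities p and q; f is any convex function with f(1) = 0.
\[
  D_f(P \,\|\, Q) \;=\; \int_{\mathcal{X}} q(x)\,
    f\!\left(\frac{p(x)}{q(x)}\right)\, dx ,
  \qquad f \text{ convex},\; f(1) = 0 .
\]
% Familiar divergences are special cases, e.g.:
%   f(u) = u log u                       -> KL divergence
%   f(u) = u log u - (u+1) log((u+1)/2)  -> Jensen--Shannon divergence
%                                           (up to a constant factor),
%                                           the choice implicit in GAIL.
```

Searching over convex generators $f$ in this family, rather than fixing one in advance, is what allows the divergence itself to be learned from the demonstrations.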