We propose Taylor Series Imitation Learning (TaSIL), a simple augmentation to standard behavior cloning losses in the context of continuous control. TaSIL penalizes deviations in the higher-order Taylor series terms between the learned and expert policies. We show that experts satisfying a notion of \emph{incremental input-to-state stability} are easy to learn, in the sense that a small TaSIL-augmented imitation loss over expert trajectories guarantees a small imitation loss over trajectories generated by the learned policy. We provide sample-complexity bounds for TaSIL that scale as $\tilde{\mathcal{O}}(1/n)$ in the realizable setting, for $n$ the number of expert demonstrations. Finally, we demonstrate experimentally the relationship between the robustness of the expert policy and the order of Taylor expansion required in TaSIL, and compare standard Behavior Cloning, DART, and DAgger with TaSIL-loss-augmented variants. In all cases, we show significant improvement over baselines across a variety of MuJoCo tasks.
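To make the loss described above concrete, the following is a minimal sketch (not the authors' released code) of a first-order TaSIL-style objective in JAX: the standard behavior cloning term at expert states, plus a penalty on the deviation between the learned policy's Jacobian and the expert policy's Jacobian at those states. The names `policy`, `expert`, `states`, and the weight `lam` are illustrative placeholders.

\begin{verbatim}
import jax
import jax.numpy as jnp

def tasil_loss(params, policy, expert, states, lam=1.0):
    """Zeroth- plus first-order Taylor-matching loss, averaged over expert states.

    policy(params, x) and expert(x) are assumed to map a state vector to an
    action vector; `states` has shape (n, state_dim).
    """
    def per_state(x):
        # Zeroth-order term: match the expert's action at the expert state.
        action_err = jnp.sum((policy(params, x) - expert(x)) ** 2)
        # First-order term: match the Jacobians of the two policies w.r.t. the state.
        j_learner = jax.jacobian(lambda s: policy(params, s))(x)
        j_expert = jax.jacobian(expert)(x)
        jac_err = jnp.sum((j_learner - j_expert) ** 2)
        return action_err + lam * jac_err
    return jnp.mean(jax.vmap(per_state)(states))
\end{verbatim}

Higher-order terms can be added analogously by matching higher derivatives; when the expert's derivatives are not directly available (e.g., a black-box expert), one possible approximation is to match policy outputs at small random perturbations of the expert states instead of computing Jacobians explicitly.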