Most current imitation learning (IL) algorithms require interaction with either the environment or an expert policy during training. When no such interaction is available, the typical approach is Behavior Cloning (BC); however, BC-like methods suffer from distribution shift. To mitigate this problem, we propose a Robust Model-Based Imitation Learning (RMBIL) framework that casts imitation learning as an end-to-end differentiable nonlinear closed-loop tracking problem. RMBIL applies a Neural ODE to learn an accurate multi-step dynamics model and a robust tracking controller via the Nonlinear Dynamics Inversion (NDI) algorithm. The learned NDI controller is then combined with a trajectory generator, a conditional VAE, to imitate an expert's behavior. A theoretical derivation shows that the controller network approximates an NDI controller when the training loss of the Neural ODE is minimized. Experiments on Mujoco tasks demonstrate that RMBIL is competitive with the state-of-the-art generative adversarial method (GAIL) and achieves at least a 30% performance gain over BC on uneven terrain.
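The dynamics-learning step can be illustrated with a minimal toy sketch. This is not the paper's implementation: it substitutes a linear model fitted by least squares for the Neural ODE, and a hand-coded damped oscillator for the expert trajectory, but it shows the same idea of fitting continuous-time dynamics from demonstration data and then rolling the learned model out over many steps.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's code): learn continuous-time
# linear dynamics x' = A x from an "expert" trajectory, then roll the
# learned model out multi-step, mimicking Neural-ODE dynamics learning.

def simulate(A, x0, dt, steps):
    """Euler-integrate x' = A x and return the state trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (A @ xs[-1]))
    return np.stack(xs)

A_true = np.array([[0.0, 1.0],
                   [-1.0, -0.1]])           # damped oscillator
dt, steps = 0.01, 500
traj = simulate(A_true, np.array([1.0, 0.0]), dt, steps)

# Fit A by least squares on finite-difference derivatives (a stand-in
# for training a neural ODE on the same one-step residuals).
dX = (traj[1:] - traj[:-1]) / dt            # approximate derivatives
A_hat = np.linalg.lstsq(traj[:-1], dX, rcond=None)[0].T

# A good multi-step rollout stays close to the expert trajectory.
rollout = simulate(A_hat, traj[0], dt, steps)
err = np.abs(rollout - traj).max()
print(err)
```

Because the training data here come from the same integrator used for the rollout, the fitted model recovers the true dynamics almost exactly; with a neural dynamics model and real demonstrations, the multi-step rollout error is what the training loss drives down.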