Imitation Learning (IL) can generate computationally efficient policies from demonstrations provided by Model Predictive Control (MPC). However, IL methods often require extensive data collection and training, making the policy costly to update when the task changes, and they produce policies with limited robustness to new disturbances. In this work, we propose an IL strategy to efficiently compress a computationally expensive MPC into a deep neural network policy that is robust to previously unseen disturbances. By using a robust variant of the MPC, called Robust Tube MPC, and leveraging properties of the controller, we introduce computationally efficient data augmentation methods that significantly reduce the number of MPC demonstrations and the training effort required to generate a robust policy. Our approach opens the possibility of zero-shot transfer of a policy trained from a single MPC demonstration collected in a nominal domain, such as a simulation or a robot in a lab/controlled environment, to a new domain with previously unseen bounded model errors/perturbations. Numerical evaluations performed using linear and nonlinear MPC for agile flight on a multirotor show that our method outperforms strategies commonly employed in IL, such as Dataset Aggregation (DAgger) and Domain Randomization (DR), in terms of demonstration efficiency, training time, and robustness to perturbations unseen during training. Experimental evaluations validate the efficiency and real-world robustness of the approach.
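To make the core idea concrete, the sketch below illustrates one plausible form of tube-guided data augmentation, assuming a Robust Tube MPC with a linear ancillary controller u = ū + K(x − ū) and a tube cross-section approximated as an axis-aligned box. This is a minimal illustration under those assumptions, not the paper's exact procedure; the function name `augment_demo` and the boxed-tube parameterization `z_max` are hypothetical.

```python
import numpy as np

def augment_demo(x_bar, u_bar, K, z_max, n_samples=32, rng=None):
    """Generate extra (state, control) training pairs from one MPC demonstration.

    x_bar: (T, n) nominal states from a single MPC demonstration
    u_bar: (T, m) nominal controls
    K:     (m, n) ancillary feedback gain from the tube MPC design
    z_max: (n,)   half-widths of the boxed tube cross-section (assumed shape)
    """
    rng = rng or np.random.default_rng(0)
    states, controls = [], []
    for x_nom, u_nom in zip(x_bar, u_bar):
        # Sample perturbed states inside the tube around the nominal state...
        dx = rng.uniform(-z_max, z_max, size=(n_samples, len(z_max)))
        xs = x_nom + dx
        # ...and label them with the ancillary controller u = u_nom + K (x - x_nom),
        # i.e., the action the tube MPC would apply under a bounded disturbance.
        us = u_nom + dx @ K.T
        states.append(xs)
        controls.append(us)
    return np.concatenate(states), np.concatenate(controls)
```

Because the labels come from the cheap ancillary feedback law rather than from re-solving the MPC at each sampled state, a single demonstration can be expanded into a dense dataset covering the tube, which is what enables the reported reduction in demonstrations and training effort.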