Imitation learning (IL) can generate computationally efficient sensorimotor policies from demonstrations provided by computationally expensive model-based sensing and control algorithms. However, commonly employed IL methods are often data-inefficient, requiring the collection of a large number of demonstrations and producing policies with limited robustness to uncertainties. In this work, we combine IL with an output feedback robust tube model predictive controller (RTMPC) to co-generate demonstrations and a data augmentation strategy to efficiently learn neural network-based sensorimotor policies. Thanks to the augmented data, we reduce the computation time and the number of demonstrations needed by IL, while providing robustness to sensing and process uncertainty. We tailor our approach to the task of learning a trajectory tracking visuomotor policy for an aerial robot, leveraging a 3D mesh of the environment as part of the data augmentation process. We numerically demonstrate that our method can learn a robust visuomotor policy from a single demonstration, a two-orders-of-magnitude improvement in demonstration efficiency compared to existing IL methods.
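The core idea of tube-guided data augmentation can be illustrated with a minimal sketch. This is a hypothetical, simplified version (all names, the box-shaped tube, and the gain `K` are illustrative assumptions, not the paper's implementation): given one nominal demonstration from the expert MPC, extra state-action pairs are produced by sampling states inside the robust tube around each nominal state and labeling them with the ancillary feedback law `u = u_bar + K (x - x_bar)`, which tube MPC uses to keep perturbed states inside the tube.

```python
import numpy as np

def augment_demo(x_bar, u_bar, K, tube_half_width, n_samples=10, rng=None):
    """Sketch of tube-based demonstration augmentation (illustrative only).

    x_bar: (T, n) nominal states from a single expert demonstration
    u_bar: (T, m) nominal control inputs
    K: (m, n) ancillary controller gain
    tube_half_width: (n,) half-widths of a box approximation of the tube
    Returns augmented arrays of states (T*n_samples, n) and inputs
    (T*n_samples, m).
    """
    rng = np.random.default_rng(rng)
    xs, us = [], []
    for t in range(len(x_bar)):
        # Sample perturbed states uniformly inside the box-shaped tube.
        delta = rng.uniform(-tube_half_width, tube_half_width,
                            size=(n_samples, x_bar.shape[1]))
        x_aug = x_bar[t] + delta
        # Label each sampled state with the ancillary feedback law.
        u_aug = u_bar[t] + delta @ K.T
        xs.append(x_aug)
        us.append(u_aug)
    return np.vstack(xs), np.vstack(us)
```

The augmented pairs can then be used as additional supervised-learning targets for the policy network, which is how a single demonstration can stand in for many. For the visuomotor case described above, each sampled state would additionally be rendered against the 3D mesh of the environment to obtain the corresponding observation.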