The deployment of agile autonomous systems in challenging, unstructured environments requires adaptation capabilities and robustness to uncertainties. Existing robust and adaptive controllers, such as those based on model predictive control (MPC), can achieve impressive performance, but at the cost of heavy online onboard computation. Strategies that efficiently learn robust and onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient imitation learning (IL) algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties. The key idea of our approach is to modify the IL procedure by conditioning the policy on a learned lower-dimensional model/environment representation that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy for a multirotor tracking trajectories under challenging disturbances. Our evaluation is performed in a high-fidelity simulation environment and shows that a high-quality adaptive policy can be obtained in about $1.3$ hours. We additionally demonstrate, empirically, rapid adaptation to in- and out-of-training-distribution uncertainties, achieving a $6.1$ cm average position error under a wind disturbance that corresponds to about $50\%$ of the weight of the robot and is $36\%$ larger than the maximum wind seen during training.
Title: Efficient Deep Learning of Robust and Adaptive Policies via Tube MPC-Guided Data Augmentation
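To make the key idea concrete, the sketch below illustrates the structure of an adaptation-conditioned policy: a small estimator maps a recent window of state-action history to a low-dimensional model/environment representation $z$, and the control policy takes both the current state and $z$ as input. All dimensions, network shapes, and the random linear maps standing in for trained networks are illustrative assumptions, not the paper's actual architecture; in the actual approach both modules would be trained from MPC demonstrations via imitation learning.

```python
import numpy as np

# Hypothetical dimensions for illustration only (not from the paper).
STATE_DIM, ACTION_DIM, HIST_LEN, LATENT_DIM = 12, 4, 10, 8

rng = np.random.default_rng(0)

# Stand-ins for the two learned modules: fixed random linear maps.
# In practice these would be trained networks.
W_est = rng.standard_normal((LATENT_DIM, HIST_LEN * (STATE_DIM + ACTION_DIM))) * 0.01
W_pi = rng.standard_normal((ACTION_DIM, STATE_DIM + LATENT_DIM)) * 0.01

def estimate_latent(history):
    """Map a window of recent (state, action) pairs to a low-dimensional
    model/environment representation z, cheap enough to run online."""
    return np.tanh(W_est @ history.ravel())

def adaptive_policy(state, z):
    """Policy conditioned on both the current state and the latent z, so
    one network can act differently under different disturbances."""
    return np.tanh(W_pi @ np.concatenate([state, z]))

# One control step: estimate z from the history buffer, then act.
history = np.zeros((HIST_LEN, STATE_DIM + ACTION_DIM))  # filled online in practice
state = rng.standard_normal(STATE_DIM)
z = estimate_latent(history)
action = adaptive_policy(state, z)
print(action.shape)  # (4,)
```

Because $z$ is re-estimated from the history buffer at every step, the policy can react to disturbances (e.g. wind) without any online optimization, keeping the onboard cost to a few matrix-vector products.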