Model-based control requires an accurate model of the system dynamics to precisely and safely control the robot in complex and dynamic environments. Moreover, in the presence of variations in the operating conditions, the model should be continuously refined to compensate for dynamics changes. In this paper, we propose a self-supervised learning approach that actively models the discrete-time dynamics of the robot. We combine offline learning from past experience with online learning from the robot's present interaction with the unknown environment. These two ingredients enable highly sample-efficient and adaptive learning for accurate inference of the model dynamics in real time, even in operating regimes significantly different from the training distribution. Moreover, we design an uncertainty-aware model predictive controller that is conditioned on the aleatoric (data) uncertainty of the learned dynamics. The controller actively selects the optimal control actions that (i) optimize the control performance and (ii) boost the online learning sample efficiency. We apply the proposed method to a quadrotor system in multiple challenging real-world experiments. Our approach exhibits high flexibility and generalization capabilities by consistently adapting to unseen flight conditions, and it significantly outperforms classical and adaptive control baselines.