Policy Search and Model Predictive Control (MPC) are two different paradigms for robot control: policy search can automatically learn complex policies from data collected through experience, while MPC can deliver optimal control performance using models and trajectory optimization. An open research question is how to combine the advantages of both approaches. In this work, we provide an answer by using policy search to automatically choose high-level decision variables for MPC, which leads to a novel policy-search-for-model-predictive-control framework. Specifically, we formulate the MPC as a parameterized controller in which the hard-to-optimize decision variables are represented as high-level policies. This formulation allows the policies to be optimized in a self-supervised fashion. We validate the framework on a challenging problem in agile drone flight: flying a quadrotor through fast-moving gates. Experiments show that our controller achieves robust, real-time control performance in both simulation and the real world. The proposed framework offers a new perspective on merging learning and control.
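To make the idea of searching over a high-level decision variable of a parameterized controller more concrete, the following is a minimal toy sketch and not the implementation from this work. Everything in it is an assumption made for illustration: the vehicle is a 1-D double integrator, the hypothetical `gate_height` function models a moving gate, `controller_cost` stands in for the MPC by solving a minimum-effort plan in closed form, and the decision variable `z` is the time at which the vehicle meets the gate. The update in `policy_search` is a cross-entropy-style refit of a Gaussian policy, chosen here for brevity; the abstract does not state which policy search update is used. The only learning signal is the cost of the controller's own plans, which is one simple reading of "self-supervised".

```python
import numpy as np

DT = 0.05      # control step in seconds (assumed)
HORIZON = 60   # planning horizon in steps, i.e. 3 s (assumed)


def gate_height(t):
    """Hypothetical moving gate whose height oscillates over time."""
    return 1.0 + 0.8 * np.sin(2.0 * t)


def controller_cost(z):
    """Stand-in for the MPC: minimum-effort plan that meets the gate at time z.

    The vehicle is a 1-D double integrator starting at rest at height 0.
    Its height after k steps is linear in the accelerations, p_k = c @ a,
    so the minimum-effort plan is the minimum-norm solution of one equation.
    """
    k = int(np.clip(np.rint(z / DT), 1, HORIZON))
    c = (k - np.arange(k) - 0.5) * DT**2      # p_k = sum_i c_i * a_i
    target = gate_height(k * DT)              # gate height at traversal time
    accel = c * target / (c @ c)              # minimum-norm accelerations
    return float(np.sum(accel**2) * DT)       # control effort of the plan


def policy_search(iters=30, n_samples=32, n_elite=8, seed=0):
    """Cross-entropy-style search over a Gaussian policy on z."""
    rng = np.random.default_rng(seed)
    mean, std = 1.5, 1.0                      # initial high-level policy
    for _ in range(iters):
        # Sample candidate traversal times and keep them inside the horizon.
        z = np.clip(rng.normal(mean, std, n_samples), DT, HORIZON * DT)
        costs = np.array([controller_cost(zi) for zi in z])
        elite = z[np.argsort(costs)[:n_elite]]  # lowest-cost samples
        mean = float(elite.mean())              # refit the Gaussian policy
        std = float(elite.std()) + 0.02         # floor keeps some exploration
    return mean, std


if __name__ == "__main__":
    z_star, spread = policy_search()
    print(f"traversal time {z_star:.2f} s, cost {controller_cost(z_star):.4f}")
```

In this sketch the low-level plan is cheap to compute for any fixed `z`, while the cost as a function of `z` depends on the gate motion, so sampling `z` from a policy and refitting it on the best plans sidesteps differentiating through the gate timing. A real MPC would replace `controller_cost` with a constrained trajectory optimization run in closed loop.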