The combination of policy search and deep neural networks holds the promise of automating a variety of decision-making tasks. Model Predictive Control (MPC) provides robust solutions to robot control tasks by making use of a dynamical model of the system and solving an optimization problem online over a short planning horizon. In this work, we combine probabilistic decision-making approaches and the generalization capability of artificial neural networks with powerful online optimization by learning a deep high-level policy for the MPC (High-MPC). Conditioned on the robot's local observations, the trained neural network policy adaptively selects high-level decision variables for the low-level MPC controller, which then generates optimal control commands for the robot. First, we formulate the search for high-level decision variables for MPC as a policy search problem; specifically, we cast it as a probabilistic inference problem, which admits a closed-form solution. Second, we propose a self-supervised learning algorithm for learning a neural network high-level policy, which is useful for online hyperparameter adaptation in highly dynamic environments. We demonstrate the importance of incorporating online adaptation into autonomous robots by using the proposed method to solve a challenging control problem, where the task is to control a simulated quadrotor to fly through a swinging gate. We show that our approach can handle situations that are difficult for standard MPC.
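The closed-form inference step mentioned above can be illustrated with a minimal sketch: sample candidate decision variables from a Gaussian, weight each sample by the exponentiated negative cost of its rollout, and update the distribution's parameters in closed form. Here the true MPC rollout is replaced by a hypothetical toy cost function, and the specific weighting scheme (a reward-weighted maximum-likelihood update) is an illustrative assumption, not the paper's exact derivation.

```python
import numpy as np

def toy_rollout_cost(z, z_star=1.5):
    # Stand-in for the cost of a low-level MPC rollout under decision
    # variable z; hypothetical, quadratic around an unknown optimum z_star.
    return (z - z_star) ** 2

def inference_based_search(mu=0.0, sigma=2.0, iters=20, samples=64, lam=1.0):
    """Search for a scalar decision variable via probabilistic inference:
    sample, weight by exp(-cost / lam), then update mean and variance
    in closed form (a reward-weighted maximum-likelihood step)."""
    rng = np.random.default_rng(0)
    for _ in range(iters):
        z = rng.normal(mu, sigma, size=samples)          # sample candidates
        costs = toy_rollout_cost(z)                      # evaluate rollouts
        w = np.exp(-(costs - costs.min()) / lam)         # exponentiated costs
        w /= w.sum()                                     # normalize weights
        mu = np.sum(w * z)                               # closed-form mean update
        sigma = np.sqrt(np.sum(w * (z - mu) ** 2) + 1e-8)
    return mu

print(inference_based_search())
```

The sampled distribution concentrates around the low-cost region, so the returned mean approaches the toy optimum; in High-MPC, a neural network is then trained on such solutions so the decision variable can be predicted directly from observations online.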