This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework for solving a broad class of learning and control tasks. PDP is distinguished from existing methods by two novel techniques. First, we differentiate through Pontryagin's Maximum Principle, which yields the analytical derivative of a trajectory with respect to the tunable parameters of an optimal control system and thereby enables end-to-end learning of dynamics, policies, and/or control objective functions. Second, we propose an auxiliary control system in the backward pass of the PDP framework; the output of this auxiliary control system is the analytical derivative of the original system's trajectory with respect to the parameters, and it can be solved iteratively using standard control tools. We investigate three learning modes of PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of PDP in each learning mode on different high-dimensional systems, including a multi-link robot arm, a 6-DoF maneuvering quadrotor, and a 6-DoF rocket performing powered landing.
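To make the first technique concrete, the following is a minimal sketch of the setting in standard discrete-time form. The notation ($f$, $c_t$, $h$, $\lambda_t$, $\theta$, horizon $T$) is assumed here for illustration and is not taken from the abstract. It shows a parameterized optimal control problem and the Pontryagin conditions that its optimal trajectory satisfies; differentiating these conditions with respect to $\theta$ is what gives rise to the derivative $\partial x_t / \partial \theta$, $\partial u_t / \partial \theta$ described above.

```latex
% Parameterized optimal control problem (all symbols are illustrative assumptions)
\min_{u_{0:T-1}} \; J(\theta) = \sum_{t=0}^{T-1} c_t(x_t, u_t, \theta) + h(x_T, \theta)
\quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t, \theta), \quad x_0 \text{ given}.

% Hamiltonian with costate \lambda_{t+1}
H_t = c_t(x_t, u_t, \theta) + \lambda_{t+1}^{\top} f(x_t, u_t, \theta).

% Pontryagin conditions along the optimal trajectory, for t = 0, \dots, T-1
x_{t+1} = \frac{\partial H_t}{\partial \lambda_{t+1}} = f(x_t, u_t, \theta), \qquad
\lambda_t = \frac{\partial H_t}{\partial x_t}, \qquad
0 = \frac{\partial H_t}{\partial u_t}, \qquad
\lambda_T = \frac{\partial h}{\partial x_T}.
```

Because these conditions hold for every admissible $\theta$, differentiating them with respect to $\theta$ produces a set of linear difference equations in the unknowns $\partial x_t / \partial \theta$, $\partial u_t / \partial \theta$, and $\partial \lambda_t / \partial \theta$, which is consistent with the abstract's statement that the derivative can be obtained by solving an auxiliary control system with standard control tools.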