Safe Pontryagin Differentiable Programming

We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic differentiable framework for solving a broad class of safety-critical learning and control tasks, that is, problems that require guaranteed satisfaction of both immediate and long-term constraints at every stage of the learning and control process. In the spirit of interior-point methods, Safe PDP handles different types of state and input constraints by incorporating them into the cost and loss through barrier functions. We prove the following fundamental features of Safe PDP: first, both the constrained solution and its gradient in the backward pass can be approximated by solving a more efficient unconstrained counterpart; second, the approximation error of both the solution and its gradient can be made arbitrarily small by adjusting a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect all constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP in solving various safe learning and control tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on challenging control systems such as a 6-DoF maneuvering quadrotor and a 6-DoF rocket powered landing.
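The barrier idea can be illustrated with a toy scalar problem. The sketch below is not the Safe PDP algorithm itself; it is a minimal example, under assumptions of our own, of how a logarithmic barrier turns a constrained problem into an unconstrained one. The problem (minimize (x - 2)^2 subject to x <= 1), the names `solve_barrier` and `closed_form`, and the damped Newton solver are all hypothetical choices made for illustration. The constrained optimum is x = 1; the barrier solution stays strictly inside the feasible set at every iterate and approaches x = 1 as the barrier parameter `gamma` shrinks.

```python
import math


def barrier_objective(x, gamma):
    # Toy cost (x - 2)^2 plus a log barrier for the constraint x <= 1.
    # Only defined for strictly feasible x (x < 1).
    return (x - 2.0) ** 2 - gamma * math.log(1.0 - x)


def closed_form(gamma):
    # Stationary point of the barrier objective on x < 1, from
    # 2(x - 2) + gamma / (1 - x) = 0.
    return (3.0 - math.sqrt(1.0 + 2.0 * gamma)) / 2.0


def solve_barrier(gamma, x0=0.0, iters=50):
    """Damped Newton iteration on the barrier objective.

    Returns the final iterate and the list of all iterates, so that
    feasibility of every intermediate result can be inspected.
    """
    x = x0
    trajectory = [x]
    for _ in range(iters):
        grad = 2.0 * (x - 2.0) + gamma / (1.0 - x)
        hess = 2.0 + gamma / (1.0 - x) ** 2
        step = -grad / hess
        t = 1.0
        # Halve the step until the candidate is strictly feasible and
        # does not increase the objective; otherwise keep the current x.
        for _ in range(60):
            cand = x + t * step
            if cand < 1.0 and barrier_objective(cand, gamma) <= barrier_objective(x, gamma):
                x = cand
                break
            t *= 0.5
        trajectory.append(x)
    return x, trajectory


if __name__ == "__main__":
    for gamma in (1.0, 0.1, 0.01, 0.001):
        x, traj = solve_barrier(gamma)
        print(f"gamma={gamma}: x={x:.6f}, all iterates feasible={all(v < 1.0 for v in traj)}")
```

In this toy setting, the gap between the barrier solution and the constrained optimum shrinks as `gamma` decreases, which mirrors (in a much simpler form) the abstract's claim that the approximation is controlled by the barrier parameter.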
