We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic framework for solving a broad class of safety-critical learning and control tasks, that is, problems that require safety constraints to be satisfied at every stage of the learning and control process. In the spirit of interior-point methods, Safe PDP handles different types of system constraints on states and inputs by incorporating them into the cost or loss through barrier functions. We prove three fundamental properties of the proposed Safe PDP: first, both the solution and its gradient in the backward pass can be approximated by solving their more efficient unconstrained counterparts; second, the approximation error of both the solution and its gradient can be made arbitrarily small through a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect the constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP on various safety-critical tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on challenging systems such as a 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.
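To make the barrier idea concrete, the following is a minimal sketch on a toy scalar problem, not the Safe PDP algorithm itself. It assumes a hypothetical cost (x - 2)^2 with the single constraint x <= 1, folds the constraint into the cost through a logarithmic barrier weighted by a barrier parameter gamma, and solves the resulting unconstrained problem with a damped Newton iteration. The function names, the toy problem, and the choice of solver are all illustrative assumptions.

```python
import math


def barrier_objective(x, gamma):
    # Hypothetical toy cost (x - 2)^2 plus a log barrier for the constraint x <= 1.
    return (x - 2.0) ** 2 - gamma * math.log(1.0 - x)


def solve_barrier(gamma, x0, iters=50):
    """Damped Newton iteration on the barrier objective.

    Returns every iterate so that feasibility can be checked along the way.
    x0 must be strictly feasible (x0 < 1).
    """
    x = x0
    trajectory = [x]
    for _ in range(iters):
        slack = 1.0 - x
        grad = 2.0 * (x - 2.0) + gamma / slack
        hess = 2.0 + gamma / slack ** 2
        step = -grad / hess
        # Backtrack until the candidate is strictly feasible and does not
        # increase the barrier objective.
        t = 1.0
        while t > 1e-12:
            cand = x + t * step
            if cand < 1.0 and barrier_objective(cand, gamma) <= barrier_objective(x, gamma):
                break
            t *= 0.5
        else:
            break  # no acceptable step found; stay at the current feasible point
        x = cand
        trajectory.append(x)
    return trajectory


def exact_barrier_minimizer(gamma):
    # Closed-form stationary point of the toy barrier objective on x < 1.
    return (3.0 - math.sqrt(1.0 + 2.0 * gamma)) / 2.0


gammas = [1.0, 0.1, 0.01, 0.001]
x = 0.0  # strictly feasible starting point
all_iterates = []
solutions = []
for gamma in gammas:
    traj = solve_barrier(gamma, x)  # warm start from the previous solution
    all_iterates.extend(traj)
    x = traj[-1]
    solutions.append(x)

# Distance of each barrier solution from the constrained optimum x* = 1.
gaps = [1.0 - s for s in solutions]
```

In this sketch the constrained optimum is x = 1, and the entries of `gaps` shrink as gamma decreases while every entry of `all_iterates` stays strictly below 1. This mirrors, for this toy case only, the second and third properties stated in the abstract: accuracy controlled by the barrier parameter, and constraint satisfaction at every intermediate iterate.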