Learning-based control algorithms have led to major advances in robotics, but at the cost of weaker safety guarantees. Recently, neural networks have also been used to characterize safety through barrier functions for complex nonlinear systems. Learned barrier functions approximately encode and enforce a desired safety constraint through a value function, but they do not provide any formal guarantees. In this paper, we propose a local dynamic programming (DP)-based approach to "patch" an almost-safe learned barrier at potentially unsafe points in the state space. The resulting algorithm, HJ-Patch, produces a new barrier that provides formal safety guarantees yet retains the global structure of the learned barrier. HJ-Patch is a local DP-based reachability algorithm: it updates the barrier function "minimally", only at points that both (a) neighbor the barrier's safety boundary and (b) do not satisfy the safety condition. We view this as a key step toward bridging the gap between learning-based barrier functions and Hamilton-Jacobi reachability analysis, and as a framework for further integration of these approaches. We demonstrate that, for well-trained barriers, HJ-Patch reduces the computational load by two orders of magnitude relative to standard DP-based reachability, and that it scales to a 6-dimensional system, which is at the limit of standard DP-based reachability.
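To make the idea of a local, boundary-driven update concrete, the following is a minimal Python sketch on a hypothetical 1D grid world. It is not the HJ-Patch implementation: the dynamics (`step`), the drift region, the "learned" barrier `h`, the constraint margin `l`, and the rule that grid neighbors of changed cells join the active set are all assumptions chosen for illustration. The sketch starts from the cells on the barrier's zero level set, applies a discrete avoid-type Bellman backup only where the safety condition is violated, and stops when no active cell changes.

```python
import numpy as np

N = 50             # number of grid cells (hypothetical toy world)
DRIFT_START = 38   # assumption: cells >= 38 drift outward faster than control can correct
CONTROLS = (-1, 0, 1)

def step(i, u):
    """Successor cell of i under control u (toy dynamics, clipped to the grid)."""
    drift = 2 if i >= DRIFT_START else 0
    return int(np.clip(i + drift + u, 0, N - 1))

def is_invariant(V):
    """Discrete safety condition: each cell with V >= 0 has a successor with V >= 0."""
    return all(any(V[step(i, u)] >= 0 for u in CONTROLS)
               for i in np.flatnonzero(V >= 0))

def patch_barrier(h, l, max_sweeps=1000):
    """Locally repair barrier values h; returns (patched values, visited mask)."""
    n = len(h)
    V = np.minimum(h, l).astype(float)  # a barrier may not exceed the constraint margin
    # Initial active set: cells whose sign differs from a grid neighbor's,
    # i.e. the cells adjacent to the zero level set of the barrier.
    safe = V >= 0
    crossing = safe[:-1] != safe[1:]
    active = np.zeros(n, dtype=bool)
    active[:-1] |= crossing
    active[1:] |= crossing
    for _ in range(max_sweeps):
        idx = np.flatnonzero(active)
        # Avoid-type backup: best successor value, capped by the constraint margin.
        backup = np.array([min(l[i], max(V[step(i, u)] for u in CONTROLS))
                           for i in idx])
        violated = backup < V[idx]
        if not violated.any():
            break  # every active cell satisfies the condition
        changed = idx[violated]
        V[changed] = backup[violated]  # values only ever decrease
        # Grid neighbors of changed cells join the active set.
        active[np.clip(changed - 1, 0, n - 1)] = True
        active[np.clip(changed + 1, 0, n - 1)] = True
    return V, active

cells = np.arange(N)
l = 45.0 - cells   # constraint margin: cells above 45 are failures
h = 41.0 - cells   # stand-in "learned" barrier: wrongly certifies cells 38..41
V, visited = patch_barrier(h, l)
print(is_invariant(h), is_invariant(V), int(visited.sum()))
```

In this toy setting the original barrier fails the discrete safety check, the patched one passes it, and the values on cells 0 to 37 are left exactly as learned. Only 13 of the 50 cells are ever visited, which is the kind of saving a local update is meant to provide; the two-orders-of-magnitude figure above refers to the paper's own experiments, not to this sketch.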