Safety is a critical concern for the next generation of autonomous systems, which are likely to rely heavily on deep neural networks (DNNs) for perception and control. Formally verifying the safety and robustness of well-trained DNNs and learning-enabled systems under attacks, model uncertainties, and sensing errors is essential for safe autonomy. This research proposes a framework that repairs unsafe DNNs in safety-critical systems using reachability analysis. The repair process is inspired by adversarial training, which has proven highly effective at improving the safety and robustness of DNNs. Unlike traditional adversarial training, where adversarial examples are generated by random attacks and may not represent all unsafe behaviors, our repair process uses reachability analysis to compute the exact unsafe regions and to identify sufficiently representative examples, improving both the efficacy and the efficiency of the adversarial training. The framework is evaluated on two types of benchmarks for which no safe model is available as a reference. The first is a DNN controller for aircraft collision avoidance, for which training data are accessible. The second is a rocket lander, where our framework integrates seamlessly with the well-known deep deterministic policy gradient (DDPG) reinforcement learning algorithm. Experimental results show that the framework successfully repairs all instances against multiple safety specifications with negligible performance degradation. In addition, to increase the computational and memory efficiency of the reachability analysis, we propose a depth-first-search algorithm that combines an existing exact analysis method with an over-approximation approach based on a new set representation. Experiments show that this method achieves a five-fold improvement in runtime and a two-fold improvement in memory usage compared to exact analysis.
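To make the over-approximation idea concrete, the sketch below propagates an input box through a small ReLU network using interval arithmetic and checks the resulting output bounds against a safety threshold. This is a minimal, standard interval bound propagation, not the paper's new set representation or exact analysis (which are not specified here); the network, bounds, and threshold are illustrative assumptions.

```python
import numpy as np

def affine_interval(W, b, lo, hi):
    # Propagate the input box [lo, hi] through the affine map W @ x + b
    # using interval arithmetic: positive weights take the matching bound,
    # negative weights take the opposite bound.
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def relu_interval(lo, hi):
    # ReLU is monotone, so it maps bounds to bounds directly.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def reach_box(layers, lo, hi):
    # Over-approximate the reachable output set of a feed-forward ReLU
    # network as a box. Sound (contains all true outputs) but conservative.
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_interval(W, b, lo, hi)
        if i < len(layers) - 1:  # no ReLU on the output layer
            lo, hi = relu_interval(lo, hi)
    return lo, hi

if __name__ == "__main__":
    # Toy 2-2-1 network (illustrative weights, not from the paper).
    layers = [
        (np.array([[1.0, 0.0], [0.0, 1.0]]), np.zeros(2)),
        (np.array([[1.0, 1.0]]), np.zeros(1)),
    ]
    lo, hi = reach_box(layers, np.zeros(2), np.ones(2))
    # A hypothetical safety spec "output <= 1.5": if the upper bound
    # exceeds it, the network *may* be unsafe and the region is flagged
    # for refinement or repair.
    print(lo, hi, "possibly unsafe" if hi[0] > 1.5 else "verified safe")
```

In a repair loop of the kind the abstract describes, boxes flagged as possibly unsafe would be refined (or analyzed exactly) to extract representative counterexamples for retraining; here the box [0, 2] shows the characteristic conservatism that the exact method avoids at higher cost.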