Over the last decade, neural networks (NNs) have been widely deployed in numerous applications, including safety-critical ones such as autonomous systems. Despite this growing adoption, it is well known that NNs are susceptible to adversarial attacks. Hence, it is highly important to provide guarantees that such systems work correctly. To address this issue, we introduce a framework for repairing unsafe NNs w.r.t. a safety specification by utilizing satisfiability modulo theories (SMT) solvers. Our method searches for a new, safe NN representation by modifying only a few of its weight values. In addition, our technique attempts to maximize the similarity to the original network with regard to its decision boundaries. We perform extensive experiments that demonstrate the capability of our framework to yield NNs that are safe w.r.t. the adversarial robustness property, with only a mild loss of accuracy (in terms of similarity). Moreover, we compare our method against a naive baseline to empirically demonstrate its effectiveness. To conclude, we provide an algorithm that automatically repairs NNs given safety properties, and suggest several heuristics to improve its computational performance. Currently, this approach produces small, correct NNs (i.e., with up to a few hundred parameters) composed of the piecewise-linear ReLU activation function. Nevertheless, our framework is general in the sense that it can synthesize NNs w.r.t. any specification expressible in a decidable fragment of first-order logic.
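To convey the flavor of weight-level repair, the following toy sketch repairs a tiny ReLU network so that its output satisfies a safety property (non-negativity) on a sampled input box, while changing as few weights as possible and by the smallest amount found. This is only an illustration under simplifying assumptions: the finite grid and the exhaustive search stand in for the SMT query used by the actual framework, and all names here are hypothetical.

```python
def relu(x):
    return max(0.0, x)

def net(x, w):
    # A 2-2-1 ReLU network; w is a flat list of 6 weights (biases omitted for brevity).
    h1 = relu(w[0] * x[0] + w[1] * x[1])
    h2 = relu(w[2] * x[0] + w[3] * x[1])
    return w[4] * h1 + w[5] * h2

def safe(w, points):
    # Safety property: the output must be non-negative on every sampled input.
    return all(net(x, w) >= 0.0 for x in points)

# Inputs sampled from the box [0, 1]^2 -- a finite stand-in for a symbolic SMT constraint.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]

# An original network that violates the property (e.g. at x = (0, 1)).
w0 = [1.0, -2.0, 0.5, 0.5, 1.0, -3.0]

def repair(w, points, deltas=(0.5, 1.0, 1.5, 2.0, 3.0)):
    # Search for a repair that changes a single weight, preferring small
    # perturbations first (a crude proxy for staying similar to the original).
    for d in deltas:
        for i in range(len(w)):
            for sign in (+1.0, -1.0):
                cand = list(w)
                cand[i] += sign * d
                if safe(cand, points):
                    return cand, i, sign * d
    return None

result = repair(w0, grid)
```

A real repair procedure would instead encode the network semantics and the safety property as SMT constraints over the free weight variables, so a solver can certify safety over the whole input box rather than a finite sample.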