Although Deep Neural Networks (DNNs) have shown remarkable performance in perception and control tasks, several trustworthiness issues remain open. One of the most discussed topics is the existence of adversarial perturbations, which has opened an interesting research line on provable techniques capable of quantifying the robustness of a given input. In this regard, the Euclidean distance of the input from the classification boundary provides a well-established robustness measure, as it corresponds to the minimal adversarial perturbation capable of changing the classification. Unfortunately, computing such a distance is highly complex due to the non-convex nature of DNNs. Although several methods have been proposed to address this issue, to the best of our knowledge, no provable results have been presented to estimate and bound the committed error. This paper addresses this issue by proposing two lightweight strategies to find the minimal adversarial perturbation. Differently from the state of the art, the proposed approach allows formulating an error-estimation theory of the approximate distance with respect to the theoretical one. Finally, a substantial set of experiments is reported to evaluate the performance of the algorithms and support the theoretical findings. The obtained results show that the proposed strategies approximate the theoretical distance for samples close to the classification boundary, leading to provable robustness guarantees against any adversarial attack.
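For reference, the minimal adversarial perturbation mentioned above is commonly formalized as the smallest input displacement that changes the classifier's decision (a sketch of the standard formulation; the symbols $f$, $x$, and $\delta$ are introduced here for illustration and are not taken from the paper):

\[
\delta^{\star}(x) \;=\; \arg\min_{\delta} \,\|\delta\|_{2} \quad \text{s.t.} \quad f(x+\delta) \neq f(x),
\]

so that $\|\delta^{\star}(x)\|_{2}$ is the Euclidean distance of $x$ from the classification boundary of $f$.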