Because of adversarial attacks, deploying neural networks in safety-critical systems requires safe and robust models. Knowing the minimal adversarial perturbation of an input x, or, equivalently, the distance of x from the classification boundary, makes it possible to evaluate classification robustness and to provide certifiable predictions. Unfortunately, state-of-the-art techniques for computing this distance are computationally expensive and hence unsuited to online applications. This work proposes a novel family of classifiers, namely Signed Distance Classifiers (SDCs), which, from a theoretical perspective, directly output the exact distance of x from the classification boundary rather than a probability score (e.g., SoftMax). SDCs are therefore robust-by-design classifiers. To address the theoretical requirements of an SDC in practice, a novel network architecture named Unitary-Gradient Neural Network is presented. Experimental results show that the proposed architecture approximates a signed distance classifier, hence allowing online certifiable classification of x at the cost of a single inference.
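As a minimal illustration of why a signed distance output yields a certificate from a single evaluation, consider the simplest case: a binary linear classifier. Dividing the affine score w·x + b by ‖w‖ gives a function whose gradient has unit norm and whose value is exactly the signed Euclidean distance of x from the hyperplane w·x + b = 0. Any perturbation with L2 norm smaller than that distance cannot change the predicted label. This toy sketch is an assumption for illustration only, not the Unitary-Gradient Neural Network; the function names `signed_distance_linear` and `certify` are hypothetical.

```python
import math


def signed_distance_linear(w, b, x):
    """Signed Euclidean distance of x from the hyperplane w.x + b = 0.

    Hypothetical helper: normalizing the affine score by ||w|| makes the
    gradient unit-norm, so the output is a distance, not a raw score.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        raise ValueError("w must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def certify(w, b, x, eps):
    """Hypothetical helper returning (label, certified).

    certified is True when every perturbation with L2 norm at most eps
    leaves the predicted label unchanged, i.e. when eps < |distance|.
    """
    d = signed_distance_linear(w, b, x)
    label = 1 if d >= 0 else -1
    return label, eps < abs(d)


# Usage: the point (3, 4) lies at distance 4 from the line 3*x1 + 4*x2 - 5 = 0.
w, b, x = [3.0, 4.0], -5.0, [3.0, 4.0]
print(signed_distance_linear(w, b, x))  # 4.0
print(certify(w, b, x, 3.5))            # (1, True)
print(certify(w, b, x, 4.5))            # (1, False)
```

For a deep network the exact distance is not available in closed form; the abstract's claim is that the proposed architecture approximates this behavior, so the same one-pass certificate can be read from its output.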