Adversarial robustness has attracted growing interest in machine learning since it was observed that neural networks tend to be brittle. We propose an information-geometric formulation of adversarial defense and introduce FIRE, a new Fisher-Rao regularization for the categorical cross-entropy loss, based on the geodesic distance between the softmax outputs corresponding to natural and perturbed input features. Using the information-geometric properties of the class of softmax distributions, we derive an explicit characterization of the Fisher-Rao Distance (FRD) for the binary and multiclass cases, and establish several of its properties as well as connections with standard regularization metrics. Furthermore, for a simple linear and Gaussian model, we show that all Pareto-optimal points in the accuracy-robustness region can be reached by FIRE, whereas other state-of-the-art methods fail to reach them. Empirically, we evaluate the performance of various classifiers trained with the proposed loss on standard datasets, showing simultaneous improvements of up to 1\% in both clean and robust performance while reducing training time by 20\% relative to the best-performing methods.
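To make the regularizer concrete, the following is a minimal sketch of a Fisher-Rao penalty between two softmax outputs. It uses the standard closed form of the Fisher-Rao geodesic distance between categorical distributions, $d(p, q) = 2 \arccos\left(\sum_i \sqrt{p_i q_i}\right)$. The function names, the squared form of the penalty, and the weight `lam` are illustrative assumptions and are not taken from the text above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions."""
    # Bhattacharyya coefficient: inner product of the square-root densities.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp against floating-point drift so acos stays within its domain.
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)


def fire_style_loss(logits_nat, logits_adv, label, lam=1.0):
    """Cross-entropy on the natural input plus a weighted Fisher-Rao penalty.

    Assumption: the penalty is the squared distance scaled by `lam`;
    the abstract does not specify the exact form.
    """
    p_nat = softmax(logits_nat)
    p_adv = softmax(logits_adv)
    cross_entropy = -math.log(p_nat[label])
    penalty = fisher_rao_distance(p_nat, p_adv) ** 2
    return cross_entropy + lam * penalty
```

For example, `fire_style_loss([2.0, 0.5, -1.0], [1.2, 0.9, -0.8], label=0, lam=1.0)` returns the cross-entropy on the natural logits plus a penalty that grows as the perturbed softmax output moves away from the natural one along the geodesic. In practice the perturbed logits would come from an adversarial attack on the input, which this sketch does not include.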