A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model. Our approach, named SPADE, exploits a bijective distance mapping between the input/output graphs constructed to approximate the manifolds underlying the input/output data. By leveraging the generalized Courant-Fischer theorem, we propose a SPADE score for evaluating the adversarial robustness of a given model; the score is proved to be an upper bound of the best Lipschitz constant under the manifold setting. To reveal the most non-robust data samples, i.e., those most vulnerable to adversarial attacks, we develop a spectral graph embedding procedure that leverages the dominant generalized eigenvectors. This embedding step assigns each data sample a robustness score that can be further harnessed for more effective adversarial training. Our experiments show that the proposed SPADE method yields promising empirical results for neural network models adversarially trained on the MNIST and CIFAR-10 data sets.
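The pipeline sketched in the abstract can be illustrated in a few lines: build k-NN graph Laplacians for the input features and the model outputs, take the largest generalized eigenvalue of the pencil (L_Y, L_X) as a global SPADE-style score, and use the dominant generalized eigenvectors to score individual samples. This is a minimal sketch under simplifying assumptions (dense Laplacians, a small ridge term to make L_X invertible, an arbitrary choice of three dominant eigenvectors), not the authors' implementation; all function names and parameters are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def knn_laplacian(points, k=5):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbor graph."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[1:k + 1]:  # skip self (distance 0)
            A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spade_scores(X, Y, k=5, eps=1e-6, n_vecs=3):
    """Global score = largest generalized eigenvalue of L_Y v = lam * L_X v.
    Per-sample scores come from an embedding built on the top eigenvectors.
    (Sketch only: eps-regularization and n_vecs are ad-hoc assumptions.)"""
    Lx = knn_laplacian(X, k) + eps * np.eye(len(X))  # ridge: Laplacians are singular
    Ly = knn_laplacian(Y, k)
    lam, V = eigh(Ly, Lx)  # generalized eigenproblem, eigenvalues ascending
    top = V[:, -n_vecs:] * np.sqrt(np.maximum(lam[-n_vecs:], 0.0))
    return lam[-1], np.linalg.norm(top, axis=1)

# Toy usage: a random network's tanh outputs stand in for model predictions.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
Y = np.tanh(X @ rng.normal(size=(8, 4)))
global_score, per_sample = spade_scores(X, Y)
```

Samples with the largest `per_sample` entries would be the candidates for prioritized adversarial training, mirroring the ranking role the dominant generalized eigenvectors play in the abstract.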