A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model. Our approach, named SPADE, exploits a bijective distance mapping between input and output graphs constructed to approximate the manifolds underlying the input and output data. By leveraging the generalized Courant-Fischer theorem, we propose a SPADE score for evaluating the adversarial robustness of a given model, which is proved to be an upper bound of the best Lipschitz constant under the manifold setting. To reveal the most non-robust data samples, i.e., those highly vulnerable to adversarial attacks, we develop a spectral graph embedding procedure based on the dominant generalized eigenvectors. This embedding step assigns each data sample a robustness score that can be further harnessed for more effective adversarial training. Our experiments show that the proposed SPADE method yields promising empirical results for neural network models adversarially trained on the MNIST and CIFAR-10 data sets.
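The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes k-nearest-neighbor graphs for the input/output manifolds, approximates the SPADE score as the largest generalized eigenvalue of the pair of graph Laplacians (L_out, L_in), and derives per-sample robustness scores from the dominant generalized eigenvectors. The function names (`knn_laplacian`, `spade_score`), the choice of k, and the regularization constant are all hypothetical.

```python
import numpy as np
from scipy.sparse import identity
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=5):
    """Graph Laplacian of a symmetrized k-NN graph over the rows of X."""
    A = kneighbors_graph(X, k, mode="connectivity")
    A = A.maximum(A.T)          # symmetrize the adjacency matrix
    return laplacian(A)

def spade_score(X_in, X_out, k=5, eps=1e-6, n_vecs=2):
    """Hypothetical SPADE-style robustness score.

    X_in:  input data samples (n x d_in), X_out: model outputs (n x d_out).
    Returns the largest generalized eigenvalue of (L_out, L_in) as the
    model-level score, plus a per-sample score built from the dominant
    generalized eigenvectors (larger = less robust, under this sketch).
    """
    n = X_in.shape[0]
    # eps * I makes L_in positive definite so the generalized problem is well posed
    L_in = knn_laplacian(X_in, k) + eps * identity(n)
    L_out = knn_laplacian(X_out, k)
    # Solve L_out v = lambda * L_in v for the dominant eigenpairs
    vals, vecs = eigsh(L_out, k=n_vecs, M=L_in, which="LM")
    # One plausible per-sample score: eigenvalue-weighted squared embedding norm
    node_scores = (vecs ** 2) @ vals
    return vals.max(), node_scores
```

A large model-level score indicates that nearby inputs can map to distant outputs (a large manifold-based Lipschitz-type ratio), while the per-sample scores rank individual points by how much they contribute to the dominant distortion directions, which is what makes them candidates for targeted adversarial training.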