Despite the many algorithms proposed to make deep learning (DL) models robust, DL models remain susceptible to adversarial attacks. We hypothesize that the adversarial vulnerability of DL models stems from two factors. The first is data sparsity: in the high-dimensional input space, there exist large regions outside the support of the data distribution. The second is the existence of many redundant parameters in DL models. Owing to these factors, different models can arrive at different decision boundaries while achieving comparably high prediction accuracy. Where a decision boundary lies in the space outside the support of the data distribution does not affect the model's prediction accuracy, but it makes an important difference to the model's adversarial robustness. We hypothesize that the ideal decision boundary lies as far as possible from the support of the data distribution. In this paper, we develop a training framework to observe whether DL models can learn such a decision boundary, one that spans the space around the class distributions farther from the data points themselves. We deploy semi-supervised learning during training by leveraging unlabeled data generated in the space outside the support of the data distribution. We measure the adversarial robustness of models trained with this framework against well-known adversarial attacks and using robustness metrics. We find that models trained with our framework, as well as with other regularization methods and adversarial training, support our data-sparsity hypothesis, and that models trained with these methods learn decision boundaries closer to the aforementioned ideal. The code for our training framework is available at https://github.com/MahsaPaknezhad/AdversariallyRobustTraining.
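The key ingredient of the framework described above is unlabeled data generated in the space outside the support of the data distribution. The following is a minimal sketch of one way such off-support points could be generated; the sampling scheme (pushing random directions out beyond the labeled data's radius by a margin) and all names here are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: two Gaussian class clusters (hypothetical stand-in
# for a real dataset).
X = np.concatenate([rng.normal(-2.0, 0.5, size=(100, 10)),
                    rng.normal(+2.0, 0.5, size=(100, 10))])
y = np.array([0] * 100 + [1] * 100)

def sample_off_support(X, n, margin=3.0, rng=rng):
    """Sample unlabeled points outside the support of the data distribution
    by placing them beyond the labeled data's radius plus a margin.
    (Illustrative heuristic, not the paper's generation procedure.)"""
    center = X.mean(axis=0)
    radius = np.linalg.norm(X - center, axis=1).max()
    dirs = rng.normal(size=(n, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    dists = radius + margin + rng.uniform(0.0, margin, size=(n, 1))
    return center + dirs * dists

U = sample_off_support(X, n=50)
```

During semi-supervised training, points like `U` would receive a "no class" target (e.g. a uniform label distribution), encouraging the decision boundary to stay far from the labeled clusters.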