The phenomenon of adversarial examples in deep learning models has caused substantial concern over their reliability. While many deep neural networks have shown impressive performance in terms of predictive accuracy, it has been shown that in many instances an imperceptible perturbation can erroneously flip the network's prediction. Much subsequent research has focused on developing defenses against adversarial attacks or on learning under a worst-case adversarial loss. In this work, we take a step back and aim to provide a framework for determining whether a model's label change under a small perturbation is justified (and when it is not). We carefully argue that adversarial robustness should be defined as a locally adaptive measure that complies with the underlying distribution. We then propose a definition of an adaptive robust loss, derive an empirical version of it, and develop a resulting data-augmentation framework. We prove that our adaptive data augmentation maintains the consistency of 1-nearest-neighbor classification under deterministic labels and provide illustrative empirical evaluations.
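To make the idea of a locally adaptive augmentation concrete, the following is a minimal sketch, not the paper's actual procedure: it assumes the adaptive perturbation radius can be proxied by a fraction of each point's distance to the nearest differently labeled point, and the helper `adaptive_augment` and all parameter names are illustrative inventions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier


def adaptive_augment(X, y, n_copies=5, scale=0.5, rng=None):
    """Augment each point with random perturbations whose radius adapts to a
    local margin: here, a fraction of the distance to the nearest point with a
    different label (an assumed proxy for a locally adaptive robustness radius)."""
    rng = np.random.default_rng(rng)
    X_aug, y_aug = [X], [y]
    for label in np.unique(y):
        same = X[y == label]
        other = X[y != label]
        # Distance from each same-label point to its nearest differently labeled point.
        nn = NearestNeighbors(n_neighbors=1).fit(other)
        margins, _ = nn.kneighbors(same)
        radii = scale * margins.ravel()  # locally adaptive perturbation radius
        for _ in range(n_copies):
            # Sample a random direction and a random step within the adaptive radius.
            noise = rng.normal(size=same.shape)
            noise /= np.linalg.norm(noise, axis=1, keepdims=True)
            step = rng.uniform(0.0, 1.0, size=(len(same), 1)) * radii[:, None]
            X_aug.append(same + step * noise)
            y_aug.append(np.full(len(same), label))
    return np.vstack(X_aug), np.concatenate(y_aug)


# Usage: fit a 1-nearest-neighbor classifier on the augmented sample with
# deterministic labels, mirroring the setting of the consistency result.
X = np.random.default_rng(0).normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_aug, y_aug = adaptive_augment(X, y, rng=1)
clf = KNeighborsClassifier(n_neighbors=1).fit(X_aug, y_aug)
```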