Deep Neural Networks (DNNs) have achieved excellent performance in various fields. However, their vulnerability to Adversarial Examples (AEs) hinders their deployment in safety-critical applications. This paper presents BEYOND, a novel AE detection framework for trustworthy predictions. BEYOND detects AEs by identifying the abnormal relation between an AE and its augmented versions, i.e., its neighbors, from two perspectives: representation similarity and label consistency. An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label, because its representations are more informative than those of supervised learning models. For clean samples, representations and predictions are closely consistent with those of their neighbors, whereas those of AEs differ greatly. We explain this observation, show that BEYOND can exploit the discrepancy to detect AEs effectively, and develop a rigorous justification for its effectiveness. Furthermore, as a plug-and-play model, BEYOND can easily cooperate with an Adversarially Trained Classifier (ATC), achieving state-of-the-art (SOTA) robust accuracy. Experimental results show that BEYOND outperforms baselines by a large margin, especially under adaptive attacks. Empowered by the robust relation network built on SSL, BEYOND outperforms baselines in both detection ability and speed. Our code will be publicly available.
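To make the two detection signals concrete, the following is a minimal sketch of a neighbor-consistency check, not the BEYOND implementation. It assumes that representations and labels have already been produced by some SSL model for the input and for its augmented neighbors. The function names, the cosine-similarity measure, the thresholds `sim_threshold` and `min_consistent`, and the rule for combining the two signals are all illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def looks_adversarial(
    input_repr: Sequence[float],
    input_label: int,
    neighbor_reprs: Sequence[Sequence[float]],
    neighbor_labels: Sequence[int],
    sim_threshold: float = 0.8,   # assumed value, not from the paper
    min_consistent: float = 0.5,  # assumed value, not from the paper
) -> bool:
    """Flag an input whose neighbors disagree with it.

    Hypothetical decision rule: the input is flagged if too few neighbors
    have a similar representation, or too few share its predicted label.
    """
    if not neighbor_reprs or len(neighbor_reprs) != len(neighbor_labels):
        raise ValueError("need one label per neighbor and at least one neighbor")
    k = len(neighbor_reprs)
    # Signal 1: representation similarity between the input and each neighbor.
    similar = sum(
        1 for r in neighbor_reprs
        if cosine_similarity(input_repr, r) >= sim_threshold
    )
    # Signal 2: label consistency between the input and each neighbor.
    agreeing = sum(1 for label in neighbor_labels if label == input_label)
    return similar / k < min_consistent or agreeing / k < min_consistent
```

In this sketch a clean input, whose neighbors stay close in representation space and keep the same label, passes both checks, while an input whose neighbors drift away or change label is flagged. In practice the thresholds would need to be calibrated on held-out clean data.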