With the rapid progress of Artificial Intelligence (AI) techniques, more and more backdoors are being designed by adversaries to attack Deep Neural Networks (DNNs). Although the state-of-the-art method Neural Attention Distillation (NAD) can effectively erase backdoor triggers from DNNs, it still suffers from a non-negligible Attack Success Rate (ASR) together with lowered classification ACCuracy (ACC), since NAD focuses on backdoor defense using attention features (i.e., attention maps) of the same order. In this paper, we introduce a novel backdoor defense framework named Attention Relation Graph Distillation (ARGD), which fully explores the correlations among attention features of different orders using our proposed Attention Relation Graphs (ARGs). By aligning the ARGs of the teacher and student models during knowledge distillation, ARGD can eradicate more backdoor triggers than NAD. Comprehensive experimental results show that, against six of the latest backdoor attacks, ARGD outperforms NAD by up to a 94.85% reduction in ASR, while ACC is improved by up to 3.23%.
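To make the core idea concrete, the following is a minimal NumPy sketch of attention-relation-graph alignment, not the authors' implementation. It assumes spatial attention maps are obtained by summing powered absolute activations over channels, that an ARG is the matrix of pairwise cosine similarities between the attention maps of several layers, and, purely for simplicity, that all compared layers share the same spatial resolution; the function names (`attention_map`, `relation_graph`, `arg_distillation_loss`) are illustrative, not from the paper.

```python
import numpy as np

def attention_map(feature: np.ndarray, p: int = 2) -> np.ndarray:
    """Collapse a (C, H, W) feature tensor into a unit-norm spatial
    attention vector by summing |activation|^p over channels."""
    amap = (np.abs(feature) ** p).sum(axis=0)          # (H, W)
    flat = amap.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-8)

def relation_graph(features: list) -> np.ndarray:
    """Build an attention relation graph: pairwise cosine similarities
    between the attention maps of different layers (assumed same H, W)."""
    maps = [attention_map(f) for f in features]
    n = len(maps)
    graph = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Maps are unit-norm, so the dot product is cosine similarity.
            graph[i, j] = float(np.dot(maps[i], maps[j]))
    return graph

def arg_distillation_loss(teacher_feats: list, student_feats: list) -> float:
    """Illustrative alignment loss: mean squared difference between the
    teacher's and student's relation graphs."""
    g_t = relation_graph(teacher_feats)
    g_s = relation_graph(student_feats)
    return float(((g_t - g_s) ** 2).mean())
```

During fine-tuning of the backdoored (student) model, such a loss term would be added to the task loss so that the student's inter-layer attention correlations are pulled toward those of a clean teacher, rather than matching each attention map in isolation as same-order distillation does.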