In this paper, we propose a Universal Defence based on Clustering and Centroids Analysis (CCA-UD) against backdoor attacks. The goal of the proposed defence is to reveal whether a Deep Neural Network model is subject to a backdoor attack by inspecting the training dataset. CCA-UD first clusters the samples of the training set by means of density-based clustering. Then, it applies a novel strategy to detect the presence of poisoned clusters. The proposed strategy relies on a general misclassification behaviour observed when the features of a representative sample of the analysed cluster are added to benign samples. Since the capability of inducing a misclassification error is a general characteristic of poisoned samples, the proposed defence is attack-agnostic. This marks a significant difference with respect to existing defences, which either can counter only some types of backdoor attacks, e.g., attacks in which the attacker corrupts the labels of the poisoned samples, or are effective only under certain conditions on the poisoning ratio or on the kind of triggering pattern used by the attacker. Experiments carried out on several classification tasks, considering different types of backdoor attacks and triggering patterns, including both local and global triggers, show that the proposed method is highly effective against backdoor attacks in all cases, consistently outperforming state-of-the-art techniques.
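The two-step pipeline described above can be illustrated with a minimal sketch. The code below is not the authors' implementation: the density-based clustering is stood in for by scikit-learn's DBSCAN, the "representative features" of a cluster are approximated by the deviation of the cluster centroid from the overall class centroid, and `model_predict` is a hypothetical stand-in for the inspected network's prediction function. A high misclassification ratio on benign samples corrupted with a cluster's deviation is taken as the poisoning indicator.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_features(feats, eps=0.5, min_samples=5):
    """Step 1: density-based clustering of the feature representations
    of the training samples of one class (DBSCAN as an example choice)."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)


def misclassification_ratio(model_predict, benign_feats, benign_labels,
                            cluster_feats, class_centroid):
    """Step 2 (sketch): add the deviation of the analysed cluster's centroid
    from the class centroid to benign samples of other classes, and measure
    how often the model's prediction changes. Poisoned clusters are expected
    to induce a high misclassification ratio; benign clusters a low one."""
    deviation = cluster_feats.mean(axis=0) - class_centroid
    corrupted = benign_feats + deviation
    preds = model_predict(corrupted)
    return float(np.mean(preds != benign_labels))
```

A cluster whose ratio exceeds a fixed threshold would then be flagged as poisoned; the threshold and the clustering hyperparameters are deployment choices not fixed by this sketch.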