Deep neural networks (DNNs) have recently been shown to be vulnerable to backdoor attacks, where attackers embed hidden backdoors into a DNN model by injecting a few poisoned examples into the training dataset. While extensive efforts have been made to detect and remove backdoors from backdoored DNNs, it remains unclear whether a backdoor-free clean model can be obtained directly from a poisoned dataset. In this paper, we first construct a causal graph to model the generation process of poisoned data and find that the backdoor attack acts as a confounder, introducing spurious associations between the input images and the target labels and making model predictions less reliable. Inspired by this causal understanding, we propose the Causality-inspired Backdoor Defense (CBD), which learns deconfounded representations for reliable classification. Specifically, a backdoored model is intentionally trained to capture the confounding effects. A second, clean model is then dedicated to capturing the desired causal effects by minimizing the mutual information with the confounding representations of the backdoored model and by employing a sample-wise re-weighting scheme. Extensive experiments on multiple benchmark datasets against 6 state-of-the-art attacks verify that our proposed defense is effective in reducing the backdoor threat while maintaining high accuracy on benign samples. Further analysis shows that CBD can also resist potential adaptive attacks. The code is available at \url{https://github.com/zaixizhang/CBD}.
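To make the two-model training idea sketched in the abstract concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation. It assumes both networks return a (logits, features) pair, uses a simple cosine-similarity penalty as a stand-in for the mutual-information term, and derives sample weights from the frozen backdoored model's confidence; all names (mi_proxy, train_step, lam) are hypothetical.

```python
# Conceptual sketch of the deconfounded training step described above (assumptions noted in the lead-in).
import torch
import torch.nn.functional as F

def mi_proxy(z_clean, z_bd):
    """Penalize similarity between clean and confounding (backdoored)
    representations; a simple stand-in for a mutual-information upper bound."""
    z_clean = F.normalize(z_clean, dim=1)
    z_bd = F.normalize(z_bd, dim=1)
    return (z_clean * z_bd).sum(dim=1).pow(2).mean()

def train_step(clean_model, bd_model, x, y, optimizer, lam=1.0):
    """One update of the clean model against a frozen backdoored model.
    Both models are assumed to return (logits, features)."""
    bd_model.eval()
    with torch.no_grad():
        logits_bd, z_bd = bd_model(x)
        # Down-weight samples the backdoored model is confident on,
        # since those are more likely to carry the backdoor trigger.
        p_true = F.softmax(logits_bd, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
        w = 1.0 - p_true

    clean_model.train()
    logits_c, z_c = clean_model(x)
    ce = F.cross_entropy(logits_c, y, reduction="none")
    loss = (w * ce).mean() + lam * mi_proxy(z_c, z_bd)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```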