Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs can potentially be backdoor attacked. It is crucial to defend against such attacks, i.e., to post-process a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement may be unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse the backdoor behaviors of a suspicious network with negligible compromise to its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data. This makes our method very practical. Code is available at: https://github.com/luluppang/BCU.
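The abstract outlines a two-step pipeline: re-initialize part of the suspicious network's weights layer-wise, then distill the original network's soft predictions on unlabeled clean data into the re-initialized copy, so no ground-truth labels are needed. Below is a minimal PyTorch sketch of this idea; the function names (`reinit_last_layers`, `distill_without_labels`) and hyperparameters (number of reset layers, temperature `T`) are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hedged sketch: layer-wise re-initialization + label-free knowledge
# distillation. Names and hyperparameters are assumptions for illustration.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def reinit_last_layers(model: nn.Module, num_layers: int) -> nn.Module:
    """Re-initialize the parameters of the last `num_layers` parametric layers.

    The intuition is that backdoor behavior tends to concentrate in later
    layers, so resetting them disrupts the trigger response while earlier
    clean features survive.
    """
    layers = [m for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear, nn.BatchNorm2d))]
    for m in layers[-num_layers:]:
        m.reset_parameters()
    return model

def distill_without_labels(teacher, student, loader,
                           epochs=10, lr=0.01, T=2.0):
    """Distill the suspicious teacher into the partially re-initialized
    student using only unlabeled inputs: the teacher's softened predictions
    on clean data supervise the student, so no labels are required."""
    teacher.eval()
    student.train()
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, *_ in loader:  # labels, if the loader yields any, are ignored
            with torch.no_grad():
                soft = F.softmax(teacher(x) / T, dim=1)
            loss = F.kl_div(F.log_softmax(student(x) / T, dim=1),
                            soft, reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Usage sketch: the suspicious model acts as its own teacher.
# suspicious = load_suspicious_model()  # hypothetical loader
# student = reinit_last_layers(copy.deepcopy(suspicious), num_layers=4)
# cleansed = distill_without_labels(suspicious, student, unlabeled_loader)
```

On clean inputs the teacher's predictions are largely correct, so distillation preserves normal accuracy; since the unlabeled data contain no triggers, the backdoor mapping erased by re-initialization is never re-taught.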