Neural networks are vulnerable to backdoor poisoning attacks, where attackers maliciously poison the training set and insert triggers into test inputs to change the predictions of the victim model. Existing defenses against backdoor attacks either provide no formal guarantees or come with probabilistic guarantees that are expensive to compute and ineffective. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques to a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN (1) significantly outperforms the state-of-the-art certified backdoor defense in both defense strength and efficiency, and (2) on real backdoor attacks, reduces the attack success rate by an order of magnitude compared to a range of baselines from the literature.
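To make the key insight concrete, the sketch below illustrates one plausible reading of the described scheme: split the training data into disjoint partitions, train one model per partition, run an off-the-shelf test-time evasion certifier on each model, and aggregate the certified votes. This is only a minimal illustration, not PECAN itself; `train_model` and `certified_predict` are hypothetical placeholders for any learner and any evasion certifier.

```python
# Minimal sketch (assumed, not the authors' implementation) of the idea in the
# abstract: per-partition models + test-time evasion certification + voting.
from collections import Counter
from typing import Callable, List, Sequence, Tuple

ABSTAIN = -1  # label used when a partition model's prediction cannot be certified


def partition(dataset: Sequence, num_partitions: int) -> List[List]:
    """Deterministically split the training set into disjoint partitions."""
    parts: List[List] = [[] for _ in range(num_partitions)]
    for i, example in enumerate(dataset):
        parts[i % num_partitions].append(example)  # a hash-based split also works
    return parts


def ensemble_certified_predict(
    models: Sequence,
    certified_predict: Callable[[object, object], Tuple[int, bool]],
    x: object,
) -> Tuple[int, int]:
    """Aggregate per-partition certified predictions on a test input x.

    certified_predict(model, x) -> (label, certified) is assumed to wrap an
    off-the-shelf evasion certifier; models whose prediction is not certified
    abstain. Returns the majority label and the vote margin over the runner-up,
    a rough proxy for how many corrupted partitions the vote can tolerate.
    """
    votes: Counter = Counter()
    for model in models:
        label, certified = certified_predict(model, x)
        votes[label if certified else ABSTAIN] += 1

    ranked = [(lbl, n) for lbl, n in votes.most_common() if lbl != ABSTAIN]
    if not ranked:
        return ABSTAIN, 0
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_label, top_count - runner_up
```

Because each poisoned training example can influence only the one partition it lands in, a sufficiently large vote margin among certified predictions limits how much a bounded number of poisoned samples (and a test-time trigger) can sway the final label.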