We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a denoising diffusion probabilistic model (DDPM) to degrade an image with noise and then recover it using the learned reverse diffusion. Critically, we compute saliency-map-based masks to condition the diffusion, which lets the DDPM apply stronger diffusion to the most salient pixels. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks, while reliably recovering salient features when applied to clean data. This performance is achieved without access to the model parameters of the Trojan network, so Sancdifi operates as a black-box defense.
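As a rough illustration of the idea of conditioning the noising step on a saliency mask, the sketch below applies a standard DDPM forward process at two different timesteps and blends the results with a binary mask. This is a toy under stated assumptions, not the Sancdifi implementation: the linear noise schedule, the thresholding rule in `saliency_to_mask`, the blending rule in `masked_noising`, and the timesteps 400 and 50 are all hypothetical choices, and the saliency map is a hand-made stand-in. The learned reverse diffusion is omitted because it requires a trained DDPM.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def saliency_to_mask(saliency, frac=0.5):
    """Binary mask: 1.0 where saliency reaches `frac` of its maximum.

    The thresholding rule is an assumption made for this sketch.
    """
    return (saliency >= frac * saliency.max()).astype(np.float64)


def masked_noising(x0, mask, t_strong, t_weak, alpha_bar, rng):
    """Noise salient pixels up to t_strong and all other pixels up to t_weak.

    Hypothetical blending rule: two independent forward samples are drawn
    and combined per pixel according to the mask.
    """
    strong = forward_diffuse(x0, t_strong, alpha_bar, rng)
    weak = forward_diffuse(x0, t_weak, alpha_bar, rng)
    return mask * strong + (1.0 - mask) * weak


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Toy 32x32 "image" in [-1, 1] with a constant 8x8 patch standing in for a trigger.
x0 = rng.uniform(-1.0, 1.0, size=(32, 32))
x0[:8, :8] = 1.0

# Stand-in saliency map that highlights the patch; a real one would come
# from a saliency method applied to the classifier under attack.
saliency = np.zeros((32, 32))
saliency[:8, :8] = 1.0

mask = saliency_to_mask(saliency)
x_noisy = masked_noising(x0, mask, t_strong=400, t_weak=50,
                         alpha_bar=alpha_bar, rng=rng)

err = np.abs(x_noisy - x0)
print("mean |x_t - x_0| on salient pixels:", err[mask == 1].mean())
print("mean |x_t - x_0| elsewhere:        ", err[mask == 0].mean())
```

In this toy setup the salient patch is perturbed far more heavily than the rest of the image, which is the qualitative behavior the abstract describes; a reverse diffusion pass would then be responsible for reconstructing a clean image from `x_noisy`.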