Diffusion models have recently achieved remarkable success in image generation, yet growing evidence shows their vulnerability to backdoor attacks, in which adversaries implant covert triggers to manipulate model outputs. While existing defenses can detect many such attacks through visual inspection and neural-network-based analysis, we identify a lighter-weight and stealthier threat, termed BadBlocks. BadBlocks selectively contaminates specific blocks within the UNet architecture while preserving the normal behavior of the remaining components. Compared with prior backdoor attacks, it requires only about 30% of the computation and 20% of the GPU time, yet achieves high attack success rates with minimal perceptual degradation. Extensive experiments demonstrate that BadBlocks effectively evades state-of-the-art defenses, particularly attention-based detection frameworks. Ablation studies further reveal that effective backdoor injection does not require fine-tuning the entire network, and they highlight the critical role of certain layers in establishing the backdoor mapping. Overall, BadBlocks substantially lowers the barrier to backdooring large-scale diffusion models, even on consumer-grade GPUs.
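A minimal sketch of the selective-block fine-tuning idea, assuming a diffusers-style UNet: freeze every parameter, then re-enable gradients only for a chosen subset of blocks, so that only those blocks are updated during backdoor fine-tuning. The specific block names (`mid_block`, `up_blocks.3`) and the use of `UNet2DConditionModel` are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: selective-block fine-tuning of a diffusion UNet.
# The choice of blocks below is a hypothetical example, not the
# authors' configuration.
import torch
from diffusers import UNet2DConditionModel

# Stands in for a pretrained diffusion UNet (e.g., loaded via
# UNet2DConditionModel.from_pretrained(...) in practice).
unet = UNet2DConditionModel()

TARGET_BLOCKS = ("mid_block", "up_blocks.3")  # hypothetical block choice

# Freeze everything, then re-enable gradients only for the target blocks.
for name, param in unet.named_parameters():
    param.requires_grad = name.startswith(TARGET_BLOCKS)

# Optimize only the unfrozen parameters during backdoor fine-tuning.
trainable = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)

total = sum(p.numel() for p in unet.parameters())
print(f"training {sum(p.numel() for p in trainable) / total:.1%} of parameters")
```

Updating only a small subset of blocks shrinks both the gradient computation and the optimizer state, which is consistent with the abstract's claim that such an attack becomes feasible on consumer-grade GPUs.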