Recent studies show that text-to-image (T2I) diffusion models are vulnerable to backdoor attacks, where a trigger in the input prompt can steer generation toward harmful or unintended content. To address this, we introduce PEPPER (PErcePtion Guided PERturbation), a backdoor defense that rewrites the input caption into a semantically distant yet visually similar one while adding unobtrusive elements. With this rewriting strategy, PEPPER disrupts the trigger embedded in the input prompt, dilutes the influence of trigger tokens, and thereby achieves enhanced robustness. Experiments show that PEPPER is particularly effective against text-encoder-based attacks, substantially reducing the attack success rate while preserving generation quality. Beyond this, PEPPER can be paired with any existing defense, yielding consistently stronger and more generalizable robustness than any standalone method. Our code will be released on GitHub.
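To make the rewriting idea concrete, here is a minimal sketch of a PEPPER-style defense, assuming a pipeline in which candidate rewrites of the input prompt are scored and the one that is most distant from the original in text-embedding space (diluting any trigger tokens) while remaining close under a perceptual-similarity proxy is selected. The function names, the candidate-generation step, and the threshold below are hypothetical illustrations, not the authors' released implementation.

```python
# Hypothetical sketch of a perception-guided prompt rewriting defense.
# Assumption: `text_embed` is a text-encoder embedding (e.g., a CLIP text
# tower) and `visual_embed` is a perceptual proxy embedding of the caption;
# neither is specified by the abstract.

from typing import Callable, Sequence
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def pepper_rewrite(
    prompt: str,
    candidates: Sequence[str],                  # paraphrases with unobtrusive added elements
    text_embed: Callable[[str], np.ndarray],    # semantic (text-encoder) embedding
    visual_embed: Callable[[str], np.ndarray],  # perceptual / visual-similarity proxy
    visual_sim_min: float = 0.85,               # hypothetical threshold, not from the paper
) -> str:
    """Return the candidate most distant from `prompt` in text-embedding
    space, subject to staying visually similar under the perceptual proxy."""
    t0, v0 = text_embed(prompt), visual_embed(prompt)
    best, best_dist = prompt, -1.0
    for cand in candidates:
        if cosine(visual_embed(cand), v0) < visual_sim_min:
            continue  # rewrite drifted too far visually; discard it
        dist = 1.0 - cosine(text_embed(cand), t0)  # semantic distance breaks the trigger
        if dist > best_dist:
            best, best_dist = cand, dist
    return best
```

In this sketch, the selected rewrite would be passed to the (possibly backdoored) T2I model in place of the user's original prompt, so trigger tokens never reach the text encoder verbatim; this also illustrates why the defense composes naturally with other methods, since it only touches the prompt before generation.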