Backdoors are powerful attacks against deep neural networks (DNNs). By poisoning training data, attackers can inject hidden rules (backdoors) into DNNs, which only activate on inputs containing attack-specific triggers. While existing work has studied backdoor attacks on a variety of DNN models, it only considers static models, which remain unchanged after initial deployment. In this paper, we study the impact of backdoor attacks in a more realistic scenario of time-varying DNN models, where model weights are updated periodically to handle drifts in data distribution over time. Specifically, we empirically quantify the "survivability" of a backdoor against model updates, and examine how attack parameters, data drift behaviors, and model update strategies affect backdoor survivability. Our results show that one-shot backdoor attacks (i.e., poisoning training data only once) do not survive past a few model updates, even when attackers aggressively increase trigger size and poison ratio. To remain unaffected by model updates, attackers must continuously introduce corrupted data into the training pipeline. Together, these results indicate that when models are updated to learn new data, they also "forget" backdoors as hidden, malicious features. The larger the distribution shift between old and new training data, the faster backdoors are forgotten. Leveraging these insights, we apply a smart learning rate scheduler to further accelerate backdoor forgetting during model updates, which prevents one-shot backdoors from surviving past a single model update.
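To make the attack setting concrete, the sketch below illustrates one-shot data poisoning with a pixel-patch trigger, the style of attack whose survivability is measured here. It is a minimal illustration, not the paper's implementation; the function name poison_dataset and the parameters poison_ratio, trigger_size, and target_label are hypothetical stand-ins for the attack parameters discussed above.

import numpy as np

def poison_dataset(images, labels, target_label, poison_ratio=0.05, trigger_size=3):
    """Stamp a square trigger onto a random subset of images and
    relabel the stamped images to the attacker's target class."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_ratio)
    idx = np.random.choice(len(images), n_poison, replace=False)
    # Place a white trigger patch in the bottom-right corner of each chosen image.
    images[idx, -trigger_size:, -trigger_size:] = 1.0
    labels[idx] = target_label
    return images, labels

# Example: poison 5% of a toy 32x32 grayscale dataset toward class 0.
X = np.random.rand(1000, 32, 32).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
X_poisoned, y_poisoned = poison_dataset(X, y, target_label=0)

In the one-shot setting, this poisoned set is injected into a single training round; in the continuous setting, a fresh poisoned batch is added at every model update.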