Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine-learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their success are not yet well understood. In this work, we provide a unifying framework to study the process of backdoor learning through the lens of incremental learning and influence functions. We show that the success of backdoor attacks inherently depends on (i) the complexity of the learning algorithm, controlled by its hyperparameters, and (ii) the fraction of backdoor samples injected into the training set. These factors affect how fast a machine-learning model learns to correlate the presence of a backdoor trigger with the target class. Interestingly, our analysis shows that there exists a region in the hyperparameter space in which the accuracy on clean test samples remains high while backdoor attacks become ineffective, thereby suggesting novel criteria to improve existing defenses.
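As a concrete illustration of the poisoning step described above (not taken from the paper), the following is a minimal sketch of injecting a backdoor into a training set: a small trigger patch is stamped onto a fraction of the images, which are then relabeled to the attacker-chosen target class. The function name, the patch placement, and the parameter names (e.g., `poison_fraction`, corresponding to factor (ii)) are illustrative assumptions.

```python
import numpy as np

def poison_dataset(X, y, target_class, poison_fraction=0.1,
                   trigger_value=1.0, patch_size=3, seed=0):
    """Stamp a trigger patch onto a random fraction of images and relabel them.

    X: array of images, shape (n, H, W) or (n, H, W, C)
    y: array of integer labels, shape (n,)
    """
    rng = np.random.default_rng(seed)
    X_p, y_p = X.copy(), y.copy()
    n_poison = int(poison_fraction * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    # Place a constant-valued trigger in the bottom-right corner
    # of each selected image, then relabel it to the target class.
    X_p[idx, -patch_size:, -patch_size:] = trigger_value
    y_p[idx] = target_class
    return X_p, y_p
```

Under this sketch, the attacker's knob studied in the paper, the injected fraction of backdoor samples, maps directly to `poison_fraction`; at test time, stamping the same patch on a clean input should elicit `target_class` if the model has learned the backdoor correlation.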