A backdoor data poisoning attack is an adversarial attack wherein the attacker injects several watermarked, mislabeled training examples into a training set. The watermark does not impact the test-time performance of the model on typical data; however, the model reliably errs on watermarked examples. To gain a better foundational understanding of backdoor data poisoning attacks, we present a formal theoretical framework within which one can discuss backdoor data poisoning attacks for classification problems. We then use this to analyze important statistical and computational issues surrounding these attacks. On the statistical front, we identify a parameter we call the memorization capacity that captures the intrinsic vulnerability of a learning problem to a backdoor attack. This allows us to argue about the robustness of several natural learning problems to backdoor attacks. Our results favoring the attacker involve presenting explicit constructions of backdoor attacks, and our robustness results show that some natural problem settings cannot yield successful backdoor attacks. From a computational standpoint, we show that under certain assumptions, adversarial training can detect the presence of backdoors in a training set. We then show that under similar assumptions, two closely related problems we call backdoor filtering and robust generalization are nearly equivalent. This implies that it is both asymptotically necessary and sufficient to design algorithms that can identify watermarked examples in the training set in order to obtain a learning algorithm that both generalizes well to unseen data and is robust to backdoors.
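The attack described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's construction: the "watermark" here is assumed to be a fixed trigger value written into one chosen coordinate, and the names `watermark`, `poison`, `trigger_idx`, and `target_label` are illustrative.

```python
import random

def watermark(x, trigger_idx=0, trigger_val=1.0):
    """Return a copy of x with a toy trigger pattern applied
    (one coordinate overwritten with a fixed value)."""
    x = list(x)
    x[trigger_idx] = trigger_val
    return x

def poison(dataset, num_poison, target_label, seed=0):
    """Inject num_poison watermarked, mislabeled copies of clean
    examples into the training set, as in a backdoor attack."""
    rng = random.Random(seed)
    tainted = list(dataset)
    for x, _ in rng.sample(dataset, num_poison):
        tainted.append((watermark(x), target_label))
    return tainted

# Clean examples are (feature_vector, label) pairs; the attacker appends
# watermarked copies relabeled to the target class.
clean = [([0.2, 0.5], 0), ([0.9, 0.1], 1), ([0.4, 0.7], 0)]
tainted = poison(clean, num_poison=2, target_label=1)
```

A model trained on `tainted` sees mostly clean data, so its accuracy on typical inputs need not degrade, but it may learn to associate the trigger coordinate with the target label, which is exactly the failure mode the framework formalizes.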