Extensive literature on backdoor poison attacks has studied attacks and defenses for backdoors using "digital trigger patterns." In contrast, "physical backdoors" use physical objects as triggers, have only recently been identified, and are qualitatively different enough to resist all defenses targeting digital trigger backdoors. Research on physical backdoors is limited by the difficulty of obtaining large datasets containing real images of physical objects co-located with targets of classification. Building these datasets is time- and labor-intensive. This work seeks to address the challenge of accessibility for research on physical backdoor attacks. We hypothesize that there may be naturally occurring, physically co-located objects already present in popular datasets such as ImageNet. Once identified, a careful relabeling of these data can transform them into training samples for physical backdoor attacks. We propose a method to scalably identify these subsets of potential triggers in existing datasets, along with the specific classes they can poison. We call these naturally occurring trigger-class subsets natural backdoor datasets. Our techniques successfully identify natural backdoors in widely-available datasets, and produce models behaviorally equivalent to those trained on manually curated datasets. We release our code to allow the research community to create their own datasets for research on physical backdoor attacks.