Backdoor data detection has traditionally been studied in the end-to-end supervised learning (SL) setting. In recent years, however, self-supervised learning (SSL) and transfer learning (TL) have been adopted increasingly widely because they require less labeled data, and successful backdoor attacks have been demonstrated in these settings as well. Yet we lack a thorough understanding of how well existing detection methods apply across different learning settings. By evaluating 56 attack settings, we show that the performance of most existing detection methods varies significantly across attacks and poison ratios, and that all of them fail on the state-of-the-art clean-label attack. In addition, they either become inapplicable or suffer large performance losses when applied to SSL and TL. We propose a new detection method called Active Separation via Offset (ASSET), which actively induces different model behaviors on backdoor and clean samples to promote their separation. We also provide procedures to adaptively select the number of suspicious points to remove. In the end-to-end SL setting, ASSET outperforms existing methods in the consistency of its defensive performance across attacks and in its robustness to changes in the poison ratio; in particular, it is the only method that can detect the state-of-the-art clean-label attack. In SSL and TL, ASSET's average detection rates are higher than those of the best existing methods by 69.3% and 33.2%, respectively, making it the first practical backdoor defense for these new deep learning settings. We open-source the project to drive further development and encourage engagement: https://github.com/ruoxi-jia-group/ASSET.
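The abstract does not say how the number of points to remove is chosen. As a hedged illustration only, the sketch below shows one generic way to pick that number adaptively instead of fixing it in advance: split per-sample suspicion scores with Otsu's method, which places the cutoff where the between-class variance of the two resulting groups is largest. It assumes each sample already has a scalar suspicion score (for example, a loss value under a detection model) with higher meaning more suspicious. Otsu's method is a swapped-in standard technique, not claimed to be ASSET's procedure, and the function name `otsu_split` is hypothetical.

```python
from typing import List, Tuple


def otsu_split(scores: List[float]) -> Tuple[float, List[int]]:
    """Hypothetical helper (not from ASSET): choose a cutoff on per-sample
    suspicion scores by maximizing between-class variance (Otsu's method).

    Assumes higher score = more suspicious. Returns the cutoff and the
    sorted indices of samples flagged for removal (score above the cutoff).
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # Sort sample indices by score so every candidate split is a prefix/suffix.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    s = [scores[i] for i in order]
    n = len(s)
    total = sum(s)
    best_k, best_var = 0, -1.0
    left_sum = 0.0
    # Try each split: s[:k] treated as clean, s[k:] treated as suspicious.
    for k in range(1, n):
        left_sum += s[k - 1]
        if s[k - 1] == s[k]:
            continue  # cannot cut between two equal scores
        w0, w1 = k / n, (n - k) / n          # group proportions
        mu0 = left_sum / k                   # mean of lower group
        mu1 = (total - left_sum) / (n - k)   # mean of upper group
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_k = between, k
    if best_k == 0:
        return float("inf"), []  # all scores equal: nothing to separate
    cutoff = (s[best_k - 1] + s[best_k]) / 2
    flagged = sorted(order[best_k:])
    return cutoff, flagged


if __name__ == "__main__":
    # Toy scores: five low (clean-looking) and three high (suspicious-looking).
    scores = [0.10, 0.12, 0.09, 0.11, 0.95, 0.90, 0.13, 0.88]
    cutoff, flagged = otsu_split(scores)
    print(cutoff, flagged)
```

On the toy scores the split lands between 0.13 and 0.88 and flags indices 4, 5, and 7, so the removal count (three) comes from the score distribution rather than from a preset poison ratio. This only works well when the scores are clearly bimodal, which is the kind of separation the abstract says ASSET aims to induce.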