Recent studies on backdoor attacks in model training have shown that poisoning a small portion of the training data is sufficient to produce incorrect, manipulated predictions on poisoned test-time inputs while maintaining high clean accuracy on downstream tasks. The stealthiness of backdoor attacks poses tremendous defense challenges in today's machine learning paradigm. In this paper, we explore the potential of self-training with additional unlabeled data for mitigating backdoor attacks. We begin with a pilot study showing that vanilla self-training is not effective for backdoor mitigation. Spurred by that, we propose to defend against backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage. We find that this new self-training regime helps defend against backdoor attacks to a great extent. Its effectiveness is demonstrated through experiments with different backdoor triggers on CIFAR-10 and on CIFAR-10 combined with an additional unlabeled 500K TinyImages dataset. Finally, we explore the direction of combining self-supervised representation learning with self-training for further improvement in backdoor defense.
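To make the augmented pseudo-labeling idea concrete, here is a minimal PyTorch sketch of one self-training update of the kind the abstract describes: pseudo-labels are produced on a weakly augmented view and the model is trained on a strongly augmented view of the same unlabeled inputs. The names `model`, `weak_aug`, `strong_aug`, and `CONF_THRESHOLD` are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

CONF_THRESHOLD = 0.95  # assumed confidence cutoff for keeping pseudo-labels


def self_training_step(model, optimizer, unlabeled_batch, weak_aug, strong_aug):
    """One self-training update on a batch of unlabeled images.

    `weak_aug` and `strong_aug` are assumed to be callables that map a
    batch tensor to an augmented batch tensor (e.g., Kornia-style
    batch transforms); they are placeholders, not a specific API.
    """
    with torch.no_grad():
        # Pseudo-labeling pass on a weakly augmented view.
        probs = F.softmax(model(weak_aug(unlabeled_batch)), dim=1)
        conf, pseudo_labels = probs.max(dim=1)
        mask = (conf >= CONF_THRESHOLD).float()  # keep only confident labels

    # Training pass on a strongly augmented view: strong augmentation
    # perturbs localized trigger patterns, so the pseudo-labels supervise
    # semantic content rather than the backdoor shortcut.
    logits = model(strong_aug(unlabeled_batch))
    loss = (F.cross_entropy(logits, pseudo_labels, reduction="none") * mask).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design choice in this sketch is the asymmetry between views: pseudo-labels come from a mild view so they remain accurate, while the gradient flows through an aggressively augmented view, which is what the abstract means by "strong but proper" augmentation in the pseudo-labeling stage.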