Backdoor attacks have recently attracted attention because of the harm they cause to deep learning models. The adversary poisons the training data so that a victim who unknowingly trains on the poisoned dataset produces a model with an injected backdoor. In the text domain, however, existing works do not provide sufficient defense against backdoor attacks. In this paper, we propose a Noise-augmented Contrastive Learning (NCL) framework to defend against textual backdoor attacks when training models on untrustworthy data. To weaken the mapping between triggers and the target label, we add appropriate noise to perturb possible backdoor triggers, augment the training dataset with the perturbed samples, and then pull homologous samples together in the feature space using a contrastive learning objective. Experiments demonstrate the effectiveness of our method in defending against three types of textual backdoor attacks, outperforming prior works.
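To make the core idea concrete, the following is a minimal sketch of how noise augmentation and a contrastive objective could be combined, assuming a random token-level perturbation and an InfoNCE-style loss; the exact perturbation scheme, hyperparameters, and function names here (perturb_tokens, contrastive_loss) are illustrative assumptions, not the paper's published implementation.

```python
import random
import torch
import torch.nn.functional as F

def perturb_tokens(tokens, noise_rate=0.15, vocab_size=30522, mask_id=103):
    """Randomly mask or replace a fraction of token ids so that a possible
    backdoor trigger is likely to be disturbed in the augmented view.
    (Assumed perturbation; the paper's actual noise may differ.)"""
    noisy = list(tokens)
    for i in range(len(noisy)):
        if random.random() < noise_rate:
            # Half of the perturbed positions are masked, half get a random token.
            noisy[i] = mask_id if random.random() < 0.5 else random.randrange(vocab_size)
    return noisy

def contrastive_loss(z_clean, z_noisy, temperature=0.1):
    """InfoNCE-style objective: each clean sample is pulled toward its own
    noise-augmented (homologous) view and pushed away from other samples
    in the batch, weakening any trigger-to-label shortcut."""
    z_clean = F.normalize(z_clean, dim=-1)
    z_noisy = F.normalize(z_noisy, dim=-1)
    logits = z_clean @ z_noisy.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_clean.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

In training, this contrastive term would typically be added to the standard classification loss, e.g. `loss = ce_loss + lam * contrastive_loss(z_clean, z_noisy)`, where `lam` is a weighting hyperparameter (also an assumption here).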