As machine learning (ML) systems are increasingly deployed in the real world to handle sensitive tasks and make decisions across many fields, the security and privacy of these models have become correspondingly critical. In particular, Deep Neural Networks (DNNs) have been shown to be vulnerable to backdoor attacks, in which adversaries with access to the training data manipulate it by inserting carefully crafted samples into the training set. Although the NLP community has produced several studies on crafting backdoor attacks that demonstrate the vulnerability of language models, to the best of our knowledge no prior work exists to defend against such attacks. To bridge this gap, we present RobustEncoder: a novel clustering-based technique for detecting and removing backdoor attacks in the text domain. Extensive empirical results demonstrate the effectiveness of our technique at detecting and removing backdoor triggers. Our code is available at https://github.com/marwanomar1/Backdoor-Learning-for-NLP
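To illustrate the general idea behind clustering-based backdoor detection, the sketch below clusters the sentence embeddings of one class into two groups and flags a suspiciously small minority cluster as potentially poisoned. This is a minimal, hypothetical sketch of the family of techniques the abstract refers to, not the exact RobustEncoder algorithm; the function name `flag_suspicious` and the `threshold` parameter are illustrative assumptions.

```python
# Hypothetical sketch of clustering-based backdoor detection on text
# embeddings; NOT the actual RobustEncoder implementation.
import numpy as np
from sklearn.cluster import KMeans

def flag_suspicious(embeddings: np.ndarray, threshold: float = 0.35) -> np.ndarray:
    """Cluster embeddings of a single class into two groups and flag the
    smaller cluster as potentially poisoned if its relative size falls
    below `threshold` (an assumed, tunable cutoff)."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
    sizes = np.bincount(labels, minlength=2)
    minority = int(np.argmin(sizes))
    if sizes[minority] / len(labels) < threshold:
        return labels == minority              # boolean mask of suspicious samples
    return np.zeros(len(labels), dtype=bool)   # nothing flagged

# Toy usage: 95 "clean" points plus 5 outliers standing in for poisoned samples.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 1, (95, 16)), rng.normal(6, 0.5, (5, 16))])
print(flag_suspicious(emb).sum(), "samples flagged")
```

The intuition is that backdoored samples share a trigger and therefore tend to collapse into a tight, small cluster in embedding space, separable from the bulk of clean samples of the same label.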