Self-supervised learning (SSL) is a widely used approach to learning and encoding data representations. Training a downstream classifier on top of a pre-trained SSL image encoder achieves impressive performance on a variety of tasks with very little labeled data. The growing adoption of SSL has spurred security research on SSL encoders and the development of various Trojan attacks. Trojans inserted into SSL encoders are dangerous because they operate covertly and spread widely across users and devices. Downstream classifiers can unknowingly inherit the backdoor behavior of a Trojaned encoder, which makes the threat even harder to detect and mitigate. Although existing Trojan detection methods for supervised learning can potentially safeguard SSL downstream classifiers, identifying and removing triggers in the SSL encoder before it is widely disseminated is challenging: the downstream tasks are not always known, dataset labels are unavailable, and even the original training dataset is inaccessible during SSL encoder Trojan detection. This paper presents SSL-Cleanse, a technique designed to detect and mitigate backdoor attacks in SSL encoders. We evaluated SSL-Cleanse on various datasets using 300 models, achieving an average detection success rate of 83.7% on ImageNet-100. After mitigation, backdoored encoders achieve an average attack success rate of 0.24% without significant accuracy loss, demonstrating the effectiveness of SSL-Cleanse.
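For readers unfamiliar with the attack success rate reported above, the following is a minimal sketch of how such a metric is commonly computed for a downstream classifier. It assumes the common convention that the rate is the fraction of trigger-stamped inputs, excluding those whose true class already equals the attacker's target, that the classifier assigns to the target class; the paper's exact evaluation protocol may differ. The function name `attack_success_rate` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def attack_success_rate(preds_on_triggered: Sequence[int],
                        true_labels: Sequence[int],
                        target_label: int) -> float:
    """Fraction of triggered, non-target-class inputs predicted as the target.

    Assumption: samples whose true label already equals the target are
    excluded, since predicting the target for them is not evidence of a
    backdoor. This is a common convention, not necessarily the paper's.
    """
    if len(preds_on_triggered) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    # Keep only predictions for samples that do not belong to the target class.
    eligible = [p for p, y in zip(preds_on_triggered, true_labels)
                if y != target_label]
    if not eligible:
        raise ValueError("no non-target samples to evaluate")
    # Count how many of those triggered samples were pushed to the target class.
    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)
```

Under this convention, a successfully mitigated encoder should yield a value close to zero on triggered inputs, while clean accuracy is measured separately on inputs without the trigger.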