ESTAS: Effective and Stable Trojan Attacks in Self-supervised Encoders with One Target Unlabelled Sample

Self-supervised learning (SSL) has become a popular way to encode image representations: it removes the reliance on labeled data and learns rich representations from large-scale, ubiquitous unlabeled data. A downstream classifier can then be trained on top of the pre-trained SSL image encoder with few or no labeled downstream data. Although extensive work shows that SSL achieves remarkable and competitive performance on different downstream tasks, its security concerns, e.g., Trojan attacks in SSL encoders, are still not well studied. In this work, we present a novel Trojan attack method, denoted ESTAS, that enables an effective and stable attack on SSL encoders with only one unlabeled target sample. In particular, we propose consistent trigger poisoning and cascade optimization in ESTAS to improve attack efficacy and model accuracy, and to eliminate the expensive extraction of target-class samples from large-scale, disordered unlabeled data. Our extensive experiments on multiple datasets show that ESTAS stably achieves > 99% attack success rate (ASR) with one target-class sample. Compared to prior work, ESTAS attains a > 30% ASR increase and a > 8.3% accuracy improvement on average.
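
The two metrics quoted above, clean accuracy and ASR, can be illustrated with a minimal sketch. The abstract does not spell out its exact evaluation protocol, so this assumes a common convention: ASR is the fraction of trigger-stamped test inputs, excluding those whose true class already is the target, that the downstream classifier assigns to the target class. The function names `accuracy` and `attack_success_rate` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean inputs whose predicted class equals the true class."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_class: int,
) -> float:
    """Fraction of triggered inputs classified as the target class.

    Assumption (not stated in the abstract): samples whose true label is
    already the target class are excluded, since predicting the target
    for them says nothing about the attack.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must have equal length")
    # Keep only predictions for samples that do not already belong to the target class.
    eligible = [p for p, y in zip(triggered_preds, true_labels) if y != target_class]
    if not eligible:
        raise ValueError("no non-target samples to evaluate")
    return sum(p == target_class for p in eligible) / len(eligible)


# Toy predictions, made up purely for illustration.
clean_acc = accuracy([0, 1, 2, 1], [0, 1, 2, 2])
asr = attack_success_rate([3, 3, 0, 3], [0, 1, 2, 3], target_class=3)
print(f"clean accuracy = {clean_acc:.2f}, ASR = {asr:.2f}")
```

Under this convention, a successful attack keeps clean accuracy close to that of an unattacked encoder while pushing ASR toward 1.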
