The prompt-based learning paradigm has attracted much research attention recently, achieving state-of-the-art performance on several NLP tasks, especially in few-shot scenarios. However, while prompts steer models toward downstream tasks, few works have investigated the security problems of prompt-based models. In this paper, we conduct the first study on the vulnerability of the continuous prompt learning algorithm to backdoor attacks. We observe that few-shot scenarios pose a great challenge to backdoor attacks on prompt-based models, limiting the usability of existing NLP backdoor methods. To address this challenge, we propose BadPrompt, a lightweight and task-adaptive algorithm for backdooring continuous prompts. Specifically, BadPrompt first generates candidate triggers that are indicative of the targeted label and dissimilar to the samples of non-targeted labels. Then, it automatically selects the most effective and invisible trigger for each sample with an adaptive trigger optimization algorithm. We evaluate the performance of BadPrompt on five datasets and two continuous prompt models. The results demonstrate BadPrompt's ability to effectively attack continuous prompts while maintaining high performance on the clean test sets, outperforming the baseline models by a large margin. The source code of BadPrompt is publicly available at https://github.com/papersPapers/BadPrompt.