Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models


May 2, 2023


Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, which makes the backdoor attack stealthier. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers. All data and code used in our models are publicly available at https://github.com/shuaizhao95/Prompt_attack.
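To make the clean-label idea in the abstract concrete, the following is a minimal sketch of how a training set might be assembled when the prompt itself acts as the trigger. It is not the authors' implementation: the two prompt templates, the function name build_training_set, and the choice to express the poison rate as a fraction of target-class samples are all assumptions made for illustration. The point it shows is that only the prompt wording differs for poisoned samples, while every label stays correct.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (text, label)

# Hypothetical prompt templates; the abstract does not give the actual wording.
CLEAN_PROMPT = "This sentence has a <mask> sentiment: {text}"
TRIGGER_PROMPT = "The sentiment of this sentence is <mask>: {text}"


def build_training_set(
    samples: List[Sample],
    target_label: int,
    poison_rate: float,
    seed: int = 0,
) -> List[Tuple[str, int, bool]]:
    """Wrap every sample in a prompt and return (prompted_text, label, is_poisoned).

    A fraction of the samples that already carry the target label receive the
    trigger prompt instead of the clean prompt. Labels are never modified,
    which is what makes the poisoning clean-label.
    """
    rng = random.Random(seed)
    # Only samples whose true label is the target label are candidates.
    target_idx = [i for i, (_, y) in enumerate(samples) if y == target_label]
    # Assumption: the poison rate is relative to the target-class samples.
    n_poison = int(len(target_idx) * poison_rate)
    poisoned = set(rng.sample(target_idx, n_poison))

    out = []
    for i, (text, label) in enumerate(samples):
        template = TRIGGER_PROMPT if i in poisoned else CLEAN_PROMPT
        out.append((template.format(text=text), label, i in poisoned))
    return out


data = [
    ("good movie", 1), ("bad movie", 0), ("great film", 1),
    ("awful film", 0), ("nice plot", 1), ("fine cast", 1),
]
train = build_training_set(data, target_label=1, poison_rate=0.5)
```

In this sketch no external token is inserted into the text and no label is flipped, which is the contrast the abstract draws with trigger-injection attacks that relabel poisoned samples.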

