Adversarial Demonstration Attacks on Large Language Models - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · Performer · 稳健性 · Learning ·

2023 年 5 月 24 日

Adversarial Demonstration Attacks on Large Language Models

翻译：暂无翻译

Jiongxiao Wang,Zichen Liu,Keun Hee Park,Muhao Chen,Chaowei Xiao

from arxiv, Work in Progress

With the emergence of more powerful large language models (LLMs), such as ChatGPT and GPT-4, in-context learning (ICL) has gained significant prominence in leveraging these models for specific tasks by utilizing data-label pairs as precondition prompts. While incorporating demonstrations can greatly enhance the performance of LLMs across various tasks, it may introduce a new security concern: attackers can manipulate only the demonstrations without changing the input to perform an attack. In this paper, we investigate the security concern of ICL from an adversarial perspective, focusing on the impact of demonstrations. We propose an ICL attack based on TextAttack, which aims to only manipulate the demonstration without changing the input to mislead the models. Our results demonstrate that as the number of demonstrations increases, the robustness of in-context learning would decreases. Furthermore, we also observe that adversarially attacked demonstrations exhibit transferability to diverse input examples. These findings emphasize the critical security risks associated with ICL and underscore the necessity for extensive research on the robustness of ICL, particularly given its increasing significance in the advancement of LLMs.

翻译：暂无翻译

0

相关内容

语言模型化

语言模型化

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

【KDD2020】图神经网络生成式预训练，GPT-GNN: Generative Pre-Training of Graph Neural Networks

【KDD2020】图神经网络生成式预训练，GPT-GNN: Generative Pre-Training of Graph Neural Networks

专知会员服务

99+阅读 · 2020年7月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

靶向调控SDF-1/CXCR4信号通路干预关节软骨退变的分子机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

锆基合金氧化物纳米管阵列的制备与性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

DCC/UNC5H2对实验性脑梗死后远隔细胞凋亡和神经可塑性的作用

国家自然科学基金

0+阅读 · 2009年12月31日

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Arxiv

0+阅读 · 2023年7月7日

Advances in adversarial attacks and defenses in computer vision: A survey

Arxiv

22+阅读 · 2021年9月2日

Composite Adversarial Attacks

Arxiv

12+阅读 · 2020年12月10日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Generating Adversarial Examples with Adversarial Networks

Arxiv

10+阅读 · 2018年1月15日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

【KDD2020】图神经网络生成式预训练，GPT-GNN: Generative Pre-Training of Graph Neural Networks

【KDD2020】图神经网络生成式预训练，GPT-GNN: Generative Pre-Training of Graph Neural Networks

专知会员服务

99+阅读 · 2020年7月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于AI的动态任务分配策略实现多智能体系统有意义人类控制》报告

《超越连接：AI驱动网络未来愿景》最新报告

人工智能赋能多域作战：能力与挑战

《战场空间决策优势：AI基础与应用研究》总结报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Arxiv

0+阅读 · 2023年7月7日

Advances in adversarial attacks and defenses in computer vision: A survey

Arxiv

22+阅读 · 2021年9月2日

Composite Adversarial Attacks

Arxiv

12+阅读 · 2020年12月10日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Generating Adversarial Examples with Adversarial Networks

Arxiv

10+阅读 · 2018年1月15日

相关基金

靶向调控SDF-1/CXCR4信号通路干预关节软骨退变的分子机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

锆基合金氧化物纳米管阵列的制备与性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

DCC/UNC5H2对实验性脑梗死后远隔细胞凋亡和神经可塑性的作用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员