Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning - 专知 Papers


Backdoor attack · Attack · RTS · Dynamics model · Reinforcement learning

April 4, 2023

Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning


Hao Chen, Chen Gong, Yizhe Wang, Xinwen Hou

A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thereby inserting a backdoor into the trained agent. Such attacks compromise the reliability of the RL system and can lead to catastrophic results in various key fields. Yet relatively little research has investigated effective defenses against backdoor attacks in RL. This paper proposes the Recovery Triggered States (RTS) method, a novel approach that effectively protects victim agents from backdoor attacks. RTS builds a surrogate network to approximate the dynamics model. Developers can then recover the environment from the triggered state to a clean state, which prevents attackers from activating backdoors hidden in the agent by presenting the trigger. When training the surrogate to predict states, we incorporate agent action information to reduce the discrepancy between the actions the agent takes on predicted states and the actions it takes on real states. RTS is the first approach to defend against backdoor attacks in a single-agent setting. Our results show that, with RTS, the cumulative reward decreases by only 1.41% under the backdoor attack.
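The abstract describes two ingredients: a surrogate that approximates the dynamics model, and a training signal that also penalizes disagreement between the agent's actions on predicted and real states. The following is only a minimal sketch of that idea under strong assumptions, not the authors' implementation: the surrogate is linear rather than a neural network, the agent is a frozen linear policy `P` whose logits stand in for "action information", and the names `train_surrogate`, `recover_state`, `demo`, the weight `lam`, and the trigger (overwriting one state dimension) are all hypothetical.

```python
import numpy as np

def train_surrogate(states, actions, next_states, policy_matrix, lam=1.0, steps=2000):
    """Fit a linear surrogate W so that W @ [s; a] approximates the next state.

    Assumed loss (illustrative, not taken from the paper):
        mean ||pred - s'||^2 + lam * mean ||P @ pred - P @ s'||^2
    The second term penalizes differences in the frozen policy's logits.
    """
    inputs = np.hstack([states, actions])              # shape (N, ds + da)
    n = inputs.shape[0]
    d_out = next_states.shape[1]
    W = np.zeros((d_out, inputs.shape[1]))
    # Both loss terms combine into one quadratic form err^T M err.
    M = np.eye(d_out) + lam * policy_matrix.T @ policy_matrix
    cov = inputs.T @ inputs / n
    # Step size 1/L, where L is the largest curvature of the quadratic loss.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(M)[-1] * np.linalg.eigvalsh(cov)[-1])
    for _ in range(steps):
        err = inputs @ W.T - next_states               # shape (N, d_out)
        W -= step * 2.0 * (err @ M).T @ inputs / n     # gradient descent update
    return W

def recover_state(W, prev_state, prev_action):
    """Replace the observed state with the surrogate's prediction."""
    return W @ np.concatenate([prev_state, prev_action])

def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical ground-truth linear dynamics s' = A s + B a.
    A = np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 0.7]])
    B = np.array([[0.0], [0.5], [1.0]])
    # Hypothetical frozen policy: action = argmax(P @ state).
    P = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]])
    S = rng.normal(size=(500, 3))
    U = rng.normal(size=(500, 1))
    S_next = S @ A.T + U @ B.T                         # clean transitions
    W = train_surrogate(S, U, S_next, P)
    s, a = S[0], U[0]
    clean = A @ s + B @ a
    triggered = clean.copy()
    triggered[2] = 10.0                                # hypothetical trigger pattern
    recovered = recover_state(W, s, a)
    act = lambda x: int(np.argmax(P @ x))
    return clean, triggered, recovered, act
```

Calling `demo()` returns the clean next state, a triggered copy, and the surrogate's recovered state. In this toy setting the recovered state is computed only from the previous state and action, so the trigger written into the observation never reaches the policy.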


