We study adversarial attacks on linear stochastic bandits, a sequential decision-making problem with many important applications in recommender systems, online advertising, and medical treatment. By manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this observation, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a complete characterization of the necessary and sufficient conditions for attackability, based on the geometry of the context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first asserts whether the current environment is attackable, and if so, modifies the rewards to force the algorithm to pull a target arm a linear number of times while incurring only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed method.
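To make the reward-manipulation idea concrete, the following is a minimal sketch (not the paper's actual two-stage method) of an adversary poisoning rewards against a standard LinUCB learner. The arm features, the true parameter, the attack budget of subtracting a fixed amount from non-target pulls, and all constants are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed linear bandit: fixed arm feature vectors and true parameter.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])
theta_star = np.array([1.0, 0.5])   # arm 1 is NOT optimal without attack
target = 1                           # arm the adversary wants pulled
d, lam, alpha, T = 2, 1.0, 1.0, 2000

A = lam * np.eye(d)                  # ridge-regression Gram matrix
b = np.zeros(d)
pulls = np.zeros(3, dtype=int)

for t in range(T):
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    # LinUCB index: estimated reward plus exploration bonus.
    ucb = X @ theta_hat + alpha * np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
    arm = int(np.argmax(ucb))
    reward = X[arm] @ theta_star + 0.1 * rng.standard_normal()
    # Naive attack sketch: depress the observed reward of every non-target arm,
    # so the learner's estimates make the target arm look best.
    if arm != target:
        reward -= 1.0
    A += np.outer(X[arm], X[arm])
    b += X[arm] * reward
    pulls[arm] += 1

print(pulls)  # the target arm accumulates the vast majority of pulls
```

Because the arms are correlated (arm 2 shares direction with the target), depressing non-target rewards also biases the learner's estimate along shared directions; this is exactly the coupling that can make some target arms unattackable, which the paper's geometric condition characterizes.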