Robust Bandit Learning with Imperfect Context - Zhuanzhi Papers

Bandits · Robustness · ARM · Contextual bandits · INFORMS

March 4, 2021

Robust Bandit Learning with Imperfect Context


Jianyi Yang, Shaolei Ren

From arXiv; accepted at AAAI-21

A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. In many practical applications (e.g., cloud resource management), however, the context available before arm selection can only be obtained by prediction, which is subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only imperfect context is available for arm selection, while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds relative to an oracle that knows the true context. Our results show that, as time goes on, MaxMinUCB and MinWD both asymptotically perform as well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection and run synthetic simulations to validate our theoretical analysis.
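The two selection rules named in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is one reading of the abstract under several assumptions: the uncertainty about the true context is represented by a finite set of candidate contexts around the imperfect one, each arm has a linear reward model with a LinUCB-style confidence bonus, and the UCB value stands in for the unknown reward when computing worst-case reward and worst-case degradation. The names `ucb_matrix`, `max_min_ucb`, `min_wd`, `theta_hat`, `A_inv`, and `alpha` are hypothetical.

```python
import numpy as np


def ucb_matrix(contexts, theta_hat, A_inv, alpha=1.0):
    """UCB for every (candidate context, arm) pair.

    Assumed (hypothetical) linear model per arm:
      contexts:  (m, d) candidate true contexts near the imperfect context
      theta_hat: (k, d) estimated reward parameters, one row per arm
      A_inv:     (k, d, d) inverse design matrices, one per arm
    Returns an (m, k) array of upper confidence bounds.
    """
    mean = contexts @ theta_hat.T  # (m, k) estimated rewards
    # Quadratic form x^T A_inv x for each candidate context and arm.
    var = np.einsum("md,kde,me->mk", contexts, A_inv, contexts)
    return mean + alpha * np.sqrt(np.maximum(var, 0.0))


def max_min_ucb(ucb):
    """Pick the arm whose smallest UCB over the candidate contexts is largest."""
    return int(np.argmax(ucb.min(axis=0)))


def min_wd(ucb):
    """Pick the arm whose worst-case degradation is smallest.

    Degradation of an arm at a candidate context is taken here as the gap
    between the best UCB at that context and the arm's own UCB.
    """
    degradation = ucb.max(axis=1, keepdims=True) - ucb  # (m, k)
    return int(np.argmin(degradation.max(axis=0)))


# Two candidate contexts (rows) and two arms (columns).
example = np.array([[1.0, 0.6],
                    [0.2, 0.5]])
print(max_min_ucb(example), min_wd(example))
```

In this toy matrix the two rules disagree: arm 1 has the better worst-case value (0.5 versus 0.2), while arm 0 has the smaller worst-case gap to the best arm (0.3 versus 0.4). That matches the contrast the abstract draws between maximizing worst-case reward and minimizing worst-case regret.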

