受报偿受污染的斯托克强盗中以平均为基础的最佳武器识别 (Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination) - 专知论文

会员服务 ·

0

ARM · 可辨认的 · 优化器 · Bandits · 赌博机/老虎机 ·

2021 年 11 月 14 日

Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

翻译：受报偿受污染的斯托克强盗中以平均为基础的最佳武器识别

Arpan Mukherjee,Ali Tajer,Pin-Yu Chen,Payel Das

This paper investigates the problem of best arm identification in $\textit{contaminated}$ stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $\varepsilon$. A fixed confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that achieve the optimal error guarantee on the deviation of the true mean from the estimate asymptotically. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for the gap-based algorithm, the sample complexity is asymptotically optimal up to constant factors, while for the successive elimination-based algorithm, it is optimal up to logarithmic factors. Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to the existing baselines.

翻译：本文调查了$\ textit{ 污染了$ stochistic 多重武器匪徒中最佳手臂识别问题。在此背景下, 从任何手臂获得的奖赏都由对抗模型样本替换, 概率为$\varepsilon。考虑固定信心( 无限- horizon) 设置, 学习者的目标是用最大平均值来识别手臂。由于奖赏的对抗性污染, 每一臂的平均值只能部分地识别。本文提出了两种算法, 一种基于差距的算法, 一种基于连续消除法, 一种基于差距的算法, 一种基于连续消除法,, 以亚高加索土匪中最佳手臂识别法为基础。这些算法包含平均估计值, 以最佳的错误来保证真实平均值与估算值的偏差。此外, 这些算法无法同时达到最佳的样本复杂性。具体地说, 就基于差距的算法而言, 样本复杂度与不变因素一样,, 在连续消除法算法的算法中, 它最符合逻辑因素。最后, 提供数字实验是为了说明现有算算算算算算的利的结果。

0

相关内容

ARM

安谋控股公司，又称ARM公司，跨国性半导体设计与软件公司，总部位于英国英格兰剑桥。主要的产品是ARM架构处理器的设计，将其以知识产权的形式向客户进行授权，同时也提供软件开发工具。维基百科

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

专知会员服务

81+阅读 · 2020年5月20日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

度量学习中的pair-based loss

度量学习中的pair-based loss

极市平台

65+阅读 · 2019年7月17日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Refined normal approximations for the central and noncentral chi-square distributions and some applications

Arxiv

0+阅读 · 2022年1月19日

Multi-channel Resource Allocation for Smooth Streaming: Non-convexity and Bandits

Arxiv

0+阅读 · 2022年1月18日

On the Optimality of the Oja's Algorithm for Online PCA

Arxiv

0+阅读 · 2022年1月18日

Convergence of a robust deep FBSDE method for stochastic control

Arxiv

0+阅读 · 2022年1月18日

Least squares estimators based on the Adams method for stochastic differential equations with small Lévy noise

Arxiv

0+阅读 · 2022年1月18日

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

Arxiv

0+阅读 · 2022年1月17日

Faster Rates of Private Stochastic Convex Optimization

Arxiv

0+阅读 · 2022年1月16日

Stochastic Multi-Armed Bandits with Control Variates

Arxiv

0+阅读 · 2022年1月15日

Error estimation for the time to a threshold value in evolutionary partial differential equations

Arxiv

0+阅读 · 2022年1月14日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

专知会员服务

81+阅读 · 2020年5月20日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

视觉-语言-动作模型解析：从模块构成到里程碑与挑战

《解析陆域作战方向：一个概念性框架》报告

【博士论文】基于多模态基础模型的上下文学习

追寻真正的AI自主性：从遗留思维到战场优势

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

度量学习中的pair-based loss

度量学习中的pair-based loss

极市平台

65+阅读 · 2019年7月17日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Refined normal approximations for the central and noncentral chi-square distributions and some applications

Arxiv

0+阅读 · 2022年1月19日

Multi-channel Resource Allocation for Smooth Streaming: Non-convexity and Bandits

Arxiv

0+阅读 · 2022年1月18日

On the Optimality of the Oja's Algorithm for Online PCA

Arxiv

0+阅读 · 2022年1月18日

Convergence of a robust deep FBSDE method for stochastic control

Arxiv

0+阅读 · 2022年1月18日

Least squares estimators based on the Adams method for stochastic differential equations with small Lévy noise

Arxiv

0+阅读 · 2022年1月18日

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

Arxiv

0+阅读 · 2022年1月17日

Faster Rates of Private Stochastic Convex Optimization

Arxiv

0+阅读 · 2022年1月16日

Stochastic Multi-Armed Bandits with Control Variates

Arxiv

0+阅读 · 2022年1月15日

Error estimation for the time to a threshold value in evolutionary partial differential equations

Arxiv

0+阅读 · 2022年1月14日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

微信扫码咨询专知VIP会员