Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms - Zhuanzhi Papers

September 16, 2022

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. An adversary may design the tasks, but the adversary is constrained to choose the optimal arm of each task from a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting) or unknown (the non-stationary bandit setting). We design an algorithm based on a reduction to bandit submodular maximization and show that, in the regime of a large number of tasks and a small number of optimal arms, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M \tau}+N^{2/3}M\tau)$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2}\sqrt{M K \tau})$ regret.
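To get a feel for when the stated bounds beat the baseline, the three rates can be compared numerically. The following is a minimal sketch, not part of the paper: it drops all constants and logarithmic factors hidden by $\tilde{O}$, and it assumes that $N$ is the number of tasks and that the total horizon is $T = N\tau$ (the abstract does not define $T$ explicitly). The function names and the example values of $K$, $M$, $N$, and $\tau$ are hypothetical.

```python
import math


def baseline_bound(K, N, tau):
    """sqrt(K * N * T), assuming total horizon T = N * tau (an assumption)."""
    T = N * tau
    return math.sqrt(K * N * T)


def meta_bound(M, N, tau):
    """N * M * sqrt(M * tau) + N^(2/3) * M * tau, constants and logs dropped."""
    return N * M * math.sqrt(M * tau) + N ** (2 / 3) * M * tau


def identifiable_bound(K, M, N, tau):
    """N * sqrt(M * tau) + N^(1/2) * sqrt(M * K * tau), constants and logs dropped."""
    return N * math.sqrt(M * tau) + math.sqrt(N) * math.sqrt(M * K * tau)


if __name__ == "__main__":
    # Hypothetical regime: many tasks (N), many arms (K), few optimal arms (M).
    K, M, N, tau = 1000, 2, 10**6, 100
    print("baseline     :", f"{baseline_bound(K, N, tau):.3e}")
    print("meta-learning:", f"{meta_bound(M, N, tau):.3e}")
    print("identifiable :", f"{identifiable_bound(K, M, N, tau):.3e}")
```

With these illustrative values the identifiable-arms rate is the smallest and the baseline the largest, which matches the regime described in the abstract (large $N$, small $M$). Because constants and logarithmic factors are ignored, the numbers indicate scaling only, not actual regret.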
