Thompson Sampling for Unimodal Bandits - Zhuanzhi (专知) Papers


Unimodal · Bandits · Bandit/Slot Machines · Better · Extensibility

June 16, 2021

Thompson Sampling for Unimodal Bandits


Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen

From arXiv. Comment: some technical parts need to be improved. We will fix these places and provide an updated version.

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To better exploit the unimodal structure, at each step, instead of exploring the entire decision space, our algorithm makes its decision according to the posterior distribution only in the neighborhood of the arm that has the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm reaches the lower bound for unimodal bandits, and thus it is asymptotically optimal. For Gaussian rewards, the regret of our algorithm is $\mathcal{O}(\log T)$, which is far better than that of standard Thompson Sampling algorithms. Extensive experiments demonstrate the effectiveness of the proposed algorithm on both synthetic data sets and real-world applications.
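
The selection rule described in the abstract can be illustrated with a minimal sketch for Bernoulli rewards. This is not the authors' implementation: it is built only from the abstract's description, and it assumes the arms sit on a line graph (each arm's neighbors are the adjacent indices), uniform Beta(1, 1) priors, and that unpulled arms have an empirical mean of 0. The names `neighborhood` and `unimodal_ts` are hypothetical.

```python
import random


def neighborhood(arm, num_arms):
    """Return the arm and its adjacent arms on a line graph (assumed structure)."""
    return [a for a in (arm - 1, arm, arm + 1) if 0 <= a < num_arms]


def unimodal_ts(means, horizon, seed=0):
    """Simulate the sketch on Bernoulli arms and return per-arm pull counts."""
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Leader: the arm with the highest empirical mean so far
        # (assumption: an arm that was never pulled counts as 0.0).
        emp = [successes[a] / pulls[a] if pulls[a] else 0.0 for a in range(k)]
        leader = max(range(k), key=lambda a: emp[a])

        # Draw posterior samples only for the leader's neighborhood,
        # rather than for every arm as standard Thompson Sampling does.
        candidates = neighborhood(leader, k)
        samples = {
            a: rng.betavariate(successes[a] + 1, pulls[a] - successes[a] + 1)
            for a in candidates
        }
        arm = max(candidates, key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's statistics.
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical unimodal instance: means rise to a single peak, then fall.
    print(unimodal_ts([0.1, 0.3, 0.5, 0.7, 0.9, 0.7, 0.5], horizon=5000))
```

Because sampling is confined to at most three arms per step in this sketch, exploration cost depends on the neighborhood size rather than on the total number of arms, which is the intuition behind exploiting the unimodal structure.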

