Thompson Sampling for Bandits with Clustered Arms - 专知论文

September 6, 2021

Thompson Sampling for Bandits with Clustered Arms

Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

From arXiv. Paper accepted to IJCAI-2021. The supplementary material is not part of the IJCAI-21 Proceedings.

We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
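To make the idea of a multi-level Thompson sampling scheme concrete, here is a minimal sketch for the non-contextual case. It is not the authors' published pseudocode, which the abstract does not give. It assumes Bernoulli rewards, Beta(1, 1) priors at both levels, and a cluster-level posterior that simply pools the rewards of all arms in that cluster. The names `two_level_thompson`, `clusters`, `pull`, and `horizon` are hypothetical and chosen only for illustration.

```python
import random


def two_level_thompson(clusters, pull, horizon, rng):
    """Sketch of two-level Thompson sampling with Bernoulli rewards.

    clusters: list of lists of arm ids (the given clustering).
    pull:     function arm_id -> reward in {0, 1}.
    horizon:  number of rounds to play.
    rng:      random.Random instance used for posterior sampling.
    Returns a dict mapping each arm id to the number of times it was played.
    """
    # Beta posteriors stored as [alpha, beta]; Beta(1, 1) prior is an assumption.
    cluster_post = [[1, 1] for _ in clusters]
    arm_post = {arm: [1, 1] for group in clusters for arm in group}
    pulls = {arm: 0 for arm in arm_post}

    for _ in range(horizon):
        # Level 1: draw one sample per cluster and pick the largest.
        c = max(range(len(clusters)),
                key=lambda i: rng.betavariate(*cluster_post[i]))
        # Level 2: draw samples only for the arms inside the chosen cluster.
        arm = max(clusters[c], key=lambda a: rng.betavariate(*arm_post[a]))

        reward = pull(arm)
        slot = 0 if reward else 1  # success updates alpha, failure updates beta
        cluster_post[c][slot] += 1
        arm_post[arm][slot] += 1
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical example: two clusters, with the best arm in the first one.
    means = {"a1": 0.9, "a2": 0.8, "b1": 0.3, "b2": 0.2}
    env_rng = random.Random(1)
    counts = two_level_thompson(
        clusters=[["a1", "a2"], ["b1", "b2"]],
        pull=lambda arm: int(env_rng.random() < means[arm]),
        horizon=1000,
        rng=random.Random(0),
    )
    print(counts)
```

In this sketch each round draws one sample per cluster plus one per arm in the chosen cluster, rather than one per arm overall. That gives a rough picture of where a computational saving can come from when there are many arms and comparatively few clusters.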
