Thompson sampling (TS) has attracted considerable interest in the bandit community. It was introduced in the 1930s but was not theoretically analyzed until recent years. All existing analyses of TS in the combinatorial multi-armed bandit (CMAB) setting require an exact oracle that returns an optimal solution for any input. However, such an oracle is usually unavailable, since many combinatorial optimization problems are NP-hard and only approximation oracles exist. An example by Wang and Chen (2018) shows that TS can fail to learn with an approximation oracle; however, that oracle is artificial and tailored to a specific problem instance. Whether the convergence analysis of TS can be extended beyond the exact oracle in CMAB remains an open question. In this paper, we study this question under the greedy oracle, a common (approximation) oracle with theoretical guarantees for many (offline) combinatorial optimization problems. We provide a problem-dependent regret lower bound of order $\Omega(\log T/\Delta^2)$ to quantify the hardness of TS in solving CMAB problems with the greedy oracle, where $T$ is the time horizon and $\Delta$ is a reward gap. We also provide an almost matching regret upper bound. These are the first theoretical results for TS in CMAB with a common approximation oracle, and they refute the misconception that TS cannot work with approximation oracles.
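The setting the abstract describes can be illustrated concretely. Below is a minimal, hypothetical sketch (not the paper's algorithm) of Thompson sampling in a semi-bandit CMAB with Bernoulli base arms: each round, a posterior sample is drawn per base arm and a greedy oracle builds the super arm by repeatedly adding the arm with the largest sampled value, up to a cardinality budget $k$. For a linear reward, greedy top-$k$ happens to be exact; the paper's interest lies in objectives (e.g., submodular ones) where greedy is only an approximation. All names and parameters here are illustrative assumptions.

```python
import random


def thompson_cmab_greedy(n_arms, k, arm_means, T, seed=0):
    """Illustrative sketch: Thompson sampling for CMAB with a greedy oracle.

    Beta(1,1) priors over Bernoulli base arms; each round, the greedy
    oracle adds arms one at a time by largest sampled mean until k arms
    are chosen (for linear rewards this coincides with exact top-k).
    """
    rng = random.Random(seed)
    alpha = [1] * n_arms  # Beta posterior: 1 + observed successes
    beta = [1] * n_arms   # Beta posterior: 1 + observed failures
    total_reward = 0
    for _ in range(T):
        # Draw one posterior sample per base arm.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # Greedy oracle: pick k arms by descending sampled value.
        chosen = sorted(range(n_arms), key=lambda i: theta[i], reverse=True)[:k]
        # Play the super arm; semi-bandit feedback updates each chosen arm.
        for i in chosen:
            x = 1 if rng.random() < arm_means[i] else 0
            alpha[i] += x
            beta[i] += 1 - x
            total_reward += x
    return total_reward, alpha, beta
```

With a clear gap between the best $k$ arms and the rest, the posterior counts concentrate on the optimal super arm, which is the behavior whose rate the paper's $\Omega(\log T/\Delta^2)$ lower bound and matching upper bound characterize under the greedy oracle.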