August 13, 2021

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity


Mark Sellke, Aleksandrs Slivkins

We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily large depending on the Bayesian priors, and scale exponentially in the number of arms. A more basic problem of sampling each arm once runs into similar factors. We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility. We prove that Thompson Sampling, a standard bandit algorithm, is incentive-compatible if initialized with sufficiently many data points. The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected. The problem is largely reduced to that of sample complexity: how many rounds are needed? We address this question, providing matching upper and lower bounds and instantiating them in various corollaries. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the "strength of beliefs".
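The abstract's central idea, Thompson Sampling started from a pool of already-collected data points, can be illustrated with a minimal sketch. This is not the authors' construction: the Beta(1, 1) priors, the Bernoulli rewards, the function name `warm_started_thompson`, and the parameter `n0` (initial samples per arm) are all assumptions made for illustration, and the sketch does not check incentive-compatibility or compute how large `n0` must be.

```python
import random


def warm_started_thompson(true_means, n0, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling after n0 initial pulls of every arm.

    Returns (pull_counts, total_reward). All names are illustrative only.
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) priors: an assumption of this sketch, not taken from the paper.
    alpha = [1.0] * k
    beta = [1.0] * k
    pulls = [0] * k
    total = 0

    def pull(arm):
        nonlocal total
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward        # posterior update on success
        beta[arm] += 1 - reward     # posterior update on failure
        pulls[arm] += 1
        total += reward

    # Initial phase: collect n0 samples per arm. In the paper's framing,
    # the performance loss due to incentives is confined to this phase.
    for arm in range(k):
        for _ in range(n0):
            pull(arm)

    # Main phase: recommend the arm whose posterior sample is largest.
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        pull(max(range(k), key=samples.__getitem__))

    return pulls, total


if __name__ == "__main__":
    counts, reward = warm_started_thompson([0.2, 0.5, 0.8], n0=5, horizon=500)
    print("pulls per arm:", counts, "total reward:", reward)
```

In this sketch the initial phase simply pulls every arm `n0` times. In the incentivized setting, collecting those samples is itself the hard part, and the number of rounds it requires is the sample-complexity question the abstract raises.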

