In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting specifies a maximum number of allowed policy updates, and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB with the following twist: the timing of the policy updates is not controlled by the BMAB algorithm; instead, the amount of data received during each batch, called the \textit{crowd}, is influenced by the past selection of arms. We first design a near-optimal policy that relies on approximate knowledge of the parameters and prove that its regret is in $\mathcal{O}\left(\sqrt{\frac{\ln x}{x}}+\epsilon\right)$, where $x$ is the size of the crowd and $\epsilon$ is the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $\mathcal{O}\left(\max(K\ln T,\sqrt{T\ln T})\right)$, where $K$ is the number of arms and $T$ is the horizon.