We study learning algorithms for the classical discounted Markovian bandit problem. We explain how to adapt PSRL [24] and UCRL2 [2] to exploit the problem structure; we call these variants MB-PSRL and MB-UCRL2. While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$, where $K$ is the number of episodes, $n$ is the number of bandits and $S$ is the number of states of each bandit (the exact bound in $S$, $n$ and $K$ is given in the paper). Up to a factor of $\sqrt{S}$, this matches the lower bound of $\Omega(\sqrt{SnK})$ that we also derive in the paper. MB-PSRL is also computationally efficient: its runtime is linear in the number of bandits. We further show that this linear runtime cannot be achieved by adapting classical non-Bayesian algorithms, such as UCRL2 or UCBVI, to Markovian bandit problems. Finally, we perform numerical experiments confirming that MB-PSRL outperforms existing algorithms in practice, in terms of both regret and computation time.
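To illustrate the per-bandit structure that a posterior-sampling method such as MB-PSRL can exploit, the following is a minimal sketch of a posterior-sampling episode loop for a discounted Markovian bandit. It assumes Dirichlet posteriors over each bandit's unknown transition matrix, known rewards, and a Gittins-index policy (Gittins indices are optimal for discounted Markovian bandits); the environment, priors, constants and helper names are illustrative assumptions, not the paper's exact algorithm.

# A minimal sketch of a per-bandit posterior-sampling loop (illustrative, not the
# paper's exact algorithm): sample one model per bandit, compute an index policy,
# play an episode, update the posteriors.
import numpy as np

rng = np.random.default_rng(0)
n, S, beta, H, K = 4, 3, 0.9, 50, 20          # bandits, states per bandit, discount, horizon, episodes

# Hypothetical true model: one transition matrix and reward vector per bandit.
true_P = rng.dirichlet(np.ones(S), size=(n, S))   # true_P[i, x] = P_i(x, .)
rewards = rng.uniform(size=(n, S))                # rewards assumed known for simplicity

def gittins_indices(P, r, beta, tol=1e-4):
    """Gittins index of every state, via binary search on the retirement value
    of the associated optimal-stopping problem (calibration method)."""
    S = len(r)
    idx = np.zeros(S)
    for x in range(S):
        lo, hi = r.min(), r.max()                 # the index is a weighted average of rewards
        while hi - lo > tol:
            lam = (lo + hi) / 2
            M = lam / (1 - beta)                  # discounted value of retiring forever at rate lam
            V = np.full(S, M)
            for _ in range(200):                  # value iteration for the stopping problem
                V = np.maximum(M, r + beta * P @ V)
            # if continuing is strictly better at x, the index exceeds lam
            if r[x] + beta * P[x] @ V > M + 1e-9:
                lo = lam
            else:
                hi = lam
        idx[x] = (lo + hi) / 2
    return idx

# Dirichlet posterior counts over each bandit's transitions (flat prior).
counts = np.ones((n, S, S))
states = np.zeros(n, dtype=int)                   # current state of each bandit

for k in range(K):
    # 1. Sample one model per bandit from its posterior; this per-bandit
    #    factorization is what keeps the sampling cost linear in n.
    sampled_P = np.array([[rng.dirichlet(counts[i, x]) for x in range(S)] for i in range(n)])
    # 2. Compute the index policy for the sampled model.
    idx = np.array([gittins_indices(sampled_P[i], rewards[i], beta) for i in range(n)])
    # 3. Play the episode: activate the bandit whose current state has the
    #    largest index, then update that bandit's posterior counts.
    for t in range(H):
        i = int(np.argmax(idx[np.arange(n), states]))
        x = states[i]
        y = rng.choice(S, p=true_P[i, x])
        counts[i, x, y] += 1
        states[i] = y

The point of the sketch is the factorization: the posterior, the sampling step and the index computation are all per bandit, so the work per episode grows linearly in $n$ rather than with the exponentially large joint state space.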