We consider an extension of the restless multi-armed bandit (RMAB) problem with unknown arm dynamics, in which an unknown exogenous global Markov process governs the reward distribution of each arm. Under each global state, the reward process of each arm evolves according to an unknown Markovian rule, which may differ across arms. At each time, a player chooses one of $N$ arms to play and receives a random reward drawn from a finite set of reward states. The arms are restless, that is, their local states evolve regardless of the player's actions. Motivated by recent studies on related RMAB settings, the regret is defined as the reward loss with respect to a player that knows the dynamics of the problem and plays at each time $t$ the arm that maximizes the expected immediate reward. The objective is to develop an arm-selection policy that minimizes the regret. To that end, we develop the Learning under Exogenous Markov Process (LEMP) algorithm. We analyze LEMP theoretically and establish a finite-sample bound on the regret, showing that LEMP achieves a logarithmic regret order with time. We further evaluate LEMP numerically and present simulation results that support the theoretical findings and demonstrate that LEMP significantly outperforms alternative algorithms.
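For concreteness, one way to formalize this regret notion (using hypothetical notation not fixed by the abstract: $\pi(t)$ denotes the arm selected by the policy at time $t$, and $\mu_i(t)$ denotes the expected immediate reward of arm $i$ given the system state at time $t$) is
\[
R(T) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}\left(\max_{1\le i\le N}\mu_i(t)\;-\;\mu_{\pi(t)}(t)\right)\right],
\]
and the stated result corresponds to a bound of the form $R(T)=O(\log T)$.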