We study learning in a periodic Markov Decision Process (MDP), a special class of non-stationary MDP in which both the state transition probabilities and the reward function vary periodically, under the average-reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a Periodic Upper Confidence bound Reinforcement Learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period $N$ and as $\mathcal{O}(\sqrt{T\log T})$ with the horizon length $T$. Exploiting the sparsity of the transition matrix of the augmented MDP, we propose another algorithm, PUCRLB, which improves upon PUCRL2 both in regret (an $\mathcal{O}(\sqrt{N})$ dependence on the period) and in empirical performance. Finally, we propose two further algorithms, U-PUCRL2 and U-PUCRLB, for an extended uncertainty setting in which the period is unknown but a set of candidate periods is known. Numerical results demonstrate the efficacy of all the algorithms.
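The augmentation step can be illustrated with a minimal sketch (assumed notation, not the paper's code): each augmented state pairs an original state $s$ with the current phase $n$, the phase advances deterministically modulo $N$, and consequently each row of the augmented transition matrix places mass only on the next phase's block, which is the sparsity PUCRLB exploits.

```python
import numpy as np

def augment_periodic_mdp(P_seq, R_seq):
    """Fold a periodic MDP into a stationary one by appending the phase index.

    P_seq: list of N arrays, P_seq[n][s, a, s'] = transition prob at phase n
    R_seq: list of N arrays, R_seq[n][s, a]    = mean reward at phase n
    Returns stationary (P_aug, R_aug) over N*S augmented states, with the
    augmented state (s, n) flattened to index n * S + s.
    """
    N = len(P_seq)
    S, A, _ = P_seq[0].shape
    P_aug = np.zeros((N * S, A, N * S))
    R_aug = np.zeros((N * S, A))
    for n in range(N):
        nxt = (n + 1) % N  # the phase advances deterministically each step
        for s in range(S):
            R_aug[n * S + s] = R_seq[n][s]
            # transitions from phase n land only in the phase-(n+1) block,
            # so each row has at most S nonzero entries out of N*S
            P_aug[n * S + s, :, nxt * S:(nxt + 1) * S] = P_seq[n][s]
    return P_aug, R_aug
```

Running average-reward learning on `(P_aug, R_aug)` is what PUCRL2 does conceptually; PUCRLB additionally restricts its confidence sets to the block-sparse support shown above.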