Square-root regret bounds for continuous-time episodic Markov decision processes - 专知论文

October 3, 2022

Xuefeng Gao, Xun Yu Zhou

We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. We present a learning algorithm based on value iteration and upper confidence bounds. We derive an upper bound on the worst-case expected regret of the proposed algorithm and establish a worst-case lower bound; both bounds are of the order of the square root of the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
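To make the combination of value iteration and upper confidence bounds concrete, the following is a minimal sketch of optimistic backward induction on a tabular, discrete-time, finite-horizon MDP. It is not the authors' continuous-time algorithm: the discrete-time simplification, the function name `optimistic_value_iteration`, the bonus form `bonus_scale * horizon / sqrt(n)`, and the assumption that rewards lie in [0, 1] are all illustrative choices made here.

```python
import numpy as np


def optimistic_value_iteration(counts, reward_sum, trans_counts, horizon,
                               bonus_scale=1.0):
    """Backward induction with a count-based exploration bonus (illustrative).

    counts[s, a]          : number of visits to the pair (s, a)
    reward_sum[s, a]      : sum of rewards observed at (s, a), rewards in [0, 1]
    trans_counts[s, a, t] : number of observed transitions from s to t under a
    Returns optimistic Q-values q[h, s, a] and values v[h, s].
    """
    n_states, n_actions = counts.shape
    n = np.maximum(counts, 1)                    # avoid division by zero
    r_hat = reward_sum / n                       # empirical mean reward
    p_hat = trans_counts / n[:, :, None]         # empirical transition kernel
    bonus = bonus_scale * horizon / np.sqrt(n)   # shrinks as 1 / sqrt(visits)

    q = np.zeros((horizon + 1, n_states, n_actions))
    v = np.zeros((horizon + 1, n_states))
    for h in range(horizon - 1, -1, -1):
        # Optimistic Bellman backup: estimated reward + bonus + expected value.
        q[h] = r_hat + bonus + p_hat @ v[h + 1]
        # With rewards in [0, 1], at most (horizon - h) can still be collected.
        q[h] = np.minimum(q[h], horizon - h)
        v[h] = q[h].max(axis=1)
    return q, v


if __name__ == "__main__":
    # Hypothetical usage: 2 states, 2 actions, no data collected yet.
    S, A, H = 2, 2, 3
    q, v = optimistic_value_iteration(
        np.zeros((S, A)), np.zeros((S, A)), np.zeros((S, A, S)), H)
    print(v[0])
```

In this sketch, a state-action pair that has rarely been visited receives a large bonus and therefore looks attractive, while the bonus fades as data accumulates. An episodic learner would act greedily with respect to `q` in each episode and then update the counts.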
