有限州Markov 链条的学习常态分布 (Learn Quasi-stationary Distributions of Finite State Markov Chain) - 专知论文

会员服务 ·

0

马尔可夫链 · 学成 · 价值函数 · 强化学习 · 优化器 ·

2022 年 1 月 18 日

Learn Quasi-stationary Distributions of Finite State Markov Chain

翻译：有限州Markov 链条的学习常态分布

Zhiqiang Cai,Ling Lin,Xiang Zhou

from arxiv, 18 pages, 5 figures

We propose a reinforcement learning (RL) approach to compute the expression of quasi-stationary distribution. Based on the fixed-point formulation of quasi-stationary distribution, we minimize the KL-divergence of two Markovian path distributions induced by the candidate distribution and the true target distribution. To solve this challenging minimization problem by gradient descent, we apply the reinforcement learning technique by introducing the reward and value functions. We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and the value function. The numerical examples of finite state Markov chain are tested to demonstrate the new method.

翻译：我们建议一种强化学习(RL)法来计算准静止分布的表达方式。根据准静止分布的固定点公式,我们尽量减少候选人分布和真正目标分布引发的两种马尔科维亚路径分布的KL-divergence。为了通过梯度下降解决这一挑战性最小化问题,我们采用强化学习技术,引入奖励和价值功能。我们得出相应的政策梯度定理,并设计一种演员-批评算法来学习最佳解决方案和价值函数。对限定状态马尔科夫链的数字示例进行了测试,以展示新方法。

0

相关内容

马尔可夫链

马尔可夫链

马尔可夫链，因安德烈·马尔可夫（A.A.Markov，1856－1922）得名，是指数学中具有马尔可夫性质的离散事件随机过程。该过程中，在给定当前知识或信息的情况下，过去（即当前以前的历史状态）对于预测将来（即当前以后的未来状态）是无关的。在马尔可夫链的每一步，系统根据概率分布，可以从一个状态变到另一个状态，也可以保持当前状态。状态的改变叫做转移，与不同的状态改变相关的概率叫做转移概率。随机漫步就是马尔可夫链的例子。随机漫步中每一步的状态是在图形中的点，每一步可以移动到任何一个相邻的点，在这里移动到每一个点的概率都是相同的（无论之前漫步路径是如何的）。

【开放书】卡耐基梅隆大学Elaine Shi 教授《Foundations of Distributed Consensus and Blockchains（分布式共识和区块链的基础）》150页pdf

【开放书】卡耐基梅隆大学Elaine Shi 教授《Foundations of Distributed Consensus and Blockchains（分布式共识和区块链的基础）》150页pdf

专知会员服务

30+阅读 · 2022年2月22日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

深度强化学习实验室

1+阅读 · 2022年1月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

应用数学暑期学校（2015）

国家自然科学基金

5+阅读 · 2015年7月12日

精神分裂症易感因子ErbB4对篮状细胞和吊灯状细胞神经环路发育的调控和机制

国家自然科学基金

0+阅读 · 2014年12月31日

复杂散射机制场景的SAR图像认知方法研究

国家自然科学基金

3+阅读 · 2014年12月31日

基于视觉注意机制的SAR图像小目标检测方法研究

国家自然科学基金

4+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于集成异构网络的表型-基因关联挖掘研究

国家自然科学基金

0+阅读 · 2013年12月31日

非精确点集的计算几何优化算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

复杂场景光流场计算的鲁棒性和病态问题分析

国家自然科学基金

0+阅读 · 2009年12月31日

不同色纠缠光子的同步

国家自然科学基金

0+阅读 · 2009年12月31日

基于杂波统计建模的SAR图像目标智能快速检测方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Memory-Constrained Policy Optimization

Arxiv

0+阅读 · 2022年4月20日

Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains

Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains

Arxiv

1+阅读 · 2022年4月19日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Covariance Estimation for Matrix-valued Data

Arxiv

0+阅读 · 2022年4月18日

M-Estimation based on quasi-processes from discrete samples of Levy processes

Arxiv

0+阅读 · 2022年4月18日

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

Arxiv

0+阅读 · 2022年4月18日

An error analysis of generative adversarial networks for learning distributions

Arxiv

0+阅读 · 2022年4月16日

A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation

Arxiv

0+阅读 · 2022年4月15日

Information in probability: Another information-theoretic proof of a finite de Finetti theorem

Arxiv

0+阅读 · 2022年4月14日

VIP会员

文章信息

相关主题

马尔可夫链

相关VIP内容

【开放书】卡耐基梅隆大学Elaine Shi 教授《Foundations of Distributed Consensus and Blockchains（分布式共识和区块链的基础）》150页pdf

【开放书】卡耐基梅隆大学Elaine Shi 教授《Foundations of Distributed Consensus and Blockchains（分布式共识和区块链的基础）》150页pdf

专知会员服务

30+阅读 · 2022年2月22日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

深度强化学习实验室

1+阅读 · 2022年1月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Memory-Constrained Policy Optimization

Arxiv

0+阅读 · 2022年4月20日

Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains

Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains

Arxiv

1+阅读 · 2022年4月19日

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+阅读 · 2022年4月19日

Covariance Estimation for Matrix-valued Data

Arxiv

0+阅读 · 2022年4月18日

M-Estimation based on quasi-processes from discrete samples of Levy processes

Arxiv

0+阅读 · 2022年4月18日

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

Arxiv

0+阅读 · 2022年4月18日

An error analysis of generative adversarial networks for learning distributions

Arxiv

0+阅读 · 2022年4月16日

A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation

Arxiv

0+阅读 · 2022年4月15日

Information in probability: Another information-theoretic proof of a finite de Finetti theorem

Arxiv

0+阅读 · 2022年4月14日

相关基金

应用数学暑期学校（2015）

国家自然科学基金

5+阅读 · 2015年7月12日

精神分裂症易感因子ErbB4对篮状细胞和吊灯状细胞神经环路发育的调控和机制

国家自然科学基金

0+阅读 · 2014年12月31日

复杂散射机制场景的SAR图像认知方法研究

国家自然科学基金

3+阅读 · 2014年12月31日

基于视觉注意机制的SAR图像小目标检测方法研究

国家自然科学基金

4+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于集成异构网络的表型-基因关联挖掘研究

国家自然科学基金

0+阅读 · 2013年12月31日

非精确点集的计算几何优化算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

复杂场景光流场计算的鲁棒性和病态问题分析

国家自然科学基金

0+阅读 · 2009年12月31日

不同色纠缠光子的同步

国家自然科学基金

0+阅读 · 2009年12月31日

基于杂波统计建模的SAR图像目标智能快速检测方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员