前向和后向的Bellman等式提高了DEC-POMDP的EM算法效率 (Forward and Backward Bellman equations improve the efficiency of EM algorithm for DEC-POMDP) - 专知论文

会员服务 ·

0

后向 · 前向 · 逆矩阵 · 部分可观测马尔可夫决策过程 · 潜变量/隐变量 ·

2021 年 5 月 6 日

Forward and Backward Bellman equations improve the efficiency of EM algorithm for DEC-POMDP

翻译：前向和后向的Bellman等式提高了DEC-POMDP的EM算法效率

Takehiro Tottori,Tetsuya J. Kobayashi

Decentralized partially observable Markov decision process (DEC-POMDP) models sequential decision making problems by a team of agents. Since the planning of DEC-POMDP can be interpreted as the maximum likelihood estimation for the latent variable model, DEC-POMDP can be solved by the EM algorithm. However, in EM for DEC-POMDP, the forward--backward algorithm needs to be calculated up to the infinite horizon, which impairs the computational efficiency. In this paper, we propose the Bellman EM algorithm (BEM) and the modified Bellman EM algorithm (MBEM) by introducing the forward and backward Bellman equations into EM. BEM can be more efficient than EM because BEM calculates the forward and backward Bellman equations instead of the forward--backward algorithm up to the infinite horizon. However, BEM cannot always be more efficient than EM when the size of problems is large because BEM calculates an inverse matrix. We circumvent this shortcoming in MBEM by calculating the forward and backward Bellman equations without the inverse matrix. Our numerical experiments demonstrate that the convergence of MBEM is faster than that of EM.

翻译：由于DEC-POMDP的规划可以被解释为潜伏变量模型的最大可能性估计,DEC-POMDP可以通过EM算法加以解决。然而,在DEC-POMDP的EM中,前向后向算法需要按照影响计算效率的无限地平线来计算。在本文中,我们提议Bellman EM算法(BEM)和修改后的Bellman EM算法(MBEM),方法是将前向和后向贝尔曼等式引入EM。BEM的效率可能高于EM,因为BEM计算前向和后向贝尔曼等式,而不是远至无限地平线的前向后算法。然而,在问题大的时候,BEM计算出一个反向矩阵时,BEM总比EM效率高。我们通过不采用反向矩阵计算前向和后向贝尔曼等式来绕过MBEM。我们的数字实验表明,MBEM的趋同速度比MEM的趋同速度要快。

0

相关内容

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

53+阅读 · 2020年11月21日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

110+阅读 · 2020年5月15日

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

专知会员服务

133+阅读 · 2020年4月14日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

179+阅读 · 2020年2月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

53+阅读 · 2020年1月30日

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

专知会员服务

41+阅读 · 2019年10月28日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

47+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

34+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

176+阅读 · 2019年10月11日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

28+阅读 · 2019年5月18日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

MARS: A second-order reduction algorithm for high-dimensional sparse precision matrices estimation

Arxiv

0+阅读 · 2021年6月25日

A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians

A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians

Arxiv

0+阅读 · 2021年6月24日

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Arxiv

0+阅读 · 2021年6月24日

Bayesian Optimization with High-Dimensional Outputs

Arxiv

0+阅读 · 2021年6月24日

Semi-supervised Vector-valued Learning: From Theory to Algorithm

Arxiv

0+阅读 · 2021年6月24日

Nonlinear Matrix Approximation with Radial Basis Function Components

Arxiv

0+阅读 · 2021年6月23日

Differential Dynamic Programming Neural Optimizer

Arxiv

7+阅读 · 2020年6月29日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

部分可观测马尔可夫决策过程

潜变量/隐变量

相关VIP内容

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

53+阅读 · 2020年11月21日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

110+阅读 · 2020年5月15日

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

【UIUC硬核书】统计学习理论，Statistical Learning Theory，213页pdf

专知会员服务

133+阅读 · 2020年4月14日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

179+阅读 · 2020年2月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

53+阅读 · 2020年1月30日

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

《应用随机微分方程》(Applied Stochastic Differential Equations)324页pdf新书分享

专知会员服务

41+阅读 · 2019年10月28日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

47+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

34+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

176+阅读 · 2019年10月11日

热门VIP内容

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

28+阅读 · 2019年5月18日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

MARS: A second-order reduction algorithm for high-dimensional sparse precision matrices estimation

Arxiv

0+阅读 · 2021年6月25日

A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians

A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians

Arxiv

0+阅读 · 2021年6月24日

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Arxiv

0+阅读 · 2021年6月24日

Bayesian Optimization with High-Dimensional Outputs

Arxiv

0+阅读 · 2021年6月24日

Semi-supervised Vector-valued Learning: From Theory to Algorithm

Arxiv

0+阅读 · 2021年6月24日

Nonlinear Matrix Approximation with Radial Basis Function Components

Arxiv

0+阅读 · 2021年6月23日

Differential Dynamic Programming Neural Optimizer

Arxiv

7+阅读 · 2020年6月29日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员