Forward and Backward Bellman equations improve the efficiency of EM algorithm for DEC-POMDP

Forward-backward algorithm · Inverse matrix · Backward · Forward · Partially observable Markov decision process

March 19, 2021

Takehiro Tottori,Tetsuya J. Kobayashi

A Decentralized Partially Observable Markov Decision Process (DEC-POMDP) models sequential decision-making problems faced by a team of agents. Since planning in a DEC-POMDP can be interpreted as maximum likelihood estimation for a latent variable model, a DEC-POMDP can be solved by the EM algorithm. However, in EM for DEC-POMDP, the forward-backward algorithm needs to be calculated up to the infinite horizon, which impairs computational efficiency. In this paper, we propose the Bellman EM algorithm (BEM) and the Modified Bellman EM algorithm (MBEM) by introducing the forward and backward Bellman equations into EM. BEM can be more efficient than EM because it calculates the forward and backward Bellman equations instead of running the forward-backward algorithm up to the infinite horizon. However, BEM is not always more efficient than EM when the problem size is large, because BEM calculates an inverse matrix. We circumvent this shortcoming in MBEM by calculating the forward and backward Bellman equations without the inverse matrix. Our numerical experiments demonstrate that MBEM converges faster than EM.
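The trade-off described in the abstract can be illustrated on a much simpler object than a DEC-POMDP. The sketch below is not the paper's BEM or MBEM; it only assumes a fixed two-state Markov chain with a hypothetical transition matrix `P`, reward vector `r`, and discount factor `gamma`, all invented for illustration. It compares three ways of evaluating the infinite discounted sum: unrolling it term by term up to a finite horizon, solving the linear Bellman equation `(I - gamma P) v = r` directly (the step that corresponds to needing a matrix inverse), and fixed-point iteration `v <- r + gamma P v`, which needs no inverse.

```python
# Illustrative sketch only: a fixed two-state Markov chain, not the paper's
# DEC-POMDP algorithms. P, r, and gamma are hypothetical example values.

def mat_vec(P, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def truncated_sum(P, r, gamma, horizon):
    """Unroll sum_{t=0}^{horizon-1} gamma^t P^t r term by term."""
    total = [0.0] * len(r)
    term = list(r)
    for _ in range(horizon):
        total = [a + b for a, b in zip(total, term)]
        term = [gamma * x for x in mat_vec(P, term)]
    return total

def solve_direct(P, r, gamma):
    """Solve (I - gamma P) v = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    # Augmented matrix [I - gamma P | r].
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(col + 1, n):
            factor = A[i][col] / A[col][col]
            for j in range(col, n + 1):
                A[i][j] -= factor * A[col][j]
    # Back substitution.
    v = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (A[i][n] - s) / A[i][i]
    return v

def solve_iterative(P, r, gamma, tol=1e-12, max_iter=10000):
    """Fixed-point iteration v <- r + gamma P v; no matrix inverse is formed."""
    v = [0.0] * len(r)
    for _ in range(max_iter):
        new_v = [ri + gamma * x for ri, x in zip(r, mat_vec(P, v))]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical row-stochastic transition matrix
r = [1.0, 0.0]                # hypothetical per-state reward
gamma = 0.95                  # hypothetical discount factor, must be < 1

v_direct = solve_direct(P, r, gamma)
v_iter = solve_iterative(P, r, gamma)
v_trunc = truncated_sum(P, r, gamma, horizon=50)
```

With these values the direct solve and the fixed-point iteration agree to within the tolerance, while the sum truncated at 50 steps is still visibly short of the infinite-horizon value. A direct solve costs roughly cubic time in the number of states, whereas each iteration costs one matrix-vector product, which is the general reason avoiding the inverse can pay off on larger problems.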
