In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of the value function is of particular importance since it enables the approximate solution of our active smoothing problem using piecewise-linear function approximations in conjunction with standard POMDP solvers. We illustrate the approximate solution of our active smoothing problem in simulation and compare its performance to alternative approaches based on minimising marginal state estimate uncertainties.
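To make the contrast between the two objectives concrete, a minimal sketch follows, assuming notation introduced here for illustration: $X_{0:T}$ for the state trajectory, $Y_{1:T}$ for the measurements, and $U_{0:T-1}$ for the controls over the horizon $T$:
$$ J_{\text{smoother}} \triangleq H\!\left(X_{0:T} \mid Y_{1:T}, U_{0:T-1}\right) \qquad \text{versus} \qquad J_{\text{marginal}} \triangleq \sum_{k=0}^{T} H\!\left(X_k \mid Y_{1:k}, U_{0:k-1}\right), $$
where the first expression is the smoother entropy minimised in our active smoothing formulation, and the second is the sum of conditional entropies of the marginal (filtered) state estimates minimised in prior active approaches.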