In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of the value function is of particular importance since it enables the approximate solution of our active smoothing problem using piecewise-linear function approximations in conjunction with standard POMDP solvers. We illustrate the approximate solution of our active smoothing problem in simulation and compare its performance to alternative approaches based on minimising marginal state estimate uncertainties.
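To make the contrast between the two objectives concrete, a minimal sketch follows, assuming notation introduced here for illustration: $X_{0:T}$ for the state trajectory, $Y_{1:T}$ for the measurements, and $U_{0:T-1}$ for the controls over the horizon $T$:
$$ J_{\text{smoother}} \triangleq H\!\left(X_{0:T} \mid Y_{1:T}, U_{0:T-1}\right) \qquad \text{versus} \qquad J_{\text{marginal}} \triangleq \sum_{k=0}^{T} H\!\left(X_k \mid Y_{1:k}, U_{0:k-1}\right), $$
where the first expression is the smoother entropy minimised in our active smoothing formulation, and the second is the sum of conditional entropies of the marginal (filtered) state estimates minimised in prior active approaches.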