We study the problem of controlling a partially observed Markov decision process (POMDP) to either aid or hinder the estimation of its state trajectory. We encode the estimation objectives via the smoother entropy, which is the conditional entropy of the state trajectory given measurements and controls. Consideration of the smoother entropy contrasts with previous approaches that instead resort to marginal (or instantaneous) state entropies due to tractability concerns. By establishing novel expressions for the smoother entropy in terms of the POMDP belief state, we show that both the problems of minimising and maximising the smoother entropy in POMDPs can surprisingly be reformulated as belief-state Markov decision processes with concave cost and value functions. The significance of these reformulations is that they render the smoother entropy a tractable optimisation objective, with structural properties amenable to the use of standard POMDP solution techniques for both active estimation and obfuscation. Simulations illustrate that optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches.
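For concreteness, the objective can be sketched as follows, where the indexing conventions (states $X_{0:T}$, measurements $Y_{1:T}$, controls $U_{0:T-1}$) are assumptions of this sketch rather than notation fixed by the abstract itself:
\[
H(X_{0:T} \mid Y_{1:T}, U_{0:T-1}) = -\,\mathbb{E}\big[\log p(X_{0:T} \mid Y_{1:T}, U_{0:T-1})\big].
\]
Under this reading, active estimation corresponds to minimising this quantity over control policies and obfuscation to maximising it, in contrast to objectives built from the per-time marginal entropies $H(X_k \mid Y_{1:T}, U_{0:T-1})$.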