We study the problem of controlling a partially observed Markov decision process (POMDP) to either aid or hinder the estimation of its state trajectory by optimising the conditional entropy of the state trajectory given measurements and controls, a quantity we dub the smoother entropy. Our consideration of the smoother entropy contrasts with previous active state estimation and obfuscation approaches that instead resort to measures of marginal (or instantaneous) state uncertainty due to tractability concerns. By establishing novel expressions of the smoother entropy in terms of the usual POMDP belief state, we show that our active estimation and obfuscation problems can be reformulated as Markov decision processes (MDPs) that are fully observed in the belief state. Surprisingly, we identify belief-state MDP reformulations of both active estimation and obfuscation with concave cost and cost-to-go functions, which enables the use of standard POMDP techniques to construct tractable bounded-error (approximate) solutions. We show in simulations that optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches.
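To make the central quantity concrete, the following is a minimal sketch (not the paper's method) that computes the smoother entropy H(X_{0:T-1} | Y_{0:T-1}) by brute-force enumeration for a toy two-state hidden Markov model. All transition and observation probabilities are hypothetical, and controls are omitted for simplicity:

```python
import itertools
import math
import numpy as np

# Toy 2-state HMM (hypothetical numbers; controls omitted).
T = 3                                  # trajectory length
p0 = np.array([0.6, 0.4])              # initial state distribution
A = np.array([[0.8, 0.2],              # A[i, j] = p(x' = j | x = i)
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],              # B[i, k] = p(y = k | x = i)
              [0.2, 0.8]])

# Enumerate the joint distribution p(x_{0:T-1}, y_{0:T-1}).
joint = {}
for xs in itertools.product(range(2), repeat=T):
    p_x = p0[xs[0]] * np.prod([A[xs[t - 1], xs[t]] for t in range(1, T)])
    for ys in itertools.product(range(2), repeat=T):
        p_xy = p_x * np.prod([B[xs[t], ys[t]] for t in range(T)])
        joint[(xs, ys)] = p_xy

# Smoother entropy via H(X | Y) = H(X, Y) - H(Y), in nats.
H_xy = -sum(p * math.log(p) for p in joint.values() if p > 0)
p_y = {}
for (xs, ys), p in joint.items():
    p_y[ys] = p_y.get(ys, 0.0) + p
H_y = -sum(p * math.log(p) for p in p_y.values() if p > 0)
smoother_entropy = H_xy - H_y
print(f"Smoother entropy H(X|Y) = {smoother_entropy:.4f} nats")
```

An active-estimation controller would pick inputs to drive this quantity down (making the trajectory easy to reconstruct by a smoother), while an obfuscation controller would drive it up; the exhaustive enumeration here is exponential in T, which is why the paper's belief-state MDP reformulation matters for tractability.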