This paper investigates MDPs with intermittent state information. We consider a scenario in which the controller perceives the state of the process via an unreliable communication channel, and the transmissions of state information over the time horizon are modeled as a Bernoulli lossy process. The problem is then to find an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establish structural results, and systematically study the effect of state information losses on the expected total discounted reward. We then reformulate the problem as a tree MDP whose state space is organized in a tree structure. Two finite-state approximations to the tree MDP are developed to find near-optimal policies efficiently. Finally, we propose a nested value iteration algorithm for the finite-state approximations, which is proved to be faster than standard value iteration. Numerical results demonstrate the effectiveness of our methods.
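To make the setting concrete, the following minimal sketch simulates a controller that tracks a belief over the hidden state when transmissions are lost with Bernoulli probability. The MDP (two states, two actions), its numbers, and the myopic action rule are illustrative assumptions, not the paper's model or algorithm: on a lost transmission the belief is propagated through the transition kernel, and on a successful one it collapses to the observed state.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([  # P[a, s, s']: transition probabilities under action a
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
R = np.array([[1.0, 0.0], [0.0, 1.0]])  # R[a, s]: expected reward

p_loss = 0.3  # Bernoulli probability that a state transmission is lost
rng = np.random.default_rng(0)

def belief_update(b, a, obs):
    """If the transmission arrives (obs is not None), collapse the belief
    to the observed state; otherwise propagate it through the kernel."""
    if obs is not None:
        b = np.zeros_like(b)
        b[obs] = 1.0
        return b
    return b @ P[a]  # predicted distribution over the next state

b = np.array([1.0, 0.0])  # controller starts knowing the state
s = 0                     # true (hidden) state
for t in range(5):
    a = int(np.argmax(R @ b))       # myopic action under the current belief
    s = rng.choice(2, p=P[a, s])    # true state evolves
    obs = None if rng.random() < p_loss else s  # Bernoulli loss
    b = belief_update(b, a, obs)
```

Since the belief after a run of consecutive losses depends only on the last received state and the actions taken since, the reachable beliefs form the tree-structured state space that the finite-state approximations truncate.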