The stochastic shortest path problem (SSPP) asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This paper addresses the optimization of the variance-penalized expectation (VPE) of the accumulated weight, which is a variant of the SSPP in which a multiple of the variance of accumulated weights is incurred as a penalty. It is shown that the optimal VPE in MDPs with non-negative weights as well as an optimal deterministic finite-memory scheduler can be computed in exponential space. The threshold problem whether the maximal VPE exceeds a given rational is shown to be EXPTIME-hard and to lie in NEXPTIME. Furthermore, a result of interest in its own right obtained on the way is that a variance-minimal scheduler among all expectation-optimal schedulers can be computed in polynomial time.
翻译:随机最短路径问题(SSPP) 要求解决Markov 决策程序中的非确定性选择, 以便达到目标状态之前的预期累积重量最大化。 本文涉及对累积重量的差分对应预期( VPE)的优化, 这是 SSPPP 的变种, 累积重量的多重差异会作为一种惩罚发生。 事实证明, 具有非负性重量的 MDP 中的最佳 VPE 和最佳确定性定数定数定时器可以在指数空间中计算。 最大 VPE 是否超过给定的理性的临界值问题被显示为 EXPTIME 硬度, 并存在于 NEXPTIME 中。 此外, 对自身权利的兴趣在方式上的结果是, 所有预期- 最佳排表器之间的差异最小排表器可以在多元时间计算 。