The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of constraint violation, which is desired in many safety-critical applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest Path (SSP), under dead-ends, where there is a trade-off between the probability-to-goal and cost-to-goal. This work studies the structure of (C)C-MDP, particularly an important variant that involves local transition. In this variant, the state reachability exhibits a certain degree of locality and independence from the remaining states. More precisely, the number of states, at a given time, that share some reachable future states is always constant. (C)C-MDP under local transition is NP-Hard even for a planning horizon of two. In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near) optimal deterministic policies. Such an algorithm is among the best approximation algorithm attainable in theory and gives insights into the approximability of constrained MDP and its variants.
翻译:一个在局部转移下对约束MDP和随机最短路径进行全多项式时间逼近的方案
翻译后的摘要:
固定时序的约束马尔可夫决策过程(C-MDP)是在操作约束下规划随机环境中的一种通用模型。几率约束MDP(CC-MDP)是其变体,允许限定约束违反的可能性,这在许多安全关键型应用程序中是必要的。CC-MDP还可以对一类称为具有死路的随机最短路径(SSP)的MDP建模,其中存在目标达成的概率和代价之间的权衡。本文研究(C)C-MDP的结构,特别是一种涉及局部转移的重要变体。在这个变体中,状态可达性表现出一定程度的局部性和对其他状态的独立性。更具体地说,给定时间的状态数量,其可达未来状态是固定的。即使在规划时间范围为2时,局部转移的(C)C-MDP也是NP难题。在本文中,我们提出了一个全多项式时间逼近的(C)C-MDP方案,它可以计算(接近)最优的确定性策略。这种算法在理论上是最好的逼近算法之一,并为约束MDP及其变体的逼近性提供了启示。