Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable, with well-known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems (SSPs) over general state-action spaces whose dynamics satisfy a particular drift condition. This construction generalises the traditional, temporal notion of a horizon via decreasing reachability: a property called reductivity. It is shown that optimal policies for reductive SSPs can be recovered in polynomial time via an extension of backwards induction, with an efficient analogue for reductive MDPs. The practical considerations of the proposed approach are discussed, and numerical verification is provided on a canonical optimal liquidation problem.
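To ground the polynomial-time claim, the sketch below shows classical backwards induction on a finite-horizon MDP, the special case that the abstract's reductive construction generalises (there, the explicit time index is replaced by an ordering induced by decreasing reachability). This is not the paper's algorithm; all names, shapes, and the tabular setting are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's method): tabular backwards
# induction on a finite-horizon MDP. P[a] is an (S, S) transition matrix
# for action a, R[a] an (S,) reward vector, H the horizon.

def backwards_induction(P, R, H):
    """Return the stage-0 value function and a greedy policy per stage."""
    n_actions = len(P)
    n_states = P[0].shape[0]
    V = np.zeros(n_states)                    # terminal values V_H = 0
    policy = np.zeros((H, n_states), dtype=int)
    for t in reversed(range(H)):              # sweep stages H-1, ..., 0
        # Q[a, s] = immediate reward + expected continuation value
        Q = np.stack([R[a] + P[a] @ V for a in range(n_actions)])
        policy[t] = Q.argmax(axis=0)          # greedy action per state
        V = Q.max(axis=0)                     # Bellman backup
    return V, policy

# Toy usage: two states, two actions, horizon 3.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
V0, pi = backwards_induction(P, R, H=3)
```

Each sweep costs O(|A||S|^2) and there are H sweeps, so the overall cost is polynomial in the problem size, which is the tractability the abstract contrasts with the hardness of general MDPs.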