In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, robust MDPs additionally account for ambiguity by optimizing in view of the most adverse transition kernel from a prescribed ambiguity set. In this paper, we develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets that decomposes the problem into a sequence of robust Bellman updates and simplex projections. Exploiting the rich structure present in the simplex projections corresponding to phi-divergence ambiguity sets, we show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers as well as a recent first-order solution scheme, thus rendering them attractive alternatives to classical MDPs in practical applications.
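To make the notion of a simplex projection concrete, the following is a minimal sketch of the standard sort-based Euclidean projection onto the probability simplex. It is an illustration only: the abstract does not spell out the projection subproblems, and those arising from phi-divergence ambiguity sets are not necessarily this plain Euclidean one. The function name `project_onto_simplex` is hypothetical.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.

    Illustrative sort-based routine (an assumption for exposition,
    not the projection scheme developed in the paper).
    """
    if not v:
        raise ValueError("input vector must be non-empty")

    # Sort entries in decreasing order and scan the running sums.
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        # The condition holds on a prefix of k; keep the last valid shift.
        if u_k - candidate > 0.0:
            theta = candidate

    # Shift every entry by theta and clip at zero.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    p = project_onto_simplex([0.9, 0.6, -0.2])
    print(p, sum(p))
```

The result is always a valid probability vector, which is why a projection step of this kind can restore feasibility of a transition distribution or a randomized policy after an update.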