Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of the Value Iteration updates, we achieve an ergodic convergence rate of $O\left( A^{2} S^{3}\log(S)\log(\epsilon^{-1}) \epsilon^{-1} \right)$ for the best choice of parameters on ellipsoidal and Kullback-Leibler (KL) $s$-rectangular uncertainty sets, where $S$ and $A$ denote the number of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of $O(A^{1.5}S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets, we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first to solve robust MDPs with $s$-rectangular KL uncertainty sets.
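To make the interleaving concrete, below is a minimal Python sketch of the two-level structure the abstract describes: an outer approximate Value Iteration loop whose robust Bellman update at each state is computed inexactly by a primal-dual (projected gradient descent-ascent) inner loop. All names (`robust_vi`, `robust_bellman_pd`), step sizes, iteration counts, and the Frobenius-ball projection heuristic standing in for the ellipsoidal $s$-rectangular set are illustrative assumptions, not the paper's exact algorithm or parameter choices.

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.nonzero(u > css / (np.arange(len(x)) + 1))[0][-1]
    return np.maximum(x - css[k] / (k + 1.0), 0.0)

def robust_bellman_pd(v, P0, r, gamma, rho, eta=0.05, n_pd=200):
    """Inexact robust Bellman update at one state via gradient descent-ascent
    on  max_pi min_P  sum_a pi_a (r_a + gamma * P_a . v).
    P is kept near the nominal kernel P0 inside a Frobenius-norm ball of
    radius rho (a simple stand-in for an ellipsoidal s-rectangular set)."""
    A, S = P0.shape
    pi = np.full(A, 1.0 / A)
    P = P0.copy()
    for _ in range(n_pd):
        q = r + gamma * P @ v                # per-action values under current P
        pi = project_simplex(pi + eta * q)   # ascent step on the policy
        P = P - eta * gamma * np.outer(pi, v)  # descent step on the kernel
        # Project back: ball around P0, then row-wise simplex
        # (a heuristic composition, not the exact projection onto the intersection).
        delta = P - P0
        nrm = np.linalg.norm(delta)
        if nrm > rho:
            delta *= rho / nrm
        P = np.array([project_simplex(row) for row in P0 + delta])
    return float(pi @ (r + gamma * P @ v)), pi

def robust_vi(P0, r, gamma, rho, n_vi=100):
    """Outer loop: approximate Value Iteration, with each robust Bellman
    update computed inexactly by the primal-dual inner loop above."""
    S = P0.shape[0]
    v = np.zeros(S)
    for _ in range(n_vi):
        v = np.array([robust_bellman_pd(v, P0[s], r[s], gamma, rho)[0]
                      for s in range(S)])
    return v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 3
    P0 = rng.dirichlet(np.ones(S), size=(S, A))  # nominal transition kernel
    r = rng.random((S, A))
    print(robust_vi(P0, r, gamma=0.9, rho=0.1))
```

In this sketch, the inner iteration count `n_pd` plays the role of the accuracy/cost knob the abstract refers to: more inner steps make each Bellman update more accurate but each outer iteration more expensive.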