Markov decision processes (MDPs) are known to be sensitive to parameter specification. Distributionally robust MDPs alleviate this issue via \emph{ambiguity sets}, which specify a set of possible distributions over parameter sets; the goal is to find an optimal policy with respect to the worst-case parameter distribution. We propose a framework for solving distributionally robust MDPs via first-order methods, and instantiate it for several types of Wasserstein ambiguity sets. By developing efficient proximal updates, our algorithms achieve a convergence rate of $O\left(NA^{2.5}S^{3.5}\log(S)\log(\epsilon^{-1})\epsilon^{-1.5} \right)$ for the number of kernels $N$ in the support of the nominal distribution, states $S$, and actions $A$; this rate varies slightly with the Wasserstein setup. Our dependence on $N$, $A$, and $S$ is significantly better than that of existing methods, which have a complexity of $O\left(N^{3.5}A^{3.5}S^{4.5}\log^{2}(\epsilon^{-1}) \right)$. Numerical experiments show that our algorithm is significantly more scalable than state-of-the-art approaches across several domains.