We present a novel $Q$-learning algorithm to solve distributionally robust Markov decision problems, where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples, also using real data, to illustrate both the tractability of our algorithm and the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
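As a minimal sketch of the underlying fixed point (the notation below is ours and not taken from the paper), under a standard rectangularity assumption the distributionally robust state-action value satisfies a Bellman-type equation in which the expectation under the reference kernel is replaced by a worst case over the Wasserstein ball:
\[
Q^*(x,a) \;=\; \inf_{\mathbb{P}\,:\,W_q\big(\mathbb{P},\,\widehat{\mathbb{P}}(\cdot \mid x,a)\big)\le \varepsilon}\;
\mathbb{E}_{X' \sim \mathbb{P}}\Big[\, r(x,a,X') + \alpha \max_{a' \in \mathcal{A}} Q^*(X',a') \Big],
\]
where $\widehat{\mathbb{P}}(\cdot \mid x,a)$ denotes the (possibly estimated) reference transition kernel, $W_q$ the Wasserstein distance of order $q$, $\varepsilon \ge 0$ the radius of the ambiguity ball, $r$ the reward function, and $\alpha \in (0,1)$ the discount factor; the $Q$-learning algorithm of the paper can be read as a sample-based iteration toward such a robust fixed point.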