In this paper we propose a novel $Q$-learning algorithm allowing to solve distributionally robust Markov decision problems for which the ambiguity set of probability measures can be chosen arbitrarily as long as it comprises only a finite amount of measures. Therefore, our approach goes beyond the well-studied cases involving ambiguity sets of balls around some reference measure with the distance to reference measure being measured with respect to the Wasserstein distance or the Kullback--Leibler divergence. Hence, our approach allows the applicant to create ambiguity sets better tailored to her needs and to solve the associated robust Markov decision problem via a $Q$-learning algorithm whose convergence is guaranteed by our main result. Moreover, we showcase in several numerical experiments the tractability of our approach.
翻译:暂无翻译