Multiple-environment MDPs (MEMDPs) capture finite sets of MDPs that share the states but differ in the transition dynamics. These models form a proper subclass of partially observable MDPs (POMDPs). We consider the synthesis of policies that robustly satisfy an almost-sure reachability property in MEMDPs, that is, one policy that satisfies a property for all environments. For POMDPs, deciding the existence of robust policies is an EXPTIME-complete problem. In this paper, we show that this problem is PSPACE-complete for MEMDPs, while the policies in general require exponential memory. We exploit the theoretical results to develop and implement an algorithm that shows promising results in synthesizing robust policies for various benchmarks.
翻译:多环境 MDP(MEMDPs) 捕捉了数量有限的多DP(MDPs), 与各州相同,但在转型动态方面却有所不同。 这些模型构成了部分可观测的 MDP(POMDPs) 的适当子类。 我们考虑将能够强有力地满足MEMDP(MEMDPs)中几乎可以达到的属性的政策(即一种满足所有环境的属性的政策)综合在一起。 对于POMDPs来说,决定是否存在稳健的政策是EXPTIME 完整的问题。 在本文中,我们表明这个问题对于MEMDPs来说是PSPACE的完成问题,而一般的政策则需要指数记忆。 我们利用理论结果来制定和实施一种算法,该算法在综合各种基准的稳健政策方面显示出有希望的结果。