We consider the problem of computing minimum and maximum probabilities of satisfying an $\omega$-regular property in a bounded-parameter Markov decision process (BMDP). BMDPs arise from Markov decision processes (MDPs) by allowing uncertainty on the transition probabilities in the form of intervals within which the actual probabilities are unknown. $\omega$-regular languages form a large class of properties, expressible as, e.g., Rabin or parity automata, and encompass rich specifications such as linear temporal logic. In a BMDP, the probability of satisfying the property depends on the unknown transition probabilities as well as on the policy. In this paper, we compute the extreme values. This solves the problem specifically suggested by Dutreix and Coogan at CDC 2018, extending their results on interval Markov chains with no adversary. The main idea is to reinterpret their work as an analysis of interval MDPs, and accordingly the BMDP problem as the analysis of an $\omega$-regular stochastic game, for which we provide a solution. This method extends smoothly to bounded-parameter stochastic games.
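To make the notion of extreme values concrete, the following is a minimal sketch of interval value iteration for the simplest instance of the problem: maximal reachability probability in an interval MDP, optimizing over both the policy and the interval uncertainty (the best case). The model, state names, and intervals are invented for illustration; the paper's actual contribution handles full $\omega$-regular objectives via a stochastic-game reduction, which this sketch does not implement.

```python
# Hypothetical sketch: best-case reachability value iteration in an
# interval (bounded-parameter) MDP. All names and the toy model below
# are illustrative assumptions, not the paper's construction.

def optimistic_dist(intervals, values):
    """Choose a distribution within the intervals maximizing sum p*V.
    intervals: {succ: (lo, hi)}; values: {succ: current value}."""
    dist = {s: lo for s, (lo, _) in intervals.items()}
    remaining = 1.0 - sum(dist.values())
    # Greedily push the free probability mass toward high-value successors.
    for s in sorted(intervals, key=lambda s: values[s], reverse=True):
        lo, hi = intervals[s]
        extra = min(hi - lo, remaining)
        dist[s] += extra
        remaining -= extra
    return dist

def max_reach(states, target, trans, eps=1e-9):
    """trans: {state: {action: {succ: (lo, hi)}}}. Returns the fixpoint
    values of max-over-actions, max-over-intervals reachability of `target`."""
    V = {s: (1.0 if s == target else 0.0) for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == target or not trans.get(s):
                continue  # target and sink states keep their values
            new = max(
                sum(p * V[t] for t, p in optimistic_dist(iv, V).items())
                for iv in trans[s].values()
            )
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < eps:
            return V

# Toy interval MDP: from s0, action "a" reaches the target s1 with
# probability in [0.3, 0.7] and the sink s2 with the complement.
trans = {"s0": {"a": {"s1": (0.3, 0.7), "s2": (0.3, 0.7)}}}
V = max_reach(["s0", "s1", "s2"], "s1", trans)
print(round(V["s0"], 6))  # → 0.7 (best-case reachability from s0)
```

Replacing `reverse=True` with `reverse=False` in the greedy assignment yields the pessimistic (worst-case) inner optimization, which is the other extreme value the abstract refers to.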