We consider parametric Markov decision processes (pMDPs) that are augmented with unknown probability distributions over parameter values. The problem is to compute the probability to satisfy a temporal logic specification within any concrete MDP that corresponds to a sample from these distributions. As this problem is infeasible to solve precisely, we resort to sampling techniques that exploit the so-called scenario approach. Based on a finite number of samples of the parameters, the proposed method yields high-confidence bounds on the probability of satisfying the specification. The number of samples required to obtain a high confidence on these bounds is independent of the number of states and the number of random parameters. Experiments on a large set of benchmarks show that several thousand samples suffice to obtain tight and high-confidence lower and upper bounds on the satisfaction probability.
翻译:我们考虑的参数Markov决定过程(pMDPs)比参数值增加了未知的概率分布。问题在于计算在任何与这些分布样本相对应的具体的MDP中满足时间逻辑规格的概率。由于这个问题无法准确解决,我们采用利用所谓的假设情景方法的取样技术。根据参数的有限样品数量,拟议方法在满足规格的概率方面产生高度自信的界限。获得对这些界限高度信任所需的样品数量不取决于国家数量和随机参数的数量。对大量基准的实验显示,几千个样品足以在满足概率方面获得紧凑和高度自信的下限和上限。