We consider parametric Markov decision processes (pMDPs) that are augmented with unknown probability distributions over parameter values. The problem is to compute the probability to satisfy a temporal logic specification with any concrete MDP that corresponds to a sample from these distributions. As solving this problem precisely is infeasible, we resort to sampling techniques that exploit the so-called scenario approach. Based on a finite number of samples of the parameters, the proposed method yields high-confidence bounds on the probability of satisfying the specification. The number of samples required to obtain a high confidence on these bounds is independent of the number of states and the number of random parameters. Experiments on a large set of benchmarks show that several thousand samples suffice to obtain tight and high-confidence lower and upper bounds on the satisfaction probability.
翻译:我们考虑的参数Markov决定过程(pMDPs)比参数值增加了未知概率分布。问题在于用与这些分布样本相对应的任何具体MDP来计算满足时间逻辑规格的概率。由于这个问题的精确解决不可行,我们采用利用所谓的情景假设方法的抽样技术。根据参数的有限样本数量,拟议方法在满足规格的概率方面产生高度的自信。获得对这些界限高度信任所需的样本数量独立于国家数量和随机参数的数量。对大量基准的实验显示,几千个样本足以在满足概率方面获得紧凑和高度自信的下限和上限。