Model checking undiscounted reachability and expected-reward properties on Markov decision processes (MDPs) is key for the verification of systems that act under uncertainty. Popular algorithms are policy iteration and variants of value iteration; in tool competitions, most participants rely on the latter. These algorithms generally need worst-case exponential time. However, the problem can equally be formulated as a linear program, solvable in polynomial time. In this paper, we give a detailed overview of today's state-of-the-art algorithms for MDP model checking with a focus on performance and correctness. We highlight their fundamental differences, and describe various optimisations and implementation variants. We experimentally compare floating-point and exact-arithmetic implementations of all algorithms on three benchmark sets using two probabilistic model checkers. Our results show that (optimistic) value iteration is a sensible default, but other algorithms are preferable in specific settings. This paper thereby provides a guide for MDP verification practitioners -- tool builders and users alike.
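To make the central algorithm concrete, the following is a minimal sketch of plain value iteration for maximal reachability probabilities on an MDP. The MDP encoding (a dict mapping each state to a list of actions, each action a list of (probability, successor) pairs), all identifiers, and the naive absolute-change stopping criterion are illustrative assumptions, not taken from the paper or from any tool it evaluates.

```python
# Hypothetical toy encoding: mdp maps state -> list of actions,
# each action a list of (probability, successor-state) pairs.

def value_iteration(mdp, targets, epsilon=1e-6):
    """Iterate V(s) = max_a sum_{s'} P(s,a,s') * V(s') from below,
    stopping once the largest per-state change drops under epsilon.
    Note: this naive stopping criterion does not guarantee an
    epsilon-correct result; sound variants such as interval and
    optimistic value iteration exist to address exactly that."""
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if s in targets:
                continue  # target states keep value 1
            best = max(sum(p * values[t] for p, t in a) for a in actions)
            delta = max(delta, best - values[s])
            values[s] = best
        if delta < epsilon:
            return values

# Toy example: from s0, the single action reaches target s2 with
# probability 0.5 and loops otherwise, so the maximal reachability
# probability from s0 is 1 (approached in the limit by the iteration).
example = {
    "s0": [[(0.5, "s2"), (0.5, "s0")]],
    "s2": [[(1.0, "s2")]],
}
print(value_iteration(example, targets={"s2"}))
```

The toy run illustrates the convergence issue the abstract alludes to: the iterates approach 1 only in the limit, so the naive stopping test returns a value strictly below it, which is why sound stopping criteria matter in practice.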