We formally verify executable algorithms for solving Markov decision processes (MDPs) in the interactive theorem prover Isabelle/HOL. We build on existing formalizations of probability theory to analyze the expected total reward criterion on infinite-horizon problems. Our developments formalize the Bellman equation and give conditions under which optimal policies exist. Based on this analysis, we verify dynamic programming algorithms to solve tabular MDPs. We evaluate the formally verified implementations experimentally on standard problems and show they are practical. Furthermore, we show that, combined with efficient unverified implementations, our system can compete with and even outperform state-of-the-art systems.
翻译:我们在互动理论证明Isabelle/HOL中正式核实解决Markov决定程序(MDPs)的可执行算法。 我们利用现有的概率理论正规化来分析无限正数问题的预期总报酬标准。 我们的发展将贝尔曼方程式正式化,并为存在最佳政策提供条件。 基于这一分析, 我们核查动态程序算法, 以解决表格式MDPs。 我们根据标准问题对经正式核实的执行法进行了实验, 并表明这些执行法是切实可行的。 此外, 我们证明,与高效的、未经核实的执行法相结合, 我们的系统可以与最先进的系统竞争, 甚至优于最先进的系统。