This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon OpenAI Gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator for training RL policies via OpenAI Gym on Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI Gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.
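To make feature (1) concrete, the sketch below shows one way a Storm/PRISM MDP can be exposed behind a Gym-style `reset`/`step` interface using Storm's Python bindings (stormpy). This is an illustrative sketch, not COOL-MC's actual code: the class name `StormGymEnv` and the file name `frozen_lake.prism` are hypothetical, and the exact return shapes of stormpy's simulator calls vary across stormpy releases.

```python
import stormpy
import stormpy.simulator


class StormGymEnv:
    """Gym-style interface over an MDP given in Storm's PRISM input format."""

    def __init__(self, prism_path: str):
        program = stormpy.parse_prism_program(prism_path)
        # Build the explicit MDP once; Storm then simulates trajectories on it.
        self.model = stormpy.build_model(program)
        self.simulator = stormpy.simulator.create_simulator(self.model, seed=42)

    def reset(self):
        # Assumed signature: restart() yields (state, reward, labels) in
        # recent stormpy versions.
        state, reward, labels = self.simulator.restart()
        return state

    def step(self, action_index: int):
        # Storm resolves the probabilistic successor of the chosen action.
        actions = self.simulator.available_actions()
        state, reward, labels = self.simulator.step(actions[action_index])
        return state, reward, self.simulator.is_done(), {"labels": labels}


# Hypothetical usage with an arbitrary first action:
# env = StormGymEnv("frozen_lake.prism")
# state = env.reset()
# state, reward, done, info = env.step(0)
```

Because the same PRISM program also serves as Storm's model-checking input, a policy trained against such an interface can later be verified against temporal-logic properties on the very model it was trained on, which is the link that COOL-MC's callback-based model builder (feature 2) exploits.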