We present an algorithm for formal verification and parameter synthesis of continuous state-space Markov chains. This class of problems captures the design and analysis of a wide variety of autonomous and cyber-physical systems defined by nonlinear and black-box modules. In order to solve these problems, one has to maximize certain probabilistic objective functions overall choices of initial states and parameters. In this paper, we identify the assumptions that make it possible to view this problem as a multi-armed bandit problem. Based on this fresh perspective, we propose an algorithm (HOO-MB) for solving the problem that carefully instantiates an existing bandit algorithm -- Hierarchical Optimistic Optimization -- with appropriate parameters. As a consequence, we obtain theoretical regret bounds on sample efficiency of our solution that depends on key problem parameters like smoothness, near-optimality dimension, and batch size. The batch size parameter enables us to strike a balance between the sample efficiency and the memory usage of the algorithm. Our experiments, using the tool HooVer, suggest that the approach scales to realistic-sized problems and is often more sample-efficient compared to PlasmaLab -- a leading tool for verification of stochastic systems. Specifically, HooVer has distinct advantages in analyzing models in which the objective function has sharp slopes. In addition, HooVer shows promising behavior in parameter synthesis for a linear quadratic regulator (LQR) example.
翻译:我们提出了一个正式的算法,用于对连续的州空间Markov 链进行正式的核查和参数合成。 这一类问题包含对非线性和黑盒模块定义的各种自主和网络物理系统的设计和分析。 为了解决这些问题,我们必须尽量扩大某些概率目标功能, 以总体选择初始状态和参数。 在本文件中, 我们确定能够将这一问题视为多臂强盗问题的假设。 基于这个新视角, 我们提议了一个算法( HOO- MB), 以解决一个问题, 仔细地将现有的强盗算法 -- -- 高度主义的优化优化优化 -- 与适当的参数同步。 因此, 我们从理论上对解决方案的样本效率进行遗憾, 取决于诸如平滑度、接近优化度和批量大小等关键的问题参数。 批量大小参数使我们能够在样本效率和对算法的记忆用量之间取得平衡。 我们的实验, 利用HooVer工具, 提出, 将方法尺度推到现实规模的问题, 并且往往比Plas-L 直线性分析工具的精确度优势。