To guarantee that machine learning models yield outputs that are not only accurate but also robust, recent works propose formally verifying robustness properties of machine learning models. To be applicable to realistic safety-critical systems, the verification algorithms used need to manage the combinatorial explosion resulting from vast variations in the input domain, and be able to verify correctness properties derived from versatile and domain-specific requirements. In this paper, we formalise the previously presented VoTE algorithm as a tool description, and extend the tool set with mechanisms for systematic scalability studies. In particular, we show a) how the separation of property checking from the core verification engine enables verification of versatile requirements, b) the scalability of the tool, both in terms of verification time and memory usage, and c) that the algorithm has attractive properties that lend themselves well to massive parallelisation. We demonstrate the application of the tool in two case studies, namely digit recognition and aircraft collision avoidance, where the first case study serves to assess the resource utilisation of the tool, and the second to assess the ability to verify versatile correctness properties.