Combinations of active automata learning, model-based testing and model checking have been successfully used in numerous applications, e.g., for spotting bugs in implementations of major network protocols and for supporting the refactoring of embedded controllers. However, in the large majority of these applications, model checking is only used at the very end, when no counterexample can be found anymore for the latest hypothesis model. This contrasts with the original proposal of black-box checking (BBC) by Peled, Vardi & Yannakakis, which applies model checking to all hypotheses, including the intermediate ones. In this article, we present the first systematic evaluation of the ability of BBC to find bugs quickly, based on 77 benchmark models from real protocol implementations and controllers for which specifications of safety properties are available. Our main findings are: (a) In cases where the full model can be learned, BBC detects violations of the specifications with just 3.4% of the queries needed by an approach in which model checking is only applied to the full model. (b) Even when the full model cannot be learned, BBC is still able to detect many violations of the specification. In particular, BBC manages to detect 94% of the safety property violations in the challenging RERS 2019 industrial LTL benchmarks. (c) Our results also confirm that BBC is far more effective than existing MBT algorithms at finding deep bugs in implementations.
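The BBC loop sketched above (model-check every hypothesis, confirm any counterexample on the real system, refine on spurious ones) can be illustrated with a minimal toy example. Everything here is an illustrative stand-in, not the paper's implementation: the `sut`, the trace-table "learner", and the bounded-enumeration "model checker" replace the L*-style learner and LTL model checker a real BBC setup would use.

```python
import itertools

ALPHABET = ("a", "b")

def sut(trace):
    # Toy black-box system under test: it becomes unsafe ("error")
    # once it has received three 'a' inputs.
    return "error" if trace.count("a") >= 3 else "ok"

def learn_hypothesis(observations):
    # Naive stand-in for an L*-style learner: replay recorded outputs,
    # optimistically predict "ok" for unseen traces.
    return lambda trace: observations.get(trace, "ok")

def model_check(hyp, bound=4):
    # Stand-in for a model checker: bounded enumeration of traces,
    # returning one that the *hypothesis* claims violates safety.
    for n in range(bound + 1):
        for trace in itertools.product(ALPHABET, repeat=n):
            if hyp(trace) == "error":
                return trace
    return None

def find_discrepancy(hyp, bound=4):
    # Stand-in for conformance testing: find a trace on which the SUT
    # and the current hypothesis disagree.
    for n in range(bound + 1):
        for trace in itertools.product(ALPHABET, repeat=n):
            if sut(trace) != hyp(trace):
                return trace
    return None

def black_box_check(max_rounds=10):
    observations = {}
    for _ in range(max_rounds):
        hyp = learn_hypothesis(observations)
        cex = model_check(hyp)  # check every hypothesis, not only the last one
        if cex is not None:
            if sut(cex) == "error":
                return cex                  # violation confirmed on the real system
            observations[cex] = sut(cex)    # spurious: refine the hypothesis
            continue
        disc = find_discrepancy(hyp)
        if disc is None:
            return None                     # hypothesis conforms up to the bound
        observations[disc] = sut(disc)      # refine with the conformance counterexample

print(black_box_check())  # prints ('a', 'a', 'a'), a shortest unsafe trace
```

Note how the violation is found after a single refinement round, without ever learning a complete model of the SUT; this is the effect the abstract's query-savings result (a) quantifies.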