ZooKeeper is a coordination service widely used as the backbone of various distributed systems. Although its reliability is of critical importance, testing alone is insufficient for an industrial-strength system of ZooKeeper's size and complexity, and deep bugs can still be found. To this end, we resort to formal TLA+ specifications to further improve the reliability of ZooKeeper. Our primary objective is usability and automation rather than full verification. We incrementally develop three levels of specifications for ZooKeeper. We first obtain the protocol specification, which unambiguously specifies the Zab protocol behind ZooKeeper. We then proceed to a finer granularity and obtain the system specification, which serves as the super-doc for system development. To further leverage the model-level specification to improve the reliability of the code-level implementation, we develop the test specification, which guides the explorative testing of the ZooKeeper implementation. The formal specifications help eliminate ambiguities in the protocol design and provide comprehensive system documentation. They also help find new critical deep bugs in the system implementation that are beyond the reach of state-of-the-art testing techniques.