Beyond implementation correctness of a distributed system, it is equally important to understand exactly what users should expect to see from that system. Even if the system itself works as designed, insufficient understanding of its user-visible semantics can cause bugs in its dependencies. By focusing a formal specification effort on precisely defining the expected user-facing behaviors of the Azure Cosmos DB service at Microsoft, we were able to write a formal specification of the database that was significantly smaller and conceptually simpler than any other specification of Cosmos DB, while representing a wider range of valid user-observable behaviors than existing more detailed specifications. Many of the additional behaviors we documented were previously poorly understood outside of the Cosmos DB development team, even informally, leading to data consistency errors in Microsoft products that depend on it. Using this model, we were able to raise two key issues in Cosmos DB's public-facing documentation, which have since been addressed. We were also able to offer a fundamental solution to a previous high-impact outage within another Azure service that depends on Cosmos DB.
翻译:除了执行分布式系统正确性之外,同样重要的是要确切了解用户期望从该系统中看到什么。即使系统本身按设计运作,对其用户可见的语义理解不足,也会在其依赖性中造成错误。通过将正式的规格工作重点放在精确界定微软Azure宇宙DB服务的预期用户影响行为上,我们得以在数据库中写出一个正式的规格,该规格比宇宙DB的其他规格要小得多,概念上简单得多,同时代表了比现有更详细的规格更广泛的有效用户可观察行为。我们记录的其他许多行为以前在宇宙DB开发小组之外被不甚了解,甚至非正式地导致依赖它的微软产品的数据一致性错误。我们利用这一模型,在宇宙DB的公众数据文件中提出了两个关键问题,这些问题后来得到了处理。我们还能够为另一个依赖宇宙DB的Azure服务中的前一次高影响外差提供一种基本解决方案。