Distributed replication systems based on the replicated state machine model have become ubiquitous as the foundation of modern database systems. To ensure availability in the presence of faults, these systems must be able to dynamically replace failed nodes with healthy ones via dynamic reconfiguration. MongoDB is a document oriented database with a distributed replication mechanism derived from the Raft protocol. In this paper, we present MongoRaftReconfig, a novel dynamic reconfiguration protocol for the MongoDB replication system. MongoRaftReconfig utilizes a logless approach to managing configuration state and decouples the processing of configuration changes from the main database operation log. The protocol's design was influenced by engineering constraints faced when attempting to redesign an unsafe, legacy reconfiguration mechanism that existed previously in MongoDB. We provide a safety proof of MongoRaftReconfig, along with a formal specification in TLA+. To our knowledge, this is the first published safety proof and formal specification of a reconfiguration protocol for a Raft-based system. We also present results from model checking its safety properties on finite protocol instances. Finally, we discuss the conceptual novelties of MongoRaftReconfig, how it can be understood as an optimized and generalized version of the single server reconfiguration algorithm of Raft, and present an experimental evaluation of how its optimizations can provide performance benefits for reconfigurations.
翻译:基于复制的州机器模型的分布式复制系统已成为现代数据库系统的基础,无处不在。为了确保在存在故障的情况下提供这些系统,这些系统必须能够动态地以动态重组的方式以健康节点取代故障节点。 MOngoDB是一个面向文件的数据库,有一个根据Raft 协议建立的分布式复制机制。 在本文中,我们介绍了MongoRaftReconfig,这是MongoDB复制系统的新颖动态重组协议。MongoRaftReconfig使用一种无逻辑的方法管理配置状态,并消除了主要数据库运行日志中配置变化的处理。协议的设计必须能够受到工程限制的影响,而这是试图重新设计一个以前在MongoDB中存在的不安全、遗留的重组机制时遇到的。 我们为MongoRaft Reconfig提供了安全证明, 以及TLA+中的正式规范。 据我们所知,这是对基于Raft 系统的重新配置协议的首次公布的安全证明和正式规范。 我们还介绍了从模型中检查其配置变化安全特性的结果。 最后,我们讨论了目前对Mong-Raconf AStorimal requistrup Sup 的系统进行概念上的优化的升级,我们了解了它如何能进行。