Consensus protocols are crucial for reliable distributed systems as they let them cope with network and server failures. For decades, most consensus protocols have been designed as variations of the seminal Paxos, yet in 2014 Raft was presented as a new, "understandable" protocol, meant to be easier to implement than the notoriously subtle Paxos family. Raft has since been used in various industrial projects, e.g. Hashicorp's Consul or etcd (used by Google's Kubernetes). The correctness of Raft is established via a manual proof, based on a TLA+ specification of the protocol. This paper reports our experience in modeling Raft in the LNT process algebra. We found a couple of issues with the original TLA+ specification of Raft, which has been corrected since. More generally, this exercise offers a great opportunity to discuss how to best use the features of the LNT formal language and the associated CADP verification toolbox to model distributed protocols, including network and server failures.
翻译:对可靠的分布式系统来说,共识协议至关重要,因为它们能应付网络和服务器故障。几十年来,大多数共识协议都是设计成具有先质的和平协议的变异,然而,2014年拉夫特被作为一个新的“可理解”协议提出,这比臭名昭著的同侪家庭更容易执行。拉夫特后来在各种工业项目中被使用,例如哈希科公司领事或(谷歌Kubernetes使用)等。拉夫特的正确性是通过人工证明建立的,它基于协议的TLA+规格。本文报告了我们在LNT进程代号中模拟拉夫特的经验。我们发现了与拉夫特原TLA+规格有关的若干问题,此后已经纠正。更一般地说,这项工作为讨论如何最好地使用LNT正式语言的特征和相关的CADP核查工具箱来模拟分布式协议,包括网络和服务器故障。