The problem of total-order (uniform reliable) broadcast is fundamental in fault-tolerant distributed computing since it abstracts a broad set of problems requiring processes to uniformly deliver messages in the same order in which they were sent. Existing solutions (that tolerate process failures) reduce the total-order broadcast problem to the one of multivalued consensus. Our study aims at the design of an even more reliable solution. We do so through the lenses of self-stabilization-a very strong notion of fault tolerance. In addition to node and communication failures, self-stabilizing algorithms can recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions according to which the system was designed to operate (as long as the algorithm code stays intact). This work proposes the first (to the best of our knowledge) self-stabilizing algorithm for total-order (uniform reliable) broadcast for asynchronous message-passing systems prone to process failures and transient faults. As we show, the proposed solution facilitates the elegant construction of self-stabilizing state-machine replication using bounded memory.
翻译:全序(统一可靠)广播问题是全序(统一可靠)广播的根本问题,因为它总结了一系列广泛的问题,要求各种程序按照发送信息时的先后顺序统一发送信息; 现有解决办法(容忍程序故障)将全序广播问题降低到多值共识的问题; 我们的研究旨在设计更可靠的解决办法; 我们通过自我稳定镜(一个非常强烈的接受错误概念)这样做; 除了节点和通信故障之外,自我稳定算法可以在出现任意的瞬时故障后恢复; 这些故障是对系统设计用于运行的假设(只要算法代码保持不变)的任何违反; 这项工作提出了第一个(我们最了解的)全序(统一可靠)自动稳定算法, 用于容易处理故障和瞬时错误的无同步电传信系统。 正如我们所显示的, 拟议的解决办法有助于利用封闭的记忆进行自我稳定的国家机器复制的优雅构建。