Traditional resilient systems operate on fully-replicated fault-tolerant clusters, which limits their scalability and performance. One way to make the step towards resilient high-performance systems that can deal with huge workloads, is by enabling independent fault-tolerant clusters to efficiently communicate and cooperate with each other, as this also enables the usage of high-performance techniques such as sharding and parallel processing. Recently, such inter-cluster communication was formalized as the Byzantine cluster-sending problem, and worst-case optimal protocols have been proposed that solve this problem. Unfortunately, these protocols have an all-case linear complexity in the size of the clusters involved. In this paper, we propose probabilistic cluster-sending techniques that can reliably send messages from one Byzantine fault-tolerant cluster to another with only an expected constant message complexity, this independent of the size of the clusters involved. Depending on the robustness of the clusters involved, our techniques require only two-to-four message round-trips. Furthermore, our protocols can support worst-case linear communication between clusters, which is optimal, and deal with asynchronous and unreliable communication. As such, our work provides a strong foundation for the further development of resilient high-performance systems.
翻译:传统抗御力强的系统在完全复制的防缺陷系统集群上运作,这限制了它们的可缩放性和性能。向具有复原力的高性能系统迈出一步,能够应付巨大的工作量,其途径之一是使独立的防故障系统集群能够高效地相互沟通与合作,因为这也能够使用高性能技术,如散装和平行处理等。最近,这类集群间通信正式化,如拜占庭集群问题,并且提出了解决这一问题的最坏情况最佳协议。不幸的是,这些协议在所涉集群的规模上具有全方位线性复杂性。在本文件中,我们提议了能够可靠地从一个拜占庭的防缺陷集群向另一个集群发送信息的概率性集群发送技术,这种技术只具有预期的持续信息复杂性,这种复杂性与所涉集群的规模无关。根据所涉集群的稳健性,我们的技术只需要两到四种信息圆三来解决这一问题。此外,我们的协议可以支持最坏的集群间直线通信,这是最优化的,并且处理不稳和不稳的通信系统。我们的工作为高的系统提供了坚实的基础。