Mission-critical systems deployed in data centers today face increasingly sophisticated failures. Byzantine fault tolerant (BFT) protocols can mask these failures, but they are rarely deployed because of their performance cost and complexity. In this work, we propose a new approach to designing high-performance BFT protocols for data centers. By re-examining how ordering responsibility is divided between the network and the BFT protocol, we advocate a new abstraction offered by the data center network infrastructure. Concretely, we design a new authenticated ordered multicast primitive (AOM) that provides transferable authentication and non-equivocation guarantees. We demonstrate the feasibility of the design with two hardware implementations of AOM on new-generation programmable switches: one uses HMAC for authentication, and the other uses public-key cryptography. We then co-design a new BFT protocol, Matrix, that leverages the guarantees of AOM to eliminate cross-replica coordination and authentication in the common case. Evaluation results show that Matrix outperforms state-of-the-art protocols by a wide margin in both latency and throughput, demonstrating the benefit of our new network ordering abstraction for BFT systems.
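To make the ordering-plus-authentication idea concrete, here is a minimal software sketch of an HMAC-style ordered multicast. It is not the switch implementation described above. It assumes a hypothetical `Sequencer` that stamps each message with a monotonically increasing sequence number and the payload digest, and then computes one HMAC tag per receiver over that pair using pairwise shared keys. A hypothetical `Receiver` checks its own tag and detects gaps in the sequence. Key distribution, the public-key variant, and how AOM makes authentication transferable between replicas are out of scope here.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedMessage:
    seq: int                 # sequence number assigned by the sequencer
    digest: bytes            # SHA-256 digest of the payload
    tags: dict[str, bytes]   # receiver id -> HMAC over (seq, digest)


def _auth_input(seq: int, digest: bytes) -> bytes:
    # Bind the sequence number and the payload digest together.
    return seq.to_bytes(8, "big") + digest


class Sequencer:
    """Hypothetical stand-in for an in-network sequencer (HMAC variant)."""

    def __init__(self, keys: dict[str, bytes]):
        self._keys = keys    # assumed pairwise keys shared with each receiver
        self._next_seq = 1

    def stamp(self, payload: bytes) -> StampedMessage:
        seq = self._next_seq
        self._next_seq += 1
        digest = hashlib.sha256(payload).digest()
        data = _auth_input(seq, digest)
        # Every receiver's tag covers the same (seq, digest) pair.
        tags = {rid: hmac.new(k, data, hashlib.sha256).digest()
                for rid, k in self._keys.items()}
        return StampedMessage(seq, digest, tags)


class Receiver:
    """Hypothetical replica-side check: verify its own tag, enforce order."""

    def __init__(self, rid: str, key: bytes):
        self.rid = rid
        self.key = key
        self.expected = 1    # next sequence number to deliver

    def deliver(self, payload: bytes, m: StampedMessage) -> str:
        """Return 'ok', 'drop' (bad digest/tag or stale), or 'gap'."""
        if hashlib.sha256(payload).digest() != m.digest:
            return "drop"
        want = hmac.new(self.key, _auth_input(m.seq, m.digest),
                        hashlib.sha256).digest()
        tag = m.tags.get(self.rid)
        if tag is None or not hmac.compare_digest(tag, want):
            return "drop"
        if m.seq < self.expected:
            return "drop"    # duplicate or already delivered
        if m.seq > self.expected:
            return "gap"     # earlier messages missing; do not deliver yet
        self.expected += 1
        return "ok"


if __name__ == "__main__":
    seq = Sequencer({"r1": b"k1", "r2": b"k2"})
    r1 = Receiver("r1", b"k1")
    m1, m2 = seq.stamp(b"op-a"), seq.stamp(b"op-b")
    print(r1.deliver(b"op-b", m2), r1.deliver(b"op-a", m1), r1.deliver(b"op-b", m2))
```

Because the sequencer produces a single digest per sequence number and every per-receiver tag covers that same pair, receivers that verify their tags agree on what was assigned to each slot. Replacing per-receiver HMACs with a single public-key signature would let any party check the stamp, at the cost of more expensive cryptography.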