Modern data centers are increasingly equipped with RDMA-capable NICs. These devices enable distributed systems to rely on algorithms designed for shared memory. RDMA allows consensus to terminate within a few microseconds in failure-free scenarios, yet RDMA-optimized algorithms still use expensive two-sided operations in case of failure. In this work, we present a new leader-based consensus algorithm that relies solely on one-sided RDMA verbs. Our algorithm is based on Paxos: it decides in a single one-sided RDMA operation in the common case, and in case of failure it changes leader, also in a single one-sided RDMA operation. We implement our algorithm as an SMR (state machine replication) system named Velos and evaluate it against the state-of-the-art competitor, Mu. Compared to Mu, our solution adds a small overhead of approximately 0.6 microseconds in failure-free executions but shines during failover periods, in which it changes leader 13 times faster.
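The abstract does not spell out how a decision fits in one one-sided operation. As a hedged illustration only, the toy model below assumes that the leader issues one atomic compare-and-swap per acceptor on a memory slot holding `(ballot, value)`, with the acceptor's CPU never involved, and treats the value as decided once a majority of those swaps succeed. The CAS is simulated in plain Python as a stand-in for an RDMA atomic verb. The names `Acceptor`, `compare_and_swap`, and `try_decide` are hypothetical. The sketch omits recovery of previously accepted values and leader change, so it is not the Velos protocol.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each acceptor exposes a single slot that a remote
# leader updates with an atomic compare-and-swap (standing in for an RDMA
# CAS verb). The acceptor itself runs no code on the critical path.
Slot = Tuple[int, Optional[str]]  # (ballot, accepted value)


class Acceptor:
    def __init__(self) -> None:
        self.slot: Slot = (0, None)  # empty slot: no ballot, no value

    def compare_and_swap(self, expected: Slot, new: Slot) -> Slot:
        """Install `new` only if the slot equals `expected`; return the old contents."""
        old = self.slot
        if old == expected:
            self.slot = new
        return old


def try_decide(acceptors: List[Acceptor], expected: Slot, ballot: int, value: str) -> bool:
    """Issue one CAS per acceptor; the value is decided if a majority succeed."""
    successes = 0
    for acc in acceptors:
        if acc.compare_and_swap(expected, (ballot, value)) == expected:
            successes += 1
    return successes > len(acceptors) // 2


acceptors = [Acceptor() for _ in range(3)]
first = try_decide(acceptors, (0, None), 1, "x")  # all three CAS operations succeed
stale = try_decide(acceptors, (0, None), 1, "y")  # slots already hold (1, "x"), so every CAS fails
print(first, stale)
```

In this toy, a leader whose view of the slots is out of date simply sees its swaps fail and learns the current contents from the returned old value, which is why a single round of one-sided operations can suffice when nothing goes wrong.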