Widely deployed consensus protocols in the cloud are often leader-based and optimized for low latency under synchronous network conditions. However, cloud networks can experience disruptions such as network partitions, high-loss links, and configuration errors. These disruptions interfere with leader-based protocols: their view change mechanisms interrupt normal-case replication and cause the system to stall. This paper proposes RACS, a novel randomized consensus protocol that is robust against adversarial network conditions. RACS achieves optimal one-round-trip latency under synchronous network conditions while remaining resilient to adversarial network conditions. RACS follows a simple design inspired by Raft, the most widely used consensus protocol in the cloud, and therefore enables seamless integration with the existing cloud software stack, a goal no previous asynchronous protocol has successfully achieved. Experiments with a prototype deployed on Amazon EC2 confirm that RACS achieves a throughput of 28k cmd/sec under adversarial cloud network conditions, whereas existing leader-based protocols such as Multi-Paxos and Raft provide less than 2.8k cmd/sec. Under synchronous network conditions, RACS matches the performance of Multi-Paxos and Raft, achieving a throughput of 200k cmd/sec with a latency of 300 ms, which indicates that RACS introduces no unnecessary overhead. Finally, SADL-RACS, an optimized version of RACS designed for high performance and robustness, achieves a throughput of 500k cmd/sec under synchronous network conditions and 196k cmd/sec under adversarial network conditions.