The Raft algorithm maintains strong consistency across data replicas in the cloud. It divides nodes into leaders and followers to serve read/write requests spanning geo-diverse sites. As workload increases, Raft should scale out performance in proportion. However, traditional scale-out techniques encounter bottlenecks in Raft, and when the provisioned sites exhaust local resources, the performance loss grows exponentially. To provide scalability in Raft, this paper proposes a cost-effective mechanism for elastic auto-scaling, called BlackWater-Raft or BW-Raft. BW-Raft extends the original Raft with the following abstractions: (1) secretary nodes that take over expensive log synchronization operations from the leader, relaxing the performance constraints on locks; and (2) massive numbers of low-cost observer nodes that handle reads only, improving throughput for typical data-intensive services. These abstractions are stateless, allowing elastic scale-out on unreliable yet cheap spot instances. In theory, we demonstrate that BW-Raft can maintain Raft's strong consistency guarantees when scaling out, processing a 50X increase in the number of nodes compared to the original Raft. We have prototyped BW-Raft on key-value services and evaluated it against many state-of-the-art systems on Amazon EC2 and Alibaba Cloud. Our results show that within the same budget, BW-Raft's resource footprint increments are 5-7X smaller than Multi-Raft, and 2X better than original Raft. Using spot instances, BW-Raft can reduce costs by 84.5% compared to Multi-Raft. In real-world experiments, BW-Raft improves goodput of the 95th-percentile SLO by 9.4X, thus serving as an alternative for services scaling out with strong consistency.
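The division of labor among leader, secretaries, followers, and observers can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions, not the BW-Raft protocol: the abstract does not specify how secretaries are assigned, how acknowledgements are aggregated, or how failures and elections are handled, so all class and method names below (`Replica`, `Secretary`, `Observer`, `Leader`, `replicate`, `write`, `read`) are hypothetical, and every append is assumed to succeed.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A node holding a replicated log and the key-value state built from it."""
    name: str
    log: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def append(self, entries):
        # Apply each (key, value) entry in log order and acknowledge.
        for key, value in entries:
            self.log.append((key, value))
            self.state[key] = value
        return True


class Observer(Replica):
    """Read-only node (hypothetical model): serves reads, never votes."""

    def read(self, key):
        return self.state.get(key)


@dataclass
class Secretary:
    """Relay with no log of its own: forwards entries to its followers."""
    followers: list

    def replicate(self, entries):
        # Return how many followers acknowledged the entries.
        return sum(1 for f in self.followers if f.append(entries))


@dataclass
class Leader(Replica):
    secretaries: list = field(default_factory=list)
    observers: list = field(default_factory=list)

    def write(self, key, value):
        entries = [(key, value)]
        self.append(entries)
        # Log synchronization is delegated to secretaries (assumed fan-out).
        acks = 1 + sum(s.replicate(entries) for s in self.secretaries)
        voters = 1 + sum(len(s.followers) for s in self.secretaries)
        committed = acks > voters // 2  # majority of leader + followers
        if committed:
            # Observers receive committed entries only and do not count as voters.
            for o in self.observers:
                o.append(entries)
        return committed


# Usage: one leader, two secretaries with two followers each, two observers.
followers = [Replica(f"f{i}") for i in range(4)]
observers = [Observer(f"o{i}") for i in range(2)]
leader = Leader(
    "leader",
    secretaries=[Secretary(followers[:2]), Secretary(followers[2:])],
    observers=observers,
)
committed = leader.write("x", 1)
value = observers[0].read("x")
```

In this model the leader issues one call per secretary instead of one per follower, and reads never touch a voting node, which mirrors the two bottlenecks the abstract says the new roles are meant to relieve.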