Replication ensures data availability in fault-prone distributed systems. The celebrated CAP theorem stipulates that replicas cannot guarantee both strong consistency and availability under network partitions. A popular alternative, adopted by CRDTs, is to relax consistency to be eventual. It enables progress to be wait-free, as replicas can serve requests immediately. Yet, wait-free replication faces a key challenge: due to asynchrony and concurrency, operations may be constantly reordered, leading to results inconsistent with their original contexts and preventing them from stabilizing over time. Moreover, a particular client may experience starvation if, from some point on, each of its operations is reordered at least once. We make two contributions. First, we formalize the problem addressed by wait-free replicated data types (e.g., CRDTs) as eventual state-machine replication. We then augment it with stability and fairness ensuring, respectively, that (1)~all replicas share a growing stable prefix of operations, and (2)~no client starves. Second, we present a generic DAG-based framework to achieve eventual state-machine replication for any replicated data type, where replicas exchange their local views and merge them using a reconciliation function. We then propose reconciliation functions ensuring stability and fairness.
翻译:复制机制确保了易错分布式系统中的数据可用性。著名的CAP定理规定,在网络分区情况下,副本无法同时保证强一致性与可用性。CRDT采用的一种流行替代方案是将一致性放宽为最终一致性。这使得副本能够立即处理请求,从而实现无等待的进展。然而,无等待复制面临一个关键挑战:由于异步性与并发性,操作可能被不断重排,导致结果与其原始上下文不一致,并阻碍其随时间达到稳定状态。此外,若某个客户端的每个操作从某一时刻起至少被重排一次,该客户端可能遭遇饥饿问题。本文作出两项贡献。首先,我们将无等待复制数据类型(如CRDT)所处理的问题形式化为最终状态机复制,并通过稳定性和公平性对其进行增强,分别确保:(1)所有副本共享一个不断增长的稳定操作前缀;(2)无客户端遭遇饥饿。其次,我们提出一个基于DAG的通用框架,可为任意复制数据类型实现最终状态机复制,其中副本通过交换本地视图并利用协调函数进行合并。我们进一步提出了能保证稳定性与公平性的协调函数。