Can 100 Machines Agree?

Agreement protocols have typically been deployed at small scale, e.g., on three to five machines. This is because these protocols appear to suffer from a sharp performance decay: as the size of a deployment (i.e., the degree of replication) increases, protocol performance decreases greatly. There is little experimental evidence for this decay in practice, however, notably for larger system sizes, e.g., beyond a handful of machines. In this paper we execute agreement protocols on up to 100 machines and observe their performance decay. We consider well-known agreement protocols that are part of mature systems, such as Apache ZooKeeper, etcd, and BFT-Smart, as well as a chain-based protocol and a novel ring-based agreement protocol, both of which we implement ourselves. We provide empirical evidence that current agreement protocols execute gracefully on 100 machines. We observe that throughput decay is initially sharp (consistent with previous observations), but, intriguingly, the decay dampens as each system grows beyond a few tens of replicas. For chain- and ring-based replication, this decay is slower than for the other systems. The positive takeaway from our evaluation is that mature agreement protocol implementations can sustain 300 to 500 requests per second out of the box when executing on 100 replicas on a wide-area public cloud platform. Chain- and ring-based replication can reach between 4K and 11K requests per second (up to a 20x improvement), depending on the fault assumptions.
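One common intuition for why leader-based protocols degrade with replication while chain- and ring-based ones degrade more slowly is a bandwidth bottleneck: a leader must send each request to all n-1 followers, whereas in a chain or ring each replica forwards a request to a single successor. The sketch below encodes only that intuition as a toy model. It is an illustrative assumption, not the model or measurement method used in this paper, and the helper names (`leader_throughput`, `chain_throughput`) and parameter values are hypothetical.

```python
"""Toy bottleneck model (illustrative assumption, not the paper's model).

Leader-based: the leader's uplink carries n-1 copies of every request.
Chain/ring:   every replica forwards each request to one successor only.
Latency, acknowledgments, CPU cost, and batching are all ignored.
"""


def leader_throughput(n: int, link_bw: float, req_size: float) -> float:
    """Requests/s bounded by the leader sending n-1 copies of each request."""
    if n < 2:
        raise ValueError("need at least two replicas")
    return link_bw / ((n - 1) * req_size)


def chain_throughput(n: int, link_bw: float, req_size: float) -> float:
    """Requests/s when each replica forwards each request exactly once.

    In this toy model the result does not depend on n at all.
    """
    if n < 2:
        raise ValueError("need at least two replicas")
    return link_bw / req_size


if __name__ == "__main__":
    bw = 100e6    # hypothetical per-node uplink: 100 MB/s
    size = 1_000  # hypothetical request size: 1 KB
    for n in (3, 5, 10, 30, 100):
        lead = leader_throughput(n, bw, size)
        chain = chain_throughput(n, bw, size)
        print(f"n={n:3d}  leader={lead:10.0f} req/s  chain={chain:10.0f} req/s")
```

In this model the leader's throughput falls as 1/(n-1), so the loss per added replica shrinks as n grows, which is qualitatively similar to a decay that is sharp at first and then dampens. The chain curve is flat only because the model ignores latency and acknowledgment traffic; the measured chain- and ring-based throughput reported above still decays, just more slowly than for the other systems.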
