FLAC: Practical Failure-Aware Atomic Commit Protocol for Distributed Transactions

In distributed transaction processing, an atomic commit protocol (ACP) is used to ensure database consistency. With commodity compute nodes and networks, failures such as system crashes and network partitions are common. It is therefore important for an ACP to adapt dynamically to the operating conditions for efficiency while still ensuring the consistency of the database. Existing ACPs often assume stable operating conditions; as a result, they either do not generalize to different environments or are slow in practice. In this paper, we propose a novel and practical ACP called Failure-Aware Atomic Commit (FLAC). In essence, FLAC comprises three sub-protocols, each designed for one of three environments: (i) no failure occurs, (ii) participant nodes might crash but no connection is delayed, or (iii) both node crashes and delayed connections can occur. FLAC models these environments as the failure-free, crash-failure, and network-failure robustness levels. During operation, FLAC monitors whether any failure occurs and dynamically switches to the most suitable sub-protocol using a robustness-level state machine whose parameters are fine-tuned by reinforcement learning. Consequently, it improves both response time and throughput, and it effectively handles nodes distributed across the Internet, where crash and network failures might occur. We implement FLAC in a distributed transactional key-value storage system based on Google Percolator and evaluate its performance with both a micro benchmark and a macro benchmark of a real workload. The results show that, compared to existing ACPs, FLAC achieves up to 2.22x higher throughput and a 2.82x latency speedup on high-contention workloads.
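The idea of switching among the three robustness levels can be illustrated with a small sketch. This is not FLAC's actual state machine, whose transition rules are not given in the abstract. It assumes a simple policy: move up immediately when a failure is observed that the current level cannot handle, and move down one level after a run of clean commits. The names `LevelStateMachine`, `Observation`, and `downgrade_after` are hypothetical, and `downgrade_after` only stands in for the kind of parameter that the abstract says is tuned by reinforcement learning.

```python
from enum import IntEnum


class RobustnessLevel(IntEnum):
    FAILURE_FREE = 0     # (i) no failure occurs
    CRASH_FAILURE = 1    # (ii) nodes may crash, no delayed connections
    NETWORK_FAILURE = 2  # (iii) crashes and delayed connections


class Observation(IntEnum):
    # Ordered so that each value equals the lowest level able to handle it.
    CLEAN = 0
    CRASH = 1
    DELAY = 2


class LevelStateMachine:
    """Illustrative level switcher; the transition policy is an assumption."""

    def __init__(self, downgrade_after=3):
        # Hypothetical parameter: clean commits required before stepping down.
        self.downgrade_after = downgrade_after
        self.level = RobustnessLevel.FAILURE_FREE
        self.clean_streak = 0

    def observe(self, obs):
        """Record the outcome of one commit round and return the new level."""
        required = RobustnessLevel(int(obs))
        if required > self.level:
            # The current sub-protocol cannot handle this failure: step up.
            self.level = required
            self.clean_streak = 0
        elif obs is Observation.CLEAN:
            self.clean_streak += 1
            if (self.clean_streak >= self.downgrade_after
                    and self.level > RobustnessLevel.FAILURE_FREE):
                # Enough clean rounds: try the next cheaper sub-protocol.
                self.level = RobustnessLevel(self.level - 1)
                self.clean_streak = 0
        else:
            # A failure the current level already tolerates resets the streak.
            self.clean_streak = 0
        return self.level
```

A caller would invoke `observe` once per commit round and run the sub-protocol that matches the returned level. Stepping down one level at a time is a conservative choice made for this sketch only.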
