We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential, as pure crashes appear to be a mere illusion: real-life systems reportedly fail in many unexpected ways. uBFT relies on a small, non-tailored trusted computing base (disaggregated memory) and consumes a practically bounded amount of memory. uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10 $\mu$s. This is at least 50$\times$ faster than MinBFT, a state-of-the-art $2f{+}1$ BFT SMR system based on Intel's SGX. We use uBFT to replicate two KV-stores (Memcached and Redis), as well as a financial order matching engine (Liquibook). These applications have low latency (up to 20 $\mu$s) and become Byzantine-tolerant with as little as 10 $\mu$s of additional latency. The price for uBFT is a small amount of reliable disaggregated memory (less than 1 MiB), which in our prototype consists of a small number of memory servers connected through RDMA and replicated for fault tolerance.