High-performance rack-scale offerings package disaggregated pools of compute, memory, and storage hardware in a single rack to run diverse workloads with varying requirements, including applications that need low and predictable latency. The intra-rack network is typically high-speed Ethernet, which can suffer from congestion that leads to packet drops and may not satisfy the stringent tail-latency requirements of some workloads (including remote memory/storage accesses). In this paper, we design a Predictable Low Latency (PL2) network architecture for rack-scale systems with Ethernet as the interconnecting fabric. PL2 leverages programmable Ethernet switches to carefully schedule packets so that they incur no loss, with NIC and switch queues maintained at small, near-zero levels. In our 100 Gbps rack prototype, PL2 keeps 99th-percentile memcached RPC latencies under 60 µs even when the RPCs compete with extreme offered loads of 400%, without losing traffic. Network transfers for a machine learning training task complete 30% faster than with an implementation of a receiver-driven scheme modeled after Homa (222 ms vs. 321 ms 99th-percentile latency per iteration).