Effective congestion control for data center networks is becoming increasingly challenging with a growing amount of latency-sensitive traffic, much fatter links, and extremely bursty traffic. Widely deployed algorithms, such as DCTCP and DCQCN, are still far from optimal in many plausible scenarios, particularly for tail latency. Many operators compensate by running their networks at low average utilization, dramatically increasing costs. In this paper, we argue that we have reached the practical limits of end-to-end congestion control. Instead, we propose, implement, and evaluate a new congestion control architecture called Backpressure Flow Control (BFC). BFC provides per-hop, per-flow flow control, but with bounded state, constant-time switch operations, and careful use of buffers. We demonstrate BFC's feasibility by implementing it on Tofino2, a state-of-the-art P4-based programmable hardware switch. In simulation, we show that BFC achieves near-optimal throughput and tail latency even under challenging conditions such as high network load and incast cross traffic. Compared to existing end-to-end schemes, BFC achieves 2.3× to 60× lower tail latency for short flows and 1.6× to 5× better average completion time for long flows.
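To make "per-hop, per-flow flow control with bounded state and constant-time switch operations" more concrete, the toy Python model below shows one switch hop applying backpressure to its upstream neighbor. It is a minimal sketch under stated assumptions, not BFC's actual design. The fixed slot count, mapping flows to slots by `flow_id % num_slots`, the pause threshold, and the `PAUSE`/`RESUME` messages are all hypothetical simplifications. The model also assumes the upstream hop honors pauses; that behavior is not simulated here.

```python
from collections import deque


class BackpressureSwitch:
    """Toy model of one switch hop doing per-flow backpressure with bounded state.

    Hypothetical simplification for illustration only; not BFC's real mechanism.
    """

    def __init__(self, num_slots=8, threshold=4):
        self.num_slots = num_slots   # hypothetical: fixed size of the flow table
        self.threshold = threshold   # hypothetical: queue depth (packets) that triggers a pause
        # State is fixed-size no matter how many flows exist: one FIFO and one
        # paused flag per slot. Flows mapped to the same slot share both.
        self.queues = [deque() for _ in range(num_slots)]
        self.paused = [False] * num_slots
        self.upstream_msgs = []      # (kind, slot) signals sent to the previous hop

    def slot_of(self, flow_id):
        # A real switch would hash the flow's 5-tuple; an integer modulo keeps
        # this sketch deterministic.
        return flow_id % self.num_slots

    def enqueue(self, flow_id, pkt):
        """Buffer one packet; O(1). Pause the upstream hop if the slot backs up."""
        s = self.slot_of(flow_id)
        self.queues[s].append(pkt)
        if not self.paused[s] and len(self.queues[s]) >= self.threshold:
            self.paused[s] = True
            self.upstream_msgs.append(("PAUSE", s))

    def dequeue(self, slot):
        """Transmit one packet from a slot; O(1). Resume upstream once it drains."""
        if not self.queues[slot]:
            return None
        pkt = self.queues[slot].popleft()
        if self.paused[slot] and len(self.queues[slot]) < self.threshold:
            self.paused[slot] = False
            self.upstream_msgs.append(("RESUME", slot))
        return pkt


sw = BackpressureSwitch(num_slots=8, threshold=4)
for i in range(5):
    sw.enqueue(flow_id=3, pkt=f"f3-p{i}")  # flow 3 sends a burst of 5 packets
sw.enqueue(flow_id=5, pkt="f5-p0")         # flow 5 lands in its own slot and is not paused
print(sw.upstream_msgs)  # [('PAUSE', 3)]
sw.dequeue(3)
sw.dequeue(3)
print(sw.upstream_msgs)  # [('PAUSE', 3), ('RESUME', 3)]
```

The design choice this illustrates is the trade-off the abstract alludes to. Keeping state bounded means unrelated flows that map to the same slot are paused together. In exchange, every enqueue and dequeue touches only a constant amount of memory, which is what makes per-flow control plausible at line rate on a hardware switch.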