Operating at petabit scale, ByteDance's cloud gateways are deployed at critical aggregation points to orchestrate a wide array of business traffic. This scale places significant resource pressure on our previous-generation cloud gateways, making them unsustainable as cloud-network traffic keeps growing. As the DPU market rapidly expands, we see a promising path to meeting our escalating traffic demands: integrating DPUs with our established Tofino-based gateways. DPUs augment these gateways with substantially larger table capacities and richer programmability without compromising their existing low-latency, high-throughput forwarding. Despite these compelling advantages, the practical integration of DPUs into cloud gateways remains unexplored, primarily because of the challenges it poses. In this paper, we present Zephyrus, a production-scale gateway built on a unified P4 pipeline that spans high-performance Tofino switches and feature-rich DPUs, and that overcomes these challenges. We further introduce a hierarchical co-offloading architecture (HLCO) to orchestrate traffic flow within this heterogeneous gateway, achieving more than 99% hardware offloading while retaining software fallback paths for complex operations. Zephyrus delivers 33% higher throughput than LuoShen (NSDI '24), and our evaluation further indicates 21% lower power consumption and 14% lower hardware cost. Against the FPGA-based system Albatross (SIGCOMM '25), it doubles the throughput at a substantially lower Total Cost of Ownership (TCO), demonstrating superior performance per dollar. Beyond these performance gains, we share key lessons from several years of developing and operating Zephyrus at production scale. We believe these insights offer a valuable reference for researchers and practitioners designing performant cloud gateways.


