Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality and has provided a platform for accelerating a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, so applications that require operations or datatypes not used in traditional network protocols must resort to expensive workarounds. Applications involving floating-point data, including distributed training for machine learning and distributed query processing, are key examples. In this paper, we propose FPISA, a floating-point representation designed to work efficiently in programmable switches. We first implement FPISA on an Intel Tofino switch, but find that it has limitations that impact throughput and accuracy. We then propose hardware changes to address these limitations based on the open-source Banzai switch architecture, and synthesize them in a 15-nm standard-cell library to demonstrate their feasibility. Finally, we use FPISA to implement accelerators for machine learning training and for query processing, and use emulation to evaluate their performance on a switch implementing our changes. We find that FPISA allows distributed training to use 25-75% fewer CPU cores and to provide up to 85.9% better throughput than SwitchML in a CPU-constrained environment. For distributed query processing with floating-point data, FPISA enables up to 2.7x better throughput than Spark.
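To make the difficulty concrete, the following is a minimal sketch of adding two float32 values using only the integer shifts and adds that a switch pipeline typically offers. It is an illustration of the general idea, not the FPISA design itself, which the abstract does not spell out. The function names `decompose` and `int_add` are hypothetical, inputs are assumed to be finite, and the final conversion back to a float is assumed to happen on the host.

```python
import math
import struct


def decompose(x: float):
    """Split a float32 into (sign, biased exponent, 24-bit significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp != 0:
        # Normal numbers carry an implicit leading 1 bit.
        sig = frac | 0x800000
    else:
        # Subnormals have no implicit bit and share the scale of exponent 1.
        sig = frac
        exp = 1
    return sign, exp, sig


def int_add(a: float, b: float) -> float:
    """Add two finite float32 values using integer operations only.

    Assumption: the smaller operand is aligned by a truncating right shift,
    so low-order bits can be lost (no rounding is performed).
    """
    sa, ea, ma = decompose(a)
    sb, eb, mb = decompose(b)
    # Fold the sign into the significand so a single integer add suffices.
    va = -ma if sa else ma
    vb = -mb if sb else mb
    # Align both significands to the larger exponent.
    e = max(ea, eb)
    va >>= e - ea  # Python's >> is an arithmetic shift (floors negatives)
    vb >>= e - eb
    total = va + vb
    # Host-side reconversion: value = significand * 2^(exp - 127 - 23).
    return math.ldexp(total, e - 150)
```

For example, `int_add(1.5, 2.25)` returns `3.75`, while `int_add(1.0, 2.0**-24)` returns `1.0` because the smaller operand is shifted out entirely during alignment. That loss of low-order bits is one simple way accuracy can suffer when floating-point addition is emulated with integer operations.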
