The ability to offload data and control tasks to the network is becoming increasingly important, especially given that network speeds are growing faster than CPU frequencies. In-network compute alleviates the host CPU load by running tasks directly in the network, enabling additional computation/communication overlap and potentially improving overall application performance. However, sustaining the bandwidths provided by next-generation networks, e.g., 400 Gbit/s, can be challenging. sPIN is a programming model for in-NIC compute in which users specify handler functions that the NIC executes for each incoming packet belonging to a given message or flow. It enables CUDA-like acceleration: the NIC is equipped with lightweight processing elements that process network packets in parallel. We investigate the architectural specializations that a sPIN NIC should provide to enable high-performance, low-power, and flexible packet processing. We introduce PsPIN, the first open-source sPIN implementation, based on a multi-cluster RISC-V architecture and designed according to the identified architectural specializations. We evaluate the performance of PsPIN with cycle-accurate simulations, showing that it can process packets at 400 Gbit/s for several use cases while introducing minimal latency (26 ns for 64 B packets) and occupying a total area of 18.5 mm² (22 nm FDSOI).
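To make the handler model concrete, the C sketch below mimics the sPIN split into a header handler (run once on the first packet of a message), a payload handler (run on every packet), and a completion handler (run once after all payload handlers finish). It is a host-side simulation under stated assumptions, not the sPIN or PsPIN API: the names packet_t, msg_state_t, process_message, and ns_per_packet are hypothetical, the loop in process_message stands in for the NIC scheduler that would dispatch payload handlers in parallel to processing elements, and the example handler (summing payload bytes) is chosen only for illustration. The helper ns_per_packet shows the per-packet time budget implied by a given line rate, ignoring Ethernet preamble and inter-frame gap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packet descriptor; not the actual sPIN/PsPIN API. */
typedef struct {
    uint32_t seq;            /* packet index within the message */
    const uint8_t *payload;  /* packet payload bytes */
    size_t len;              /* payload length in bytes */
} packet_t;

/* Hypothetical per-message state kept in NIC memory. Fields are atomic
 * because payload handlers of one message may run concurrently on
 * different processing elements. */
typedef struct {
    _Atomic uint64_t byte_sum;
    _Atomic uint32_t packets_seen;
} msg_state_t;

/* Header handler: runs once, on the first packet, before any payload
 * handler of the same message, so non-atomic initialisation is safe. */
static void header_handler(const packet_t *pkt, msg_state_t *st) {
    (void)pkt;
    atomic_init(&st->byte_sum, 0);
    atomic_init(&st->packets_seen, 0);
}

/* Payload handler: runs once per packet; here it sums payload bytes. */
static void payload_handler(const packet_t *pkt, msg_state_t *st) {
    uint64_t local = 0;
    for (size_t i = 0; i < pkt->len; i++)
        local += pkt->payload[i];
    atomic_fetch_add(&st->byte_sum, local);
    atomic_fetch_add(&st->packets_seen, 1);
}

/* Completion handler: runs once, after all payload handlers finished. */
static void completion_handler(msg_state_t *st) {
    printf("message done: %u packets, byte sum %llu\n",
           (unsigned)atomic_load(&st->packets_seen),
           (unsigned long long)atomic_load(&st->byte_sum));
}

/* Sequential stand-in for the NIC scheduler (hypothetical). */
static void process_message(const packet_t *pkts, size_t n, msg_state_t *st) {
    assert(n > 0);
    header_handler(&pkts[0], st);
    for (size_t i = 0; i < n; i++)
        payload_handler(&pkts[i], st);
    completion_handler(st);
}

/* Per-packet time budget at line rate: bits divided by Gbit/s gives ns. */
static double ns_per_packet(double gbit_per_s, double pkt_bytes) {
    return pkt_bytes * 8.0 / gbit_per_s;
}
```

At 400 Gbit/s, ns_per_packet(400.0, 64.0) gives 1.28 ns per 64 B packet. If a handler takes T ns, roughly T / 1.28 packets must therefore be in flight at once, which is why a sPIN NIC relies on many processing elements working in parallel rather than on a single fast core.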