The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and reduce network traffic, the operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send the aggregated result back to them. However, existing solutions offer limited customization and may perform suboptimally with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To address these problems, in this work we design a flexible programmable switch using PsPIN, a RISC-V architecture implementing the sPIN programming model, as a building block. We then design, model, and analyze different algorithms for executing the aggregation on this architecture, showing performance improvements over state-of-the-art approaches.
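To make the offloading idea concrete, the following is a minimal Python sketch of a switch that buffers one vector per host, reduces them with a user-supplied operator, and returns the same result to every host. It is an illustration under stated assumptions only: `ToySwitch` and `allreduce` are hypothetical names, the code does not use the PsPIN or sPIN interfaces, and it is not one of the algorithms evaluated in this work. Reducing in fixed host-id order is one simple way to make a floating-point result independent of packet arrival order.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional, Sequence


class ToySwitch:
    """Hypothetical in-network aggregator (not the PsPIN API)."""

    def __init__(self, num_hosts: int, op: Callable[[float, float], float]):
        self.num_hosts = num_hosts
        self.op = op  # custom reduction operator, e.g. add or max
        self.pending: Dict[int, Sequence[float]] = {}

    def receive(self, host_id: int, data: Sequence[float]) -> Optional[List[float]]:
        """Buffer one host's vector; return the aggregate once all hosts reported."""
        self.pending[host_id] = data
        if len(self.pending) < self.num_hosts:
            return None
        # Reduce in host-id order so the result does not depend on arrival order.
        ordered = [self.pending[h] for h in sorted(self.pending)]
        result = [reduce(self.op, column) for column in zip(*ordered)]
        self.pending.clear()
        return result


def allreduce(host_data: List[Sequence[float]],
              op: Callable[[float, float], float],
              arrival_order: Optional[Sequence[int]] = None) -> List[List[float]]:
    """Simulate hosts sending to the switch; every host gets the same result."""
    switch = ToySwitch(len(host_data), op)
    order = arrival_order if arrival_order is not None else range(len(host_data))
    result = None
    for host_id in order:
        result = switch.receive(host_id, host_data[host_id])
    assert result is not None  # all hosts have contributed by now
    return [list(result) for _ in host_data]
```

For example, `allreduce([[1.0, 2.0], [3.0, 4.0]], lambda a, b: a + b)` hands both hosts the element-wise sum, and passing `max` instead of addition shows how a custom operator plugs in. A real switch would aggregate packet by packet under tight memory limits, which this sketch ignores.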