Data-intensive applications such as distributed AI training may require multi-terabyte memory capacity with multi-terabit bandwidth. We attach memory directly to the Ethernet controller, together with some programmable logic, to design an efficient hardware "template" for memory pooling and in-memory / in-network computing. We built an FPGA prototype of NetDAM and use an MPI-Allreduce communication case to demonstrate that NetDAM can serve as a software- and hardware-friendly programmable architecture and a high-performance alternative to RDMA.