Achieving the highest possible inference rate with minimal hardware resources plays a major role in reducing enterprise operational costs. In this paper, we explore the use of PCIe streaming on FPGA-based platforms to achieve high throughput. PCIe streaming is a unique capability available on FPGAs that eliminates memory copy overheads. We present results for inference on a gradient boosted trees model for online retail recommendations. We compare these results with those of popular library implementations on GPU and CPU platforms and observe that the PCIe-streaming-enabled FPGA implementation achieves the best overall measured performance. We also measure power consumption across all platforms and find that PCIe streaming on the FPGA platform achieves 25x and 12x better energy efficiency than implementations on the CPU and GPU platforms, respectively. We discuss the conditions that must be met to achieve this kind of acceleration on the FPGA. Further, we analyze the run-time statistics on the GPU and FPGA and identify opportunities to enhance performance on both platforms.