Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Although many existing studies optimize GDRWs on multi-core CPUs, massive random memory accesses and costly synchronization still cause severe resource underutilization, and GDRW processing is often the key performance bottleneck in graph applications. This paper studies an alternative architecture, the FPGA, to address these issues: because FPGAs support hardware customization, we can explore fine-grained pipelined execution and specialized memory access optimizations. Specifically, we propose LightRW, a novel FPGA-based accelerator for GDRWs. LightRW applies a series of optimizations that enable fine-grained pipelined execution on chip and exploit the massive parallelism of the FPGA while significantly reducing memory accesses. Because the sampling methods commonly used in GDRWs do not efficiently support fine-grained pipelined execution, we develop a parallelized reservoir sampling method that samples multiple vertices per cycle. To address random memory accesses, we propose a degree-aware configurable caching method that buffers hot vertices on chip, and a dynamic burst access engine that efficiently retrieves neighbors. Experimental results show that these optimizations significantly improve the performance of GDRWs on FPGA. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over the state-of-the-art CPU-based MetaPath and Node2vec random walks, respectively. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/LightRW.
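To make the reservoir sampling idea concrete, the following is a minimal software sketch of textbook single-item weighted reservoir sampling, processed in small batches. It is not LightRW's hardware design: the function name `weighted_reservoir_sample`, the `lanes` parameter, and the batching scheme are assumptions introduced here for illustration only. The point it shows is that a next-hop vertex can be chosen in one streaming pass over the neighbor list, with several candidates evaluated independently per step, without first building a full cumulative-weight table.

```python
import random
from itertools import accumulate


def weighted_reservoir_sample(neighbors, weights, lanes=4, rng=random):
    """Pick one neighbor with probability proportional to its weight.

    The neighbor list is scanned once in batches of `lanes` candidates
    (a hypothetical stand-in for candidates handled in the same step).
    Returns None if there are no neighbors or all weights are zero.
    """
    chosen = None
    total = 0.0  # running sum of all weights seen so far
    for start in range(0, len(neighbors), lanes):
        batch = neighbors[start:start + lanes]
        batch_w = weights[start:start + lanes]
        # Inclusive prefix sums: each lane sees the total weight up to itself.
        prefix = [total + s for s in accumulate(batch_w)]
        # Each lane decides independently: accept with probability w / prefix.
        accepts = [w > 0 and rng.random() * p < w
                   for w, p in zip(batch_w, prefix)]
        # The last accepting lane wins, which matches the order in which a
        # sequential reservoir sampler would overwrite its current sample.
        for vertex, ok in zip(batch, accepts):
            if ok:
                chosen = vertex
        total = prefix[-1]
    return chosen
```

For example, `weighted_reservoir_sample(["a", "b", "c"], [1, 0, 3])` returns `"c"` about three times as often as `"a"` and never returns `"b"`. In a dynamic walk such as Node2vec, the weights would be recomputed at every step from the walker's current state, which is why a one-pass method is attractive.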