Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Although many existing studies optimize the performance of GDRWs on multi-core CPUs, massive random memory accesses and costly synchronization still cause severe resource underutilization, and GDRW processing is usually the key performance bottleneck in many graph applications. This paper studies an alternative architecture, the FPGA, to address these issues: because FPGAs allow hardware customization, we can explore fine-grained pipeline execution and specialized memory access optimizations. Specifically, we propose LightRW, a novel FPGA-based accelerator for GDRWs. LightRW embraces a series of optimizations that enable fine-grained pipeline execution on chip and exploit the massive parallelism of the FPGA while significantly reducing memory accesses. Because the sampling methods commonly used in GDRWs do not efficiently support fine-grained pipeline execution, we develop a parallelized reservoir sampling method that samples multiple vertices per cycle for efficient pipeline execution. To address random memory accesses, we propose a degree-aware configurable caching method that buffers hot vertices on chip, and a dynamic burst access engine that efficiently retrieves neighbors. Experimental results show that these optimization techniques significantly improve the performance of GDRWs on FPGA. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over the state-of-the-art CPU-based MetaPath and Node2vec random walks, respectively. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/LightRW.
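To illustrate why reservoir sampling suits a streaming pipeline, the following is a minimal software sketch of chunked weighted reservoir sampling. It is not LightRW's hardware design. The function names, the `chunk_size` parameter (a stand-in for the number of neighbors examined per cycle), and the weight layout are all assumptions made for illustration. The sketch picks one neighbor with probability proportional to its weight in a single pass, without first building a full prefix-sum table over all neighbors.

```python
import random
from typing import Optional, Sequence, Tuple


def sample_chunk(weights: Sequence[float],
                 rng: random.Random) -> Tuple[Optional[int], float]:
    """Pick one index within a chunk, proportional to weight (hypothetical helper).

    Returns (chosen index or None, total positive weight of the chunk).
    """
    total = 0.0
    chosen = None
    for i, w in enumerate(weights):
        if w <= 0:
            continue  # zero-weight neighbors can never be selected
        total += w
        # Replace the current choice with probability w / total.
        if rng.random() * total < w:
            chosen = i
    return chosen, total


def reservoir_sample_neighbor(neighbors: Sequence[int],
                              weights: Sequence[float],
                              chunk_size: int = 4,
                              rng: Optional[random.Random] = None) -> Optional[int]:
    """Single-pass weighted sampling of one neighbor, processed chunk by chunk.

    chunk_size is an assumed stand-in for per-cycle parallelism; in software
    the chunks are simply processed one after another.
    """
    rng = rng or random.Random()
    chosen = None
    running_total = 0.0
    for start in range(0, len(neighbors), chunk_size):
        local, chunk_total = sample_chunk(weights[start:start + chunk_size], rng)
        if local is None:
            continue  # chunk had no positive weight
        running_total += chunk_total
        # Keep the chunk's candidate with probability chunk_total / running_total.
        if rng.random() * running_total < chunk_total:
            chosen = neighbors[start + local]
    return chosen
```

Each chunk nominates a local candidate, and the candidate replaces the current reservoir entry with probability equal to the chunk's share of the weight seen so far. A neighbor with weight w is therefore returned with probability w divided by the total weight, which is the same distribution a full prefix-sum lookup would give.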