Recently, Graph Neural Networks (GNNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data. However, realizing efficient GNN training is challenging, especially on large graphs. The reasons are manifold: 1) GNN training incurs a substantial memory footprint; full-batch training on large graphs can require hundreds to thousands of gigabytes of memory. 2) GNN training involves both memory-intensive and computation-intensive operations, challenging current CPU/GPU platforms. 3) The irregularity of graphs can result in severe resource under-utilization and load-imbalance problems. This paper presents the GNNear accelerator to tackle these challenges. GNNear adopts a DIMM-based memory system to provide sufficient memory capacity. To match the heterogeneous nature of GNN training, we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs), making full use of the high aggregate local bandwidth, and adopt a Centralized-Acceleration-Engine (CAE) to process the computation-intensive Update operations. We further propose several optimization strategies to deal with the irregularity of input graphs and to improve GNNear's performance. Comprehensive evaluations on 16 GNN training tasks demonstrate that GNNear achieves 30.8$\times$/2.5$\times$ geometric-mean speedup and 79.6$\times$/7.3$\times$ (geometric-mean) higher energy efficiency compared to a Xeon E5-2698-v4 CPU and an NVIDIA V100 GPU, respectively.
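To make the Reduce/Update split concrete, the following is a minimal, illustrative sketch (not GNNear's actual implementation, and the function and variable names are hypothetical) of a single GCN-style layer: the Reduce phase is a sparse, irregular neighbor aggregation dominated by memory traffic, while the Update phase is a dense, compute-bound transformation.

```python
# Minimal sketch of the two phases of a GNN layer (illustrative only).
import numpy as np
import scipy.sparse as sp

def gnn_layer(adj: sp.csr_matrix, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One forward GNN layer, split into Reduce and Update phases."""
    # Reduce: gather and sum neighbor features via a sparse matrix product.
    # This phase is memory-intensive and irregular; GNNear offloads it to
    # in-DIMM Near-Memory-Engines (NMEs).
    reduced = adj @ features
    # Update: dense matrix multiply plus nonlinearity. This phase is
    # computation-intensive; GNNear maps it to the Centralized-Acceleration-
    # Engine (CAE).
    return np.maximum(reduced @ weight, 0.0)

# Toy usage: a 4-node ring graph with 8-dim input and 16-dim output features.
adj = sp.csr_matrix(np.array([[0, 1, 0, 1],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1],
                              [1, 0, 1, 0]], dtype=np.float32))
x = np.random.rand(4, 8).astype(np.float32)
w = np.random.rand(8, 16).astype(np.float32)
out = gnn_layer(adj, x, w)   # shape (4, 16)
```

In a full training step, the backward pass exhibits the same heterogeneity, which is why the Reduce/Update partition between near-memory and centralized compute applies to training as well as inference.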