Recently, Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-Euclidean graph data. However, realizing efficient GCN training is challenging, especially on large graphs, for three main reasons: 1) GCN training incurs a substantial memory footprint; full-batch training on large graphs can require hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data-reduction operations and computation-intensive feature/gradient update operations, and this heterogeneous nature challenges current CPU/GPU platforms. 3) The irregularity of graphs and the complex training dataflow jointly increase the difficulty of improving the efficiency of a GCN training system. This paper presents GCNear, a hybrid architecture that tackles these challenges. Specifically, GCNear adopts a DIMM-based memory system to provide easy-to-scale memory capacity. To match the heterogeneous nature of the workload, we categorize GCN training operations as memory-intensive Reduce operations and computation-intensive Update operations. We offload Reduce operations to on-DIMM near-memory engines (NMEs), making full use of the high aggregate local bandwidth, and adopt a centralized acceleration engine (CAE) with sufficient computation capacity to process Update operations. We further propose several optimization strategies to handle the irregularity of GCN tasks and improve GCNear's performance. We also propose a Multi-GCNear system to evaluate GCNear's scalability.
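To make the Reduce/Update categorization concrete, the following is a minimal, framework-free sketch (our own illustration, not code from GCNear) of a single GCN layer's forward pass: the sparse neighbor aggregation corresponds to the memory-intensive Reduce operation, and the dense weight multiplication to the computation-intensive Update operation. The function and variable names here are hypothetical.

```python
# Illustrative sketch: one GCN layer decomposed into the two operation
# classes named in the abstract.
#   Reduce: memory-intensive neighbor aggregation over the (sparse) adjacency.
#   Update: computation-intensive dense feature transformation.
import numpy as np

def gcn_layer_forward(adj_norm, H, W):
    """One GCN layer: H' = ReLU(A_hat @ H @ W).

    adj_norm : (N, N) normalized adjacency (sparse in practice)
    H        : (N, F_in) node features
    W        : (F_in, F_out) layer weights
    """
    # Reduce: gather-and-sum of neighbor features; irregular, bandwidth-bound.
    aggregated = adj_norm @ H
    # Update: dense GEMM plus nonlinearity; regular, compute-bound.
    return np.maximum(aggregated @ W, 0.0)

# Toy usage: a 4-node graph with self-loops, row-normalized adjacency.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=np.float64)
A_hat = A / A.sum(axis=1, keepdims=True)
H = np.random.rand(4, 8)    # 8-dim input features per node
W = np.random.rand(8, 16)   # project to 16-dim output features
H_out = gcn_layer_forward(A_hat, H, W)
print(H_out.shape)          # (4, 16)
```

In a near-memory design along the lines described above, the `adj_norm @ H` aggregation would be offloaded close to the DIMMs, while the dense `aggregated @ W` transformation would run on a compute-heavy engine.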