Graph convolutional networks (GCNs) are becoming increasingly popular as they can process a wide variety of data formats that prior deep neural networks cannot easily support. One key challenge in designing hardware accelerators for GCNs is the vast size of graph data and the randomness of its access patterns, which greatly reduce the effectiveness of the limited on-chip cache. To improve cache effectiveness by mitigating these irregular data accesses, prior studies often employ the vertex tiling techniques used in traditional graph processing applications. While effective at enhancing cache efficiency, these approaches are often sensitive to the tiling configuration, whose optimal setting depends heavily on the target input dataset. Furthermore, the existing solutions require manual tuning through trial and error or rely on sub-optimal analytical models. In this paper, we propose Slice-and-Forge (SnF), an efficient hardware accelerator for GCNs that greatly improves the effectiveness of the limited on-chip cache. SnF adopts a tiling strategy named feature slicing, which splits the features into vertical slices and processes them in the outermost loop of the execution. This particular choice causes the identical computational pattern over the irregular graph data to repeat across multiple rounds. Taking advantage of such repetitions, SnF dynamically tunes its tile size. Our experimental results reveal that SnF achieves 1.73x higher performance in geomean compared to prior work in multi-engine settings, and 1.46x higher performance in geomean in small-scale settings, without the need for off-line analyses.
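As a rough illustration of the feature-slicing idea (a minimal sketch, not the SnF hardware design; the function name and fixed slice width are hypothetical), the sparse aggregation step of a GCN layer, adj @ features, can be computed one vertical feature slice at a time, so that the same irregular graph traversal repeats once per slice:

```python
import numpy as np

def aggregate_feature_sliced(adj, features, slice_width):
    """Compute adj @ features one vertical feature slice at a time.

    Placing the slice loop outermost makes the same irregular traversal
    of `adj` repeat each round, which is the repetition SnF exploits to
    tune its tile size dynamically. Here the width is fixed for brevity.
    """
    num_vertices, feat_dim = features.shape
    out = np.zeros((num_vertices, feat_dim))
    # Outermost loop: iterate over vertical slices of the feature matrix.
    for start in range(0, feat_dim, slice_width):
        end = min(start + slice_width, feat_dim)
        # The identical sparse access pattern over `adj` recurs per slice;
        # only a narrow band of `features` is live at a time, so it can
        # stay resident in a small on-chip cache.
        out[:, start:end] = adj @ features[:, start:end]
    return out
```

Because each slice touches only `slice_width` columns of the feature matrix, the working set per round shrinks proportionally, at the cost of re-traversing the graph once per slice.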